Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition, ISBN-13: 978-1098104030

Description

  • Publisher: O’Reilly Media; 3rd edition (September 20, 2022)
  • Language: English
  • 579 pages
  • ISBN-10: 109810403X
  • ISBN-13: 978-1098104030

Get the definitive handbook for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

  • Use the Jupyter notebook and IPython shell for exploratory computing
  • Learn basic and advanced features in NumPy
  • Get started with data analysis tools in the pandas library
  • Use flexible tools to load, clean, transform, merge, and reshape data
  • Create informative visualizations with matplotlib
  • Apply the pandas groupby facility to slice, dice, and summarize datasets
  • Analyze and manipulate regular and irregular time series data
  • Learn how to solve real-world data analysis problems with thorough, detailed examples

Table of Contents:

Preface

Conventions Used in This Book

Using Code Examples

O’Reilly Online Learning

How to Contact Us

Acknowledgments

In Memoriam: John D. Hunter (1968–2012)

Acknowledgments for the Third Edition (2022)

Acknowledgments for the Second Edition (2017)

Acknowledgments for the First Edition (2012)

Preliminaries

1.1 What Is This Book About?

What Kinds of Data?

1.2 Why Python for Data Analysis?

Python as Glue

Solving the “Two-Language” Problem

Why Not Python?

1.3 Essential Python Libraries

NumPy

pandas

matplotlib

IPython and Jupyter

SciPy

scikit-learn

statsmodels

Other Packages

1.4 Installation and Setup

Miniconda on Windows

GNU/Linux

Miniconda on macOS

Installing Necessary Packages

Integrated Development Environments and Text Editors

1.5 Community and Conferences

1.6 Navigating This Book

Code Examples

Data for Examples

Import Conventions

Python Language Basics, IPython, and Jupyter Notebooks

2.1 The Python Interpreter

2.2 IPython Basics

Running the IPython Shell

Running the Jupyter Notebook

Tab Completion

Introspection

2.3 Python Language Basics

Language Semantics

Scalar Types

Control Flow

2.4 Conclusion

Built-In Data Structures, Functions, and Files

3.1 Data Structures and Sequences

Tuple

List

Dictionary

Set

Built-In Sequence Functions

List, Set, and Dictionary Comprehensions

3.2 Functions

Namespaces, Scope, and Local Functions

Returning Multiple Values

Functions Are Objects

Anonymous (Lambda) Functions

Generators

Errors and Exception Handling

3.3 Files and the Operating System

Bytes and Unicode with Files

3.4 Conclusion

NumPy Basics: Arrays and Vectorized Computation

4.1 The NumPy ndarray: A Multidimensional Array Object

Creating ndarrays

Data Types for ndarrays

Arithmetic with NumPy Arrays

Basic Indexing and Slicing

Boolean Indexing

Fancy Indexing

Transposing Arrays and Swapping Axes

4.2 Pseudorandom Number Generation

4.3 Universal Functions: Fast Element-Wise Array Functions

4.4 Array-Oriented Programming with Arrays

Expressing Conditional Logic as Array Operations

Mathematical and Statistical Methods

Methods for Boolean Arrays

Sorting

Unique and Other Set Logic

4.5 File Input and Output with Arrays

4.6 Linear Algebra

4.7 Example: Random Walks

Simulating Many Random Walks at Once

4.8 Conclusion

Getting Started with pandas

5.1 Introduction to pandas Data Structures

Series

DataFrame

Index Objects

5.2 Essential Functionality

Reindexing

Dropping Entries from an Axis

Indexing, Selection, and Filtering

Arithmetic and Data Alignment

Function Application and Mapping

Sorting and Ranking

Axis Indexes with Duplicate Labels

5.3 Summarizing and Computing Descriptive Statistics

Correlation and Covariance

Unique Values, Value Counts, and Membership

5.4 Conclusion

Data Loading, Storage, and File Formats

6.1 Reading and Writing Data in Text Format

Reading Text Files in Pieces

Writing Data to Text Format

Working with Other Delimited Formats

JSON Data

XML and HTML: Web Scraping

6.2 Binary Data Formats

Reading Microsoft Excel Files

Using HDF5 Format

6.3 Interacting with Web APIs

6.4 Interacting with Databases

6.5 Conclusion

Data Cleaning and Preparation

7.1 Handling Missing Data

Filtering Out Missing Data

Filling In Missing Data

7.2 Data Transformation

Removing Duplicates

Transforming Data Using a Function or Mapping

Replacing Values

Renaming Axis Indexes

Discretization and Binning

Detecting and Filtering Outliers

Permutation and Random Sampling

Computing Indicator/Dummy Variables

7.3 Extension Data Types

7.4 String Manipulation

Python Built-In String Object Methods

Regular Expressions

String Functions in pandas

7.5 Categorical Data

Background and Motivation

Categorical Extension Type in pandas

Computations with Categoricals

Categorical Methods

7.6 Conclusion

Data Wrangling: Join, Combine, and Reshape

8.1 Hierarchical Indexing

Reordering and Sorting Levels

Summary Statistics by Level

Indexing with a DataFrame’s columns

8.2 Combining and Merging Datasets

Database-Style DataFrame Joins

Merging on Index

Concatenating Along an Axis

Combining Data with Overlap

8.3 Reshaping and Pivoting

Reshaping with Hierarchical Indexing

Pivoting “Long” to “Wide” Format

Pivoting “Wide” to “Long” Format

8.4 Conclusion

Plotting and Visualization

9.1 A Brief matplotlib API Primer

Figures and Subplots

Colors, Markers, and Line Styles

Ticks, Labels, and Legends

Annotations and Drawing on a Subplot

Saving Plots to File

matplotlib Configuration

9.2 Plotting with pandas and seaborn

Line Plots

Bar Plots

Histograms and Density Plots

Scatter or Point Plots

Facet Grids and Categorical Data

9.3 Other Python Visualization Tools

9.4 Conclusion

Data Aggregation and Group Operations

10.1 How to Think About Group Operations

Iterating over Groups

Selecting a Column or Subset of Columns

Grouping with Dictionaries and Series

Grouping with Functions

Grouping by Index Levels

10.2 Data Aggregation

Column-Wise and Multiple Function Application

Returning Aggregated Data Without Row Indexes

10.3 Apply: General split-apply-combine

Suppressing the Group Keys

Quantile and Bucket Analysis

Example: Filling Missing Values with Group-Specific Values

Example: Random Sampling and Permutation

Example: Group Weighted Average and Correlation

Example: Group-Wise Linear Regression

10.4 Group Transforms and “Unwrapped” GroupBys

10.5 Pivot Tables and Cross-Tabulation

Cross-Tabulations: Crosstab

10.6 Conclusion

Time Series

11.1 Date and Time Data Types and Tools

Converting Between String and Datetime

11.2 Time Series Basics

Indexing, Selection, Subsetting

Time Series with Duplicate Indices

11.3 Date Ranges, Frequencies, and Shifting

Generating Date Ranges

Frequencies and Date Offsets

Shifting (Leading and Lagging) Data

11.4 Time Zone Handling

Time Zone Localization and Conversion

Operations with Time Zone-Aware Timestamp Objects

Operations Between Different Time Zones

11.5 Periods and Period Arithmetic

Period Frequency Conversion

Quarterly Period Frequencies

Converting Timestamps to Periods (and Back)

Creating a PeriodIndex from Arrays

11.6 Resampling and Frequency Conversion

Downsampling

Upsampling and Interpolation

Resampling with Periods

Grouped Time Resampling

11.7 Moving Window Functions

Exponentially Weighted Functions

Binary Moving Window Functions

User-Defined Moving Window Functions

11.8 Conclusion

Introduction to Modeling Libraries in Python

12.1 Interfacing Between pandas and Model Code

12.2 Creating Model Descriptions with Patsy

Data Transformations in Patsy Formulas

Categorical Data and Patsy

12.3 Introduction to statsmodels

Estimating Linear Models

Estimating Time Series Processes

12.4 Introduction to scikit-learn

12.5 Conclusion

Data Analysis Examples

13.1 Bitly Data from 1.USA.gov

Counting Time Zones in Pure Python

Counting Time Zones with pandas

13.2 MovieLens 1M Dataset

Measuring Rating Disagreement

13.3 US Baby Names 1880–2010

Analyzing Naming Trends

13.4 USDA Food Database

13.5 2012 Federal Election Commission Database

Donation Statistics by Occupation and Employer

Bucketing Donation Amounts

Donation Statistics by State

13.6 Conclusion

Advanced NumPy

A.1 ndarray Object Internals

NumPy Data Type Hierarchy

A.2 Advanced Array Manipulation

Reshaping Arrays

C Versus FORTRAN Order

Concatenating and Splitting Arrays

Repeating Elements: tile and repeat

Fancy Indexing Equivalents: take and put

A.3 Broadcasting

Broadcasting over Other Axes

Setting Array Values by Broadcasting

A.4 Advanced ufunc Usage

ufunc Instance Methods

Writing New ufuncs in Python

A.5 Structured and Record Arrays

Nested Data Types and Multidimensional Fields

Why Use Structured Arrays?

A.6 More About Sorting

Indirect Sorts: argsort and lexsort

Alternative Sort Algorithms

Partially Sorting Arrays

numpy.searchsorted: Finding Elements in a Sorted Array

A.7 Writing Fast NumPy Functions with Numba

Creating Custom numpy.ufunc Objects with Numba

A.8 Advanced Array Input and Output

Memory-Mapped Files

HDF5 and Other Array Storage Options

A.9 Performance Tips

The Importance of Contiguous Memory

More on the IPython System

B.1 Terminal Keyboard Shortcuts

B.2 About Magic Commands

The %run Command

Executing Code from the Clipboard

B.3 Using the Command History

Searching and Reusing the Command History

Input and Output Variables

B.4 Interacting with the Operating System

Shell Commands and Aliases

Directory Bookmark System

B.5 Software Development Tools

Interactive Debugger

Timing Code: %time and %timeit

Basic Profiling: %prun and %run -p

Profiling a Function Line by Line

B.6 Tips for Productive Code Development Using IPython

Reloading Module Dependencies

Code Design Tips

B.7 Advanced IPython Features

Profiles and Configuration

B.8 Conclusion

Index

About the Author

Wes McKinney is a Nashville-based software developer and entrepreneur. After finishing his undergraduate degree in mathematics at MIT in 2007, he went on to do quantitative finance work at AQR Capital Management in Greenwich, CT. Frustrated by cumbersome data analysis tools, he learned Python and started building what would later become the pandas project. He’s now an active member of the Python data community and is an advocate for the use of Python in data analysis, finance, and statistical computing applications.

Wes was later the cofounder and CEO of DataPad, whose technology assets and team were acquired by Cloudera in 2014. He has since become involved in big data technology, joining the Project Management Committees for the Apache Arrow and Apache Parquet projects in the Apache Software Foundation. In 2018, he founded Ursa Labs, a not-for-profit organization focused on Apache Arrow development, in partnership with RStudio and Two Sigma Investments. In 2021, he cofounded the technology startup Voltron Data, where he currently works as Chief Technology Officer.