Pages: 471

Published: 2013

Data Analytics

Python for Data Analysis

Data Wrangling with pandas, NumPy, and IPython

Master the Python tools that practicing data analysts use every day, written by the engineer who built pandas.

Python for Data Analysis teaches you to work with real data using the libraries that define modern data work in Python. Written by Wes McKinney, the creator of the pandas library, this book moves quickly from Python fundamentals into practical data manipulation, aggregation, and visualization. You will learn to load, clean, reshape, and analyze datasets that reflect the messiness of production data, not textbook examples. At 471 pages, it covers IPython, NumPy, pandas, and matplotlib in enough depth to make you independently productive.

About this book

Most data analysis books teach statistics first and tools second. This one reverses the order. Python for Data Analysis starts with the tools you will actually open every morning: IPython for interactive exploration, NumPy for fast array computation, pandas for structured data manipulation, and matplotlib for plotting results. The author is not a technical writer who learned pandas; he wrote it. That background shows in every chapter.

The book is organized around the workflow a practicing analyst follows. You start by getting comfortable with Python and IPython as an environment, then build up to NumPy arrays and vectorized operations before moving into the core of the book: pandas. You will spend significant time with Series and DataFrame objects, learning to index, slice, group, merge, reshape, and clean data the way real datasets demand. Time series handling, a notoriously fiddly area, gets its own dedicated treatment.

Real datasets appear throughout, and the examples do not assume clean, well-structured input. You will handle missing values, duplicate rows, mixed-type columns, and mismatched indexes. This is the everyday friction that separates analysts who can work independently from those who stay stuck waiting for clean data.

Load data from CSV, Excel, databases, and web APIs using pandas I/O tools
Reshape wide and long datasets with pivot tables, stack, and unstack
Aggregate and summarize data with groupby split-apply-combine patterns
Merge and join DataFrames with the same precision you would use in SQL
Handle time series data, date ranges, resampling, and rolling windows
Produce publication-ready plots with matplotlib integrated into your analysis workflow

By the end, you will have a repeatable mental model for attacking a new dataset: how to inspect it, clean it, reshape it into the form your analysis requires, and extract the summary statistics or visualizations that answer your question. These are skills you will use on every project, regardless of domain.
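
The inspect, clean, reshape, and summarize sequence described above can be sketched in a few lines of pandas. This is a minimal illustration written against current pandas, not an excerpt from the book; the `raw` table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with two common problems:
# one missing revenue value and one exact duplicate row.
raw = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q2"],
    "revenue": [100.0, np.nan, 80.0, 90.0, 90.0],
})

# 1. Inspect: count missing values in each column.
missing = raw.isna().sum()

# 2. Clean: drop exact duplicate rows, then treat missing revenue as 0.
clean = raw.drop_duplicates().fillna({"revenue": 0.0})

# 3. Reshape: one row per region, one column per quarter.
wide = clean.pivot(index="region", columns="quarter", values="revenue")

# 4. Summarize: total revenue per region (split-apply-combine).
totals = clean.groupby("region")["revenue"].sum()

print(missing)
print(wide)
print(totals)
```

Filling missing revenue with 0 is a choice made only for this sketch. On a real dataset, whether to fill, drop, or flag missing values depends on what the gap means.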

This is the first edition, published in 2013. The core concepts and pandas fundamentals it teaches remain foundational to modern data analysis in Python. Some API details have changed since publication, so readers who need current pandas syntax should check the pandas documentation or a later edition of the book.

🎯 What you'll learn

Navigate the IPython shell and notebook environment to explore data interactively
Construct and manipulate NumPy arrays with vectorized operations that avoid slow Python loops
Load, inspect, and clean messy real-world datasets using pandas Series and DataFrame
Apply groupby aggregations to summarize data across categories with split-apply-combine logic
Merge, join, and reshape DataFrames to match the structure your analysis requires
Handle missing data, duplicates, and mixed types without breaking your analysis pipeline
Work with time series data including resampling, rolling windows, and date range generation
Produce line, bar, scatter, and histogram plots using matplotlib within your pandas workflow
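
As a rough illustration of what "vectorized operations that avoid slow Python loops" means in the list above, the sketch below computes the same result twice: once with an explicit loop and once with NumPy array arithmetic. The price and quantity values are made up for the example.

```python
import numpy as np

# Hypothetical order lines: unit prices and quantities.
prices = [19.99, 5.49, 3.50, 12.00]
quantities = [3, 10, 4, 1]

# Loop version: multiply element by element in pure Python.
loop_totals = []
for price, quantity in zip(prices, quantities):
    loop_totals.append(price * quantity)

# Vectorized version: one expression applied to whole arrays at once.
price_arr = np.array(prices)
qty_arr = np.array(quantities)
vector_totals = price_arr * qty_arr

# Boolean masks select elements without a loop as well.
large_orders = vector_totals[vector_totals > 20]

print(vector_totals)
print(large_orders)
```

Both versions produce the same totals. The array version pushes the per-element work into NumPy's compiled code, which is where the speed advantage on large arrays comes from.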

👤 Who is this book for?

Python developers who want to move into data analysis and need a structured path through the pandas and NumPy ecosystem
Analysts with spreadsheet or SQL experience who are learning Python as a data manipulation environment
Students in data science programs who want a practitioner-focused complement to their statistics coursework
Researchers and scientists who work with tabular data and want to replace manual data wrangling with reproducible Python scripts
Self-taught programmers who have read introductory Python material and are ready to work with real datasets

Table of contents

01 Preliminaries

Sets up the Python and IPython environment you will use throughout the book and explains why pandas and NumPy are the right tools for data analysis work.

02 Introductory Examples

Walks through several complete, end-to-end data analysis examples to show how the tools fit together before covering any of them in depth.

03 IPython: An Interactive Computing and Development Environment

Teaches you to use IPython efficiently for exploration, introspection, and debugging, including magic commands, tab completion, and the notebook interface.

04 NumPy Basics: Arrays and Vectorized Computation

Introduces ndarray, NumPy's core data structure, and covers indexing, slicing, reshaping, and vectorized arithmetic that replace slow element-by-element Python loops.

05 Getting Started with pandas

Introduces Series and DataFrame, the two primary pandas objects, and covers the indexing, alignment, and basic operations you will rely on in every subsequent chapter.

06 Data Loading, Storage, and File Formats

Shows how to read and write data from CSV, Excel, JSON, HTML, and databases using pandas I/O tools, and covers common parsing options for messy files.

07 Data Wrangling: Clean, Transform, Merge, Reshape

Covers the practical mechanics of cleaning missing values, removing duplicates, merging DataFrames, and pivoting data between wide and long formats.

08 Plotting and Visualization

Demonstrates how to create line, bar, scatter, and histogram plots using matplotlib and pandas plotting helpers to communicate analysis results visually.

09 Data Aggregation and Group Operations

Explains the groupby split-apply-combine pattern in depth, showing how to compute summary statistics, apply custom functions, and build pivot tables.

10 Time Series

Covers date and time indexing, resampling, rolling and expanding window operations, and handling time zones for time-stamped datasets.
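
For a sense of what resampling and rolling windows look like in practice, here is a minimal sketch using current pandas syntax. The daily `readings` series is invented for illustration, and the exact calls shown in the 2013 text may differ from these.

```python
import pandas as pd

# Hypothetical daily readings over six days.
index = pd.date_range("2013-01-01", periods=6, freq="D")
readings = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index)

# Resampling: regroup the daily values into 2-day bins and average each bin.
two_day_mean = readings.resample("2D").mean()

# Rolling window: average of the current value and the two before it.
# The first two positions have no full window, so they are NaN.
rolling_mean = readings.rolling(window=3).mean()

print(two_day_mean)
print(rolling_mean)
```

Resampling changes the frequency of the index and returns three rows here, while the rolling mean keeps the original six rows and smooths the values.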

Frequently asked questions

Do I need prior experience with pandas or NumPy before reading this book?

No prior pandas or NumPy experience is required. You do need a working knowledge of Python basics such as lists, dictionaries, and functions. The book introduces both libraries from scratch.

This book was published in 2013. Is it still useful?

The core mental models and pandas fundamentals are still valid and widely taught. Some API syntax has changed in newer pandas versions, so you may occasionally need to consult current pandas documentation when a specific method call differs from what is shown.
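
Two well-known examples of this kind of change are the `.ix` indexer and the top-level `pd.rolling_mean` function, both common in 2013-era pandas code and both since removed from the library. The sketch below shows the modern equivalents, with the older spellings kept in comments only. The small `prices` table is hypothetical.

```python
import pandas as pd

# Hypothetical closing prices with string labels as the index.
prices = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.0]},
    index=["a", "b", "c", "d"],
)

# Older code: prices.ix[0]
# Current pandas: position-based selection with .iloc
first_row = prices.iloc[0]

# Older code: prices.ix["b", "close"]
# Current pandas: label-based selection with .loc
b_close = prices.loc["b", "close"]

# Older code: pd.rolling_mean(prices["close"], 2)
# Current pandas: the .rolling(...) method followed by an aggregation
smoothed = prices["close"].rolling(window=2).mean()

print(first_row)
print(b_close)
print(smoothed)
```

The underlying ideas (selecting by label or position, computing a moving average) are unchanged, which is why the concepts in the book carry over even where the spelling does not.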

Does the book include the datasets and code used in the examples?

O'Reilly provided companion materials with early editions of this book. Check the publisher's website or the author's GitHub profile for any available code and data files.

Is this book more suitable for analysts or for software engineers?

Both audiences use it, but the framing is analytical rather than engineering-focused. If your goal is manipulating and understanding data rather than building data pipelines for production systems, the book is a strong fit.

Does the book cover machine learning or statistical modeling?

No. The focus is entirely on data manipulation, cleaning, aggregation, and visualization. It does not cover scikit-learn, statsmodels, or predictive modeling techniques.

Specs

Publisher: O'Reilly Media, Inc.
Published: Jan 2013
Pages: 471
Language: English

About the author

Wes McKinney
