
Pages: 471

Published: 2013

Python for Data Analysis

Data Wrangling with pandas, NumPy, and IPython

Master the Python tools that practicing data analysts use every day, written by the engineer who built pandas.

Python for Data Analysis teaches you to work with real data using the libraries that define modern data work in Python. Written by Wes McKinney, the creator of the pandas library, this book moves quickly from Python fundamentals into practical data manipulation, aggregation, and visualization. You will learn to load, clean, reshape, and analyze datasets that reflect the messiness of production data, not textbook examples. At 471 pages, it covers IPython, NumPy, pandas, and matplotlib in enough depth to make you independently productive.

About this book

Most data analysis books teach statistics first and tools second. This one is different. Python for Data Analysis starts with the tools you will actually open every morning: IPython for interactive exploration, NumPy for fast array computation, pandas for structured data manipulation, and matplotlib for plotting results. The author is not a technical writer who learned pandas; he wrote it. That background shows in every chapter.

The book is organized around the workflow a practicing analyst follows. You start by getting comfortable with Python and IPython as an environment, then build up to NumPy arrays and vectorized operations before moving into the core of the book: pandas. You will spend significant time with Series and DataFrame objects, learning to index, slice, group, merge, reshape, and clean data the way real datasets demand. Time series handling, a notoriously fiddly area, gets its own dedicated treatment.

Real datasets appear throughout. Examples do not assume clean, well-structured input. You will handle missing values, duplicate rows, mixed-type columns, and mismatched indexes: the everyday friction that separates analysts who can work independently from those who stay stuck waiting for clean data.

  • Load data from CSV, Excel, databases, and web APIs using pandas I/O tools
  • Reshape wide and long datasets with pivot tables, stack, and unstack
  • Aggregate and summarize data with groupby split-apply-combine patterns
  • Merge and join DataFrames with the same precision you would use in SQL
  • Handle time series data, date ranges, resampling, and rolling windows
  • Produce publication-ready plots with matplotlib integrated into your analysis workflow
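To give a feel for how a few of these operations fit together, here is a minimal sketch using current pandas syntax. The tables, column names, and values are hypothetical and are not taken from the book; the calls shown (`merge`, `groupby`, `pivot_table`) are standard pandas methods.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "amount": [100.0, 250.0, 75.0, 40.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# SQL-style left join on the shared key column.
merged = orders.merge(customers, on="customer_id", how="left")

# Split-apply-combine: split by region, sum the amounts, combine the results.
totals = merged.groupby("region")["amount"].sum()

# Reshape long to wide: one row per region, one column per quarter.
wide = merged.pivot_table(
    index="region", columns="quarter", values="amount", aggfunc="sum"
)

print(totals)
print(wide)
```

Region and quarter combinations with no orders show up as missing values in the wide table, which is the kind of gap the cleaning chapters teach you to handle.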

By the end, you will have a repeatable mental model for attacking a new dataset: how to inspect it, clean it, reshape it into the form your analysis requires, and extract the summary statistics or visualizations that answer your question. These are skills you will use on every project, regardless of domain.

This is the first edition, published in 2013. The core concepts and pandas fundamentals it teaches remain foundational to modern data analysis in Python. If you need newer pandas features or syntax, note that some API details have changed since publication.

🎯 What you'll learn

  • Navigate the IPython shell and notebook environment to explore data interactively
  • Construct and manipulate NumPy arrays with vectorized operations that avoid slow Python loops
  • Load, inspect, and clean messy real-world datasets using pandas Series and DataFrame
  • Apply groupby aggregations to summarize data across categories with split-apply-combine logic
  • Merge, join, and reshape DataFrames to match the structure your analysis requires
  • Handle missing data, duplicates, and mixed types without breaking your analysis pipeline
  • Work with time series data including resampling, rolling windows, and date range generation
  • Produce line, bar, scatter, and histogram plots using matplotlib within your pandas workflow
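Two of the skills above, vectorized NumPy arithmetic and basic missing-data handling, can be sketched in a few lines. This is an illustrative example with made-up values, written against current NumPy and pandas rather than copied from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration.
prices = np.array([10.0, 12.5, 8.0])
quantities = np.array([3, 2, 5])

# Vectorized: a single elementwise multiply over the whole array.
revenue = prices * quantities

# The equivalent element-by-element Python loop, shown only for comparison.
revenue_loop = np.array([p * q for p, q in zip(prices, quantities)])

# A Series with one missing value and one duplicated value.
readings = pd.Series([1.0, np.nan, 3.0, 3.0])

# Option 1: drop missing entries, then drop repeated values.
cleaned = readings.dropna().drop_duplicates()

# Option 2: keep every row and fill the gap with the mean of the observed values.
filled = readings.fillna(readings.mean())
```

Whether to drop or fill missing values depends on the analysis; the sketch shows both only so you can see what each call does.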

👤 Who is this book for?

  • Python developers who want to move into data analysis and need a structured path through the pandas and NumPy ecosystem
  • Analysts with spreadsheet or SQL experience who are learning Python as a data manipulation environment
  • Students in data science programs who want a practitioner-focused complement to their statistics coursework
  • Researchers and scientists who work with tabular data and want to replace manual data wrangling with reproducible Python scripts
  • Self-taught programmers who have read introductory Python material and are ready to work with real datasets

Table of contents

  1. Preliminaries

    Sets up the Python and IPython environment you will use throughout the book and explains why pandas and NumPy are the right tools for data analysis work.

  2. Introductory Examples

    Walks through several complete, end-to-end data analysis examples to show how the tools fit together before covering any of them in depth.

  3. IPython: An Interactive Computing and Development Environment

    Teaches you to use IPython efficiently for exploration, introspection, and debugging, including magic commands, tab completion, and the notebook interface.

  4. NumPy Basics: Arrays and Vectorized Computation

    Introduces ndarray, NumPy's core data structure, and covers indexing, slicing, reshaping, and vectorized arithmetic that replace slow element-by-element Python loops.

  5. Getting Started with pandas

    Introduces Series and DataFrame, the two primary pandas objects, and covers the indexing, alignment, and basic operations you will rely on in every subsequent chapter.

  6. Data Loading, Storage, and File Formats

    Shows how to read and write data from CSV, Excel, JSON, HTML, and databases using pandas I/O tools, and covers common parsing options for messy files.

  7. Data Wrangling: Clean, Transform, Merge, Reshape

    Covers the practical mechanics of cleaning missing values, removing duplicates, merging DataFrames, and pivoting data between wide and long formats.

  8. Plotting and Visualization

    Demonstrates how to create line, bar, scatter, and histogram plots using matplotlib and pandas plotting helpers to communicate analysis results visually.

  9. Data Aggregation and Group Operations

    Explains the groupby split-apply-combine pattern in depth, showing how to compute summary statistics, apply custom functions, and build pivot tables.

  10. Time Series

    Covers date and time indexing, resampling, rolling and expanding window operations, and handling time zones for time-stamped datasets.
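As a rough preview of the time series material, the sketch below builds a small daily series and applies a resample and a rolling window. The dates and values are invented for illustration, and the syntax is current pandas, which may differ from the method calls printed in the 2013 edition.

```python
import pandas as pd

# Hypothetical daily series; dates and values are illustrative only.
idx = pd.date_range("2013-01-01", periods=6, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Downsample to 2-day bins, summing the values inside each bin.
two_day = daily.resample("2D").sum()

# 3-day rolling mean; the first two entries are NaN because
# a full window of three observations is not yet available.
rolling = daily.rolling(window=3).mean()

print(two_day)
print(rolling)
```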

Frequently asked questions

Do I need prior experience with pandas or NumPy before reading this book?

No prior pandas or NumPy experience is required. You do need a working knowledge of Python basics such as lists, dictionaries, and functions. The book introduces both libraries from scratch.

This was published in 2013. Is it still useful?

The core mental models and pandas fundamentals are still valid and widely taught. Some API syntax has changed in newer pandas versions, so you may occasionally need to consult current pandas documentation when a specific method call differs from what is shown.
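One example of the kind of difference you may run into: older pandas code often selected data with the `.ix` indexer, which was deprecated and then removed in pandas 1.0. The sketch below shows the explicit replacements, `.loc` and `.iloc`, on a hypothetical DataFrame. Treat it as an illustration of the pattern, not a complete migration guide.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"score": [90, 85, 70]}, index=["a", "b", "c"])

# Older code: df.ix["b", "score"] mixed label and position lookups.
# Current pandas asks you to choose one explicitly.
by_label = df.loc["b", "score"]   # label-based selection
by_position = df.iloc[1, 0]       # position-based selection

print(by_label, by_position)
```

When a call from the book raises an AttributeError on a recent pandas release, the current pandas documentation usually names the replacement.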

Does the book include the datasets and code used in the examples?

O'Reilly provided companion materials with early editions of this book. Check the publisher's website or the author's GitHub profile for any available code and data files.

Is this book more suitable for analysts or for software engineers?

Both audiences use it, but the framing is analytical rather than engineering-focused. If your goal is manipulating and understanding data rather than building data pipelines for production systems, the book is a strong fit.

Does the book cover machine learning or statistical modeling?

No. The focus is entirely on data manipulation, cleaning, aggregation, and visualization. It does not cover scikit-learn, statsmodels, or predictive modeling techniques.
