Pages: 471
Published: 2013
Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython
Master the Python tools that practicing data analysts use every day, written by the engineer who built pandas.
Python for Data Analysis teaches you to work with real data using the libraries that define modern data work in Python. Written by Wes McKinney, the creator of the pandas library, this book moves quickly from Python fundamentals into practical data manipulation, aggregation, and visualization. You will learn to load, clean, reshape, and analyze datasets that reflect the messiness of production data, not textbook examples. At 471 pages, it covers IPython, NumPy, pandas, and matplotlib in enough depth to make you independently productive.
Most data analysis books teach statistics first and tools second. This one is different. Python for Data Analysis starts with the tools you will actually open every morning: IPython for interactive exploration, NumPy for fast array computation, pandas for structured data manipulation, and matplotlib for plotting results. The author is not a technical writer who learned pandas; he wrote it. That background shows in every chapter.
The book is organized around the workflow a practicing analyst follows. You start by getting comfortable with Python and IPython as an environment, then build up to NumPy arrays and vectorized operations before moving into the core of the book: pandas. You will spend significant time with Series and DataFrame objects, learning to index, slice, group, merge, reshape, and clean data the way real datasets demand. Time series handling, a notoriously fiddly area, gets its own dedicated treatment.
Real datasets appear throughout. Examples do not assume clean, well-structured input. You will handle missing values, duplicate rows, mixed-type columns, and mismatched indexes: the everyday friction that separates analysts who can work independently from those who stay stuck waiting for clean data.
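To give a flavor of that friction, here is a minimal sketch written against current pandas. It is not taken from the book; the column names and values are hypothetical, and filling gaps with the column mean is just one possible choice.

```python
import pandas as pd

# Hypothetical messy input (not from the book): a duplicate row, a missing
# value, and a numeric column that arrived as strings.
raw = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Pune"],
        "temp_c": ["4.5", "4.5", None, "31"],
    }
)

# Drop exact duplicate rows, then convert the string column to numbers.
# errors="coerce" turns anything unparseable into NaN instead of raising.
clean = raw.drop_duplicates().copy()
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")

# Fill the remaining gap with the column mean (an illustrative choice only).
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].mean())

# Mismatched indexes: arithmetic aligns on labels, and labels present on
# only one side come out as NaN.
a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
total = a + b

print(clean)
print(total)
```

Whether to fill, drop, or flag missing values depends on the analysis; the point is only that each kind of mess has a short, explicit pandas idiom.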
By the end, you will have a repeatable mental model for attacking a new dataset: how to inspect it, clean it, reshape it into the form your analysis requires, and extract the summary statistics or visualizations that answer your question. These are skills you will use on every project, regardless of domain.
This is the first edition, published in 2013. The core concepts and pandas fundamentals it teaches remain foundational to modern data analysis in Python. Readers who want coverage of newer pandas features and syntax should be aware that some API details have evolved since publication.
Sets up the Python and IPython environment you will use throughout the book and explains why pandas and NumPy are the right tools for data analysis work.
Walks through several complete, end-to-end data analysis examples to show how the tools fit together before covering any of them in depth.
Teaches you to use IPython efficiently for exploration, introspection, and debugging, including magic commands, tab completion, and the notebook interface.
Introduces ndarray, NumPy's core data structure, and covers indexing, slicing, reshaping, and vectorized arithmetic that replace slow element-by-element Python loops.
Introduces Series and DataFrame, the two primary pandas objects, and covers the indexing, alignment, and basic operations you will rely on in every subsequent chapter.
Shows how to read and write data from CSV, Excel, JSON, HTML, and databases using pandas I/O tools, and covers common parsing options for messy files.
Covers the practical mechanics of cleaning missing values, removing duplicates, merging DataFrames, and pivoting data between wide and long formats.
Demonstrates how to create line, bar, scatter, and histogram plots using matplotlib and pandas plotting helpers to communicate analysis results visually.
Explains the groupby split-apply-combine pattern in depth, showing how to compute summary statistics, apply custom functions, and build pivot tables.
Covers date and time indexing, resampling, rolling and expanding window operations, and handling time zones for time-stamped datasets.
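As a rough illustration of two topics from the chapter summaries above, groupby aggregation and time series resampling, the following sketch uses current pandas syntax. The data and the `region` and `units` column names are hypothetical and do not come from the book, and some calls may differ from the 2013-era API the book shows.

```python
import pandas as pd

# Hypothetical daily sales records, two rows per date (illustrative only).
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "north", "south"],
        "units": [10, 4, 12, 6, 8, 5],
    },
    index=pd.to_datetime(
        [
            "2013-01-01", "2013-01-01",
            "2013-01-02", "2013-01-02",
            "2013-01-08", "2013-01-08",
        ]
    ),
)

# Split-apply-combine: split rows by region, apply sum and mean to each
# group, and combine the results into one table.
by_region = sales.groupby("region")["units"].agg(["sum", "mean"])

# Resampling: total units per calendar week (weeks ending on Sunday).
weekly = sales["units"].resample("W").sum()

# Rolling window: mean over the current and previous observation,
# computed on per-date totals.
daily = sales["units"].groupby(level=0).sum()
rolling = daily.rolling(window=2).mean()

print(by_region)
print(weekly)
print(rolling)
```

The first entry of `rolling` is NaN because a two-observation window has nothing before the first date.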
No prior pandas or NumPy experience is required. You do need a working knowledge of Python basics such as lists, dictionaries, and functions. The book introduces both libraries from scratch.
The core mental models and pandas fundamentals are still valid and widely taught. Some API syntax has changed in newer pandas versions, so you may occasionally need to consult current pandas documentation when a specific method call differs from what is shown.
O'Reilly provided companion materials with early editions of this book. Check the publisher's website or the author's GitHub profile for any available code and data files.
Both audiences use it, but the framing is analytical rather than engineering-focused. If your goal is manipulating and understanding data rather than building data pipelines for production systems, the book is a strong fit.
No. The focus is entirely on data manipulation, cleaning, aggregation, and visualization. It does not cover scikit-learn, statsmodels, or predictive modeling techniques.