Published: 2015

AI Learning

Python Machine Learning

A practical guide to building machine learning systems with Python, scikit-learn, and the fundamentals of deep learning

Go from raw data to trained, evaluated, and deployed machine learning models using Python and scikit-learn, with no hand-waving over the hard parts.

Sebastian Raschka

Python Machine Learning by Sebastian Raschka takes you through the core concepts and algorithms behind modern machine learning, implementing each one in Python from the ground up before showing you how scikit-learn automates the heavy lifting. You will train classifiers, build regression models, tune hyperparameters, reduce dimensionality, and cluster unlabeled data, finishing with a practical introduction to neural networks and sentiment analysis on real datasets.

About this book

Most machine learning tutorials ask you to trust the library and move on. This book does the opposite. Sebastian Raschka walks you through the mathematics and intuition behind each algorithm, then shows you the Python code that brings it to life, and then shows you how scikit-learn implements the same idea at production scale. You leave with understanding, not just working code.

The book opens with the perceptron and Adaline, two historically important linear classifiers you implement from scratch in NumPy. That foundation pays off immediately: when logistic regression, SVMs, and decision trees appear in the following chapters, you already understand what the optimization loop is actually doing. Raschka never skips the cost function or the gradient.
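To give a flavor of what "from scratch in NumPy" can look like, here is a minimal perceptron sketch. It is not taken from the book: the class layout, the zero initialization, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np


class Perceptron:
    """Minimal perceptron sketch; expects class labels 0 and 1."""

    def __init__(self, eta=0.1, n_iter=10):
        self.eta = eta          # learning rate
        self.n_iter = n_iter    # passes over the training set

    def fit(self, X, y):
        # Zero initialization is an assumption made to keep the sketch deterministic.
        self.w_ = np.zeros(X.shape[1])
        self.b_ = 0.0
        self.errors_ = []       # misclassifications per epoch
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                # Perceptron rule: the update is non-zero only on a mistake.
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_) + self.b_

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, 0)


# Invented, linearly separable toy data.
X = np.array([[2.0, 1.0], [3.0, 1.5], [2.5, 0.5],
              [-2.0, -1.0], [-3.0, -0.5], [-2.5, -1.5]])
y = np.array([1, 1, 1, 0, 0, 0])

ppn = Perceptron(eta=0.1, n_iter=10).fit(X, y)
print(ppn.errors_)      # misclassification count per epoch
print(ppn.predict(X))
```

Tracking the per-epoch error count is one simple way to see the weight updates stop once the data is separated.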

From classification the book moves into the full supervised-learning workflow: preprocessing raw features, encoding categoricals, handling missing values, splitting and scaling data correctly, and evaluating models with cross-validation and learning curves that tell you whether you are overfitting or underfitting. These chapters are worth the price of the book on their own for anyone who has cargo-culted their way through a Kaggle pipeline.
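The workflow described above can be sketched with scikit-learn's `Pipeline` and `ColumnTransformer`. This is one possible arrangement under stated assumptions, not the book's code: the column names, the toy values, the median imputation strategy, and the choice of logistic regression are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy table: names and values are invented for illustration.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, np.nan, 60, 41, 35, 55],
    "income": [30, 45, 52, np.nan, 80, 61, 33, 47, 95, 70, 50, 88],
    "city": ["A", "B", "A", "C", "C", "B", "A", "B", "C", "C", "A", "C"],
    "label": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1],
})
X, y = df.drop(columns="label"), df["label"]

# Numeric columns: fill missing values, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical column: one-hot encode, ignoring categories unseen during fit.
categorical = OneHotEncoder(handle_unknown="ignore")

preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["city"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

# Preprocessing is refit inside each fold, so no statistics leak from the
# held-out fold into training.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean())
```

Keeping imputation and scaling inside the pipeline is what makes the cross-validation estimate honest: each fold computes its own medians and scaling statistics from training rows only.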

The second half shifts to unsupervised learning, dimensionality reduction with PCA and LDA, and ensemble methods including random forests and gradient boosting. A chapter on sentiment analysis with a logistic regression pipeline on a real movie-review dataset ties the techniques together in an end-to-end project. The final chapters introduce artificial neural networks and the basics of deep learning, giving you a solid footing before you reach for TensorFlow or PyTorch.
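As a rough illustration of a logistic regression pipeline for sentiment analysis, the sketch below chains TF-IDF features into a classifier. The six-review corpus is invented and far too small to be meaningful; the book's project works on a real movie-review dataset, and the vectorizer settings here are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented corpus, for illustration only.
reviews = [
    "a wonderful, moving film with great acting",
    "great story and wonderful characters",
    "I loved this movie, truly great",
    "a boring, terrible waste of time",
    "terrible acting and a boring plot",
    "I hated this movie, truly awful",
]
labels = [1, 1, 1, 0, 0, 0]   # 1 = positive, 0 = negative

# Text goes in, TF-IDF vectors come out, logistic regression classifies them.
sentiment = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(),
)
sentiment.fit(reviews, labels)

predictions = sentiment.predict(
    ["a great and wonderful movie", "boring and terrible"])
print(predictions)
```

Because the vectorizer lives inside the pipeline, raw strings can be passed straight to `predict` without a separate transformation step.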

Implement the perceptron and Adaline from scratch in NumPy, then apply logistic regression, SVMs, decision trees, random forests, and k-means with scikit-learn
Use scikit-learn pipelines to automate preprocessing, feature selection, and model training in a single, reproducible object
Tune hyperparameters with grid search and randomized search, and select models with stratified k-fold cross-validation
Reduce high-dimensional data with PCA and LDA before feeding it to a classifier
Build a sentiment analysis pipeline on the IMDb movie-review dataset
Understand the forward and backpropagation mechanics that underpin every neural network
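Several of the items above (pipelines, PCA before a classifier, grid search, stratified k-fold) combine naturally into one object. The following is a sketch under assumptions: the wine dataset bundled with scikit-learn, the parameter grid, and logistic regression as the final estimator are illustrative choices, not the book's.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scale, project onto principal components, then classify.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "<step>__<parameter>" addresses a parameter of a named pipeline step.
param_grid = {
    "pca__n_components": [2, 5, 8],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)
print(test_accuracy)
```

The held-out test split is touched only once, after the search has picked its hyperparameters on the cross-validation folds.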

The code targets Python 3 and the scientific Python stack. Every algorithm is accompanied by worked examples, visualizations, and plain-language explanations of the underlying math. Raschka assumes you can write Python and remember enough calculus and linear algebra to follow a partial derivative, but he does not assume a statistics or computer science degree.

At 455 pages, the book is dense but not padded. Each chapter earns its place by adding a concept you will use in the next one.

🎯 What you'll learn

Implement core classification algorithms from scratch in NumPy so you understand what scikit-learn is doing under the hood
Build end-to-end supervised learning pipelines that handle preprocessing, training, and evaluation in one reproducible object
Diagnose overfitting and underfitting using learning curves, validation curves, and cross-validation scores
Reduce dimensionality with PCA and LDA before feeding data into a classifier
Cluster unlabeled data with k-means and hierarchical clustering, and evaluate the results honestly
Train and tune ensemble models including random forests and gradient boosting classifiers
Build a working sentiment analysis system on real movie-review text using a logistic regression pipeline
Understand the forward pass and backpropagation well enough to follow modern deep learning frameworks
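For the clustering item, one common way to evaluate k-means results is to compare silhouette scores across candidate cluster counts. The sketch below assumes synthetic blobs with three hand-placed centers; the centers, spread, and range of k are illustrative values only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (assumed for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.8,
    random_state=0,
)

# Fit k-means for several values of k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    cluster_labels = km.fit_predict(X)
    scores[k] = silhouette_score(X, cluster_labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

On real data the peak is rarely this clean, so the silhouette score is usually read alongside other evidence rather than as a final answer.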

👤 Who is this book for?

Python developers who want to move beyond tutorial snippets and understand how machine learning algorithms actually work
Data analysts familiar with pandas and NumPy who are ready to build and evaluate predictive models
Software engineers transitioning into machine learning roles who need a rigorous but practical foundation
Students in a machine learning course who want a second, code-first explanation alongside their textbook
Practitioners who have used scikit-learn by copying examples but want to understand the decisions they are making

Table of contents

01 Machine Learning and the Python Ecosystem

You get a map of the machine learning landscape, covering supervised, unsupervised, and reinforcement learning, and set up the Python scientific stack that every subsequent chapter depends on.

02 Training Simple Classifiers from Scratch

You implement the perceptron and Adaline in NumPy, tracing the weight-update rule step by step so the concept of gradient descent becomes concrete before you ever open scikit-learn.

03 Classifiers with scikit-learn

You apply logistic regression, SVMs, decision trees, k-nearest neighbors, and naive Bayes to the same datasets you built by hand, learning the scikit-learn API and when to reach for each algorithm.

04 Building Good Training Datasets

You preprocess raw features correctly: imputing missing values, encoding categoricals, scaling numerical features, and selecting the variables that matter most, so your models train on clean input.

05 Dimensionality Reduction

You compress high-dimensional data using PCA and LDA, visualize the transformed feature spaces, and measure how much predictive information survives the reduction.

06 Model Evaluation and Hyperparameter Tuning

You evaluate models honestly with stratified k-fold cross-validation, plot learning and validation curves to diagnose bias and variance, and tune hyperparameters with grid search and randomized search.

07 Ensemble Methods

You combine weak learners into strong ones using bagging with random forests and boosting with AdaBoost and gradient boosting, then compare their performance and interpretability trade-offs.

08 Sentiment Analysis with Logistic Regression

You build a complete text classification pipeline on the IMDb movie-review dataset, covering tokenization, TF-IDF vectorization, and out-of-core learning on data that does not fit in memory.

09 Clustering Unlabeled Data

You apply k-means and hierarchical clustering to unlabeled datasets, choose the right number of clusters using the elbow method and dendrograms, and evaluate cohesion with silhouette scores.

10 Neural Networks and the Basics of Deep Learning

You implement a single-hidden-layer neural network from scratch, work through the backpropagation algorithm by hand, and connect what you have built to the architecture of modern deep learning frameworks.
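A single-hidden-layer network with hand-written backpropagation fits in a few dozen lines of NumPy. The sketch below is an illustration, not the book's implementation: the XOR toy problem, four hidden units, sigmoid activations, cross-entropy loss, and the learning rate are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)

# XOR: a toy problem that a purely linear classifier cannot solve.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

n_hidden = 4  # assumed size, chosen only for illustration
W1 = rng.normal(scale=1.0, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=1.0, size=(n_hidden, 1))
b2 = np.zeros(1)

eta = 0.5  # learning rate
losses = []
for epoch in range(2000):
    # Forward pass: input -> hidden activations -> output probability.
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    loss = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    losses.append(loss)

    # Backward pass: propagate the error from the output toward the input.
    delta2 = (a2 - y) / len(X)                 # gradient w.r.t. output pre-activation
    grad_W2 = a1.T @ delta2
    grad_b2 = delta2.sum(axis=0)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)   # chain rule through the hidden sigmoid
    grad_W1 = X.T @ delta1
    grad_b1 = delta1.sum(axis=0)

    # Gradient descent step.
    W2 -= eta * grad_W2
    b2 -= eta * grad_b2
    W1 -= eta * grad_W1
    b1 -= eta * grad_b1

print(losses[0], losses[-1])
```

The `(a2 - y)` term is the combined derivative of the cross-entropy loss and the output sigmoid, which is why no separate sigmoid derivative appears at the output layer.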

Frequently asked questions

What Python version and libraries does the book use?

The book targets Python 3 and the core scientific Python stack: NumPy, pandas, matplotlib, and scikit-learn. The neural network chapters use the same stack rather than a framework, so the concepts transfer to any modern deep learning library.

Do I need a math background to follow along?

You need enough linear algebra to be comfortable with vectors and matrices, and enough calculus to follow a partial derivative. Raschka explains the intuition alongside the notation, so you do not need a formal statistics or computer science degree.

Is this book still relevant given how fast the field moves?

The core algorithms covered (logistic regression, SVMs, random forests, PCA, and backpropagation) are as relevant today as they were at publication. The scikit-learn API calls may need minor updates for current library versions, but the concepts and code structure remain accurate.
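One example of the kind of minor update this can mean: scikit-learn moved its data-splitting and search helpers into `sklearn.model_selection` in version 0.18 and later removed the old modules. Whether a given listing in the book needs this exact change is an assumption; the snippet only shows the current import path on a dataset bundled with the library.

```python
# Code written for older scikit-learn releases may contain imports such as:
#   from sklearn.cross_validation import train_test_split
#   from sklearn.grid_search import GridSearchCV
# Those modules no longer exist; current releases expose both helpers
# from sklearn.model_selection instead.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

print(X_train.shape, X_test.shape)
```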

Does the book cover deep learning frameworks like TensorFlow or PyTorch?

No. The book introduces neural networks and backpropagation from first principles in NumPy. It is designed to give you the foundation to learn a framework confidently, not to teach one specifically.

Who is this book not for?

If you are already comfortable training, tuning, and evaluating models in scikit-learn and are looking to specialize in deep learning or MLOps, this book will feel introductory. It is aimed at practitioners who want to build that foundation correctly.

Are the datasets and code examples available separately?

The book was originally published with companion code on GitHub maintained by the author. Check Sebastian Raschka's public repositories for the source code that accompanies the text.

Specs

Publisher: Packt Publishing Ltd
Published: Sep 2015
Pages: 455
Language: English

About the author

Sebastian Raschka

You might also like

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
A practical, project-driven introduction to machine learning and deep learning with Python
by Aurélien Géron (2022)

Designing Machine Learning Systems
An Iterative Process for Production-Ready Machine Learning Applications
by Chip Huyen (2022)

Probabilistic Machine Learning
A rigorous foundation in Bayesian reasoning, probabilistic models, and modern machine learning methods
by Kevin P. Murphy (2022)

Artificial Intelligence: A Modern Approach, Global Edition
The definitive textbook on intelligent systems, from foundational search and logic to modern machine learning and probabilistic reasoning
by Peter Norvig, Stuart Russell (2021)
