Python Machine Learning
Pages: 455
Published: 2015
A practical guide to building machine learning systems with Python, scikit-learn, and the fundamentals of deep learning
Go from raw data to trained, evaluated, and deployed machine learning models using Python and scikit-learn, with no hand-waving over the hard parts.
Python Machine Learning by Sebastian Raschka takes you through the core concepts and algorithms behind modern machine learning, implementing each one in Python from the ground up before showing you how scikit-learn automates the heavy lifting. You will train classifiers, build regression models, tune hyperparameters, reduce dimensionality, and cluster unlabeled data, finishing with a practical introduction to neural networks and sentiment analysis on real datasets.
Most machine learning tutorials ask you to trust the library and move on. This book does the opposite. Sebastian Raschka walks you through the mathematics and intuition behind each algorithm, then shows you the Python code that brings it to life, and then shows you how scikit-learn implements the same idea at production scale. You leave with understanding, not just working code.
The book opens with the perceptron and Adaline, two historically important linear classifiers you implement from scratch in NumPy. That foundation pays off immediately: when logistic regression, SVMs, and decision trees appear in the following chapters, you already understand what the optimization loop is actually doing. Raschka never skips the cost function or the gradient.
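To make the idea concrete, here is a minimal sketch of a perceptron in NumPy. It is not the book's own listing: the class layout, the parameter names eta and n_iter, and the four-point toy dataset are assumptions chosen for illustration.

```python
import numpy as np


class Perceptron:
    """Rosenblatt-style perceptron (illustrative sketch, not the book's code)."""

    def __init__(self, eta=0.1, n_iter=10):
        self.eta = eta        # learning rate (assumed name)
        self.n_iter = n_iter  # number of passes over the training set

    def fit(self, X, y):
        self.w_ = np.zeros(X.shape[1])
        self.b_ = 0.0
        self.errors_ = []     # misclassifications per epoch
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                # The update is zero whenever the prediction is already correct.
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def predict(self, X):
        # Threshold the net input at zero; labels are +1 and -1.
        return np.where(X @ self.w_ + self.b_ >= 0.0, 1, -1)


# Hypothetical, linearly separable toy data.
X = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

model = Perceptron(eta=0.1, n_iter=10).fit(X, y)
print(model.errors_)
```

The errors_ list records how many updates each epoch needed, which is a simple way to watch the weight-update rule converge on separable data.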
From classification the book moves into the full supervised-learning workflow: preprocessing raw features, encoding categoricals, handling missing values, splitting and scaling data correctly, and evaluating models with cross-validation and learning curves that tell you whether you are overfitting or underfitting. These chapters are worth the price of the book on their own for anyone who has cargo-culted their way through a Kaggle pipeline.
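The workflow described above can be sketched with scikit-learn along the following lines. This is an illustration under stated assumptions, not code from the book: the synthetic DataFrame and its column names (size, color) are hypothetical, and ColumnTransformer and SimpleImputer come from current scikit-learn releases rather than the versions available in 2015.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric feature, one categorical feature.
rng = np.random.default_rng(0)
n = 200
size = rng.normal(50.0, 10.0, n)
df = pd.DataFrame({
    "size": size,
    "color": rng.choice(["red", "green", "blue"], n),
})
y = (size > 50.0).astype(int)               # label depends on the numeric feature
df.loc[df.index[::17], "size"] = np.nan     # knock out some values to impute

# Numeric columns: fill missing values, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer([
    ("num", numeric, ["size"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

# Preprocessing is refit inside each fold, so the held-out fold's
# statistics never influence the imputer or the scaler.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, df, y, cv=cv)
print(scores.mean())
```

Keeping imputation and scaling inside the pipeline is the design choice that matters here: fitting them on the full dataset before splitting is a common source of optimistic validation scores.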
The second half shifts to unsupervised learning, dimensionality reduction with PCA and LDA, and ensemble methods including random forests and gradient boosting. A chapter on sentiment analysis with a logistic regression pipeline on a real movie-review dataset ties the techniques together in an end-to-end project. The final chapters introduce artificial neural networks and the basics of deep learning, giving you a solid footing before you reach for TensorFlow or PyTorch.
The code targets Python 3 and the scientific Python stack. Every algorithm is accompanied by worked examples, visualizations, and plain-language explanations of the underlying math. Raschka assumes you can write Python and remember enough calculus and linear algebra to follow a partial derivative, but he does not assume a statistics or computer science degree.
At 455 pages, the book is dense but not padded. Each chapter earns its place by adding a concept you will use in the next one.
You get a map of the machine learning landscape, covering supervised, unsupervised, and reinforcement learning, and set up the Python scientific stack that every subsequent chapter depends on.
You implement the perceptron and Adaline in NumPy, tracing the weight-update rule step by step so the concept of gradient descent becomes concrete before you ever open scikit-learn.
You apply logistic regression, SVMs, decision trees, k-nearest neighbors, and naive Bayes to the same datasets you used for your hand-built classifiers, learning the scikit-learn API and when to reach for each algorithm.
You preprocess raw features correctly: imputing missing values, encoding categoricals, scaling numerical features, and selecting the variables that matter most, so your models train on clean input.
You compress high-dimensional data using PCA and LDA, visualize the transformed feature spaces, and measure how much predictive information survives the reduction.
You evaluate models honestly with stratified k-fold cross-validation, plot learning and validation curves to diagnose bias and variance, and tune hyperparameters with grid search and randomized search.
You combine weak learners into strong ones using bagging with random forests and boosting with AdaBoost and gradient boosting, then compare their performance and interpretability trade-offs.
You build a complete text classification pipeline on the IMDb movie-review dataset, covering tokenization, TF-IDF vectorization, and out-of-core learning on data that does not fit in memory.
You apply k-means and hierarchical clustering to unlabeled datasets, choose the right number of clusters using the elbow method and dendrograms, and evaluate cohesion with silhouette scores.
You implement a single-hidden-layer neural network from scratch, work through the backpropagation algorithm by hand, and connect what you have built to the architecture of modern deep learning frameworks.
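As a rough picture of what a from-scratch network involves, the sketch below trains a single-hidden-layer network with hand-written backpropagation in NumPy. It is not taken from the book: the XOR toy data, the four hidden units, the sigmoid activations, the cross-entropy loss, and the learning rate are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    A1 = sigmoid(X @ W1 + b1)    # hidden activations, shape (n, h)
    A2 = sigmoid(A1 @ W2 + b2)   # output probabilities, shape (n, 1)
    return A1, A2


def loss_fn(y, A2):
    eps = 1e-12                  # guards against log(0)
    return float(-np.mean(y * np.log(A2 + eps) + (1 - y) * np.log(1 - A2 + eps)))


def backward(X, y, A1, A2, W2):
    n = X.shape[0]
    dZ2 = (A2 - y) / n                      # sigmoid output + cross-entropy
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * A1 * (1.0 - A1)    # chain rule through the hidden sigmoid
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return dW1, db1, dW2, db2


# Hypothetical toy problem: XOR.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(1)
W1 = rng.normal(0.0, 1.0, (2, 4))
b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1))
b2 = np.zeros(1)

eta = 0.5        # learning rate (assumed value)
losses = []
for _ in range(2000):
    A1, A2 = forward(X, W1, b1, W2, b2)
    losses.append(loss_fn(y, A2))
    dW1, db1, dW2, db2 = backward(X, y, A1, A2, W2)
    # Full-batch gradient descent step.
    W1 -= eta * dW1
    b1 -= eta * db1
    W2 -= eta * dW2
    b2 -= eta * db2

print(losses[0], losses[-1])
```

The forward and backward functions map directly onto what a deep learning framework automates: the forward pass builds the activations, and the backward pass applies the chain rule layer by layer to get the gradients.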
The book targets Python 3 and the core scientific Python stack: NumPy, pandas, matplotlib, and scikit-learn. The neural network chapters use the same stack rather than a framework, so the concepts transfer to any modern deep learning library.
You need enough linear algebra to be comfortable with vectors and matrices, and enough calculus to follow a partial derivative. Raschka explains the intuition alongside the notation, so you do not need a formal statistics or computer science degree.
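The core algorithms covered (logistic regression, SVMs, random forests, PCA, and backpropagation) are as relevant today as they were at publication. Some scikit-learn API calls may need minor updates for current library versions; for example, scikit-learn releases from that period exposed train_test_split under sklearn.cross_validation, which has since been replaced by sklearn.model_selection. The concepts and code structure remain accurate.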
The book does not teach a specific deep learning framework. It introduces neural networks and backpropagation from first principles in NumPy, and it is designed to give you the foundation to learn a framework confidently.
If you are already comfortable training, tuning, and evaluating models in scikit-learn and are looking to specialize in deep learning or MLOps, this book will feel introductory. It is aimed at practitioners who want to build that foundation correctly.
The book was originally published with companion code on GitHub maintained by the author. Check Sebastian Raschka's public repositories for the source code that accompanies the text.