Published: 2016
A rigorous probabilistic treatment of machine learning and statistical pattern recognition
Build a deep, principled understanding of how machine learning algorithms actually work by mastering the probabilistic and statistical foundations that drive them.
Pattern Recognition and Machine Learning by Christopher M. Bishop is the definitive graduate-level textbook on the probabilistic approach to machine learning. Covering everything from Bayesian inference and graphical models to neural networks and kernel methods, it equips you with the mathematical framework to understand why algorithms work, not just how to apply them. This is the book practitioners reach for when they need rigorous grounding rather than surface-level intuition.
Most machine learning resources teach you to use algorithms. This book teaches you to understand them. Christopher M. Bishop builds every method from first principles, grounding each technique in probability theory and statistical inference so that you can reason about models rather than just configure them.
The book opens with the foundations: probability distributions, decision theory, and information theory. From there it develops the tools you need to approach any learning problem with rigor: linear models for regression and classification, kernel methods, sparse models, and graphical models that make complex dependency structures explicit and tractable.
A central theme throughout is the Bayesian perspective. Rather than treating parameters as fixed unknowns to be estimated, Bishop frames learning as inference over distributions. This viewpoint unlocks a coherent treatment of model complexity, overfitting, and uncertainty quantification that approaches based on point estimates struggle to provide. Expectation-maximization, variational inference, and sampling methods are developed in full, giving you the tools to approximate intractable posteriors in real problems.
The final sections connect these foundations to the methods that defined modern machine learning before the deep learning era, including support vector machines, relevance vector machines, principal component analysis, independent component analysis, and sequential models for time-series data. Each is derived rather than presented as a black box, so the relationships between methods become clear.
Whether you are a graduate student building the theoretical foundations for research, or an experienced practitioner who wants to move beyond API calls and understand what the math is actually doing, this book rewards sustained study. It is dense, precise, and honest about complexity. It does not simplify ideas into misconceptions. That is exactly why it has remained the standard reference in the field since its first publication.
Establishes the core problem of pattern recognition through a polynomial curve-fitting example, then introduces the probability theory, decision theory, and information-theoretic concepts that underpin every method in the book.
Develops the key parametric distributions used throughout the text, including Gaussian, Dirichlet, Wishart, and exponential family forms, covering both maximum likelihood estimation and Bayesian conjugate priors for each.
Builds linear regression from maximum likelihood through to Bayesian linear regression, introducing the evidence approximation and demonstrating how a fully probabilistic treatment resolves model complexity selection.
Covers discriminant functions, probabilistic generative and discriminative classifiers, logistic regression, and the Laplace approximation, showing how classification can be framed as probabilistic inference followed by a decision.
Derives feed-forward neural networks as a flexible class of parametric nonlinear models, covering backpropagation, regularization, mixture density networks, and Bayesian neural networks via the evidence framework.
Introduces the kernel trick through the concept of dual representations, then develops Gaussian processes as the natural Bayesian nonparametric counterpart to kernel regression.
Derives the support vector machine from the margin-maximization principle and develops the relevance vector machine as a sparse Bayesian alternative that produces probabilistic predictions.
Introduces directed and undirected graphical models as a language for representing conditional independence structure, then develops exact inference algorithms including belief propagation and the junction tree algorithm.
Presents the expectation-maximization algorithm in full generality using the lower-bound view, applying it to Gaussian mixtures, factor analysis, and a range of other latent variable models.
Covers variational Bayes, expectation propagation, and Markov chain Monte Carlo methods such as Gibbs sampling and slice sampling as practical tools for posterior approximation when exact inference is computationally infeasible.
You should be comfortable with multivariate calculus, linear algebra, and basic probability at an undergraduate level. Familiarity with maximum likelihood estimation helps, though the book recaps the key ideas as it goes.
The book works for both taught courses and self-study, but self-study requires patience. The material is dense and builds cumulatively, so readers who skip chapters often find later sections opaque. Setting aside time to work through the exercises is strongly recommended.
No, this is not a deep learning book. It predates the deep learning era and does not cover attention mechanisms or large language models, and convolutional networks receive only a brief treatment. Its neural network chapter treats shallow networks in a probabilistic framework. It is the foundation you build on before reaching those topics.
The book itself does not include worked answers. Worked solutions to the subset of exercises tagged "www" in the text were released through the book's web site, and partial community solutions exist online for others. The exercises range from straightforward derivations to genuinely challenging proofs.
The 2016 printing is a corrected reprint of the original 2006 text, not a new edition. The core content is unchanged, but known errata from earlier printings have been addressed.
If you are looking for practical implementation guidance, API walkthroughs, or applied project tutorials, this is not the right starting point. The book focuses entirely on mathematical theory and derivations, with no code.