Deep Learning
by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Pages: 801
Published: 2016
The definitive textbook on neural networks and deep learning theory, from foundational math to modern architectures
Build a rigorous understanding of deep learning from first principles so you can design, train, and reason about neural networks with confidence.
Written by three of the field's leading researchers, Deep Learning covers the mathematical and conceptual foundations that underpin modern neural networks. Starting from applied mathematics and probability, the book progresses through core machine learning concepts, feedforward networks, regularization, optimization, and convolutional and recurrent architectures. At 801 pages, it is the standard reference for graduate students, researchers, and engineers who need more than a tutorial — they need to understand why the methods work.
Most practitioners learn deep learning through tutorials, blog posts, and trial-and-error experimentation. Those resources get you started, but they leave gaps. When a model fails to converge, when a regularization trick behaves unexpectedly, or when you need to adapt an existing architecture to a new problem, shallow knowledge runs out fast.
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville gives you the foundations to fill those gaps permanently. The authors built and shaped many of the techniques described in the book, and the explanations reflect that firsthand understanding. Every concept is grounded in mathematics — linear algebra, probability, information theory, and numerical computation — then connected to practical implementation decisions you will recognize from your own work.
The book is organized in three parts. Part one establishes the applied mathematics and machine learning prerequisites: vectors and matrices, probability distributions, numerical computation, maximum likelihood, the bias-variance tradeoff, and the geometry of high-dimensional data. Part two covers the core deep learning models and practices: feedforward networks, regularization strategies, training with gradient-based optimization, convolutional networks for grid-structured data such as images, and recurrent networks for sequences. Part three surveys research topics, including linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, approximate inference, and deep generative models.
This is not a cookbook. There are no step-by-step installation instructions or framework tutorials. What the book provides is durable knowledge — the kind that outlasts any particular library version and transfers cleanly to new problem domains. Readers who work through it seriously emerge with the vocabulary, intuition, and mathematical confidence to read primary research papers and contribute to the field rather than merely consume it.
The book is published by MIT Press and is freely available in HTML form at deeplearningbook.org, though many readers prefer the print edition for sustained study. If you are serious about understanding deep learning rather than just applying it, this is the text the field treats as its standard reference.
Chapter 1: Introduction
Establishes the motivating problems deep learning addresses and traces the historical development of neural networks from early perceptrons through the deep learning resurgence. You come away with a clear map of where the field stands and how the rest of the book is organized.
Chapter 2: Linear Algebra
Reviews vectors, matrices, eigendecompositions, and the singular value decomposition with the emphasis each receives in machine learning contexts. You build the algebraic vocabulary needed to follow derivations throughout the rest of the text.
Chapter 3: Probability and Information Theory
Covers probability distributions, Bayes' rule, continuous random variables, and information-theoretic quantities such as entropy and KL divergence. You learn to reason formally about uncertainty, a prerequisite for understanding loss functions and generative models.
Chapter 4: Numerical Computation
Addresses floating-point precision, overflow and underflow, gradient-based optimization geometry, and the challenges of constrained optimization. You understand the computational realities that separate mathematical ideals from working implementations.
Chapter 5: Machine Learning Basics
Introduces capacity, overfitting, underfitting, the bias-variance tradeoff, maximum likelihood estimation, and the design of learning algorithms from a unified perspective. You establish a principled framework that the subsequent deep learning chapters extend.
Chapter 6: Deep Feedforward Networks
Constructs the feedforward network model in full: forward propagation, activation functions, output units matched to loss functions, and backpropagation derived from the chain rule. You learn to build and train multilayer networks with a clear account of what each design choice controls.
Chapter 7: Regularization for Deep Learning
Surveys parameter norm penalties, dataset augmentation, noise injection, early stopping, dropout, and ensemble methods, connecting each to the underlying theory of reducing generalization error. You gain a principled basis for choosing regularization strategies rather than guessing.
Chapter 8: Optimization for Training Deep Models
Examines the properties of the loss landscape, the behavior of SGD and its momentum-based variants, adaptive learning-rate methods, and initialization strategies. You understand why certain optimizers converge reliably and what can go wrong in practice.
Chapter 9: Convolutional Networks
Derives convolution as a structured linear operation that exploits local connectivity and parameter sharing, then surveys pooling, padding, and common architectural patterns. You can reason about why convolutional networks generalize well on spatially structured data.
Chapter 10: Sequence Modeling: Recurrent and Recursive Nets
Covers recurrent network unrolling, backpropagation through time, the vanishing-gradient problem, and gating mechanisms including LSTM and GRU. You understand both the power and the limitations of sequence models and the design choices that address those limitations.
Comfort with undergraduate calculus and basic linear algebra is expected. The book's opening chapters review the specific mathematical topics in depth, so gaps can be filled in place, but readers with no prior exposure to probability or matrix operations may find the pace demanding.
Both practitioners and researchers benefit, but in different ways. Practitioners gain the theoretical grounding to debug models and make principled design decisions. Researchers use it as a reference and entry point into the primary literature. If you only want step-by-step framework tutorials, this is not the right book.
The book does not cover transformers. Published in 2016, it predates the transformer architecture, which was introduced in 2017. It covers foundational architectures (feedforward, convolutional, and recurrent networks) along with generative and probabilistic models. For architectures developed after 2016, you will need supplementary sources.
The book is primarily theoretical and does not include framework-specific code tutorials or programming exercises. The focus is on mathematical understanding, derivations, and conceptual analysis rather than hands-on implementation.
The freely available HTML version contains the same content as the print edition. The print edition is preferred by many readers for extended study because of typesetting quality and the convenience of working away from a screen.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville are active deep learning researchers who developed or contributed to techniques described in the book. The explanations reflect firsthand understanding of where methods come from and where they fail, which distinguishes this text from second-hand accounts.
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
A practical, project-driven introduction to machine learning and deep learning with Python
An Iterative Process for Production-Ready Machine Learning Applications
by Chip Huyen
A rigorous foundation in Bayesian reasoning, probabilistic models, and modern machine learning methods
The definitive textbook on intelligent systems, from foundational search and logic to modern machine learning and probabilistic reasoning
by Peter Norvig, Stuart Russell