Cover of Data Science for Business by Provost and Fawcett, featuring abstract data and analytics imagery on an O'Reilly Media design

Pages: 414
Published: 2013

Data Analytics

Data Science for Business

What every business professional needs to know about data-driven decision making and machine learning

Bridge the gap between data science and business strategy so you can ask the right questions, evaluate results, and drive decisions with data.

Foster Provost, Tom Fawcett

Data Science for Business teaches you the core concepts behind modern data analytics and machine learning, explained through a business lens. Written by two leading researchers and practitioners, it equips managers, analysts, and technical teams with the frameworks they need to extract real value from data. You will learn how models are built, how to interpret their outputs critically, and how to avoid the traps that derail data projects before they deliver results.

About this book

Most business professionals encounter data science outputs every day, but few understand what is actually happening inside the models producing them. That gap causes bad decisions: over-trusting a model that performs well on paper, under-investing in the data infrastructure a project actually needs, or asking analysts the wrong questions entirely.

Data Science for Business closes that gap. Foster Provost and Tom Fawcett, both experienced researchers and practitioners, built this book around a single governing idea: data science is a set of principles for extracting useful knowledge from data, and those principles are learnable by anyone willing to think carefully.

The book does not skip the math, but it does not require a statistics PhD either. Every concept is anchored to a concrete business scenario. You will see why a classifier with 99% accuracy can still be useless for fraud detection, how lift curves help you decide whether a model is worth deploying, and why the choice of evaluation metric matters as much as the choice of algorithm.
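
The 99% accuracy point can be checked with a little arithmetic. The sketch below is only an illustration with made-up numbers (10,000 transactions, 1% of them fraudulent) and a hypothetical "model" that labels every transaction as legitimate; none of these figures come from the book.

```python
def accuracy_and_recall(tp, fp, fn, tn):
    """Return (accuracy, recall) from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Recall: share of actual fraud cases the model catches.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Assumed data: 100 fraud cases among 10,000 transactions.
# The always-"legitimate" model catches none of them.
accuracy, recall = accuracy_and_recall(tp=0, fp=0, fn=100, tn=9_900)

print(f"accuracy: {accuracy:.1%}")  # accuracy: 99.0%
print(f"recall:   {recall:.1%}")    # recall:   0.0%
```

The model is right 99% of the time and still finds no fraud at all, which is why accuracy alone says little when the class of interest is rare.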

Key topics covered include:

Supervised learning: classification and regression from first principles
How to formulate a business problem as a data-mining problem
Overfitting, generalization, and what cross-validation actually tells you
Decision trees, rule induction, and linear models explained without black-box treatment
Similarity-based reasoning and clustering for customer segmentation
Probability estimation, expected value, and how to use them to rank model outputs
Text mining and the bag-of-words representation
Causal reasoning and why correlation is not enough for good decisions

Whether you are a product manager reviewing a data team's proposal, an analyst building your first predictive model, or a technical lead trying to communicate results to stakeholders, this book gives you a shared vocabulary and a set of principled frameworks that hold up in practice. It has become a standard text in graduate business and data science programs precisely because it respects both the business context and the underlying science without sacrificing either.

🎯 What you'll learn

Frame any business problem as a well-defined data-mining task before touching a single dataset.
Distinguish between models that look accurate and models that actually create business value.
Interpret confusion matrices, ROC curves, and lift charts to make deployment decisions with confidence.
Recognize overfitting in practice and apply the right evaluation strategy to avoid it.
Apply clustering and similarity-based methods to segment customers and detect anomalies.
Use expected value calculations to translate model performance into business impact in dollars.
Reason carefully about causality and avoid the correlation traps that sink otherwise well-executed data projects.
Communicate model tradeoffs clearly to both technical and non-technical audiences.

👤 Who is this book for?

Business analysts and managers who need to evaluate data science proposals and interpret model outputs without becoming data scientists themselves.
Product managers overseeing data-driven features who want to ask sharper questions and spot flawed reasoning before it reaches production.
Early-career data scientists and analysts who want a rigorous conceptual foundation before diving into implementation-heavy courses or libraries.
MBA students and graduate business school participants taking their first formal data analytics course.
Technical leads who need a clear way to explain model performance, tradeoffs, and limitations to non-technical stakeholders.
Domain experts in marketing, finance, or operations who are increasingly expected to participate in data projects and need the vocabulary to do so.

Table of contents

01

Data-Analytic Thinking

Introduces the core framework of data-analytic thinking and explains how to structure a business problem so that data science can actually address it. You will see why framing the question correctly is often harder and more consequential than choosing an algorithm.

02

Business Problems and Data Science Solutions

Walks through the main families of data-mining tasks, from classification to regression to similarity matching, and maps common business objectives to the appropriate task type. You will learn to recognize which tool fits which problem before any data is touched.

03

Introduction to Predictive Modeling

Builds the concept of a predictive model from first principles using decision trees as the running example. You will understand how a model learns from data, what a split criterion does, and how to read a learned tree as a business rule.
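
To make "split criterion" concrete, here is a minimal sketch of one common choice, information gain based on entropy. This page does not say which criterion the chapter uses, so treat the choice, the function names, and the toy churn labels as assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy data: 5 customers who churned and 5 who stayed.
parent = ["churn"] * 5 + ["stay"] * 5

# Candidate split 1 separates the classes perfectly.
pure_split = [["churn"] * 5, ["stay"] * 5]
# Candidate split 2 leaves each side mixed 4-to-1.
mixed_split = [["churn"] * 4 + ["stay"], ["churn"] + ["stay"] * 4]

print(round(information_gain(parent, pure_split), 3))   # 1.0
print(round(information_gain(parent, mixed_split), 3))  # 0.278
```

A tree learner that uses this criterion would prefer the first split, because it removes more uncertainty about the target.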

04

Fitting a Model to Data

Examines the bias-variance tradeoff and explains why a model that fits training data perfectly is often the worst choice for deployment. You will apply cross-validation to estimate how a model will actually perform on new data.

05

Overfitting and Its Avoidance

Digs into the mechanics of overfitting across multiple model types and presents practical strategies, including pruning, regularization, and held-out test sets, for keeping it in check. Real business examples show how undetected overfitting has caused high-profile project failures.

06

Similarity, Neighbors, and Clusters

Covers distance-based reasoning, k-nearest neighbors classification, and clustering algorithms including k-means. You will apply these methods to customer segmentation scenarios and learn how the choice of distance metric shapes the results you get.
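
A small sketch of why the distance metric matters: the same query customer can have a different nearest neighbor under Euclidean and Manhattan distance. The two-feature points and the helper names are hypothetical, chosen only to show the effect.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(query, points, metric):
    """Name of the point closest to the query under the given metric."""
    return min(points, key=lambda name: metric(query, points[name]))


# Hypothetical customers described by two already-scaled features.
customers = {"A": (3.0, 3.0), "B": (0.0, 5.0)}
query = (0.0, 0.0)

print(nearest(query, customers, dist))       # A  (about 4.24 vs 5.0)
print(nearest(query, customers, manhattan))  # B  (6.0 vs 5.0)
```

Because neighbors change with the metric, any segmentation or k-nearest-neighbors prediction built on them can change too.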

07

Decision Analytic Thinking and Expected Value

Introduces expected value as the bridge between model performance metrics and actual business decisions. You will build cost-benefit matrices for classification scenarios and use them to decide whether a model should be deployed and how its threshold should be set.
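
As a rough sketch of the idea, expected value weights each confusion-matrix outcome by its business value. The counts, the dollar values, and the function names below are invented for illustration (a targeted offer that costs $1 to send and earns $100 when accepted); they are not taken from the book.

```python
def expected_value_per_customer(counts, values):
    """Average value per customer: sum of outcome rate times outcome value."""
    total = sum(counts.values())
    return sum(counts[k] / total * values[k] for k in counts)


def break_even_probability(value_tp, value_fp):
    """Response probability p at which p*value_tp + (1-p)*value_fp equals 0.

    Assumes that not targeting a customer is worth 0 either way.
    """
    return -value_fp / (value_tp - value_fp)


# Assumed confusion-matrix counts on 1,000 held-out customers.
counts = {"TP": 60, "FP": 140, "FN": 40, "TN": 760}
# Assumed cost-benefit matrix in dollars.
values = {"TP": 99.0, "FP": -1.0, "FN": 0.0, "TN": 0.0}

ev = expected_value_per_customer(counts, values)
print(f"expected value per customer: ${ev:.2f}")  # $5.80
print(f"target when p > {break_even_probability(99.0, -1.0):.2f}")  # 0.01
```

Under these assumptions the model is worth deploying (positive expected value), and the threshold follows from the costs and benefits rather than from a default of 0.5.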

08

Visualizing Model Performance

Explains ROC curves, cumulative lift charts, and profit curves in detail, showing exactly what each visualization reveals about a classifier's behavior. You will learn to choose the right visualization for the business question you are trying to answer.
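
For readers who want to see what a cumulative lift chart plots, the sketch below computes lift at a given targeting depth: the response rate among the top-scored customers divided by the overall response rate. The scores, labels, and function name are made up for the example.

```python
def cumulative_lift(scores, labels, fraction):
    """Lift when targeting the top `fraction` of customers ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical model scores and true outcomes (1 = responded).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(round(cumulative_lift(scores, labels, 0.2), 2))  # 3.33
print(round(cumulative_lift(scores, labels, 1.0), 2))  # 1.0
```

A lift of 3.33 at 20% depth means the top fifth of the ranked list responds at more than three times the base rate; at 100% depth lift is always 1.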

09

Evidence and Probabilities

Covers Bayesian reasoning, probability estimation, and the Naive Bayes classifier. You will see how to combine prior knowledge with data evidence and why well-calibrated probability scores matter more than raw predicted labels for many business applications.
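
Combining prior knowledge with evidence can be sketched with Bayes' rule for a two-class problem. The fraud prior and the likelihoods below are invented numbers, and the function name is a placeholder.

```python
def posterior(prior, likelihood_if_pos, likelihood_if_neg):
    """P(positive | evidence) for a binary class via Bayes' rule."""
    evidence = likelihood_if_pos * prior + likelihood_if_neg * (1 - prior)
    return likelihood_if_pos * prior / evidence


# Assumed inputs: 1% of transactions are fraudulent; a "foreign purchase"
# flag appears in 60% of fraud cases and 5% of legitimate ones.
p = posterior(prior=0.01, likelihood_if_pos=0.6, likelihood_if_neg=0.05)
print(f"P(fraud | foreign purchase) = {p:.3f}")  # 0.108
```

The evidence raises the probability from 1% to roughly 11%, which is a useful score for ranking cases even though a hard label would still say "not fraud".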

10

Representing and Mining Text

Introduces the bag-of-words model, TF-IDF weighting, and the basics of text classification. You will apply these techniques to realistic scenarios such as spam filtering and customer review analysis, and see both what the methods can do and what they cannot.
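
A minimal sketch of the two representations named above, using only the Python standard library. TF-IDF has several variants; this one (raw term count times log(N / document frequency)) is an assumption chosen for brevity and may differ from the formula in the book. The sample messages are made up.

```python
from collections import Counter
from math import log


def bag_of_words(text):
    """Map each lowercase, whitespace-separated token to its count."""
    return Counter(text.lower().split())


def tf_idf(docs):
    """One {term: weight} dict per document: count * log(N / doc frequency)."""
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for bag in bags for term in bag)
    return [
        {term: count * log(n_docs / doc_freq[term]) for term, count in bag.items()}
        for bag in bags
    ]


docs = ["free prize inside", "meeting notes inside", "free free offer"]
weights = tf_idf(docs)

print(bag_of_words(docs[2]))  # Counter({'free': 2, 'offer': 1})
print({term: round(w, 2) for term, w in weights[0].items()})
```

In the first message, "prize" gets a higher weight than "free" or "inside" because it appears in only one document. Word order is discarded entirely, which is one of the limits of the representation.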

Frequently asked questions

Do I need a background in statistics or programming to get value from this book?

No programming background is required, and the statistics prerequisites are minimal. The book explains mathematical concepts from first principles and always connects them to business scenarios. Readers comfortable with high-school algebra will follow the core arguments without difficulty.

Is this book still relevant given that it was published in 2013?

The foundational principles the book teaches (how models generalize, how to evaluate them, and how to frame business problems) have not changed. The specific algorithms and evaluation frameworks covered remain central to data science practice. More recent deep-learning developments are outside its scope, but the conceptual grounding it provides is directly applicable to understanding those methods too.

Is this primarily a book for business managers or for technical practitioners?

It is written for both, which is part of its strength. Business professionals gain enough depth to evaluate and guide data projects critically, while early-career analysts get the principled framework that implementation-focused courses often skip.

Does the book include code examples or hands-on exercises?

The book focuses on concepts and principles rather than code implementation. It uses illustrative examples and worked scenarios throughout, but it does not provide programming tutorials or a code repository. Readers looking for hands-on Python or R practice will want to pair it with an implementation-focused resource.

Is this book used in university courses?

Yes. It is a standard text in many graduate business analytics and data science programs at universities around the world. That adoption reflects its balance of rigor and accessibility rather than any particular certification or endorsement.

Specs

Publisher: O'Reilly Media, Inc.
Published: Jul 2013
Pages: 414
Language: English

About the authors

Foster Provost

Tom Fawcett
