Pages: 414
Published: 2013
What every business professional needs to know about data-driven decision making and machine learning
Bridge the gap between data science and business strategy so you can ask the right questions, evaluate results, and drive decisions with data.
Data Science for Business teaches you the core concepts behind modern data analytics and machine learning, explained through a business lens. Written by two leading researchers and practitioners, it equips managers, analysts, and technical teams with the frameworks they need to extract real value from data. You will learn how models are built, how to interpret their outputs critically, and how to avoid the traps that derail data projects before they deliver results.
Most business professionals encounter data science outputs every day, but few understand what is actually happening inside the models producing them. That gap causes bad decisions: over-trusting a model that performs well on paper, under-investing in the data infrastructure a project actually needs, or asking analysts the wrong questions entirely.
Data Science for Business closes that gap. Foster Provost and Tom Fawcett, both experienced researchers and practitioners, built this book around a single governing idea: data science is a set of principles for extracting useful knowledge from data, and those principles are learnable by anyone willing to think carefully.
The book does not skip the math, but it does not require a statistics PhD either. Every concept is anchored to a concrete business scenario. You will see why a classifier with 99% accuracy can still be useless for fraud detection, how lift curves help you decide whether a model is worth deploying, and why the choice of evaluation metric matters as much as the choice of algorithm.
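The 99% accuracy point can be illustrated with a minimal sketch. The data below is hypothetical and not taken from the book: 1,000 transactions, 10 of them fraudulent, scored by a "model" that never flags anything.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flagged."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in flagged) / len(flagged)


# Hypothetical data: 10 fraudulent transactions (label 1) out of 1,000.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that predicts "not fraud" for every transaction.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The classifier is right 99% of the time and still catches no fraud at all, which is why accuracy alone is a poor metric when the class of interest is rare.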
Whether you are a product manager reviewing a data team's proposal, an analyst building your first predictive model, or a technical lead trying to communicate results to stakeholders, this book gives you a shared vocabulary and a set of principled frameworks that hold up in practice. It has become a standard text in graduate business and data science programs precisely because it respects both the business context and the underlying science without sacrificing either.
Introduces the core framework of data-analytic thinking and explains how to structure a business problem so that data science can actually address it. You will see why framing the question correctly is often harder and more consequential than choosing an algorithm.
Walks through the main families of data-mining tasks, from classification to regression to similarity matching, and maps common business objectives to the appropriate task type. You will learn to recognize which tool fits which problem before any data is touched.
Builds the concept of a predictive model from first principles using decision trees as the running example. You will understand how a model learns from data, what a split criterion does, and how to read a learned tree as a business rule.
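As an illustration of what a split criterion does, here is a minimal sketch of entropy-based information gain, one common criterion for decision trees. The labels and the "contract type" split are hypothetical examples, not data from the book.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical customers: 5 churned, 5 stayed.
parent = ["churn"] * 5 + ["stay"] * 5
# Hypothetical split on contract type: each branch is mostly one class.
monthly = ["churn"] * 4 + ["stay"] * 1
annual = ["churn"] * 1 + ["stay"] * 4

gain = information_gain(parent, [monthly, annual])
print(round(gain, 3))
```

A tree learner would compute this gain for every candidate attribute and split on the one with the highest value. A split that separates the classes perfectly would have a gain equal to the parent's full entropy.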
Examines the bias-variance tradeoff and explains why a model that fits training data perfectly is often the worst choice for deployment. You will apply cross-validation to estimate how a model will actually perform on new data.
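The cross-validation idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: the folds are contiguous (real data should usually be shuffled first), and the "model" is a hypothetical majority-class baseline chosen only to keep the example short.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(xs, ys, fit, predict, k=5):
    """Return the mean held-out accuracy and the per-fold accuracies."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k, scores


def fit_majority(xs, ys):
    """Hypothetical baseline learner: remember the most common training label."""
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    """Always predict the remembered majority label."""
    return model


# Hypothetical data: features are ignored by the baseline.
xs = list(range(10))
ys = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
mean_score, fold_scores = cross_validate(xs, ys, fit_majority, predict_majority, k=5)
print(mean_score, fold_scores)
```

Each instance is used for testing exactly once and is never seen by the model that scores it, so the averaged score estimates performance on new data rather than on the training set.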
Digs into the mechanics of overfitting across multiple model types and presents practical strategies, including pruning, regularization, and held-out test sets, for keeping it in check. Real business examples show how undetected overfitting has caused high-profile project failures.
Covers distance-based reasoning, k-nearest neighbors classification, and clustering algorithms including k-means. You will apply these methods to customer segmentation scenarios and learn how the choice of distance metric shapes the results you get.
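To see how the distance metric can change the answer, consider this small sketch. The customer names and coordinates are hypothetical, and the two features are assumed to be on comparable scales.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(query, points, metric):
    """Return the name of the point closest to the query under the given metric."""
    return min(points, key=lambda name: metric(query, points[name]))


# Hypothetical customers described by two already-scaled features.
customers = {"A": (3, 3), "B": (0, 5)}
query = (0, 0)

print(nearest(query, customers, dist))       # A
print(nearest(query, customers, manhattan))  # B
```

Under Euclidean distance customer A is closer (about 4.24 versus 5), while under Manhattan distance customer B is closer (5 versus 6). The same data yields a different nearest neighbor, and therefore potentially a different prediction or cluster assignment.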
Introduces expected value as the bridge between model performance metrics and actual business decisions. You will build cost-benefit matrices for classification scenarios and use them to decide whether a model should be deployed and how its threshold should be set.
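A minimal sketch of the expected-value calculation follows. The confusion-matrix counts and the cost-benefit figures are hypothetical: a targeted offer is assumed to cost 1 per contact and return 100 from each responder.

```python
def expected_profit(confusion, cost_benefit):
    """Average profit per instance: counts weighted by the value of each outcome."""
    total = sum(confusion.values())
    return sum(confusion[k] * cost_benefit[k] for k in confusion) / total


# Hypothetical results on 1,000 held-out customers.
confusion = {"TP": 50, "FP": 100, "FN": 20, "TN": 830}
# Hypothetical values: a contacted responder nets 99, a contacted
# non-responder costs 1, and customers who are not contacted net 0.
cost_benefit = {"TP": 99.0, "FP": -1.0, "FN": 0.0, "TN": 0.0}

print(expected_profit(confusion, cost_benefit))  # about 4.85 per customer
```

Recomputing this figure at different classification thresholds shows which threshold maximizes profit. A positive value at the best threshold is one piece of evidence that the model is worth deploying.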
Explains ROC curves, cumulative lift charts, and profit curves in detail, showing exactly what each visualization reveals about a classifier's behavior. You will learn to choose the right visualization for the business question you are trying to answer.
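As a rough illustration of how an ROC curve is constructed, the sketch below lowers the threshold one instance at a time and records the false positive and true positive rates. The labels and scores are hypothetical, and the code assumes all scores are distinct.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points from the strictest threshold to the loosest.

    Assumes labels are 0/1 and that no two instances share a score.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, y_true), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier scores for six instances.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(round(auc(points), 3))
```

The area equals the fraction of positive-negative pairs that the model ranks correctly, here 8 out of 9. An ROC curve shows ranking quality independent of class balance, whereas a profit curve also needs the costs and benefits of each outcome.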
Covers Bayesian reasoning, probability estimation, and the Naive Bayes classifier. You will see how to combine prior knowledge with data evidence and why well-calibrated probability scores matter more than raw predicted labels for many business applications.
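Combining a prior with evidence can be sketched with Bayes' rule. All probabilities below are hypothetical: a 1% base rate of fraud, an alert that fires on 90% of fraudulent transactions, and a 5% false alarm rate.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | evidence) by Bayes' rule for a binary hypothesis H."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence


# Hypothetical inputs: P(fraud), P(alert | fraud), P(alert | not fraud).
p_fraud_given_alert = posterior(0.01, 0.90, 0.05)
print(round(p_fraud_given_alert, 3))
```

Even with a fairly sensitive alert, the posterior stays near 15% because fraud is rare to begin with. Naive Bayes applies the same rule to several pieces of evidence at once by assuming they are conditionally independent given the class.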
Introduces the bag-of-words model, TF-IDF weighting, and the basics of text classification. You will apply these techniques to realistic scenarios such as spam filtering and customer review analysis, and learn what the methods can and cannot do.
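The sketch below shows one simple variant of TF-IDF on a bag-of-words representation. The formula used here (relative term frequency times log(N / document frequency)) is one of several in common use and is an assumption, not necessarily the book's exact definition. The review texts are hypothetical.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    Tokenization is a plain lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * log(n / df[t])
                        for t, c in counts.items()})
    return weights


# Hypothetical customer reviews.
reviews = [
    "the product is great",
    "the product is terrible",
    "the service is great",
]
for review_weights in tf_idf(reviews):
    print(review_weights)
```

Words that appear in every review, such as "the" and "is", receive a weight of zero, while a word unique to one review, such as "terrible", receives the highest weight in that review. Word order is discarded entirely, which is one of the model's known limitations.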
No programming background is required, and the statistics prerequisites are minimal. The book explains mathematical concepts from first principles and always connects them to business scenarios. Readers comfortable with high-school algebra will follow the core arguments without difficulty.
The foundational principles the book teaches (how models generalize, how to evaluate them, and how to frame business problems) have not changed. The specific algorithms and evaluation frameworks covered remain central to data science practice. More recent deep-learning developments are outside its scope, but the conceptual grounding it provides is directly applicable to understanding those methods too.
The book is written for both business professionals and early-career analysts, which is part of its strength. Business professionals gain enough depth to evaluate and guide data projects critically, while analysts get the principled framework that implementation-focused courses often skip.
The book focuses on concepts and principles rather than code implementation. It uses illustrative examples and worked scenarios throughout, but it does not provide programming tutorials or a code repository. Readers looking for hands-on Python or R practice will want to pair it with an implementation-focused resource.
The book is a standard text in many graduate business analytics and data science programs at universities around the world. That adoption reflects its balance of rigor and accessibility rather than any particular certification or endorsement.