Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications, ISBN-13: 978-1098107963

Original price was: $50.00. Current price is: $14.99.

Description

[PDF eBook eTextbook] – Available Instantly

  • Publisher: O’Reilly Media; 1st edition (June 21, 2022)
  • Language: English
  • 386 pages
  • ISBN-10: 1098107969
  • ISBN-13: 978-1098107963

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they’re data dependent, with data varying wildly from one use case to the next. In this book, you’ll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:

  • Engineering data and choosing the right metrics to solve a business problem
  • Automating the process for continually developing, evaluating, deploying, and updating models
  • Developing a monitoring system to quickly detect and address issues your models might encounter in production
  • Architecting an ML platform that serves across use cases
  • Developing responsible ML systems

Table of Contents:

Preface

Who This Book Is For

What This Book Is Not

Navigating This Book

GitHub Repository and Community

Conventions Used in This Book

Using Code Examples

O’Reilly Online Learning

How to Contact Us

Acknowledgments

1. Overview of Machine Learning Systems

When to Use Machine Learning

Machine Learning Use Cases

Understanding Machine Learning Systems

Machine Learning in Research Versus in Production

Machine Learning Systems Versus Traditional Software

Summary

2. Introduction to Machine Learning Systems Design

Business and ML Objectives

Requirements for ML Systems

Reliability

Scalability

Maintainability

Adaptability

Iterative Process

Framing ML Problems

Types of ML Tasks

Objective Functions

Mind Versus Data

Summary

3. Data Engineering Fundamentals

Data Sources

Data Formats

JSON

Row-Major Versus Column-Major Format

Text Versus Binary Format

Data Models

Relational Model

NoSQL

Structured Versus Unstructured Data

Data Storage Engines and Processing

Transactional and Analytical Processing

ETL: Extract, Transform, and Load

Modes of Dataflow

Data Passing Through Databases

Data Passing Through Services

Data Passing Through Real-Time Transport

Batch Processing Versus Stream Processing

Summary

4. Training Data

Sampling

Nonprobability Sampling

Simple Random Sampling

Stratified Sampling

Weighted Sampling

Reservoir Sampling

Importance Sampling

Labeling

Hand Labels

Natural Labels

Handling the Lack of Labels

Class Imbalance

Challenges of Class Imbalance

Handling Class Imbalance

Data Augmentation

Simple Label-Preserving Transformations

Perturbation

Data Synthesis

Summary

5. Feature Engineering

Learned Features Versus Engineered Features

Common Feature Engineering Operations

Handling Missing Values

Scaling

Discretization

Encoding Categorical Features

Feature Crossing

Discrete and Continuous Positional Embeddings

Data Leakage

Common Causes for Data Leakage

Detecting Data Leakage

Engineering Good Features

Feature Importance

Feature Generalization

Summary

6. Model Development and Offline Evaluation

Model Development and Training

Evaluating ML Models

Ensembles

Experiment Tracking and Versioning

Distributed Training

AutoML

Model Offline Evaluation

Baselines

Evaluation Methods

Summary

7. Model Deployment and Prediction Service

Machine Learning Deployment Myths

Myth 1: You Only Deploy One or Two ML Models at a Time

Myth 2: If We Don’t Do Anything, Model Performance Remains the Same

Myth 3: You Won’t Need to Update Your Models as Much

Myth 4: Most ML Engineers Don’t Need to Worry About Scale

Batch Prediction Versus Online Prediction

From Batch Prediction to Online Prediction

Unifying Batch Pipeline and Streaming Pipeline

Model Compression

Low-Rank Factorization

Knowledge Distillation

Pruning

Quantization

ML on the Cloud and on the Edge

Compiling and Optimizing Models for Edge Devices

ML in Browsers

Summary

8. Data Distribution Shifts and Monitoring

Causes of ML System Failures

Software System Failures

ML-Specific Failures

Data Distribution Shifts

Types of Data Distribution Shifts

General Data Distribution Shifts

Detecting Data Distribution Shifts

Addressing Data Distribution Shifts

Monitoring and Observability

ML-Specific Metrics

Monitoring Toolbox

Observability

Summary

9. Continual Learning and Test in Production

Continual Learning

Stateless Retraining Versus Stateful Training

Why Continual Learning?

Continual Learning Challenges

Four Stages of Continual Learning

How Often to Update Your Models

Test in Production

Shadow Deployment

A/B Testing

Canary Release

Interleaving Experiments

Bandits

Summary

10. Infrastructure and Tooling for MLOps

Storage and Compute

Public Cloud Versus Private Data Centers

Development Environment

Dev Environment Setup

Standardizing Dev Environments

From Dev to Prod: Containers

Resource Management

Cron, Schedulers, and Orchestrators

Data Science Workflow Management

ML Platform

Model Deployment

Model Store

Feature Store

Build Versus Buy

Summary

11. The Human Side of Machine Learning

User Experience

Ensuring User Experience Consistency

Combatting “Mostly Correct” Predictions

Smooth Failing

Team Structure

Cross-functional Teams Collaboration

End-to-End Data Scientists

Responsible AI

Irresponsible AI: Case Studies

A Framework for Responsible AI

Summary

Epilogue

Index

About the Author

Chip Huyen (https://huyenchip.com) is a co-founder of Claypot AI, a platform for real-time machine learning. Through her work at NVIDIA, Netflix, and Snorkel AI, she has helped some of the world’s largest organizations develop and deploy machine learning systems. She teaches CS 329S: Machine Learning Systems Design at Stanford; this book is based on the lecture notes for that course.

LinkedIn included her among Top Voices in Software Development (2019) and Top Voices in Data Science & AI (2020). She is the author of four bestselling Vietnamese books, including the series Xach ba lo len va Di (Pack Your Bag and Go), and runs a Discord server on MLOps with over 6,000 members.

What makes us different?

• Instant Download

• Always Competitive Pricing

• 100% Privacy

• FREE Sample Available

• 24-7 LIVE Customer Support
