Essential Math for AI: Next-Level Mathematics for Efficient and Successful AI Systems, ISBN-13: 978-1098107635

Original price: $50.00. Current price: $14.99.

 Safe & secure checkout

Description

Essential Math for AI: Next-Level Mathematics for Efficient and Successful AI Systems, ISBN-13: 978-1098107635

[PDF eBook eTextbook] – Available Instantly

  • Publisher: O’Reilly Media; 1st edition (February 14, 2023)
  • Language: English
  • 602 pages
  • ISBN-10: 1098107632
  • ISBN-13: 978-1098107635

Companies are scrambling to integrate AI into their systems and operations. But to build truly successful solutions, you need a firm grasp of the underlying mathematics. This accessible guide walks you through the math necessary to thrive in the AI field, focusing on real-world applications rather than dense academic theory.

Engineers, data scientists, and students alike will examine mathematical topics critical for AI, including regression, neural networks, optimization, backpropagation, convolution, Markov chains, and more, through popular applications such as computer vision, natural language processing, and automated systems. Supplementary Jupyter notebooks illuminate the examples with Python code and visualizations. Whether you’re just beginning your career or have years of experience, this book gives you the foundation necessary to dive deeper into the field.
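To give a sense of how those notebooks pair the math with Python, here is a minimal sketch of my own; it is not taken from the book’s companion materials. It fits a line to noisy data by gradient descent, tying together the regression, loss function, and optimization threads that run through the book.

```python
# A minimal sketch (mine, not from the book's notebooks) of the flavor of
# its examples: fit a line y = w*x + b to noisy data by gradient descent,
# using a mean squared error loss. Assumes only NumPy.
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 1.0 + rng.normal(0, 1, size=100)  # noisy line: true w = 3, b = 1

w, b = 0.0, 0.0   # parameters of the training function y_hat = w*x + b
eta = 0.01        # learning rate hyperparameter
for _ in range(2000):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # dL/dw for the mean squared error
    grad_b = 2 * np.mean(y_hat - y)        # dL/db
    w -= eta * grad_w                      # one gradient descent step
    b -= eta * grad_b

print(f"learned w = {w:.2f}, b = {b:.2f}")  # lands near the true 3 and 1
```

Each pass through the loop is one gradient descent update of the kind the book writes out symbolically, and the training function, loss function, and learning rate each have their own sections in the table of contents below.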

  • Understand the underlying mathematics powering AI systems, including generative adversarial networks, random graphs, large random matrices, mathematical logic, optimal control, and more
  • Learn how to adapt mathematical methods to applications across completely different fields
  • Gain the mathematical fluency to interpret and explain how AI systems arrive at their decisions

Who Is This Book Not For?

This book is not for a person who likes to sit down and do many exercises to master a particular mathematical technique or method, a person who likes to write and prove theorems, or a person who wants to learn coding and development. This is not a math textbook. There are many excellent textbooks that teach calculus, linear algebra, and probability (but few books relate this math to AI). That said, this book has many in-text pointers to the relevant books and scientific publications for readers who want to dive into technicalities, rigorous statements, and proofs. This is also not a coding book. The emphasis is on concepts, intuition, and general understanding, rather than on implementing and developing the technology.

What Math Background Is Expected from You to Be Able to Read This Book?

This book is self-contained in the sense that we motivate everything that we need to use. I do hope that you have been exposed to calculus and some linear algebra, including vector and matrix operations such as addition, multiplication, and some matrix decompositions. I also hope that you know what a function is and how it maps an input to an output. Most of what we do mathematically in AI involves constructing a function, evaluating a function, optimizing a function, or composing a bunch of functions. You need to know about derivatives (these measure how fast things change) and the chain rule for derivatives. You do not necessarily need to know how to compute them for each function, as computers, Python, Desmos, and/or Wolfram|Alpha do much of the mathematics for us nowadays, but you need to know their meaning.

Some exposure to probabilistic and statistical thinking is helpful as well. If you do not know any of the above, that is totally fine. You might have to sit down and do some examples (from some other books) on your own to familiarize yourself with certain concepts. The trick here is to know when to look up the things that you do not know: only when you need them, meaning only when you encounter a term that you do not understand and you have a good idea of the context in which it appeared. If you are truly starting from scratch, you are not too far behind. This book tries to avoid technicalities at all costs.
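To make those two prerequisites concrete, here is a short sketch of my own (not from the book): it composes two functions, differentiates the composition with the chain rule, and checks the answer numerically, the kind of computation that Python happily does for us.

```python
# A short illustration (mine, not from the book) of function composition
# and the chain rule. For h(x) = f(g(x)) with f(u) = u**2 and g(x) = sin(x),
# the chain rule gives h'(x) = f'(g(x)) * g'(x) = 2*sin(x)*cos(x).
import numpy as np

def g(x): return np.sin(x)
def f(u): return u ** 2
def h(x): return f(g(x))  # the composition f o g

x0 = 1.3
analytic = 2 * np.sin(x0) * np.cos(x0)  # chain rule answer

eps = 1e-6  # numerical check by a centered finite difference
numeric = (h(x0 + eps) - h(x0 - eps)) / (2 * eps)

print(analytic, numeric)  # the two values agree closely
```

Knowing why those two numbers match is exactly the kind of meaning the book asks you to bring with you; the computation itself can be outsourced.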

Table of Contents:

Preface

Why I Wrote This Book

Who Is This Book For?

Who Is This Book Not For?

How Will the Math Be Presented in This Book?

Infographic

What Math Background Is Expected from You to Be Able to Read This Book?

Overview of the Chapters

My Favorite Books on AI

Conventions Used in This Book

Using Code Examples

O’Reilly Online Learning

How to Contact Us

Acknowledgments

1. Why Learn the Mathematics of AI?

What Is AI?

Why Is AI So Popular Now?

What Is AI Able to Do?

An AI Agent’s Specific Tasks

What Are AI’s Limitations?

What Happens When AI Systems Fail?

Where Is AI Headed?

Who Are the Current Main Contributors to the AI Field?

What Math Is Typically Involved in AI?

Summary and Looking Ahead

2. Data, Data, Data

Data for AI

Real Data Versus Simulated Data

Mathematical Models: Linear Versus Nonlinear

An Example of Real Data

An Example of Simulated Data

Mathematical Models: Simulations and AI

Where Do We Get Our Data From?

The Vocabulary of Data Distributions, Probability, and Statistics

Random Variables

Probability Distributions

Marginal Probabilities

The Uniform and the Normal Distributions

Conditional Probabilities and Bayes’ Theorem

Conditional Probabilities and Joint Distributions

Prior Distribution, Posterior Distribution, and Likelihood Function

Mixtures of Distributions

Sums and Products of Random Variables

Using Graphs to Represent Joint Probability Distributions

Expectation, Mean, Variance, and Uncertainty

Covariance and Correlation

Markov Process

Normalizing, Scaling, and/or Standardizing a Random Variable or Data Set

Common Examples

Continuous Distributions Versus Discrete Distributions (Density Versus Mass)

The Power of the Joint Probability Density Function

Distribution of Data: The Uniform Distribution

Distribution of Data: The Bell-Shaped Normal (Gaussian) Distribution

Distribution of Data: Other Important and Commonly Used Distributions

The Various Uses of the Word “Distribution”

A/B Testing

Summary and Looking Ahead

3. Fitting Functions to Data

Traditional and Very Useful Machine Learning Models

Numerical Solutions Versus Analytical Solutions

Regression: Predict a Numerical Value

Training Function

Loss Function

Optimization

Logistic Regression: Classify into Two Classes

Training Function

Loss Function

Optimization

Softmax Regression: Classify into Multiple Classes

Training Function

Loss Function

Optimization

Incorporating These Models into the Last Layer of a Neural Network

Other Popular Machine Learning Techniques and Ensembles of Techniques

Support Vector Machines

Decision Trees

Random Forests

k-means Clustering

Performance Measures for Classification Models

Summary and Looking Ahead

4. Optimization for Neural Networks

The Brain Cortex and Artificial Neural Networks

Training Function: Fully Connected, or Dense, Feed Forward Neural Networks

A Neural Network Is a Computational Graph Representation of the Training Function

Linearly Combine, Add Bias, Then Activate

Common Activation Functions

Universal Function Approximation

Approximation Theory for Deep Learning

Loss Functions

Optimization

Mathematics and the Mysterious Success of Neural Networks

Gradient Descent: $\vec{\omega}_{i+1} = \vec{\omega}_i - \eta \nabla L(\vec{\omega}_i)$

Explaining the Role of the Learning Rate Hyperparameter η

Convex Versus Nonconvex Landscapes

Stochastic Gradient Descent

Initializing the Weights $\vec{\omega}_0$ for the Optimization Process

Regularization Techniques

Dropout

Early Stopping

Batch Normalization of Each Layer

Control the Size of the Weights by Penalizing Their Norm

Penalizing the $\ell^2$ Norm Versus Penalizing the $\ell^1$ Norm

Explaining the Role of the Regularization Hyperparameter α

Hyperparameter Examples That Appear in Machine Learning

Chain Rule and Backpropagation: Calculating $\nabla L(\vec{\omega}_i)$

Backpropagation Is Not Too Different from How Our Brain Learns

Why Is It Better to Backpropagate?

Backpropagation in Detail

Assessing the Significance of the Input Data Features

Summary and Looking Ahead

5. Convolutional Neural Networks and Computer Vision

Convolution and Cross-Correlation

Translation Invariance and Translation Equivariance

Convolution in Usual Space Is a Product in Frequency Space

Convolution from a Systems Design Perspective

Convolution and Impulse Response for Linear and Translation Invariant Systems

Convolution and One-Dimensional Discrete Signals

Convolution and Two-Dimensional Discrete Signals

Filtering Images

Feature Maps

Linear Algebra Notation

The One-Dimensional Case: Multiplication by a Toeplitz Matrix

The Two-Dimensional Case: Multiplication by a Doubly Block Circulant Matrix

Pooling

A Convolutional Neural Network for Image Classification

Summary and Looking Ahead

6. Singular Value Decomposition: Image Processing, Natural Language Processing, and Social Media

Matrix Factorization

Diagonal Matrices

Matrices as Linear Transformations Acting on Space

Action of A on the Right Singular Vectors

Action of A on the Standard Unit Vectors and the Unit Square Determined by Them

Action of A on the Unit Circle

Breaking Down the Circle-to-Ellipse Transformation According to the Singular Value Decomposition

Rotation and Reflection Matrices

Action of A on a General Vector $\vec{x}$

Three Ways to Multiply Matrices

The Big Picture

The Condition Number and Computational Stability

The Ingredients of the Singular Value Decomposition

Singular Value Decomposition Versus the Eigenvalue Decomposition

Computation of the Singular Value Decomposition

Computing an Eigenvector Numerically

The Pseudoinverse

Applying the Singular Value Decomposition to Images

Principal Component Analysis and Dimension Reduction

Principal Component Analysis and Clustering

A Social Media Application

Latent Semantic Analysis

Randomized Singular Value Decomposition

Summary and Looking Ahead

7. Natural Language and Finance AI: Vectorization and Time Series

Natural Language AI

Preparing Natural Language Data for Machine Processing

Statistical Models and the log Function

Zipf’s Law for Term Counts

Various Vector Representations for Natural Language Documents

Term Frequency Vector Representation of a Document or Bag of Words

Term Frequency-Inverse Document Frequency Vector Representation of a Document

Topic Vector Representation of a Document Determined by Latent Semantic Analysis

Topic Vector Representation of a Document Determined by Latent Dirichlet Allocation

Topic Vector Representation of a Document Determined by Latent Discriminant Analysis

Meaning Vector Representations of Words and of Documents Determined by Neural Network Embeddings

Cosine Similarity

Natural Language Processing Applications

Sentiment Analysis

Spam Filter

Search and Information Retrieval

Machine Translation

Image Captioning

Chatbots

Other Applications

Transformers and Attention Models

The Transformer Architecture

The Attention Mechanism

Transformers Are Far from Perfect

Convolutional Neural Networks for Time Series Data

Recurrent Neural Networks for Time Series Data

How Do Recurrent Neural Networks Work?

Gated Recurrent Units and Long Short-Term Memory Units

An Example of Natural Language Data

Finance AI

Summary and Looking Ahead

8. Probabilistic Generative Models

What Are Generative Models Useful For?

The Typical Mathematics of Generative Models

Shifting Our Brain from Deterministic Thinking to Probabilistic Thinking

Maximum Likelihood Estimation

Explicit and Implicit Density Models

Explicit Density-Tractable: Fully Visible Belief Networks

Example: Generating Images via PixelCNN and Machine Audio via WaveNet

Explicit Density-Tractable: Change of Variables Nonlinear Independent Component Analysis

Explicit Density-Intractable: Variational Autoencoders Approximation via Variational Methods

Explicit Density-Intractable: Boltzmann Machine Approximation via Markov Chain

Implicit Density-Markov Chain: Generative Stochastic Network

Implicit Density-Direct: Generative Adversarial Networks

How Do Generative Adversarial Networks Work?

Example: Machine Learning and Generative Networks for High Energy Physics

Other Generative Models

Naive Bayes Classification Model

Gaussian Mixture Model

The Evolution of Generative Models

Hopfield Nets

Boltzmann Machine

Restricted Boltzmann Machine (Explicit Density and Intractable)

The Original Autoencoder

Probabilistic Language Modeling

Summary and Looking Ahead

9. Graph Models

Graphs: Nodes, Edges, and Features for Each

Example: PageRank Algorithm

Inverting Matrices Using Graphs

Cayley Graphs of Groups: Pure Algebra and Parallel Computing

Message Passing Within a Graph

The Limitless Applications of Graphs

Brain Networks

Spread of Disease

Spread of Information

Detecting and Tracking Fake News Propagation

Web-Scale Recommendation Systems

Fighting Cancer

Biochemical Graphs

Molecular Graph Generation for Drug and Protein Structure Discovery

Citation Networks

Social Media Networks and Social Influence Prediction

Sociological Structures

Bayesian Networks

Traffic Forecasting

Logistics and Operations Research

Language Models

Graph Structure of the Web

Automatically Analyzing Computer Programs

Data Structures in Computer Science

Load Balancing in Distributed Networks

Artificial Neural Networks

Random Walks on Graphs

Node Representation Learning

Tasks for Graph Neural Networks

Node Classification

Graph Classification

Clustering and Community Detection

Graph Generation

Influence Maximization

Link Prediction

Dynamic Graph Models

Bayesian Networks

A Bayesian Network Represents a Compactified Conditional Probability Table

Making Predictions Using a Bayesian Network

Bayesian Networks Are Belief Networks, Not Causal Networks

Keep This in Mind About Bayesian Networks

Chains, Forks, and Colliders

Given a Data Set, How Do We Set Up a Bayesian Network for the Involved Variables?

Graph Diagrams for Probabilistic Causal Modeling

A Brief History of Graph Theory

Main Considerations in Graph Theory

Spanning Trees and Shortest Spanning Trees

Cut Sets and Cut Vertices

Planarity

Graphs as Vector Spaces

Realizability

Coloring and Matching

Enumeration

Algorithms and Computational Aspects of Graphs

Summary and Looking Ahead

10. Operations Research

No Free Lunch

Complexity Analysis and O() Notation

Optimization: The Heart of Operations Research

Thinking About Optimization

Optimization: Finite Dimensions, Unconstrained

Optimization: Finite Dimensions, Constrained Lagrange Multipliers

Optimization: Infinite Dimensions, Calculus of Variations

Optimization on Networks

Traveling Salesman Problem

Minimum Spanning Tree

Shortest Path

Max-Flow Min-Cut

Max-Flow Min-Cost

The Critical Path Method for Project Design

The n-Queens Problem

Linear Optimization

The General Form and the Standard Form

Visualizing a Linear Optimization Problem in Two Dimensions

Convex to Linear

The Geometry of Linear Optimization

The Simplex Method

Transportation and Assignment Problems

Duality, Lagrange Relaxation, Shadow Prices, Max-Min, Min-Max, and All That

Sensitivity

Game Theory and Multiagents

Queuing

Inventory

Machine Learning for Operations Research

Hamilton-Jacobi-Bellman Equation

Operations Research for AI

Summary and Looking Ahead

11. Probability

Where Did Probability Appear in This Book?

What More Do We Need to Know That Is Essential for AI?

Causal Modeling and the Do Calculus

An Alternative: The Do Calculus

Paradoxes and Diagram Interpretations

Monty Hall Problem

Berkson’s Paradox

Simpson’s Paradox

Large Random Matrices

Examples of Random Vectors and Random Matrices

Main Considerations in Random Matrix Theory

Random Matrix Ensembles

Eigenvalue Density of the Sum of Two Large Random Matrices

Essential Math for Large Random Matrices

Stochastic Processes

Bernoulli Process

Poisson Process

Random Walk

Wiener Process or Brownian Motion

Martingale

Lévy Process

Branching Process

Markov Chain

Itô’s Lemma

Markov Decision Processes and Reinforcement Learning

Examples of Reinforcement Learning

Reinforcement Learning as a Markov Decision Process

Reinforcement Learning in the Context of Optimal Control and Nonlinear Dynamics

Python Library for Reinforcement Learning

Theoretical and Rigorous Grounds

Which Events Have a Probability?

Can We Talk About a Wider Range of Random Variables?

A Probability Triple (Sample Space, Sigma Algebra, Probability Measure)

Where Is the Difficulty?

Random Variable, Expectation, and Integration

Distribution of a Random Variable and the Change of Variable Theorem

Next Steps in Rigorous Probability Theory

The Universality Theorem for Neural Networks

Summary and Looking Ahead

12. Mathematical Logic

Various Logic Frameworks

Propositional Logic

From Few Axioms to a Whole Theory

Codifying Logic Within an Agent

How Do Deterministic and Probabilistic Machine Learning Fit In?

First-Order Logic

Relationships Between For All and There Exist

Probabilistic Logic

Fuzzy Logic

Temporal Logic

Comparison with Human Natural Language

Machines and Complex Mathematical Reasoning

Summary and Looking Ahead

13. Artificial Intelligence and Partial Differential Equations

What Is a Partial Differential Equation?

Modeling with Differential Equations

Models at Different Scales

The Parameters of a PDE

Changing One Thing in a PDE Can Be a Big Deal

Can AI Step In?

Numerical Solutions Are Very Valuable

Continuous Functions Versus Discrete Functions

PDE Themes from My Ph.D. Thesis

Discretization and the Curse of Dimensionality

Finite Differences

Finite Elements

Variational or Energy Methods

Monte Carlo Methods

Some Statistical Mechanics: The Wonderful Master Equation

Solutions as Expectations of Underlying Random Processes

Transforming the PDE

Fourier Transform

Laplace Transform

Solution Operators

Example Using the Heat Equation

Example Using the Poisson Equation

Fixed Point Iteration

AI for PDEs

Deep Learning to Learn Physical Parameter Values

Deep Learning to Learn Meshes

Deep Learning to Approximate Solution Operators of PDEs

Numerical Solutions of High-Dimensional Differential Equations

Simulating Natural Phenomena Directly from Data

Hamilton-Jacobi-Bellman PDE for Dynamic Programming

PDEs for AI?

Other Considerations in Partial Differential Equations

Summary and Looking Ahead

14. Artificial Intelligence, Ethics, Mathematics, Law, and Policy

Good AI

Policy Matters

What Could Go Wrong?

From Math to Weapons

Chemical Warfare Agents

AI and Politics

Unintended Outcomes of Generative Models

How to Fix It?

Addressing Underrepresentation in Training Data

Addressing Bias in Word Vectors

Addressing Privacy

Addressing Fairness

Injecting Morality into AI

Democratization and Accessibility of AI to Nonexperts

Prioritizing High Quality Data

Distinguishing Bias from Discrimination

The Hype

Final Thoughts

Index

About the Author

Hala Nelson is an Associate Professor of Mathematics at James Madison University. She has a Ph.D. in mathematics from the Courant Institute of Mathematical Sciences at New York University. Prior to joining James Madison University, she was a postdoctoral assistant professor at the University of Michigan, Ann Arbor.

She specializes in mathematical modeling and consults for emergency and infrastructure services in the public sector. She likes to translate complex ideas into simple and practical terms. To her, most mathematical concepts are painless and relatable, unless the person presenting them either does not understand them very well or is trying to show off.

Other facts: Hala Nelson grew up in Lebanon during its brutal civil war. She lost her hair at a very young age in a missile explosion. This event, and many that followed, shaped her interests in human behavior, the nature of intelligence, and AI. Her dad taught her math, at home and in French, until she graduated from high school. Her favorite quote from her dad about math is: “It is the one clean science.”

What makes us different?

• Instant Download

• Always Competitive Pricing

• 100% Privacy

• FREE Sample Available

• 24-7 LIVE Customer Support
