
Data Science from Scratch: First Principles with Python 2nd Edition by Joel Grus, ISBN-13: 978-1492041139

Original price was: $50.00. Current price is: $14.99.


Description

[PDF eBook eTextbook] – Available Instantly

  • Publisher: O’Reilly Media
  • Publication date: June 11, 2019
  • Edition: 2nd
  • Language: English
  • 403 pages
  • ISBN-10: 1492041130
  • ISBN-13: 978-1492041139

To really learn data science, you should not only master the tools (data science libraries, frameworks, modules, and toolkits) but also understand the ideas and principles underlying them. Updated for Python 3.6, this second edition of Data Science from Scratch shows you how these tools and algorithms work by implementing them from scratch.

If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Packed with new material on deep learning, statistics, and natural language processing, this updated book shows you how to find the gems in today’s messy glut of data.

  • Get a crash course in Python
  • Learn the basics of linear algebra, statistics, and probability, and how and when they’re used in data science
  • Collect, explore, clean, munge, and manipulate data
  • Dive into the fundamentals of machine learning
  • Implement models such as k-nearest neighbors, Naïve Bayes, linear and logistic regression, decision trees, neural networks, and clustering
  • Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

Table of Contents:

Preface to the Second Edition

Conventions Used in This Book

Using Code Examples

O’Reilly Online Learning

How to Contact Us

Acknowledgments

Preface to the First Edition

Data Science

From Scratch

1. Introduction

The Ascendance of Data

What Is Data Science?

Motivating Hypothetical: DataSciencester

Finding Key Connectors

Data Scientists You May Know

Salaries and Experience

Paid Accounts

Topics of Interest

Onward

2. A Crash Course in Python

The Zen of Python

Getting Python

Virtual Environments

Whitespace Formatting

Modules

Functions

Strings

Exceptions

Lists

Tuples

Dictionaries

defaultdict

Counters

Sets

Control Flow

Truthiness

Sorting

List Comprehensions

Automated Testing and assert

Object-Oriented Programming

Iterables and Generators

Randomness

Regular Expressions

Functional Programming

zip and Argument Unpacking

args and kwargs

Type Annotations

How to Write Type Annotations

Welcome to DataSciencester!

For Further Exploration

3. Visualizing Data

matplotlib

Bar Charts

Line Charts

Scatterplots

For Further Exploration

4. Linear Algebra

Vectors

Matrices

For Further Exploration

5. Statistics

Describing a Single Set of Data

Central Tendencies

Dispersion

Correlation

Simpson’s Paradox

Some Other Correlational Caveats

Correlation and Causation

For Further Exploration

6. Probability

Dependence and Independence

Conditional Probability

Bayes’s Theorem

Random Variables

Continuous Distributions

The Normal Distribution

The Central Limit Theorem

For Further Exploration

7. Hypothesis and Inference

Statistical Hypothesis Testing

Example: Flipping a Coin

p-Values

Confidence Intervals

p-Hacking

Example: Running an A/B Test

Bayesian Inference

For Further Exploration

8. Gradient Descent

The Idea Behind Gradient Descent

Estimating the Gradient

Using the Gradient

Choosing the Right Step Size

Using Gradient Descent to Fit Models

Minibatch and Stochastic Gradient Descent

For Further Exploration

9. Getting Data

stdin and stdout

Reading Files

The Basics of Text Files

Delimited Files

Scraping the Web

HTML and the Parsing Thereof

Example: Keeping Tabs on Congress

Using APIs

JSON and XML

Using an Unauthenticated API

Finding APIs

Example: Using the Twitter APIs

Getting Credentials

For Further Exploration

10. Working with Data

Exploring Your Data

Exploring One-Dimensional Data

Two Dimensions

Many Dimensions

Using NamedTuples

Dataclasses

Cleaning and Munging

Manipulating Data

Rescaling

An Aside: tqdm

Dimensionality Reduction

For Further Exploration

11. Machine Learning

Modeling

What Is Machine Learning?

Overfitting and Underfitting

Correctness

The Bias-Variance Tradeoff

Feature Extraction and Selection

For Further Exploration

12. k-Nearest Neighbors

The Model

Example: The Iris Dataset

The Curse of Dimensionality

For Further Exploration

13. Naive Bayes

A Really Dumb Spam Filter

A More Sophisticated Spam Filter

Implementation

Testing Our Model

Using Our Model

For Further Exploration

14. Simple Linear Regression

The Model

Using Gradient Descent

Maximum Likelihood Estimation

For Further Exploration

15. Multiple Regression

The Model

Further Assumptions of the Least Squares Model

Fitting the Model

Interpreting the Model

Goodness of Fit

Digression: The Bootstrap

Standard Errors of Regression Coefficients

Regularization

For Further Exploration

16. Logistic Regression

The Problem

The Logistic Function

Applying the Model

Goodness of Fit

Support Vector Machines

For Further Investigation

17. Decision Trees

What Is a Decision Tree?

Entropy

The Entropy of a Partition

Creating a Decision Tree

Putting It All Together

Random Forests

For Further Exploration

18. Neural Networks

Perceptrons

Feed-Forward Neural Networks

Backpropagation

Example: Fizz Buzz

For Further Exploration

19. Deep Learning

The Tensor

The Layer Abstraction

The Linear Layer

Neural Networks as a Sequence of Layers

Loss and Optimization

Example: XOR Revisited

Other Activation Functions

Example: FizzBuzz Revisited

Softmaxes and Cross-Entropy

Dropout

Example: MNIST

Saving and Loading Models

For Further Exploration

20. Clustering

The Idea

The Model

Example: Meetups

Choosing k

Example: Clustering Colors

Bottom-Up Hierarchical Clustering

For Further Exploration

21. Natural Language Processing

Word Clouds

n-Gram Language Models

Grammars

An Aside: Gibbs Sampling

Topic Modeling

Word Vectors

Recurrent Neural Networks

Example: Using a Character-Level RNN

For Further Exploration

22. Network Analysis

Betweenness Centrality

Eigenvector Centrality

Matrix Multiplication

Centrality

Directed Graphs and PageRank

For Further Exploration

23. Recommender Systems

Manual Curation

Recommending What’s Popular

User-Based Collaborative Filtering

Item-Based Collaborative Filtering

Matrix Factorization

For Further Exploration

24. Databases and SQL

CREATE TABLE and INSERT

UPDATE

DELETE

SELECT

GROUP BY

ORDER BY

JOIN

Subqueries

Indexes

Query Optimization

NoSQL

For Further Exploration

25. MapReduce

Example: Word Count

Why MapReduce?

MapReduce More Generally

Example: Analyzing Status Updates

Example: Matrix Multiplication

An Aside: Combiners

For Further Exploration

26. Data Ethics

What Is Data Ethics?

No, Really, What Is Data Ethics?

Should I Care About Data Ethics?

Building Bad Data Products

Trading Off Accuracy and Fairness

Collaboration

Interpretability

Recommendations

Biased Data

Data Protection

In Summary

For Further Exploration

27. Go Forth and Do Data Science

IPython

Mathematics

Not from Scratch

NumPy

pandas

scikit-learn

Visualization

R

Deep Learning

Find Data

Do Data Science

Hacker News

Fire Trucks

T-Shirts

Tweets on a Globe

And You?

Index

Joel Grus is a research engineer at the Allen Institute for Artificial Intelligence. Previously he worked as a software engineer at Google and a data scientist at several startups. He lives in Seattle, where he regularly attends data science happy hours.
