
Hands-On Large Language Models: Language Understanding and Generation by Jay Alammar and Maarten Grootendorst, ISBN-13: 978-1098150969

Original price was $50.00. Current price is $14.99.


Description

Hands-On Large Language Models: Language Understanding and Generation by Jay Alammar and Maarten Grootendorst, ISBN-13: 978-1098150969

[PDF eBook eTextbook] – Available Instantly

  • Publisher: O’Reilly Media; 1st edition (October 15, 2024)
  • Language: English
  • 425 pages
  • ISBN-10: 1098150961
  • ISBN-13: 978-1098150969

AI has acquired startling new language capabilities in just the past few years. Driven by rapid advances in deep learning, language AI systems are able to write and understand text better than ever before. This trend is enabling new features, products, and entire industries. Through its highly visual, example-driven approach, this book teaches readers the practical tools and concepts they need to use these capabilities today.

You’ll understand how to use pretrained large language models for use cases like copywriting and summarization; create semantic search systems that go beyond keyword matching; and use existing libraries and pretrained models for text classification, search, and clustering.
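
To give a flavor of how directly these capabilities can be exercised today, here is a minimal sketch (not taken from the book) of generating text with a pretrained model via the Hugging Face transformers library; "gpt2" is just a small, widely available model chosen for illustration:

    from transformers import pipeline

    # Load any causal language model from the Hugging Face Hub;
    # "gpt2" is a small, widely available illustrative choice.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "Large language models are"
    result = generator(prompt, max_new_tokens=25)
    print(result[0]["generated_text"])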

This book also helps you:

  • Understand the architecture of Transformer language models that excel at text generation and representation
  • Build advanced LLM pipelines to cluster text documents and explore the topics they cover
  • Build semantic search engines that go beyond keyword search, using methods like dense retrieval and rerankers (a short sketch follows this list)
  • Explore how generative models can be used, from prompt engineering all the way to retrieval-augmented generation
  • Gain a deeper understanding of how to train LLMs and optimize them for specific applications using generative model fine-tuning, contrastive fine-tuning, and in-context learning
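
As a concrete illustration of the dense retrieval idea mentioned above, here is a minimal sketch (not from the book) that embeds a query and a few documents and ranks them by cosine similarity; it assumes the sentence-transformers library, and the model name is only an illustrative choice:

    from sentence_transformers import SentenceTransformer, util

    # An illustrative embedding model; any sentence-transformers model works.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = [
        "The new phone ships with a two-day battery.",
        "Soup recipes for cold winter evenings.",
        "How to extend your laptop's battery lifespan.",
    ]
    query = "tips for making batteries last longer"

    # Dense retrieval: embed query and documents, then rank documents
    # by the cosine similarity of their embeddings to the query's.
    doc_emb = model.encode(docs, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, doc_emb)[0]

    for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
        print(f"{score:.3f}  {doc}")

Unlike keyword search, this ranks the battery-related documents highest even though the query shares almost no exact words with them.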

Table of Contents:

Preface

An Intuition-First Philosophy

Prerequisites

Book Structure

Part I: Understanding Language Models

Part II: Using Pretrained Language Models

Part III: Training and Fine-Tuning Language Models

Hardware and Software Requirements

API Keys

Conventions Used in This Book

Using Code Examples

O’Reilly Online Learning

How to Contact Us

Acknowledgments

I. Understanding Language Models

1. An Introduction to Large Language Models

What Is Language AI?

A Recent History of Language AI

Representing Language as a Bag-of-Words

Better Representations with Dense Vector Embeddings

Types of Embeddings

Encoding and Decoding Context with Attention

Attention Is All You Need

Representation Models: Encoder-Only Models

Generative Models: Decoder-Only Models

The Year of Generative AI

The Moving Definition of a “Large Language Model”

The Training Paradigm of Large Language Models

Large Language Model Applications: What Makes Them So Useful?

Responsible LLM Development and Usage

Limited Resources Are All You Need

Interfacing with Large Language Models

Proprietary, Private Models

Open Models

Open Source Frameworks

Generating Your First Text

Summary

2. Tokens and Embeddings

LLM Tokenization

How Tokenizers Prepare the Inputs to the Language Model

Downloading and Running an LLM

How Does the Tokenizer Break Down Text?

Word Versus Subword Versus Character Versus Byte Tokens

Comparing Trained LLM Tokenizers

Tokenizer Properties

Token Embeddings

A Language Model Holds Embeddings for the Vocabulary of Its Tokenizer

Creating Contextualized Word Embeddings with Language Models

Text Embeddings (for Sentences and Whole Documents)

Word Embeddings Beyond LLMs

Using Pretrained Word Embeddings

The Word2vec Algorithm and Contrastive Training

Embeddings for Recommendation Systems

Recommending Songs by Embeddings

Training a Song Embedding Model

Summary

3. Looking Inside Large Language Models

An Overview of Transformer Models

The Inputs and Outputs of a Trained Transformer LLM

The Components of the Forward Pass

Choosing a Single Token from the Probability Distribution (Sampling/Decoding)

Parallel Token Processing and Context Size

Speeding Up Generation by Caching Keys and Values

Inside the Transformer Block

Recent Improvements to the Transformer Architecture

More Efficient Attention

The Transformer Block

Positional Embeddings (RoPE)

Other Architectural Experiments and Improvements

Summary

II. Using Pretrained Language Models

4. Text Classification

The Sentiment of Movie Reviews

Text Classification with Representation Models

Model Selection

Using a Task-Specific Model

Classification Tasks That Leverage Embeddings

Supervised Classification

What If We Do Not Have Labeled Data?

Text Classification with Generative Models

Using the Text-to-Text Transfer Transformer

ChatGPT for Classification

Summary

5. Text Clustering and Topic Modeling

ArXiv’s Articles: Computation and Language

A Common Pipeline for Text Clustering

Embedding Documents

Reducing the Dimensionality of Embeddings

Cluster the Reduced Embeddings

Inspecting the Clusters

From Text Clustering to Topic Modeling

BERTopic: A Modular Topic Modeling Framework

Adding a Special Lego Block

The Text Generation Lego Block

Summary

6. Prompt Engineering

Using Text Generation Models

Choosing a Text Generation Model

Loading a Text Generation Model

Controlling Model Output

Intro to Prompt Engineering

The Basic Ingredients of a Prompt

Instruction-Based Prompting

Advanced Prompt Engineering

The Potential Complexity of a Prompt

In-Context Learning: Providing Examples

Chain Prompting: Breaking up the Problem

Reasoning with Generative Models

Chain-of-Thought: Think Before Answering

Self-Consistency: Sampling Outputs

Tree-of-Thought: Exploring Intermediate Steps

Output Verification

Providing Examples

Grammar: Constrained Sampling

Summary

7. Advanced Text Generation Techniques and Tools

Model I/O: Loading Quantized Models with LangChain

Chains: Extending the Capabilities of LLMs

A Single Link in the Chain: Prompt Template

A Chain with Multiple Prompts

Memory: Helping LLMs to Remember Conversations

Conversation Buffer

Windowed Conversation Buffer

Conversation Summary

Agents: Creating a System of LLMs

The Driving Power Behind Agents: Step-by-step Reasoning

ReAct in LangChain

Summary

8. Semantic Search and Retrieval-Augmented Generation

Overview of Semantic Search and RAG

Semantic Search with Language Models

Dense Retrieval

Reranking

Retrieval Evaluation Metrics

Retrieval-Augmented Generation (RAG)

From Search to RAG

Example: Grounded Generation with an LLM API

Example: RAG with Local Models

Advanced RAG Techniques

RAG Evaluation

Summary

9. Multimodal Large Language Models

Transformers for Vision

Multimodal Embedding Models

CLIP: Connecting Text and Images

How Can CLIP Generate Multimodal Embeddings?

OpenCLIP

Making Text Generation Models Multimodal

BLIP-2: Bridging the Modality Gap

Preprocessing Multimodal Inputs

Use Case 1: Image Captioning

Use Case 2: Multimodal Chat-Based Prompting

Summary

III. Training and Fine-Tuning Language Models

10. Creating Text Embedding Models

Embedding Models

What Is Contrastive Learning?

SBERT

Creating an Embedding Model

Generating Contrastive Examples

Train Model

In-Depth Evaluation

Loss Functions

Fine-Tuning an Embedding Model

Supervised

Augmented SBERT

Unsupervised Learning

Transformer-Based Sequential Denoising Auto-Encoder

Using TSDAE for Domain Adaptation

Summary

11. Fine-Tuning Representation Models for Classification

Supervised Classification

Fine-Tuning a Pretrained BERT Model

Freezing Layers

Few-Shot Classification

SetFit: Efficient Fine-Tuning with Few Training Examples

Fine-Tuning for Few-Shot Classification

Continued Pretraining with Masked Language Modeling

Named-Entity Recognition

Preparing Data for Named-Entity Recognition

Fine-Tuning for Named-Entity Recognition

Summary

12. Fine-Tuning Generation Models

The Three LLM Training Steps: Pretraining, Supervised Fine-Tuning, and Preference Tuning

Supervised Fine-Tuning (SFT)

Full Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT)

Instruction Tuning with QLoRA

Templating Instruction Data

Model Quantization

LoRA Configuration

Training Configuration

Training

Merge Weights

Evaluating Generative Models

Word-Level Metrics

Benchmarks

Leaderboards

Automated Evaluation

Human Evaluation

Preference-Tuning / Alignment / RLHF

Automating Preference Evaluation Using Reward Models

The Inputs and Outputs of a Reward Model

Training a Reward Model

Training No Reward Model

Preference Tuning with DPO

Templating Alignment Data

Model Quantization

Training Configuration

Training

Summary

Afterword

Index

About the Authors

Jay Alammar is Director and Engineering Fellow at Cohere (a pioneering provider of large language models as an API). In this role, he advises and educates enterprises and the developer community on using language models for practical use cases. Through his popular AI/ML blog, Jay has helped millions of researchers and engineers visually understand machine learning tools and concepts from the basic (ending up in the documentation of packages like NumPy and pandas) to the cutting edge (Transformers, BERT, GPT-3, Stable Diffusion). Jay is also a co-creator of popular machine learning and natural language processing courses on DeepLearning.AI and Udacity.

Maarten Grootendorst is a Senior Clinical Data Scientist at IKNL (the Netherlands Comprehensive Cancer Organization). He holds master’s degrees in organizational psychology, clinical psychology, and data science, which he leverages to communicate complex machine learning concepts to a wide audience. Through his popular blogs, he has reached millions of readers by explaining the fundamentals of artificial intelligence, often from a psychological point of view. He is the author and maintainer of several open source packages that rely on the strength of large language models, such as BERTopic, PolyFuzz, and KeyBERT. His packages are downloaded millions of times and are used by data professionals and organizations worldwide.

What makes us different?

• Instant Download

• Always Competitive Pricing

• 100% Privacy

• FREE Sample Available

• 24/7 LIVE Customer Support
