English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 18h 21m | 4.05 GB
Programming Generative AI is a hands-on tour of deep generative modeling, taking you from building simple feedforward neural networks in PyTorch all the way to working with large multimodal models capable of simultaneously understanding text and images. Along the way, you will learn how to train your own generative models from scratch to create an infinity of images, generate text with large language models similar to the ones that power applications like ChatGPT, write your own text-to-image pipeline to understand how prompt-based generative models actually work, and personalize large pretrained models like Stable Diffusion to generate images of novel subjects in unique visual styles (among other things).
Learn How To
- Train a variational autoencoder with PyTorch to learn a compressed latent space of images
- Generate and edit realistic human faces with unconditional diffusion models and SDEdit
- Use large language models such as GPT-2 to generate text with Hugging Face Transformers
- Perform text-based semantic image search using multimodal models such as CLIP
- Program your own text-to-image pipeline to understand how prompt-based generative models such as Stable Diffusion actually work
- Properly evaluate generative models, both qualitatively and quantitatively
- Automatically caption images using pretrained foundation models
- Generate images in a specific visual style by efficiently fine-tuning Stable Diffusion with LoRA
- Create personalized AI avatars by teaching pretrained diffusion models new subjects and concepts with Dreambooth
- Guide the structure and composition of generated images using depth- and edge-conditioned ControlNets
- Perform near real-time inference with SDXL Turbo for frame-based video-to-video translation
Who Should Take This Course
- Engineers and developers interested in building generative AI systems and applications
- Data scientists interested in working with state-of-the-art deep learning models
- Students, researchers, and academics looking for an applied, hands-on resource to complement theoretical or conceptual knowledge
- Technical artists and creative coders who want to augment their creative practice
- Anyone interested in working with generative AI who does not know where or how to start
Course Requirements
- Comfortable programming in Python
- Knowledge of machine learning basics
- Familiarity with deep learning and neural networks will be helpful but is not required
Lesson 1: The What, Why, and How of Generative AI
Lesson 1 starts off with an introduction to what generative AI actually is, at least as it’s relevant to this course, before moving into the specifics of deep generative modeling. It covers the plethora of possible multimodal models (in terms of input and output modalities) and how it is possible for algorithms to actually generate rich media seemingly out of thin air. The lesson wraps up with a bit of the formalization and theory of deep generative models, and the tradeoffs between the various types of generative modeling architectures.
Lesson 2: PyTorch for the Impatient
Lesson 2 begins with an introduction to PyTorch and deep learning frameworks in general. Jonathan shows you how the combination of automatic differentiation and transparent computation on GPUs has enabled the current explosion of deep learning research. He also shows you how to use PyTorch to implement and train a linear regression model as a stepping stone to building much more complex neural networks. Finally, the lesson demonstrates how to combine all of the components that PyTorch provides to build a simple feedforward multi-layer perceptron.
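To give a feel for the workflow the lesson builds toward, here is a minimal PyTorch sketch of training a linear regression model with gradient descent. The synthetic data, learning rate, and number of epochs are illustrative assumptions, not taken from the course materials:

```python
import torch
from torch import nn

# Synthetic data: y = 3x + 2 plus noise (illustrative only)
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 2 + 0.1 * torch.randn_like(x)

model = nn.Linear(1, 1)                            # a single weight and bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)                    # forward pass
    loss.backward()                                # autograd computes gradients
    optimizer.step()                               # gradient descent update

print(model.weight.item(), model.bias.item())      # should approach 3 and 2
```

The same module/optimizer/loss pattern carries over directly to the multi-layer perceptron built at the end of the lesson.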
Lesson 3: Latent Space Rules Everything Around Me
Lesson 3 starts with a primer on how computer programs actually represent images as tensors of numbers. Jonathan covers the details of convolutional neural networks and the specific architectural features that enable computers “to see”. Next, you get your first taste of latent variable models by building and training a simple autoencoder to learn a compressed representation of input images. At the end of the lesson, you build your first proper generative model by adding probabilistic sampling to the autoencoder architecture, arriving at the variational autoencoder (VAE), a key building block of the generative models covered later in the course.
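As a rough sketch of the step from autoencoder to VAE, the snippet below shows the reparameterization trick in PyTorch. Layer sizes and class names are illustrative assumptions rather than the course's exact architecture:

```python
import torch
from torch import nn

class TinyVAE(nn.Module):
    """Minimal VAE sketch (illustrative sizes, not the course's model)."""
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)        # mean of q(z|x)
        self.to_logvar = nn.Linear(256, latent_dim)    # log-variance of q(z|x)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping the graph differentiable
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar
```

Training minimizes a reconstruction loss plus the KL divergence between the encoder's distribution and a standard normal prior, which is what turns the plain autoencoder into a generative model.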
Lesson 4: Demystifying Diffusion
Lesson 4 begins with a conceptual introduction to diffusion models, a key component in current state-of-the-art text-to-image systems such as Stable Diffusion. It is also your first real introduction to the Hugging Face ecosystem of open-source libraries, where you will see how to use the Diffusers library to generate images from random noise. The lesson then slowly peels back the layers of the library to deconstruct the diffusion process and show you the specifics of how a diffusion pipeline actually works. Finally, you learn how to leverage the unique affordances of a diffusion model’s iterative denoising process to interpolate between images, perform image-to-image translation, and even restore and enhance images.
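A minimal sketch of the Diffusers workflow described here might look like the following. The checkpoint name is an assumption (a common unconditional face model), not necessarily the one used in the lesson:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained unconditional diffusion model (checkpoint is illustrative)
pipe = DiffusionPipeline.from_pretrained("google/ddpm-celebahq-256")
pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Run the reverse (denoising) process: pure Gaussian noise becomes an image
result = pipe(num_inference_steps=50)
result.images[0].save("generated_face.png")
```

Each call runs the iterative denoising loop for the requested number of steps; the lesson deconstructs exactly what happens inside that loop.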
Lesson 5: Generating and Encoding Text with Transformers
Just as Lesson 4 was all about images, Lesson 5 is all about text. It starts with a conceptual introduction to the natural language processing pipeline and to probabilistic models of language. You then learn how to convert text into a representation more readily understood by generative models, and explore the broader utility of representing words as vectors. The lesson ends with a treatment of the transformer architecture, where you will see how to use the Hugging Face Transformers library to perform inference with pre-trained large language models (LLMs) and generate text from scratch.
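For orientation, here is a minimal sketch of text generation with a Transformers pipeline. The prompt and decoding parameters are illustrative assumptions:

```python
from transformers import pipeline

# Text-generation pipeline backed by GPT-2 (setup is illustrative)
generator = pipeline("text-generation", model="gpt2")

output = generator(
    "Deep generative models are",
    max_new_tokens=40,     # decoding parameters are illustrative
    do_sample=True,
    top_p=0.9,
)
print(output[0]["generated_text"])
```

Parameters such as `do_sample` and `top_p` correspond to the decoding strategies covered in the lesson.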
Lesson 6: Connecting Text and Images
Lesson 6 starts off with a conceptual introduction to multimodal models and their requisite components. You see how contrastive language-image pre-training (CLIP) jointly learns a shared representation of images and text, and how that shared latent space can be used to build a semantic image search engine. The lesson ends with a conceptual overview of latent diffusion models, before deconstructing a Stable Diffusion pipeline to see precisely how text-to-image systems turn a user-supplied prompt into a never-before-seen image.
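A minimal sketch of scoring images against a text query with CLIP, assuming a handful of local image files (filenames are hypothetical):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed a text query and a set of images into the shared latent space
images = [Image.open(p) for p in ["cat.jpg", "beach.jpg", "city.jpg"]]  # hypothetical files
inputs = processor(text=["a photo of a cat"], images=images,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Higher score = closer match between the query and each image
scores = outputs.logits_per_text.softmax(dim=-1)
print(scores)
```

A search engine built on this idea precomputes the image embeddings once and ranks them by similarity against each incoming text query.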
Lesson 7: Post-Training Procedures for Diffusion Models
Lesson 7 is all about adapting and augmenting existing pre-trained multimodal models. It starts with the more mundane, but exceptionally important, task of evaluating generative models before moving on to methods and techniques for parameter-efficient fine-tuning. You then learn how to teach a pre-trained text-to-image model such as Stable Diffusion about new styles, subjects, and conditionings. The lesson finishes with techniques that make diffusion much more efficient, approaching near real-time image generation.
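As a rough sketch of the end result of LoRA fine-tuning, the snippet below loads a trained LoRA adapter into a Stable Diffusion pipeline for style-specific generation. The adapter path and prompt are hypothetical; the adapter itself would come from a prior fine-tuning run:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach LoRA weights produced by a fine-tuning run (path is hypothetical)
pipe.load_lora_weights("./my-style-lora")

image = pipe(
    "a lighthouse at dusk in the fine-tuned style",  # illustrative prompt
    num_inference_steps=30,
).images[0]
image.save("styled.png")
```

Because only the small LoRA matrices are swapped in, the same base checkpoint can serve many different styles or subjects.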
Table of Contents
Introduction
Programming Generative AI: Introduction
Lesson 1: The What, Why, and How of Generative AI
Topics
1.1 Generative AI in the Wild
1.2 Defining Generative AI
1.3 Multitudes of Media
1.4 How Machines Create
1.5 Formalizing Generative Models
1.6 Generative versus Discriminative Models
1.7 The Generative Modeling Trilemma
1.8 Introduction to Google Colab
Lesson 2: PyTorch for the Impatient
Topics
2.1 What Is PyTorch?
2.2 The PyTorch Layer Cake
2.3 The Deep Learning Software Trilemma
2.4 What Are Tensors, Really?
2.5 Tensors in PyTorch
2.6 Introduction to Computational Graphs
2.7 Backpropagation Is Just the Chain Rule
2.8 Effortless Backpropagation with torch.autograd
2.9 PyTorch’s Device Abstraction (i.e., GPUs)
2.10 Working with Devices
2.11 Components of a Learning Algorithm
2.12 Introduction to Gradient Descent
2.13 Getting to Stochastic Gradient Descent (SGD)
2.14 Comparing Gradient Descent and SGD
2.15 Linear Regression with PyTorch
2.16 Perceptrons and Neurons
2.17 Layers and Activations with torch.nn
2.18 Multi-layer Feedforward Neural Networks (MLP)
Lesson 3: Latent Space Rules Everything Around Me
Topics
3.1 Representing Images as Tensors
3.2 Desiderata for Computer Vision
3.3 Features of Convolutional Neural Networks
3.4 Working with Images in Python
3.5 The FashionMNIST Dataset
3.6 Convolutional Neural Networks in PyTorch
3.7 Components of a Latent Variable Model (LVM)
3.8 The Humble Autoencoder
3.9 Defining an Autoencoder with PyTorch
3.10 Setting up a Training Loop
3.11 Inference with an Autoencoder
3.12 Look Ma, No Features!
3.13 Adding Probability to Autoencoders (VAE)
3.14 Variational Inference: Not Just for Autoencoders
3.15 Transforming an Autoencoder into a VAE
3.16 Training a VAE with PyTorch
3.17 Exploring Latent Space
3.18 Latent Space Interpolation and Attribute Vectors
Lesson 4: Demystifying Diffusion
Topics
4.1 Generation as a Reversible Process
4.2 Sampling as Iterative Denoising
4.3 Diffusers and the Hugging Face Ecosystem
4.4 Generating Images with Diffusers Pipelines
4.5 Deconstructing the Diffusion Process
4.6 Forward Process as Encoder
4.7 Reverse Process as Decoder
4.8 Interpolating Diffusion Models
4.9 Image-to-Image Translation with SDEdit
4.10 Image Restoration and Enhancement
Lesson 5: Generating and Encoding Text with Transformers
Topics
5.1 The Natural Language Processing Pipeline
5.2 Generative Models of Language
5.3 Generating Text with Transformers Pipelines
5.4 Deconstructing Transformers Pipelines
5.5 Decoding Strategies
5.6 Transformers Are Just Latent Variable Models for Sequences
5.7 Visualizing and Understanding Attention
5.8 Turning Words into Vectors
5.9 The Vector Space Model
5.10 Embedding Sequences with Transformers
5.11 Computing the Similarity Between Embeddings
5.12 Semantic Search with Embeddings
5.13 Contrastive Embeddings with Sentence Transformers
Lesson 6: Connecting Text and Images
Topics
6.1 Components of a Multimodal Model
6.2 Vision-Language Understanding
6.3 Contrastive Language-Image Pretraining
6.4 Embedding Text and Images with CLIP
6.5 Zero-Shot Image Classification with CLIP
6.6 Semantic Image Search with CLIP
6.7 Conditional Generative Models
6.8 Introduction to Latent Diffusion Models
6.9 The Latent Diffusion Model Architecture
6.10 Failure Modes and Additional Tools
6.11 Stable Diffusion Deconstructed
6.12 Writing Our Own Stable Diffusion Pipeline
6.13 Decoding Images from the Stable Diffusion Latent Space
6.14 Improving Generation with Guidance
6.15 Playing with Prompts
Lesson 7: Post-Training Procedures for Diffusion Models
Topics
7.1 Methods and Metrics for Evaluating Generative AI
7.2 Manual Evaluation of Stable Diffusion with DrawBench
7.3 Quantitative Evaluation of Diffusion Models with Human Preference Predictors
7.4 Overview of Methods for Fine-Tuning Diffusion Models
7.5 Sourcing and Preparing Image Datasets for Fine-Tuning
7.6 Generating Automatic Captions with BLIP-2
7.7 Parameter Efficient Fine-Tuning with LoRA
7.8 Inspecting the Results of Fine-Tuning
7.9 Inference with LoRAs for Style-Specific Generation
7.10 Conceptual Overview of Textual Inversion
7.11 Subject-Specific Personalization with Dreambooth
7.12 Dreambooth versus LoRA Fine-Tuning
7.13 Dreambooth Fine-Tuning with Hugging Face
7.14 Inference with Dreambooth to Create Personalized AI Avatars
7.15 Adding Conditional Control to Text-to-Image Diffusion Models
7.16 Creating Edge and Depth Maps for Conditioning
7.17 Depth and Edge-Guided Stable Diffusion with ControlNet
7.18 Understanding and Experimenting with ControlNet Parameters
7.19 Generative Text Effects with Font Depth Maps
7.20 Few-Step Generation with Adversarial Diffusion Distillation (ADD)
7.21 Reasons to Distill
7.22 Comparing SDXL and SDXL Turbo
7.23 Text-Guided Image-to-Image Translation
7.24 Video-Driven Frame-by-Frame Generation with SDXL Turbo
7.25 Near Real-Time Inference with PyTorch Performance Optimizations
Summary
Programming Generative AI: Summary