Detailed Syllabus and Lectures


Lecture 13: Multimodal Pretraining (slides)

feature representations for vision and language, model architectures, pre-training tasks, downstream tasks, what's next
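
To make the idea of a vision-and-language pre-training task concrete, below is a minimal PyTorch sketch of a CLIP-style image-text contrastive objective; the encoder outputs, batch size, and temperature are illustrative assumptions, not taken from the lecture materials.

    # Minimal sketch of a CLIP-style image-text contrastive pre-training objective.
    # The embeddings stand in for the outputs of any vision / text encoder pair.
    import torch
    import torch.nn.functional as F

    def contrastive_loss(image_emb, text_emb, temperature=0.07):
        # Normalize so that the dot product is a cosine similarity.
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
        targets = torch.arange(image_emb.size(0))         # matching pairs lie on the diagonal
        # Symmetric cross-entropy over image->text and text->image directions.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

    # Example with random features standing in for encoder outputs.
    loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))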

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 12: Pretraining Language Models (slides)

RNN-based language models, contextualized word embeddings, scaling up generative pretraining models (GPT-1, GPT-2, GPT-3), masked language modeling and BERT-based models
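
A minimal sketch of the masked language modeling objective behind BERT-style pretraining is given below; the `model` interface, mask rate, and mask handling are simplified assumptions (e.g. BERT's 80/10/10 corruption scheme is omitted).

    # Minimal sketch of masked language modeling: randomly mask a fraction of
    # tokens and train the model to recover them from the corrupted input.
    import torch
    import torch.nn.functional as F

    def mlm_loss(model, token_ids, mask_token_id, mask_prob=0.15):
        labels = token_ids.clone()
        mask = torch.rand(token_ids.shape) < mask_prob   # positions to mask
        labels[~mask] = -100                             # ignore unmasked positions in the loss
        corrupted = token_ids.clone()
        corrupted[mask] = mask_token_id                  # replace chosen tokens with [MASK]
        logits = model(corrupted)                        # assumed shape: (batch, seq_len, vocab_size)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               labels.reshape(-1), ignore_index=-100)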

Please study the following material in preparation for the class:

Required Reading (more stars denote higher priority):

Suggested Video Material:


Additional Resources:


Lecture 11: Self-Supervised Learning (slides)

denoising autoencoder, in-painting, colorization, split-brain autoencoder; proxy tasks in computer vision: relative patch prediction, jigsaw puzzles, rotations; contrastive learning: word2vec, contrastive predictive coding, instance discrimination, current instance discrimination models
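
A minimal sketch of an instance-discrimination contrastive loss (InfoNCE) follows: two augmented views of the same image form a positive pair and the rest of the batch serves as negatives. Feature dimensions and the temperature are illustrative assumptions.

    # Minimal sketch of an InfoNCE-style instance discrimination loss.
    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, temperature=0.1):
        # z1, z2: embeddings of two augmented views of the same batch of images.
        z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
        logits = z1 @ z2.t() / temperature     # similarity of every view-1 to every view-2
        targets = torch.arange(z1.size(0))     # the positive pair sits on the diagonal
        return F.cross_entropy(logits, targets)

    loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))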

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 10: Strengths and Weaknesses of Current Models (slides)

a critique of autoregressive models, flow-based models, latent variable models, implicit models, and diffusion models

Please study the following material in preparation for the class:

Suggested Video Material:


Additional Resources:


Lecture 9: Diffusion Models (slides)

denoising diffusion models, latent diffusion models, classifier-free guidance, video diffusion models, diffusion GANs
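
A minimal sketch of the denoising diffusion (DDPM) training step follows: sample a timestep, corrupt the data with the forward noising process, and regress the injected noise. The linear beta schedule, number of steps, and `eps_model` interface are illustrative assumptions.

    # Minimal sketch of the DDPM training objective.
    import torch
    import torch.nn.functional as F

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

    def ddpm_loss(eps_model, x0):
        t = torch.randint(0, T, (x0.size(0),))                    # random timestep per example
        a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast to x0's shape
        eps = torch.randn_like(x0)
        x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps        # forward (noising) process
        return F.mse_loss(eps_model(x_t, t), eps)                 # predict the injected noise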

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 8: Generative Adversarial Networks (slides)

implicit models, generative adversarial networks (GANs), evaluation metrics, theory behind GANs, GAN architectures, conditional GANs, cycle-consistent adversarial networks, representation learning in GANs, applications
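
A minimal sketch of one GAN training iteration with the non-saturating generator loss is given below; `G`, `D`, the latent dimension, and the optimizers are placeholders for whatever architecture is used, and `D` is assumed to output one logit per example.

    # Minimal sketch of alternating discriminator / generator updates.
    import torch
    import torch.nn.functional as F

    def gan_step(G, D, real, opt_g, opt_d, latent_dim=128):
        # Discriminator update: push real examples toward 1, generated ones toward 0.
        z = torch.randn(real.size(0), latent_dim)
        fake = G(z).detach()
        d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones(real.size(0), 1)) +
                  F.binary_cross_entropy_with_logits(D(fake), torch.zeros(real.size(0), 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()
        # Generator update: fool the discriminator (non-saturating loss).
        z = torch.randn(real.size(0), latent_dim)
        g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(real.size(0), 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()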

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 7: Variational Autoencoders (slides)

latent variable models, variational autoencoders, importance weighted autoencoders, variational lower bound/evidence lower bound, likelihood ratio gradients vs. reparameterization trick gradients, Beta-VAE, variational dequantization
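
A minimal sketch of the negative evidence lower bound with reparameterization-trick gradients follows, assuming a Gaussian encoder and a Bernoulli (logit-output) decoder; the encoder/decoder interfaces are illustrative.

    # Minimal sketch of the VAE objective: reconstruction term plus KL to a
    # standard normal prior, with reparameterized sampling of the latent.
    import torch
    import torch.nn.functional as F

    def negative_elbo(encoder, decoder, x):
        mu, log_var = encoder(x)                                 # q(z|x) = N(mu, diag(exp(log_var)))
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()    # reparameterization: z = mu + sigma * eps
        recon = F.binary_cross_entropy_with_logits(decoder(z), x, reduction='sum')
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        return (recon + kl) / x.size(0)                          # negative ELBO, averaged over the batch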

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 6: Normalizing Flow Models (slides)

1-D flows, change of variables, autoregressive flows, inverse autoregressive flows, affine flows, RealNVP, Glow, Flow++, FFJORD, multi-scale flows, dequantization
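
A minimal sketch of a single RealNVP-style affine coupling layer and the log-determinant term it contributes to the change-of-variables objective is given below; the network sizes and the even split of dimensions are illustrative assumptions.

    # Minimal sketch of an affine coupling layer (assumes an even feature dimension).
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        def __init__(self, dim):
            super().__init__()
            # Small network predicting a scale and shift for the second half.
            self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                     nn.Linear(64, dim))

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=-1)            # keep x1 unchanged, transform x2
            s, t = self.net(x1).chunk(2, dim=-1)
            z2 = x2 * s.exp() + t
            log_det = s.sum(dim=-1)                # log |det Jacobian| of the transformation
            return torch.cat([x1, z2], dim=-1), log_det

    # Change of variables: log p(x) = log p(z) + log |det dz/dx| under the base distribution.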

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 5: Autoregressive Models (slides)

histograms as simple generative models, parameterized distributions and maximum likelihood, Bayes’ nets, MADE, causal masked neural models, RNN-based autoregressive models, masking-based autoregressive models
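
A minimal sketch of masking-based autoregressive training follows: a causal mask restricts each position to its predecessors, so the network parameterizes p(x) as a product of conditionals and is trained by maximum likelihood on next-token prediction. The `model` interface is an assumption, and it is assumed to apply the mask inside its layers.

    # Minimal sketch of causal masking and the autoregressive negative log-likelihood.
    import torch
    import torch.nn.functional as F

    def causal_mask(seq_len):
        # Entry (i, j) is -inf when j > i, so position i cannot attend to the future.
        return torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)

    def nll(model, tokens):
        logits = model(tokens[:, :-1])                         # predict each next token
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tokens[:, 1:].reshape(-1))      # maximum likelihood training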

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 4: Neural Building Blocks III: Attention and Transformers (slides)

content-based attention, location-based attention, soft vs. hard attention, self-attention, attention for image captioning, transformer networks
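
A minimal sketch of single-head scaled dot-product self-attention, the core operation of the transformer, is given below; the projection matrices and shapes are illustrative.

    # Minimal sketch of scaled dot-product self-attention.
    import math
    import torch
    import torch.nn.functional as F

    def self_attention(x, w_q, w_k, w_v):
        # x: (batch, seq_len, d_model); w_q / w_k / w_v: (d_model, d_k) projections.
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # query-key similarities
        weights = F.softmax(scores, dim=-1)                        # soft attention over positions
        return weights @ v                                         # weighted sum of values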

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 3: Neural Building Blocks II: Sequential Processing with Recurrent Neural Networks (slides)

sequence modeling, recurrent neural networks (RNNs), RNN applications, vanilla RNN, training RNNs, long short-term memory (LSTM), LSTM variants, gated recurrent unit (GRU)
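
A minimal sketch of a vanilla RNN unrolled over a sequence follows, implementing the standard recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b); tensor shapes and parameter names are illustrative.

    # Minimal sketch of a vanilla RNN forward pass over a sequence.
    import torch

    def rnn_forward(x_seq, w_xh, w_hh, b_h):
        # x_seq: (seq_len, batch, input_dim); returns the hidden state at every step.
        h = torch.zeros(x_seq.size(1), w_hh.size(0))
        hs = []
        for x_t in x_seq:
            # Recurrence: new state depends on the current input and the previous state.
            h = torch.tanh(x_t @ w_xh + h @ w_hh + b_h)
            hs.append(h)
        return torch.stack(hs)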

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:


Lecture 2: Neural Building Blocks I: Spatial Processing with CNNs (slides)

deep learning, computation in a neural net, optimization, backpropagation, convolutional neural networks, residual connections, training tricks
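
A minimal sketch of a convolutional residual block (convolution, batch normalization, ReLU, plus a skip connection), the kind of building block that combines convolutions with residual connections, is given below; the channel counts are illustrative.

    # Minimal sketch of a residual block that preserves spatial shape.
    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1, self.bn2 = nn.BatchNorm2d(channels), nn.BatchNorm2d(channels)

        def forward(self, x):
            out = torch.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return torch.relu(out + x)   # residual connection eases optimization of deep nets

    block = ResidualBlock(64)
    y = block(torch.randn(1, 64, 32, 32))   # shape preserved: (1, 64, 32, 32)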

Please study the following material in preparation for the class:

Key Readings:

Suggested Video Material:


Additional Resources:



Lecture 1: Introduction to the Course (slides)

course information, unsupervised learning

Please study the following material in preparation for the class:

Key Readings: