Detailed Syllabus and Lectures
Lecture 11: Flow Matching (slides)
continuous normalizing flow, flow matching
Please study the following material in preparation for the class:
Key Readings:
- Variational Inference with Normalizing Flows, Danilo Jimenez Rezende, Shakir Mohamed, ICML 2015.
- Rectified Flow: A Marginal Preserving Approach to Optimal Transport, Qiang Liu, arXiv preprint arXiv:2209.14577, 2022.
- Flow Matching for Generative Modeling, Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matthew Le, ICLR 2023.
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation, Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, Qiang Liu, ICLR 2024.
- Improving the Training of Rectified Flows, Sangyun Lee, Zinan Lin, Giulia Fanti, NeurIPS 2024.
Suggested Video Material:
Additional Resources:
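To make the core idea of these readings concrete before class: along the straight-line (rectified-flow) path x_t = (1 - t)·x0 + t·x1, the conditional flow-matching regression target for the velocity field is simply x1 - x0, and sampling is ODE integration. The NumPy sketch below is our own illustration (a 1-D Gaussian pair for which the optimal marginal velocity happens to have a closed form), not code from any of the papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: transport N(0, 1) to N(3, 0.25) along the straight-line
# (rectified-flow) path x_t = (1 - t) * x0 + t * x1, independent coupling.
mu, sig2 = 3.0, 0.25

# Per-sample conditional flow-matching target: d x_t / dt = x1 - x0.
x0 = rng.standard_normal(10_000)
x1 = mu + np.sqrt(sig2) * rng.standard_normal(10_000)
cfm_target = x1 - x0   # what a network v_theta(x_t, t) would regress onto

def marginal_velocity(x, t):
    """Closed-form E[x1 - x0 | x_t = x] for this Gaussian pair -- the field
    that the flow-matching regression recovers at its optimum."""
    s2 = (1.0 - t) ** 2 + sig2 * t ** 2        # Var(x_t)
    coef = (sig2 * t - (1.0 - t)) / s2
    return mu + coef * (x - mu * t)

# Sampling = Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.
x = rng.standard_normal(10_000)
n_steps = 200
for step in range(n_steps):
    t = step / n_steps
    x = x + (1.0 / n_steps) * marginal_velocity(x, t)

print(round(x.mean(), 2), round(x.std(), 2))  # lands near (3.0, 0.5)
```

In the papers, a neural network replaces the closed form and is trained by regressing cfm_target at randomly drawn (x_t, t) pairs.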
Lecture 10: Diffusion Models (slides)
denoising diffusion models, latent diffusion models, classifier-free guidance, video diffusion models, diffusion GANs
Please study the following material in preparation for the class:
Key Readings:
- Denoising Diffusion Probabilistic Models, Jonathan Ho, Ajay Jain, Pieter Abbeel, NeurIPS 2020.
- Diffusion Models Beat GANs on Image Synthesis, Prafulla Dhariwal, Alex Nichol, NeurIPS 2021.
- Classifier-free diffusion guidance, Jonathan Ho and Tim Salimans, NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications.
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen, arXiv preprint arXiv:2112.10741, 2021.
- Zero-Shot Text-to-Image Generation, Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever, ICML 2021
- Hierarchical Text-Conditional Image Generation with CLIP Latents, Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen, arXiv:2204.06125, 2022.
- Improving Image Generation with Better Captions, James Betker et al., OpenAI Technical Report, 2023.
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Chitwan Saharia et al., NeurIPS 2022.
- Denoising Diffusion Implicit Models, Jiaming Song, Chenlin Meng, Stefano Ermon, ICLR 2021.
- Progressive Distillation for Fast Sampling of Diffusion Models, Tim Salimans and Jonathan Ho, ICLR 2022.
- Common Diffusion Noise Schedules and Sample Steps are Flawed, Shanchuan Lin, Bingchen Liu, Jiashi Li, Xiao Yang, WACV 2024.
- High-Resolution Image Synthesis with Latent Diffusion Models, Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer, CVPR 2022.
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets, Andreas Blattmann et al., arXiv:2311.15127, 2023.
- Cascaded diffusion models for high fidelity image generation, Jonathan Ho et al., JMLR 23, 2022.
- Imagen Video: High Definition Video Generation with Diffusion Models, Jonathan Ho et al., arXiv:2210.02303, 2022.
- Scalable Diffusion Models with Transformers, William Peebles, Saining Xie, ICCV 2023.
- Photorealistic Video Generation with Diffusion Models, Agrim Gupta et al., arXiv:2312.06662, 2023.
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations, Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon, ICLR 2022.
- Prompt-to-prompt image editing with cross attention control, Amir Hertz, et al., ICLR 2023.
- Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation, Nataniel Ruiz et al., CVPR 2023.
- Adding conditional control to text-to-image diffusion models, Lvmin Zhang, Anyi Rao, and Maneesh Agrawala, ICCV 2023.
- Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation, Jay Zhangjie Wu, et al., ICCV 2023.
- Dreamix: Video Diffusion Models are General Video Editors, Eyal Molad et al., arXiv:2302.01329, 2023.
- DreamFusion: Text-to-3D using 2D Diffusion, Ben Poole et al., ICLR 2023.
- Probabilistic Adaptation of Text-to-Video Models, Sherry Yang et al., ICLR 2023.
- Tackling the Generative Learning Trilemma with Denoising Diffusion GANs, Zhisheng Xiao, Karsten Kreis, Arash Vahdat, ICLR 2022.
- MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices, Yang Zhao et al., arXiv preprint arXiv:2311.16567, 2023.
- Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers, Katherine Crowson et al., arXiv preprint arXiv:2401.11605, 2024.
Suggested Video Material:
Additional Resources:
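A small illustration of the mechanics behind the DDPM readings (our own NumPy sketch, not from any of the papers): the step-by-step forward noising process q(x_t | x_{t-1}) has a closed-form marginal q(x_t | x_0), which is what lets training sample a random timestep directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "data" with a visible mean, so the noising is easy to track.
x0 = 5.0 + rng.standard_normal(50_000)

# Linear beta schedule (the DDPM choice, shortened to T = 100 for the demo).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Forward process applied step by step:
# q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t)
x = x0.copy()
for t in range(T):
    x = np.sqrt(1.0 - betas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)

# Closed form for q(x_t | x_0), used to draw a random t directly at training:
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
eps = rng.standard_normal(x0.shape)
x_closed = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1.0 - alpha_bar[-1]) * eps

# Both routes give the same marginal: mean shrunk by sqrt(alpha_bar_T),
# variance driven to 1. The simple DDPM objective then regresses eps:
# L_simple = E || eps - eps_theta(x_t, t) ||^2
print(round(x.mean(), 2), round(x_closed.mean(), 2))
```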
Lecture 9: Energy and Score Based Models (slides)
energy based models, score based models
Please study the following material in preparation for the class:
Key Readings:
- Learning deep energy models, Jiquan Ngiam, Zhenghao Chen, Pang W Koh, and Andrew Y Ng, ICML 2011.
- Implicit generation and modeling with energy based models, Yilun Du, Igor Mordatch, NeurIPS 2019.
- Denoising Diffusion Probabilistic Models, Jonathan Ho, Ajay Jain, Pieter Abbeel, NeurIPS 2020.
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics, Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli, ICML 2015.
- Generative Modeling by Estimating Gradients of the Data Distribution, Yang Song, Stefano Ermon, NeurIPS 2019.
Suggested Video Material:
Additional Resources:
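A tiny preview of why the score function matters for these readings: once the score ∇_x log p(x) is known (score matching estimates it from samples, without the normalizing constant), Langevin dynamics can sample from p. A purely illustrative NumPy sketch with a 1-D Gaussian, whose score is known analytically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Score of a 1-D Gaussian N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2.
mu, sigma = 2.0, 0.5
score = lambda x: -(x - mu) / sigma**2

# Unadjusted Langevin dynamics: x <- x + (eta / 2) * score(x) + sqrt(eta) * z.
# With the true score, the chain's stationary distribution is (close to) p.
eta = 2e-3
x = rng.standard_normal(10_000)        # 10k parallel chains, arbitrary start
for _ in range(3_000):
    z = rng.standard_normal(x.shape)
    x = x + 0.5 * eta * score(x) + np.sqrt(eta) * z

print(round(x.mean(), 2), round(x.std(), 2))   # approaches (2.0, 0.5)
```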
Lecture 8: Generative Adversarial Networks (slides)
implicit models, generative adversarial networks (GANs), evaluation metrics, theory behind GANs, GAN architectures, conditional GANs, cycle-consistent adversarial networks, representation learning in GANs, applications
Please study the following material in preparation for the class:
Key Readings:
- Section 20.10.4 of the Deep Learning textbook.
- Generative Adversarial Networks, Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, NIPS 2014.
- Unrolled Generative Adversarial Networks, Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein, ICLR 2017.
- A note on the evaluation of generative models, Lucas Theis, Aäron van den Oord, Matthias Bethge, ICLR 2016.
- On the Robustness of Quality Measures for GANs, Motasem Alfarra, Juan C. Pérez, Anna Frühstück, Philip H. S. Torr, Peter Wonka, Bernard Ghanem, arXiv preprint arXiv:2201.13019, 2022.
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Alec Radford, Luke Metz, Soumith Chintala, ICLR 2016.
- Improved Techniques for Training GANs, Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, Xi Chen, NIPS 2016.
- Projected GANs Converge Faster, Axel Sauer, Kashyap Chitta, Jens Müller, Andreas Geiger, NeurIPS 2021.
- Wasserstein Generative Adversarial Networks, Martin Arjovsky, Soumith Chintala, Léon Bottou, ICML 2017.
- Improved Training of Wasserstein GANs, Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron C. Courville, NIPS 2017.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation, Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen, ICLR 2018.
- Spectral Normalization for Generative Adversarial Networks, Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida, ICLR 2018.
- Self-Attention Generative Adversarial Networks, Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena, ICML 2019.
- Large Scale GAN Training for High Fidelity Natural Image Synthesis, Andrew Brock, Jeff Donahue, Karen Simonyan, ICLR 2019.
- A Style-Based Generator Architecture for Generative Adversarial Networks, Tero Karras, Samuli Laine, Timo Aila, CVPR 2019.
- Analyzing and Improving the Image Quality of StyleGAN, Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila, CVPR 2020.
- Alias-Free Generative Adversarial Networks, Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila, NeurIPS 2021.
- StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets, Axel Sauer, Katja Schwarz, Andreas Geiger, arXiv preprint arXiv:2202.00273, 2022.
- Self-Distilled StyleGAN: Towards Generation from Internet Photos, Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani and Inbar Mosseri, arXiv preprint arXiv:2202.12211, 2022.
- Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow, Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine, ICLR 2019.
- Image-to-Image Translation with Conditional Adversarial Networks, Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros, CVPR 2017
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros, ICCV 2017
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel, NIPS 2016.
- Adversarially Learned Inference, Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville, ICLR 2017.
- Large Scale Adversarial Representation Learning, Jeff Donahue, Karen Simonyan, NeurIPS 2019.
Suggested Video Material:
Additional Resources:
- GAN Lab, Minsuk Kahng, Nikhil Thorat, Polo Chau, Fernanda Viégas, and Martin Wattenberg, 2019.
- [Blog post] A Gentle Introduction to BigGAN the Big Generative Adversarial Network, Jason Brownlee
- [Blog post] GANs and Divergence Minimization, Colin Raffel.
- [Blog post] From GAN to WGAN, Lilian Weng
- [Blog post] An Alternative Update Rule for Generative Adversarial Networks, Ferenc Huszár
- Open Questions about Generative Adversarial Networks, Distill, 2019.
- Generating Videos with Scene Dynamics, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba, NIPS 2016.
- Adversarial Video Generation on Complex Datasets, Aidan Clark, Jeff Donahue, Karen Simonyan, arXiv preprint arXiv:1907.06571, 2019.
- Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum, NIPS 2016.
- HoloGAN: Unsupervised Learning of 3D Representations From Natural Images, Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, Yong-Liang Yang, ICCV 2019.
- Video-to-Video Synthesis, Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro, NeurIPS 2018.
- Everybody Dance Now, Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros, ICCV 2019.
- StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas, ICCV 2017.
- Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, CVPR 2017.
- Context Encoders: Feature Learning by Inpainting, Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, CVPR 2016.
- Domain Separation Networks, Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, Dumitru Erhan, NIPS 2016.
- Semantic Image Synthesis with Spatially-Adaptive Normalization, Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu, CVPR 2019.
- Manipulating Attributes of Natural Scenes via Hallucination, Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem, ACM Transactions on Graphics, November 2019, Article No: 7.
- Image Synthesis in Multi-Contrast MRI with Conditional Generative Adversarial Networks, Salman Ul Hassan Dar, Mahmut Yurt, Levent Karacan, Aykut Erdem, Erkut Erdem, Tolga Çukur, IEEE Trans. Med. Imag., Vol. 38, Issue 10, pp. 2375-2388, October 2019.
- Adversarial Audio Synthesis, Chris Donahue, Julian McAuley, Miller Puckette, ICLR 2019.
- MaskGAN: Better Text Generation via Filling in the _______ , William Fedus, Ian Goodfellow, Andrew M. Dai, ICLR 2018.
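To fix the key loss functions from the GAN readings in mind, here is a short NumPy illustration (our own sketch; the logits are made-up stand-ins for discriminator outputs, not from any paper's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Discriminator logits on a batch of real and generated samples -- stand-ins
# for D(x) and D(G(z)); any actual model would produce values like these.
d_real = np.array([2.0, 1.5, 3.0])    # D is fairly sure these are real
d_fake = np.array([-1.0, 0.5, -2.0])  # D mostly rejects these

# Discriminator loss: the binary cross-entropy form of the minimax objective
# -E[log D(x)] - E[log(1 - D(G(z)))]
d_loss = -np.mean(np.log(sigmoid(d_real))) - np.mean(np.log(1.0 - sigmoid(d_fake)))

# The "saturating" generator loss E[log(1 - D(G(z)))] has vanishing gradients
# once D confidently rejects fakes; the non-saturating alternative
# -E[log D(G(z))], suggested in the original GAN paper, keeps them alive.
g_loss_saturating = np.mean(np.log(1.0 - sigmoid(d_fake)))
g_loss_nonsat = -np.mean(np.log(sigmoid(d_fake)))

print(round(d_loss, 3), round(g_loss_nonsat, 3))
```

Wasserstein GANs (Arjovsky et al., above) replace this cross-entropy objective with a critic trained under a Lipschitz constraint.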
Lecture 7: Latent Variable Models (slides)
latent variable models, variational autoencoders, importance weighted autoencoders, variational lower bound/evidence lower bound, likelihood ratio gradients vs. reparameterization trick gradients, Beta-VAE, variational dequantization
Please study the following material in preparation for the class:
Key Readings:
- Section 20.10.3 of the Deep Learning textbook.
- Chapter 2 of An Introduction to Variational Autoencoders, Kingma and Welling.
- Importance Weighted Autoencoders, Yuri Burda, Roger B. Grosse, Ruslan Salakhutdinov, ICLR 2016.
- Auto-Encoding Variational Bayes, Diederik P. Kingma, Max Welling, ICLR 2014.
- Neural Discrete Representation Learning, Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu, NIPS 2017.
- Generating Diverse High-Fidelity Images with VQ-VAE-2, Ali Razavi, Aäron van den Oord, Oriol Vinyals, NeurIPS 2019.
- beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner, ICLR 2017.
Suggested Video Material:
Additional Resources:
- Variational Inference lecture notes by David Blei.
- [Blog post] How I learned to stop worrying and write ELBO (and its gradients) in a billion ways, Yuge Shi.
- [Blog post] Intuitively Understanding Variational Autoencoders, Irhum Shafkat.
- [Blog post] A Beginner's Guide to Variational Methods: Mean-Field Approximation, Eric Jang.
- [Blog post] Tutorial - What is a variational autoencoder?, Jaan Altosaar
- [Blog post] MusicVAE: Creating a palette for musical scores with machine learning, Adam Roberts, Jesse Engel, Colin Raffel, Ian Simon, Curtis Hawthorne
- Discrete VAE’s, John Thickstun.
- Jukebox: A Generative Model for Music, Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever, arXiv preprint arXiv:2005.00341, 2020.
- Improved Variational Inference with Inverse Autoregressive Flow, Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling, NIPS 2016.
- PixelVAE: A Latent Variable Model for Natural Images, Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga, Francesco Visin, David Vazquez, Aaron Courville, ICLR 2017.
- Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design, Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, Pieter Abbeel, ICML 2019.
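Two pieces of the VAE readings that are easy to verify numerically: the closed-form KL term of the ELBO for a Gaussian posterior, and the reparameterization trick. An illustrative NumPy sketch (names are ours, not from the papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder output for one datapoint: q(z|x) = N(mu, sigma^2).
mu, log_var = 0.8, -0.4
sigma = np.exp(0.5 * log_var)

# Closed-form KL(q(z|x) || N(0, 1)) -- the regularizer term of the ELBO:
kl_closed = 0.5 * (mu**2 + np.exp(log_var) - log_var - 1.0)

# Monte Carlo estimate via the reparameterization trick z = mu + sigma * eps,
# the substitution that makes the ELBO differentiable w.r.t. (mu, log_var):
eps = rng.standard_normal(200_000)
z = mu + sigma * eps
log_q = -0.5 * (np.log(2 * np.pi) + log_var + eps**2)   # log N(z; mu, sigma^2)
log_p = -0.5 * (np.log(2 * np.pi) + z**2)               # log N(z; 0, 1)
kl_mc = np.mean(log_q - log_p)

print(round(kl_closed, 3), round(kl_mc, 3))   # the two agree
```

Beta-VAE (Higgins et al., above) simply multiplies this KL term by a weight beta > 1.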
Lecture 6: Normalizing Flow Models (slides)
1-D flows, change of variables, autoregressive flows, inverse autoregressive flows, affine flows, RealNVP, Glow, TarFlow, Flow++, FFJORD, multi-scale flows, dequantization
Please study the following material in preparation for the class:
Key Readings:
- NICE: NICE: Non-linear Independent Components Estimation, Laurent Dinh, David Krueger, and Yoshua Bengio, ICLR 2015.
- IAF: Improved variational inference with inverse autoregressive flow, Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling, NIPS 2016.
- RealNVP: Density estimation using Real NVP, Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio, ICLR 2017.
- Masked Autoregressive Flow for Density Estimation, George Papamakarios, Theo Pavlakou, Iain Murray, NIPS 2017.
- Neural autoregressive flows, Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville, ICML 2018.
- Glow: Generative Flow with Invertible 1×1 Convolutions, Diederik P. Kingma, Prafulla Dhariwal, NeurIPS 2018.
- Flow++: Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design, Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, Pieter Abbeel, ICML 2019.
- Neural Importance Sampling, Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus Gross, Jan Novák, SIGGRAPH 2019.
- FFJORD: FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models, Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud, ICLR 2019.
- Residual Flows for Invertible Generative Modeling, Ricky T. Q. Chen, Jens Behrmann, David Duvenaud, Jörn-Henrik Jacobsen, NeurIPS 2019.
- MintNet: MintNet: Building Invertible Neural Networks with Masked Convolutions, Yang Song, Chenlin Meng, Stefano Ermon, NeurIPS 2019.
- SRFlow: SRFlow: Learning the Super-Resolution Space with Normalizing Flow, Andreas Lugmayr, Martin Danelljan, Luc Van Gool, Radu Timofte, ECCV 2020.
- Continuous Language Generative Flow, Zineng Tang, Shiyue Zhang, Hyounghun Kim, Mohit Bansal, ACL 2021.
- FloWaveNet: FloWaveNet: A Generative Flow for Raw Audio, Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon, ICML 2019.
- Go with the Flows: Mixtures of Normalizing Flows for Point Cloud Generation and Reconstruction, Janis Postels, Mengya Liu, Riccardo Spezialetti, Luc Van Gool, Federico Tombari, 3DV 2021.
- TarFlow: Normalizing Flows are Capable Generative Models, Shuangfei Zhai et al., arXiv preprint arXiv:2412.06329, 2024.
Suggested Video Material:
Additional Resources:
- Normalizing Flows: An Introduction and Review of Current Methods, Ivan Kobyzev, Simon J.D. Prince, and Marcus A. Brubaker, IEEE PAMI, 2021.
- Normalizing Flows for Probabilistic Modeling and Inference, George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, Balaji Lakshminarayanan, JMLR, 2021.
- [Blog post] Glow: Better Reversible Generative Models, OpenAI
- [Blog post] Normalizing Flows Tutorial, Part 1: Distributions and Determinants, Eric Jang
- [Blog post] Normalizing Flows Tutorial, Part 2: Modern Normalizing Flows, Eric Jang
- [Blog post] Flow-based Deep Generative Models, Lilian Weng
Lecture 5: Autoregressive Models (slides)
histograms as simple generative models, parameterized distributions and maximum likelihood, Bayes’ Nets, MADE, Causal Masked Neural Models, RNN-based autoregressive models, masking-based autoregressive models
Please study the following material in preparation for the class:
Key Readings:
- Sections 2.1-2.3, 3.1-3.3 of the Deep Generative Modeling textbook.
- Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks, Y. Bengio and S. Bengio. NIPS 1999.
- char-rnn: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- MADE: MADE: Masked Autoencoder for Distribution Estimation, Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. ICML 2015.
- WaveNet: WaveNet: A Generative Model for Raw Audio, Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu. arXiv preprint arXiv:1609.03499, 2016.
- PixelCNN: Pixel Recurrent Neural Networks, Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. ICML 2016.
- Gated PixelCNN: Conditional Image Generation with PixelCNN Decoders, Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Koray Kavukcuoglu, Oriol Vinyals, Alex Graves, NIPS 2016.
- PixelCNN++: PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications, Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma, ICLR 2017.
- PixelSNAIL: PixelSNAIL: An Improved Autoregressive Generative Model, Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel. ICML 2018.
- Fast PixelCNN++: Fast Generation for Convolutional Autoregressive Models, Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, Mohammad Babaeizadeh, Shiyu Chang, Yang Zhang, Mark A. Hasegawa-Johnson, Roy H. Campbell, Thomas S. Huang. ICLR 2017 Workshop.
- Multiscale PixelCNN: Parallel Multiscale Autoregressive Density Estimation, Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Yutian Chen, Dan Belov, Nando de Freitas. ICML 2017.
- Grayscale PixelCNN: PixelCNN Models with Auxiliary Variables for Natural Image Modeling, Alexander Kolesnikov, Christoph H. Lampert. ICML 2017.
- Subscale Pixel Network: Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling, Jacob Menick, Nal Kalchbrenner. ICLR 2019.
- Scaling Autoregressive Video Models, Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit. ICLR 2020.
- Sparse Attention: Generating Long Sequences with Sparse Transformers, Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. arXiv preprint arXiv:1904.10509, 2019.
- PixelCNN Super Resolution: Pixel Recursive Super Resolution, Ryan Dahl, Mohammad Norouzi, Jonathon Shlens. ICCV 2017.
- Colorization Transformer: Colorization Transformer, Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner, ICLR 2021.
- PixelTransformer: PixelTransformer: Sample Conditioned Signal Generation, Shubham Tulsiani, Abhinav Gupta, ICML 2021.
- GPT-1: Improving Language Understanding by Generative Pre-Training, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, OpenAI Report, 2018.
- GPT-2: Language Models are Unsupervised Multitask Learners, Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, OpenAI Report, 2019.
- GPT-3: Language Models are Few-Shot Learners, Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah et al., NeurIPS 2020.
- iGPT: Generative Pretraining from Pixels, Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever, ICML 2020.
- VQ-VAE: Neural Discrete Representation Learning, Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu, NIPS 2017.
- VQ-GAN: Taming transformers for high-resolution image synthesis, Patrick Esser, Robin Rombach, Björn Ommer, CVPR 2021.
- MAGVIT-v2: Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation, Lijun Yu, José Lezama, Nitesh B. Gundavarapu et al., ICLR 2024.
- TiTok: An Image is Worth 32 Tokens for Reconstruction and Generation, Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen, NeurIPS 2024.
- VAR: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, K. Tian et al., NeurIPS 2024.
- VideoPoet: VideoPoet: A Large Language Model for Zero-Shot Video Generation, Dan Kondratyuk, Lijun Yu, Xiuye Gu et al., arXiv preprint arXiv:2312.14125, 2023.
- S4: Efficiently Modeling Long Sequences with Structured State Spaces, Albert Gu, Karan Goel, Christopher Ré, ICLR 2022.
- Linear Attention: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret, ICML 2020.
- FSQ: Finite Scalar Quantization: VQ-VAE Made Simple, Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen, ICLR 2024.
- Gumbel-Softmax: Categorical reparameterization with gumbel-softmax, Eric Jang, Shixiang Gu, Ben Poole, ICLR 2017.
- Concrete Distribution: The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh, ICLR 2017.
- Image Transformer: Image Transformer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran, ICML 2018.
- Sparse Transformer: Generating Long Sequences with Sparse Transformers, Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever, arXiv preprint arXiv:1904.10509, 2019.
- LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models, Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros, arXiv preprint arXiv:2312.00785, 2023.
Suggested Video Material:
Additional Resources:
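All of the autoregressive models above share one structural idea: the chain-rule factorization p(x1, ..., xn) = Π p(xi | x<i), trained by maximum likelihood and sampled ancestrally. Our own minimal NumPy illustration with three binary variables and explicit conditional tables:

```python
import numpy as np

# Chain rule over three binary variables:
# p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)
# PixelCNN, WaveNet, GPT, etc. parameterize exactly these conditionals with a
# neural network; here they are explicit lookup tables.
p_x1 = np.array([0.6, 0.4])                       # p(x1)
p_x2 = np.array([[0.7, 0.3],                      # p(x2 | x1=0)
                 [0.2, 0.8]])                     # p(x2 | x1=1)
p_x3 = np.array([[[0.5, 0.5], [0.9, 0.1]],        # p(x3 | x1=0, x2=*)
                 [[0.3, 0.7], [0.6, 0.4]]])       # p(x3 | x1=1, x2=*)

def log_prob(x):
    x1, x2, x3 = x
    return np.log(p_x1[x1]) + np.log(p_x2[x1, x2]) + np.log(p_x3[x1, x2, x3])

# The factorization defines a valid joint: the 8 sequence probabilities sum to 1.
total = sum(np.exp(log_prob((a, b, c)))
            for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)

# Ancestral sampling: draw x1, then x2 given x1, then x3 given (x1, x2).
rng = np.random.default_rng(0)
x1 = rng.choice(2, p=p_x1)
x2 = rng.choice(2, p=p_x2[x1])
x3 = rng.choice(2, p=p_x3[x1, x2])
```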
Lecture 4: Neural Building Blocks III: Attention and Transformers (slides)
content-based attention, location-based attention, soft vs. hard attention, self-attention, attention for image captioning, transformer networks
Please study the following material in preparation for the class:
Key Readings:
- Neural Machine Translation by Jointly Learning to Align and Translate, D. Bahdanau, K. Cho, Y. Bengio, ICLR 2015
- Section 5 of Generating Sequences with Recurrent Neural Networks, A. Graves, arXiv preprint arXiv:1308.0850, 2013.
- Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, NIPS 2017
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. ICLR 2021.
Suggested Video Material:
Additional Resources:
- Attention and Augmented Recurrent Neural Networks, Chris Olah and Shan Carter. Distill, 2016
- [Blog post] The Illustrated Transformer, Jay Alammar
- [Blog post] The Transformer Family, Lilian Weng
- Do Transformer Modifications Transfer Across Implementations and Applications?, Sharan Narang et al., arXiv preprint arXiv:2102.11972, 2021.
- Transformers in Vision: A Survey, Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, and Mubarak Shah, arXiv preprint arXiv:2101.01169, 2021
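The central operation of the transformer readings fits in a few lines. A self-contained NumPy sketch of scaled dot-product attention with an optional causal mask (our own illustration, not code from the papers):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:  # mask future positions for autoregressive decoding
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 query positions, d_k = 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))

out, w = scaled_dot_product_attention(Q, K, V, causal=True)
print(w.round(2))   # rows sum to 1; upper triangle is exactly 0
```

Multi-head attention runs this in parallel over several learned projections of Q, K, and V.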
Lecture 3: Neural Building Blocks II: Sequential Processing with Recurrent Neural Networks (slides)
sequence modeling, recurrent neural networks (RNNs), RNN applications, vanilla RNN, training RNNs, long short-term memory (LSTM), LSTM variants, gated recurrent unit (GRU)
Please study the following material in preparation for the class:
Key Readings:
Suggested Video Material:
- Efstratios Gavves and Max Welling's Lecture 8
Additional Resources:
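The vanilla RNN recurrence covered in this lecture is compact enough to write out directly. Our own illustrative NumPy sketch (weights are random stand-ins for trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Vanilla RNN cell: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
d_in, d_h = 3, 5
W_xh = 0.1 * rng.standard_normal((d_h, d_in))
W_hh = 0.1 * rng.standard_normal((d_h, d_h))
b = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

# Unrolling over a sequence reuses the same weights at every step, which is
# why backpropagation through time multiplies many Jacobians together -- the
# source of the vanishing/exploding gradients that LSTMs and GRUs mitigate.
xs = rng.standard_normal((10, d_in))
h = np.zeros(d_h)
for x_t in xs:
    h = rnn_step(x_t, h)
print(h.shape)   # (5,)
```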
Lecture 2: Neural Building Blocks I: Spatial Processing with CNNs (slides)
deep learning, computation in a neural net, optimization, backpropagation, convolutional neural networks, residual connections, training tricks
Please study the following material in preparation for the class:
Key Readings:
Suggested Video Material:
Additional Resources:
- Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review, Waseem Rawat and Zenghui Wang. Neural Computation, Vol. 29 , No. 9, 2017
- Why Momentum Really Works, Gabriel Goh, Distill, 2017.
- A guide to convolution arithmetic for deep learning, Vincent Dumoulin and Francesco Visin.
- Multi-Scale Context Aggregation by Dilated Convolutions, Fisher Yu and Vladlen Koltun. ICLR 2016
- High-Performance Large-Scale Image Recognition Without Normalization, Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan, ICML 2021.
- [Blog post] In-layer normalization techniques for training very deep neural networks, Nikolas Adaloglou
- [Blog post] Understanding Convolutions, Christopher Olah.
- [Blog post] Deconvolution and Checkerboard Artifacts, Augustus Odena, Vincent Dumoulin, Chris Olah.
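The convolution-arithmetic material in the resources above (including dilated convolutions) can be checked with a naive implementation. Our own NumPy sketch, not taken from any of the references:

```python
import numpy as np

def conv2d(x, k, stride=1, dilation=1):
    """Naive 'valid' 2-D cross-correlation (what deep-learning libraries
    call convolution), with stride and dilation."""
    kh, kw = k.shape
    eff_h = (kh - 1) * dilation + 1   # effective (dilated) kernel height
    eff_w = (kw - 1) * dilation + 1
    out_h = (x.shape[0] - eff_h) // stride + 1
    out_w = (x.shape[1] - eff_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride : i * stride + eff_h : dilation,
                      j * stride : j * stride + eff_w : dilation]
            out[i, j] = np.sum(patch * k)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3)) / 9.0              # 3x3 mean filter

print(conv2d(x, k).shape)              # (5, 5): (7 - 3) + 1
print(conv2d(x, k, dilation=2).shape)  # (3, 3): effective kernel is 5x5
```

The output-size formula out = (in - effective_kernel) // stride + 1 is exactly the "valid" case from the Dumoulin and Visin guide listed above.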
Lecture 1: Introduction to the course (slides)
course information, unsupervised learning
Please study the following material in preparation for the class:
Key Readings: