Detailed Syllabus and Lectures


Lecture 14: Pretraining for Vision and Language (slides) (video)

feature representations for vision and language, model architectures, pre-training tasks, downstream tasks, what's next

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 13: Pretraining Language Models (slides) (video 1) (video 2)

RNN-based language models, contextualized word embeddings, scaling up generative pretraining (GPT-1, GPT-2, GPT-3), masked language modeling and BERT-based models
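
The masked language modeling objective covered here can be illustrated with a tiny corruption routine. This is a sketch of the idea only, not the lecture's code: the function name, the single `[MASK]` replacement strategy, and the fixed masking probability are my own simplifications (BERT's actual recipe also sometimes keeps or randomly replaces selected tokens).

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, rng):
    """BERT-style masking sketch: replace a random subset of tokens with
    [MASK] and record the original tokens at those positions as targets."""
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok  # the model is trained to predict this token
        else:
            corrupted.append(tok)
    return corrupted, targets
```

The model then receives the corrupted sequence and is trained with cross-entropy only at the masked positions.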

Please study the following material in preparation for the class:

Required Reading (more stars denote higher priority):

Suggested Video Material:


Additional Resources:


Lecture 12: Self-Supervised Learning (slides) (video 1) (video 2)

denoising autoencoder, in-painting, colorization, split-brain autoencoder; proxy tasks in computer vision: relative patch prediction, jigsaw puzzles, rotations; contrastive learning: word2vec, contrastive predictive coding, instance discrimination, current instance discrimination models
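
The contrastive objectives in this lecture (CPC, instance discrimination) share one core loss, InfoNCE: classify the positive pair against negatives using similarity scores. A minimal pure-Python sketch, with my own function name and a dot-product similarity for simplicity (practical systems use cosine similarity of normalized embeddings):

```python
import math

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE: negative log-probability of picking the positive among
    {positive} + negatives under a softmax over similarity scores."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(query, positive) / temperature] + \
             [dot(query, n) / temperature for n in negatives]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[0] - log_z)
```

When the positive is much more similar to the query than every negative, the loss approaches zero; when it is indistinguishable from a negative, the loss approaches log of the number of candidates.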

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 11: Strengths and Weaknesses of Current Models (slides) (video 1) (video 2)

a critique of autoregressive models, flow-based models, latent variable models, and implicit models

Please study the following material in preparation for the class:

Suggested Video Material:


Additional Resources:


Lecture 10: Discrete Latent Variable Models (slides) (video 1) (video 2)

REINFORCE, Gumbel-Softmax, straight-through estimator, neural variational inference and learning, vector-quantized VAE (VQ-VAE), VQ-VAE-2, VQ-GAN, discrete flows, integer discrete flows; GANs for text: SeqGAN, MaskGAN, ScratchGAN
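
The Gumbel-Softmax trick from this lecture fits in a few lines: perturb the logits with Gumbel noise and take a temperature-controlled softmax, giving a differentiable relaxation of a categorical sample. A pure-Python sketch under my own naming (deep learning libraries ship equivalents, e.g. a `gumbel_softmax` function in PyTorch):

```python
import math, random

def gumbel_softmax(logits, tau, rng):
    """Differentiable relaxation of categorical sampling:
    softmax((logits + Gumbel noise) / tau)."""
    # Gumbel(0, 1) samples via -log(-log(U)), U ~ Uniform(0, 1)
    g = [-math.log(-math.log(rng.random())) for _ in logits]
    ys = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(ys)  # stabilize the softmax
    exps = [math.exp(y - m) for y in ys]
    s = sum(exps)
    return [e / s for e in exps]
```

As the temperature tau goes to zero the output approaches a one-hot sample; larger tau gives smoother, lower-variance but more biased relaxations.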

Please study the following material in preparation for the class (more stars denote higher priority):

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 8-9: Generative Adversarial Networks (slides) (video 1, 2, 3, 4)

implicit models, generative adversarial networks (GANs), evaluation metrics, theory behind GANs, GAN architectures, conditional GANs, cycle-consistent adversarial networks, representation learning in GANs, applications
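
The GAN training objective in this lecture reduces to two binary cross-entropy terms once the discriminator outputs probabilities. A minimal sketch with my own function names, using the non-saturating generator loss (the variant commonly trained in practice rather than the original minimax form):

```python
import math

def bce(p, label):
    # Binary cross-entropy for one predicted probability p against a 0/1 label.
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def discriminator_loss(d_real, d_fake):
    """D is pushed to output 1 on real samples and 0 on generated ones."""
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    """Non-saturating generator loss: push D's output on fakes toward 1."""
    return bce(d_fake, 1)
```

At the theoretical equilibrium the discriminator outputs 0.5 everywhere, giving a discriminator loss of 2 log 2.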

Please study the following material in preparation for the class:

Required Reading (more stars denote higher priority):

Suggested Video Material:


Additional Resources:


Lecture 7: Variational Autoencoders (slides) (video 1) (video 2)

latent variable models, variational autoencoders, importance weighted autoencoders, variational lower bound/evidence lower bound, likelihood ratio gradients vs. reparameterization trick gradients, Beta-VAE, variational dequantization
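
Two pieces of this lecture have compact closed forms: the reparameterization trick and the KL term of the ELBO for a diagonal Gaussian posterior against a standard normal prior. A scalar pure-Python sketch, with function names of my own choosing:

```python
import math, random

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    so gradients flow through mu and log_var instead of through sampling."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), the regularizer in the ELBO:
    0.5 * (sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * (math.exp(log_var) + mu * mu - 1.0 - log_var)
```

The full (negative) ELBO is this KL term plus a reconstruction loss evaluated at the sampled z; the trick is what makes that sample differentiable with respect to the encoder outputs.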

Please study the following material in preparation for the class:

Required Reading (more stars denote higher priority):

Suggested Video Material:


Additional Resources:


Lecture 6: Normalizing Flow Models (slides) (video 1) (video 2)

1-D flows, change of variables, autoregressive flows, inverse autoregressive flows, affine flows, RealNVP, Glow, Flow++, FFJORD, multi-scale flows, dequantization
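
The change-of-variables formula underlying all of these models is easiest to see in 1-D with an affine flow. A sketch under my own naming: if x = mu + s·z with z drawn from a standard normal, then log p(x) is the base density at the inverted point minus the log of the Jacobian magnitude |s|:

```python
import math

def affine_flow_logpdf(x, mu, s):
    """log p(x) for x = mu + s * z, z ~ N(0, 1), via change of variables:
    log p(x) = log N((x - mu) / s; 0, 1) - log|s|."""
    z = (x - mu) / s          # invert the flow
    log_base = -0.5 * (z * z + math.log(2 * math.pi))
    return log_base - math.log(abs(s))
```

Models such as RealNVP and Glow compose many such invertible maps (with richer parameterizations) and sum the per-layer log-Jacobian terms.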

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 5: Autoregressive Models (slides) (video 1) (video 2)

histograms as simple generative models, parameterized distributions and maximum likelihood, RNN-based autoregressive models, masking-based autoregressive models
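
The lecture's starting point, a histogram as a generative model, is small enough to write out: the maximum-likelihood categorical distribution is just the empirical frequencies. A sketch with my own function names:

```python
import math
from collections import Counter

def fit_histogram(data):
    """MLE of a categorical distribution over observed symbols:
    p(x) = count(x) / N."""
    counts = Counter(data)
    n = len(data)
    return {x: c / n for x, c in counts.items()}

def log_likelihood(model, data):
    """Total log-likelihood of data under the fitted histogram."""
    return sum(math.log(model[x]) for x in data)
```

Autoregressive models generalize this by factorizing p(x_1, ..., x_T) into per-step conditionals, each of which is such a (context-dependent) categorical distribution.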

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 4: Neural Building Blocks III: Attention and Transformers (slides) (video)

content-based attention, location-based attention, soft vs. hard attention, self-attention, attention for image captioning, transformer networks
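
The self-attention mechanism at the heart of this lecture is a weighted average of value vectors, with weights given by a softmax over scaled query-key dot products. An unbatched pure-Python sketch (names are mine; real implementations are batched matrix products):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys and
    returns the attention-weighted average of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Self-attention is the special case where queries, keys, and values are all (linear projections of) the same sequence.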

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 3: Neural Building Blocks II: Sequential Processing with Recurrent Neural Networks (slides) (video)

sequence modeling, recurrent neural networks (RNNs), RNN applications, vanilla RNN, training RNNs, long short-term memory (LSTM), LSTM variants, gated recurrent unit (GRU)

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:


Lecture 2: Neural Building Blocks I: Spatial Processing with CNNs (slides) (video)

deep learning, computation in a neural net, optimization, backpropagation, convolutional neural networks, residual connections, training tricks
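
The convolution operation this lecture builds on can be shown with a naive sliding-window implementation. A sketch under my own naming, computing "valid" cross-correlation (what deep learning frameworks call convolution), without padding, stride, or channels:

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with a kernel:
    slide the kernel over every position and take the elementwise dot product."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```

A CNN learns the kernel entries by backpropagation and stacks many such layers, with the residual connections and training tricks listed above making deep stacks optimizable.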

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:


Additional Resources:



Lecture 1: Introduction to the Course (slides) (video)

course information, unsupervised learning

Please study the following material in preparation for the class:

Required Reading: