Detailed Syllabus and Lectures
Lecture 11: Large Language Models (slides)
GPT-3, understanding in-context learning, scaling laws, Llama 3, other LLMs, long-context models (a short scaling-law code sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
- The Bitter Lesson, Rich Sutton, March 2019.
- Large Language Models: A Survey, Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao, arXiv preprint, 2024.
Suggested Video Material:
Additional Resources:
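To make the scaling-laws topic concrete, here is a minimal sketch, assuming a Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta in parameter count N and training tokens D; the constants and model/data sizes below are illustrative placeholders, not fitted values from any paper.

```python
# Illustrative (not fitted) constants for a power-law loss of the form
# L(N, D) = E + A / N**alpha + B / D**beta, where N is the parameter count
# and D is the number of training tokens. All values are placeholders.
E, A, B = 1.7, 400.0, 410.0
alpha, beta = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss under the assumed power-law form."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Compare doubling model size vs. doubling data from a 1B-parameter,
# 20B-token baseline (toy numbers).
base = predicted_loss(1e9, 20e9)
bigger_model = predicted_loss(2e9, 20e9)
more_data = predicted_loss(1e9, 40e9)
print(f"baseline loss      : {base:.4f}")
print(f"2x parameters      : {bigger_model:.4f}")
print(f"2x training tokens : {more_data:.4f}")
```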
Lecture 10: Pretraining Language Models (slides)
introduction to language models (LMs), history of neural LMs, pretrained LMs, encoder-based, decoder-based, and encoder-decoder-based pretraining (a short masked-language-modeling sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
- Deep Contextualized Word Representations, Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer, NAACL 2018.
- Improving Language Understanding by Generative Pre-Training, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, OpenAI Report, 2018.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, NAACL 2019.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach, Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov, arXiv preprint arXiv:1907.11692, 2019.
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning, ICLR 2020.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu, JMLR 21(140), 2020.
Suggested Video Material:
Additional Resources:
- [Blog post] The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning), Jay Alammar.
- [Blog post] Generalized Language Models, Lilian Weng.
- A Primer in BERTology: What we know about how BERT works, Anna Rogers, Olga Kovaleva, Anna Rumshisky, TACL, Vol. 8, 2020.
- Unifying Language Learning Paradigms, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler, arXiv preprint, 2022.
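As a small companion to the BERT reading, here is a minimal sketch of the masked-language-modeling corruption step it describes: roughly 15% of token positions become prediction targets, of which 80% are replaced by [MASK], 10% by a random token, and 10% left unchanged. The toy vocabulary and sentence are made up, and the demo uses a higher masking rate than BERT's 15% so the effect is visible on a six-token input.

```python
import random

# Toy vocabulary and an already-tokenized sentence (both made up for illustration).
vocab = ["[MASK]", "the", "cat", "sat", "on", "mat", "dog", "ran", "fast"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]

def mask_for_mlm(tokens, mask_prob=0.15, rng=random.Random(0)):
    """BERT-style corruption: select ~mask_prob of positions as prediction targets;
    of those, 80% -> [MASK], 10% -> random token, 10% -> left unchanged."""
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok                          # model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab[1:])  # random non-special replacement
            # else: keep the original token unchanged
    return corrupted, targets

# Higher rate than BERT's 15% so this tiny example shows a few corruptions.
corrupted, targets = mask_for_mlm(tokens, mask_prob=0.5)
print("input :", corrupted)
print("labels:", targets)   # positions the MLM loss is computed on
```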
Lecture 9: Graph Neural Networks (slides)
graph structured data, graph neural nets (GNNs), GNNs for "classical" network problems (a short graph-convolution sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- A Practical Tutorial on Graph Neural Networks, Isaac Ronald Ward, Jack Joyner, Casey Lickfold, Yulan Guo, Mohammed Bennamoun, ACM Computing Surveys, Vol. 54, No. 10, September 2022.
- A Gentle Introduction to Graph Neural Networks, Benjamin Sanchez-Lengeling, Emily Reif, Adam Pearce, Alexander B. Wiltschko, Distill, 2021
- [Blog post] Graph Convolutional Networks, Thomas Kipf
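A minimal sketch of one graph-convolution layer in the spirit of Kipf's blog post above: node features are averaged over neighbours (plus a self-loop) through a symmetrically normalized adjacency matrix, then linearly transformed and passed through a ReLU. The 4-node graph and random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph with 4 nodes (edges chosen arbitrarily for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))          # 3-dimensional input features per node
W = rng.normal(size=(3, 2))          # "learned" weights (random here), mapping 3 -> 2 dims

def gcn_layer(A, X, W):
    """One GCN layer: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization
    return np.maximum(0.0, A_norm @ X @ W)       # aggregate neighbours, transform, ReLU

H = gcn_layer(A, X, W)
print(H.shape)   # (4, 2): a new 2-dimensional embedding for each node
```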
Lecture 8: Attention and Transformers (slides)
content-based attention, location-based attention, soft vs. hard attention, self-attention, attention for image captioning, transformer networks, vision transformers (a short scaled dot-product attention sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- Neural Machine Translation by Jointly Learning to Align and Translate, D. Bahdanau, K. Cho, Y. Bengio, ICLR 2015
- Sequence Modeling with CTC, Awni Hannun, Distill, 2017
- Recurrent Models of Visual Attention, V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu, NIPS 2014
- DRAW: a Recurrent Neural Network for Image Generation, K. Gregor, I. Danihelka, A. Graves, DJ Rezende, D. Wierstra, ICML 2015
- Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, NIPS 2017
- [Blog post] What is DRAW (Deep Recurrent Attentive Writer)?, Kevin Frans
- [Blog post] The Transformer Family, Lilian Weng
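A minimal sketch of the single-head scaled dot-product self-attention at the heart of "Attention Is All You Need": queries, keys, and values are linear projections of the same sequence, and the output is softmax(QK^T / sqrt(d_k)) V. No masking or multi-head split; the toy sizes and random weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project the same sequence three ways
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ V, weights                  # attention-weighted sum of values

T, d_model, d_k = 5, 8, 4                        # toy sizes, chosen arbitrarily
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                     # (5, 4) (5, 5)
print(attn.sum(axis=-1))                         # each row of attention weights sums to ~1
```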
Lecture 7: Recurrent Neural Networks (slides)
sequence modeling, recurrent neural networks (RNNs), RNN applications, vanilla RNN, training RNNs, long short-term memory (LSTM), LSTM variants, gated recurrent unit (GRU); a short LSTM-cell sketch follows this lecture's resources
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
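A minimal sketch of one forward step of a standard LSTM cell using the usual forget, input, and output gates plus a candidate cell state; the random parameters, toy sizes, and input sequence are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenation [x; h_prev] to four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2*H])        # input gate: how much new information to write
    o = sigmoid(z[2*H:3*H])      # output gate: how much of the cell state to expose
    g = np.tanh(z[3*H:4*H])      # candidate cell state
    c = f * c_prev + i * g       # updated cell state
    h = o * np.tanh(c)           # updated hidden state
    return h, c

D, H = 3, 4                                      # toy input and hidden sizes
W = rng.normal(scale=0.1, size=(4 * H, D + H))   # random parameters, for illustration only
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, D)):                # unroll over a length-6 toy sequence
    h, c = lstm_step(x, h, c, W, b)
print(h)                                         # final hidden state
```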
Lecture 6: Understanding and Visualizing Convolutional Neural Networks (slides)
transfer learning, interpretability, visualizing neuron activations, visualizing class activations, pre-images, adversarial examples, adversarial training (a short adversarial-perturbation sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- [Blog post] Understanding Neural Networks Through Deep Visualization, Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson.
- [Blog post] The Building Blocks of Interpretability, Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye and Alexander Mordvintsev.
- [Blog post] Feature Visualization, Chris Olah, Alexander Mordvintsev and Ludwig Schubert.
- [Blog post] An Overview of Early Vision in InceptionV1, Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter.
- [Blog post] OpenAI Microscope.
- [Blog post] Breaking Linear Classifiers on ImageNet, Andrej Karpathy.
- [Blog post] Attacking machine learning with adversarial examples, OpenAI.
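A minimal sketch of the fast-gradient-sign idea behind the adversarial-examples resources above: nudge the input by epsilon times the sign of the loss gradient with respect to the input. To stay self-contained, the "model" here is a fixed logistic-regression classifier with made-up weights, so the input gradient can be written analytically; it is not the setup from any of the linked posts.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A fixed, made-up logistic-regression "classifier" standing in for a trained model.
w = rng.normal(size=20)
b = 0.1

def loss_and_grad_x(x, y):
    """Binary cross-entropy loss and its gradient w.r.t. the *input* x."""
    p = sigmoid(w @ x + b)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss, (p - y) * w          # analytic input gradient for logistic regression

x = rng.normal(size=20)               # a "clean" toy input
y = 1.0                               # its true label
eps = 0.25                            # perturbation budget

_, grad = loss_and_grad_x(x, y)
x_adv = x + eps * np.sign(grad)       # FGSM: move every coordinate in the loss-increasing direction

print("clean score      :", sigmoid(w @ x + b))
print("adversarial score:", sigmoid(w @ x_adv + b))   # drops sharply despite the small perturbation
```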
Lecture 5: Convolutional Neural Networks (slides)
convolution layer, pooling layer, CNN architectures, design guidelines, semantic segmentation networks, addressing other tasks (a short convolution-and-pooling sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
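A minimal sketch of the two core operations named above, written as explicit loops over a single-channel image: a "valid" cross-correlation (what deep-learning convolution layers actually compute) followed by ReLU and 2x2 max pooling. The image and filter values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """'Valid' cross-correlation of a single-channel image with a single kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    H, W = x.shape
    H, W = H - H % size, W - W % size            # drop rows/cols that do not fit a full window
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

image = rng.normal(size=(8, 8))                  # toy single-channel "image"
kernel = rng.normal(size=(3, 3))                 # one "learned" filter (random here)
feat = np.maximum(0.0, conv2d(image, kernel))    # convolution + ReLU
pooled = max_pool2d(feat)                        # downsample the feature map
print(feat.shape, pooled.shape)                  # (6, 6) (3, 3)
```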
Lecture 4: Training Deep Neural Networks (slides)
data preprocessing, weight initialization, normalization, regularization, model ensembles, dropout, optimization methods (a short initialization, dropout, and momentum sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- Stochastic Gradient Descent Tricks, Leon Bottou.
- Section 3 of Practical Recommendations for Gradient-Based Training of Deep Architectures, Yoshua Bengio.
- Troubleshooting Deep Neural Networks: A Field Guide to Fixing Your Model, Josh Tobin.
- [Blog post] Initializing neural networks, Katanforoosh & Kunin, deeplearning.ai.
- [Blog post] Parameter optimization in neural networks, Katanforoosh et al., deeplearning.ai.
- [Blog post] The Black Magic of Deep Learning - Tips and Tricks for the practitioner, Nikolas Markou.
- [Blog post] An overview of gradient descent optimization algorithms, Sebastian Ruder.
- [Blog post] Why Momentum Really Works, Gabriel Goh
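A minimal sketch of three of the training tricks listed above, namely He initialization for ReLU layers, inverted dropout at training time, and an SGD-with-momentum update; all sizes, rates, and tensors are illustrative assumptions rather than recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    """He/Kaiming initialization: std = sqrt(2 / fan_in), suited to ReLU units."""
    return rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero units with probability p and rescale, so test time needs no change."""
    if not training or p == 0.0:
        return x
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

def sgd_momentum(w, grad, velocity, lr=0.01, mu=0.9):
    """Classical momentum: accumulate a velocity from gradients, then step along it."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Tiny demonstration on made-up tensors.
W = he_init(256, 128)
h = np.maximum(0.0, rng.normal(size=(4, 256)) @ W)     # activations of one ReLU layer
h = dropout(h, p=0.5, training=True)                   # applied only during training
v = np.zeros_like(W)
W, v = sgd_momentum(W, grad=rng.normal(size=W.shape), velocity=v)
print(W.shape, h.shape)
```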
Lecture 3: Multi-layer Perceptrons (slides)
feed-forward neural networks, activation functions, chain rule, backpropagation, computational graph, automatic differentiation, distributed word representations (a short backpropagation sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
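A minimal sketch of backpropagation through a one-hidden-layer MLP with a squared-error loss, applying the chain rule by hand in the reverse order of the forward computation; the toy regression target y = sum(x) and the small random weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: learn y = sum(x) from 64 random 3-d inputs.
X = rng.normal(size=(64, 3))
y = X.sum(axis=1, keepdims=True)

# One hidden layer with 16 ReLU units.
W1, b1 = rng.normal(scale=0.5, size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.05

for step in range(200):
    # ---- forward pass (the computational graph, written out by hand) ----
    h_pre = X @ W1 + b1
    h = np.maximum(0.0, h_pre)            # ReLU
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # ---- backward pass: chain rule applied layer by layer ----
    d_yhat = 2.0 * (y_hat - y) / len(X)   # dL/dy_hat
    dW2, db2 = h.T @ d_yhat, d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T                   # dL/dh
    d_hpre = d_h * (h_pre > 0)            # ReLU passes gradient only where it was active
    dW1, db1 = X.T @ d_hpre, d_hpre.sum(axis=0)

    # ---- gradient descent update ----
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```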
Lecture 2: Machine Learning Overview (slides)
types of machine learning problems, linear models, loss functions, linear regression, gradient descent, overfitting and generalization, regularization, cross-validation, bias-variance tradeoff, maximum likelihood estimation (a short gradient-descent sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
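A minimal sketch tying together a linear model, the mean-squared-error loss, and batch gradient descent on synthetic 1-d data with a known slope and intercept; the ground-truth parameters, noise level, and learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 plus a little noise (ground truth chosen for illustration).
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0        # model parameters, initialized at zero
lr = 0.5               # learning rate (illustrative)

for step in range(200):
    y_hat = w * x + b                       # linear model prediction
    error = y_hat - y
    loss = np.mean(error ** 2)              # mean squared error
    grad_w = 2.0 * np.mean(error * x)       # dL/dw
    grad_b = 2.0 * np.mean(error)           # dL/db
    w -= lr * grad_w                        # gradient descent step
    b -= lr * grad_b

print(f"learned w={w:.3f}, b={b:.3f}, loss={loss:.4f}")   # w, b should be close to 2 and 1
```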
Lecture 1: Introduction to Deep Learning (slides)
course information, what is deep learning, a brief history of deep learning, compositionality, end-to-end learning, distributed representations
Please study the following material in preparation for the class:
Required Reading:
Additional Resources:
- The unreasonable effectiveness of deep learning in artificial intelligence, Terrence J. Sejnowski, PNAS, 2020.
- Deep Learning, Yann LeCun, Yoshua Bengio, Geoffrey Hinton. Nature, Vol. 521, 2015.
- Deep Learning in Neural Networks: An Overview, Juergen Schmidhuber. Neural Networks, Vol. 61, pp. 85–117, 2015.
- On the Origin of Deep Learning, Haohan Wang and Bhiksha Raj, arXiv preprint arXiv:1702.07800v4, 2017.