Detailed Syllabus and Lectures
Lecture 11: Large Language Models (slides)
GPT-3, understanding in-context learning, scaling laws, Llama 3, other LLMs, long-context models (a short scaling-law code sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
- The Bitter Lesson, Rich Sutton, March 2019.
- Large Language Models: A Survey, Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao, arXiv preprint, 2024.
Suggested Video Material:
Additional Resources:
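To make the scaling-laws topic concrete, here is a minimal sketch, assuming a Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta in parameter count N and training tokens D; the constants and model/data sizes below are illustrative placeholders, not fitted values from any paper.

```python
# Illustrative (not fitted) constants for a power-law loss of the form
# L(N, D) = E + A / N**alpha + B / D**beta, where N is the parameter count
# and D is the number of training tokens. All values are placeholders.
E, A, B = 1.7, 400.0, 410.0
alpha, beta = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss under the assumed power-law form."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Compare doubling model size vs. doubling data from a 1B-parameter,
# 20B-token baseline (toy numbers).
base = predicted_loss(1e9, 20e9)
bigger_model = predicted_loss(2e9, 20e9)
more_data = predicted_loss(1e9, 40e9)
print(f"baseline loss      : {base:.4f}")
print(f"2x parameters      : {bigger_model:.4f}")
print(f"2x training tokens : {more_data:.4f}")
```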
Lecture 10: Pretraining Language Models (slides)
introduction to language models (LMs), history of neural LMs, pretrained LMs, encoder-based, decoder-based, and encoder-decoder-based pretraining (a short masked-language-modeling sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
- Deep Contextualized Word Representations, Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer, NAACL 2018.
- Improving Language Understanding by Generative Pre-Training, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, OpenAI Report, 2018.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, NAACL 2019.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach, Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov, arXiv preprint arXiv:1907.11692, 2019.
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning, ICLR 2020.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu, JMLR 21(140), 2020.
Suggested Video Material:
Additional Resources:
- [Blog post] The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning), Jay Alammar.
- [Blog post] Generalized Language Models, Lilian Weng.
- A Primer in BERTology: What we know about how BERT works, Anna Rogers, Olga Kovaleva, Anna Rumshisky, TACL, Vol. 8, 2020.
- Unifying Language Learning Paradigms, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler, arXiv preprint, 2022.
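As a small companion to the BERT reading, here is a minimal sketch of the masked-language-modeling corruption step it describes: roughly 15% of token positions become prediction targets, of which 80% are replaced by [MASK], 10% by a random token, and 10% left unchanged. The toy vocabulary and sentence are made up, and the demo uses a higher masking rate than BERT's 15% so the effect is visible on a six-token input.

```python
import random

# Toy vocabulary and an already-tokenized sentence (both made up for illustration).
vocab = ["[MASK]", "the", "cat", "sat", "on", "mat", "dog", "ran", "fast"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]

def mask_for_mlm(tokens, mask_prob=0.15, rng=random.Random(0)):
    """BERT-style corruption: select ~mask_prob of positions as prediction targets;
    of those, 80% -> [MASK], 10% -> random token, 10% -> left unchanged."""
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok                          # model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab[1:])  # random non-special replacement
            # else: keep the original token unchanged
    return corrupted, targets

# Higher rate than BERT's 15% so this tiny example shows a few corruptions.
corrupted, targets = mask_for_mlm(tokens, mask_prob=0.5)
print("input :", corrupted)
print("labels:", targets)   # positions the MLM loss is computed on
```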
Lecture 9: Graph Neural Networks (slides)
graph structured data, graph neural nets (GNNs), GNNs for "classical" network problems (a short graph-convolution sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- A Practical Tutorial on Graph Neural Networks, Isaac Ronald Ward, Jack Joyner, Casey Lickfold, Yulan Guo, Mohammed Bennamoun, ACM Computing Surveys, Vol. 54, No. 10, September 2022.
- A Gentle Introduction to Graph Neural Networks, Benjamin Sanchez-Lengeling, Emily Reif, Adam Pearce, Alexander B. Wiltschko, Distill, 2021
- [Blog post] Graph Convolutional Networks, Thomas Kipf
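A minimal sketch of one graph-convolution layer in the spirit of Kipf's blog post above: node features are averaged over neighbours (plus a self-loop) through a symmetrically normalized adjacency matrix, then linearly transformed and passed through a ReLU. The 4-node graph and random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph with 4 nodes (edges chosen arbitrarily for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))          # 3-dimensional input features per node
W = rng.normal(size=(3, 2))          # "learned" weights (random here), mapping 3 -> 2 dims

def gcn_layer(A, X, W):
    """One GCN layer: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization
    return np.maximum(0.0, A_norm @ X @ W)       # aggregate neighbours, transform, ReLU

H = gcn_layer(A, X, W)
print(H.shape)   # (4, 2): a new 2-dimensional embedding for each node
```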
Lecture 8: Attention and Transformers (slides)
content-based attention, location-based attention, soft vs. hard attention, self-attention, attention for image captioning, transformer networks, vision transformers (a short scaled dot-product attention sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- Neural Machine Translation by Jointly Learning to Align and Translate, D. Bahdanau, K. Cho, Y. Bengio, ICLR 2015
- Sequence Modeling with CTC, Awni Hannun, Distill, 2017
- Recurrent Models of Visual Attention, V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu, NIPS 2014
- DRAW: a Recurrent Neural Network for Image Generation, K. Gregor, I. Danihelka, A. Graves, DJ Rezende, D. Wierstra, ICML 2015
- Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, NIPS 2017
- [Blog post] What is DRAW (Deep Recurrent Attentive Writer)?, Kevin Frans
- [Blog post] The Transformer Family, Lilian Weng
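A minimal sketch of the single-head scaled dot-product self-attention at the heart of "Attention Is All You Need": queries, keys, and values are linear projections of the same sequence, and the output is softmax(QK^T / sqrt(d_k)) V. No masking or multi-head split; the toy sizes and random weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project the same sequence three ways
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ V, weights                  # attention-weighted sum of values

T, d_model, d_k = 5, 8, 4                        # toy sizes, chosen arbitrarily
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                     # (5, 4) (5, 5)
print(attn.sum(axis=-1))                         # each row of attention weights sums to ~1
```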
Lecture 7: Recurrent Neural Networks (slides)
sequence modeling, recurrent neural networks (RNNs), RNN applications, vanilla RNN, training RNNs, long short-term memory (LSTM), LSTM variants, gated recurrent unit (GRU); a short LSTM-cell sketch follows this lecture's resources
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
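A minimal sketch of one forward step of a standard LSTM cell using the usual forget, input, and output gates plus a candidate cell state; the random parameters, toy sizes, and input sequence are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenation [x; h_prev] to four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2*H])        # input gate: how much new information to write
    o = sigmoid(z[2*H:3*H])      # output gate: how much of the cell state to expose
    g = np.tanh(z[3*H:4*H])      # candidate cell state
    c = f * c_prev + i * g       # updated cell state
    h = o * np.tanh(c)           # updated hidden state
    return h, c

D, H = 3, 4                                      # toy input and hidden sizes
W = rng.normal(scale=0.1, size=(4 * H, D + H))   # random parameters, for illustration only
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, D)):                # unroll over a length-6 toy sequence
    h, c = lstm_step(x, h, c, W, b)
print(h)                                         # final hidden state
```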
Lecture 6: Understanding and Visualizing Convolutional Neural Networks (slides)
transfer learning, interpretability, visualizing neuron activations, visualizing class activations, pre-images, adversarial examples, adversarial training (a short adversarial-perturbation sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- [Blog post] Understanding Neural Networks Through Deep Visualization, Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson.
- [Blog post] The Building Blocks of Interpretability, Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye and Alexander Mordvintsev.
- [Blog post] Feature Visualization, Chris Olah, Alexander Mordvintsev and Ludwig Schubert.
- [Blog post] An Overview of Early Vision in InceptionV1, Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter.
- [Blog post] OpenAI Microscope.
- [Blog post] Breaking Linear Classifiers on ImageNet, Andrej Karpathy.
- [Blog post] Attacking machine learning with adversarial examples, OpenAI.
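A minimal sketch of the fast-gradient-sign idea behind the adversarial-examples resources above: nudge the input by epsilon times the sign of the loss gradient with respect to the input. To stay self-contained, the "model" here is a fixed logistic-regression classifier with made-up weights, so the input gradient can be written analytically; it is not the setup from any of the linked posts.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A fixed, made-up logistic-regression "classifier" standing in for a trained model.
w = rng.normal(size=20)
b = 0.1

def loss_and_grad_x(x, y):
    """Binary cross-entropy loss and its gradient w.r.t. the *input* x."""
    p = sigmoid(w @ x + b)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss, (p - y) * w          # analytic input gradient for logistic regression

x = rng.normal(size=20)               # a "clean" toy input
y = 1.0                               # its true label
eps = 0.25                            # perturbation budget

_, grad = loss_and_grad_x(x, y)
x_adv = x + eps * np.sign(grad)       # FGSM: move every coordinate in the loss-increasing direction

print("clean score      :", sigmoid(w @ x + b))
print("adversarial score:", sigmoid(w @ x_adv + b))   # drops sharply despite the small perturbation
```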
Lecture 5: Convolutional Neural Networks (slides)
convolution layer, pooling layer, CNN architectures, design guidelines, semantic segmentation networks, addressing other tasks (a short convolution-and-pooling sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
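A minimal sketch of the two core operations named above, written as explicit loops over a single-channel image: a "valid" cross-correlation (what deep-learning convolution layers actually compute) followed by ReLU and 2x2 max pooling. The image and filter values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """'Valid' cross-correlation of a single-channel image with a single kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    H, W = x.shape
    H, W = H - H % size, W - W % size            # drop rows/cols that do not fit a full window
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

image = rng.normal(size=(8, 8))                  # toy single-channel "image"
kernel = rng.normal(size=(3, 3))                 # one "learned" filter (random here)
feat = np.maximum(0.0, conv2d(image, kernel))    # convolution + ReLU
pooled = max_pool2d(feat)                        # downsample the feature map
print(feat.shape, pooled.shape)                  # (6, 6) (3, 3)
```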
Lecture 4: Training Deep Neural Networks (slides)
data preprocessing, weight initialization, normalization, regularization, model ensembles, dropout, optimization methods (a short initialization, dropout, and momentum sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
- Stochastic Gradient Descent Tricks, Leon Bottou.
- Section 3 of Practical Recommendations for Gradient-Based Training of Deep Architectures, Yoshua Bengio.
- Troubleshooting Deep Neural Networks: A Field Guide to Fixing Your Model, Josh Tobin.
- [Blog post] Initializing neural networks, Katanforoosh & Kunin, deeplearning.ai.
- [Blog post] Parameter optimization in neural networks, Katanforoosh et al., deeplearning.ai.
- [Blog post] The Black Magic of Deep Learning - Tips and Tricks for the practitioner, Nikolas Markou.
- [Blog post] An overview of gradient descent optimization algorithms, Sebastian Ruder.
- [Blog post] Why Momentum Really Works, Gabriel Goh
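A minimal sketch of three of the training tricks listed above, namely He initialization for ReLU layers, inverted dropout at training time, and an SGD-with-momentum update; all sizes, rates, and tensors are illustrative assumptions rather than recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    """He/Kaiming initialization: std = sqrt(2 / fan_in), suited to ReLU units."""
    return rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero units with probability p and rescale, so test time needs no change."""
    if not training or p == 0.0:
        return x
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

def sgd_momentum(w, grad, velocity, lr=0.01, mu=0.9):
    """Classical momentum: accumulate a velocity from gradients, then step along it."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Tiny demonstration on made-up tensors.
W = he_init(256, 128)
h = np.maximum(0.0, rng.normal(size=(4, 256)) @ W)     # activations of one ReLU layer
h = dropout(h, p=0.5, training=True)                   # applied only during training
v = np.zeros_like(W)
W, v = sgd_momentum(W, grad=rng.normal(size=W.shape), velocity=v)
print(W.shape, h.shape)
```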
Lecture 3: Multi-layer Perceptrons (slides)
feed-forward neural networks, activation functions, chain rule, backpropagation, computational graph, automatic differentiation, distributed word representations (a short backpropagation sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
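A minimal sketch of backpropagation through a one-hidden-layer MLP with a squared-error loss, applying the chain rule by hand in the reverse order of the forward computation; the toy regression target y = sum(x) and the small random weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: learn y = sum(x) from 64 random 3-d inputs.
X = rng.normal(size=(64, 3))
y = X.sum(axis=1, keepdims=True)

# One hidden layer with 16 ReLU units.
W1, b1 = rng.normal(scale=0.5, size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.05

for step in range(200):
    # ---- forward pass (the computational graph, written out by hand) ----
    h_pre = X @ W1 + b1
    h = np.maximum(0.0, h_pre)            # ReLU
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # ---- backward pass: chain rule applied layer by layer ----
    d_yhat = 2.0 * (y_hat - y) / len(X)   # dL/dy_hat
    dW2, db2 = h.T @ d_yhat, d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T                   # dL/dh
    d_hpre = d_h * (h_pre > 0)            # ReLU passes gradient only where it was active
    dW1, db1 = X.T @ d_hpre, d_hpre.sum(axis=0)

    # ---- gradient descent update ----
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```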
Lecture 2: Machine Learning Overview (slides)
types of machine learning problems, linear models, loss functions, linear regression, gradient descent, overfitting and generalization, regularization, cross-validation, bias-variance tradeoff, maximum likelihood estimation (a short gradient-descent sketch follows this lecture's resources)
Please study the following material in preparation for the class:
Required Reading:
Suggested Video Material:
Additional Resources:
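A minimal sketch tying together a linear model, the mean-squared-error loss, and batch gradient descent on synthetic 1-d data with a known slope and intercept; the ground-truth parameters, noise level, and learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 plus a little noise (ground truth chosen for illustration).
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0        # model parameters, initialized at zero
lr = 0.5               # learning rate (illustrative)

for step in range(200):
    y_hat = w * x + b                       # linear model prediction
    error = y_hat - y
    loss = np.mean(error ** 2)              # mean squared error
    grad_w = 2.0 * np.mean(error * x)       # dL/dw
    grad_b = 2.0 * np.mean(error)           # dL/db
    w -= lr * grad_w                        # gradient descent step
    b -= lr * grad_b

print(f"learned w={w:.3f}, b={b:.3f}, loss={loss:.4f}")   # w, b should be close to 2 and 1
```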
Lecture 1: Introduction to Deep Learning (slides)
course information, what is deep learning, a brief history of deep learning, compositionality, end-to-end learning, distributed representations
Please study the following material in preparation for the class:
Required Reading:
Additional Resources:
- The unreasonable effectiveness of deep learning in artificial intelligence, Terrence J. Sejnowski, PNAS, 2020.
- Deep Learning, Yann LeCun, Yoshua Bengio, Geoffrey Hinton. Nature, Vol. 521, 2015.
- Deep Learning in Neural Networks: An Overview, Juergen Schmidhuber. Neural Networks, Vol. 61, pp. 85–117, 2015.
- On the Origin of Deep Learning, Haohan Wang and Bhiksha Raj, arXiv preprint arXiv:1702.07800v4, 2017.