Spring 2026
COMP 646: Deep Learning for Vision and Language
Instructor: Vicente Ordóñez-Román (vicenteor at rice.edu), Office Hours: Thursdays 10am-11am.
TA: To be announced at the start of the semester.
Class Time: Tuesdays and Thursdays from 4pm to 5:15pm.
Course Description: Visual recognition and language understanding are two fundamental tasks in the quest toward Artificial Intelligence. In this course we will study and acquire the skills to build machine learning and deep learning models that can reason about images and text for generating image descriptions, finding objects in images, generating images from text, and other general tasks involving both text and images. On the technical side we will leverage models such as convolutional neural networks (CNNs), Transformer networks, and generative models, with emphasis on re-using multimodal foundation models.
Learning Objectives: (a) Develop intuitions about the connections between language and vision, (b) Understand concepts in representation learning for both images and text, (c) Become familiar with state-of-the-art models for tasks in vision and language, (d) Obtain practical experience in the implementation and adaptation of these models.
Prerequisites: There are no formal prerequisites for this class. Students should have basic knowledge of linear algebra, differential calculus, statistics and probability, and some proficiency in Python.
Schedule
| Date | Topic |
|---|---|
| Tue, Jan 13 | Introduction to vision and language #welcome |
| Thu, Jan 15 | Supervised vs unsupervised learning and linear classifiers #machine-learning |
| Tue, Jan 20 | Stochastic Gradient Descent / Generalization #machine-learning |
| Thu, Jan 22 | The Convolutional Operator and Convolutional Neural Networks #computer-vision |
| Due: Mon, Feb 9, 11:59pm CT | Assignment on PyTorch + Image Classification |
| Tue, Jan 27 | Word Representations, Sequence Models, Language Modeling #natural-language-processing |
| Thu, Jan 29 | #Spring-Recess (No Scheduled Classes) |
| Due: Mon, Mar 9, 11:59pm CT | Final Project Proposal |
| Tue, Feb 3 | Transformers: BERT, GPT-2, ViT, CLIP #natural-language-processing |
| Thu, Feb 5 | #Quiz |
| Tue, Feb 10 | AutoEncoders (AEs) and Variational AutoEncoders (VAEs) #generative-ai |
| Due: Mon, Mar 2, 11:59pm CT | Assignment on Image Question Answering |
| Thu, Feb 12 | Diffusion Models #generative-ai |
Disclaimer: The professor reserves the right to make changes to the syllabus, including assignment due dates. These changes will be announced as early as possible.
Grading: Assignments: 30%, Class Project: 60%, Quiz: 10%.
Late Submission Policy: No late assignments will be accepted in this class unless the student has obtained special accommodations for warranted circumstances. If you think this may apply to you, please contact the instructor directly as early as possible.
Honor Code and Academic Integrity: In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution.