
Spring 2026

COMP 646: Deep Learning for Vision and Language

Instructor: Vicente Ordóñez-Román (vicenteor at rice.edu), Office Hours: Thursdays 10am-11am.

TA: announced at the start of the semester.

Class Time: Tuesdays and Thursdays from 4pm to 5:15pm.

Course Description: Visual recognition and language understanding are two fundamental tasks in the quest toward Artificial Intelligence. In this course we will study and acquire the skills to build machine learning and deep learning models that can reason about images and text for generating image descriptions, finding objects in images, generating images from text, and other general tasks involving both text and images. On the technical side we will leverage models such as convolutional neural networks (CNNs), Transformer networks, and generative models, with emphasis on re-using multimodal foundation models.

Learning Objectives: (a) Develop intuitions about the connections between language and vision, (b) Understand concepts in representation learning for both images and text, (c) Become familiar with state-of-the-art models for tasks in vision and language, (d) Obtain practical experience in the implementation and adaptation of these models.

Prerequisites: There are no formal prerequisites for this class. Students should have basic knowledge of linear algebra, differential calculus, statistics and probability, and some proficiency in Python.

Schedule

Date Topic
Tue, Jan 13 Introduction to vision and language #welcome
Thu, Jan 15 Supervised vs unsupervised learning and linear classifiers #machine-learning
Tue, Jan 20 Stochastic Gradient Descent / Generalization #machine-learning
Thu, Jan 22 The Convolutional Operator and Convolutional Neural Networks #computer-vision
Due Assignment on PyTorch + Image Classification Monday February 9, 11:59pm CT
Tue, Jan 27 Word Representations, Sequence Models, Language Modeling #natural-language-processing
Thu, Jan 29 #Spring-Recess (No Scheduled Classes)
Due Final Project Proposal Monday March 9, 11:59pm CT
Tue, Feb 3 Transformers: BERT, GPT-2, ViT, CLIP #natural-language-processing
Thu, Feb 5 #Quiz
Tue, Feb 10 AutoEncoders (AEs) and Variational AutoEncoders (VAEs) #generative-ai
Due Assignment on Image Question Answering Monday March 2, 11:59pm CT
Thu, Feb 12 Diffusion Models #generative-ai

Disclaimer: The professor reserves the right to make changes to the syllabus, including assignment due dates. These changes will be announced as early as possible.

Grading: Assignments: 30%, Class Project: 60%, Quiz: 10%.

Late Submission Policy: No late assignments will be accepted in this class unless the student has obtained special accommodations for warranted circumstances. If you think this might apply to you, please contact the instructor directly as early as possible.

Honor Code and Academic Integrity: In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution.