CS 4476: Introduction to Computer Vision, Fall 2018

College of Computing, Georgia Tech

Class meets Tue, Thu 4:30-5:45pm, Clough Commons 144

Piazza : https://piazza.com/gatech/fall2018/cs4476/home
Canvas : https://gatech.instructure.com/courses/26437


  • [New] Please make sure to go through the (updated) project late day policy.
  • Do not forget to register for the class on Piazza; 1% of the participation grade is reserved for this.
  • Welcome to the course!

"To see is to know what is where by looking"   -- David Marr

Over the past few decades, machines have come a long way in their ability to "see". Some examples are autonomous navigators such as self-driving cars, medical imaging technologies, image search engines, face detection and recognition systems in apps, aids for the visually impaired, control-free video games, and industrial automation systems.

In this introductory Computer Vision course, we will learn how to "teach machines to see". We will explore several fundamental concepts including image formation, feature detection, segmentation, multiple view geometry, recognition, and video processing. We will use these concepts to build applications that help machines see the world around them.


No prior experience with computer vision is assumed, although previous knowledge of visual computing or signal processing will be helpful. The following skills are necessary for this class:

  • Data structures: You'll be writing code that builds representations of images, features, and geometric constructions.
  • Programming: Assignments are to be completed and graded either in Python or Matlab.
  • Math: Linear algebra, vector calculus, and probability. Linear algebra is the most important of these; students who have not taken a linear algebra course have struggled in the past.

Teaching Staff

  • Devi Parikh (Instructor): Office hours right after Devi's lectures (most Tue, Thu 5:45-6:15pm)
  • Arjun Chandrasekaran (TA): Office hours Mon, 9:00-10:00am (CCB, level 2 lobby)
  • Samyak Datta (TA): Office hours Wed, 4:30-5:30pm (CCB, level 2 lobby)
  • Nidhi Menon (TA): Office hours Thu, 3:00-4:00pm (CCB, level 1 common area)
  • Tanmay Binaykiya (TA): Office hours Thu, 9:30-10:30am (CCB, level 1 common area)
  • Ryan LaFleur (TA): Office hours Wed, 11am-12pm (CCB, level 1 common area)
  • Justin Yao (TA): Office hours Fri, 11am-12pm (CCB, level 1 common area)


Problem Sets (65% of final grade): You will be given 6 problem sets, one approximately every two weeks. These will involve a combination of conceptual questions and programming problems. The programming problems will provide hands-on experience working with techniques covered in or related to the lectures. All code and written responses must be completed individually and submitted to Canvas. Most problem sets will take significant time to complete. Please start early. Problem Set 0 (PS0) will be worth 5% of the final grade, and the remaining 5 problem sets will be worth 12% each.

Project (30% of final grade): Your project can be about applying any of the techniques we studied in class to real world problems. You can also extend a technique, or empirically analyze it. Comparisons between two approaches are also welcome. It is wonderful if you design and evaluate a novel approach to an important existing or new vision problem. Be creative! Students are allowed to use existing code for their projects. However, make sure that you add proper references/citations to any existing code that you use (e.g. links to GitHub repos). Students are also expected to do a substantial amount of work on top of any existing implementations they use, and to clearly delineate in their project updates what was already available versus what they did themselves. You must work in teams of 4-6. Students should maintain a nice, professional looking, visual, self-contained webpage describing their project. We will link to all project pages from the class webpage. The following are deliverables for your project. All the deliverables (including the proposal) are to be submitted via the project web-page. The webpage source files should be added to a ZIP folder and uploaded to Canvas.

  • Proposal (20% of project grade): A description of the following (to be submitted via the project web-page):
    • Problem statement: Clearly state the goal of your project. When someone uses your system, what is the expected input to the system, and what is the desired output?
    • Approach: Describe the technical approach you plan to employ.
    • Experiments and results: Describe the experimental setup you will follow, which datasets you will use, which existing code you will exploit, what you will implement yourself, and what you would define as a success for the project. If you plan on collecting your own data, describe what data collection protocol you will follow. Provide a list of experiments you will perform. Describe what you expect the experiments to reveal, or what is uncertain about the potential outcomes.
  • Two Project Updates (50% of project grade, 25% each): There will be two updates: a mid-term and a final update (both to be submitted via the project web-page). Here is an outline of what the project web-page is supposed to cover.
    • Abstract: One or two sentences on the motivation behind the problem you are solving. One or two sentences describing the approach you took. One or two sentences on the main result you obtained.
    • Teaser figure: A figure that conveys the main idea behind the project or the main application being addressed.
    • Introduction: Motivation behind the problem you are solving, what applications it has, any brief background on the particular domain you are working in (if not regular RGB photographs), etc. If you are using a new way to solve an existing problem, briefly mention and describe the existing approaches and tell us how your approach is new.
    • Approach: Describe very clearly and systematically your approach to solve the problem. Tell us exactly what existing implementations you used to build your system. Tell us what obstacles you faced and how you addressed them. Justify any design choices or judgment calls you made in your approach.
    • Experiments and results: Provide details about the experimental setup (number of images/videos, number of datasets you experimented with, train/test split if you used machine learning algorithms, etc.). Describe the evaluation metrics you used to evaluate how well your approach is working. Include clear figures and tables, as well as illustrative qualitative examples if appropriate. Be sure to include obvious baselines to see if your approach is doing better than a naive approach (e.g. for classification accuracy, how well would a classifier do that made random decisions?). Also discuss any parameters of your algorithms, and tell us how you set the values of those parameters. You can also show us how the performance varies as you change those parameter values. Be sure to discuss any trends you see in your results, and explain why these trends make sense. Are the results as expected? Why?
    • Qualitative results: Show several visual examples of inputs/outputs of your system (success cases and failures) that help us better understand your approach.
    • Conclusion and future work: Conclusion would likely make the same points as the abstract. Discuss any future ideas you have to make your approach better.
    • References: List out all the references you have used for your work.

    Here are some examples for your reference.
    • See this for a webpage template.
    • See this for an example of a nice, professional looking page.
    • See this for an example of how to lay out the various details of your project. You may need to provide more details than this, because you will not be submitting an associated paper to accompany the webpage. So the page should be self-contained.
  • Project Video (30% of project grade): Teams will prepare a 1-minute YouTube video summarizing the project. The video is a teaser that conveys the main points and makes the viewer want to know more. It should be understandable by anyone familiar with Computer Vision. The YouTube link to the video will be submitted as an assignment via Canvas. Here are some example videos for your reference.

Participation and attendance (5% of final grade): Participation in class and regular attendance is expected. If for whatever reason you are absent, it is your responsibility to find out what you missed that day. Also, 1% of the participation grade is reserved for enrolling for the class on Piazza.
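The grade breakdown above can be sketched as a small weighted sum. This is only an illustrative sketch, not an official grade calculator: the function and variable names are hypothetical, scores are assumed to be normalized to [0, 1], and the project video is assumed to count for 30% of the project grade so that the project components (20% + 25% + 25% + 30%) add up to 100%.

```python
# Weights as fractions of the final grade, taken from the breakdown above.
PS0_WEIGHT = 0.05
OTHER_PS_WEIGHT = 0.12        # each of PS1..PS5
PROJECT_WEIGHT = 0.30
PARTICIPATION_WEIGHT = 0.05

# Shares of the project grade (the video share is an assumption, see above).
PROJECT_SHARES = {
    "proposal": 0.20,
    "midterm_update": 0.25,
    "final_update": 0.25,
    "video": 0.30,
}


def final_grade(ps_scores, project_scores, participation):
    """Combine component scores (each in [0, 1]) into a final grade in [0, 1].

    ps_scores: six scores, PS0 first, then PS1..PS5.
    project_scores: dict with the same keys as PROJECT_SHARES.
    participation: a single participation/attendance score.
    """
    if len(ps_scores) != 6:
        raise ValueError("expected six scores, PS0 through PS5")
    problem_sets = PS0_WEIGHT * ps_scores[0] + OTHER_PS_WEIGHT * sum(ps_scores[1:])
    project = sum(PROJECT_SHARES[k] * project_scores[k] for k in PROJECT_SHARES)
    return (problem_sets
            + PROJECT_WEIGHT * project
            + PARTICIPATION_WEIGHT * participation)
```

With perfect scores on every component the result is 1.0 (up to floating-point rounding), which is a quick check that the weights 65% + 30% + 5% cover the whole grade.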

Due Dates: All problem sets/reports are to be submitted by the due date noted on the assignment. Deadlines are firm. Anything from 1 second to 24 hours is one day late.

Late Day policy: Throughout the term you have an allowance of four free late days for your submissions, meaning you can accrue up to four days in late submissions with no penalty. For example, you could turn in one assignment four days late, or turn in two assignments, the project proposal, and the project report webpage each one day late. Once you have used all your free late days, a late submission will not be accepted and will be awarded 0 credit. Please plan ahead so you can spend your late days wisely. In particular, note that we expect you will find the earlier assignments easier than those later in the course. We will count a full additional day as having passed for submissions 1 minute to 24 hours late.
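As a rough illustration of how late days add up under this policy, here is a minimal Python sketch. The function names are hypothetical, and two points are assumptions rather than stated policy: the deadline is taken to be exactly 11:59:00 pm, and a submission that needs more late days than remain is rejected without consuming the days that are left.

```python
import math
from datetime import datetime, timedelta
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def days_late(deadline: datetime, submitted: datetime) -> int:
    """Late days charged: any lateness up to 24 hours counts as a full day."""
    seconds = (submitted - deadline).total_seconds()
    if seconds <= 0:
        return 0  # on time
    return math.ceil(seconds / SECONDS_PER_DAY)


def charge_late_days(remaining: int, needed: int) -> Tuple[bool, int]:
    """Return (accepted, remaining_after) for a single submission.

    Assumption: if more days are needed than remain, the submission is
    rejected (0 credit) and the remaining allowance is left unchanged.
    """
    if needed <= remaining:
        return True, remaining - needed
    return False, remaining


# Example: a deadline of 11:59 pm, with a submission one minute later.
deadline = datetime(2018, 9, 12, 23, 59)
needed = days_late(deadline, deadline + timedelta(minutes=1))
accepted, remaining = charge_late_days(4, needed)
```

In the example, the one-minute-late submission is charged one full late day and leaves three of the four free late days.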

[New] To be fair to all students in a team, we are now (starting 11/8) providing the option of using 2 project late days. The project late days can only be used by a project team as a whole, i.e., they can be used by the project team for project-related deliverables. They cannot be used by individual students for assignments. If the team has used up their 2 late days, then the entire team will get a 0 on their late submission. Note that there are now two types of late days: the 2 project late days are for teams, and the 4 individual late days are for individual students. All teams that submitted their mid-term project update more than 10 minutes late will be considered to have used up one of their project late days.

Textbook: Computer Vision: Algorithms and Applications, by Rick Szeliski. An electronic copy is available free online here. Some background reading on object recognition is from Kristen Grauman and Bastian Leibe's short book on Visual Object Recognition.

Schedule (tentative)

Note that all deliverables are due at 11:59 pm on their respective due dates as mentioned on the schedule.

Date | Topic | Readings and Links | Lectures | Deliverables
Tue, 8/21 | Course Intro | Sec 1.1-1.3 | Intro [ppt] | PS0 out
Thu, 8/23 | | Sec 2.3.2 | Color [ppt] |
Mon, 8/27 | | | | PS0 due
Tue, 8/28 | | Sec 2.1.1, 2.1.2, 6.1.1 | Alignment and 2D image transformations [ppt] |
Thu, 8/30 | Multiple views and motion | Sec 3.6.1, 6.1.4 | Homography and image warping [ppt]; Notes on homography matrix | PS1 out
Tue, 9/4 | | Sec 11.1.1, 11.2-11.5 | Image Formation [ppt] (Guest Lecture: Peter Anderson) |
Thu, 9/6 | | | No class |
Tue, 9/11 | | Sec 11.1.1, 11.2-11.5 | Epipolar geometry and stereo [ppt] (Guest Lecture: Peter Anderson) |
Wed, 9/12 | | | | PS1 due
Thu, 9/13 | | | Structure from motion [ppt] (Guest Lecture: Peter Anderson) | PS2 out
Tue, 9/18 | Features and Filters | Sec 3.1.1-2, 3.2 | Linear Filters [ppt] |
Thu, 9/20 | | Sec 3.2.3, 4.2 | Gradients [ppt] |
Tue, 9/25 | | Sec 3.3.2-4 | Edges and binary image analysis [ppt] |
Thu, 9/27 | | Sec 10.5 | Texture [ppt] |
Mon, 10/1 | | | | PS2 due
Tue, 10/2 | | Sec 4.1 | Local invariant features (Part 1) [ppt] | PS3 out
Thu, 10/4 | | Sec 4.1 | Local invariant features (Part 2) [ppt] |
Tue, 10/9 | | | No class (Fall Break) |
Wed, 10/10 | | | | Project Proposal due
Thu, 10/11 | Grouping and Fitting | Sec 5.2-5.4 | Segmentation and Clustering [ppt] (Guest Lecture: Peter Anderson) |
Tue, 10/16 | | Sec 4.3.2 | Hough Transform [ppt] |
Thu, 10/18 | | Sec 5.1.1 | Deformable Contours [ppt] |
Tue, 10/23 | Recognition | Sec 14.3 | Indexing local features and instance recognition [ppt] |
Wed, 10/24 | | | | PS3 due
Thu, 10/25 | | Sec 14.1 | Intro to category recognition [ppt] | PS4 out
Tue, 10/30 | | Sec 14.1 | Face Detection [ppt] |
Wed, 10/31 | | | | Mid-term project update due
Thu, 11/1 | | Sec 14.3 | Discriminative classifiers for image recognition [ppt] |
Tue, 11/6 | | Sec 14.3 | Part-based models [ppt] |
Wed, 11/7 | | | | PS4 due
Thu, 11/8 | | | Face Recognition [ppt] |
Tue, 11/13 | Video Processing | Sec 8.4, 12.6.4 | Motion and optical flow [ppt] | PS5 out
Thu, 11/15 | | Sec 8.4, 12.6.4 | Background subtraction, action recognition [ppt] |
Tue, 11/20 | | | No class |
Thu, 11/22 | | | No class (Thanksgiving) |
Mon, 11/26 | | | | PS5 due
Tue, 11/27 | | Sec 5.1.2, 4.1.4 | Tracking [ppt] |
Wed, 11/28 | | | | Final project update due
Thu, 11/29 | | | Convolutional Neural Networks [ppt] (Guest Lecture: Michael Cogswell) |
Mon, 12/3 | | | | Project video due
Tue, 12/4 | | | On What's Possible Today [ppt] (Harsh Agrawal) |


  1. Course webpage for a previous offering of the course.
  2. This course closely follows CS 376: Computer Vision at UT Austin, taught by Kristen Grauman.
  3. A few other similar courses (by no means an exhaustive list):


Thanks to visualdialog.org for the webpage format.