Learning Mid-Level Vision from Natural Data
Thursday, April 28th, 2022 @ 2:00 p.m. CDT
This talk has ended.
A recording of the talk will be posted for CS faculty and students to view within 24 hours.
Computer vision with deep learning has achieved superhuman performance on various benchmarks. However, deep neural network models are highly specialized for the task and the data they are trained on. In contrast, human vision is universal: it is a flexible light meter, an instant geometer, a versatile material comparator, and a holistic parser. More importantly, all babies with normal vision eventually learn to see, starting from an initial nebulous blur and despite widely different visual experiences.
I attribute this fascinating development of universal visual perception to the ability to learn mid-level visual representations from natural data without any external supervision. My key insight is that there are structures in visual data that can be discovered with model bottlenecks and minimal priors. I will present our stream of efforts on unsupervised learning of visual recognition: seeing objectness (figure/ground) from watching unlabeled videos, and recognizing individual objects and parsing a visual scene into hierarchical semantic concepts simply from a collection of unlabeled images. Our data-driven computational modeling not only sheds light on human visual perception, but also opens up exciting new ways for scientists, engineers, and clinicians to look at their data and make novel discoveries.