Udit Gupta
Faster, Smarter, and Greener Systems for Data-Center Scale AI
Thursday, March 31st, 2022 @ 2:00 p.m. CDT
A recording of the talk will be posted for CS faculty and students to view within 24 hours.
The modern Internet is driven by AI-centric services that determine how we interact with technology and society on a daily basis. The exponential rise in AI is largely fueled by the design, development, and deployment of domain-specific software and hardware that have yielded orders-of-magnitude improvements for deep learning. Despite these efforts, one important area remains under-studied, and it is the focus of this talk: systems for deep learning-based personalized recommendation. Personalized recommendations form the backbone of our interaction with the Internet, including search, e-commerce, streaming, and social media. Systems play a crucial role in enabling accurate, efficient, and sustainable recommendation engines.
In this talk, I show how modern deep learning-based personalized recommendation engines not only consume the majority of AI training and inference cycles in production data centers, but also pose unique system design challenges for efficient execution. To tackle these challenges, I design solutions across the software and hardware stack to optimize inference efficiency by jointly considering application-level characteristics, unique neural network model architectures, data-center scale implications, and the underlying hardware. Given the rapidly growing infrastructure demands posed by AI and recommendation engines, my work highlights that systems must go beyond performance, power, and energy efficiency to consider environmental footprint as a first-order design target to enable sustainable computing. Finally, I chart paths to designing future systems that enable emerging AI-driven applications by balancing performance, efficiency, sustainability, and privacy.