Today, the brilliant ML researcher Ajay Jain, PhD, explains how a full-length feature film could be created using Stable-Diffusion-style generative A.I.: these models can now output high-fidelity 3D models and compelling video clips.
Ajay:
• Is a Co-Founder of Genmo AI, a platform for using natural language to generate stunning, state-of-the-art images, videos and 3D models.
• Prior to Genmo, he worked as a researcher on the Google Brain team in California, on the Uber Advanced Technologies Group in Toronto and on the Applied Machine Learning team at Facebook.
• Holds a degree in Computer Science and Engineering from MIT and earned his PhD at the world-class Berkeley A.I. Research (BAIR) Lab, where he specialized in deep generative models.
• Has published highly influential papers at all of the most prestigious ML conferences, including NeurIPS, ICML and CVPR.
Today’s episode is on the technical side, so it will likely appeal primarily to hands-on practitioners, but we did our best to explain concepts so that anyone who’d like to understand the state of the art in image, video and 3D-model generation can get up to speed.
In this episode, Ajay details:
• How the Creative General Intelligence he’s developing will allow humans to describe anything they can imagine in natural language and have it generated.
• How feature-length films could be created today using generative A.I. alone.
• How the Stable Diffusion approach to text-to-image generation differs from the Generative Adversarial Network approach.
• How neural nets can represent all the aspects of a visual scene so that the scene can be rendered as desired from any perspective.
• Why a self-driving vehicle forecasting pedestrian behavior requires similar modeling capabilities to text-to-video generation.
• What he looks for in the engineers and researchers he hires.
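For listeners curious about the diffusion approach mentioned above before diving into the episode: unlike a GAN, which maps noise to an image in a single forward pass, a diffusion model starts from pure noise and repeatedly removes a small amount of predicted noise. The toy sketch below illustrates only that sampling loop; the `predicted_noise` function is a hypothetical stand-in for the trained neural network a real system like Stable Diffusion would use, and the update rule is simplified for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# In a real diffusion model, a trained neural network predicts the
# noise present in the current sample. Here we fake that prediction
# by nudging the sample toward an all-zeros "image" (illustrative only).
target = np.zeros((8, 8))

def predicted_noise(x, t):
    # Hypothetical stand-in for a learned noise-prediction network.
    return x - target

def sample(steps=50):
    # Start from pure Gaussian noise and iteratively denoise it,
    # which is the core idea behind diffusion-based image generation.
    x = rng.normal(size=(8, 8))
    for t in range(steps, 0, -1):
        x = x - (1.0 / steps) * predicted_noise(x, t)
    return x

img = sample()
```

Each pass removes only a fraction of the estimated noise, which is why diffusion sampling takes many steps but tends to be more stable to train than the adversarial GAN setup Ajay contrasts it with in the episode.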
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.