The "gold standard" in machine learning is to train models with manually labeled data. Shayan Mohanty details why that practice encodes bias into our models and offers a weakly supervised alternative.
Shayan:
• Is the CEO of Watchful, a Bay Area startup he co-founded to automate the injection of subject-matter expertise into ML models.
• Is a guest scientist at Los Alamos National Laboratory, a renowned national security lab.
• Previously he worked as a data engineer at Facebook.
• Was co-founder and CEO of a pair of other tech startups.
• Holds a degree in economics from The University of Texas at Austin.
Today’s episode will be of interest to technical data science experts and non-technical folks alike, as it addresses critical issues associated with creating datasets for machine learning models — issues we should be aware of regardless of whether we’re more technically or commercially oriented.
In this episode, Shayan details:
• Why bias in general is good.
• Why degenerative bias in particular is bad.
• Arguments against using manual labeling.
• How his company Watchful has devised a better alternative to manual labeling — including fascinating technical underpinnings such as the Chomsky hierarchy of languages and its high-performance Monte Carlo simulation engine.
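To make the "alternative to manual labeling" concrete for readers new to the idea: weak supervision typically replaces hand-labeling each example with noisy heuristic "labeling functions" written by subject-matter experts, whose votes are then aggregated into training labels. The toy sketch below (a generic illustration in the style popularized by systems like Snorkel, not Watchful's actual implementation — the function names and heuristics are invented for this example) shows the basic shape:

```python
import re

# Hypothetical labels for a spam-detection task. ABSTAIN means
# "this heuristic has no opinion about this example."
ABSTAIN, HAM, SPAM = -1, 0, 1

# Each labeling function encodes one expert heuristic; all are noisy.
def lf_contains_offer(text):
    return SPAM if re.search(r"\b(free|offer|winner)\b", text, re.I) else ABSTAIN

def lf_has_greeting(text):
    return HAM if re.search(r"\b(hi|hello|thanks)\b", text, re.I) else ABSTAIN

def lf_many_exclamations(text):
    return SPAM if text.count("!") >= 3 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_offer, lf_has_greeting, lf_many_exclamations]

def weak_label(text):
    """Aggregate labeling-function votes by majority, ignoring abstentions."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no heuristic fired; leave the example unlabeled
    return max(set(votes), key=votes.count)

print(weak_label("FREE offer!!! Click now!"))  # two spam heuristics fire -> 1
```

Real systems replace the naive majority vote with a learned generative model of each function's accuracy, which is where sophisticated machinery like Watchful's Monte Carlo engine can come in.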
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.