For today's massive episode, I traveled to Paris to interview Dr. Gaël Varoquaux, co-founder of scikit-learn, the standard machine learning library worldwide (downloaded over 1.4 million times PER DAY 🤯). In it, Gaël fills us in on sklearn's history and future.
More on Gaël:
• Actively leads development of the ubiquitous scikit-learn Python library, to which several thousand people have contributed open-source code.
• Is Research Director at the famed Inria (the French National Institute for Research in Digital Science and Technology), where he leads the Soda ("social data") team, which focuses on making a major positive social impact with data science.
• Has been recognized with the Innovation Prize from the French Academy of Sciences and many other awards for his invaluable work.
Today’s episode will likely be of primary interest to hands-on practitioners like data scientists and ML engineers, but anyone who’d like to understand the cutting edge of open-source machine learning should listen in.
In this episode, Gaël details:
• The genesis, present capabilities and fast-moving future direction of scikit-learn.
• How to best apply scikit-learn to your particular ML problem.
• How ever-larger datasets and GPU-based accelerations impact the scikit-learn project.
• How (whether you write code or not!) you can get started on contributing to a mega-impactful open-source project like scikit-learn yourself.
• Recent, hugely successful social-impact data projects from his Soda lab.
• Why statistical rigor is more important than ever and how software tools could nudge us in the direction of making more statistically sound decisions.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.