This week's guest is indefatigable Matthew Russell. An Air Force veteran and author of four data science books, Matthew is now Founder/CEO of Strongest AI, a leading tech platform for fitness.
In this episode, Matthew covers:
• The tech stack he uses to deliver data from fitness competitions to millions of spectators all over the world in real time.
• How he rapidly tests machine learning models for deployment into portable devices like the iPhone and the Apple Watch.
• Multi-objective ML functions and why they’re so widely useful in real-world applications.
• The three critical traits he looks for in anyone he hires.
• The values instilled in him by pursuing a military education.
• The key skills he wishes he’d learned earlier in his career.
A bit more on Matthew... he's:
• Founder and CEO of Strongest, the leading technology platform for global fitness events, which is growing into an application that uses ML models to make you stronger, faster, and fitter than ever before.
• Author of four books published by O'Reilly Media, including the classic "Mining the Social Web", which is now in its third edition.
• Prior to founding Strongest, served as CTO at several firms.
• Holds a BS in Computer Science from the US Air Force Academy as well as an MS in Computer Science and Machine Learning from the US Air Force Institute of Technology.
Parts of today’s episode, particularly in the first half, do get fairly technical as we dig into the open-source software stack that enables the scalable deployment of data-intensive real-time applications. That said, much of the episode will appeal to anyone who’s excited about physical fitness or commercializing A.I.
Shout out to Austin Ogilvie for introducing me to Matthew 😀
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Sparking A.I. Innovation — with Nicole Büttner
Looking for ideas on how to spark A.I. innovation in your organization? Nicole Büttner, the eloquent and effervescent Founder/CEO of Merantix Labs, has concrete A.I. innovation frameworks for you in this week's guest episode.
Merantix Labs is a renowned Berlin-based consultancy that enables companies to unlock the value of A.I. across all industries.
In addition to being Founder and CEO of Merantix Labs, Nicole:
• Is a member of the Management Board of Merantix Labs’ parent company Merantix, an A.I. Venture Studio that has raised $30m in funding from the likes of SoftBank Group Corp. to serially originate successful ML start-ups.
• Holds a Masters in Quantitative Economics and Finance from the University of St.Gallen, the world’s leading German-language business school.
• Was a visiting researcher in Economics at Stanford University.
In this episode, Nicole details:
• What an A.I. Venture Studio is and how she founded a thriving A.I. consultancy within it
• How to spark A.I. innovation in a company of any size
• How to effectively use the unlabelled, unbalanced data sets that abound in business
• How to engineer reusable data and software components to tackle related projects efficiently
• The three distinct types of founders she looks for when she puts together the founding team of an A.I. start-up
Today’s episode touches on a few technical details here and there but the episode will largely be of interest to anyone who’s keen to make the most of A.I. innovation in a commercial organization, whether you happen to have a deep technical background today or not.
Special shout-out to the St. Gallen Symposium (Svenja, Rolf): starting at the 34-minute mark, Nicole and I discuss our love for it, as well as how you can get free flights, accommodation, and access (the deadline to apply is Feb 1).
Data Observability — with Dr. Kevin Hu
This week's guest is the fun and wildly intelligent entrepreneur, Kevin Hu, PhD. Inspired by his doctoral research at MIT, he co-founded Metaplane, a Y-Combinator-backed data observability platform.
In a bit more detail, Kevin:
• Is Co-Founder/CEO of Metaplane, a platform that observes the quality of data flows, looks for abnormalities in the data, and reports issues
• Completed a PhD in machine learning and data science from the Massachusetts Institute of Technology
In this episode, Kevin covers:
• What data observability is and how it can help identify data quality issues immediately as well as more quickly resolve the source of the issue
• His PhD research on automating data science systems using ML
• How he identified the problem his start-up Metaplane would solve
• His experience in Y-Combinator accelerating Metaplane
• Pros and cons of an academic career relative to the start-up hustle
• The surprising complexity of the software tools he uses daily as a CEO
• What he looks for in the data engineers that he hires
This episode gets a little technical here and there but I think Kevin and I were pretty careful to define technical concepts when they came up, so today’s episode should largely be appealing to anyone who’s keen to learn a lot from a brilliant entrepreneur, especially if you’d like to found or grow a data science start-up yourself. Enjoy!
Interpretable Machine Learning — with Serg Masís
This week's guest is Serg Masís, an absolutely brilliant data scientist who's specialized in modeling crop yields and climate change. He's also a world-leading author and expert on Interpretable Machine Learning.
Serg:
• Is a Climate & Agronomic Data Scientist at Syngenta.
• Wrote "Interpretable Machine Learning with Python", an epic hands-on guide to techniques that enable us to interpret, improve, and remove biases from ML models that might otherwise be opaque black boxes.
• Holds a Data Science Masters from the Illinois Institute of Technology.
In this episode, Serg details:
• What Interpretable Machine Learning is.
• Key interpretable ML approaches we have today / when they're useful.
• Social and financial ramifications of getting model interpretation wrong.
• What agronomy is and how it’s increasingly integral to being able to feed the growing population on our warming planet.
• What it’s like to be a Climate & Agronomic Data Scientist day-to-day and why you might want to consider getting involved in this fascinating, high-impact field.
• His productivity tips for excelling when you have as many big commitments as he does.
Today’s episode does get technical in parts but Serg and I made an effort to explain many technical concepts at a high level where we could, so today’s episode should be equally appealing to both practicing data scientists and anyone who’s keen to understand the importance and impact of interpretable ML or agronomic data science. Enjoy!
Data Science Trends for 2022
Happy New Year! To kick it off, this week's episode features the marvelous Sadie St. Lawrence predicting the data science trends for 2022. Topics include AutoML, Deep Fakes, model scalability, NFTs, and data literacy.
In a bit more detail, we discuss:
• How the SDS podcast predictions for 2021 panned out (pretty well!)
• The AutoML tools that are automating parts of data scientists’ jobs.
• The social implications of Deep Fakes, which are becoming so lifelike and easy to create.
• Principles for making A.I. models infinitely scalable in production.
• The impact of the remote-working economy on data science employment.
• Practical uses of blockchain and non-fungible token tech in data science.
• Improving the data literacy of the global workforce across all industries.
Sadie:
• Has taught over 300,000 students data science and machine learning.
• Is the Founder and CEO of WomenInData.org, a community of over 20,000 data professionals across 17 countries.
• Is remarkably well-read on the future of technology across industries.
This episode is relatively high-level. It will be of interest to anyone who’d like to understand the trends that will shape the field of data science and the broader world not only in 2022, but also in the years beyond.
How to Found, Grow, and Sell a Data Science Start-up
This week's guest is terrifically witty Austin Ogilvie, a prodigiously successful data science entrepreneur. He was founder and CEO of the iconic start-up Yhat and is now founder/CEO of rapidly-scaling Laika.
Austin:
• Was the Founder and CEO of Yhat, a start-up that built tools for data scientists and had a loyal cult following in the data science community.
• In 2018, Yhat was acquired by Alteryx, an analytics automation company.
• More recently he founded Laika, a “compliance-as-a-service” company that dramatically improves your capacity to sell your products.
• Laika last month closed a $35m Series B funding round, bringing the total raised by the firm over two years to a staggering $48m.
In this episode, Austin describes:
• His journey from an arts degree studying foreign languages to teaching himself programming and machine learning, and then bootstrapping a data science start-up into a respected brand and acquisition target.
• His unique take on what makes a great data scientist.
• The hands-on data science tools he finds great value in coding with day-to-day as the founder and CEO of fast-growing tech start-ups.
• His practical tips for growing into a successful technical founder, whether you have a technical background yourself today or not.
Today’s episode will be of great interest to anyone who’s interested in founding, growing, and/or successfully exiting a tech start-up, particularly if you’re thinking of incorporating data or A.I. elements.
Fusion Energy, Cancer Proteomics, and Massive-Scale Machine Vision — with Dr. Brett Tully
This week's guest is Dr. Brett Tully, who leverages his rich cross-domain experience to detail how data science is applied to the fields of nuclear fusion energy, cancer biology, and massive-scale aerial machine vision.
In today’s episode, Brett details for us:
• What nuclear fusion is, how harnessing fusion power commercially could be a pivotal moment in the history of humankind, and how data simulations may play a critical role in realizing it
• How the study of the healthy proteins versus the proteins present in someone with a particular cancer type is accelerating the availability and impact of personalized cancer treatment
• What it means to be a Director of A.I. Output Systems and how this role fits in with other A.I. activities in an organization, such as model research and development
• His favorite software tools for working with geospatial data
• His tricks for the effective management of a team of ML Engineers
• His take on the big A.I. opportunities of the coming years
Brett:
• Is the Director of A.I. Output Systems at Nearmap, a world-leading aerial imagery company that uses massive-scale machine vision to detect and annotate vast images of urban and rural areas with remarkable detail
• As the Head of Simulation at First Light Fusion, he developed state-of-the-art data simulations that could be a key stepping stone toward enabling commercial nuclear fusion reactors
• As the Group Leader of Software Engineering at a major research hospital, he worked to characterize the differences in the proteome — the complete catalog of proteins in your body — between cancer patients and healthy individuals
• As a PhD student at the University of Oxford, he simulated how the cerebrospinal fluid present in our brains flows in order to better understand neurological abnormalities
Mutable vs Immutable Conditions
This article was originally adapted from a podcast, which you can check out here.
Recently, I had dinner with my wonderful friend Jake Zerrer, who’s a Senior Software Engineer at Flexport, a logistics and supply chain start-up based in San Francisco.
Conversation with Jake is never dull, but I particularly enjoyed a part of the conversation where he brought up an idea for framing problems: He described this framework on the basis of mutable versus immutable conditions.
Data Science at the Command Line
This week's guest is Dr. Jeroen Janssens, a global expert and bestselling author on effectively leveraging the Unix command line as a data scientist.
Jeroen:
• Wrote the popular book "Data Science at the Command Line", the second edition of which was published by O'Reilly Media in October
• Is the Founder of Data Science Workshops B.V., which provides hands-on workshops to global orgs such as Amazon and The New York Times
• Is Organizer of the Data Science Netherlands Meetup (3000+ members)
• Is a former Assistant Professor at the Jheronimus Academy of Data Science
• Has worked as a data scientist for Elsevier, YPlan, and Outbrain
• Holds a PhD in A.I. from Tilburg University
In today’s episode, Jeroen details:
• Why being able to do data science at the command line (for example, in a Bash terminal) is an invaluable skill for a data scientist to have
• How mastering the command line is the glue that facilitates “polyglot data science”, the ability to seamlessly borrow functions from any programming language in a single workflow
• His PhD research on detecting anomalous events in time-series data
• Why LaTeX is a great typesetting language to consider using, particularly for creating lengthy documents or technical figures that adapt automatically to new data
• How his consulting company, Data Science Workshops, grew organically out of his success as an author
A.I. Robotics at Home
This week's guest is mad genius Dave Niewinski, who creates A.I.-powered robots for use at home (e.g., cold beer retriever, flame-throwing weed killer) and to teach people A.I. robotics via his popular YouTube channel.
In the episode, Dave covers:
• The specific robotics hardware and open-source software incorporated into his wildest and most well-known robots
• Where machine vision algorithms, particularly deep learning models, are critical for enabling robot functionality
• His tips for folks who’d like to get started in A.I. robotics themselves
• How his interest in robotics led him to founding his Dave's Armoury Ltd. consulting business, which allows him to automate and improve real-world industrial processes with robots
• What excites him most about the societal impact A.I. robotics will have in our lifetimes
Specific robots of Dave's that we detail on the show include ones that:
• Play the sandbag-tossing game "cornhole"
• Defend his machine shop from kids by spraying them with a hose
• Exterminate weeds by throwing flames
• Carve pumpkins for Halloween
• Solve Rubik's cubes
• Use Lego pieces to create 2D artwork
• Race kids at assembling 3D Lego unicorns
• Bring cold beer from the fridge to wherever you are in the house
Thanks to Graham McCormick for introducing me to Dave! I learned so much from him and had such a good time hanging with him "on air".
You can check out the Dave's Armoury YouTube channel here!
Automating Data Analytics
Meet the brilliant Dr. Peter Bailis. Inspired by research he carried out as a professor at Stanford University, three years ago he founded Sisu to automate enterprise data analytics and the firm has already raised $128m.
In today’s episode, Peter details:
• The revolutionary work being carried out by Sisu: generating automated, actionable reports in minutes that might otherwise take a team of data analysts days
• His guidance for people looking to succeed at growing a tech start-up, particularly if they come from an academic or technical background
• What he looks for in the data scientists and software engineers he hires
• His most important daily tools for developing software productively
• The academic research he carried out at Stanford that’s behind Sisu’s innovative capabilities
Peter:
• Is CEO of Sisu, the firm he founded in San Francisco
• Has raised $128m in venture capital from some of the most prestigious VC firms out there such as Andreessen Horowitz
• Was an assistant professor of computer science at Stanford University, where he’s still an adjunct faculty member today
• Holds several computer science degrees: an undergrad from Harvard University and a PhD from the University of California, Berkeley
Hurdling Over Data Career Obstacles
This week's guest is the sensational Karen Jean-François: mathematician, award-winning data analyst, podcast host, and French national champion in the 400m hurdles. She details how to hurdle over data career obstacles.
More specifically, in today’s episode Karen fills us in on:
• How to overcome Imposter Syndrome in the data science industry
• Why you might want to consider becoming a data science manager versus remaining a more specialized individual contributor
• The data tools that she uses regularly
• The productivity and prioritization techniques that enable her to juggle her day job, her thriving podcast, and her world-class athletic pursuits
Karen:
• Manages banking analytics at publicly-listed Cardlytics
• Is the producer and host of the "Women in Data" podcast
• Was recognized last year as one of the "Twenty in Data and Technology"
• Holds degrees in mathematics and computing from Paris-Sud University (Paris XI)
• Was French national champion in the 400m hurdles and bronze medalist in the 100m hurdles
Open-Source Analytical Computing (pandas, Apache Arrow)
The legend Wes McKinney is this week's guest! He details the genesis of the ubiquitous pandas library, the forthcoming edition of his bestselling book, and how Apache Arrow brings analytics into the distributed computing era.
Wes:
• Created pandas, the industry-standard Python library for data analytics
• Co-created Apache Arrow, a language-agnostic open-source library for efficient analytics on modern distributed CPUs and GPUs
• Wrote the classic O'Reilly Media desk reference "Python for Data Analysis"
• Has worked as a technical expert at prestigious firms like Cloudera, RStudio PBC, Two Sigma, and AQR Capital Management
• Today serves as co-founder and CTO of Voltron Data
In this episode, Wes takes us on a technical deep dive through:
• The creation story of his now-ubiquitous pandas library
• A sneak peek at the third edition of his international-bestselling book
• What the Apache Arrow project is and why it's poised to revolutionize the data science and software industries
• The software and hardware tools that he uses daily to be such an epically productive software developer and entrepreneur
• Responses to great questions by listeners Daniel, David, Doug, and Brett
Skyrocket Your Career by Sharing Your Writing
This week's guest is Khuyen Tran, one of the preeminent voices in data science: Her blog alone receives over 100k views/month. In this episode, she details how, by publishing your writing, you too can skyrocket your career.
Khuyen:
• Became an author for the Towards Data Science blog less than two years ago; already her articles garner 100k+ views per month
• Writes practical daily posts featuring Python code on LinkedIn, leading to her developing a highly engaged following of over 25k in just one year!
• Recently became a technical writer for NVIDIA's Developer Blog
• Has landed four data science jobs, including her current role at Ocelot Consulting, in part thanks to her writing
• Has a perfect 4.0 GPA in the computational and applied mathematics undergraduate degree that she's on track to complete next year
In today's episode, Khuyen fills us in on:
• How publishing your writing can skyrocket your technical career
• Her tricks for maximizing engagement with the content you publish
• Her favorite data science tools and approaches
• Her tricks for prioritizing and being as epically productive as she is across her studies, her data science work, and her prodigious technical writing
And thanks to Krzysztof and Nikolay for the outstanding audience questions that Khuyen addressed on-air.
The episode's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
A.I. for Good
This week's guest is the eloquent and inspiring James Hodson, founder and CEO of the AI for Good Foundation, which leverages data and machine learning to tackle the United Nations' Sustainable Development Goals.
In this episode, James details:
• Globally impactful case studies from his A.I. for Good organization across public health, DEI, and a practical database of A.I. progress on social issues
• How you yourself can get involved in helping apply A.I. for wide-reaching social benefit, whether you're a technical expert or not
• The hard and soft skills that he looks for in the data scientists that he hires
In addition to his leadership of A.I. for Good, James:
• Is an academic research fellow at the Jozef Stefan Institute, where he's focused on Natural Language Processing research
• Is Chief Science Officer at Cognism, a British tech startup
• Served as A.I. Research Manager at Bloomberg LP
• Completed a degree at Princeton University focused on Machine Translation
Thank you to Claudia Perlich for the intro to James! I learned a ton from him while filming this episode.
The episode's available on all major podcasting platforms, on YouTube, and at SuperDataScience.com.
Courses in Data Science and Machine Learning
This week's guest is super fun Sadie St. Lawrence, an exceptionally popular data science instructor with over 300k students all-time. She fills us in on her exciting new ML Certificate and the global impact of her Women In Data org.
Sadie:
• Teaches data science at the University of California, Davis
• Her Coursera course is one of the all-time most popular
• Is Founder and CEO of Women in Data, a community of 20k across 15 countries
• Holds a Master's in Analytics from Villanova University
In this episode, she digs into:
• The content of her existing iconic data science course
• The curriculum of her epic forthcoming Machine Learning Certificate
• The mission, impact, and vision of the Women in Data organization
• Her path into data science from music performance
• Non-fungible tokens (NFTs) and the future of technology
Thanks to Harpreet Sahota for introducing me to Sadie! I absolutely loved filming this episode.
Listen or watch here.
Accelerating Impact through Community — with Chrys Wu
This week's guest is global tech community builder Chrys Wu who details how you too can leverage communities to accelerate your career. This is the first SuperDataScience episode ever recorded in-person!
In addition to accelerating your career with community, Chrys covers:
• K-pop music and its associated cultural movement
• How the Write/Speak/Code and Hacks/Hackers organizations she co-founded leverage community to make a massive global impact for marginalized genders and journalism, respectively
• Her top resources — social media accounts, blogs, and podcasts — for staying abreast of the latest in data science and machine learning
Chrys is a consultant who specializes in product development and change management. She's also a co-founder of both Write/Speak/Code and Hacks/Hackers, the latter of which has grown to 70 chapters across five continents.
Listen or watch here.
Transformers for Natural Language Processing
This week's guest is award-winning author Denis Rothman. He details how Transformer models (like GPT-3) have revolutionized Natural Language Processing (NLP) in recent years. He also explains Explainable AI (XAI).
Denis:
• Is the author of three technical books on artificial intelligence
• His most recent book, "Transformers for NLP", led him to win this year's Data Community Content Creator Award for technical book author
• Spent 25 years as co-founder of French A.I. company Planilog
• Has been patenting A.I. algos such as those for chatbots since 1982
In this episode, Denis fills us in on:
• What Natural Language Processing is
• What Transformer architectures are (e.g., BERT, GPT-3)
• Tools we can use to explain *why* A.I. algorithms provide a particular output
We covered audience questions from Serg, Chiara, and Jean-charles during filming. For those we didn't get to ask, Denis is kindly answering via a LinkedIn post today!
The episode's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Upcoming guest on the SuperDataScience Podcast: Wes McKinney
Next week, I'm interviewing the monumental Wes McKinney — creator of pandas, co-creator of Apache Arrow, and bestselling author of "Python for Data Analysis" — for a SuperDataScience episode.
Got Qs for him? Tweet them @jonkrohnlearns or send them to me on LinkedIn.
Data Science for Private Investing — LIVE with Drew Conway
This week's guest is prominent data scientist and author Dr. Drew Conway. Working at Two Sigma, one of the world's largest hedge funds, Drew leads data science for private markets (e.g., real estate, private equity).
If you aren't familiar with Drew already, he:
• Serves as Senior Vice President for data science at Two Sigma
• Co-authored the classic O'Reilly Media book "ML for Hackers"
• Was co-founder and CEO of Alluvium, which was acquired in 2019
• Advised countless successful data-focused startups (e.g., Yhat, Reonomy)
• Obtained a PhD in politics from New York University
In this episode, he covers:
• What private investing is
• How data science can lead to better private investment decisions
• The differences between creating and executing models for public markets (such as stock exchanges) relative to private markets
• What he looks for in the data scientists he hires and how he interviews them
This is a special SuperDataScience episode because it's the first one recorded live in front of an audience (at the New York R Conference in September). Eloquent Drew was the willing guinea pig for this experiment, which was a great success: We filmed in a single unbroken take and fielded excellent audience questions.
Listen or watch here.