Five-Minute Friday this week is a high-level introduction to Classification and Regression problems — two of the main categories of problems tackled by Machine Learning algorithms.
Listen or watch here.
Deep Reinforcement Learning for Robotics with Pieter Abbeel
Very special guest this week! Pieter Abbeel is a serial A.I. entrepreneur, host of the star-studded The Robot Brains Podcast, and the world's preeminent researcher of Deep Reinforcement Learning applications.
As a professor of Electrical Engineering and Computer Science at the University of California, Berkeley, Pieter directs the Berkeley Robot Learning Lab and co-directs the Berkeley A.I. Research Lab.
As an entrepreneur, he's been exceptionally successful at applying machine learning for commercial value. Gradescope, a machine learning company in the education technology space that he co-founded, was acquired in 2018. And the A.I. robotics firm Covariant, which he co-founded more recently, has raised $147 million so far, including raising $80 million in a Series C funding round in July.
In this episode, Pieter eloquently discusses:
• His exciting current research in the field of Deep Reinforcement Learning
• Top learning resources and skills for becoming an expert in A.I. robotics
• How academic robotics research is vastly different from R&D for industry
• Productivity tips
• Traits he looks for in data scientists he hires
• Skills to succeed as a data scientist in the coming decades
He also had time to answer thoughtful questions from distinguished SuperDataScience listeners Serg Masís and Hsieh-Yu Li.
Listen or watch here.
Managing Imposter Syndrome
The Five-Minute Friday episode this week is on Imposter Syndrome, including what it is and how to manage it.
Thanks to Nikolay for the episode idea and Micayla for doing most of my homework for it!
Listen or watch here.
Statistical Programming with Friends with Jared Lander
This week's guest is THE Jared Lander! He fills us in on real-life communities that support learning about — and effectively applying — open-source statistical-programming languages like Python and R.
In addition, Jared:
• Overviews what data-science consulting is like (with fascinating use-cases from industrial metallurgy to "Money Ball"-ing for the Minnesota Vikings)
• Details the hard and soft skills of successful data-science consultants
• Ventures eloquently into the age-old R versus Python debate
Jared leads the New York Open Statistical Programming Meetup, the world's largest R meetup, which also features other open-source programming languages like Python and hosts talks from global leaders in data science and machine learning. And Jared runs the R Conference, which is approaching its seventh annual iteration next week, Sep 9-10.
Jared also wrote the bestselling book "R for Everyone" and teaches stats at both Columbia University in the City of New York and Princeton University. And none of the massive responsibilities that I've just mentioned are Jared's day job! Nope, for that he's the CEO and Chief Data Scientist of Lander Analytics, a data-science consulting firm.
Watch or listen here.
P.S.: Jared is kindly providing 20% off admission to next week's R Conference using promo code SDS20. See rstats.nyc for more details, including the first-ever live episode of SuperDataScience (with Drew Conway as guest)!
Yoga Nidra
Episode 500 of the SuperDataScience podcast is live today! For this special occasion, world-class yogi Jes Allen guides us through a full, deep session of Yoga Nidra — a centering and transformative meditation-like experience.
I'm so excited to share this practice with you and can't wait to hear what you think of it! Thank you to all of you listeners — as well as of course SuperDataScience founder / 400-plus-episode-host Kirill Eremenko — for bringing this podcast to where it is today. And none of this would be possible without the hundreds of inspiring guests we've had over the years, the indefatigable show manager Ivana, and the awesome production team: Mario, Jaime, and JP.
I am honored and grateful to be able to serve all of you and walk alongside you in your data-science career journey. Keep on rockin'! 🎸
You can listen to or watch the episode here.
Data Meshes and Data Reliability
The fun and brilliant Barr Moses joins me this week to detail for us what organization-transforming Data Meshes are, as well as how to track and improve the "Data Uptime" (reliability) of your production systems.
Barr is co-founder and CEO of Monte Carlo, a venture capital-backed start-up that has grown in head count by a remarkable 10x in the past year. Monte Carlo specializes in data reliability, making sure that the data pipelines used for decision-making or production models are available 24/7 and that the data are high quality.
In this SuperDataScience episode, Barr covers:
• What data reliability is, including how we can monitor for the "good pipelines, bad data" problem
• How reliable data enables the creation of a Data Mesh that empowers data-driven decision-makers across all of the departments of a company to independently create and analyze data
• How to build a data science team
• How to get a data-focused start-up off the ground, generating revenue, and rapidly scaling up
In addition, Barr took time to answer questions from listeners, including those from Svetlana, Bernard, and A Ramesh. Thanks to Scott Hirleman for suggesting Barr as a guest on the show and thanks to Molly Vorwerck for ensuring everything ran perfectly.
Listen or watch here.
How Only Beginners Know Everything
For Five-Minute Friday, I review a paradoxical pattern I've noticed in myself and in many early-career data scientists: We think we know everything. That is, until we advance past being novices and discover we're not so great.
Listen or watch here.
Maximizing the Global Impact of Your Career
This week, expert Benjamin Todd details how you can find purpose in your work and maximize the global impact of your career. In particular, he emphasizes how data scientists can exert a massive positive influence.
In this mind-expanding and exceptionally inspiring episode, Ben details:
• An effective process for evaluating next steps in your career
• A data-driven guide to the most valuable skills for you to obtain regardless of profession
• Specific impact-maximizing career options that are available to data scientists and related professionals, such as ML engineers and software developers
Ben has invested the past decade researching how people can have the most meaningful and impactful careers. This research is applied to great effect via his charity 80,000 Hours, which is named after the typical number of hours worked in a human lifetime. The Y Combinator-backed charity has reached over eight million people via its richly detailed, exceptionally thoughtful, and 100% free content and coaching.
Listen or watch here.
A Brain-Computer Interface Story
For Five-Minute Friday this week, I tried something different: I wrote a short sci-fi story! Let me know if you liked it or hated it and, based on your feedback, I'll either do more of it or consider never doing it again :)
Watch or listen here.
Successful AI Projects and AI Startups
This week, the rockstar Greg Coquillo fills us in on how to get a return on investment in A.I. projects and A.I. start-ups. He also introduces Quantum Machine Learning.
In addition, through responding to audience questions, Greg details:
• Element AI's maturity framework for A.I. businesses
• How A.I. startup success comes from understanding your long-term business strategy while iterating tactically
• How machines typically are much faster than people but tend to be less accurate
(Thanks to Bernard, Serg, Kenneth, Nikolay, and Yousef for the questions!)
Greg is LinkedIn's current "Top Voice for A.I. and Data Science". When he's not sharing succinct summaries of both technically-oriented and commercially-oriented A.I. developments with his LinkedIn followers, Greg's a technology manager at Amazon's global HQ in Seattle. Originally from Haiti, Greg obtained his degrees in industrial engineering and engineering management from the University of Florida before settling into a series of management-level process-engineering roles.
Listen or watch here.
Bringing Data to the People
This week's guest is super-cool Anjali Shrivastava. Anjali makes data accessible and broadly appealing by analyzing pop culture — from TikTok mansions to Star Wars timelines — in her fun and creative YouTube videos.
Anjali is an expert in data-science visualization. She has used this skill set to engineer visualizations of data in production systems in a number of roles and recently took up a data science role at the lab technology giant Thermo Fisher Scientific.
We dig into her technical expertise, including her favorite software tools and applications for viz. We also discuss Anjali's mission to bring a face to data, which she accomplishes through journalism as well as through her brilliant and fun "Vastava" YouTube channel.
Anjali holds dual degrees from the prestigious University of California, Berkeley in data science, as well as in industrial engineering and operations research. A recent graduate, she fills us in on what a data science degree curriculum is like at a top university like Berkeley, as well as how anyone can access their world-class data science lectures online.
Listen or watch here.
The World is Awful (and it’s Never Been Better)
Feel like the world is kinda poopy? Well, it is! BUT, COVID pandemic notwithstanding, it's also WAY better than ever before. I articulate this idea with data and charts for this week's Five-Minute Friday episode.
Thanks to Benjamin Todd for pointing me in the direction of a blog post by Max Roser (founder of Our World in Data) that formed the basis of this podcast episode.
Watch or listen here.
R in Production
Dutch national-podium-level powerlifter Veerle van Leemput joins me this week to detail how R is not only an option for production, but may in fact be the *best* production option if data models are central to your application.
Over the course of the episode, Veerle runs down for us her favorite R tools for:
• Data gathering
• Model development
• Deployment into production systems
Veerle has held a number of data-science leadership roles at Dutch companies. She now serves as Managing Director and Head of Data Science at Analytic Health, a London-based firm that builds data-centric software for the healthcare industry. And she was the silver medalist in the 57kg class of the 2021 Dutch national powerlifting championships with a total of 335kg (~739 pounds) across the back squat, bench press, and deadlift.
Listen or watch here.
Say No to Pie Charts
Public Service Announcement for this week's Five-Minute Friday: Don't use pie charts! (Nor, in almost all circumstances, ANY circular chart!)
Listen or watch here.
DataScienceGO This Weekend
The DataScienceGO conference is this weekend — registration for Friday and Saturday is 100% free! I'm speaking Saturday on the pros and cons of TensorFlow vs PyTorch for training and deploying deep-learning models.
Awesome speakers — whom you may already be familiar with from recent SuperDataScience episodes — include:
• Erica Greene (episode # 435)
• Harpreet Sahota (# 457)
• Andrew Jones (# 483)
I don't (yet!) personally know the other speakers pictured here but their weighty reputations precede them and I'm looking forward to getting to know them better over the course of the weekend: Gabriela de Queiroz, Karen JEAN-FRANCOIS, Yudan Lin, Ken Jee, and Danny Ma.
Free registration here!
Monetizing Machine Learning
This week's guest is the legendary Vin Vashishta! Vin details his A.I. commercialization strategy, which allows data science teams and machine learning companies alike to be profitable and successful long-term.
Vin is founder of and chief data scientist at V Squared, his own consulting practice that specializes in monetizing machine learning by helping Fortune 100 companies with A.I. strategy. He's also the creator of several platforms (including The ML Rebellion) for learning about critical skill gaps related to artificial intelligence such as commercial strategy, data science leadership, and model explainability.
In addition to the episode's focus on A.I. strategy, Vin answers questions from SuperDataScience listeners (thanks, Serg, Joe, Daniel, Nikhil, and Michael!), including on:
• Efficiency gains from no-code or low-code machine learning tools
• The biggest skills gaps that data scientists have
• The most disturbing data sets
• Investing in socially beneficial models
• The most challenging problem with commercializing AI
Listen or watch here.
(With thanks to Harpreet Sahota for another stellar guest suggestion!)
The Price of Your Attention
Time is money. Every second of your life is yours to use and one of the options you have is to generate income. You can do this hourly, or, as a data scientist, invest time in a digitally-sharable product with a huge potential ROI.
Listen or watch here.
TensorFlow vs PyTorch @ DataScienceGO Virtual
The DataScienceGO Virtual conference is coming up next Saturday and it is FREE! I'm giving a talk on TensorFlow vs PyTorch with lots of time for audience questions.
Fixing Dirty Data
My guest this week is the fixer of dirty data herself, the one and only Susan Walsh. We have a lot of laughs in this episode as we discuss how organizations can save substantial sums by tidying up their data.
Susan has worked for a decade as a data-quality specialist for a wide range of firms across the private and public sectors. For the past four years, she's been doing this work as the founder and managing director of her own company, The Classification Guru Ltd. She's also the author of the forthcoming book, "Between the Spreadsheets", and she hosts her own video interview show called "Live from the Data Den".
Listen or watch here.
The History of Calculus
Y'all seem to love these "History of..." episodes, so for Five-Minute Friday this week, here's another one. It's on the History of Calculus! Enjoy 😄
(Leibniz and Newton, who independently devised modern calculus around the same time, are pictured.)
Listen or watch here.