Baseball has always been a game of numbers. For decades, teams have pored over stats like batting averages and ERAs to gain an edge. But in recent years, artificial intelligence has taken baseball analytics to new heights. In today’s episode, we’ll explore how AI is revolutionizing baseball – from scouting and player performance to in-game strategy and even fan experience – and what that means for the future of sports and other industries.
Microsoft’s “Majorana 1” Chip Brings Quantum ML Closer
Microsoft’s Majorana 1 is a newly unveiled quantum computing chip that marks a major breakthrough in the quest for practical quantum computers. It’s the world’s first quantum processor built on a so-called Topological Core architecture – meaning it uses topological qubits (based on exotic Majorana particles that I’ll dig into more shortly) instead of the fragile qubits found in today’s machines. Microsoft believes this innovation could accelerate the timeline for solving real-world, industrial-scale problems with quantum computing from “decades” to just a few years.
NoSQL Is Ideal for AI Applications, with MongoDB’s Richmond Alake
In today's episode (#871), I'm joined by the gifted writer, speaker and ML developer Richmond Alake, who details what NoSQL databases are and why they're ideally suited for A.I. applications.
Richmond:
Is Staff Developer Advocate for AI and Machine Learning at MongoDB, a huge publicly-listed database company with over 5000 employees and over a billion dollars in annual revenue.
With Andrew Ng, he co-developed the DeepLearning.AI course “Prompt Compression and Query Optimization” that has been undertaken by over 13,000 people since its release last year.
Has delivered his courses on Coursera, DataCamp, and O'Reilly.
Authored 200+ technical articles with over a million total views, including as a writer for NVIDIA.
Previously held roles as an ML Architect, Computer Vision Engineer and Web Developer at a range of London-based companies.
Holds a Master’s in computer vision, machine learning and robotics from The University of Surrey in the UK.
Today's episode (filmed in-person at MongoDB's London HQ!) will appeal most to hands-on practitioners like data scientists, ML engineers and software developers, but Richmond does a stellar job of introducing technical concepts so any interested listener should enjoy the episode.
In today’s episode, Richmond details:
How NoSQL databases like MongoDB differ from relational, SQL-style databases.
Why NoSQL databases like MongoDB are particularly well-suited for developing modern A.I. applications, including Agentic A.I. applications.
How Mongo incorporates a native vector database, making it particularly well-suited to RAG (retrieval-augmented generation).
Why 2025 marks the beginning of the "multi-era" that will transform how we build A.I. systems.
His powerful framework for building winning A.I. strategies in today's hyper-competitive landscape.
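For readers who want a concrete feel for the document model and native vector search mentioned above, here's a minimal, self-contained Python sketch. To be clear, the field names and tiny three-dimensional "embeddings" are made-up illustrations, and the brute-force similarity search is just a stand-in for a real vector index like MongoDB Atlas Vector Search:

```python
import math

# A MongoDB-style document: flexible, nested fields live in one record,
# rather than being spread across several relational tables.
product = {
    "name": "Trail Running Shoe",
    "reviews": [{"user": "ana", "stars": 5}],
    "embedding": [0.12, 0.85, 0.33],  # hypothetical 3-d text embedding
}

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def vector_search(query_vec, docs, k=1):
    """Naive nearest-neighbour scan, standing in for a real vector index."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )
    return ranked[:k]

docs = [
    product,
    {"name": "Espresso Machine", "reviews": [], "embedding": [0.91, 0.05, 0.10]},
]
best = vector_search([0.10, 0.90, 0.30], docs)[0]
print(best["name"])  # the document whose embedding best matches the query
```

In a RAG pipeline, the top-k documents retrieved this way would be passed to an LLM as context; having the documents and their embeddings in one database is exactly the convenience Richmond describes.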
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI’s “Deep Research”: Get Days of Human Work Done in Minutes
What does Deep Research do?
AI Should Make Humans Wiser (But It Isn’t), with Varun Godbole
Today's trippy, brain-stimulating episode features Varun Godbole, a former Google Gemini LLM researcher who’s turned his attention to the future implications of the crazy-fast-moving exponential moment we're in.
Varun:
Spent the past decade doing Deep Learning research at Google, across pure and applied research projects.
For example, he was co-first author of a Nature paper where a neural network beat expert radiologists at detecting tumors.
Also co-authored the Deep Learning Tuning Playbook (that has nearly 30,000 stars on GitHub!) and, more recently, the LLM Prompt Tuning Playbook.
He's worked on engineering LLMs so that they generate code and most recently spent a few years as a core member of the Gemini team at Google.
Holds a degree in Computer Science as well as in Electrical and Electronic Engineering from The University of Western Australia.
Varun mostly keeps today’s episode high-level so it should appeal to anyone who, like me, is trying to wrap their head around how vastly different society could be in a few years or decades as a result of abundant intelligence.
In today’s episode, Varun details:
How human relationship therapy has helped him master A.I. prompt engineering.
Why focusing on A.I. agents so much today might be the wrong approach — and what we should focus on instead.
How the commoditization of knowledge could make wisdom the key differentiator in tomorrow's economy.
Why the future may belong to "full-stack employees" rather than traditional specialized roles.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLMs and Agents Are Overhyped, with Dr. Andriy Burkov
Andriy Burkov's ML books are mega-bestsellers and his newsletter has a wild 900,000 subscribers. He seldom does interviews so don't miss today's episode, in which he takes compelling, contrarian views on LLMs and agents.
More on Dr. Burkov:
His indispensable "100-Page Machine Learning Book" seems to be on *every* data scientist / ML engineer's bookshelf.
He also wrote "ML Engineering" and his latest book, "The 100-Page Language Model Book", was released this year to rave reviews.
His "Artificial Intelligence" newsletter is subscribed to by 900,000 people on LinkedIn.
He's the Machine Learning Lead at TalentNeuron, a global labor-market analytics provider.
He runs his own book-publishing company, True Positive Inc.
Previously held data science / ML roles at Gartner, Fujitsu and more.
Holds a PhD in Computer Science (A.I.) from Université Laval in Quebec, where his doctoral dissertation focused on multi-agent decision-making — 15 years ago!
Despite Dr. Burkov’s deep technical expertise, most of today’s episode should appeal to anyone interested in A.I. (although some parts here and there will be particularly appealing to hands-on machine-learning practitioners).
In today’s episode, Andriy details:
Why he believes AI agents are destined to fail.
How he managed to create a chatbot that never hallucinates — by deliberately avoiding LLMs.
Why he thinks DeepSeek AI crushed Bay Area A.I. leaders like OpenAI and Anthropic.
What makes human intelligence unique from all other animals and why A.I. researchers need to crack this in order to attain human-level intelligence in machines.
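Andriy explains his own system in the episode itself, but the general idea behind a chatbot that cannot hallucinate is worth sketching: a retrieval-only bot whose every reply comes verbatim from a curated knowledge base. The FAQ entries and the keyword-overlap matching rule below are my own illustrative assumptions, not Burkov's actual approach:

```python
# A retrieval-only bot: every reply comes verbatim from a curated knowledge
# base, so the bot may decline to answer but can never invent facts.
FAQ = {
    "refund policy": "Refunds are available within 30 days of purchase.",
    "shipping time": "Orders ship within 2 business days.",
}

def answer(question: str) -> str:
    q_words = set(question.lower().replace("?", "").replace(".", "").split())
    # Pick the stored entry whose key shares the most words with the question.
    best_key, best_overlap = None, 0
    for key in FAQ:
        overlap = len(set(key.split()) & q_words)
        if overlap > best_overlap:
            best_key, best_overlap = key, overlap
    if best_key is None:
        return "Sorry, I don't know the answer to that."
    return FAQ[best_key]

print(answer("What is your refund policy?"))
```

The trade-off is obvious: such a bot is far less flexible than an LLM, but its answers are guaranteed to come from vetted text, which is precisely the contrarian point.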
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Grow (and Sell) a Data Science Consultancy, with Cal Al-Dhubaib
Today, my ebullient long-time friend Cal Al-Dhubaib makes his debut on my podcast to spill the beans on how you can launch your own thriving (data science / A.I. / ML) consultancy and, eventually, sell it 💰
Cal:
Is Head of AI & Data Science at Further, a data and A.I. company based in Atlanta that has hundreds of employees.
Previously, he was founder and CEO of Pandata, an Ohio-based A.I. and machine learning consultancy that he grew for over eight years until it was acquired by Further a year ago.
Delivers terrific talks — don’t miss him if you have the chance!
Holds a degree in data science from Case Western Reserve University in Cleveland.
Today’s episode should appeal to any listener, particularly anyone who would like to drive revenue and profitability from data science or AI projects.
In it, Cal covers:
Why his first startup was unsuccessful, but how the experience allowed him to discover an untapped market and build Pandata, a thriving data science consultancy.
His unconventional strategy of requiring clients to make a sizable commitment up front that initially scared away clients but ultimately attracted the best ones.
The way core values inspired by his "tin can to Mars" thought experiment shaped his hiring and company culture.
How making data science "boring", helping his clients trust AI systems and delivering a clear return on investment became his formula for success.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI’s o3-mini: SOTA reasoning and exponentially cheaper
Today’s episode will fill you in on everything you need to know about an important model OpenAI recently released to the public called o3-mini.
TabPFN: Deep Learning for Tabular Data (That Actually Works!), with Prof. Frank Hutter
I've been teaching Deep Learning for a decade. In that time, countless students have been disappointed by applying DL to tabular and time-series data. Finally, thanks to Prof. Frank Hutter, that will no longer be the case!
Frank:
Is a tenured professor of machine learning and head of the Machine Learning Lab at the University of Freiburg, although he has been on leave since May to focus on…
His fellowship on AutoML and Tabular Foundation Models at the ELLIS Institute Tübingen in Germany…
As well as becoming Co-Founder and CEO of Prior Labs, a German startup that provides a commercial counterpart to his tabular deep-learning model research and open-source projects… and that has just announced a huge €9m pre-seed funding round.
Holds a PhD in Computer Science from The University of British Columbia and his research has been extremely impactful: It has been cited over 87,000 times!
Today’s episode is on the technical side and will largely appeal to hands-on practitioners like data scientists, AI/ML engineers, software developers and statisticians (especially Bayesian statisticians)!
For a bit of context: Pretty much everyone works with tabular data, either primarily or occasionally. Tabular data are data stored in a table format, that is, structured into rows and columns, where the columns might be different data types, say, some numeric, some categorical and some text. For a decade, deep learning has ushered in the A.I. era by making huge advancements across many kinds of data — pixels from cameras, sound from microphones and of course natural language — but through all of this revolution, deep learning has struggled to be impactful on ubiquitous tabular data… until now.
In today’s episode, Prof. Hutter details:
How his revolutionary transformer architecture, TabPFN, has finally cracked the code on using deep learning for tabular data and is outperforming traditionally leading approaches like gradient-boosted trees on tabular datasets.
How version 2 of TabPFN, released last month to much fanfare thanks to its publication in the prestigious journal Nature, is a massive advancement, allowing it to handle orders of magnitude more training data.
How embracing Bayesian principles allowed TabPFN v2 to work "out of the box" on time-series data, beating specialized models and setting a new state of the art on the key time-series analysis benchmark.
The breadth of verticals that TabPFN has already been applied to and how you can now get started with this (conveniently!) open-source project on your tabular data today.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in January 2025
Happy Valentine's Day 💘 ! My high-calorie gift to you is today's episode, which features the best highlights from conversations I had with the (absolutely epic!) guests I hosted on my podcast in January.
The specific conversation highlights included in today's episode are:
Famed futurist Azeem Azhar on how to break your linear mindset to prepare for the exponential technological change that we are experiencing (and will experience even more rapidly in years to come).
Global quantum-computing expert Dr. Florian Neukart on practical, real-world applications of quantum computing today.
Kirill Eremenko and Hadelin de Ponteves — who have together taught over 5 million people data science — with their 12-step checklist for selecting an appropriate foundation model (e.g., large language model) for a given application.
Brooke Hopkins (former engineer at Waymo, now founder and CEO of Y Combinator-backed startup Coval) on why you should evaluate A.I. agents with reference-free metrics.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
From Pro Athlete to Data Engineer: Colleen Fotsch’s Inspiring Journey
Colleen Fotsch won national swimming championships and was a pro athlete in both CrossFit and bobsledding. Now she's excelling at data analytics and engineering! Today, hear her fun, inspiring and practical story.
More on Colleen:
As a collegiate swimmer, she won national championships and set an American record in the relay.
As a pro CrossFit athlete, she twice competed at the “Games”, which is the highest echelon of the sport.
She then pursued a degree in data analytics while training with the US Bobsled team.
An injury ended her Olympic Bobsled team dream, but luckily she’d been pursuing that analytics career in parallel!
She began working full-time as a data analyst four years ago and has now grown into a data-engineering leadership role at a healthcare-staffing firm called CHG Healthcare in Utah, where she serves as Senior Technical Manager of their Data Platform.
Inspires her 280,000 Instagram followers on a daily basis.
Today’s episode essentially has two separate parts:
The first half focuses on Colleen’s exciting journey to the highest levels of three sports: swimming, CrossFit and bobsledding. That part should be fascinating to just about anyone.
The second half covers Colleen’s transition into data analytics and data engineering; that part will appeal to technically minded listeners, particularly those considering, or early in, a career in analytics or engineering.
In today’s episode, Colleen details:
The connection between a competitive sports mindset and data-career success.
Proven strategies for being hired into your first data role later in your career.
Why being "not smart enough" for coding was a mental block she had to overcome.
How analytics engineering bridges the gap between data engineering and analysis.
The huge benefits deskbound professionals can enjoy by including regular exercise in their week, and tips and tricks for developing or growing an exercise habit.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
DeepSeek R1: SOTA Reasoning at 1% of the Cost
In recent weeks, I’m sure you’ve noticed that there’s been a ton of excitement over DeepSeek, a Chinese A.I. company that was spun out of a Chinese hedge fund just two years ago.
BAML: The Programming Language for AI, with Vaibhav Gupta
Today's guest, Vaibhav Gupta, has developed BAML, the programming language for AI. If you are calling LLMs, you've gotta check BAML out for instant accuracy improvements and big (20-30%) cost savings.
More on charming and terrifically brilliant Vaibhav:
Founder & CEO of Boundary (YC W23), a Y Combinator-backed startup that has developed a new programming language (BAML) that makes working with LLMs easier and more efficient for developers.
Across his decade of experience as a software engineer, he built predictive pipelines and real-time computer vision solutions at Google, Microsoft and the renowned hedge fund The D. E. Shaw Group.
Holds a degree in Computer Science and Electrical Engineering from The University of Texas at Austin.
This is a relatively technical episode. The majority of it will appeal to folks who interact with LLMs or other model APIs hands-on with code.
In today’s information-dense episode, Vaibhav details:
How his company pivoted 13 times before settling upon developing a programming language for A.I.
Why creating a programming language was "really dumb" but why it’s turning out to be brilliant, including by BAML already saving companies 20-30% on their AI costs.
Fascinating parallels between today's A.I. tools and the early days of web development.
His unconventional hiring process (I’ve never heard of anything remotely close to it) and the psychology behind why it works.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Ensure AI Agents Are Accurate and Reliable, with Brooke Hopkins
Agentic A.I. is powerful because it has infinite breadth of capability. But this is a double-edged sword: Agents entail great risk and testing their performance is tricky... until now — thanks to Brooke Hopkins, today's guest!
Brooke:
Is Founder & CEO of Coval (YC S24), a San Francisco-based startup that provides a simulation and evaluation platform for A.I. agents. A few days ago, they announced a $3.3m fundraise that includes heavy-hitter VCs like General Catalyst, MaC and Y Combinator.
Previously was Tech Lead and Senior Software Engineer at Waymo, where she worked on simulation and evaluation for Waymo’s self-driving cars.
Before that, she was a Software Engineer at Google.
She holds a degree in Computer Science and Mathematics from New York University’s Abu Dhabi campus.
Despite Brooke’s highly technical background, our conversation is largely conceptual and high-level, allowing anyone who’s interested in developing and deploying Agentic A.I. applications to enjoy today’s episode.
In today’s episode, Brooke details:
How simulation and testing best practices inspired by autonomous-vehicle development are being applied by her team at Coval to make A.I. agents useful and trustworthy in the real world.
Why voice agents are poised to be the next major platform shift after mobile, creating entirely new ways to interact with technology.
How companies are using creative strategies like "background overthinkers" to make A.I. agents more robust.
What the rise of A.I. agents means for the future of human work and creativity… indeed, how agents will transform all of society.
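Coval's actual metrics aren't detailed here, but the general idea of a reference-free evaluation, scoring an agent's behaviour against structural rules rather than against a single golden "correct" answer, can be sketched in a few lines. The trajectory format and the three rules below are hypothetical illustrations:

```python
# Reference-free evaluation: instead of comparing an agent's output to a
# golden reference answer, score properties of the trajectory itself.
def evaluate_trajectory(steps):
    """Return a 0-1 score from rule checks that need no golden answer."""
    checks = [
        steps[-1]["type"] == "final_answer",         # run actually terminates
        all(s.get("error") is None for s in steps),  # no tool call errored
        len(steps) <= 10,                            # no runaway looping
    ]
    return sum(checks) / len(checks)

trajectory = [
    {"type": "tool_call", "tool": "search_flights", "error": None},
    {"type": "final_answer", "error": None},
]
print(evaluate_trajectory(trajectory))  # 1.0
```

The appeal of this style of metric is that it scales to open-ended agent tasks where no single reference answer exists, which is exactly why it comes up in the conversation.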
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Fastest-Growing Jobs Are AI Jobs
Assessing the fastest-growing job is tricky. Job-posting data, for example, isn’t great because there could be lots of duplicate postings out there, or many of the postings could be going unfilled. Another big issue is defining exactly what a job is: The exact same responsibilities could be associated with the job title “data scientist”, “data engineer” or “ML engineer”, depending on the titles a particular company decides to go with. So whoever’s evaluating job growth ends up bucketing groups of related jobs and responsibilities into one standardized job-title bucket, these days probably in a largely automated, data-driven way. If you dug into individual examples, I’m sure you’d find lots of job-title standardizations you disagreed with, but some kind of standardization approach is essential to ensuring identical roles with slightly different titles get counted as the same thing.
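A toy version of that standardization step, mapping varied raw titles into canonical buckets via keyword rules, might look like the sketch below. The rules and bucket names are purely illustrative assumptions, not the methodology of any specific study:

```python
# Naive job-title standardization: map varied raw titles into canonical
# buckets by checking keyword rules in priority order.
RULES = [
    ("machine learning", "ML Engineer"),
    ("ml", "ML Engineer"),
    ("data scien", "Data Scientist"),
    ("data engineer", "Data Engineer"),
]

def standardize(title: str) -> str:
    t = title.lower()
    for keyword, bucket in RULES:
        if keyword in t:
            return bucket
    return "Other"

raw_titles = ["Senior Data Scientist", "ML Engineer II", "Machine Learning Engineer"]
print([standardize(t) for t in raw_titles])
```

Real studies use far more sophisticated, often model-based, normalization, but this captures why two postings with different titles can land in the same growth statistic.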
Exponential Views on AI and Humanity’s Greatest Challenges, with Azeem Azhar
Today, the famed futurist Azeem Azhar eloquently details the exponential forces that are overhauling society — and why A.I. is essential for solving humanity's biggest challenges. This is a special episode; don't miss it!
In case you aren't familiar with his legendary name already, Azeem:
Is creator of the invaluable "Exponential View" newsletter (>100k subscribers).
Hosts the "Exponential View" podcast (well-known guests include Tony Blair and Andrew Ng).
Hosted the Bloomberg TV show "Exponentially" (guests include Sam Altman).
Holds fellowships at Stanford University and Harvard Business School.
Was Founder & CEO of PeerIndex, a venture capital-backed machine-learning startup that was acquired in 2014.
He holds an MA in PPE (Politics, Philosophy and Economics) from the University of Oxford.
Today’s episode will appeal to any listener. In it, Azeem details:
The exponential forces that will overhaul society in the coming decades.
Why AI is essential for solving humanity's biggest challenges.
His own cutting-edge, personal use of A.I. agents, LLMs, and automation.
Why there's no 'solid ground' in the future of work and how we can adapt.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Generative AI for Business, with Kirill Eremenko and Hadelin de Ponteves
Craving an intro to building and deploying commercially successful Generative A.I. applications? In today's episode, superstar data-science instructors Kirill and Hadelin (>5 million students between them) will fill you in!
Kirill Eremenko is one of our two guests today. He's:
Founder and CEO of SuperDataScience, an e-learning platform.
Founded the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins four years ago.
Our second guest is Hadelin de Ponteves:
Was a data engineer at Google before becoming a content creator.
In 2020, took a break from Data Science content to produce and star in a Bollywood film featuring "Miss Universe" Harnaaz Sandhu.
Together, Kirill and Hadelin:
Have created dozens of data science courses; they are the most popular data science instructors on the Udemy platform, with over five million students between them!
They also co-founded CloudWolf, an education platform for quickly mastering Amazon Web Services (AWS) certification.
And, in today’s episode, they announce (for the first time anywhere!) another (brand-new) venture they co-founded together.
Today’s episode is intended for anyone who’s interested in real-world, commercial applications of Generative A.I. — a technical background is not required.
In today’s episode, Kirill and Hadelin detail:
What generative A.I. models like Large Language Models are and how they fit within the broader category of “Foundation Models”.
The 12 crucial factors to consider when selecting a foundation model for a given application in your organization.
The 8 steps to ensuring foundation models are deployed with commercial success.
Many real-world examples of how companies are customizing A.I. models quickly and at remarkably low cost.
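Kirill and Hadelin's actual 12 factors are covered in the episode itself; as a generic illustration of how any such selection checklist can be turned into a weighted scorecard for comparing candidate models, here's a short sketch (the factors, weights, and ratings below are entirely made up):

```python
# Weighted scorecard for comparing candidate foundation models against a
# checklist of selection criteria (illustrative factors and numbers only).
WEIGHTS = {"accuracy": 0.4, "cost": 0.3, "latency": 0.2, "license": 0.1}

candidates = {
    "model_a": {"accuracy": 9, "cost": 4, "latency": 6, "license": 10},
    "model_b": {"accuracy": 7, "cost": 9, "latency": 8, "license": 10},
}

def score(ratings):
    """Weighted sum of 0-10 ratings across the checklist factors."""
    return sum(WEIGHTS[f] * ratings[f] for f in WEIGHTS)

best = max(candidates, key=lambda name: score(candidates[name]))
print(best, round(score(candidates[best]), 2))  # model_b 8.1
```

The weights encode organizational priorities (here, accuracy matters most), so two teams running the same checklist can legitimately pick different models.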
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in December 2024
Today's "In Case You Missed It" episode... is one not to miss! Several of the most fascinating conversations I've ever had on the SuperDataScience Podcast happened in December.
The specific conversation highlights included in today's episode are:
1. The legendary Dr. Andrew Ng on why LLM cost doesn't matter for your A.I. proof of concept.
2. Building directly on Andrew's segment, CTO (and my fellow Nebula.io co-founder) Ed Donner on how to choose the right LLM for a given application.
3. Extremely intelligent and clear-spoken Dr. Eiman Ebrahimi (CEO of Protopia AI) on the future of autonomous systems and data security in our Agentic A.I. future.
4. From our 2024 recap episode, Sadie St. Lawrence's three biggest A.I. "wow" moments of the year... as well as the biggest flop of the year. (One company was behind both!)
5. Harvard/MIT humanist chaplain Greg Epstein (and bestselling author on tech in society) on the ethics of accelerating A.I. advancements. Should we, for example, consider slowing A.I. progress down?
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
2025 AI and Data Science Predictions, with Sadie St. Lawrence
Happy New Year! To prepare you for 2025, today's guest is the clairvoyant Sadie St. Lawrence, who predicts what the biggest A.I. trends will be in the year ahead. We also pick the A.I. winners and losers of 2024.
In a bit more detail, in today’s episode (which will appeal to technical and non-technical listeners alike):
• We cover how Sadie’s predictions for 2024 (which she made a year ago on this show) panned out.
• We award our “wow moment” of 2024, our comeback of the year, our disappointment of the year and our overall winner of 2024.
• And then, of course, we speculate on the five biggest trends to prepare for in 2025.
As with our 2022, 2023 and 2024 predictions episodes, our special guest again this year is Sadie St. Lawrence, who is:
• A data science and machine learning instructor whose content has been enjoyed by over 600,000 students.
• The Founder and CEO of the Human Machine Collaboration Institute, as well as founder and chair of Women In Data™️, a community of over 60,000 women across 55 countries.
• Serves on multiple start-up boards.
• Hosts the Data Bytes podcast.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI Engineering 101, with Ed Donner
My holiday gift to you is my Nebula.io co-founder Ed Donner, one of the most brilliant, articulate people I know. In today's episode, Ed introduces the exciting, in-demand "A.I. Engineer" career — what's involved and how to become one.
After working daily alongside this world-class mind and exceptional communicator for nearly a decade, it is at long last my great pleasure to have the extraordinary Ed as my podcast guest. Ed:
• Is co-founder and CTO of Nebula, a platform that leverages generative and encoding A.I. models to source, understand, engage and manage talent.
• Previously, was co-founder and CEO of an A.I. startup called untapt that was acquired in 2020.
• Prior to becoming a tech entrepreneur, Ed had a 15-year stint leading technology teams on Wall Street, at the end of which he was a Managing Director at JPMorgan Chase, leading a team of 300 software engineers.
• He holds a Master’s in Physics from the University of Oxford.
Today’s episode will appeal most to hands-on practitioners, particularly those interested in becoming an A.I. Engineer or leveling up their command of A.I. Engineering skills.
In today’s episode, Ed details:
• What an A.I. Engineer (also known as an LLM Engineer) is.
• How the data indicate A.I. Engineers are in as much demand today as Data Scientists.
• What an A.I. Engineer actually does, day to day.
• How A.I. Engineers decide which LLMs to work with for a given task, including considerations like open- vs closed-source, what model size to select and what leaderboards to follow.
• Tools for efficiently training and deploying LLMs.
• LLM-related techniques including RAG and Agentic A.I.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.