Baseball has always been a game of numbers. For decades, teams have pored over stats like batting averages and ERAs to gain an edge. But in recent years, artificial intelligence has taken baseball analytics to new heights. In today’s episode, we’ll explore how AI is revolutionizing baseball – from scouting and player performance to in-game strategy and even fan experience – and what that means for the future of sports and other industries.
Become Your Best Self Through AI Augmentation — feat. Natalie Monbiot
The deep-thinking and highly articulate Natalie Monbiot returns to my podcast today for a can't-miss episode (one of my favorite convos ever) on how A.I. will overhaul our lives, our work and our society in the coming years.
More on Natalie:
Through her consultancy, Virtual Human Economy, she advises startups like Wizly and investment firms like Blue Tulip Ventures on virtual humans and A.I. clones.
Was previously Head of Strategy at Hour One, a leading virtual-human video-generation startup.
Regularly speaks at the world's largest conferences, including Web Summit and SXSW.
Holds a Master's in Modern Languages and Literature from the University of Oxford.
Today’s fascinating episode will be of great interest to all listeners. In it, Natalie details:
How A.I. is making us dumber — and what we can do about it.
Why the "virtual human economy" could be the next evolution of human civilization.
The two states of being humans are seeking (and how A.I. could help us achieve them).
Why focusing on merely 10x’ing our capabilities misses the much bigger opportunity of A.I.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Microsoft’s “Majorana 1” Chip Brings Quantum ML Closer
Microsoft’s Majorana 1 is a newly unveiled quantum computing chip that marks a major breakthrough in the quest for practical quantum computers. It’s the world’s first quantum processor built on a so-called Topological Core architecture – meaning it uses topological qubits (based on exotic Majorana particles that I’ll dig into more shortly) instead of the fragile qubits found in today’s machines. Microsoft believes this innovation could accelerate the timeline for solving real-world, industrial-scale problems with quantum computing from “decades” to just a few years.
NoSQL Is Ideal for AI Applications, with MongoDB’s Richmond Alake
In today's episode (#871), I'm joined by the gifted writer, speaker and ML developer Richmond Alake, who details what NoSQL databases are and why they're ideally suited for A.I. applications.
Richmond:
Is Staff Developer Advocate for AI and Machine Learning at MongoDB, a huge publicly-listed database company with over 5000 employees and over a billion dollars in annual revenue.
With Andrew Ng, he co-developed the DeepLearning.AI course “Prompt Compression and Query Optimization” that has been undertaken by over 13,000 people since its release last year.
Has delivered his courses on Coursera, DataCamp, and O'Reilly.
Authored 200+ technical articles with over a million total views, including as a writer for NVIDIA.
Previously held roles as an ML Architect, Computer Vision Engineer and Web Developer at a range of London-based companies.
Holds a Master’s in computer vision, machine learning and robotics from The University of Surrey in the UK.
Today's episode (filmed in-person at MongoDB's London HQ!) will appeal most to hands-on practitioners like data scientists, ML engineers and software developers, but Richmond does a stellar job of introducing technical concepts so any interested listener should enjoy the episode.
In today’s episode, Richmond details:
How NoSQL databases like MongoDB differ from relational, SQL-style databases.
Why NoSQL databases like MongoDB are particularly well-suited for developing modern A.I. applications, including Agentic A.I. applications.
How Mongo incorporates a native vector database, making it particularly well-suited to RAG (retrieval-augmented generation).
Why 2025 marks the beginning of the "multi-era" that will transform how we build A.I. systems.
His powerful framework for building winning A.I. strategies in today's hyper-competitive landscape.
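To make the native-vector-database point above concrete, here is a minimal sketch of what a RAG retrieval query against MongoDB Atlas Vector Search can look like, expressed as a standard aggregation pipeline. The index name, field names and the toy embedding are hypothetical placeholders, not anything from the episode:

```python
# Illustrative sketch of a MongoDB Atlas $vectorSearch stage for RAG retrieval.
# Index name, field names and the embedding below are hypothetical placeholders.

query_embedding = [0.12, -0.07, 0.33]  # in practice, produced by an embedding model

vector_search_stage = {
    "$vectorSearch": {
        "index": "vector_index",      # hypothetical Atlas Vector Search index name
        "path": "embedding",          # document field holding the stored vectors
        "queryVector": query_embedding,
        "numCandidates": 100,         # breadth of the approximate nearest-neighbor search
        "limit": 5,                   # top-k documents to return
    }
}

# A RAG application would then project out the text (plus similarity score)
# to stuff into the LLM's context window:
pipeline = [
    vector_search_stage,
    {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
]
```

With a pymongo `Collection`, something like `collection.aggregate(pipeline)` would then return the top five most semantically similar documents, ready to be passed to an LLM as retrieved context.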
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI’s “Deep Research”: Get Days of Human Work Done in Minutes
What does Deep Research do?
AI Should Make Humans Wiser (But It Isn’t), with Varun Godbole
Today's trippy, brain-stimulating episode features Varun Godbole, a former Google Gemini LLM researcher who’s turned his attention to the future implications of the crazy-fast-moving exponential moment we're in.
Varun:
Spent the past decade doing Deep Learning research at Google, across pure and applied research projects.
For example, he was co-first author of a Nature paper where a neural network beat expert radiologists at detecting tumors.
Also co-authored the Deep Learning Tuning Playbook (that has nearly 30,000 stars on GitHub!) and, more recently, the LLM Prompt Tuning Playbook.
He's worked on engineering LLMs to generate code and, most recently, spent a few years as a core member of the Gemini team at Google.
Holds a degree in Computer Science as well as in Electrical and Electronic Engineering from The University of Western Australia.
Varun mostly keeps today’s episode high-level so it should appeal to anyone who, like me, is trying to wrap their head around how vastly different society could be in a few years or decades as a result of abundant intelligence.
In today’s episode, Varun details:
How human relationship therapy has helped him master A.I. prompt engineering.
Why focusing on A.I. agents so much today might be the wrong approach — and what we should focus on instead.
How the commoditization of knowledge could make wisdom the key differentiator in tomorrow's economy.
Why the future may belong to "full-stack employees" rather than traditional specialized roles.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in February 2025
February was another insane month on my podcast. In addition to having stunning smiles, all four guests I hosted are fascinating, highly knowledgeable experts. Today's episode features highlights of my convos with them.
The specific conversation highlights included in today's episode are:
Professional-athlete-turned-data-engineer Colleen Fotsch on how dbt simplifies data modeling and documentation.
Engineer-turned-entrepreneur Vaibhav Gupta on the new programming language, BAML, he created for AI applications. He details how BAML will save you time and a considerable amount of money when calling LLM APIs.
Professor Frank Hutter on TabPFN, the first deep learning approach to become the state of the art for modeling tabular data (i.e., the structured rows and columns of data that, until now, deep learning was feeble at modeling).
The ebullient Cal Al-Dhubaib on the keys to scaling (and selling!) a thriving data science consultancy.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLMs and Agents Are Overhyped, with Dr. Andriy Burkov
Andriy Burkov's ML books are mega-bestsellers and his newsletter has a wild 900,000 subscribers. He seldom does interviews so don't miss today's episode, in which he takes compelling, contrarian views on LLMs and agents.
More on Dr. Burkov:
His indispensable "100-Page Machine Learning Book" seems to be on *every* data scientist / ML engineer's bookshelf.
He also wrote "ML Engineering" and his latest book, "The 100-Page Language Model Book", was released this year to rave reviews.
His "Artificial Intelligence" newsletter is subscribed to by 900,000 people on LinkedIn.
He's the Machine Learning Lead at TalentNeuron, a global labor-market analytics provider.
He runs his own book-publishing company, True Positive Inc.
Previously held data science / ML roles at Gartner, Fujitsu and more.
Holds a PhD in Computer Science (A.I.) from Université Laval in Quebec, where his doctoral dissertation focused on multi-agent decision-making — 15 years ago!
Despite Dr. Burkov being such a technical individual, most of today’s episode should appeal to anyone interested in A.I. (although some parts here and there will be particularly appealing to hands-on machine-learning practitioners).
In today’s episode, Andriy details:
Why he believes AI agents are destined to fail.
How he managed to create a chatbot that never hallucinates — by deliberately avoiding LLMs.
Why he thinks DeepSeek AI crushed Bay Area A.I. leaders like OpenAI and Anthropic.
What makes human intelligence distinct from that of all other animals, and why A.I. researchers need to crack this in order to attain human-level intelligence in machines.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Bringing Back Extinct Animals like the Woolly Mammoth and Dodo Bird
For this week’s Five-Minute Friday-style episode, I’m diving into a biotechnology story I found mind-blowing: bringing back extinct animals like the woolly mammoth and the dodo bird.
How to Grow (and Sell) a Data Science Consultancy, with Cal Al-Dhubaib
Today, my ebullient long-time friend Cal Al-Dhubaib makes his debut on my podcast to spill the beans on how you can launch your own thriving (data science / A.I. / ML) consultancy and, eventually, sell it 💰
Cal:
Is Head of AI & Data Science at Further, a data and A.I. company based in Atlanta that has hundreds of employees.
Previously, he was founder and CEO of Pandata, an Ohio-based A.I. and machine learning consultancy that he grew for over eight years until it was acquired by Further a year ago.
Delivers terrific talks — don’t miss him if you have the chance!
Holds a degree in data science from Case Western Reserve University in Cleveland.
Today’s episode should appeal to any listener, particularly anyone who would like to drive revenue and profitability from data science or AI projects.
In it, Cal covers:
Why his first startup was unsuccessful, but how the experience allowed him to discover an untapped market and build Pandata, a thriving data science consultancy.
His unconventional strategy of requiring a sizable up-front commitment, which initially scared away some prospective clients but ultimately attracted the best ones.
The way core values inspired by his "tin can to Mars" thought experiment shaped his hiring and company culture.
How making data science "boring", helping his clients trust AI systems and delivering a clear return on investment became his formula for success.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI’s o3-mini: SOTA reasoning and exponentially cheaper
Today’s episode will fill you in on everything you need to know about an important model OpenAI recently released to the public called o3-mini.
TabPFN: Deep Learning for Tabular Data (That Actually Works!), with Prof. Frank Hutter
I've been teaching Deep Learning for a decade. In that time, countless students have been disappointed by applying DL to tabular and time-series data. Finally, thanks to Prof. Frank Hutter, that will no longer be the case!
Frank:
Is a tenured professor of machine learning and head of the Machine Learning Lab at the University of Freiburg, although he has been on leave since May to focus on…
His fellowship on AutoML and Tabular Foundation Models at the ELLIS Institute Tübingen in Germany…
As well as becoming Co-Founder and CEO of Prior Labs, a German startup that provides a commercial counterpart to his tabular deep-learning model research and open-source projects… and that has just announced a huge €9m pre-seed funding round.
Holds a PhD in Computer Science from The University of British Columbia and his research has been extremely impactful: It has been cited over 87,000 times!
Today’s episode is on the technical side and will largely appeal to hands-on practitioners like data scientists, AI/ML engineers, software developers and statisticians (especially Bayesian statisticians)!
For a bit of context: Pretty much everyone works with tabular data, either primarily or occasionally. Tabular data are data stored in a table format, i.e., structured into rows and columns, where the columns might be different data types, say, some numeric, some categorical and some text. For a decade, deep learning has ushered in the A.I. era by making huge advancements across many kinds of data — pixels from cameras, sound from microphones and of course natural language — but through all of this revolution, deep learning has struggled to be impactful on ubiquitous tabular data… until now.
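To make the definition above concrete, here is a toy illustration of the mixed-type rows and columns that "tabular data" refers to. The column names and values are invented for illustration only:

```python
# A toy table: rows are records; columns mix numeric and text/categorical types —
# exactly the kind of heterogeneous data tabular models must handle.
# All column names and values here are made up for illustration.
rows = [
    {"age": 34, "segment": "enterprise", "note": "renewed early"},
    {"age": 51, "segment": "startup",    "note": "churn risk flagged"},
    {"age": 29, "segment": "enterprise", "note": "expanded contract"},
]

# Inspect each column's Python type — a tabular model has to cope with all of
# these at once, which is part of why deep learning long struggled here:
column_types = {col: type(rows[0][col]).__name__ for col in rows[0]}
print(column_types)  # {'age': 'int', 'segment': 'str', 'note': 'str'}
```

In practice you would hold such a table in a DataFrame or database rather than a list of dicts; the point is simply that, unlike images or audio, every column can carry a different type and meaning.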
In today’s episode, Prof. Hutter details:
How his revolutionary transformer architecture, TabPFN, has finally cracked the code on using deep learning for tabular data and is outperforming traditionally leading approaches like gradient-boosted trees on tabular datasets.
How version 2 of TabPFN, released last month to much fanfare thanks to its publication in the prestigious journal Nature, is a massive advancement, allowing it to handle orders of magnitude more training data.
How embracing Bayesian principles allowed TabPFN v2 to work "out of the box" on time-series data, beating specialized models and setting a new state of the art on the key time-series analysis benchmark.
The breadth of verticals that TabPFN has already been applied to and how you can now get started with this (conveniently!) open-source project on your tabular data today.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in January 2025
Happy Valentine's Day 💘 ! My high-calorie gift to you is today's episode, which features the best highlights from conversations I had with the (absolutely epic!) guests I hosted on my podcast in January.
The specific conversation highlights included in today's episode are:
Famed futurist Azeem Azhar on how to break your linear mindset to prepare for the exponential technological change that we are experiencing (and will experience even more rapidly in years to come).
Global quantum-computing expert Dr. Florian Neukart on practical, real-world applications of quantum computing today.
Kirill Eremenko and Hadelin de Ponteves — who have together taught over 5 million people data science — with their 12-step checklist for selecting an appropriate foundation model (e.g., large language model) for a given application.
Brooke Hopkins (former engineer at Waymo, now founder and CEO of Y Combinator-backed startup Coval) on why you should evaluate A.I. agents with reference-free metrics.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
From Pro Athlete to Data Engineer: Colleen Fotsch’s Inspiring Journey
Colleen Fotsch won national swimming championships and was a pro athlete in both CrossFit and bobsledding. Now she's excelling at data analytics and engineering! Today, hear her fun, inspiring and practical story.
More on Colleen:
As a collegiate swimmer, she won national championships and set an American record in the relay.
As a pro CrossFit athlete, she twice competed at the “Games”, which is the highest echelon of the sport.
And then she simultaneously pursued a degree in data analytics while training with the US Bobsled team.
An injury ended her Olympic Bobsled team dream, but luckily she’d been pursuing that analytics career in parallel!
She began working full-time as a data analyst four years ago and has now grown into a data-engineering leadership role at a healthcare-staffing firm called CHG Healthcare in Utah, where she serves as Senior Technical Manager of their Data Platform.
Inspires her 280,000 Instagram followers on a daily basis.
Today’s episode essentially has two separate parts:
The first half focuses on Colleen’s exciting journey to the highest levels of three sports: swimming, CrossFit and bobsledding. That part should be fascinating to just about anyone.
The second half covers Colleen’s transition into data analytics and data engineering; that part will appeal to technically minded listeners, particularly those considering, or early in, a career in analytics or engineering.
In today’s episode, Colleen details:
The connection between a competitive sports mindset and data-career success.
Proven strategies for being hired into your first data role later in your career.
Why being "not smart enough" for coding was a mental block she had to overcome.
How analytics engineering bridges the gap between data engineering and analysis.
The huge benefits deskbound professionals can enjoy by including regular exercise in their week, and tips and tricks for developing or growing an exercise habit.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
DeepSeek R1: SOTA Reasoning at 1% of the Cost
In recent weeks, I’m sure you’ve noticed that there’s been a ton of excitement over DeepSeek, a Chinese A.I. company that was spun out of a Chinese hedge fund just two years ago.
BAML: The Programming Language for AI, with Vaibhav Gupta
Today's guest, Vaibhav Gupta, has developed BAML, the programming language for AI. If you are calling LLMs, you've gotta check BAML out for instant accuracy improvements and big (20-30%) cost savings.
More on the charming and terrifically brilliant Vaibhav:
Founder & CEO of Boundary (YC W23), a Y Combinator-backed startup that has developed a new programming language (BAML) that makes working with LLMs easier and more efficient for developers.
Across his decade of experience as a software engineer, he built predictive pipelines and real-time computer vision solutions at Google, Microsoft and the renowned hedge fund The D. E. Shaw Group.
Holds a degree in Computer Science and Electrical Engineering from The University of Texas at Austin.
This is a relatively technical episode. The majority of it will appeal to folks who interact with LLMs or other model APIs hands-on with code.
In today’s information-dense episode, Vaibhav details:
How his company pivoted 13 times before settling upon developing a programming language for A.I.
Why creating a programming language was "really dumb" but why it’s turning out to be brilliant, including by BAML already saving companies 20-30% on their AI costs.
Fascinating parallels between today's A.I. tools and the early days of web development.
His unconventional hiring process (I’ve never heard of anything remotely close to it) and the psychology behind why it works.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Are You The Account Executive We’re Looking For?
We’ve never done an episode like today’s… instead of covering a specific data-science topic, in today’s episode I’m letting you know about a critical role that we’re hiring for on the SuperDataScience Podcast. Perhaps you are the person we’re looking for, or you know who is!
How to Ensure AI Agents Are Accurate and Reliable, with Brooke Hopkins
Agentic A.I. is powerful because it has near-infinite breadth of capability. But this is a double-edged sword: agents entail great risk, and testing their performance is tricky... until now — thanks to Brooke Hopkins, today's guest!
Brooke:
Is Founder & CEO of Coval (YC S24), a San Francisco-based startup that provides a simulation and evaluation platform for A.I. agents. A few days ago, they announced a $3.3m fundraise that includes heavy-hitter VCs like General Catalyst, MaC and Y Combinator.
Previously was Tech Lead and Senior Software Engineer at Waymo, where she worked on simulation and evaluation for Waymo’s self-driving cars.
Before that, she was a Software Engineer at Google.
She holds a degree in Computer Science and Mathematics from New York University’s Abu Dhabi campus.
Despite Brooke’s highly technical background, our conversation is largely conceptual and high-level, allowing anyone who’s interested in developing and deploying Agentic A.I. applications to enjoy today’s episode.
In today’s episode, Brooke details:
How simulation and testing best practices inspired by autonomous-vehicle development are being applied by her team at Coval to make A.I. agents useful and trustworthy in the real world.
Why voice agents are poised to be the next major platform shift after mobile, creating entirely new ways to interact with technology.
How companies are using creative strategies like "background overthinkers" to make A.I. agents more robust.
What the rise of A.I. agents means for the future of human work and creativity… indeed, how agents will transform all of society.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Fastest-Growing Jobs Are AI Jobs
Assessing the fastest-growing job is tricky. For example, using job-posting data isn’t great because there could be lots of duplicate postings out there, or many of the postings could be going unfilled. Another big issue is defining exactly what a job is: the exact same responsibilities could be associated with the job title “data scientist”, “data engineer” or “ML engineer”, depending on the titles a particular company decides to go with. So whoever’s evaluating job growth ends up bucketing groups of related jobs and responsibilities into one standardized job-title bucket, these days probably in a largely automated, data-driven way. If you dug into individual examples, I’m sure you’d find lots of job-title standardizations you disagreed with, but some kind of standardization approach is essential to ensuring that identical roles with slightly different titles get counted as the same thing.
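The bucketing idea described above can be sketched in a few lines. This is a deliberately crude, invented example — real standardization pipelines use far richer, data-driven matching than keyword rules, and the bucket names and rules here are mine, not from any actual jobs report:

```python
# A minimal, hypothetical sketch of job-title standardization: collapsing
# near-identical roles with different raw titles into shared buckets.
# The buckets and keyword rules are invented for illustration; real pipelines
# use much more robust, largely automated matching.

def standardize_title(raw_title: str) -> str:
    """Map a raw job title onto a standardized bucket via simple keyword rules."""
    title = raw_title.lower()
    if "machine learning" in title or title.startswith("ml"):
        return "ML Engineer"
    if "data engineer" in title:
        return "Data Engineer"
    if "data scientist" in title or "data science" in title:
        return "Data Scientist"
    return "Other"

# Near-identical roles with slightly different titles collapse into shared buckets:
titles = ["Senior Data Scientist", "ML Engineer II", "Lead Data Engineer"]
print([standardize_title(t) for t in titles])
# ['Data Scientist', 'ML Engineer', 'Data Engineer']
```

Even this toy version shows why you might quarrel with individual standardizations (the keyword rules are blunt instruments), yet without some such mapping, identical roles would be counted as different jobs.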
Exponential Views on AI and Humanity’s Greatest Challenges, with Azeem Azhar
Today, the famed futurist Azeem Azhar eloquently details the exponential forces that are overhauling society — and why A.I. is essential for solving humanity's biggest challenges. This is a special episode; don't miss it!
In case you aren't familiar with his legendary name already, Azeem:
Is creator of the invaluable "Exponential View" newsletter (>100k subscribers).
Hosts the "Exponential View" podcast (well-known guests include Tony Blair and Andrew Ng).
Hosted the Bloomberg TV show "Exponentially" (guests include Sam Altman).
Holds fellowships at Stanford University and Harvard Business School.
Was Founder & CEO of PeerIndex, a venture capital-backed machine-learning startup that was acquired in 2014.
He holds an MA in PPE (Politics, Philosophy and Economics) from the University of Oxford.
Today’s episode will appeal to any listener. In it, Azeem details:
The exponential forces that will overhaul society in the coming decades.
Why AI is essential for solving humanity's biggest challenges.
His own cutting-edge, personal use of A.I. agents, LLMs, and automation.
Why there's no 'solid ground' in the future of work and how we can adapt.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.