Baseball has always been a game of numbers. For decades, teams have pored over stats like batting averages and ERAs to gain an edge. But in recent years, artificial intelligence has taken baseball analytics to new heights. In today’s episode, we’ll explore how AI is revolutionizing baseball – from scouting and player performance to in-game strategy and even fan experience – and what that means for the future of sports and other industries.
Become Your Best Self Through AI Augmentation — feat. Natalie Monbiot
The deep-thinking and highly articulate Natalie Monbiot returns to my podcast today for a can't-miss episode (one of my favorite convos ever) on how A.I. will overhaul our lives, our work and our society in the coming years.
More on Natalie:
Through her consultancy, Virtual Human Economy, she advises startups like Wizly and investment firms like Blue Tulip Ventures on virtual humans and A.I. clones.
Was previously Head of Strategy at Hour One, a leading virtual-human video-generation startup.
Regularly speaks at the world's largest conferences, including Web Summit and SXSW.
Holds a Master's in Modern Languages and Literature from the University of Oxford.
Today’s fascinating episode will be of great interest to all listeners. In it, Natalie details:
How A.I. is making us dumber — and what we can do about it.
Why the "virtual human economy" could be the next evolution of human civilization.
The two states of being humans are seeking (and how A.I. could help us achieve them).
Why focusing on merely 10x’ing our capabilities misses the much bigger opportunity of A.I.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Microsoft’s “Majorana 1” Chip Brings Quantum ML Closer
Microsoft’s Majorana 1 is a newly unveiled quantum computing chip that marks a major breakthrough in the quest for practical quantum computers. It’s the world’s first quantum processor built on a so-called Topological Core architecture – meaning it uses topological qubits (based on exotic Majorana particles that I’ll dig into more shortly) instead of the fragile qubits found in today’s machines. Microsoft believes this innovation could accelerate the timeline for solving real-world, industrial-scale problems with quantum computing from “decades” to just a few years.
NoSQL Is Ideal for AI Applications, with MongoDB’s Richmond Alake
In today's episode (#871), I'm joined by the gifted writer, speaker and ML developer Richmond Alake, who details what NoSQL databases are and why they're ideally suited for A.I. applications.
Richmond:
Is Staff Developer Advocate for AI and Machine Learning at MongoDB, a huge publicly-listed database company with over 5000 employees and over a billion dollars in annual revenue.
With Andrew Ng, he co-developed the DeepLearning.AI course “Prompt Compression and Query Optimization” that has been undertaken by over 13,000 people since its release last year.
Has delivered his courses on Coursera, DataCamp, and O'Reilly.
Authored 200+ technical articles with over a million total views, including as a writer for NVIDIA.
Previously held roles as an ML Architect, Computer Vision Engineer and Web Developer at a range of London-based companies.
Holds a Master’s in computer vision, machine learning and robotics from The University of Surrey in the UK.
Today's episode (filmed in-person at MongoDB's London HQ!) will appeal most to hands-on practitioners like data scientists, ML engineers and software developers, but Richmond does a stellar job of introducing technical concepts so any interested listener should enjoy the episode.
In today’s episode, Richmond details:
How NoSQL databases like MongoDB differ from relational, SQL-style databases.
Why NoSQL databases like MongoDB are particularly well-suited for developing modern A.I. applications, including Agentic A.I. applications.
How Mongo incorporates a native vector database, making it particularly well-suited to RAG (retrieval-augmented generation).
Why 2025 marks the beginning of the "multi-era" that will transform how we build A.I. systems.
His powerful framework for building winning A.I. strategies in today's hyper-competitive landscape.
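To make the native-vector-database point above concrete, here is a minimal sketch of what a RAG retrieval query against MongoDB Atlas Vector Search can look like, expressed as a standard aggregation pipeline. The index name, field names and the toy embedding are hypothetical placeholders, not anything from the episode:

```python
# Illustrative sketch of a MongoDB Atlas $vectorSearch stage for RAG retrieval.
# Index name, field names and the embedding below are hypothetical placeholders.

query_embedding = [0.12, -0.07, 0.33]  # in practice, produced by an embedding model

vector_search_stage = {
    "$vectorSearch": {
        "index": "vector_index",      # hypothetical Atlas Vector Search index name
        "path": "embedding",          # document field holding the stored vectors
        "queryVector": query_embedding,
        "numCandidates": 100,         # breadth of the approximate nearest-neighbor search
        "limit": 5,                   # top-k documents to return
    }
}

# A RAG application would then project out the text (plus similarity score)
# to stuff into the LLM's context window:
pipeline = [
    vector_search_stage,
    {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
]
```

With a pymongo `Collection`, something like `collection.aggregate(pipeline)` would then return the top five most semantically similar documents, ready to be passed to an LLM as retrieved context.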
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI’s “Deep Research”: Get Days of Human Work Done in Minutes
What does Deep Research do?
AI Should Make Humans Wiser (But It Isn’t), with Varun Godbole
Today's trippy, brain-stimulating episode features Varun Godbole, a former Google Gemini LLM researcher who’s turned his attention to the future implications of the crazy-fast-moving exponential moment we're in.
Varun:
Spent the past decade doing Deep Learning research at Google, across pure and applied research projects.
For example, he was co-first author of a Nature paper where a neural network beat expert radiologists at detecting tumors.
Also co-authored the Deep Learning Tuning Playbook (that has nearly 30,000 stars on GitHub!) and, more recently, the LLM Prompt Tuning Playbook.
He's worked on engineering LLMs to generate code and, most recently, spent a few years as a core member of the Gemini team at Google.
Holds a degree in Computer Science as well as in Electrical and Electronic Engineering from The University of Western Australia.
Varun mostly keeps today’s episode high-level so it should appeal to anyone who, like me, is trying to wrap their head around how vastly different society could be in a few years or decades as a result of abundant intelligence.
In today’s episode, Varun details:
How human relationship therapy has helped him master A.I. prompt engineering.
Why focusing on A.I. agents so much today might be the wrong approach — and what we should focus on instead.
How the commoditization of knowledge could make wisdom the key differentiator in tomorrow's economy.
Why the future may belong to "full-stack employees" rather than traditional specialized roles.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in February 2025
February was another insane month on my podcast. In addition to having stunning smiles, all four guests I hosted are fascinating, highly knowledgeable experts. Today's episode features highlights of my convos with them.
The specific conversation highlights included in today's episode are:
Professional-athlete-turned-data-engineer Colleen Fotsch on how dbt simplifies data modeling and documentation.
Engineer-turned-entrepreneur Vaibhav Gupta on the new programming language, BAML, he created for AI applications. He details how BAML will save you time and a considerable amount of money when calling LLM APIs.
Professor Frank Hutter on TabPFN, the first deep learning approach to become the state of the art for modeling tabular data (i.e., the structured rows and columns of data that, until now, deep learning was feeble at modeling).
The ebullient Cal Al-Dhubaib on the keys to scaling (and selling!) a thriving data science consultancy.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLMs and Agents Are Overhyped, with Dr. Andriy Burkov
Andriy Burkov's ML books are mega-bestsellers and his newsletter has a wild 900,000 subscribers. He seldom does interviews so don't miss today's episode, in which he takes compelling, contrarian views on LLMs and agents.
More on Dr. Burkov:
His indispensable "100-Page Machine Learning Book" seems to be on *every* data scientist / ML engineer's bookshelf.
He also wrote "ML Engineering" and his latest book, "The 100-Page Language Model Book", was released this year to rave reviews.
His "Artificial Intelligence" newsletter is subscribed to by 900,000 people on LinkedIn.
He's the Machine Learning Lead at TalentNeuron, a global labor-market analytics provider.
He runs his own book-publishing company, True Positive Inc.
Previously held data science / ML roles at Gartner, Fujitsu and more.
Holds a PhD in Computer Science (A.I.) from Université Laval in Quebec, where his doctoral dissertation focused on multi-agent decision-making — 15 years ago!
Despite Dr. Burkov being such a technical individual, most of today’s episode should appeal to anyone interested in A.I. (although some parts here and there will be particularly appealing to hands-on machine-learning practitioners).
In today’s episode, Andriy details:
Why he believes AI agents are destined to fail.
How he managed to create a chatbot that never hallucinates — by deliberately avoiding LLMs.
Why he thinks DeepSeek AI crushed Bay Area A.I. leaders like OpenAI and Anthropic.
What makes human intelligence distinct from that of all other animals, and why A.I. researchers need to crack this in order to attain human-level intelligence in machines.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Bringing Back Extinct Animals like the Woolly Mammoth and Dodo Bird
For this week’s Five-Minute Friday-style episode, I’m diving into a biotechnology story I found mind-blowing: bringing back extinct animals like the woolly mammoth and the dodo bird.
How to Grow (and Sell) a Data Science Consultancy, with Cal Al-Dhubaib
Today, my ebullient long-time friend Cal Al-Dhubaib makes his debut on my podcast to spill the beans on how you can launch your own thriving (data science / A.I. / ML) consultancy and, eventually, sell it 💰
Cal:
Is Head of AI & Data Science at Further, a data and A.I. company based in Atlanta that has hundreds of employees.
Previously, he was founder and CEO of Pandata, an Ohio-based A.I. and machine learning consultancy that he grew for over eight years until it was acquired by Further a year ago.
Delivers terrific talks — don’t miss him if you have the chance!
Holds a degree in data science from Case Western Reserve University in Cleveland.
Today’s episode should appeal to any listener, particularly anyone who would like to drive revenue and profitability from data science or AI projects.
In it, Cal covers:
Why his first startup was unsuccessful, but how the experience allowed him to discover an untapped market and build Pandata, a thriving data science consultancy.
His unconventional strategy of requiring a sizable up-front commitment, which initially scared away some prospective clients but ultimately attracted the best ones.
The way core values inspired by his "tin can to Mars" thought experiment shaped his hiring and company culture.
How making data science "boring", helping his clients trust AI systems and delivering a clear return on investment became his formula for success.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI’s o3-mini: SOTA reasoning and exponentially cheaper
Today’s episode will fill you in on everything you need to know about an important model OpenAI recently released to the public called o3-mini.
TabPFN: Deep Learning for Tabular Data (That Actually Works!), with Prof. Frank Hutter
I've been teaching Deep Learning for a decade. In that time, countless students have been disappointed by applying DL to tabular and time-series data. Finally, thanks to Prof. Frank Hutter, that will no longer be the case!
Frank:
Is a tenured professor of machine learning and head of the Machine Learning Lab at the University of Freiburg, although he has been on leave since May to focus on…
His fellowship on AutoML and Tabular Foundation Models at the ELLIS Institute Tübingen in Germany…
As well as becoming Co-Founder and CEO of Prior Labs, a German startup that provides a commercial counterpart to his tabular deep-learning model research and open-source projects… and that has just announced a huge €9m pre-seed funding round.
Holds a PhD in Computer Science from The University of British Columbia and his research has been extremely impactful: It has been cited over 87,000 times!
Today’s episode is on the technical side and will largely appeal to hands-on practitioners like data scientists, AI/ML engineers, software developers and statisticians (especially Bayesian statisticians)!
For a bit of context: Pretty much everyone works with tabular data, either primarily or occasionally. Tabular data are data stored in a table format, i.e., structured into rows and columns, where the columns might be different data types, say, some numeric, some categorical and some text. For a decade, deep learning has ushered in the A.I. era by making huge advancements across many kinds of data — pixels from cameras, sound from microphones and of course natural language — but through all of this revolution, deep learning has struggled to be impactful on ubiquitous tabular data… until now.
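To make the definition above concrete, here is a toy illustration of the mixed-type rows and columns that "tabular data" refers to. The column names and values are invented for illustration only:

```python
# A toy table: rows are records; columns mix numeric and text/categorical types —
# exactly the kind of heterogeneous data tabular models must handle.
# All column names and values here are made up for illustration.
rows = [
    {"age": 34, "segment": "enterprise", "note": "renewed early"},
    {"age": 51, "segment": "startup",    "note": "churn risk flagged"},
    {"age": 29, "segment": "enterprise", "note": "expanded contract"},
]

# Inspect each column's Python type — a tabular model has to cope with all of
# these at once, which is part of why deep learning long struggled here:
column_types = {col: type(rows[0][col]).__name__ for col in rows[0]}
print(column_types)  # {'age': 'int', 'segment': 'str', 'note': 'str'}
```

In practice you would hold such a table in a DataFrame or database rather than a list of dicts; the point is simply that, unlike images or audio, every column can carry a different type and meaning.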
In today’s episode, Prof. Hutter details:
How his revolutionary transformer architecture, TabPFN, has finally cracked the code on using deep learning for tabular data and is outperforming traditionally leading approaches like gradient-boosted trees on tabular datasets.
How version 2 of TabPFN, released last month to much fanfare thanks to its publication in the prestigious journal Nature, is a massive advancement, allowing it to handle orders of magnitude more training data.
How embracing Bayesian principles allowed TabPFN v2 to work "out of the box" on time-series data, beating specialized models and setting a new state of the art on the key time-series analysis benchmark.
The breadth of verticals that TabPFN has already been applied to and how you can now get started with this (conveniently!) open-source project on your tabular data today.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in January 2025
Happy Valentine's Day 💘 ! My high-calorie gift to you is today's episode, which features the best highlights from conversations I had with the (absolutely epic!) guests I hosted on my podcast in January.
The specific conversation highlights included in today's episode are:
Famed futurist Azeem Azhar on how to break your linear mindset to prepare for the exponential technological change that we are experiencing (and will experience even more rapidly in years to come).
Global quantum-computing expert Dr. Florian Neukart on practical, real-world applications of quantum computing today.
Kirill Eremenko and Hadelin de Ponteves — who have together taught over 5 million people data science — with their 12-step checklist for selecting an appropriate foundation model (e.g., large language model) for a given application.
Brooke Hopkins (former engineer at Waymo, now founder and CEO of Y Combinator-backed startup Coval) on why you should evaluate A.I. agents with reference-free metrics.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
From Pro Athlete to Data Engineer: Colleen Fotsch’s Inspiring Journey
Colleen Fotsch won national swimming championships and was a pro athlete in both CrossFit and bobsledding. Now she's excelling at data analytics and engineering! Today, hear her fun, inspiring and practical story.
More on Colleen:
As a collegiate swimmer, she won national championships and set an American record in the relay.
As a pro CrossFit athlete, she twice competed at the “Games”, which is the highest echelon of the sport.
And then she simultaneously pursued a degree in data analytics while training with the US Bobsled team.
An injury ended her Olympic Bobsled team dream, but luckily she’d been pursuing that analytics career in parallel!
She began working full-time as a data analyst four years ago and has now grown into a data-engineering leadership role at a healthcare-staffing firm called CHG Healthcare in Utah, where she serves as Senior Technical Manager of their Data Platform.
Inspires her 280,000 Instagram followers on a daily basis.
Today’s episode essentially has two separate parts:
The first half focuses on Colleen’s exciting journey to the highest levels of three sports: swimming, CrossFit and bobsledding. That part should be fascinating to just about anyone.
The second half covers Colleen’s transition into data analytics and data engineering; that part will appeal to technically minded listeners, particularly those considering, or early in, a career in analytics or engineering.
In today’s episode, Colleen details:
The connection between a competitive sports mindset and data-career success.
Proven strategies for being hired into your first data role later in your career.
Why being "not smart enough" for coding was a mental block she had to overcome.
How analytics engineering bridges the gap between data engineering and analysis.
The huge benefits deskbound professionals can enjoy by including regular exercise in their week, and tips and tricks for developing or growing an exercise habit.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
DeepSeek R1: SOTA Reasoning at 1% of the Cost
In recent weeks, I’m sure you’ve noticed that there’s been a ton of excitement over DeepSeek, a Chinese A.I. company that was spun out of a Chinese hedge fund just two years ago.
BAML: The Programming Language for AI, with Vaibhav Gupta
Today's guest, Vaibhav Gupta, has developed BAML, the programming language for AI. If you are calling LLMs, you've gotta check BAML out for instant accuracy improvements and big (20-30%) cost savings.
More on the charming and terrifically brilliant Vaibhav:
Founder & CEO of Boundary (YC W23), a Y Combinator-backed startup that has developed a new programming language (BAML) that makes working with LLMs easier and more efficient for developers.
Across his decade of experience as a software engineer, he built predictive pipelines and real-time computer vision solutions at Google, Microsoft and the renowned hedge fund The D. E. Shaw Group.
Holds a degree in Computer Science and Electrical Engineering from The University of Texas at Austin.
This is a relatively technical episode. The majority of it will appeal to folks who interact with LLMs or other model APIs hands-on with code.
In today’s information-dense episode, Vaibhav details:
How his company pivoted 13 times before settling upon developing a programming language for A.I.
Why creating a programming language was "really dumb" but why it’s turning out to be brilliant, including by BAML already saving companies 20-30% on their AI costs.
Fascinating parallels between today's A.I. tools and the early days of web development.
His unconventional hiring process (I’ve never heard of anything remotely close to it) and the psychology behind why it works.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Are You The Account Executive We’re Looking For?
We’ve never done an episode like today’s… instead of covering a specific data-science topic, in today’s episode I’m letting you know about a critical role that we’re hiring for on the SuperDataScience Podcast. Perhaps you are the person we’re looking for, or you know who is!
How to Ensure AI Agents Are Accurate and Reliable, with Brooke Hopkins
Agentic A.I. is powerful because it has near-infinite breadth of capability. But this is a double-edged sword: agents entail great risk, and testing their performance is tricky... until now — thanks to Brooke Hopkins, today's guest!
Brooke:
Is Founder & CEO of Coval (YC S24), a San Francisco-based startup that provides a simulation and evaluation platform for A.I. agents. A few days ago, they announced a $3.3m fundraise that includes heavy-hitter VCs like General Catalyst, MaC and Y Combinator.
Previously was Tech Lead and Senior Software Engineer at Waymo, where she worked on simulation and evaluation for Waymo’s self-driving cars.
Before that, she was a Software Engineer at Google.
She holds a degree in Computer Science and Mathematics from New York University’s Abu Dhabi campus.
Despite Brooke’s highly technical background, our conversation is largely conceptual and high-level, allowing anyone who’s interested in developing and deploying Agentic A.I. applications to enjoy today’s episode.
In today’s episode, Brooke details:
How simulation and testing best practices inspired by autonomous-vehicle development are being applied by her team at Coval to make A.I. agents useful and trustworthy in the real world.
Why voice agents are poised to be the next major platform shift after mobile, creating entirely new ways to interact with technology.
How companies are using creative strategies like "background overthinkers" to make A.I. agents more robust.
What the rise of A.I. agents means for the future of human work and creativity… indeed, how agents will transform all of society.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Fastest-Growing Jobs Are AI Jobs
Assessing the fastest-growing job is tricky. For example, using job-posting data isn’t great because there could be lots of duplicate postings out there, or many of the postings could be going unfilled. Another big issue is defining exactly what a job is: the exact same responsibilities could be associated with the job title “data scientist”, “data engineer” or “ML engineer”, depending on the titles a particular company decides to go with. So whoever’s evaluating job growth ends up bucketing groups of related jobs and responsibilities into one standardized job-title bucket, these days probably in a largely automated, data-driven way. If you dug into individual examples, I’m sure you’d find lots of job-title standardizations you disagreed with, but some kind of standardization approach is essential to ensuring that identical roles with slightly different titles get counted as the same thing.
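The bucketing idea described above can be sketched in a few lines. This is a deliberately crude, invented example — real standardization pipelines use far richer, data-driven matching than keyword rules, and the bucket names and rules here are mine, not from any actual jobs report:

```python
# A minimal, hypothetical sketch of job-title standardization: collapsing
# near-identical roles with different raw titles into shared buckets.
# The buckets and keyword rules are invented for illustration; real pipelines
# use much more robust, largely automated matching.

def standardize_title(raw_title: str) -> str:
    """Map a raw job title onto a standardized bucket via simple keyword rules."""
    title = raw_title.lower()
    if "machine learning" in title or title.startswith("ml"):
        return "ML Engineer"
    if "data engineer" in title:
        return "Data Engineer"
    if "data scientist" in title or "data science" in title:
        return "Data Scientist"
    return "Other"

# Near-identical roles with slightly different titles collapse into shared buckets:
titles = ["Senior Data Scientist", "ML Engineer II", "Lead Data Engineer"]
print([standardize_title(t) for t in titles])
# ['Data Scientist', 'ML Engineer', 'Data Engineer']
```

Even this toy version shows why you might quarrel with individual standardizations (the keyword rules are blunt instruments), yet without some such mapping, identical roles would be counted as different jobs.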
Exponential Views on AI and Humanity’s Greatest Challenges, with Azeem Azhar
Today, the famed futurist Azeem Azhar eloquently details the exponential forces that are overhauling society — and why A.I. is essential for solving humanity's biggest challenges. This is a special episode; don't miss it!
In case you aren't familiar with his legendary name already, Azeem:
Is creator of the invaluable "Exponential View" newsletter (>100k subscribers).
Hosts the "Exponential View" podcast (well-known guests include Tony Blair and Andrew Ng).
Hosted the Bloomberg TV show "Exponentially" (guests include Sam Altman).
Holds fellowships at Stanford University and Harvard Business School.
Was Founder & CEO of PeerIndex, a venture capital-backed machine-learning startup that was acquired in 2014.
He holds an MA in PPE (Politics, Philosophy and Economics) from the University of Oxford.
Today’s episode will appeal to any listener. In it, Azeem details:
The exponential forces that will overhaul society in the coming decades.
Why AI is essential for solving humanity's biggest challenges.
His own cutting-edge, personal use of A.I. agents, LLMs, and automation.
Why there's no 'solid ground' in the future of work and how we can adapt.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.