I've been teaching deep learning for a decade. In that time, countless students have been disappointed by the results of applying deep learning to tabular and time-series data. Finally, thanks to Prof. Frank Hutter, that will no longer be the case!
Frank:
Is a tenured professor of machine learning and head of the Machine Learning Lab at the University of Freiburg, although he has been on leave since May to focus on…
His fellowship on AutoML and Tabular Foundation Models at the ELLIS Institute Tübingen in Germany…
As well as becoming Co-Founder and CEO of Prior Labs, a German startup that provides a commercial counterpart to his tabular deep-learning model research and open-source projects… and that has just announced a huge €9m pre-seed funding round.
Holds a PhD in Computer Science from The University of British Columbia, and his research has been extremely impactful: it has been cited over 87,000 times!
Today’s episode is on the technical side and will largely appeal to hands-on practitioners like data scientists, AI/ML engineers, software developers and statisticians (especially Bayesian statisticians)!
For a bit of context: pretty much everyone works with tabular data, either primarily or occasionally. Tabular data are data stored in a table format, structured into rows and columns, where the columns may hold different data types: some numeric, some categorical, some text. For a decade, deep learning has ushered in the A.I. era by making huge advances across many kinds of data — pixels from cameras, sound from microphones and, of course, natural language — but throughout this revolution, deep learning has struggled to be impactful on ubiquitous tabular data… until now.
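To make the mixed-type tables described above concrete, here is a tiny illustrative example using pandas (the column names and values are invented for illustration, not taken from any TabPFN benchmark):

```python
import pandas as pd

# A small table mixing the column types mentioned above:
# numeric, categorical and free text (all values illustrative).
df = pd.DataFrame({
    "age": [34, 52, 29],                                    # numeric
    "plan": ["basic", "pro", "basic"],                      # categorical
    "note": ["renewed early", "churn risk", "new signup"],  # text
})

# Each column keeps its own dtype, which is exactly what makes
# tabular data awkward for models designed for homogeneous inputs.
print(df.dtypes)
```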
In today’s episode, Prof. Hutter details:
How his revolutionary transformer architecture, TabPFN, has finally cracked the code on using deep learning for tabular data and is outperforming traditionally leading approaches like gradient-boosted trees on tabular datasets.
How version 2 of TabPFN, released last month to much fanfare thanks to its publication in the prestigious journal Nature, is a massive advancement, allowing it to handle orders of magnitude more training data.
How embracing Bayesian principles allowed TabPFN v2 to work "out of the box" on time-series data, beating specialized models and setting a new state of the art on the key time-series analysis benchmark.
The breadth of verticals that TabPFN has already been applied to and how you can now get started with this (conveniently!) open-source project on your tabular data today.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.