This article was originally adapted from a podcast, which you can check out here.
For last week’s Five-Minute Friday episode, I provided a summary of the various ways to undertake my deep learning curriculum, be it via YouTube, my book, or the associated GitHub code repository. I mentioned at the end of that episode that, while teaching this deep learning content to students online and in person, I discovered that many folks could use a primer on the foundational subjects that underlie machine learning in general and deep learning in particular. So, after publishing all my deep learning content, I set to work creating content that covers the subjects critical to understanding machine learning at an expert level: linear algebra, calculus, probability, statistics, and computer science.
Way back in Episode #474 of this podcast, I detailed why these particular subject areas form the sturdy foundations of what I call the Machine Learning House. As a quick recap, the idea is that to be an outstanding data scientist or ML engineer, it doesn't suffice to only know how to use machine learning algorithms via the abstract interfaces that the most popular libraries (e.g., scikit-learn, Keras) provide. To train innovative models, or to deploy them to run performantly in production, an in-depth appreciation of machine learning theory can be helpful, or even essential. To cultivate such an appreciation, one must possess a working understanding of the foundational subjects, which again are linear algebra, calculus, probability, statistics, and computer science.
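To make the idea of an "abstract interface" concrete, here is a minimal scikit-learn sketch of my own; it is an illustration only, not an excerpt from the curriculum's notebooks. A classifier is trained and evaluated in a handful of lines, while the linear algebra and gradient-based optimization that do the real work stay hidden behind the .fit() call.

```python
# Illustrative sketch (not from the ML Foundations repo): scikit-learn's
# high-level interface hides the underlying math of model training.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Gradient-based optimization happens entirely inside .fit().
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```

Understanding what happens inside that .fit() call, rather than treating it as a black box, is exactly what the foundational subjects make possible.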
When the foundations of the Machine Learning House are firm, it also becomes much easier to make the jump from general ML principles (the ground floor of the house) to specialized ML domains (the upper floor of the house) such as deep learning, natural language processing, machine vision, and reinforcement learning. This is because the more specialized the application, the more likely its implementation details are available only in academic papers or graduate-level textbooks, both of which typically assume an understanding of these foundational subjects.
So, that’s an introduction to what my ML Foundations content covers and why. Unlike the deep learning content that I provided an overview of last week, my ML Foundations content is still under development. But, also unlike some components of my deep learning curriculum, my entire ML Foundations curriculum will eventually be available for free.
Conveniently, you can check out the current state of affairs for my ML Foundations content in one place by visiting the Where and When section of my ML Foundations GitHub repo. On that note, all of the code for this curriculum is already complete and is available as open-source Jupyter notebooks within that GitHub repo.
The notebooks of code, however, are not intended to stand alone. They are intended to be accompanied by my lectures, which I first offered online via the O’Reilly learning platform in 2020. At the time, the world was under strict lockdown, and with lots of data scientists and engineers stuck at home with seemingly nothing better to do than hang out with me online, these lectures ended up being some of the most popular in the history of O’Reilly, with over a thousand students registering for each of them. That was an exhilarating experience, and a welcome distraction from the pandemic for me too.
In June of last year, the O’Reilly platform also became the first place where you could undertake my entire ML Foundations curriculum outside of live lectures. Specifically, I broke the curriculum up into four subject areas:
Six hours on Linear Algebra for ML
Six hours on Calculus for ML
Nine hours on Probability and Statistics for ML
Six hours on Data Structures, Algorithms, and Machine Learning Optimization
If you don’t already have an O’Reilly subscription, either personally or through your employer, you can access this content with a free seven-day trial. I’m also currently working with O’Reilly to obtain free 30-day trials for SuperDataScience listeners, so stay tuned for that.
However, remember that I did say that my ML Foundations curriculum would be available for free. For that, I’m recording my own personal version of all of the videos at home (the content available via O’Reilly was recorded in a professional studio with full-time professional staff) and releasing it on YouTube. All of the Linear Algebra content is already live in a playlist, and the Calculus content will be finished next week, when we publish the final video to the playlist of over 50 videos. After that, we’ll start publishing the Probability videos, and we won’t stop until my entire ML Foundations curriculum is freely available on YouTube.
That said, if you do feel like supporting my YouTube effort, you can buy my ML Foundations course on Udemy, which is often available at a deep discount; in US-dollar terms, it should be easy to spot a sale and grab it for under $20.
Finally, I recently began work on a book version of this content. There’s so much ML Foundations material to cover that I need to break it up into several books. The first book will be called Mathematical Foundations of Machine Learning and will focus primarily on the Linear Algebra and Calculus subject areas. Pearson, the world’s largest publisher of university textbooks, will be publishing it, and chapters should start to become available this year. If you’d like to stay up to date on book-release details and anything else I’m working on (be it code, YouTube videos, live in-person lectures, or podcast episodes), you can sign up for my email newsletter on jonkrohn.com.
All right, that’s it for today. Keep on rockin’ it out there; I’m looking forward to catching you on another round of SuperDataScience very soon.