Machine Learning and Data Science Resources
Machine Learning Foundations
Four subject areas provide strong foundations for understanding and applying machine learning theory: linear algebra, calculus, probability/statistics, and computer science. For my comprehensive curriculum covering all of these subject areas, check out my Courses page or my Machine Learning Foundations GitHub repository. My favorite resources on these subject areas, largely from other folks, are immediately below.
Linear Algebra
3Blue1Brown on YouTube
Ch. 2 of Goodfellow et al. (2016) Deep Learning (free)
Ch. 2 of Deisenroth et al. (2020) Mathematics for ML
Sheldon Axler’s (2015) Linear Algebra Done Right
Calculus
3Blue1Brown on YouTube
Differential calculus: Chapter 6 of Deisenroth et al. (2020) Mathematics for ML
Integral calculus: Appendix 18.5 of Zhang et al.’s (2019) Dive into Deep Learning
Probability & Statistics
My Probability and Statistics for ML course (O’Reilly)
Jaynes (2003) Probability Theory (free)
Wasserman (2004) All of Statistics
Gelman & Hill (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models
Computer Science
My Data Structures, Algorithms, and Machine Learning Optimization course (O’Reilly)
Udacity Data Structures & Algorithms in Python course (free)
Sedgewick & Wayne (2011) Algorithms (free)
Bhargava (2016) Grokking Algorithms
Classic DSA computer science problems:
Sebastian Ruder’s (2016) optimization blog post
Machine Learning
You can hop straight into applying machine learning without mastering the foundational subjects (listed above) first. Indeed, this can be a fun approach to learning ML because you can become familiar with what ML can do at a high level prior to getting into the nitty-gritty of the underlying mathematics and probability. The best book for jumping straight into applications is Aurélien Géron's Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, which I had the great pleasure of reviewing and editing.
If, however, you’re already comfortable with the mathematical and probabilistic foundational subjects, my favorite ML books are:
Hastie et al. (2009) Elements of Statistical Learning (2nd ed.)
Bishop (2006) Pattern Recognition and Machine Learning (free)
Murphy (2021-2) Probabilistic Machine Learning
Deep Learning
First Steps in Deep Learning
Deep learning is a specialized field within machine learning. Traditionally, one would already be comfortable with machine learning before getting into it. Modern deep learning libraries, however, make learning about artificial neural networks easy — even if you aren’t too familiar with ML or the foundational mathematical subjects underlying it (see sections above). I wrote my book Deep Learning Illustrated to be the best-possible resource for folks getting started with neural networks and artificial intelligence, including if you haven’t studied much linear algebra, calculus, probability theory, or ML before.
Based on my book, I have also published 18 hours of interactive introductory tutorials:
The notebooks of code built over the course of the videos are available for free on GitHub. In addition, I offer a comprehensive, 30-hour Deep Learning course at the NYC Data Science Academy if you like the structure and personal nature of the in-classroom experience.
Otherwise, get a lay of the land from:
the sequence of courses suggested by Greg Brockman
this (more comprehensive) introductory resource post from Ofir Press
this (even more comprehensive) guide from YerevaNN Research Lab
Deep Learning Books
Relative to viewing lectures, I prefer reading and working through problems. Beyond my own book, the stand-out resources for this, in the order I recommend tackling them, are:
Michael Nielsen's e-book Neural Networks and Deep Learning
Aurélien Géron's Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
the Deep Learning textbook by Goodfellow, Bengio and Courville
Interactive Deep Learning Demos
Top-drawer interactive demos from which you can develop an intuitive sense of neural networks are provided by:
Distill, the academic publication for visualising machine learning research
the illustrious Andrej Karpathy
fun, concise, browser-based (i.e., JavaScript) self-driving cars
ML-Showcase, a curated collection of remarkable deep-learning-focused demos
...in addition, I've curated introductory Jupyter notebooks across the popular libraries TFLearn, Keras, Theano, and TensorFlow here
Applications of Deep Learning
Scroll further down the page to see my recommendations for high-quality data sources as well as global issues in need of solutions. Problems worth solving with deep learning approaches in particular are curated by OpenAI. In addition, if you're at the stage that you'd like to test a deep reinforcement learning algorithm across a range of applications (e.g., games), work with:
OpenAI's Gym
SLM Lab for running (deep) reinforcement learning experiments
the DeepMind Lab
Time Series Prediction, e.g., Financial Applications
comprehensive, LONG intro to deep learning for stock-price prediction from Boris Banushev
pragmatic intro from Thomas Ebermann
intro to trading with deep learning from Neven Pičuljan
simple deep learning model for time series prediction from Sebastian Heinz
time series prediction with LSTMs from Jason Brownlee
ditto from Jakob Aungiers
ditto while incorporating classic strategies from Alex Honchar
intro to trading with deep reinforcement learning from Denny Britz
Transformers
comprehensive “recipe” of lectures and key papers to enable you to understand transformer architectures deeply
Academic Deep Learning Papers
If you're looking for the latest deep learning research, check out:
Flood Sung's roadmap for deep learning papers
PapersWithCode lists state-of-the-art machine-learning papers by category, with corresponding code
Adit Deshpande's list of nine key papers
this thorough, subcategorised reading list
Karpathy's arXiv Sanity Preserver
GitXiv for open-source implementations of popular arXiv papers
Deep Learning Hardware
Here is the part list for a deep learning server that I built.
Cloud Infrastructure for quickly scripting and training Deep Learning models
Histories of Deep Learning
deep learning history explored through six code snippets
The Future of Deep Learning
Insights into emerging trends from Nathan Benaich
Podcasts
I’m privileged to host the SuperDataScience podcast, which airs twice a week and has over 10k listeners per episode. Along with inspiring guests from a broad range of career backgrounds, we focus on the latest in machine learning and data science across both academia and industry. We have content appropriate for any listener, whether you’re simply curious about A.I. or a deep technical expert.
In 2020, I piloted four episodes of a lighthearted AI/ML news show called A4N: the Artificial Neural Network News Network. It was a ton of fun, and someday we may record more episodes, but for the foreseeable future I’ll be consumed by the SuperDataScience show.
Shivam Rana put together a beautifully well-organized website of data science podcasts called DSPods, so you can check that out for other shows, whatever you’re looking for.
YouTube Channels
I publish videos on machine learning, deep learning, and math for ML on my YouTube channel.
For a compilation of Awesome YouTube Channels for ML, deep learning, and related subjects, check out Benedict Neo Yao En’s GitHub repo.
Open Data Sources
To train a powerful model, the larger the data set, the better -- if it's well-organised and open, that's ideal. The following repositories are standouts that meet all these criteria:
the Internet Archive, a library of millions of free books, movies, software, music, websites, and more
Data.gov (home of >150k US government-related datasets)
Govcode, a collection of government open-source projects
the Open Data Stack Exchange
this curated list of 'awesome' public datasets
this well-annotated list of data sets for natural language processing
the Data is Plural blog
For biomedical and health data specifically, check out:
this University of Minnesota resource
this Medical Data for Machine Learning GitHub repo
For machine learning models that require a lot of labelled data, check out:
Yahoo's massive 13TB data set comprising 100 billion user interactions with news items
Luke de Oliveira's Greatest Public Datasets for AI blog post
CrowdFlower's Data for Everyone
If none of the above data sources suit your needs, Google provides a dataset-specific search tool.
Problems Worth Solving
List of (prospective) socially-beneficial applications of artificial intelligence, from the McKinsey Global Institute
Data science is critical to monitoring the environment of Earth. This article summarises relevant datasets, projects and research on the topic.
Tackling Climate Change with Machine Learning — a 2019 paper co-authored by many heavy-hitting machine and deep learning experts worldwide
List of the most urgent global issues from Benjamin Todd at 80,000 Hours
James Martin's 16 Megaproblems of the 21st Century
the Millennium Project's 15 Global Challenges Facing Humanity
the United Nations' Global Issues
Machine Learning and Data Science Applications in Industry — a GitHub repo
...or, for problems that are more narrowly defined, here's a list
Medical Applications of Deep Learning
High-performance medicine: the convergence of human and artificial intelligence from Nature Medicine in January 2019
Charitable Projects
DataKind is a well-respected platform for finding humanitarian causes to apply your data science skills to.
AI for Good provides opportunities to tackle the UN’s sustainable development goals with data and ML.
General Data Scientist Tools
As initially outlined in my post on Data Scientist Skills and Salaries, here is a list of key data science tools. With a focus on coding in Python wherever possible, they are:
PostgreSQL (you can practise queries at SQLZOO)
Jupyter notebooks
It's also helpful to develop familiarity with:
R, with this list of packages as a helpful reference
HTML / CSS / JavaScript / D3
Note that these tools generally appear in the open-source Hadoop cluster in the O'Reilly Data Science Salary Survey. Based on demand and relative compensation, it appears that valuable next steps to becoming a unicorn-variety data scientist would be to equip oneself with distributed computing tools (e.g., Spark) and model deployment skills (e.g., software engineering).
Fun Online Primers for Data Science Techniques
Seeing Theory: A Visual Introduction to Probability and Statistics
A Visual Introduction to Conditional Probability (and loads of other interactive single-screen tutorials)
Machine Learning From Scratch: down-to-the-fundamentals GitHub repo of common supervised and unsupervised learning techniques
3Blue1Brown's wide range of Animated Math videos, including on Neural Networks
Lay Primers on Software and Artificial Intelligence
MIT Technology Review of Deep Learning (focused on Yann LeCun)
MIT Technology Review of AI History (focused on Geoff Hinton)
Software 2.0, The Rise of Artificial Intelligence and the End of Code
AI Revolution: The Road to Superintelligence by the wonderful Tim Urban
Excellent Lay Books on Math/Stats
Meetups
News
The Economist (they are particularly adept science writers, e.g., on AI and Deep Learning NLP)
References
Google Developers' Machine Learning Glossary
Clarity and Productivity
Headspace: daily mindfulness practice
Muse electroencephalogram headband
RescueTime: track and log how all of your time is spent
Center for Humane Technology: avoid addiction to your digital tools
James Clear: actionable, well-researched writing on becoming a better human
Deep Work: train yourself to avoid the easy, shallow work and tackle meaningful, challenging objectives
Pomodoro Technique: be maximally productive in 25-minute intervals
Things: for task-management
viewpure: remove everything from the periphery of the YouTube video you're watching
uBlock Origin: to block ads (with filter ###hot-network-questions to cut distractions on Stack Exchange)
help page for using the number pad on Apple keyboards within Terminal
Herman Miller Aeron chair
evidence-based advice for being successful in any job (and in life!) from 80,000 Hours
List of Additional Tools
LaTeX for creating beautiful documents, including Beamer for slideshows and Pandoc for conversion to countless other formats (e.g., word processor formats for sharing with coworkers)
I love the Mathematica-based Wolfram Alpha web interface for learning about mathematical concepts interactively
Plotly is a free, easy-to-use GUI for collaboratively creating aesthetically pleasing visualisations
Eudaemonia
For a life of flourishing -- a life of beauty, truth, justice, play and love -- choose mathematics.