This article was adapted from a podcast, which you can listen to or watch here.
One of the most confusing aspects of terminology in the data science world is the distinction between artificial intelligence, machine learning, and deep learning. The media and businesspeople generally use the three terms interchangeably even though they represent different concepts; even academic experts will sometimes use them inaccurately when they’re not being careful.
Artificial Intelligence
Let’s start with artificial intelligence: “A.I.” is the buzziest, vaguest, and broadest of the three terms. Taking a stab at a technical definition regardless, a decent one is that AI involves a machine processing information from its surrounding environment and then factoring in that information to achieve some desired outcome. Perhaps because of this, some consider the goal of AI to be the achievement of “general intelligence”, meaning broad reasoning and problem-solving capabilities (you can check out SuperDataScience episode #438 for more detail on general intelligence).
In practice and particularly in the popular press, “AI” is used to describe any cutting-edge machine capability. Presently, these capabilities include:
Voice recognition
Describing what’s happening in a video
Question-answering
Driving a car
Industrial robots that mimic human actors in the factory
Dominating humans at “intuition-heavy” board games like Go.
Once an AI capability becomes commonplace (e.g., recognizing handwritten digits, which was cutting-edge in the 1990s), the “AI” moniker is typically dropped by the popular press for that capability such that the goalposts on the definition of AI are always moving.
Machine Learning
Machine learning is a subset of AI alongside other fields like:
Robotics
Software approaches (such as “expert systems”) that are hard-coded and so don’t learn directly from data
Machine learning, in contrast, is a field of computer science concerned with setting up software so that it can recognize patterns in data without the programmer needing to explicitly dictate how it should carry out every aspect of this recognition. That said, the programmer would typically have some insight into or hypothesis about how the problem might be solved, and would therefore provide a rough model framework and relevant data so that the learning software is well-equipped to solve the problem.
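To make the idea of a rough model framework plus relevant data concrete, here is a minimal sketch. It assumes the programmer's hypothesis is that the output follows a straight line, and it uses ordinary least squares to learn the line's slope and intercept from a handful of made-up data points. The function name fit_line and the data are illustrative assumptions, not something taken from the podcast or the book.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by least squares.

    The straight-line form is the "rough model framework" supplied by the
    programmer; the specific slope and intercept are learned from the data.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that happens to follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Nowhere does the code state that the slope is 2 or the intercept is 1; those values are recovered from the examples, which is the contrast with a hard-coded expert system.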
Artificial Neural Networks
Before we can dig into what deep learning is, I first need to introduce the term artificial neural networks (ANNs). Artificial neurons are simple algorithms inspired by biological brain cells, especially in the sense that individual neurons—whether biological or artificial—receive input from many other neurons, perform some computation, and then produce a single output.
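As a rough illustration of many inputs, some computation, and a single output, the sketch below implements one artificial neuron. The particular computation (a weighted sum plus a bias, passed through a sigmoid function) is one common choice that I am assuming here; the text above does not commit to any specific formula, and the input values and weights are invented.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Combine many inputs into a single output value.

    Each input is scaled by its own weight, the results are summed with a
    bias term, and the sum is passed through the sigmoid activation.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Three hypothetical inputs arriving from other neurons.
output = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.2], bias=0.1)
print(output)  # a single number between 0 and 1
```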
An artificial neural network, then, is a collection of artificial neurons arranged so that they send and receive information between each other. Data (e.g., images of cats and dogs) are fed into an ANN, which processes these data in some way with the goal of producing some desired result (e.g., a guess as to whether the image is a cat or a dog). ANNs are therefore one of the many machine learning approaches available today.
Deep Learning
Now that we know what an ANN is, deep learning is fairly straightforward to define: A machine learning approach that involves an ANN composed of at least a few separate layers of artificial neurons can be called a deep learning network. More specifically, deep learning networks have a total of five or more layers with the following structure:
A single input layer that is reserved for the data being fed into the network.
Three or more so-called “hidden” layers that can represent the inputs in increasingly complex, increasingly abstract ways as we add more and more of them.
A single output layer that is reserved for the values (e.g., predictions) that the network yields.
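The layer structure listed above can be sketched as a small forward pass. This is only an illustration under several assumptions of mine: the layer sizes (4 inputs, three hidden layers of 8 neurons, 2 outputs), the tanh activation, and the random, untrained weights are all made up, and the helper names init_layer, dense_layer, and forward are hypothetical. A real network would learn its weights from data.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create random weights and zero biases for one layer of n_out neurons."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense_layer(inputs, weights, biases):
    """Each neuron receives every input and emits one tanh-activated output."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass the input through each layer in turn and return the final outputs."""
    activation = x
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One input layer, three hidden layers, one output layer: five layers in total.
layer_sizes = [4, 8, 8, 8, 2]

# The input layer only holds data, so there are four sets of weights,
# one for each step from a layer to the next.
layers = [
    init_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

output = forward([0.1, 0.2, 0.3, 0.4], layers)
print(output)  # two values, one per output neuron
```

Adding more hidden layers only means adding entries to layer_sizes; the forward function is unchanged.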
With each successive layer in the network being able to represent increasingly abstract, nonlinear recombinations of the previous layers, deep learning models with fewer than a dozen layers of artificial neurons are often sufficient for learning to make accurate predictions with a given dataset. That said, deep learning networks with hundreds or even upwards of a thousand layers have in occasional circumstances been demonstrated to provide value.
As rapidly improving accuracy benchmarks and countless competition wins in the past decade have demonstrated, the deep learning approach to modeling data excels at a broad range of machine learning tasks. Indeed, with deep learning driving so much of the contemporary progress in AI capabilities, it’s no surprise that we see “deep learning” and “artificial intelligence” used so interchangeably by the popular press, and even by experts who should know better!
If you’d like to learn more about A.I., machine learning, and deep learning, I recommend checking out my book, Deep Learning Illustrated. Much of this blog post was inspired by content from Chapter 4.