This article was adapted from a podcast. Listening and viewing options, as well as a full transcript, are available here.
At the beginning of 2021, I asked the following on Twitter: “What questions do you have about machine learning as a science or as a career?”
In response, I was asked some terrific questions about data science, many of which are popular ones that I’ve been asked time and again. In today’s FiveMinuteFriday, I’ll answer the ones I think will be most valuable for everyone to hear.
Gabriel, who appears to be Brazilian but indicates his location is “Lost + Found”, asked me:
“Is a career in data science really future-proof? What are the odds of another AI winter and a crisis in this career?”
As a lifelong student of probability, I’m never 100% sure of anything, especially a prediction about the future. But with the proliferation of sensors enabling the amount of data stored on the planet to double about every 18 months, I think a career in data science is about the safest bet out there. This trend will only be amplified in the coming years by the 5G “Internet of Things”. More and more of the data that gets stored is noise, so data scientists should become ever more critical for distilling meaningful signals from that noise and driving commercial value with data.
Given this abundance of data, it will be more and more important to engineer machine learning pipelines, so I do recommend that data scientists develop an understanding of software engineering best practices, including algorithms and data structures. That should help you stay totally future-proofed.
All of that said, I do think that AI is currently a bit overhyped. But I don’t think we’re going to have an AI winter like the one we had in, say, the 1980s. This time is different because sensors, global connectivity, data, and cheap processors are far more abundant than ever before. I don’t think data science will become obsolete, but as investors and consumers realize that some of their expectations of AI are driven by Hollywood and marketing hype, there may be an AI “autumn”. For more on that, check out The Economist’s special Technology Quarterly issue from June 13th on “Artificial Intelligence and Its Limits”.
In a similar vein, the French-Martinican cloud consultant Frédéric Anauld asked:
“Is AutoML the future of this field?”
AutoML stands for Automated Machine Learning. In the next episode of SuperDataScience, #445, which will be released on Wednesday, the guest -- Sinan Ozdemir -- and I discuss AutoML in more detail. The short answer is “yes” AND “no” -- AutoML is really useful only on clean data, and in the real world we are typically presented with only the dirtiest, noisiest of data.
AutoML may become more prevalent as it accelerates the identification of the optimal model choice or the optimal model hyperparameters, but it is not a replacement for the blood, sweat, and tears of data scientists.
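To make the hyperparameter part concrete, here is a minimal sketch of the kind of search that AutoML tools automate. It is an illustration only, not the method of any particular AutoML product: the one-feature ridge model, the synthetic data, the candidate penalty values, and every function name are assumptions chosen to keep the example self-contained.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form slope for y ~ w * x with an L2 penalty of strength lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(xs, ys, w):
    """Mean squared error of the one-parameter model y ~ w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, valid, candidates):
    """Fit one model per candidate penalty; keep the one with the lowest validation error."""
    best = None
    for lam in candidates:
        w = fit_ridge_1d(*train, lam)
        err = mse(*valid, w)
        if best is None or err < best[2]:
            best = (lam, w, err)
    return best

# Synthetic (made-up) data: true slope 3.0 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0, 0.5) for x in xs]
train = (xs[:150], ys[:150])
valid = (xs[150:], ys[150:])

best_lam, best_w, best_err = grid_search(train, valid, [0.0, 0.1, 1.0, 10.0, 100.0])
print(f"best penalty={best_lam}, slope={best_w:.2f}, validation MSE={best_err:.3f}")
```

Note what the loop does not do: it assumes the data are already clean and that validation error is the right thing to optimize. Those judgment calls are where the blood, sweat, and tears still go.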
Both a bioengineering PhD student named Zach and someone with the Twitter handle DoomscrollPro asked questions that are related to model interpretability and bias.
This is related to the previous questions because while open-source model interpretability tools are becoming more common -- two particularly popular ones are LIME and SHAP -- ultimately the data scientist themself is responsible for ensuring that a model is sufficiently interpretable and doesn’t include unwanted biases such as those against a particular demographic group.
AutoML may recommend an extravagant deep learning model as optimal for accurately solving a problem, but if interpretability is paramount to your application -- say, because your model will approve people for a credit card or determine the length of their prison sentence -- then a simple regression model, perhaps marginally less accurate but with completely interpretable model weights, might be much more appropriate.
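As a rough sketch of what “completely interpretable model weights” can mean in practice, consider a logistic regression for the credit card case. The feature names, weights, and bias below are hypothetical values invented for illustration; they are not fitted to any real data.

```python
import math

# Hypothetical weights for a credit-approval logistic regression (illustrative only).
WEIGHTS = {"income_10k": 0.40, "missed_payments": -1.10}
BIAS = -0.5

def approval_probability(applicant, weights=WEIGHTS, bias=BIAS):
    """Logistic model: sigmoid of the weighted sum of the applicant's features."""
    z = bias + sum(weights[name] * applicant[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratios(weights):
    """exp(weight) is the factor by which the approval odds change per unit of a feature."""
    return {name: math.exp(w) for name, w in weights.items()}

for name, ratio in odds_ratios(WEIGHTS).items():
    print(f"{name}: odds multiplied by {ratio:.2f} per unit increase")
```

With these made-up numbers, each additional missed payment multiplies the odds of approval by about 0.33, and each extra $10k of income multiplies them by about 1.49. Each weight maps to a single sentence that an applicant or a regulator can read, which a deep network's parameters do not offer.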
Thanks for reading! In the next article in this series, I’ll answer some popular questions about machine learning, ranging from the best learning paths for getting started in ML to the hardest concepts to understand in ML.
In the meantime, check out the guest episode with Sinan Ozdemir published on February 17 for more detail on AutoML as well as tons of other fascinating topics, such as how to make Conversational A.I. -- also known as chatbots -- effective for automating real-world business processes.
Finally, if you’d like to ask me your own data science or machine learning questions (or anything at all really!), feel free to tag me in a post on LinkedIn or Twitter (@JonKrohnLearns) and I’ll aim to answer them via social media or perhaps on an upcoming SuperDataScience episode!