This article was originally adapted from a podcast, which you can check out here.
In last week’s Five-Minute Friday, I covered the highest-paying programming languages for data scientists based on the results of O’Reilly’s 2021 Data/AI Salary Survey. Next week, I’m going to expand on those survey results by covering the highest-paying data tools, and the week after that I’ll cover the highest-paying data platforms.
To make the most of those forthcoming episodes, today we are investing a few minutes in getting our definitions straight: I’ll detail what data tools are as well as what data platforms are.
In a phrase, a data tool is any software product for working with data that is neither a standalone programming language nor a platform. The first distinction, between tools and languages, is pretty straightforward: Python, for example, is a widely used programming language in data science, while software libraries that operate within Python (such as scikit-learn, TensorFlow, and PyTorch) are some of the most popular data-science tools. Data tools need not be implemented via code, however; they can also be point-and-click software like Microsoft Excel.
Relative to the distinction between data tools and programming languages, the distinction between data tools and data platforms is sometimes less clear-cut, and in many cases a given product could reasonably be classified into either bucket.
Generally speaking, however, platforms are broad software frameworks that, despite not being standalone programming languages, can nevertheless support the development of multiple distinct software tools within them. For example, Spark is a platform for working with massive quantities of data that itself supports particular data tools such as Spark NLP and Spark MLlib within it. Alongside Spark, other prominent examples of data platforms are Kafka and Hadoop.
All right, so there are our definitions: Neither data tools nor data platforms are standalone programming languages. Nevertheless, data platforms can support the development of multiple data tools within them.
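If it helps to see those definitions laid out, here is a minimal Python sketch of the three buckets using only the products named in this article. The names `Kind`, `CATALOG`, `HOSTED_IN`, and `tools_within` are hypothetical, made up purely for illustration, and the classification simply mirrors the examples above; as noted, some products could be argued into a different bucket.

```python
from enum import Enum


class Kind(Enum):
    """The three buckets discussed in this article."""
    LANGUAGE = "programming language"
    TOOL = "data tool"
    PLATFORM = "data platform"


# Illustrative classification of the products named in the article.
# This is an assumption-laden sketch, not an official taxonomy.
CATALOG = {
    "Python": Kind.LANGUAGE,
    "scikit-learn": Kind.TOOL,
    "TensorFlow": Kind.TOOL,
    "PyTorch": Kind.TOOL,
    "Microsoft Excel": Kind.TOOL,  # point-and-click, no code required
    "Spark": Kind.PLATFORM,
    "Kafka": Kind.PLATFORM,
    "Hadoop": Kind.PLATFORM,
    "Spark NLP": Kind.TOOL,
    "Spark MLlib": Kind.TOOL,
}

# Which language or platform each tool operates within, per the article.
# Microsoft Excel is standalone, so it has no entry here.
HOSTED_IN = {
    "scikit-learn": "Python",
    "TensorFlow": "Python",
    "PyTorch": "Python",
    "Spark NLP": "Spark",
    "Spark MLlib": "Spark",
}


def tools_within(host):
    """Return the sorted names of data tools that operate within `host`."""
    return sorted(
        name
        for name, parent in HOSTED_IN.items()
        if parent == host and CATALOG[name] is Kind.TOOL
    )


if __name__ == "__main__":
    for host in ("Python", "Spark", "Kafka"):
        print(host, "->", tools_within(host))
```

Calling `tools_within("Spark")` returns the two Spark tools mentioned above, while `tools_within("Kafka")` returns an empty list, only because this article does not name any tools built on Kafka.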
As promised at the outset of this episode, for next week’s Five-Minute Friday we’ll explore the data tools that are correlated with the highest salaries, and the week after we’ll give data platforms the same treatment.
In the meantime, you can check out the full report from O’Reilly here.
All right, that’s it for today! Keep on rockin’ it out there, folks, and catch you on another round of SuperDataScience very soon.