This article was adapted from a podcast, which you can listen to or watch here.
A couple of weeks ago, Maria Lee, who helps me out with my social-media marketing, forwarded me a blog post she really liked on good versus great product managers, and she suggested that I write a similar post about data scientists.
Immediately, two distinguishing factors between good and great data scientists came to mind. I frequently ask data science leaders on the SuperDataScience Podcast what they look for in people they hire. I’ve noticed a recurring theme: They nearly always say communication and knowing how to learn. Surprisingly often, these are the only two items mentioned.
In the context of data science, communication means the ability to clearly explain complex, technical content in simple terms to a broad audience. That audience includes other data scientists on the team, engineers, product people, and more commercially oriented folks like managers and end users.
The second item, knowing how to learn, means demonstrating an excellent ability to take in new information, factor it into decision-making, and, of course, communicate it clearly. You might demonstrate this through your background or through exercises during an interview. This isn’t solely about innate learning capacity. There are lots of structured ways that people can digest and rehearse information in order to learn more thoroughly and more quickly. These are skills that you can practice. Examples include:
Focusing attention on one task at a time (e.g., with the Pomodoro technique I covered in Episode #456)
Writing down what you’ve learned in your own words
Using flashcards to test your recall of the most important information
Knowing how to find the information you need by searching online or in a book
Beyond communication and knowing how to learn, I was curious what other data professionals think, so on April 15th I posed a simple question on Twitter: “What separates a good data scientist from a great one?”
I was absolutely blown away by the response, which garnered more attention than I typically get across all of my Tweets in a given month! The next day, the post had been viewed 180,000 times. At the time of writing this blog post on April 19th, four days after the Tweet went out, it has over 200,000 impressions and over 6,000 engagements.
Some of the responses were rather witty and had me laughing out loud. Since I asked what “separates” a good data scientist from a great one, a good chunk of the responses played on pandemic-related restrictions. For example, some pointed out that “at least six feet” separates them.
Some comedians took the frequentist statistics route with their jokes, suggesting that two standard deviations separate a good data scientist from a great one. Others went down the machine learning route. They conjured up imagery from the support vector machine technique by suggesting that a “decision boundary” or a “hyperplane” separates a good data scientist from a great one.
Martin Goodson, a friend of mine for more than a decade and CEO of Evolution AI in London, wrote:
Martin’s reply is amusing, but I think the broader point he is making is one shared by many respondents. Good data scientists know the most sophisticated modeling approaches, whereas great ones avoid a computationally complex approach when a simpler one will do.
Rockstar data scientist Chris Albon is Director of Machine Learning at Wikimedia and former host of the brilliant Partially Derivative podcast. That podcast inspired me to begin hosting a podcast myself, so now you know who to blame! Chris wrote:
While terse and entertaining, Chris’s Tweet bears truth in two ways. First, a great data scientist can herself be a data engineer. Second, a great data scientist may have a data engineer, or even a team of data engineers, who transforms her model from a collection of weights into real-time magic within a production application.
Beyond the humorous replies, I was delighted that communication and knowing how to learn were indeed recurring themes across many responses. However, respondents also made countless additional, thoughtful points, including:
Creativity
Curiosity
Humility
Ability to listen
Experimental design
Product design
Inspiring/leading a team
Task prioritization
Commercial awareness
Organizational awareness, like being able to “manage up” those above you in the corporate hierarchy
Specific technical skills like software engineering, Bayesian statistics, and distributed computing tools such as Apache Spark
I wish I could include all of the responses here, but you can of course refer back to the original post and make your way through them. Some of my particular favorites are below.
The legendary Brandon Rohrer, Principal Data Scientist at iRobot and guest on Episode #341 of the SuperDataScience podcast, linked to a post he’d made on LinkedIn a week earlier. In it, he made a point that I could not agree with more:
Chapman University stats professor Chelsea Parlett-Pelleriti wrote:
And finally, statistician Isabella Ghement responded in a related vein:
What do you think? Have we covered everything, or is there another difference between a good and a great data scientist? Feel free to add your thoughts to the Twitter thread; I look forward to hearing them! You can read the whole thread here and tweet at me @JonKrohnLearns.