This week's guest is the sensational Karen Jean-François: mathematician, award-winning data analyst, podcast host, and French national champion in the 400m hurdles. She details how to hurdle over data career obstacles.
More specifically, in today’s episode Karen fills us in on:
• How to overcome Imposter Syndrome in the data science industry
• Why you might want to consider becoming a data science manager versus remaining a more specialized individual contributor
• The data tools that she uses regularly
• The productivity and prioritization techniques that enable her to juggle her day job, her thriving podcast, and her world-class athletic pursuits
Karen:
• Manages banking analytics at publicly listed Cardlytics
• Is the producer and host of the "Women in Data" podcast
• Was recognized last year as one of the "Twenty in Data and Technology"
• Holds degrees in mathematics and computing from Paris-Sud University (Paris XI)
• Was French national champion in the 400m hurdles and bronze medalist in the 100m hurdles
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Highest-Paying Data Tools
This article was originally adapted from a podcast, which you can check out here.
Two weeks ago for Five-Minute Friday, I covered the highest-paying programming languages for data scientists based on the results of O’Reilly’s 2021 Data/AI Salary Survey. Last week we used Five-Minute Friday to get our definitions of data tools and data frameworks straight so that today we could dig into the highest-paying data tools; next week, in turn, we’ll tackle the highest-paying data platforms. If you get through today’s episode and don’t feel 100% clear about what a data tool is, then consider popping back to Episode #522 to clarify.
The most widely used tool in the survey, used by nearly a third of respondents, was Microsoft’s Excel program for working with data in spreadsheets. Despite its popularity, Excel (along with the other point-and-click tools in the survey) was associated with a below-average salary. Specifically, the mean salary across all respondents was $146k, but those who indicated that they used Excel were paid on average $8k/year less, at $138k.
Open-Source Analytical Computing (pandas, Apache Arrow)
The legend Wes McKinney is this week's guest! He details the genesis of the ubiquitous pandas library, the forthcoming edition of his bestselling book, and how Apache Arrow brings analytics into the distributed computing era.
Wes:
• Created pandas, the industry-standard Python library for data analytics
• Co-created Apache Arrow, a language-agnostic open-source library for efficient analytics on modern distributed CPUs and GPUs
• Wrote the classic O'Reilly Media desk reference "Python for Data Analysis"
• Has worked as a technical expert at prestigious firms like Cloudera, RStudio PBC, Two Sigma, and AQR Capital Management
• Today serves as co-founder and CTO of Voltron Data
In this episode, Wes takes us on a technical deep dive through:
• The creation story of his now-ubiquitous pandas library
• A sneak peek at the third edition of his international-bestselling book
• What the Apache Arrow project is and why it's poised to revolutionize the data science and software industries
• The software and hardware tools that he uses daily to be such an epically productive software developer and entrepreneur
• Responses to great questions by listeners Daniel, David, Doug, and Brett
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Skyrocket Your Career by Sharing Your Writing
This week's guest is Khuyen Tran, one of the preeminent voices in data science: her blog alone receives over 100k views per month. In this episode, she details how publishing your writing can skyrocket your career too.
Khuyen:
• Became an author for the Towards Data Science blog less than two years ago; already her articles garner 100k+ views per month
• Writes practical daily posts featuring Python code right here on LinkedIn, leading to her developing a highly engaged following of over 25k — in just one year!
• Recently became a technical writer for NVIDIA's Developer Blog
• Has landed four data science jobs, including her current role at Ocelot Consulting, in part thanks to her writing
• Has a perfect 4.0 GPA in the computational and applied mathematics undergraduate degree that she's on track to complete next year
In today's episode, Khuyen fills us in on:
• How publishing your writing can skyrocket your technical career
• Her tricks for maximizing engagement with the content you publish
• Her favorite data science tools and approaches
• Her tricks for prioritizing and being as epically productive as she is across her studies, her data science work, and her prodigious technical writing
And thanks to Krzysztof and Nikolay for the outstanding audience questions that Khuyen addressed on-air.
The episode's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Highest-Paying Programming Languages for Data Scientists
This article was originally adapted from a podcast, which you can check out here.
The beloved technical publisher O’Reilly recently released their 2021 Data/AI Salary Survey. It covers responses from 3,000 respondents in the US combined with another 300 from the UK. All of the respondents are subscribers to O’Reilly’s Data & AI Email Newsletter.
There’s quite a lot of detail in the report on how the salaries of data professionals vary, including by gender, level of education, career stage, industry, and geographic location. Today, I’m focusing on how salaries vary by programming language since this is an attribute that you can easily change about yourself, simply by learning something new.
A.I. for Good
This week's guest is the eloquent and inspiring James Hodson, founder and CEO of the AI for Good Foundation, which leverages data and machine learning to tackle the United Nations' Sustainable Development Goals.
In this episode, James details:
• Globally impactful case studies from his A.I. for Good organization across public health, DEI, and a practical database of A.I. progress on social issues
• How you yourself can get involved in helping apply A.I. for wide-reaching social benefit, whether you're a technical expert or not
• The hard and soft skills that he looks for in the data scientists that he hires
In addition to his leadership of A.I. for Good, James:
• Is an academic research fellow at the Jozef Stefan Institute, where he's focused on Natural Language Processing research
• Is Chief Science Officer at Cognism, a British tech startup
• Served as A.I. Research Manager at Bloomberg LP
• Completed a degree at Princeton University focused on Machine Translation
Thank you to Claudia Perlich for the intro to James! I learned a ton from him while filming this episode.
The episode's available on all major podcasting platforms, on YouTube, and at SuperDataScience.com.
Fail More
Fail more! Failing is very, very good. For Five-Minute Friday this week, I elaborate on why.
SuperDataScience episodes are available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Courses in Data Science and Machine Learning
This week's guest is the super-fun Sadie St. Lawrence, an exceptionally popular data science instructor with over 300k students all-time. She fills us in on her exciting new ML Certificate and the global impact of her Women In Data org.
Sadie:
• Teaches data science at the University of California, Davis
• Has a Coursera course that is one of the all-time most popular
• Is Founder and CEO of Women in Data, a community of 20k across 15 countries
• Holds a Master's in Analytics from Villanova University
In this episode, she digs into:
• The content of her existing iconic data science course
• The curriculum of her epic forthcoming Machine Learning Certificate
• The mission, impact, and vision of the Women in Data organization
• Her path into data science from music performance
• Non-fungible tokens (NFTs) and the future of technology
Thanks to Harpreet Sahota for introducing me to Sadie! I absolutely loved filming this episode.
Listen or watch here.
Does Caffeine Hurt Productivity? (Part 3: Scientific Literature)
For Five-Minute Friday yesterworkday, I concluded my three-part series on whether caffeine disrupts productivity by broadening from the experiment I ran on myself to the academic literature on the topic.
My Jupyter notebook of data and analysis of the caffeine vs productivity experiment I ran on myself is here.
SuperDataScience episodes are available on all major podcasting platforms and YouTube, and on SuperDataScience.com.
Accelerating Impact through Community — with Chrys Wu
This week's guest is global tech community builder Chrys Wu who details how you too can leverage communities to accelerate your career. This is the first SuperDataScience episode ever recorded in-person!
In addition to accelerating your career with community, Chrys covers:
• K-pop music and its associated cultural movement
• How the Write/Speak/Code and Hacks/Hackers organizations she co-founded leverage community to make a massive global impact for marginalized genders and journalism, respectively
• Her top resources — social media accounts, blogs, and podcasts — for staying abreast of the latest in data science and machine learning
Chrys is a consultant who specializes in product development and change management. She's also a co-founder of both Write/Speak/Code and Hacks/Hackers, the latter of which has grown to 70 chapters across five continents.
Listen or watch here.
Transformers for Natural Language Processing
This week's guest is award-winning author Denis Rothman. He details how Transformer models (like GPT-3) have revolutionized Natural Language Processing (NLP) in recent years. He also explains Explainable AI (XAI).
Denis:
• Is the author of three technical books on artificial intelligence
• Won this year's Data Community Content Creator Award for technical book author for his most recent book, "Transformers for NLP"
• Spent 25 years as co-founder of French A.I. company Planilog
• Has been patenting A.I. algos such as those for chatbots since 1982
In this episode, Denis fills us in on:
• What Natural Language Processing is
• What Transformer architectures are (e.g., BERT, GPT-3)
• Tools we can use to explain *why* A.I. algorithms provide a particular output
We covered audience questions from Serg, Chiara, and Jean-Charles during filming. For those we didn't get to ask, Denis is kindly answering via a LinkedIn post today!
The episode's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Does Caffeine Hurt Productivity? (Part 1)
For Five-Minute Friday this week, I lay out my hypothesis that caffeine decreases people's capacity to focus deeply on work. Next week, we'll review the results of the months-long coffee experiment I ran on myself!
(If you can't wait to see the experiment results, you can head to jonkrohn.com/coffee to check them out.)
Listen or watch here.
Upcoming guest on the SuperDataScience Podcast: Wes McKinney
Next week, I'm interviewing the monumental Wes McKinney — creator of pandas, co-creator of Apache Arrow, and bestselling author of "Python for Data Analysis" — for a SuperDataScience episode.
Got Qs for him? Tweet them @jonkrohnlearns or send them to me on LinkedIn.
Data Science for Private Investing — LIVE with Drew Conway
This week's guest is prominent data scientist and author Dr. Drew Conway. Working at Two Sigma, one of the world's largest hedge funds, Drew leads data science for private markets (e.g., real estate, private equity).
If you aren't familiar with Drew already, he:
• Serves as Senior Vice President for data science at Two Sigma
• Co-authored the classic O'Reilly Media book "ML for Hackers"
• Was co-founder and CEO of Alluvium, which was acquired in 2019
• Advised countless successful data-focused startups (e.g., Yhat, Reonomy)
• Obtained a PhD in politics from New York University
In this episode, he covers:
• What private investing is
• How data science can lead to better private investment decisions
• The differences between creating and executing models for public markets (such as stock exchanges) relative to private markets
• What he looks for in the data scientists he hires and how he interviews them
This is a special SuperDataScience episode because it's the first one recorded live in front of an audience (at the New York R Conference in September). Eloquent Drew was the willing guinea pig for this experiment, which was a great success: we filmed in a single unbroken take and fielded excellent audience questions.
Listen or watch here.
Deep Reinforcement Learning
Five-Minute Friday today is an intro to (deep) reinforcement learning, which has diverse cutting-edge applications, such as machines defeating humans at complex strategic games and robotic hands solving Rubik’s Cubes.
You can watch or listen here.
Accelerating Start-up Growth with A.I. Specialists
This week's guest is the game-changing Dr. Parinaz Sobhani. She leads ML at Georgian — a private fund that sends her "special ops" data science teams into its portfolio companies to accelerate their A.I. capabilities.
In this episode, Parinaz details:
• Case studies of Georgian's A.I. approach in action across industries (e.g. insurance, law, real estate)
• Tools and techniques her team leverages, with a particular focus on the transfer learning of transformer-based models of natural language
• What she looks for in the data scientists and ML engineers she hires
• Environmental and sociodemographic considerations of A.I.
• Her academic research (Parinaz holds a PhD in A.I. from the University of Ottawa where she specialized in natural language processing)
Listen or watch here.
...and thanks to Maureen for making this connection to Parinaz!
Building Your Ant Hill
Five-Minute Friday today features my 91-year-old grandmother sharing her insightful life philosophy that centers around an analogy of ants building ant hills.
Listen here.
Bayesian Statistics
Expert Rob Trangucci joins me this week to provide an introduction to Bayesian Statistics, a uniquely powerful data-modeling approach.
If you haven't heard of Bayesian Stats before, today's episode introduces it from the ground up. It also covers why, in many common situations, it can be more effective than other data-modeling approaches like Machine Learning and Frequentist Statistics.
Today's episode is a rich resource on:
• The centuries-old history of Bayesian Stats
• Its particular strengths
• Real-world applications, including to Covid epidemiology (Rob's particular focus at the moment)
• The best software libraries for applying Bayesian Statistics yourself
• Pros and cons of pursuing a PhD in the data science field
Rob is a core developer on the open-source Stan project, a leading Bayesian software library. Having previously worked as a statistician in renowned professor Andrew Gelman's lab at Columbia University, Rob's now pursuing a PhD in statistics at the University of Michigan.
Listen or watch here.
Supervised vs Unsupervised Learning
Five-Minute Friday this week is a high-level intro to the two largest categories of Machine Learning approaches: Supervised Learning and Unsupervised Learning.
Listen or watch here.
From Data Science to Cinema
SuperDataScience SuperStar Hadelin returns to report on his journey from multi-million-selling video instructor to mainstream-film actor — and he details the traits that allow data scientists to succeed at anything.
Hadelin has created and presented 30 extremely popular Udemy courses on machine learning topics, selling over two million copies so far. Prior to his epic creative period publishing ML courses, Hadelin studied math, engineering and A.I. at the Université Paris-Saclay and he worked as a data engineer at Google. More recently Hadelin has written a book called "A.I. Crash Course" and was co-founder and CEO of BlueLife AI.
Today's episode focuses on:
• Hadelin's recent shift toward acting in mainstream films
• The characteristics that enable an outstanding data scientist to excel in any pursuit
• How to cultivate your passion and achieve your dreams
• Bollywood vs Hollywood
• How to prepare for the TensorFlow Certificate Program
• Software modules for deploying deep learning models into production
Listen or watch here.