Today, Mariya Sha — host of the wildly popular "Python Simplified" YouTube channel (140k subscribers!) — taps her breadth of A.I. expertise to provide a fun and fascinating finale to SuperDataScience guest episodes for 2022.
Mariya:
• Is the mind behind the "Python Simplified" YouTube channel that makes advanced concepts (e.g., ML, neural nets) simple to understand.
• Her videos cover Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and OOP.
• Is renowned for taking complex concepts such as gradient descent or unsupervised learning and explaining them in a straightforward manner that leverages hands-on, real-life examples.
• Is pursuing a bachelor's in Computer Science (with a specialization in A.I. and Machine Learning) from the University of London.
Today’s episode should appeal to anyone who’s interested in or involved with data science, machine learning, or A.I.
In this episode, Mariya details:
• How the incredible potential of ML in our lifetimes inspired her to shift her focus from web-development languages like JavaScript to Python.
• Why automation and web scraping are critical skills for data scientists.
• How to make learning any apparently complex data science concept straightforward to comprehend.
• Her favorite Python libraries and software tools.
• One rarely mentioned topic that every data scientist would benefit from.
• The pros and cons of pursuing a 100% remote degree in computer science.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Influence Others with Your Data
If you ever use data to make decisions or to persuade those around you to make data-driven decisions, today’s episode is jam-packed with relevant, practical tips from data presentation guru Ann K. Emery.
Ann:
• Is an internationally acclaimed speaker who delivers 100+ keynotes, workshops, and webinars each year to enable people to share data-driven insights more effectively.
• She has consulted on data visualization, data reporting, and data presentation with over 200 organizations, including the United Nations, the US Centers for Disease Control, and Harvard University.
• She holds a BA in Psychology and Spanish from the University of Virginia and a Master's in Educational Psychology (Evaluation, Assessment, and Testing) from George Mason University.
I rarely say that everyone should listen to an episode, but this is one of those rare cases.
In this episode, Ann details:
• What data storytelling is.
• Best practices for data visualization.
• Surprising tricks you can pull off with spreadsheet software.
• How to report on data effectively.
• Her top tips for presenting data in a slideshow.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Liquid Neural Networks
Liquid Neural Networks are a new, biology-inspired deep learning approach that could be transformative. I think they're super cool, and Adrian Kosowski, PhD, introduced them to me for today's Five-Minute Friday episode.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Data Analytics Career Orientation
Considering a Data Analytics career? Today's episode with YouTube icon Luke Barousse (273k subscribers) will be particularly appealing to you, but the terrifically interesting guest makes for an episode that anyone will love.
Luke:
• Is a full-time YouTuber, creating highly educational — but nevertheless hilarious — videos focused on Data Analytics.
• Previously worked as a Lead Data Analyst and Data Engineer at BASF.
• Worked for seven years in the US Navy on nuclear-powered submarines.
• Holds a degree in mechanical engineering, a graduate qualification in nuclear engineering, and an MBA in business analytics.
In this episode, Luke details:
• The must-have skills for entry-level data analyst roles.
• The data analyst skills that many folks considering the career mistakenly pursue.
• How his submariner experience prepared him well for a data career.
• His favorite tools for creating interactive data dashboards.
• His favorite scraping libraries for collecting data from the web.
• The skills to learn now to be prepared for the data careers of the future.
• The benefits of CrossFit beyond just the fitness improvements.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Resilient Machine Learning
Machine learning is often fragile in production. For today's Five-Minute Friday episode, Dr. Dan Shiebler details how we can make ML more resilient.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Critical Human Element of Successful A.I. Deployments
For today's episode, I sat down with the prolific data-science instructor, author, and practitioner Keith McCormick to discuss how critical user considerations are for developing a successful A.I. application.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AutoML: Automated Machine Learning
AutoML with Erin LeDell — it rhymes! In today's episode, H2O.ai's Chief ML Scientist guides us through what Automated Machine Learning is and why it's an advantageous technique for data scientists to adopt.
Dr. LeDell:
• Has been working at H2O.ai — the cloud A.I. firm that has raised over $250m in venture capital and is renowned for its open-source AutoML library — for eight years.
• Founded Women in Machine Learning & Data Science (WiMLDS), which has 100+ chapters worldwide.
• Co-founded R-Ladies Global, a community for genders currently underrepresented amongst R users.
• Is celebrated for her talks at leading A.I. conferences.
• Was previously Principal Data Scientist at two A.I. startups that were later acquired.
• Holds a Ph.D. from UC Berkeley focused on ML and computational stats.
Today’s episode is relatively technical, so it will primarily appeal to technical listeners, but it will also provide context to anyone interested in understanding how key aspects of data science work are becoming increasingly automated.
In this episode, Erin details:
• What AutoML — automated machine learning — is and why it’s an advantageous technique for data scientists to adopt.
• How the open-source H2O AutoML platform works.
• What the “No Free Lunch Theorem” is.
• What Admissible Machine Learning is and how it can reduce the biases present in many data science models.
• The new software tools she’s most excited about.
• How data scientists can prepare for the increasingly automated data science field of the future.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Data Analyst, Data Scientist, and Data Engineer Career Paths
Keen to become a Data Analyst? Get promoted to Sr. Data Analyst? Or explore Data Engineer/Scientist options? Shashank, a YouTube expert on these questions (>100k subscribers!), tackles them in today's episode.
Shashank:
• Has an exceptional YouTube channel focused on helping people break into a data analyst career.
• Works as a Senior Data Engineer at digital sports platform Fanatics, Inc.
• Was previously Data Analyst at luxury retailer Nordstrom and other firms.
• Holds a degree in chemistry from Emory University in Atlanta.
Today’s episode will appeal primarily to folks who are interested in becoming a data analyst, or who are interested in transitioning from a data analyst role into a data science or data engineering role.
In this episode, Shashank details:
• How you can land an entry-level data analyst role in just a few weeks, regardless of your educational and professional background.
• The hard and soft skills you need to progress from a junior data analyst to a senior data analyst position.
• What it takes to transition from data analyst to a typically more lucrative role as a data scientist or data engineer.
• His favorite resources for learning the essential skills for data scientists.
• What he looks for when he’s interviewing candidates.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Tools for Deploying Data Models into Production
Today's guest is the mighty Erik Bernhardsson: creator of Spotify's music recommender, prolific open-source developer, world-leading technical blogger, and now model-deployment-tool entrepreneur via Modal Labs.
Erik:
• Is the Founder and CEO of Modal Labs, a startup building innovative tools and infrastructure for data teams.
• Previously was CTO of the real estate startup Better, where he grew the engineering team from one person (himself) to 300 people.
• Was also previously an Engineering Manager at Spotify, where he created their now-ubiquitous music-recommendation algorithm.
• Is a prolific open-sourcer, having created the popular Luigi and Annoy libraries, among several others.
• Is an industry-leading blogger with posts that frequently feature on the front page of Hacker News.
Today’s episode gets deep into the weeds at points, so it will be particularly appealing to practicing data scientists, ML engineers, and the like, but much of the fascinating, wide-ranging conversation in this episode will appeal to any curious listener.
In this episode, Erik details:
• How the Spotify music recommender he built works so well at scale.
• The litany of new data science and engineering tools he’s excited about and thinks you should be excited about too.
• What open-source library he would develop next.
• Why he founded Modal and how its tools empower data teams.
• His top tips for succeeding both as an interviewer and as an interviewee, drawn from interviewing more than 2,000 candidates for engineering roles.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Data Science Interviews with Nick Singh
For an episode all about tips for crushing interviews for Data Scientist roles, our guest is Nick Singh — author of the bestselling "Ace the Data Science Interview" book and creator of the DataLemur SQL interview platform.
Nick:
• Co-authored “Ace the Data Science Interview”, an interview-question guide that has sold over 16,000 copies since it was released last year.
• Created the DataLemur platform for interactively practicing interview questions involving SQL queries.
• Worked as a software engineer at Facebook, Google, and Microsoft.
• Holds a BS in engineering from the University of Virginia.
Today's episode is ideal for folks who are looking to land a data science job for the first time, level up into a more senior data science role, or perhaps land a data science gig at a new firm.
In this episode, Nick details:
• His top tips for success in data science interviews.
• Common misconceptions about data science interviews.
• How to become comfortable with self-promotion and increase your chances of landing your dream job.
• Strategies for when interviewers ask if you have any questions for them.
• The subject areas and skills you should master before heading into a data science interview.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
More Guests on Fridays
Going forward, we are still going to have short, five-minute-ish episodes on Fridays that feature me solo, but we will increasingly intersperse them with inspiring guests. And I won’t be making an effort to keep these Friday guest episodes anywhere near five minutes long: to start, I’m thinking they’ll typically run 20 to 30 minutes, but we’ll see how it goes with the guests and how you receive them.
Inferring Causality with Jennifer Hill
Inferring causal direction — as opposed to merely identifying correlations — is central to all real-world data science applications. World-leading expert and author on causality, Prof. Jennifer Hill, is our guest this week.
Jennifer:
• Is Professor of Applied Statistics at New York University, where she researches causality and practical applications of causal research, such as those that are vital to scientific development and government policies.
• Co-directs the NYU Masters in Applied Statistics and directs PRIISM (a center focused on impactful social applications of data science).
• With the renowned statistician Andrew Gelman, wrote the book "Data Analysis Using Regression and Multilevel/Hierarchical Models", an iconic textbook that has been cited over 15k times.
• Holds a PhD in Statistics from Harvard University.
Intended audience:
• Today’s episode largely contains content that will be of interest to anyone who’s keen to better understand the critical concept of causality.
• It also contains technical parts that will appeal primarily to practicing data scientists.
In this episode, Jennifer details:
• How causality is central to all applications of data science.
• Why correlation does not imply causation.
• How to design research in order to confidently infer causality from the results.
• Her favorite Bayesian and machine learning tools for making causal inferences within code.
• ThinkCausal, her new graphical user interface for making causal inferences without the need to write code.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Upskilling in Data Science and Machine Learning
This week, iconic Stanford University Deep Learning instructor and entrepreneur Kian Katanforoosh details how ML powers his EdTech platform Workera, enabling you to systematically fill gaps in your data science skills.
Kian:
• Is Co-Founder and CEO of Workera, a Bay Area education technology company that has raised $21m in venture capital to upskill workers, with a particular early focus on upskilling technologists like data scientists, software developers, and machine learning specialists.
• Is a lecturer of computer science at Stanford University (specifically, he teaches the extremely popular CS230 Deep Learning course alongside Prof. Andrew Ng, one of the world’s best-known data scientists).
• Was awarded Stanford’s highest teaching award.
• Is also a founding member of DeepLearning.AI, a platform through which he’s taught deep learning to over three million students.
• Holds a Master’s in Math and Computer Science from CentraleSupélec.
• Holds a Master’s in Management Science and Engineering from Stanford.
By and large, today’s episode will appeal to any listener who’s keen to understand the latest in education technology, but there are parts here and there that will specifically appeal to practicing technologists like data scientists and software developers.
In this episode, Kian details:
• What a skills intelligence platform is.
• Four ways that machine learning drives his skills intelligence platform.
• What frameworks and programming languages they selected for building their platform and why.
• What he looks for in the data scientists and software engineers he hires.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Geospatial Data and Unconventional Routes into Data Careers
This week, the remarkably well-read Christina Stathopoulos details open-source software for working with geospatial data... as well as how you can navigate your data-career path, no matter what your background.
Christina:
• Has worked at Google for nearly five years in several data-centric roles.
• For the past year, she’s worked as an Analytical Lead for Waze, the popular crowdsourced navigation app owned by Google.
• Is also an adjunct professor at IE Business School in Madrid, where she teaches courses on business analytics, machine learning, data visualization, and data ethics.
• Previously worked as a data engineer at media analytics giant Nielsen.
• Holds a Master’s in Business Analytics and Big Data from IE Business School and a Bachelor’s in Science, Tech, and Society from North Carolina State University.
Today’s episode will appeal to a broad audience of technical and non-technical listeners alike.
In this episode, Christina details:
• Geospatial data and open-source packages for working with it.
• Her tips for getting a foothold in a data career if you come from an unconventional background.
• Guidance to help women and other underrepresented groups thrive in tech.
• The hard and soft skills most essential to success in a data role today.
• Her #bookaweekchallenge and her top data book recommendations.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Guest appearance on The Evan Solomon Show: Mimicking the Voice of Dead Relatives: The Future of Voice Cloning and A.I.
Had a fun time getting back on the Evan Solomon talk-radio show this week... This time to discuss voice-mimicking A.I. that, among presumably many other applications, allows your dead relatives to read you bedtime stories.
A.I. Policy at OpenAI
OpenAI released many of the most revolutionary A.I. models of recent years, e.g., DALL-E 2, GPT-3, and Codex. Dr. Miles Brundage was behind the A.I. Policy considerations associated with each transformative release.
Miles:
• Is Head of Policy Research at OpenAI.
• He’s been integral to the rollout of OpenAI’s game-changing models such as the GPT series, DALL-E series, Codex, and CLIP.
• Previously he worked as an A.I. Policy Research Fellow at the University of Oxford’s Future of Humanity Institute.
• He holds a PhD in the Human and Social Dimensions of Science and Technology from Arizona State University.
Today’s episode should be deeply interesting to technical experts and non-technical folks alike.
In this episode, Miles details:
• Considerations you should take into account when rolling out any A.I. model into production.
• Specific considerations OpenAI concerned themselves with when rolling out:
  • The GPT-3 natural-language-generation model,
  • The mind-blowing DALL-E artistic-creativity models,
  • Their software-writing Codex model, and
  • Their bewilderingly label-light image-classification model CLIP.
• Differences between the related fields of A.I. Policy, A.I. Safety, and A.I. Alignment.
• His thoughts on the risks of A.I. displacing versus augmenting humans in the coming decades.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The A.I. Platforms of the Future
Ben Taylor returns for a third consecutive Five-Minute Friday! This week, he helps us look ahead and dig into what we can expect from the A.I. platforms of the future.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Data Engineering 101
Today's episode is all about Data Engineering — particularly the tools and techniques that Data Scientists should know. "Fundamentals of Data Engineering" book co-authors Matthew Housley and Joe Reis are guests!
Matt and Joe:
• Co-authored the brand-new "Fundamentals of Data Engineering" book that was published by O'Reilly Media and is already a bestseller.
• Co-founded the data architecture and data engineering consultancy Ternary Data. Joe is CEO of the firm while Matt is CTO.
In addition, Joe:
• Is an adjunct professor at the University of Utah.
• Previously founded several tech companies and has held both software engineering and data science roles.
• Holds a math degree from the University of Utah.
Matt:
• Holds a PhD in math from the University of Utah.
• Worked as a professor before becoming a data scientist in industry.
Today’s episode will appeal primarily to technical experts like data scientists and data engineers, but will also be of interest to anyone who manages technology projects that involve data flows.
In this episode, Matt and Joe detail:
• Why they identify as “recovering data scientists”.
• What kinds of people tend to become data scientists versus what kinds tend to become data engineers.
• Key components of their book such as latency trade-offs and the six data engineering undercurrents.
• Their favorite data engineering tools and techniques.
• What the Live Data Stack is and how it’s putting various data professional titles on a collision course.
• The biggest data engineering problems firms face and how to fix them.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Why CEOs Care About A.I. More than Other Technologies
Ben Taylor is back for another Five-Minute Friday this week, this time to fill us in on why CEOs care more about A.I. than any other technology and how to sell them on your machine learning solution.
Special shout-out to my puppy Oboe who features indispensably in the video version of this episode... on Ben's lap! 🐶
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Real-World Impact of Cross-Disciplinary Data Science Collaboration
How to unlock breakthroughs — particularly in medicine — through cross-disciplinary data science is the main topic covered this week with the fascinating, trailblazing Professor Philip Bourne.
Philip:
• Is Founding Dean of the University of Virginia's School of Data Science.
• Is also Professor of Biomedical Engineering at Virginia.
• Is Founding Editor-in-Chief of the open-access journal PLOS Computational Biology.
• Was previously Associate Director for Data Science at the National Institutes of Health.
Despite Prof. Bourne being a deep technical expert, he conveys concepts so magnificently that today’s episode should be broadly appealing to practicing data scientists and non-technical listeners alike.
In this episode, Philip details:
• Why he founded a School of Data Science.
• Why such schools are uniquely positioned to bear the fruits of applied data science research within universities.
• What the most important data science skills are.
• How computing and data science have evolved across academic departments in recent decades.
• Fascinating practical applications of his biomedical data science research into the structure and function of biological proteins.
• The absolutely essential role of open-source software and open-access publishing in data science.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.