Thanks to a new lip-reading A.I., non-verbal medical patients can now "speak" to their clinicians and loved ones…
Filtering by Category: SuperDataScience
The Intentional Use of Color in Data Communication
In today's episode, Kate Strachnyi, author of the new book ColorWise, opened my eyes to the vastly underutilized power of the intentional use of color in data visualizations. Now you can harness its power too!
Kate:
• Is a multi-time data science book author.
• Her latest book, ColorWise, was published by O'Reilly Media: It’s a beautiful, comprehensive guide to the effective use of color when communicating data visually.
• Founded the DATAcated Circle, a community of data professionals committed to engaging and learning together.
• Is a megastar on LinkedIn where she has over 170k followers and was twice recognized as a LinkedIn Top Voice for Data Science & Analytics.
• Is big into long-distance running; her longest to date was a 50-mile (!!!) ultramarathon in New York’s Central Park.
Today’s episode should appeal to technical and non-technical folks alike because I suspect that pretty well any listener of this show presents data and could benefit from learning how to do so more effectively with the intentional use of color.
In today’s episode, Kate details:
• Why the intentional use of color matters.
• What thought process you should follow to select a color scheme for a visualization.
• Special considerations for color choice, such as accessibility, cultural understanding, and human psychology.
• How to effectively use multiple visualizations together in a document, presentation or dashboard.
• Her favorite data viz tools.
Want a free digital copy of ColorWise? The first five people to comment on this post that they want one, get one! Thanks to O'Reilly's Suzanne Huston for offering this to our listeners.
If you miss out on one of the five copies, you can use my special code "SDSPOD23" to get a free 30-day trial of the O'Reilly platform and read the book there.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
SparseGPT: Remove 100 Billion Parameters but Retain 100% Accuracy
Today’s episode isn’t specifically about GPT-3, however. It’s about the issue of how massive these large language models are and how we can prune these models to compress them.
Introduction to Machine Learning
After a multi-year hiatus, Hadelin and Kirill — the most popular data science instructors on Udemy, with 2+ million students — have released a new ML course. In this episode, they introduce what ML is from scratch.
Kirill Eremenko:
• Is Founder and CEO of SuperDataScience, an e-learning platform.
• Founded the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins two years ago.
Hadelin de Ponteves:
• Was a data engineer at Google before becoming a content creator.
• In 2020, took a break from Data Science content to produce and star in a Bollywood film featuring "Miss Universe" Harnaaz Sandhu.
Together, Kirill and Hadelin:
• Have created dozens of data science courses.
• Are the most popular data science instructors on the Udemy platform, with over two million students.
• Recently published, after a multi-year hiatus from creating courses, a new course called "Machine Learning in Python: Level 1".
This episode serves as an introduction to machine learning, so it will primarily appeal to folks who aren’t already experts at ML. That said, I’ve been doing ML for over 15 years and still learned a few critical new pieces of information during filming, so this episode could also serve as a fun, light-hearted refresher for experts.
In this episode, Kirill and Hadelin introduce ML concepts such as:
• Supervised vs unsupervised learning
• Classification errors
• Logistic regression
• Feature scaling
• The Adjusted R-Squared metric
• The assumptions of linear regression
• The Elbow Method
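As a small illustration of one concept from the list above, here is a minimal sketch of the Adjusted R-Squared metric using its standard textbook formula. The function name and arguments are my own choices for illustration, not taken from the episode or the course.

```python
def adjusted_r_squared(r_squared: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n is the number of observations and p the number of predictors.
    Unlike plain R^2, this metric penalizes adding predictors that
    do not improve the fit.
    """
    denominator = n_samples - n_features - 1
    if denominator <= 0:
        raise ValueError("n_samples must be greater than n_features + 1")
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / denominator


# Hypothetical numbers: the same R^2 looks less impressive
# once ten predictors are taken into account.
print(adjusted_r_squared(0.90, n_samples=101, n_features=10))
```

With no predictors to penalize (n_features=0), the adjusted value equals the plain R-Squared.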
Is Data Science Still Sexy?
Had far too much fun filming today's episode with Prof. Tom Davenport, many-time author of bestselling books on analytics and coiner of data science as "sexiest job of the century". A decade on, does he still think so?
Tom:
• Has published over 20 books, such as the bestselling "Competing on Analytics", "The A.I. Advantage", and "Analytics at Work".
• Has penned 300+ articles in publications like the Harvard Business Review and writes regular columns for Forbes and The Wall Street Journal.
• Is President's Distinguished Professor of IT and Management at Babson College.
• Is Visiting Professor at the Saïd Business School, University of Oxford.
• Is Senior Advisor to the A.I. practice for the global professional services giant Deloitte.
• Is recognized as a LinkedIn Top Voice, with nearly 300k followers.
Today’s episode is well-suited to technical and non-technical listeners alike. Every part of it should be appealing to anyone who’s keen to hear about the leading edge of commercial applications of A.I.
In this episode, Prof. Davenport details:
• The discrete A.I. maturity levels of organizations.
• How organizations become A.I. fueled.
• Which jobs are susceptible to replacement by A.I.
• Which jobs are ripe for augmenting with A.I.
• What roles other than data scientist are required to deploy effective machine learning models.
• What the future of data science will look like and, having coined data science as “the sexiest job of the 21st century” a decade ago, whether he still thinks it is today.
Machine Learning for Video Games
Carly Taylor — Lead ML Engineer for the "Call of Duty" franchise — joined me for today's fun, super informative episode on low-latency software engineering, real-time ML, and the future of gaming.
Carly:
• Grew rapidly from a Sr Data Scientist role to simultaneously holding "Expert ML Engineer" and "Sr Mgr — Security Strategy" titles since joining Activision two years ago.
• At Activision, specifically works on Call of Duty, one of the top-grossing video game franchises of all time, with over $30 billion in sales and 250m global users annually.
• Prior to Activision, rapidly grew from Analyst to Data Scientist roles.
• Has amassed a LinkedIn following of 75k+ by regularly posting fruitful tips on breaking into a data science career and progressing within it.
• Advocates for women in STEM, tech, and gaming careers.
• Offers 1:1 career consulting to anyone who desires it.
• Holds a Masters in Computational Chemistry from the University of Colorado and completed the Galvanize Data Science Immersive program.
Today’s episode certainly has technical tidbits throughout that will be useful to hands-on practitioners, but much of the wide-ranging conversation will be fascinating to any listener, particularly if you have an interest in video games, the so-called metaverse, or real-time machine learning.
In this episode, Carly details:
• What the future of gaming holds.
• Why low latency is critical for an optimal gaming experience and the tools that online engineers use to make it happen.
• Her favorite operating systems, software packages, and keyboards.
• How to transition effectively from a quantitative academic background into data science.
• How to file a patent.
• Why she’s called the “Rebel Data Scientist”.
A Framework for Big Life Decisions
The biggest decisions we make involve trade-offs between professional opportunity (money) and our personal life (love). Today, Stanford labor economist Prof. Myra Strober provides a framework for making a big choice.
A.I. for Medicine
Machine learning is ushering in a new era of medicine, e.g., by predicting the shape of therapeutic drugs and assisting in their design. Witty Prof. Charlotte Deane of the University of Oxford and Exscientia explains how.
Charlotte:
• Is a world-leading expert on using ML for designing therapeutic drugs.
• Has been faculty at the University of Oxford for over 20 years, where she serves as Professor of Structural Bioinformatics and heads the 25-person Protein Informatics Lab.
• Is Chief Scientist Biologics A.I. at Exscientia, a NASDAQ-listed pharmatech company that uses computational approaches to drive drug development in a fraction of the time of traditional drug companies.
• Was COVID-Response Director for UK Research and Innovation, resulting in Queen Elizabeth II honoring her as a Member of the Most Excellent Order of the British Empire.
Today’s episode should appeal to technical and non-technical folks alike as it features an absolutely brilliant scientist and communicator describing how we can use A.I. to speed the discovery of new molecules that help our body fight off ailments as diverse as viruses and cancer.
In this episode, Prof. Deane details:
• How your immune system works.
• What biologics are and why they’re such an important class of drugs.
• What’s holding back the widespread use of precision medicines that are pinpoint-customized to a specific tumor in a specific person.
• What the celebrated AlphaFold algorithm does exquisitely and where it (and all other computational models of protein folding) still needs to improve.
• How she used data to marshal the UK’s scientific response to Covid.
• How data and machine learning will transform drug development over the coming years.
Continuous Calendar for 2023
Well, another year, another continuous calendar from us here at SuperDataScience!
Data Science Trends for 2023
Happy New Year! To kick it off, the entrepreneur, futurist, and mega-popular Machine Learning instructor Sadie St. Lawrence joins me to predict the biggest data science trends of 2023 🍾
We start the episode off by looking back at how our predictions for 2022 panned out from a year ago and then we dive into our predictions for the year ahead. Specific trends we discuss include:
• Data as a product
• Multimodal models
• Decentralization of enterprise data
• A.I. policy
• Environmental sustainability
This episode will appeal to technical and non-technical folks alike — anyone who’d like to understand the trends that will shape the field of data science and the broader world not only in 2023 but also in the years beyond.
Sadie:
• Has created data science and ML courses enjoyed by 350k+ students.
• Is Founder and CEO of Women In Data, a community of over 20k women across 17 countries.
• Serves on multiple start-up boards.
• Hosts the Data Bytes podcast.
What I Learned in 2022
To cap 2022 off, like I did to cap 2021 off, I’m covering the five big lessons that I learned over the course of the year:
Simplifying Machine Learning
Today, Mariya Sha — host of the wildly popular "Python Simplified" YouTube channel (140k subscribers!) — taps her breadth of A.I. expertise to provide a fun and fascinating finale to SuperDataScience guest episodes for 2022.
Mariya:
• Is the mind behind the "Python Simplified" YouTube channel that makes advanced concepts (e.g., ML, neural nets) simple to understand.
• Her videos cover Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and OOP.
• Is renowned for taking complex concepts such as gradient descent or unsupervised learning and explaining them in a straightforward manner that leverages hands-on, real-life examples.
• Is pursuing a bachelor's in Computer Science (with a specialization in A.I. and Machine Learning) from the University of London.
Today’s episode should appeal to anyone who’s interested in or involved with data science, machine learning, or A.I.
In this episode, Mariya details:
• How the incredible potential of ML in our lifetimes inspired her to shift her focus from web-development languages like JavaScript to Python.
• Why automation and web scraping are critical skills for data scientists.
• How to make learning any apparently complex data science concept straightforward to comprehend.
• Her favorite Python libraries and software tools.
• One rarely-mentioned topic that every data scientist would benefit from.
• The pros and cons of pursuing a 100% remote degree in computer science.
How to Influence Others with Your Data
If you ever use data to make decisions or to persuade those around you to make data-driven decisions, today’s episode is jam-packed with relevant, practical tips from data presentation guru Ann K. Emery.
Ann:
• Is an internationally-acclaimed speaker who delivers 100+ keynotes, workshops, and webinars each year to enable people to share data-driven insights more effectively.
• Has consulted on data visualization, data reporting, and data presentation with over 200 organizations, including the likes of the United Nations, the US Centers for Disease Control, and Harvard University.
• Holds a BA in Psychology and Spanish from the University of Virginia and a Masters in Educational Psychology Evaluation, Assessment, and Testing from George Mason University.
I rarely say that everyone should listen to an episode, but this is one of those rare cases.
In this episode, Ann details:
• What data storytelling is.
• Best practices for data visualization.
• Surprising tricks you can pull off with spreadsheet software.
• How to report on data effectively.
• Her top tips for presenting data in a slideshow.
The Equality Machine
Many recent books and articles spread fear about data collection and A.I. Today's guest, Prof. Orly Lobel, offers the antidote with her book "The Equality Machine" — an optimistic take on the future of data science.
Liquid Neural Networks
Liquid Neural Networks are a new, biology-inspired deep learning approach that could be transformative. I think they're super cool and Adrian Kosowski, PhD introduced them to me for today's Five-Minute Friday episode.
Data Analytics Career Orientation
Considering a Data Analytics career? Today's episode with YouTube icon Luke Barousse (273k subscribers) will be particularly appealing to you, but the terrifically interesting guest makes for an episode that anyone will love.
Luke:
• Is a full-time YouTuber, creating highly educational — but nevertheless hilarious — videos focused on Data Analytics.
• Previously worked as a Lead Data Analyst and Data Engineer at BASF.
• Worked for seven years in the US Navy on nuclear-powered submarines.
• Holds a degree in mechanical engineering, a graduate qualification in nuclear engineering, and an MBA in business analytics.
In this episode, Luke details:
• The must-have skills for entry-level data analyst roles.
• The data analyst skills mistakenly pursued by many folks considering the career.
• How his submariner experience prepared him well for a data career.
• His favorite tools for creating interactive data dashboards.
• His favorite scraping libraries for collecting data from the web.
• The skills to learn now to be prepared for the data careers of the future.
• The benefits of CrossFit beyond just the fitness improvements.
Resilient Machine Learning
Machine learning is often fragile in production. For today's Five-Minute Friday episode, Dr. Dan Shiebler details how we can make ML more resilient.
Software for Efficient Data Science
In today's episode, Dr. Jodie Burchell details a broad range of tools for working efficiently with data, including data cleaning, reproducibility, visualization, and natural language processing.
Jodie:
• Is the Data Science Developer Advocate for JetBrains, the developer-tools company behind PyCharm (one of the most widely-used Python IDEs) and DataLore (their new cloud platform for collaborative data science).
• Previously was Data Scientist or Lead Data Scientist at several tech companies, developing specializations in search, recommender systems, and NLP.
• Co-authored two books on data visualization libraries: "The Hitchhiker's Guide to ggplot2" and "The Hitchhiker's Guide to Plotnine".
• Prior to entering industry, was a postdoctoral fellow in biostatistics at the University of Melbourne.
• Holds a PhD in Psychology from the Australian National University.
Today’s episode is primarily intended for a technical audience as it's packed with practical tips and software for data scientists.
In this episode, Jodie details:
• What a data science developer advocate is and why you might want to consider it as a career option.
• How to work effectively, efficiently, and confidently with real-world data.
• Her favorite Python libraries, such as ones for data viz and NLP.
• How to have reproducible data science workflows.
• The subject she would have majored in if she could go back in time.
The Critical Human Element of Successful A.I. Deployments
For today's episode, I sat down with the prolific data-science instructor, author and practitioner Keith McCormick to discuss how critical user considerations are for developing a successful A.I. application.
AutoML: Automated Machine Learning
AutoML with Erin LeDell — it rhymes! In today's episode, H2O.ai's Chief ML Scientist guides us through what Automated Machine Learning is and why it's an advantageous technique for data scientists to adopt.
Dr. LeDell:
• Has been working at H2O.ai — the cloud A.I. firm that has raised over $250m in venture capital and is renowned for its open-source AutoML library — for eight years.
• Founded Women in Machine Learning & Data Science (WiMLDS), which has 100+ chapters worldwide.
• Co-founded R-Ladies Global, a community for genders currently underrepresented amongst R users.
• Is celebrated for her talks at leading A.I. conferences.
• Previously was Principal Data Scientist at two acquired A.I. startups.
• Holds a Ph.D. from UC Berkeley focused on ML and computational stats.
Today’s episode is relatively technical so will primarily appeal to technical listeners, but it would also provide context to anyone who’s interested in understanding how key aspects of data science work are becoming increasingly automated.
In this episode, Erin details:
• What AutoML — automated machine learning — is and why it’s an advantageous technique for data scientists to adopt.
• How the open-source H2O AutoML platform works.
• What the “No Free Lunch Theorem” is.
• What Admissible Machine Learning is and how it can reduce the biases present in many data science models.
• The new software tools she’s most excited about.
• How data scientists can prepare for the increasingly automated data science field of the future.