My recent videos have detailed how to calculate partial derivatives by hand. In today's video, I demo how to compute them automatically using PyTorch, enabling us to easily differentiate complex functions such as ML models.
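As a minimal sketch of the idea (a toy example of my own, not the exact demo from the video), here's PyTorch's autograd computing both partial derivatives of a simple two-variable function:

```python
import torch

# Variables we want partial derivatives with respect to
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

# A simple multivariable function: z = x^2 * y + y
z = x**2 * y + y

# Automatic differentiation: one backward pass populates every partial derivative
z.backward()

print(x.grad)  # dz/dx = 2xy      -> tensor(12.)
print(y.grad)  # dz/dy = x^2 + 1  -> tensor(5.)
```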
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Accelerating Start-up Growth with A.I. Specialists
This week's guest is the game-changing Dr. Parinaz Sobhani. She leads ML at Georgian — a private fund that sends her "special ops" data science teams into its portfolio companies to accelerate their A.I. capabilities.
In this episode, Parinaz details:
• Case studies of Georgian's A.I. approach in action across industries (e.g. insurance, law, real estate)
• Tools and techniques her team leverages, with a particular focus on the transfer learning of transformer-based models of natural language
• What she looks for in the data scientists and ML engineers she hires
• Environmental and sociodemographic considerations of A.I.
• Her academic research (Parinaz holds a PhD in A.I. from the University of Ottawa where she specialized in natural language processing)
Listen or watch here.
...and thanks to Maureen for making this connection to Parinaz!
Partial Derivative Exercises
Last week's YouTube tutorial was an epic intro to Partial Derivative Calculus — a critical foundation for understanding Machine Learning. This week's video features coding exercises that test your comprehension of that material.
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Bayesian Statistics
Expert Rob Trangucci joins me this week to provide an introduction to Bayesian Statistics, a uniquely powerful data-modeling approach.
If you haven't heard of Bayesian Stats before, today's episode introduces it from the ground up. It also covers why in many common situations, it can be more effective than other data-modeling approaches like Machine Learning and Frequentist Statistics.
Today's episode is a rich resource on:
• The centuries-old history of Bayesian Stats
• Its particular strengths
• Real-world applications, including to Covid epidemiology (Rob's particular focus at the moment)
• The best software libraries for applying Bayesian Statistics yourself
• Pros and cons of pursuing a PhD in the data science field
Rob is a core developer on the open-source Stan project — a leading Bayesian software library. Having previously worked as a statistician in renowned professor Andrew Gelman's lab at Columbia University in the City of New York, Rob's now pursuing a PhD in statistics at the University of Michigan.
Listen or watch here.
Supervised vs Unsupervised Learning
Five-Minute Friday this week is a high-level intro to the two largest categories of Machine Learning approaches: Supervised Learning and Unsupervised Learning.
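For a quick, hedged taste of the distinction (my own toy sketch, not material from the episode): supervised learning fits to labeled examples, while unsupervised learning finds structure in the features alone.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy data: 200 points drawn from two clusters, with known labels y
X, y = make_blobs(n_samples=200, centers=2, random_state=42)

# Supervised: the labels y guide the fit
clf = LogisticRegression().fit(X, y)

# Unsupervised: only the features X are used; grouping is inferred
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

print(clf.predict(X[:5]))  # predicted class labels
print(km.labels_[:5])      # discovered cluster assignments
```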
Listen or watch here.
What Partial Derivatives Are
Here is my brand-new 30-minute intro to Partial Derivative Calculus. To make comprehension as easy as possible, we use colorful illustrations, hands-on code demos in Python, and an interactive point-and-click curve-plotting tool.
This is an epic video covering a massively foundational topic underlying nearly all statistical and machine learning approaches. I hope you enjoy it!
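As a tiny, self-contained taste of the concept (using SymPy here, which is not necessarily the tooling shown in the video): a partial derivative differentiates with respect to one variable while treating the others as constants.

```python
import sympy as sp

x, y = sp.symbols('x y')

# A simple function of two variables
f = x**2 * y + sp.sin(y)

# Differentiate with respect to x, holding y constant (and vice versa)
df_dx = sp.diff(f, x)  # 2*x*y
df_dy = sp.diff(f, y)  # x**2 + cos(y)

print(df_dx, df_dy)
```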
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
From Data Science to Cinema
SuperDataScience SuperStar Hadelin returns to report on his journey from multi-million-selling video instructor to mainstream-film actor — and he details the traits that allow data scientists to succeed at anything.
Hadelin has created and presented 30 extremely popular Udemy courses on machine learning topics, selling over two million copies so far. Prior to his epic creative period publishing ML courses, Hadelin studied math, engineering and A.I. at the Université Paris-Saclay and he worked as a data engineer at Google. More recently Hadelin has written a book called "A.I. Crash Course" and was co-founder and CEO of BlueLife AI.
Today's episode focuses on:
• Hadelin's recent shift toward acting in mainstream films
• The characteristics that enable an outstanding data scientist to excel in any pursuit
• How to cultivate your passion and achieve your dreams
• Bollywood vs Hollywood
• How to prepare for the TensorFlow Certificate Program
• Software modules for deploying deep learning models into production
Listen or watch here.
Classification vs Regression
Five-Minute Friday this week is a high-level introduction to Classification and Regression problems — two of the main categories of problems tackled by Machine Learning algorithms.
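A minimal sketch of the difference (my own illustration, not drawn from the episode): classification predicts a discrete category, while regression predicts a continuous quantity.

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: the target is a discrete class (an iris species)
X_c, y_c = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X_c, y_c)
print(clf.predict(X_c[:3]))  # class labels, e.g. [0 0 0]

# Regression: the target is a continuous value (disease progression score)
X_r, y_r = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_r, y_r)
print(reg.predict(X_r[:3]))  # continuous predictions
```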
Listen or watch here.
Calculus II: Partial Derivatives & Integrals – Subject 4 of Machine Learning Foundations
Every few months, we begin a new subject in my Machine Learning Foundations course on YouTube and today is one of those days! This video introduces Subject 4 (of 8), which covers Partial Derivatives and Integrals.
This subject-intro video provides a preview of all the content that will be covered in this subject. It also reviews the Single-Variable Calculus you need to be familiar with (from the preceding subject in the ML Foundation series) in order to understand Partial Derivatives (a.k.a. Multi-Variable Calculus).
The thumbnail illustration of my ever-learning puppy Oboe is by the wonderful artist Aglae Bassens.
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Deep Reinforcement Learning for Robotics with Pieter Abbeel
Very special guest this week! Pieter Abbeel is a serial A.I. entrepreneur, host of the star-studded Robot Brains Podcast, and the world's preeminent researcher of Deep Reinforcement Learning applications.
As a professor of Electrical Engineering and Computer Science at the University of California, Berkeley, Pieter directs the Berkeley Robot Learning Lab and co-directs the Berkeley A.I. Research Lab.
As an entrepreneur, he's been exceptionally successful at applying machine learning for commercial value. Gradescope, a machine learning company in the education technology space that he co-founded, was acquired in 2018. And the A.I. robotics firm Covariant, which he co-founded more recently, has raised $147 million so far, including an $80 million Series C round in July.
In this episode, Pieter eloquently discusses:
• His exciting current research in the field of Deep Reinforcement Learning
• Top learning resources and skills for becoming an expert in A.I. robotics
• How academic robotics research is vastly different from R&D for industry
• Productivity tips
• Traits he looks for in data scientists he hires
• Skills to succeed as a data scientist in the coming decades
He also had time to answer thoughtful questions from distinguished SuperDataScience listeners Serg Masís and Hsieh-Yu Li.
Listen or watch here.
Machine Learning from First Principles, with AutoDiff
Today's brand-new, epic 40-minute YouTube tutorial ties together the preceding 27 Calculus videos to enable us to perform Machine Learning from first principles and fit a line to data points.
To make learning interactive and intuitive, this video focuses on hands-on code demos featuring PyTorch, the popular Python library for Automatic Differentiation.
If you're familiar with differential calculus but not machine learning, this video will make clear for you how ML works. If you're not familiar with differential calculus, the preceding videos in my "Calculus for Machine Learning" course will provide you with all of the foundational theory you need for ML.
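Here is a condensed sketch of the core idea (the video's actual demo is richer): fit y = mx + b to noisy points by letting PyTorch differentiate a loss automatically and nudging the parameters downhill.

```python
import torch

# Synthetic points scattered around y = 2x + 1
x = torch.linspace(0, 5, 20)
y = 2*x + 1 + 0.3*torch.randn(20)

# Trainable parameters of the line
m = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

lr = 0.01
for step in range(1000):
    y_hat = m*x + b                     # forward pass through the tensor graph
    loss = torch.mean((y_hat - y)**2)   # mean squared error
    loss.backward()                     # autodiff: dloss/dm and dloss/db
    with torch.no_grad():               # gradient-descent parameter update
        m -= lr * m.grad
        b -= lr * b.grad
        m.grad.zero_()
        b.grad.zero_()

print(m.item(), b.item())  # should approach 2 and 1
```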
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Statistical Programming with Friends, with Jared Lander
This week's guest is THE Jared Lander! He fills us in on real-life communities that support learning about — and effectively applying — open-source statistical-programming languages like Python and R.
In addition, Jared:
• Overviews what data-science consulting is like (with fascinating use-cases from industrial metallurgy to "Money Ball"-ing for the Minnesota Vikings)
• Details the hard and soft skills of successful data-science consultants
• Ventures eloquently into the age-old R versus Python debate
Jared leads the New York Open Statistical Programming Meetup, the world's largest R meetup (though it also features other open-source languages like Python), which hosts talks from global leaders in data science and machine learning. And Jared runs the R Conference, which holds its seventh annual iteration next week, Sep 9-10.
Jared also wrote the bestselling book "R for Everyone" and teaches stats at both Columbia University in the City of New York and Princeton University. And none of the massive responsibilities that I've just mentioned are Jared's day job! Nope, for that he's the CEO and Chief Data Scientist of Lander Analytics, a data-science consulting firm.
Watch or listen here.
P.S.: Jared is kindly providing 20% off admission to next week's R Conference using promo code SDS20. See rstats.nyc for more details, including the first-ever live episode of SuperDataScience (with Drew Conway as guest)!
O'Reilly + JK October ML Foundations LIVE Classes open for registration
The Linear Algebra classes of my "ML Foundations" curriculum, offered via the O'Reilly Media platform, are in the rear-view mirror. Two Calculus classes are coming up soon, and the Probability and Statistics classes just opened for registration:
• Sep 15 — Calculus III: Partial Derivatives
• Sep 22 — Calculus IV: Gradients and Integrals
• Oct 6 — Intro to Probability
• Oct 13 — Probability II and Information Theory
• Oct 27 — Intro to Statistics
Overall, four subject areas are covered:
• Linear Algebra (3/3 classes DONE)
• Calculus (2/4 classes DONE)
• Probability and Statistics (4 classes)
• Computer Science (3 classes)
Hope to see you in class! Sign-up opens about two months prior to each class. All of the training dates and registration links are provided at jonkrohn.com/talks.
A detailed curriculum and all of the code for my ML Foundations series are available open-source on GitHub here.
The Line Equation as a Tensor Graph
New YouTube video today — it's meaty! In it, we get ourselves set up for applying Machine Learning from scratch by using the popular Python library PyTorch to create a Tensor Graph representation of a simple line equation.
Next week, we'll publish a massive 40-minute video that builds on the Tensor Graph representation introduced this week in order to use automatic differentiation within a Machine Learning loop and fit a Regression line to data points.
If you're familiar with differential calculus but not machine learning, this pair of videos will fill in all the gaps for you on how ML works. If you're not familiar with differential calculus, the preceding videos in my "Calculus for Machine Learning" course will provide you with all of the foundational theory you need for ML.
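In the spirit of this week's video, here's a minimal sketch of representing the line equation y = mx + b as a graph of PyTorch tensor operations (not the exact notebook code):

```python
import torch

# Parameters of the line, as tensors that participate in the graph
m = torch.tensor(0.9, requires_grad=True)  # slope
b = torch.tensor(0.1, requires_grad=True)  # y-intercept

# Input values
x = torch.linspace(0, 8, 9)

# Combining m, x, and b builds a graph of tensor operations, which is
# exactly what enables automatic differentiation in next week's video
y_hat = m*x + b
print(y_hat)
```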
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Data Meshes and Data Reliability
The fun and brilliant Barr Moses joins me this week to detail for us what organization-transforming Data Meshes are, as well as how to track and improve the "Data Uptime" (reliability) of your production systems.
Barr is co-founder and CEO of Monte Carlo, a venture capital-backed start-up that has grown in head count by a remarkable 10x in the past year. Monte Carlo specializes in data reliability, making sure that the data pipelines used for decision-making or production models are available 24/7 and that the data are high quality.
In this SuperDataScience episode, Barr covers:
• What data reliability is, including how we can monitor for the "good pipelines, bad data" problem
• How reliable data enables the creation of a Data Mesh that empowers data-driven decision-makers across all of the departments of a company to independently create and analyze data
• How to build a data science team
• How to get a data-focused start-up off the ground, generating revenue, and rapidly scaling up
In addition, Barr took time to answer questions from listeners, including those from Svetlana, Bernard, and A Ramesh. Thanks to Scott Hirleman for suggesting Barr as a guest on the show and thanks to Molly Vorwerck for ensuring everything ran perfectly.
Listen or watch here.
Tutorial on "Big O Notation"
Brand-new, hands-on intro to "Big O Notation" — an essential computer science concept. "Big O" allows us to weigh the compute-time vs memory-usage trade-offs of all algorithms, including machine learning models.
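As a quick, hedged illustration (not an excerpt from the course): checking membership in a Python list is O(n), while a set offers average-case O(1) at the cost of extra memory, which is exactly the kind of trade-off Big O lets us reason about.

```python
import timeit

n = 1_000_000
data_list = list(range(n))
data_set = set(data_list)  # more memory, but constant-time lookups

# O(n): the list is scanned element by element for the worst-case item
t_list = timeit.timeit(lambda: (n - 1) in data_list, number=100)

# O(1) on average: the set hashes straight to the item
t_set = timeit.timeit(lambda: (n - 1) in data_set, number=100)

print(f"list: {t_list:.4f}s   set: {t_set:.6f}s")
```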
This YouTube video is a 45-minute, standalone excerpt from my six-hour "Data Structures, Algorithms, and ML Optimization" course, which focuses on code demos in Python to make understanding concepts intuitive, fun, and interactive.
If you have an O'Reilly Media subscription, the full course was recently published here.
If you'd like to purchase the course, Pearson is offering it this week (until August 28th) at a 70% discount as their "Video Deal of the Week". The URL for this unusually deep discount is here.
This "DSA and ML Optimization" course is the fourth and final quarter of my broader ML Foundations curriculum. All of the associated code is available open-source via GitHub.
AI Recruitment Technology & Deep Learning - Guest Appearance on the Engineered-Mind Podcast
Thanks to Jousef Murad for having me on the popular Engineered-Mind podcast! Jousef asked deeply insightful questions and I enjoyed the experience immensely :)
I spoke with Jousef back in April 2021, when we discussed:
• untapt and how AI-powered recruiting works
• My background in neuroscience
• Where to get started when learning ML
• Tips for becoming a deep learning specialist
• What I'm most excited about in terms of AI
• How I came up with the idea of writing a book
You can listen to the podcast anywhere podcasts are available including Apple Podcasts, Spotify, and Anchor.fm. You can also check out the video directly on YouTube here.
AutoDiff with TensorFlow
PyTorch and TensorFlow are by far the two most widely-used automatic-differentiation libraries. Last week, we used PyTorch to differentiate an equation automatically and instantaneously. Today, we do it with TensorFlow.
(For an overview of the pros and cons of PyTorch versus TensorFlow, I've got a talk here. The TLDR is you should know both!)
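A minimal sketch of the TensorFlow approach (my own toy example, not the demo in the video), using tf.GradientTape to get the same kind of instantaneous partial derivatives:

```python
import tensorflow as tf

x = tf.Variable(2.0)
y = tf.Variable(3.0)

# Record the forward computation so gradients can be derived from it
with tf.GradientTape() as tape:
    z = x**2 * y + y

# Automatic, instantaneous partial derivatives
dz_dx, dz_dy = tape.gradient(z, [x, y])
print(dz_dx.numpy(), dz_dy.numpy())  # 12.0 and 5.0
```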
A new video from my "Calculus for ML" course is published on YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Maximizing the Global Impact of Your Career
This week, expert Benjamin Todd details how you can find purpose in your work and maximize the global impact of your career. In particular, he emphasizes how data scientists can exert a massive positive influence.
In this mind-expanding and exceptionally inspiring episode, Ben details:
• An effective process for evaluating next steps in your career
• A data-driven guide to the most valuable skills for you to obtain regardless of profession
• Specific impact-maximizing career options that are available to data scientists and related professionals, such as ML engineers and software developers.
Ben has invested the past decade researching how people can have the most meaningful and impactful careers. This research is applied to great effect via his charity 80,000 Hours, which is named after the typical number of hours worked in a human lifetime. The Y Combinator-backed charity has reached over eight million people via its richly detailed, exceptionally thoughtful, and 100% free content and coaching.
Listen or watch here.
Why the Best Data Scientists Have Mastered Algebra, Calculus and Probability
Over the past year, I've published dozens of hours of video tutorials on the mathematical foundations that underlie machine learning and data science. In this talk, I explain *why* knowing this math is so essential.
Thanks to Roberto Lambertini and SuperDataScience for hosting it!