The DataScienceGO Virtual conference is coming up next Saturday and it is FREE! I'm giving a talk on TensorFlow vs PyTorch with lots of time for audience questions.
Fixing Dirty Data
My guest this week is the fixer of dirty data herself, the one and only Susan Walsh. We have a lot of laughs in this episode as we discuss how organizations can save substantial sums by tidying up their data.
Susan has worked for a decade as a data-quality specialist for a wide range of firms across the private and public sectors. For the past four years, she's been doing this work as the founder and managing director of her own company, The Classification Guru Ltd. She's also the author of the forthcoming book, "Between the Spreadsheets", and she hosts her own video interview show called "Live from the Data Den".
Listen or watch here.
The Chain Rule for Derivatives — Topic 59 of Machine Learning Foundations
Today's video introduces the Chain Rule — arguably the single most important differentiation rule for ML. It facilitates several of the most ubiquitous ML algorithms, such as gradient descent and backpropagation.
Gradient descent and backprop will be covered in great detail later in my "Machine Learning Foundations" video series. This video is critical for understanding those applications.
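The rule itself is easy to sanity-check numerically. Here is a minimal sketch (my own example functions, not the video's) confirming that the Chain Rule's f'(g(x)) * g'(x) matches a finite-difference estimate of the composite function's slope:

```python
import math

# Sanity-check the Chain Rule on f(g(x)) with g(x) = x**2 + 1
# and f(u) = sin(u), at the arbitrary point x = 1.5.
# Chain Rule: d/dx f(g(x)) = f'(g(x)) * g'(x)
x = 1.5
chain_rule = math.cos(x**2 + 1) * (2 * x)  # f'(g(x)) * g'(x)

# Compare against a finite-difference estimate of the composite's slope
h = 1e-6
numeric = (math.sin((x + h)**2 + 1) - math.sin(x**2 + 1)) / h

print(abs(chain_rule - numeric) < 1e-4)  # True
```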
New videos are published every Monday and Thursday to my "Calculus for ML" course, which is available on YouTube.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub.
The Quotient Rule for Derivatives — Topic 58 of Machine Learning Foundations
This is the penultimate derivative rule before we move onward to AutoDiff with TensorFlow and PyTorch! The Quotient Rule is analogous to the Product Rule introduced on Monday, but it's for division instead of multiplication.
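As a quick numeric sanity check of the rule (using my own example functions, not the video's), we can confirm that (v·u' − u·v')/v² agrees with a finite-difference estimate of the slope of u(x)/v(x):

```python
import math

# Sanity-check the Quotient Rule on u(x)/v(x) with u(x) = x**2 + 1
# and v(x) = cos(x), at the arbitrary point x = 0.5.
# Quotient Rule: (u/v)' = (v*u' - u*v') / v**2
x = 0.5
u, du = x**2 + 1, 2 * x
v, dv = math.cos(x), -math.sin(x)
quotient_rule = (v * du - u * dv) / v**2

h = 1e-6
f = lambda t: (t**2 + 1) / math.cos(t)
numeric = (f(x + h) - f(x)) / h

print(abs(quotient_rule - numeric) < 1e-4)  # True
```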
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
Upcoming O'Reilly Calculus Classes
Starting a week from today, I'm offering my entire "ML Foundations" curriculum as a series of 14 live, interactive workshops via O'Reilly Media. The first five classes are open for registration; two are already waitlist-only, so grab a spot now:
• Jul 14 — Intro to Linear Algebra (waitlisted)
• Jul 21 — LinAlg II: Matrix Tensors (5 spots remaining)
• Jul 28 — LinAlg III: Eigenvectors (waitlisted)
• Aug 12 — Intro to Calculus (143 spots remaining)
• Aug 18 — Calc II: AutoDiff (148 spots remaining)
REGARDING THE WAITLIST: I have made a request to O'Reilly to increase the maximum class size from 600 students to 1,000, so if you sign up for a waitlisted class now, you should still be able to get in.
Overall, there will be four subject areas covered:
• Linear Algebra (3 classes)
• Calculus (4 classes)
• Probability and Statistics (4 classes)
• Computer Science (3 classes)
Sign-up opens about two months prior to each class. All 14 training dates, running from next week through December, are provided at jonkrohn.com/talks.
A detailed curriculum and all of the code for my ML Foundations series are available open-source on GitHub here.
Financial Data Engineering
This week's guest is Doug Eisenstein, an exceptionally clear and content-rich communicator. He fills us in on the complexity of engineering a coherent source of truth for financial models, integrating hundreds of data sources.
Topics covered in the episode include:
• A breakdown of the primary financial sectors and departments
• Why data source integration for finance is wildly complicated
• Specific data engineering approaches that resolve these issues, including entity resolution, knowledge graph mapping, and tri-temporality
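For a flavor of what entity resolution means in practice, here's a deliberately tiny sketch — hypothetical feed names and a simplistic string-similarity matcher of my own; production systems like the ones Doug describes use far richer signals than this:

```python
from difflib import SequenceMatcher

# Toy entity resolution: link company names from two data feeds whose
# spellings differ, so records can be attributed to a single entity.
feed_a = ["Morgan Stanley & Co.", "Bank of America Corp"]
feed_b = ["MORGAN STANLEY", "Bank of America"]

def similarity(a, b):
    # Case-insensitive character-level similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for name in feed_a:
    best_match = max(feed_b, key=lambda candidate: similarity(name, candidate))
    print(f"{name} -> {best_match}")
```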
Twenty years ago, Doug founded the consulting firm Advanti, which has since become a critical provider of solutions to the complex data engineering problems faced by some of the world's largest banks and asset managers, including Morgan Stanley, Bank of America, Citibank, and State Street.
Listen or watch here.
The Product Rule for Derivatives
Today's video is on the Product Rule, a relatively advanced Derivative Rule. Only a couple such rules remain and then we move onward to Automatic Differentiation with PyTorch and TensorFlow.
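As a quick numeric sanity check of the rule (my own example functions, not the video's), we can confirm that u'·v + u·v' agrees with a finite-difference estimate of the slope of u(x)·v(x):

```python
import math

# Sanity-check the Product Rule on u(x)*v(x) with u(x) = x**2
# and v(x) = sin(x), at the arbitrary point x = 0.8.
# Product Rule: (u*v)' = u'*v + u*v'
x = 0.8
u, du = x**2, 2 * x
v, dv = math.sin(x), math.cos(x)
product_rule = du * v + u * dv

h = 1e-6
f = lambda t: t**2 * math.sin(t)
numeric = (f(x + h) - f(x)) / h

print(abs(product_rule - numeric) < 1e-4)  # True
```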
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
Algorithm Aversion
Exercises on Derivative Rules — Topic 56 of Machine Learning Foundations
Today's YouTube video uses five fun exercises to test your understanding of the derivative rules we’ve covered so far: the Constant Rule, Power Rule, Constant-Multiple Rule, and Sum Rule.
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
If you’d like a detailed walkthrough of the solutions to all of the exercises in this video, check out my Udemy course, Mathematical Foundations of Machine Learning: see jonkrohn.com/udemy.
Setting Yourself Apart in Data Science Interviews
For this week's guest episode, I interrogated Andrew Jones on his data science interview secrets. If you want to improve your interview performance — especially if you're in a data-related career — this episode's for you.
Andrew has held a number of senior data roles over the past decade, including at the tech giant Amazon. In those roles, Andrew interviewed hundreds upon hundreds of data scientists, leading him to create his Data Science Infinity educational program, a curriculum that provides you with the hard and soft skills you need to set yourself apart from other data scientists during the interview process.
Listen or watch here.
The Sum Rule for Derivatives
Thus far in this set of videos on Differentiation Rules, we’ve covered the Constant, Power, and Constant-Multiple rules. Today's video is on the Sum Rule. On Thursday, we'll have comprehension exercises on all four key rules!
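Here's a quick numeric sanity check of the Sum Rule (with an example function of my own, not the video's), confirming that the sum of the individual derivatives agrees with a finite-difference estimate of the combined function's slope:

```python
import math

# Sanity-check the Sum Rule on f(x) = x**3 + sin(x) at the
# arbitrary point x = 1.2.
# Sum Rule: (u + v)' = u' + v'
x = 1.2
sum_rule = 3 * x**2 + math.cos(x)  # derivative of x**3 plus derivative of sin(x)

h = 1e-6
numeric = ((x + h)**3 + math.sin(x + h) - (x**3 + math.sin(x))) / h

print(abs(sum_rule - numeric) < 1e-4)  # True
```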
The Constant Multiple Rule for Derivatives
Continuing my short series on Differentiation Rules, today’s video covers the Constant Multiple Rule. This rule is often used in conjunction with the Power Rule, which was covered in the preceding video, released on Monday.
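A minimal numeric sanity check of the rule — my own example, not the video's — pairing it with the Power Rule exactly as described above:

```python
# Sanity-check the Constant Multiple Rule on f(x) = 5 * x**3 at the
# arbitrary point x = 2.
# Constant Multiple Rule: (c*u)' = c * u'  (here, u = x**3, so u' = 3*x**2)
x, c = 2.0, 5.0
rule = c * (3 * x**2)  # 5 * derivative of x**3

h = 1e-6
numeric = (c * (x + h)**3 - c * x**3) / h

print(abs(rule - numeric) < 1e-3)  # True
```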
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
Data Community Content Creator Awards
I was surprised and utterly delighted to be recognized yesterday with the Data Community Content Creator Award for the "Machine Learning and AI" YouTube category. 🥳
From my perspective, my YouTube channel is still in its early days, so while I did not anticipate formal recognition like this perhaps ever, I *certainly* did not expect it so soon after launching the channel. This is a massive, galvanizing signal that I should continue pressing on with this nascent video-creation effort — I absolutely will!
First off, thank you to everyone who voted. This category was apparently one of the tightest races in this "People's Choice"-style awards show, so truly your individual vote may have tipped the award in my favor.
Many thanks are due to Sangbin Lee and Maria Lee, who have edited, produced, branded, and marketed every single video on my channel since day one. My freely-available YouTube content would not exist without them. Thanks as well to Guillaume Rousseau, who recently joined us and dramatically accelerated how quickly we can publish perfectly-edited videos.
Finally, thanks to Harpreet Sahota and Kate Strachnyi who conceived of the DCCCA show and delivered it with the flair, fun, and precision that we'd expect from them!
The entire ceremony is on YouTube here. And a short recap post is here.
Performance Marketing Analytics
My guest this week is Kris Tait, who fills us in on how data and machine learning have transformed — and will continue to transform — marketing, enabling even small firms to effectively target customers and grow their revenue.
In this episode of the SuperDataScience show, we cover:
• What performance marketing is
• The rapidly shifting digital marketing ecosystem, as well as how data and ML can mitigate the risks associated with these changes
• The sweet spot for augmenting human marketers' skills with machines
• How any firm should define metrics to maximize return on marketing investment, thereby ensuring broader commercial success
• The most useful modern data science tools for global digital marketing
Kris is the managing director for the US at Croud, winner of Performance Marketing Agency of the Year, an innovative marketing agency that is driven by data analytics and machine learning algorithms.
Listen or watch here.
The Power Rule for Derivatives
On Thursday, I published a video on the Constant Rule, the first video in a series on Differentiation Rules. Today, we continue the series with the Power Rule, arguably the most common and most important of all the rules.
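The rule is easy to sanity-check numerically; here's a minimal sketch (my own example values, not the video's) confirming that n·x^(n−1) agrees with a finite-difference estimate of the slope:

```python
# Sanity-check the Power Rule on f(x) = x**4 at the arbitrary point x = 2.
# Power Rule: d/dx x**n = n * x**(n - 1)
x, n = 2.0, 4
power_rule = n * x**(n - 1)  # 4 * 2**3 = 32

h = 1e-6
numeric = ((x + h)**n - x**n) / h

print(abs(power_rule - numeric) < 1e-3)  # True
```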
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
The Derivative of a Constant
This and the next several videos will provide you with clear and colorful examples of all of the most important differentiation rules. We kick these rules off with the Constant Rule.
The derivative rules are critical to machine learning as they allow us to find the derivatives of cost functions. These cost-function derivatives are concatenated into the "gradient" that we descend to allow ML models to learn.
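As a minimal illustration of that idea — a hypothetical one-parameter cost of my own, not an example from the video — descending the gradient of C(w) = (w − 3)² drives the model parameter w toward the cost-minimizing value:

```python
# Gradient descent on a one-parameter cost C(w) = (w - 3)**2.
# Its derivative dC/dw = 2*(w - 3) is the (single-element) gradient we
# descend; note the Constant Rule at work: d/dw of the constant term is 0.
def grad(w):
    return 2 * (w - 3)

w = 0.0    # arbitrary starting point
lr = 0.1   # learning rate
for _ in range(100):
    w -= lr * grad(w)  # step downhill along the gradient

print(round(w, 4))  # converges to the minimum at w = 3.0
```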
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
Knowledge Graphs
In this week's guest episode, wildly intelligent and meticulously communicative Maureen Teyssier, Ph.D. explains what Knowledge Graphs are, why they're so powerful, and how to grow a flourishing data science team.
In more detail, in today’s episode we cover:
• The theory and applications of Knowledge Graphs, a cool and powerful data type at the heart of much of Maureen’s work at Reonomy
• The data science techniques that Reonomy uses to flow data through extremely high-volume pipelines, enabling them to efficiently apply models to their massive data sets
• What Maureen looks for in the data scientists that she hires and the tools and approaches she leverages in order to grow a highly effective data science team
• The differences between data scientists, data analysts, data engineers, and machine learning engineers
• Maureen’s fascinating academic work in which she used gigantic supercomputers to simulate solar systems and galaxies
Maureen is Chief Data Scientist at Reonomy, a very well-funded New York start-up — they’ve raised over 100 million dollars — that is transforming the world of commercial real estate with data and data science. Prior to working in industry, Maureen was an academic working in the field of computational astrophysics; she obtained her PhD from Columbia University in the City of New York and then carried out research at Rutgers University in New Jersey.
Listen here.
Derivative Notation
In today's YouTube video, we detail all of the most common notation for derivatives. This lays the foundation for a fun, immediately forthcoming series of videos covering all of the major differentiation rules. Enjoy!
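As a quick cheat sheet (my own summary, not necessarily the video's exact list), the most common notations all denote the same derivative of y = f(x):

```latex
\underbrace{\frac{dy}{dx}}_{\text{Leibniz}} \;=\;
\underbrace{f'(x)}_{\text{Lagrange}} \;=\;
\underbrace{D\,f(x)}_{\text{Euler}} \;=\;
\underbrace{\dot{y}}_{\text{Newton (typically for time derivatives)}}
```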
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
How Derivatives Arise from Limits
In today's video, we use hands-on code demos in Python to find the slopes of curves with the Delta Method. While finding these slopes, we derive together — from first principles — the most important Differential Calculus formula.
This video is part of a thematic segment of videos on Differentiation. In the forthcoming videos, we’ll cover derivative notation and a series of useful rules for differentiation.
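The core of the Delta Method can be sketched in a few lines — an illustrative function of my own, not necessarily the video's exact demo — showing the rise-over-run estimate approaching the true derivative as delta shrinks:

```python
# Delta Method: approximate the slope of f at x with the rise-over-run
# (f(x + delta) - f(x)) / delta, which approaches the true derivative
# as delta shrinks toward zero.
def f(x):
    return x**2

x = 2.0
for delta in (1.0, 0.1, 0.001, 1e-6):
    slope = (f(x + delta) - f(x)) / delta
    print(delta, slope)  # slopes approach f'(2) = 4
```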
New videos are published every Monday and Thursday. The playlist for my "Calculus for ML" course is here.
More detail about my broader "ML Foundations" series and all of the associated open-source code is available on GitHub here.
How to Thrive as an Early-Career Data Scientist
Getting started in data science? Today's episode is for you! Sidney Arcidiacono is absolutely crushing her first year in the field; we discuss options for getting started and her top tips for early-career success.
Trained as a phlebotomist (blood-sample collection), Sidney was inspired by the potential for machine learning to revolutionize healthcare, so she jumped feet first into a full-time computer science degree at Make School, specializing in the data science track. From no familiarity with code or models just a year ago, Sidney's immersion has paid off: she's now fluent in the modern data science software stack and landed a summer data science internship at GreenLight Biosciences, Inc., a firm developing RNA-molecule therapeutics (the same class of molecule used in the Pfizer/BioNTech and Moderna vaccines).
Sidney is terrifically sharp and engaging; I think you'll enjoy hearing from her as much as I did during filming.
Watch or listen here.