OpenAI released many of the most revolutionary A.I. models of recent years, e.g., DALL-E 2, GPT-3 and Codex. Dr. Miles Brundage was behind the A.I. Policy considerations associated with each transformative release.
Miles:
• Is Head of Policy Research at OpenAI.
• Has been integral to the rollout of OpenAI’s game-changing models such as the GPT series, DALL-E series, Codex, and CLIP.
• Previously worked as an A.I. Policy Research Fellow at the University of Oxford’s Future of Humanity Institute.
• He holds a PhD in the Human and Social Dimensions of Science and Technology from Arizona State University.
Today’s episode should be deeply interesting to technical experts and non-technical folks alike.
In this episode, Miles details:
• Considerations you should take into account when rolling out any A.I. model into production.
• Specific considerations OpenAI concerned themselves with when rolling out:
    • The GPT-3 natural-language-generation model,
    • The mind-blowing DALL-E artistic-creativity models,
    • Their software-writing Codex model, and
    • Their bewilderingly label-light image-classification model CLIP.
• Differences between the related fields of AI Policy, AI Safety, and AI Alignment.
• His thoughts on the risks of AI displacing versus augmenting humans in the coming decades.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The A.I. Platforms of the Future
Ben Taylor returns for a third consecutive Five-Minute Friday! This week, he helps us look ahead and dig into what we can expect from the A.I. platforms of the future.
Data Engineering 101
Today's episode is all about Data Engineering — particularly the tools and techniques that Data Scientists should know. "Fundamentals of Data Engineering" book co-authors Matthew Housley and Joe Reis are guests!
Matt and Joe:
• Co-authored the brand-new "Fundamentals of Data Engineering" book that was published by O'Reilly Media and is already a bestseller.
• Co-founded the data architecture and data engineering consultancy Ternary Data. Joe is CEO of the firm while Matt is CTO.
In addition, Joe:
• Is an adjunct professor at the University of Utah.
• Previously founded several tech companies and has held both software engineering and data science roles.
• Holds a math degree from the University of Utah.
Matt:
• Holds a PhD in math from the University of Utah.
• Worked as a professor before becoming a data scientist in industry.
Today’s episode will appeal primarily to technical experts like data scientists and data engineers, but will also be of interest to anyone who manages technology projects that involve data flows.
In this episode, Matt and Joe detail:
• Why they identify as “recovering data scientists”.
• What kinds of people tend to become data scientists versus what kinds tend to become data engineers.
• Key components of their book such as latency trade-offs and the six data engineering undercurrents.
• Their favorite data engineering tools and techniques.
• What the Live Data Stack is and how it’s putting various data professional titles on a collision course.
• The biggest data engineering problems firms face and how to fix them.
Why CEOs Care About A.I. More than Other Technologies
Ben Taylor is back for another Five-Minute Friday this week, this time to fill us in on why CEOs care more about A.I. than any other technology and how to sell them on your machine learning solution.
Special shout-out to my puppy Oboe who features indispensably in the video version of this episode... on Ben's lap! 🐶
The Real-World Impact of Cross-Disciplinary Data Science Collaboration
How to unlock breakthroughs — particularly in medicine — through cross-disciplinary data science is the main topic covered this week with the fascinating, trailblazing Professor Philip Bourne.
Philip:
• Is Founding Dean of the University of Virginia's School of Data Science.
• Is also Professor of Biomedical Engineering at Virginia.
• Is Founding Editor-in-Chief of the open-access journal PLOS Computational Biology.
• Was previously Associate Director for Data Science at the National Institutes of Health.
Despite Prof. Bourne being a deep technical expert, he conveys concepts so magnificently that today’s episode should be broadly appealing to practicing data scientists and non-technical listeners alike.
In this episode, Philip details:
• Why he founded a School of Data Science.
• Why such schools are uniquely positioned to bear the fruits of applied data science research within universities.
• What the most important data science skills are.
• How computing and data science have evolved across academic departments in recent decades.
• Fascinating practical applications of his biomedical data science research into the structure and function of biological proteins.
• The absolutely essential role of open-source software and open-access publishing in data science.
How to Sell a Multimillion Dollar A.I. Contract
Starting today and running for four consecutive weeks, Five-Minute Friday episodes of SuperDataScience feature Ben Taylor as my guest. Each week, he answers a specific ML commercialization or education question.
Simulations and Synthetic Data for Machine Learning
Running Simulations and generating Synthetic Data in order to create more-powerful Machine Learning models is this week's topic. Bewilderingly interesting two-time book author Mars Buttfield-Addison is our guest.
Mars:
• Is co-author of two O'Reilly Media books, "Practical Simulations for Machine Learning" and "Practical Artificial Intelligence with Swift".
• Is pursuing a PhD in computer engineering from the University of Tasmania, focused on writing high-performance software to track space objects.
• Teaches courses on A.I. and data science at the University of Tasmania.
• Is a regular speaker at top tech conferences around the world.
• Holds a bachelor’s degree in software development and data modeling.
Today’s episode should be fascinating to technical and non-technical folks alike.
In this episode, Mars details:
• What simulations and synthetic data are, and why they can be invaluable for real-life applications.
• How simulated bots can solve any problem by representing the problem as a 3D visualization.
• Why the mobile operating system language Swift is interesting for A.I.
• How much junk there is in space and why it’s critical we track it.
• What it’s like creating video games in a “secret” Tasmanian games lab.
• Whether programming or statistical skills are more important in data science.
• Why you might want to do a data science internship in industry if you’re thinking of having a career in academia.
Thanks to Suzanne Huston for introducing me to Mars :)
Artificial General Intelligence is Not Nigh (Part 2 of 2)
Last week, I argued that "Artificial General Intelligence" — an algorithm with the learning capabilities of a human — will not arrive anytime soon. This week, I bolster my argument by summarizing points from luminary Yann LeCun.
Narrative A.I. with Hilary Mason
Hilary Mason, one of the world's best-known data scientists, fills us in on A.I. systems that generate interactive story narratives and on building a thriving early-stage A.I. company. This episode was filmed live on stage — so fun!
Hilary:
• Is Co-Founder and CEO of Hidden Door, a start-up that leverages narrative A.I. to generate unique, customized dialog and graphics in real time, thereby delivering a groundbreakingly immersive video game experience.
• Was previously Founder and CEO of Fast Forward Labs, an emerging-tech research company that was acquired by Cloudera.
• Was Data-Scientist-in-Residence at Accel, a leading venture capital firm.
• Co-founded several iconic tech communities in New York such as DataGotham and HackNY.
• Studied computer science at Brown University and Grinnell College.
• Is known for sharing useful data science knowledge with the public; she has over 120k followers on Twitter and over 160k followers on LinkedIn.
The first half of today’s episode contains some technical elements but by and large the episode should be appealing to anyone who’s keen to be on the cutting edge of machine learning application and commercialization.
In today’s episode, Hilary details:
• How narrative A.I. can assist creativity.
• How to build ML products with no quantitative error function to optimize.
• How to prevent A.I. systems from outputting nonsense or explicit content.
• The emerging ML technique she’s most excited about.
• What it takes to be successful as CEO of an early-stage A.I. company.
• How she’s hopeful A.I. will transform our lives for the better in the future.
Thank you to Jared Lander and Nicole DelGiudice of the New York R Conference for providing us with an amazing live forum to host a live SDS episode and for the exceptional footage. And thanks to Claudia Perlich for introducing me to Hilary!
Artificial General Intelligence is Not Nigh
A popular perception, propagated by film and television, is that machines are nearly as intelligent as humans. They are not, and they will not be anytime soon. Today's episode throws cold water on "Artificial General Intelligence".
Data Engineering for Data Scientists
Prolific data science content creator 🎯 Mark Freeman details what Data Engineering is and why it's a critically useful subject area for data scientists to be proficient in. Hear all about it in this week's episode.
Mark:
• Is a Senior Data Scientist, with a Data Engineering specialization, at Humu (a startup that has raised $100m in venture capital).
• Posts data science and software engineering tips daily on LinkedIn.
• Previously was a data scientist at Verana Health and a data analyst at the Stanford University School of Medicine.
• Also holds a Master’s in Community Health and Prevention Research from the Stanford medical school.
Today’s episode is geared toward listeners who are already in a technical role such as data scientists, data engineers, ML engineers, or software engineers — as well as to folks who’d like to grow into these kinds of roles.
In today’s episode, Mark details:
• The differences between junior, senior, and staff data scientists.
• What it takes to get promoted into more senior data science roles.
• How data engineering differs from data science.
• His top tools for data extraction, modeling, and pipeline engineering.
• His top tip for getting hired at a fast-growing VC-backed startup.
• How behavioral nudges can drastically improve workplace experiences.
• Why all data scientists should be interested in web3.
Daily Habit #10: Limit Social Media Use
This article was originally adapted from a podcast, which you can check out here.
At the beginning of the new year, in Episode #538, I introduced the practice of habit tracking and provided you with a template habit-tracking spreadsheet. Then, we had a series of Five-Minute Fridays that revolved around daily habits and we’ve been returning to this daily-habit theme periodically since.
The habits we covered in January and February were related to my morning routine. In the spring, these habit episodes have focused on productivity, and I’ve got another such productivity habit for you today.
To provide some context on the impetus behind this week’s habit, I’ve got a quote for you from the author Robert Greene, specifically from his book, Mastery: "The human that depended on focused attention for its survival now becomes the distracted scanning animal, unable to think in depth, yet unable to depend on instincts."
This suboptimal state of affairs — where our minds are endlessly flitting between stimuli — is exemplified by countless digital distractions we encounter every day, but none is quite as pernicious as the distraction brought to us by social media platforms. When using free social media platforms, you are typically the product — a product being sold to in-platform advertisers. Thus, to maximize ad revenue, these platforms are engineered to keep you seeking cheap, typically unsatisfying dopamine hits within them for as long as they can.
PyMC for Bayesian Statistics in Python
Learn how Bayesian Statistics can be more powerful and interpretable than any other data modeling approach from Dr. Thomas Wiecki, a Core Developer of PyMC — the leading Bayesian software library for Python.
Thomas:
• Has been a Core Developer of PyMC for over eight years.
• Is Co-Founder and CEO of PyMC Labs, which solves commercial problems with Bayesian data models.
• Previously worked as VP Data Science at Quantopian Inc.
• Holds a PhD in Computational Neuroscience from Brown University.
Today’s episode is more on the technical side, so it will appeal primarily to practicing data scientists.
In this episode, Thomas details:
• What Bayesian statistics is.
• Why Bayesian statistics can be more powerful and interpretable than any other data modeling approach.
• How PyMC was developed and how it trains models so efficiently.
• Case studies of large-scale Bayesian stats applied commercially.
• The extra flexibility of *hierarchical* Bayesian models.
• His top resources for learning Bayesian stats yourself.
• How to build a successful company culture.
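For readers who have not met Bayesian updating before, here is a minimal sketch of the core idea in plain Python. It is not PyMC code and not taken from the episode. It assumes the simplest textbook case: a Beta prior on a success rate, updated with binomial data. The function names and the numbers are hypothetical.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    yields a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Assumed starting point: a uniform Beta(1, 1) prior (no strong belief).
prior = (1, 1)

# Hypothetical data: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)

print(posterior)              # (8, 4)
print(beta_mean(*posterior))  # 2/3
```

The posterior is a full distribution rather than a single number, which is one reason Bayesian results are often described as interpretable. Libraries such as PyMC exist for models where no closed-form update like this one is available.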
OpenAI Codex
OpenAI's Codex model is derived from the famed GPT-3 and allows humans to generate working code with natural language alone. Its flexibility and capability are quite remarkable! Hear all about it in today's episode.
Guest Appearance on The Evan Solomon Show: Could AI one day become sentient?
At 12:35pm ET today, I'll be live on The Evan Solomon Show discussing the Google LaMDA "sentience" hubbub. The radio show's syndicated across Canada on AM 580 or you can listen online.
Listen to a recording of my segment here.
The State of Natural Language Processing
As the LaMDA "sentience" hubbub highlights, Natural Language Processing is perhaps the most exciting and rapidly accelerating area of Machine Learning. Hear all about NLP from the deep expert Rongyao HUANG.
(LaMDA is definitely not sentient, by the way... but it is an impressive display of state-of-the-art conversational machine capabilities.)
Rongyao:
• Is Lead Data Scientist at CB Insights, a market intelligence platform.
• Previously worked as a data scientist at a number of other New York start-ups and as a quantitative research assistant at Columbia University.
• Holds a master’s in research methodology and quantitative methods from Columbia University in the City of New York.
Today’s episode is more on the technical side, so it will appeal primarily to practicing data scientists. However, the second half of the episode contains sage general guidance for anyone seeking to navigate career options and to balance personal and professional obligations.
In today’s episode, Rongyao details:
• The evolution of NLP techniques over the past decade through to the large transformer models of today.
• The practical implications of this dramatic NLP evolution.
• How the “scaling law” will impact NLP model capabilities over the coming decade.
• The major limitations of today’s NLP approaches and how we might overcome them.
• Her Bauhaus-inspired model for effective data science.
• Her pathfinding model for making effective career choices.
• Her top tips for staying sane while juggling career and family.
Live podcast recording with Hilary Mason at New York R Conference
Thanks to data science legend Hilary Mason and the engaging audience at the New York R Conference for making Friday's live-filmed episode of the SuperDataScience podcast an exhilarating and illuminating success ⚡️
Look out for Hilary's episode as #589, which will be released on July 5th.
Model Speed vs Model Accuracy
In the vast majority of real-world, commercial cases, the speed of a machine learning algorithm is more important than its accuracy. Hear why in today's Five-Minute Friday episode!
Bayesian, Frequentist, and Fiducial Statistics in Data Science
Harvard stats prof Xiao-Li MENG founded the trailblazing Harvard Data Science Review. We cover that and why BFFs (Bayesians, frequentists and fiducial statisticians) should be BFFs (best friends forever).
Xiao-Li:
• Is the Founding Editor-in-Chief of the Harvard Data Science Review, a new publication in the vein of the renowned Harvard Business Review.
• Has been a full professor in Harvard’s Dept of Statistics for 20+ years.
• Chaired the Harvard Stats Dept for 7 years.
• Was Dean of Harvard’s Grad School of Arts and Sciences for 5 years.
• Has published 200+ journal articles on statistics, machine learning, and data science, and has been cited over 25,000 times.
• Holds a PhD in Statistics from — yep! — Harvard.
Today’s episode will be of interest to anyone who’s keen to better understand the biggest challenges and most fascinating applications of data science today.
In the episode, Xiao-Li details:
• What the Harvard Data Science Review is, why he founded it, and the most popular topics covered by the Review so far.
• The concept of “data minding”.
• Why there’s no “free lunch” with data — tricky trade-offs abound no matter what.
• The surprising paradoxical downside of having lots of data.
• What the Bayesian, Frequentist, and Fiducial schools of statistics are and when each of them is most useful in data science.
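To make the frequentist and Bayesian contrast concrete, here is a rough sketch in plain Python. It is not from the episode and it leaves the fiducial school out entirely. It assumes a success-rate estimation problem, a Wald interval for the frequentist side, and a uniform Beta(1, 1) prior for the Bayesian side. All function names and data are hypothetical.

```python
import math


def frequentist_estimate(successes, trials, z=1.96):
    """Maximum-likelihood estimate with a Wald ~95% confidence interval."""
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    lower = max(0.0, p_hat - half_width)
    upper = min(1.0, p_hat + half_width)
    return p_hat, (lower, upper)


def bayesian_posterior_mean(successes, trials, alpha=1.0, beta=1.0):
    """Posterior mean of the rate under an assumed Beta(alpha, beta) prior."""
    return (alpha + successes) / (alpha + beta + trials)


# Hypothetical tiny sample: 0 successes in 5 trials.
mle, interval = frequentist_estimate(0, 5)
post_mean = bayesian_posterior_mean(0, 5)

print(mle, interval)        # 0.0 (0.0, 0.0)
print(round(post_mean, 3))  # 0.143
```

With only five observations, the Wald interval collapses to a single point while the prior keeps the Bayesian estimate away from zero. With a large sample (say 700 successes in 1,000 trials) the two estimates nearly coincide, which is one simple way to see why the schools can complement each other.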
Collecting Valuable Data
Recently, I've been covering strategies for getting business value from machine learning. In today's episode, we dig into the most effective ways to obtain and label *commercially valuable* data.