Seventy years ago, the iconic AT&T Bell Labs unveiled cells that could transform sunlight into power. What started as a potential replacement for batteries in remote locations has now become a global phenomenon. Today, solar panels cover an area the size of Jamaica and provide approximately 6% of the world's electricity.
How to Thrive in Your (Data Science) Career, with Daliana Liu
In today's episode, the renowned Daliana Liu details how to overcome common (unhelpful!) career mindsets and thrive professionally, including finding your niche and getting promoted... all without burning out!
If you haven’t already heard of her, Daliana:
• Is well-known for her content creation on data science careers, particularly career-growth strategies, which has earned her more than 280,000 LinkedIn followers.
• Hosts The Data Scientist Show, which is in the top 2% of all podcasts globally by downloads.
• Specializes in 1:1 career coaching as well as coaching groups through structured programs like her upcoming "Survive and Thrive in Data Science and AI Careers" course.
• Previously worked as a Senior Data Scientist at AWS and Predibase (a Bay Area open-source LLM startup).
• Holds a Master's in Statistics from UC Irvine.
Today’s episode is well-suited to *anyone* who’d like to thrive professionally; it will particularly appeal to data scientists and related professionals like data analysts, ML engineers and software developers, but most of the advice Daliana covers applies no matter your role.
In today’s episode, Daliana details:
• Common unhelpful career mindsets and how to overcome them.
• How to find the role you really want as opposed to the one you think you want.
• How to find your niche in a fast-moving field.
• How to offset common professional issues like imposter syndrome, distraction and burnout.
• Her top tips for accelerating a technical career.
• The must-know tech skills for data scientists in today’s market.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in June 2024
June was yet another month of phenomenal guests on the SuperDataScience Podcast I host. ICYMI, today's episode highlights the most fascinating moments of my conversations with them.
Specifically, conversation highlights include:
Dr. Jason Yosinski, one of my all-time favorite deep-learning researchers and CEO/co-founder of climate-tech startup Windscape AI, shares the secrets to capturing investor interest and what it takes to turn heads in the AI startup scene. Spoiler alert: it’s more than just having a great idea! 🚀
Dr. Gina Guillaume-Joseph, systems engineer and A.I.-regulation guru, details the evolving regulatory field for A.I., helping you ensure that the A.I. systems you deploy won't fall foul of any laws.
Alexandre Andorra, co-founder of PyMC Labs and host of the Learning Bayesian Statistics podcast, on why the ability to crunch larger and larger datasets has let us take advantage of a powerful modeling technique originally devised centuries ago (Bayesian stats, of course!).
Dr. Nathan Lambert, research scientist for the Allen Institute for AI (AI2) who previously built out the reinforcement learning from human feedback (RLHF) team at Hugging Face, on the lack of robustness in RLHF and how that could impact the future development and deployment of AI systems.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Merged LLMs Are Smaller And More Capable, with Arcee AI’s Mark McQuade and Charles Goddard
Today's episode is seriously mind-expanding. In it, Mark and Charles detail how they're pushing the A.I. frontier through LLM merging, extremely efficient (even CPU-only!) LLM training, and *Small* Language Models.
Mark McQuade:
• Is Co-Founder and CEO of Arcee.ai.
• Previously, he held client-facing roles at Hugging Face and Roboflow as well as leading the data science and engineering practice of a Rackspace company.
• Studied electronic engineering at Fleming College in Canada.
Charles Goddard:
• Is Chief of Frontier Research at Arcee.ai.
• Previously, he was a software engineer at Apple and the famed NASA Jet Propulsion Laboratory.
• Studied engineering at Olin College in Massachusetts.
Today’s episode is relatively technical so will likely appeal most to hands-on practitioners like data scientists and ML engineers. In it, Charles and Mark detail:
• How their impressive open-source model-merging approach combines the capabilities of multiple LLMs without increasing the model’s size.
• A separate open-source approach for training LLMs efficiently by targeting specific modules of the network to train while freezing others (a brief sketch of this freezing pattern follows this list).
• The pros and cons of Mixture-of-Experts versus Mixture-of-Agents approaches.
• How to enable small language models to outcompete the big foundation LLMs like GPT-4, Gemini and Claude.
• How to leverage open-source projects to land big enterprise contracts and attract big chunks of venture capital.
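To make the module-freezing idea from that list concrete, here is a minimal PyTorch-style sketch of the general pattern: freeze every parameter, then unfreeze only the modules you want to train. This is a generic illustration under my own assumptions (toy model, hypothetical module names), not Arcee's actual training code.

```python
import torch
from torch import nn, optim

# Hypothetical toy "LLM": in practice this would be a pretrained transformer.
model = nn.ModuleDict({
    "embeddings": nn.Embedding(32_000, 512),
    "block_0": nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    "block_1": nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    "lm_head": nn.Linear(512, 32_000),
})

# Freeze everything, then unfreeze only the modules we want to keep training.
trainable = {"block_1", "lm_head"}  # assumption: which modules to target
for name, module in model.items():
    for p in module.parameters():
        p.requires_grad = name in trainable

# The optimizer only sees the unfrozen parameters, so gradient memory
# and compute are limited to the targeted modules.
optimizer = optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```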
On that final note, congrats to the Arcee.ai team on announcing their $24m Series A round this very day... unsurprising given their tremendously innovative tech and rapid revenue ramp-up! It's very rare to see runaway A.I. startup successes like this one.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AGI Could Be Near: Dystopian and Utopian Implications, with Dr. Andrey Kurenkov
In today's episode, you can immerse yourself in one of my favorite on-air conversations ever: with the exceptional Andrey Kurenkov, on how soon AGI could be realized and the potential utopian/dystopian implications.
Andrey:
• Founded and co-hosts my favorite podcast, "Last Week in A.I.", a weekly program that recaps all of the A.I.-related news you need to know about.
• Is an ML Scientist at Astrocade, an NVIDIA-backed generative AI platform that converts your natural-language prompt into a functional video game.
• Holds a PhD in Computer Science from Stanford University, with research focused on robotics and reinforcement learning.
Today’s episode should be of interest to just about anyone!
In it, Andrey details:
• The genesis of the wide range of A.I. publications and podcasts he’s founded.
• What the future of text-to-video-game generative A.I. could look like.
• Why “A.I. as a product” rarely works commercially, and what you can succeed at with A.I. instead.
• Why A.I. robotics is suddenly progressing so rapidly.
• How soon AGI could be realized and the potentially dystopian or utopian implications.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Claude 3.5 Sonnet: Frontier Capabilities & Slick New “Artifacts” UI
Anthropic has publicly released its latest model, Claude 3.5 Sonnet. This might not seem like a big deal because it’s not a “whole number” release like Claude 3 was or Claude 4 eventually will be, but it’s actually quite significant: this model now appears to represent the state of the art for text-in/text-out generative LLMs, outcompeting other frontier models like OpenAI’s GPT-4o and Google’s Gemini.
Deep Learning Classics and Trends, with Dr. Rosanne Liu
Today's guest is the amazing Google DeepMind research scientist, Dr. Rosanne Liu!
Rosanne:
• Is a Research Scientist at Google DeepMind in California.
• Is Co-Founder and Executive Director of ML Collective, a non-profit that provides global ML research training and mentorship.
• Was a founding member of Uber AI Labs, where she served as a Senior Research Scientist.
• Has published deep learning research in top academic venues such as NeurIPS, ICLR, ICML and Science, and her work has been covered in publications like WIRED and the MIT Technology Review.
• Holds a PhD in Computer Science from Northwestern University.
Today’s episode, particularly in the second half when we dig into Rosanne’s fascinating research, is relatively technical so will probably appeal most to hands-on practitioners like data scientists and ML engineers.
In today’s episode, Rosanne details:
• The problem she founded the ML Collective to solve.
• How her work on the “intrinsic dimension” of deep learning models inspired the now-standard LoRA approach to fine-tuning LLMs (a minimal LoRA sketch follows this list).
• The thorny problems with LLM evaluation benchmarks and how they might be solved.
• The pros and cons of curiosity- vs goal-driven ML research.
• The positive impacts of diversity, equity and inclusion in the ML community.
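As a rough illustration of the LoRA idea mentioned above, here is a minimal PyTorch sketch: the pretrained weights stay frozen while a small, trainable low-rank update is learned on top of them. It's a simplified sketch of the published technique, not a reference implementation; the rank, scaling and layer sizes are arbitrary choices for illustration.

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pretrained weights stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # B starts at zero, so the
        self.scaling = alpha / r                                   # wrapped layer is unchanged at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 10, 768))  # only A and B receive gradients during fine-tuning
```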
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Fast-Evolving Data and AI Regulatory Frameworks, with Dr. Gina Guillaume-Joseph
A.I. regulatory frameworks are proliferating globally, protecting personal privacy while unlocking "dark data" for A.I.-model training. In today's episode, Dr. Gina Guillaume-Joseph is our expert guide to these A.I. regulations.
Gina:
• Was, until recently, the CTO responsible for Government at Workday, aligning the HRtech giant with the U.S. federal government’s tech transformation strategy.
• Prior to Workday, was Director of Technology at financial giant Capital One.
• Earlier, spent 16 years supporting the federal government as a contractor with leading firms like Booz Allen Hamilton and The MITRE Corporation.
• Now works as a fractional Chief Information Officer and as Adjunct Faculty at The George Washington University.
• Holds a PhD in Systems Engineering from George Washington University and a Bachelor’s in Computer Science from Boston College.
Today’s episode should be of interest to just about anyone who would listen to this podcast because it focuses on the data and A.I. regulatory frameworks that will transform our industry.
In today’s episode, Gina details:
• The “dark data conundrum”.
• The most important data and A.I. regulations of recent years as well as those that are coming soon.
• The pros and cons of being or hiring a fractional executive.
• What systems engineering is and why it’s an invaluable background for implementing large-scale A.I. projects.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Exciting (and Frightening!) Trends in Open-Source AI
Friday's short episode of my podcast features four data-science luminaries (Emily Zabor, James David Long, Drew Conway and Jared Lander) discussing the most exciting open-source A.I. trends they see.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in May 2024
We had another incredible set of guests in May on the SuperDataScience Podcast I host. ICYMI, today's episode highlights the most fascinating moments of my conversations with them.
Specifically, conversation highlights include:
1. Dr. Luis Serrano, a math- and ML-education YouTuber with 150k subscribers, explaining what language embeddings are, how they function, and how essential they are for running semantic search queries (a short semantic-search sketch follows this list).
2. Sol Rashidi, serial C-suite data-role executive at Fortune 100s and bestselling author of "Your A.I. Survival Guide", on her approach to building data teams.
3. Co-founder of the MLOps Community, Demetrios Brinkmann, on the differences between ML Engineering and MLOps roles.
4. Navdeep Martin, an entrepreneur blending climate tech and generative A.I. in her latest startup, on opportunities where you can tackle climate change with technological innovation yourself.
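To make the embeddings point from item 1 concrete, here is a small, generic semantic-search sketch: embed the documents and the query, then rank by cosine similarity. The sentence-transformers model name below is just a common default I'm assuming for illustration, not necessarily what Luis discussed.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumption: any embedding model would do

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to fine-tune a large language model",
    "Best hiking trails near Seattle",
    "An introduction to Bayesian statistics",
]
query = "tutorials on training LLMs"

doc_vecs = model.encode(docs, normalize_embeddings=True)    # shape: (n_docs, dim)
query_vec = model.encode(query, normalize_embeddings=True)  # shape: (dim,)

# With unit-normalized vectors, the dot product equals cosine similarity.
scores = doc_vecs @ query_vec
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```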
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert
In today's episode, the renowned RLHF thought-leader Dr. Nathan Lambert digs into the origins of RLHF, its role today in fine-tuning LLMs, emerging alternatives to RLHF... and how GenAI may democratize (human) education!
Nathan:
• Is a Research Scientist at the Allen Institute for AI (AI2) in Seattle, where he’s focused on fine-tuning Large Language Models (LLMs) based on human preferences as well as advocating for open-source AI.
• He’s renowned for his technical newsletter on AI called "Interconnects".
• Previously helped build an RLHF (reinforcement learning from human feedback) research team at Hugging Face.
• Holds a PhD from the University of California, Berkeley, during which he focused on reinforcement learning and robotics and worked at both Meta AI and Google DeepMind.
Today’s episode will probably appeal most to hands-on practitioners like data scientists and machine learning engineers, but anyone who’d like to hear from a talented communicator who works at the cutting edge of AI research may learn a lot by tuning in.
In today’s episode, Nathan details:
• What RLHF is and how its roots can be traced back to ancient philosophy and modern economics.
• Why RLHF is the most popular technique for fine-tuning LLMs.
• Powerful alternatives to RLHF such as RLAIF (reinforcement learning from A.I. feedback) and distilled direct preference optimization (dDPO); a minimal DPO sketch follows this list.
• Limitations of RLHF.
• Why he considers AI to often be more alchemy than science.
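For a flavor of the preference-optimization alternatives in that list, here is a minimal sketch of the DPO objective, written from the published formula. The per-example log-probabilities are assumed to come from your trainable policy and a frozen reference model; this is an illustrative sketch, not Nathan's code or a production implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of per-example sequence log-probabilities
    log p(y | x), under the trainable policy or the frozen reference model,
    for the preferred ("chosen") and dispreferred ("rejected") responses.
    """
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # Maximize the margin by which the policy prefers the chosen response,
    # relative to the reference model, scaled by beta.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

# Toy usage with made-up log-probabilities:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
```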
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Open-Source Libraries for Data Science at the New York R Conference
For today's short episode, I asked four data-science luminaries about their favorite open-source libraries. Hear what Emily Zabor, James David Long, Drew Conway and Jared Lander chose, live on stage!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
ML for Wind-Powered Energy Generation, with Dr. Jason Yosinski
One of my all-time favorite A.I. researchers, Dr. Jason Yosinski, is my guest today! He details how his startup is using ML to collect wind energy more efficiently and digs into visualizing/understanding deep neural networks.
Jason:
• Is Co-Founder and CEO of Windscape AI, a startup using ML to increase the efficiency of energy generation via wind turbines.
• Is Co-Founder and President of the ML Collective, a research group that’s open to ML researchers anywhere.
• Was a Co-Founder of the A.I. Lab at the ride-share company Uber.
• Holds a PhD in Computer Science from Cornell, during which he worked at the NASA Jet Propulsion Laboratory, Google DeepMind and with the eminent Yoshua Bengio in Montreal.
• His work has been featured in The Economist, on the BBC and, coolest of all, in an XKCD comic!
Today’s episode gets fairly technical in parts so may be of greatest interest to hands-on practitioners like data scientists and ML engineers, although there are also parts that will appeal to anyone keen to hear how ML is being used to produce more clean energy.
In today’s episode, Jason details:
• How ML can make wind direction more predictable, thereby making wind turbines and power grids in general more efficient.
• How to infer what individual neurons in a deep learning model are doing by using visualizations (a small activation-capture sketch follows this list).
• Why freezing a particular layer of a neural net prior to doing any training at all can lead to better results.
• How you can get involved in a cutting-edge research community no matter where you are in the world.
• What traits make for successful A.I. entrepreneurs.
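As a tiny, generic illustration of the activation-inspection step that neuron-visualization work typically builds on, here is a PyTorch sketch that captures a hidden layer's outputs with a forward hook so they can be plotted or probed. The toy network and layer choice are my own assumptions, not Jason's specific method.

```python
import torch
from torch import nn

# A tiny stand-in network; in interpretability work this would be a trained model.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach a hook to the layer whose neurons (channels) we want to inspect.
model[2].register_forward_hook(save_activation("conv2"))

with torch.no_grad():
    model(torch.randn(1, 3, 64, 64))

# activations["conv2"] has shape (1, 32, 64, 64): one feature map per unit,
# ready to visualize as images or to probe for what each unit responds to.
print(activations["conv2"].shape)
```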
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Multi-Agent Systems: How Teams of LLMs Excel at Complex Tasks
Groundbreaking multi-agent systems (MAS, for short) are transforming the way AI models collaborate to tackle complex challenges.
MLOps: The Job and The Key Tools, with Demetrios Brinkmann
Today, global MLOps community leader Demetrios Brinkmann details why MLOps is essential, how it differs from related roles like LLMOps, DevOps and A.I. Engineering, and the best tools for deploying and scaling LLMs.
Demetrios:
• Is Founder and CEO of MLOps Community, an organization dedicated to supporting MLOps professionals that has quickly grown to over 20,000 members.
• Was previously founder of the Data on Kubernetes community.
• Before that, worked in public-facing roles at a number of European tech startups.
Today’s episode will be of interest to anyone who’s keen to better understand the critical function of MLOps in bringing machine learning models to the real world.
In today’s episode, Demetrios details:
• What exactly MLOps is and how it relates to other roles like LLMOps, DevOps and A.I. Engineering.
• The key MLOps tools and approaches.
• What it takes to build a thriving community of tens of thousands of professionals in just a few years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Six Keys to Data Scientists’ Success, with Kirill Eremenko
For today's episode, Kirill Eremenko — who has taught more than 2.8 million people data science — fills us in on his six most valuable insights about data science careers.
More on Kirill:
• Founder and CEO of SuperDataScience, an e-learning platform that is the namesake of this very podcast.
• Launched the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins four years ago.
• Has reached more than 2.8 million students through the courses he’s published on Udemy, making him Udemy’s most popular data science instructor.
At a high level, Kirill's six data science insights are:
1. Unlike many other careers, there’s no need for formal credentials to become a data scientist.
2. Mentors can be invaluable guides in a DS career, but you should also try to give back to your mentors when you can.
3. Portfolios are the key to landing the DS job of your dreams because they showcase your DS abilities for all to see.
4. Hands-on labs are a fun, interactive way to develop your portfolio and are a great complement to classes.
5. Collaborations can make lots of aspects of DS career development fun, including learning new materials, completing labs and developing your portfolio.
6. Data scientists can come from any background and work from anywhere in the world with an Internet connection.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Math, Quantum ML and Language Embeddings, with Dr. Luis Serrano
Today, Dr. Luis Serrano (a master at making complex math and ML topics friendly) leads a mind-expanding discussion on embeddings in LLMs, Quantum ML and what the next big trends in A.I. will be. I wouldn't miss this one 🤯
Luis:
• Is the beloved creator behind the Serrano Academy, an educational YouTube channel on math and ML with over 146,000 subscribers.
• Until this month, he worked as Head of Developer Relations at Cohere, one of the few A.I. labs in the world that are actually at the frontier of LLMs.
• Prior to that, he was a Quantum A.I. Research Scientist at Zapata Computing, Lead A.I. Educator at Apple, Head of Content for A.I. at Udacity and ML Engineer at Google.
• Holds a PhD in Math from the University of Michigan.
Today’s episode should be appealing to just about anyone! In it, Luis details:
• How supposedly complex topics like math and A.I. can be made easy to understand.
• How Cohere’s focus on enterprise use cases for LLMs has led it to specialize in embeddings, one of the most important components of LLMs.
• The promising application areas for Quantum Machine Learning.
• What the next big trends in A.I. will be.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Aligning Large Language Models, with Sinan Ozdemir
For today’s quick Five-Minute Friday episode, the exceptional author, speaker and entrepreneur Sinan Ozdemir provides an overview of what it actually means for an LLM to be “aligned”.
More on Sinan:
• Is Founder and CTO of LoopGenius, a generative AI startup.
• Has authored several excellent books, including, most recently, the bestselling "Quick Start Guide to Large Language Models".
• Is a serial AI entrepreneur, including founding a Y Combinator-backed generative AI startup way back in 2015 that was later acquired.
This episode was filmed live at the Open Data Science Conference (ODSC) East in Boston last month. Thanks to ODSC for providing recording space.
This is episode #784!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Generative A.I. for Solar Power Installation, with Navdeep Martin
A startling 70% of solar-power projects fail. In today's episode, hear how Navdeep Martin's startup Flypower is using Generative A.I. to ensure we install renewable energy sources more effectively and efficiently.
Navdeep:
• Co-founder and CEO of Flypower, a generative A.I. startup dedicated to ensuring clean-energy projects, particularly solar-power projects, succeed.
• Previously held senior product leadership roles at VC-backed Bay Area AI startups as well as for AI products at Comcast and The Washington Post.
• Before that, was a software engineer for the CIA.
• Holds a degree in computer science from William & Mary and an MBA from the University of Virginia.
Today’s episode will appeal to anyone who’d like to hear about the evolution of generative A.I. technologies in products and applications. It covers how you can best make use of the various categories of Gen-A.I. technologies today and, in particular, how A.I. is being used to overcome the social and regulatory hurdles associated with combating climate change.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in April 2024
Other than excessive maleness and paleness*, April 2024 was an excellent month for the podcast, packed with outstanding guests. ICYMI, today's episode highlights the most fascinating moments of my convos with them.
Specifically, conversation highlights include:
1. Iconic open-source developer Dr. Hadley Wickham putting the "R vs Python" argument to bed.
2. Aleksa Gordić, creator of a digital A.I.-learning community of 160k+ people, on the movement from formal to self-directed education.
3. World-leading futurist Bernard Marr on how we can work with A.I. as opposed to it lording over us.
4. Educator of millions of data scientists, Kirill Eremenko, on why gradient boosting is so powerful for making informed business decisions.
5. Prof. Barrett Thomas on how drones could transform same-day delivery.
*Remedied in May!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.