In today's episode, the renowned RLHF thought leader Dr. Nathan Lambert digs into the origins of RLHF, its role today in fine-tuning LLMs, emerging alternatives to RLHF... and how GenAI may democratize (human) education!
Nathan:
• Is a Research Scientist at the Allen Institute for AI (AI2) in Seattle, where he’s focused on fine-tuning Large Language Models (LLMs) based on human preferences as well as advocating for open-source AI.
• He’s renowned for his technical newsletter on AI called "Interconnects".
• Previously helped build an RLHF (reinforcement learning from human feedback) research team at Hugging Face.
• Holds a PhD from the University of California, Berkeley, for which he focused on reinforcement learning and robotics, and during which he worked at both Meta AI and Google DeepMind.
Today’s episode will probably appeal most to hands-on practitioners like data scientists and machine learning engineers, but anyone who’d like to hear from a talented communicator who works at the cutting edge of AI research may learn a lot by tuning in.
In today’s episode, Nathan details:
• What RLHF is and how its roots can be traced back to ancient philosophy and modern economics.
• Why RLHF is the most popular technique for fine-tuning LLMs.
• Powerful alternatives to RLHF such as RLAIF (reinforcement learning from AI feedback) and distilled direct preference optimization (dDPO).
• Limitations of RLHF.
• Why he often considers AI to be more alchemy than science.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.