Meta's release of its giant (405-billion-parameter) Llama 3.1 model is a game-changer: for the first time, an "open-source" LLM competes at the frontier against proprietary models like GPT-4o and Claude.
KEY INFO
The 405B member of the Llama 3.1 model family performs (per Meta's own research and data, so apply a grain of salt) on par with the closed-source, proprietary models at the absolute frontier of generative A.I. capabilities, on both benchmarks and human evaluations: OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet and Google's Gemini.
As part of this Llama 3.1 release, Meta also provided 8B and 70B models, which appear to outperform similarly sized open-source competitors like Google's Gemma 7B and Mistral AI's Mixtral 8x22B, respectively.
As with earlier Llama releases, Meta has additionally provided fine-tuned versions of these LLMs for instruction-following and chat applications.
The context window has been expanded to 128,000 tokens (approx. 100,000 words); that lags far behind Gemini's 2-million-token window but is otherwise near the context-window frontier.
Multilingual support for 8 languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai).
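To see where the "approx. 100,000 words" figure for the 128k-token window comes from, here's a quick back-of-the-envelope calculation. It assumes roughly 0.75 English words per token, a common rule of thumb; the real ratio varies by tokenizer and language.

```python
# Rough capacity of a 128k-token context window, assuming ~0.75 English
# words per token (a rule-of-thumb ratio, not an exact tokenizer figure).
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75

approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)  # 96,000 words
```

That's roughly 100,000 words, or a few hundred pages of text, fed to the model in a single prompt.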
TECHNICAL INFO
Trained on over 15 trillion tokens using 16k NVIDIA H100 GPUs.
A standard, dense decoder-only transformer architecture was chosen for training stability (as opposed to, say, a mixture-of-experts approach).
Post-training involved supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).
New safety tools: "Llama Guard 3" for content moderation and "Prompt Guard" against prompt injection attacks.
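For readers unfamiliar with DPO: it fine-tunes the model directly on human preference pairs (a "chosen" and a "rejected" response) without training a separate reward model. Below is a minimal sketch of the per-pair DPO objective in plain Python; the scalar log-probabilities and the beta value are illustrative, not Meta's actual training values.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the log-probability a model assigns to a full
    response: the policy being trained vs. a frozen reference model,
    on the human-preferred ("chosen") vs. dispreferred ("rejected")
    response. beta controls how far the policy may drift from the
    reference.
    """
    # How much more (or less) the policy likes each response than the
    # reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp

    # The policy is rewarded for widening the gap in favor of the
    # chosen response. Loss is -log(sigmoid(logits)), computed stably
    # as log(1 + exp(-logits)).
    logits = beta * (chosen_ratio - rejected_ratio)
    return math.log1p(math.exp(-logits))

# When the policy already prefers the chosen response more strongly
# than the reference does, the loss falls below log(2):
loss = dpo_loss(-1.0, -3.0, -1.5, -2.5, beta=0.1)
```

Minimizing this over many preference pairs nudges the model toward responses humans preferred, while the reference-model terms keep it anchored near its supervised-fine-tuned starting point.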
IMPACT & ACCESS
Llama 3.1 is not truly "open-source" (only model weights are provided, not training data or code). Releasing an LLM that competes at the frontier may raise safety concerns, since malevolent actors now have unfettered access to cutting-edge A.I. tech. For the most part, though, this should be a boon to A.I. application developers and a positive for society, providing more flexibility for innovation across industries (e.g., healthcare, education, science).
Wide accessibility through partnerships with Amazon Web Services (AWS), Databricks, Snowflake, NVIDIA and others, even including Google Cloud and Microsoft Azure, the big-tech rivals whose proprietary models previously had the frontier of LLM capabilities to themselves.
Available on GitHub and Hugging Face for immediate access, fine-tuning, and deployment on your own infrastructure.
WHY WOULD META DO THIS?
Helps them compete for top A.I. talent.
Undercuts big-tech rivals by commoditizing frontier GenAI.
Meta claims that open-source increases security by letting anyone kick the tires.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.