Anthropic’s latest publicly released model is Claude 3.5 Sonnet. This might not seem like a big deal because it’s not a “whole number” release like Claude 3 was or Claude 4 eventually will be, but in fact, it’s quite a big deal: this model now appears to represent the state of the art for text-in/text-out generative LLMs, outcompeting other frontier models like OpenAI’s GPT-4o and Google’s Gemini.
For a bit of relevant context to tee things up, a quick refresher: Claude 3 came in three sizes.
Haiku is the smallest, fastest and cheapest in the family.
Sonnet is the mid-size model that’s a solid default for most tasks.
Opus is the full-size model that was my favorite text-in/text-out model… well, perhaps until now!
So far, Anthropic has released only Claude 3.5 Sonnet, the mid-size model, but in my testing as well as on benchmarks, it outperforms the much larger Claude 3 Opus from a capability perspective. This is amazing because Claude 3.5 Sonnet is much smaller, so not only is it better at complex tasks like code generation, writing high-quality content, summarizing lengthy documents, and creating insights and visualizations from unstructured data… it’s also twice as fast as Opus!
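If you want to try 3.5 Sonnet programmatically rather than through the Claude web app, here’s a minimal sketch using Anthropic’s Python SDK. The prompt text and the `build_request` helper are illustrative choices of mine, not anything official; the model ID shown is the 3.5 Sonnet release identifier.

```python
# Minimal sketch: calling Claude 3.5 Sonnet via Anthropic's Python SDK.
# The helper and prompt are illustrative; only the model ID and the
# messages.create() call shape come from the SDK itself.
import os

MODEL = "claude-3-5-sonnet-20240620"  # Claude 3.5 Sonnet release ID

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_request("Summarize this document in three bullet points.")

# Only hit the API when a key is configured; otherwise just inspect the payload.
if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(**request)
    print(response.content[0].text)
```

The same request shape works for the larger and smaller family members by swapping in the corresponding model ID, which is part of what makes trying the new mid-size model so low-friction.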
In terms of quantifying Claude 3.5 Sonnet’s capabilities, we’ve talked on this podcast many times about how benchmarks are not always the most reliable indicator of capabilities because they can be gamed. But alongside my personal assessment of its frontier capabilities, the model does set new state-of-the-art results across several benchmarks:
The most oft-cited benchmark, MMLU, which assesses undergrad-level knowledge
GPQA, which assesses graduate student-level reasoning
HumanEval, which assesses coding proficiency
In terms of machine vision, 3.5 Sonnet is about 10% better than Claude 3 Opus across vision benchmarks, performing particularly well at accurately transcribing text out of difficult-to-read photos.
On top of all of the above — the SOTA capabilities, the rapid speed, low cost and broad accessibility — another super cool item is that Anthropic released a new experimental UI feature alongside 3.5 Sonnet called Artifacts. When you have Artifacts enabled and you ask Claude to generate content like code, documents or even a functioning website, these appear in a panel alongside your text-in/text-out conversation, so your conversation is on the left while the outputs are on the right. This is a game-changer because you no longer need to scroll up and down through your conversation to find the output; it’s conveniently right there in front of you.
For me personally, Claude was already my go-to model for most generative AI tasks; this 3.5 Sonnet release has cemented that position for me even more.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.