Today’s episode will fill you in on everything you need to know about o3-mini, an important new model that OpenAI recently released to the public.
OpenAI’s o3-mini is a reasoning model like DeepSeek’s R1 model (which I detailed two weeks ago in Episode #860) and also like the original, super-famous reasoning model o1, which made a huge splash when it was released by OpenAI back in September (and which I covered back in Episode #820).
As a quick recap, reasoning models like o1, R1 and o3-mini work through problems step by step in the background before outputting a response to your query. Compared to models like GPT-4o and Claude 3.5 Sonnet that immediately begin streaming their outputs, reasoning models are far more effective at the kinds of tasks you might tackle step by step with pencil and paper, such as math problems and challenging coding problems.
There are two reasons why this new o3-mini is such an important release:
First, when left “thinking” long enough (o3-mini has three modes: low, medium and high, where high carries out the most inference-time compute), o3-mini achieves state-of-the-art performance relative to any other publicly available model on a number of key, challenging benchmarks, including the AIME math benchmark, the Codeforces coding benchmark and the SWE-bench Verified benchmark, which consists of challenging real-world software-engineering problems. To be explicit, this means that o3-mini high outperforms not only o1-mini, but also DeepSeek-R1 and even OpenAI’s much-more-expensive-to-run, full-size o1 model.
Which brings me to the second reason why o3-mini is such an important release: because o3-mini is relatively small, it’s way cheaper than o1 to run. While o1 costs $15 per million input tokens and $60 per million output tokens, o3-mini costs just $1.10 and $4.40 respectively, or about 7% of o1’s price. (Note that o3-mini is about twice as expensive to run as R1 on DeepSeek’s cloud infrastructure in China, but if you want to run R1 with a US cloud provider, o3-mini costs about half as much.)
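To make those savings concrete, here’s a quick back-of-the-envelope calculation in Python. It uses the per-million-token prices just mentioned, which were accurate at the time of this episode but may change, so check OpenAI’s pricing page for current figures:

# Back-of-the-envelope cost comparison between o1 and o3-mini.
# Prices are USD per million tokens at the time of this episode.
PRICES = {
    "o1":      {"input": 15.00, "output": 60.00},
    "o3-mini": {"input": 1.10,  "output": 4.40},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one job on the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example job: 2M input tokens in, 500k output tokens back.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 2_000_000, 500_000):.2f}")
# Prints roughly: o1: $60.00 versus o3-mini: $4.40

That example job works out to about 7 cents on the dollar, matching the headline figure above.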
To recap, the key points are that o3-mini provides SOTA performance on complex tasks that require step-by-step reasoning, all at bargain prices compared to the first generation of reasoning models.
So, how can you access this powerful new o3-mini model? Free-tier users of ChatGPT can get a taste of o3-mini by selecting “Reason” in the chat box when you make your query. If you have a paid ChatGPT plan (such as ChatGPT Plus, Team or Pro), you can access the o3-mini-high model, which spends the most time on inference-time computation and provides the SOTA capabilities I’ve been touting in this episode. You can also use the OpenAI API to embed o3-mini’s reasoning capabilities into any application your heart desires (I’ve got a link to instructions on how to do that in the show notes). Depending on your exact application, you can experiment to determine whether o3-mini-low, -medium or -high is ideal for your use case, noting that of course your compute time and financial cost will both go up if you opt for o3-mini-medium and even more so if you go for -high.
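If you go the API route, here’s a minimal sketch using OpenAI’s official Python SDK. It assumes you’ve run pip install openai and set an OPENAI_API_KEY environment variable, and it uses the reasoning_effort parameter to select between the low, medium and high modes discussed earlier:

# Minimal sketch: calling o3-mini via OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium" (the default) or "high"
    messages=[
        {"role": "user", "content": "Walk me through solving this step by step: ..."},
    ],
)

print(response.choices[0].message.content)

Dropping reasoning_effort to “medium” or “low” trades some capability for lower latency and cost, which is exactly the experiment described above.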
Ultimately, this o3-mini release isn’t as earth-shattering as the DeepSeek-R1 release was a few weeks ago, because R1 is open-source while o3-mini is completely proprietary; that means you have more flexibility with R1 to adapt it to your heart’s content and run it on whatever infrastructure you desire.
But, OpenAI does have another card up their sleeve that will presumably be released to the public soon and that’s quite exciting indeed. That’s o3 (as opposed to o3-mini), which has performance that absolutely crushes all other models available today (including DeepSeek-R1 and of course its predecessor, the full-size o1 model) on complex reasoning benchmarks, including:
Math
Coding
Real-world software engineering problems
Detailed, factual question-answering tasks
Another week, another major breakthrough in AI capabilities. These are exciting times indeed. I hope your brain is tingling with ideas for how you can streamline your own activities as well as potentially build world-changing applications with increasingly powerful and exponentially less expensive AI models at your fingertips. If not, try chatting with an LLM to get some guidance!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.