What does Deep Research do? It automates deep-dive literature reviews remarkably well, synthesizing hundreds of online sources into a coherent, well-cited report. Using multi-step “reasoning” models (like the o3-mini model I covered in Episode #864), Deep Research breaks your complex query into smaller tasks, searches the web for each piece, and iteratively synthesizes the results, pivoting its research trajectory as it learns new information. In practical terms, it’s like having an expert researcher on call 24/7, crunching through data at speeds no human can match. Tasks that could take a human researcher hours or days are now completed for you, to a tremendously high standard, within minutes.
OpenAI trained Deep Research using end-to-end reinforcement learning on challenging web-browsing and reasoning tasks across a range of domains. Through that training, it learned to plan and execute a multi-step trajectory to find the data it needs, backtracking and reacting to real-time information where necessary. The model can also browse user-uploaded files, use Python to plot graphs, embed both generated graphs and images from websites in its responses, and cite specific sentences or passages from its sources. As a result of this training, it reaches new highs on a number of public evaluations focused on real-world problems.
To wit, Deep Research set a dramatically high new benchmark on a recently released AI evaluation called Humanity’s Last Exam, a comprehensive assessment consisting of 3,000 multiple-choice and short-answer questions across over 100 subjects, ranging from rocket science to linguistics. On this benchmark, Deep Research scored an impressive 26.6% accuracy, compared to just 9.1% for its predecessor. This jump highlights its ability to reason across domains, handle complex questions, seek out specialized information from the web when necessary, and synthesize disparate pieces of information into a coherent whole.
This big jump in performance on Humanity’s Last Exam translates into real-world value. Since getting an OpenAI Pro subscription (which is $200/month but easily worth it for me given how much time it saves me and the value of its insights), I’ve been using Deep Research near-daily and have been continuously impressed. For example, I used it to greatly accelerate the development of a syllabus for an upcoming four-hour Agentic AI workshop I’ll be providing at ODSC East in May.
In your case, imagine you’re exploring the latest advances in transformer architectures. Rather than spending days scanning arXiv, conference proceedings, and technical blogs, you could simply ask Deep Research for a summary of recent breakthroughs. The tool would extract key points—such as improvements in training algorithms, scaling techniques, and performance metrics—and present you with a clear, structured overview complete with citations. This not only saves tremendous time but also minimizes the risk of overlooking critical studies.
Of course, as I mentioned at the outset of this episode, OpenAI isn’t alone in this space. Google and Perplexity have also rolled out their own deep research capabilities. Google’s approach, powered by its Gemini LLMs, leverages its vast search infrastructure to pull in a broad array of documents. Their tool typically presents a user-guided research plan, outlining sub-questions before diving in. This method results in a comprehensive report that’s reliable—but sometimes it stops short of the nuanced analysis that Deep Research delivers.
Then there’s Perplexity, which offers a fast (and free!) deep research mode. Perplexity churns out a high-level overview in just a few minutes, making it great for quick snapshots. However, that speed can come at the cost of depth and iterative reasoning. For quick queries, Perplexity works well, but for mission-critical analysis, OpenAI’s more methodical and transparent approach clearly has the edge.
Regardless of which company is behind the innovation, looking ahead, the implications are profound. Deep Research redefines how we approach problem-solving in data science and beyond. It democratizes access to high-quality research by lowering the barrier to entry, whether you’re a seasoned expert or just starting out. As these systems improve, we might soon see research assistants embedded directly in our development environments, ready to pull insights from the latest publications or internal data stores on demand. Paired with AI agents that can take real-world action with increasing reliability, tools like Deep Research will allow ever more human abilities to be augmented and ever more routine work to be automated. I encourage you to take advantage of this unique moment in human history to consider how the increasingly capable autonomous systems of the coming years could improve your life and the lives of those around you, whether through socially beneficial projects or plain old commercially impactful ones.
Today, there are of course still some limitations to be aware of. Like any LLM-based tool, Deep Research could hallucinate or make incorrect references, although I haven’t caught any of these myself yet, and OpenAI’s internal evaluations apparently show markedly lower hallucination rates than any of their previous models. The biggest risk to you is that Deep Research could present rumor as authoritative fact, but OpenAI is aware of this occasional issue, and you can anticipate that in the coming months and years this overconfidence problem will become vanishingly rare.
So, what’s the catch? Well, Deep Research is expensive. I’m paying $200/month as a Pro user to get 100 queries per month, or a little over three queries per day. As OpenAI figures out engineering efficiencies and how to use small models like o3-mini more effectively for Deep Research, you can anticipate that more and more Deep Research queries per month will be available to all paying users and that, eventually, it may be available for free, like Perplexity’s deep research mode.
In summary, OpenAI’s Deep Research is transforming the research process by automating the heavy lifting of information gathering, analysis, and synthesis. With its impressive benchmark performance on Humanity’s Last Exam, transparent chain-of-thought, and iterative reasoning process, it provides a level of depth and reliability that stands out against competitors like Google’s Gemini and Perplexity’s quick snapshots. As we continue integrating AI into our workflows, tools like these will be key in turning raw data into actionable insights—empowering us to push the boundaries of innovation in data science.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.