ChatGPT gains agentic capability for complex research


OpenAI is releasing a powerful agentic capability that enables ChatGPT to conduct complex, multi-step research tasks online. The feature, called Deep Research, reportedly achieves in tens of minutes what could take a human researcher hours or even days.

OpenAI describes Deep Research as a significant milestone in its journey toward artificial general intelligence (AGI).

“The ability to synthesise knowledge is a prerequisite for creating new knowledge,” says OpenAI. “For this reason, Deep Research marks a significant step toward our broader goal of developing AGI.”

Agentic AI enables ChatGPT to assist with complex research

Deep Research empowers ChatGPT to find, analyse, and synthesise information from hundreds of online sources autonomously. With just a prompt from the user, the tool can deliver a comprehensive report, comparable to the output of a research analyst, according to OpenAI.

Powered by a variant of OpenAI’s upcoming “o3” model, Deep Research aims to free users from time-consuming, labour-intensive information gathering. Whether it’s a competitive analysis of streaming platforms, an informed policy review, or even personalised recommendations for a new commuter bike, Deep Research promises precise and reliable results.

Importantly, every output includes full citations and transparent documentation—enabling users to verify the findings with ease.

The tool appears particularly adept at uncovering niche or non-intuitive insights, making it an invaluable asset across industries like finance, science, policymaking, and engineering. But OpenAI also envisions Deep Research being useful for the average user, such as shoppers looking for hyper-personalised recommendations or a specific product.

This latest agentic capability operates through the user interface of ChatGPT; users simply select the “Deep Research” option in the message composer and type their query. Supporting files or spreadsheets can also be uploaded for additional context.

Once initiated, the AI embarks on a rigorous multi-step process, which may take 5-30 minutes to complete. A sidebar provides updates on the actions taken and the sources consulted. Users can carry on with other tasks and will be notified when the final report is ready. 

The results are presented in the chat as detailed, well-documented reports. In the coming weeks, OpenAI plans to enhance these outputs further by embedding images, data visualisations, and graphs to deliver even greater clarity and context.

Unlike GPT-4o – which excels in real-time, multimodal conversations – Deep Research prioritises depth and detail. Its ability to rigorously cite sources and provide comprehensive analysis sets it apart—shifting the focus from fast, summarised answers to well-documented, research-grade insights.

Built for real-world challenges

Deep Research leverages sophisticated training methodologies, grounded in real-world browsing and reasoning tasks across diverse domains. Its model was trained via reinforcement learning to autonomously plan and execute multi-step research processes, including backtracking and adaptively refining its approach as new information becomes available.
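To make the idea of a multi-step process with backtracking more concrete, here is a toy sketch in Python. It is not OpenAI's implementation, whose internals are not public. The `TOY_INDEX` data, the `search`, `refine`, `research`, and `format_report` functions, and the example domains are all hypothetical and exist only for illustration. The loop queues a query, backs off to narrower queries when a search returns nothing, and keeps the source alongside each finding so that the final report can cite it.

```python
from collections import deque

# Hypothetical stand-in for web search: maps a query to (source, text) pairs.
TOY_INDEX = {
    "commuter bike": [],  # too broad: no usable hits, so the loop must backtrack
    "commuter bike under 1000": [
        ("shop-a.example", "Model X costs 900 and weighs 11 kg."),
    ],
    "commuter bike reviews": [
        ("blog-b.example", "Model X rides well in rain."),
    ],
}


def search(query):
    """Look the query up in the toy index (assumption: no real network access)."""
    return TOY_INDEX.get(query, [])


def refine(query):
    """Hypothetical refinement step: propose narrower follow-up queries."""
    return [f"{query} under 1000", f"{query} reviews"]


def research(prompt, max_steps=10):
    """Run a bounded search loop that refines the query after a dead end."""
    pending = deque([prompt])
    seen = set()
    findings = []  # (source, text) pairs, kept together for citation
    log = []       # plays the role of a progress sidebar
    steps = 0
    while pending and steps < max_steps:
        query = pending.popleft()
        if query in seen:
            continue
        seen.add(query)
        steps += 1
        hits = search(query)
        if not hits:
            # Dead end: backtrack by queuing refined versions of the query.
            log.append(f"no results for {query!r}, refining")
            pending.extend(refine(query))
            continue
        log.append(f"{query!r}: {len(hits)} source(s)")
        findings.extend(hits)
    return findings, log


def format_report(findings):
    """Render each finding with its source in square brackets."""
    return "\n".join(f"- {text} [{source}]" for source, text in findings)


findings, log = research("commuter bike")
print(format_report(findings))
```

The `max_steps` bound keeps the loop from refining forever when no query produces results. A real agent would presumably decide how to refine based on what it has read so far, rather than using a fixed rule.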

The tool can browse user-uploaded files, generate and iterate on graphs using Python, embed media such as generated images and web pages into responses, and cite exact sentences or passages from its sources. The result of this extensive training is a highly capable agent for tackling complex real-world problems.

OpenAI evaluated Deep Research across a broad set of expert-level exams known as “Humanity’s Last Exam”. The exams – comprising over 3,000 questions covering topics from rocket science and linguistics to ecology and classics – test an AI’s competence in solving multifaceted problems.

The results were impressive, with the model achieving a record-breaking 26.6% accuracy. The reported scores for each model were:

  • GPT-4o: 3.3%
  • Grok-2: 3.8%
  • Claude 3.5 Sonnet: 4.3%
  • OpenAI o1: 9.1%
  • DeepSeek-R1: 9.4%
  • Deep Research: 26.6% (with browsing + Python tools)
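For a rough sense of the gap, the short Python sketch below works out how far each listed model trails the Deep Research score, both in percentage points and as a ratio. The numbers are copied from the list above. The `compare_to_leader` helper is a hypothetical name used only for this illustration. The comparison is not strictly like-for-like, because the Deep Research figure is the only one marked as using browsing and Python tools.

```python
# Humanity's Last Exam accuracy (%) as listed in the article.
scores = {
    "GPT-4o": 3.3,
    "Grok-2": 3.8,
    "Claude 3.5 Sonnet": 4.3,
    "OpenAI o1": 9.1,
    "DeepSeek-R1": 9.4,
    "Deep Research": 26.6,
}


def compare_to_leader(scores, leader):
    """Return {model: (point_gap, ratio)} for every model except the leader."""
    top = scores[leader]
    return {
        name: (round(top - value, 1), round(top / value, 2))
        for name, value in scores.items()
        if name != leader
    }


gaps = compare_to_leader(scores, "Deep Research")
for name, (gap, ratio) in gaps.items():
    print(f"{name}: {gap} points behind; Deep Research scores {ratio}x higher")
```

On these figures, Deep Research scores 17.2 percentage points above the next-best entry, DeepSeek-R1, or roughly 2.8 times its accuracy.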

Deep Research also reached a new state-of-the-art performance on the GAIA benchmark, which evaluates AI models on real-world questions requiring reasoning, multi-modal fluency, and tool-use proficiency. Deep Research topped the leaderboard with a score of 72.57%.

Limitations and challenges

While the Deep Research agentic AI capability in ChatGPT signifies a bold step forward, OpenAI acknowledges that the technology is still in its early stages and comes with limitations.

The system occasionally “hallucinates” facts or offers incorrect inferences, albeit at a notably reduced rate compared to existing GPT models, according to OpenAI. It also faces challenges in differentiating between authoritative sources and speculative content, and it struggles to calibrate its confidence levels—often displaying undue certainty for potentially uncertain findings.

Minor formatting errors in reports and citations, as well as delays in initiating tasks, could also frustrate initial users. OpenAI says these issues are expected to improve over time with more usage and iterative refinements.

OpenAI is rolling out the capability gradually, starting with Pro users, who will have access to up to 100 queries per month. Plus and Team tiers will follow suit, with Enterprise access arriving next. 

UK, Swiss, and European Economic Area residents are not yet able to access the feature, but OpenAI says it’s working on expanding its rollout to these regions.

In the weeks ahead, OpenAI will expand the feature to ChatGPT’s mobile and desktop platforms. The long-term vision includes enabling connections to subscription-based or proprietary data sources, further enhancing the robustness and personalisation of its outputs.

Looking further ahead, OpenAI envisions integrating Deep Research with “Operator,” an existing chatbot capability that takes real-world actions. This integration would allow ChatGPT to seamlessly handle tasks that require both asynchronous online research and real-world execution.

(Photo by John Schnobrich)
