OpenAI has launched ChatGPT Agent, an upgrade to its flagship artificial intelligence (AI) model that equips it with a virtual computer and an integrated toolkit.
These new tools allow the agent to carry out complex, multi-step tasks that earlier iterations of ChatGPT were incapable of, controlling your computer and completing tasks for you.
This more powerful version, which is still highly dependent on human input and supervision, arrived shortly before Mark Zuckerberg announced that Meta researchers had observed their own AI models showing signs of independent self-improvement. It also launched shortly before OpenAI released GPT-5, the latest version of OpenAI's chatbot.
With ChatGPT Agent, users can now ask the large language model (LLM) not only to perform analysis or gather data, but to act on that data, OpenAI representatives said in a statement.
For instance, you could instruct the agent to review your calendar and brief you on upcoming events and reminders, or to analyze a corpus of data and summarize it in a pithy synopsis or a slide deck. While a conventional LLM could search for and provide recipes for a Japanese-style breakfast, ChatGPT Agent could fully plan and purchase ingredients for that same breakfast for a specific number of guests.
Yet the new model, while highly capable, still faces a number of limitations. Like all AI models, its spatial reasoning is weak, so it struggles with tasks like planning physical routes. It also lacks true persistent memory, processing information in the moment without reliable recall or the ability to reference earlier interactions beyond the immediate context.
ChatGPT Agent does show significant improvements in OpenAI's benchmarking, however. On Humanity's Last Exam, an AI benchmark that evaluates a model's ability to respond to expert-level questions across numerous disciplines, it more than doubled the accuracy (41.6%) of OpenAI o3 with no tools equipped (20.3%).
Related: OpenAI's 'smartest' AI model was explicitly told to shut down, and it refused
It also performed considerably better than other OpenAI tools, as well as a version of itself that lacked tools like a browser and virtual computer. On the world's hardest known math benchmark, FrontierMath, ChatGPT Agent and its complement of tools again outperformed earlier models by a wide margin.
The agent is built on three pillars derived from earlier OpenAI products. The first is 'Operator', an agent that can use its own virtual browser to navigate the web for users. The second is 'deep research', built to comb through and synthesize large amounts of data. The final piece of the puzzle is earlier versions of ChatGPT itself, which excelled in conversational fluency and presentation.
"In essence, it can autonomously browse the web, generate code, create files, and so on, all under human supervision," said Kofi Nyarko, a professor at Morgan State University and director of the Data Engineering and Predictive Analytics (DEPA) Research Lab.
Nyarko was quick to emphasize, however, that the new agent is still not autonomous. "Hallucinations, user interface fragility, or misinterpretation can lead to errors. Built-in safeguards, like permission prompts and interruptibility, are essential but not sufficient to eliminate risk entirely."
The danger of advancing AI
OpenAI has itself acknowledged the danger posed by the new agent and its increased autonomy. Company representatives stated that ChatGPT Agent has "high biological and chemical capabilities," which they claim could potentially allow it to assist in the creation of chemical or biological weapons.
Compared to existing resources, like a chemistry lab and a textbook, an AI agent represents what biosecurity experts call a "capability escalation pathway." AI can draw on numerous sources and synthesize the data in them instantly, merge knowledge across scientific disciplines, provide iterative troubleshooting like an expert mentor, navigate supplier websites, fill out order forms, and even help bypass basic verification checks.
With its virtual computer, the agent can also autonomously interact with files, websites, and online tools in ways that give it far more potential to do harm if misused. The risk of data breaches or data manipulation, as well as of misaligned behavior like financial fraud, is amplified in the event of a prompt injection attack or hijacking.
As Nyarko pointed out, these risks come on top of those inherent in conventional AI models and LLMs.
"There are broader concerns for AI agents as a whole, like how agents operating autonomously can amplify errors, introduce biases from public data, complicate liability frameworks, and unintentionally foster psychological dependence," he said.
In response to the new threats that a more agential model poses, OpenAI engineers have also strengthened a number of safeguards, company representatives said in the statement.
These include threat modeling; dual-use refusal training, in which a model is taught to refuse harmful requests involving information that could have either beneficial or malicious uses; bug bounty programs; and expert red-teaming (probing for weaknesses by attacking one's own system) focused on biodefense. However, a risk management assessment conducted in July 2025 by SaferAI, a safety-focused nonprofit, called OpenAI's risk management policies "Weak," awarding them a score of 33% out of a possible 100%. OpenAI also scored only a C grade on the AI Safety Index compiled by the Future of Life Institute, a leading AI safety organization.