Are GPT-5.2's new powers enough to surpass Gemini 3? Try it and see

ZDNET’s key takeaways

  • OpenAI released GPT-5.2, its latest model, on Thursday.
  • It fast-tracked the model to stay competitive with Google and Anthropic.
  • GPT-5.2 is built for professional tasks and rivals experts. 

After a week of teasing, OpenAI’s latest model, GPT-5.2, has landed — and it can apparently rival your professional skills. 

The company called GPT-5.2 “the most capable model series yet for professional knowledge work” in the announcement on Thursday. Citing its own recent study of AI use at work, the company noted that AI saves the average worker up to an hour each day; GPT-5.2 appears designed to build on that significantly. 

Also: ChatGPT saves the average worker nearly an hour each day, says OpenAI – here’s how

“We designed GPT‑5.2 to unlock even more economic value for people; it’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects,” the company wrote. 

The company reportedly fast-tracked the model following Google and Anthropic’s competitive releases of Gemini 3 and Opus 4.5, respectively, according to a report by The Information. Here’s what it can do, and how you can try it. 

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Built for work tasks  

OpenAI said GPT-5.2 “outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations.” The announcement specifically called out GDPval, an in-house benchmark the company released in September that tries to measure the economic value AI models produce. It does so by evaluating how models approach 1,320 tasks commonly linked to 44 jobs across nine industries that each contribute more than 5% to the US gross domestic product (GDP). 

GPT-5.2 Thinking scored 70.9% on GDPval, compared to GPT-5.1 Thinking’s score of 38.8% — meaning it excelled at typical knowledge work tasks like making spreadsheets and presentations. 

“GPT‑5.2 Thinking produced outputs for GDPval tasks at >11x the speed,” the company added.

Also: 3 ways AI agents will make your job unrecognizable in the next few years

Alongside GDPval, OpenAI released findings on how several of its own models, as well as Anthropic’s Claude Opus 4.1, Google’s Gemini 2.5 Pro, and xAI’s Grok 4, performed on the benchmark. Claude Opus 4.1 came in first place overall, demonstrating particular strengths in aesthetic tasks like document formatting and slide layout, while GPT-5 scored highly for accuracy — what OpenAI described as “finding domain-specific knowledge.”

OpenAI also called out GPT-5.2’s improved long-context reasoning and vision abilities. The former, it said, should help professionals maintain accuracy when using the model to analyze long reports, contracts, and other documents, while the latter makes it more skilled at accurately interpreting diagrams, images of dashboards, screenshots, and other visual data. 

“Compared to previous models, GPT‑5.2 Thinking has a stronger grasp of how elements are positioned within an image, which helps on tasks where relative layout plays a key role in solving the problem,” the company wrote. It provided an example of how the model was able to identify bounding boxes even in a low-quality image and demonstrated a stronger understanding of “spatial arrangement” than 5.1. 

Coding prowess 

The model also showed smaller improvements over GPT-5.1 Thinking across several industry-standard benchmarks, including AIME 2025, which measures math, and SWE-Bench Pro, which measures software engineering in four languages. It set a new state-of-the-art score of 55.6% on the latter. 

Also: The best free AI for coding in 2025 – only 3 make the cut now

According to OpenAI, that means the model is better at debugging production code, implementing features, and deploying fixes with less manual developer intervention. The company also touted GPT-5.2’s improved front-end capabilities, especially on “complex or unconventional UI work” and 3D elements. 

Less hallucination

OpenAI noted in the announcement that GPT-5.2 Thinking hallucinates 30% less than 5.1 Thinking, which it said should give enterprise users less reason to worry about mistakes when using the model for research and analysis. 

Some risk of hallucination is a reality of using any AI model, and users should double-check any claim a model makes, no matter how much its factuality score has improved over its predecessor.  

Safety

The company emphasized in the announcement that it trained GPT-5.2 more carefully on how to handle sensitive conversations, finding “fewer undesirable responses in both GPT‑5.2 Instant and GPT‑5.2 Thinking as compared to GPT‑5.1 and GPT‑5 Instant and Thinking models.” 

For its models overall, the company said it has made “meaningful improvements in how they respond to prompts indicating signs of suicide or self-harm, mental health distress, or emotional reliance on the model.” 

Also: Using AI for therapy? Don’t – it’s bad for your mental health, APA warns

OpenAI added that it is still in the process of launching its age prediction model, which the company said will “automatically apply content protections for users who are under 18, in order to limit access to sensitive content.”

The announcement also included a mental health evaluation table for those four aforementioned models, showing scores on a zero-to-one scale for each, though it did not specify methodology. 

How to try it

GPT-5.2 will begin rolling out to paid ChatGPT users on Thursday, following OpenAI’s usual pattern of deploying a model family in Instant, Thinking, and Pro versions for different tasks. Developers can access all three versions now in the API. 
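For developers, trying the new model should amount to little more than swapping a model string in an existing integration. The sketch below uses the OpenAI Python SDK’s Responses API; the “gpt-5.2” identifier is an assumption for illustration, since the announcement does not spell out the exact API model names for the Instant, Thinking, and Pro tiers.

```python
# Minimal sketch: calling GPT-5.2 through the OpenAI Python SDK's Responses API.
# The model name "gpt-5.2" is assumed for illustration -- check OpenAI's model
# list for the exact identifiers of the Instant, Thinking, and Pro versions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.2",  # assumed identifier
    input="Summarize the main risks in this vendor contract in five bullet points.",
)

print(response.output_text)  # the model's text reply
```

Swapping in a different model string, such as a GPT-5.1 identifier, is typically all it takes to compare the two models’ outputs on the same prompt.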

Plus, Pro, Business, and Enterprise users can use the model’s spreadsheet and presentation features by selecting the Thinking or Pro modes.

Is GPT-5.2 replacing other models? 

OpenAI assured users that it has “no current plans to deprecate GPT‑5.1, GPT‑5, or GPT‑4.1 in the API and will communicate any deprecation plans with ample advance notice for developers.” It added that the new model works well as is in Codex, but that it will release an optimized version of the model for that environment in the next few weeks. 

Also: Stop using ChatGPT for everything: The AI models I use for research, coding, and more (and which I avoid)

The disclaimer may reassure users who reacted negatively to the temporary removal of earlier models, including GPT-4o, when OpenAI released GPT-5 this past summer. 

Mystery ‘Garlic’ model 

Another report from The Information published last week revealed that OpenAI was also developing a new model, codenamed Garlic. 

It’s unclear how distinct Garlic is from GPT-5.2, but The Information referred to GPT-5.2 (as well as yet another forthcoming release, GPT-5.5) as potential versions of Garlic. Prior to 5.2’s release, OpenAI’s Chief Research Officer Mark Chen told colleagues that Garlic performed well in internal evaluations against Gemini 3 and Opus 4.5 on coding and reasoning tasks, according to the report. However, neither Gemini 3 nor Opus 4.5, both of which set industry standards last month, was mentioned in the benchmark comparisons in the GPT-5.2 performance report.

Chen added that when developing Garlic, OpenAI addressed issues with pretraining, the initial phase of training in which the model begins learning from a massive dataset. The company focused the model on broader connections before training it for more specific tasks. 

Also: Gemini vs. Copilot: I tested the AI tools on 7 everyday tasks, and it wasn’t even close

These changes in pretraining enable OpenAI to infuse a smaller model with the same amount of knowledge previously reserved for larger models, according to Chen’s remarks cited in the report. Smaller models can be beneficial for developers as they are typically cheaper and easier to deploy — something French AI lab Mistral emphasized with its own release last week. 

For the company behind it, a smaller model is also cheaper to build and deploy. Garlic is not to be confused with Shallotpeat, a model CEO Sam Altman announced to staff in October, according to a previous report, also from The Information. That model likewise aimed to fix bugs in the pretraining process. 

As for when to expect Garlic, Chen kept the details vague, saying only “as soon as possible” in the report. The developments made when creating Garlic have already allowed the company to move on to developing its next, bigger and better model, Chen said. 

A battle for users

This fierce race between Google and OpenAI can be partly attributed to the fact that both are vying for the same audience: consumers. 

As Anthropic CEO Dario Amodei noted in conversation with journalist Andrew Ross Sorkin during The New York Times’ DealBook Summit last week, Anthropic isn’t in the same race as its competitors or facing a “code red” panic, because it is focused on serving enterprises rather than consumers. The company just announced that its Claude Code agentic coding tool reached $1 billion in run-rate revenue, only six months after becoming available to the public. 
