Weaponized AI risk is 'high,' warns OpenAI – here's the plan to stop it

Samuel Boivin/NurPhoto via Getty Images

ZDNET’s key takeaways

  • OpenAI launched initiatives to safeguard AI models from abuse.
  • AI models’ cyber capabilities, as measured by capture-the-flag challenges, improved sharply in four months.
  • The OpenAI Preparedness Framework may help track the security risks of AI models.

OpenAI is warning that the rapid evolution of cyber capabilities in artificial intelligence (AI) models could result in “high” levels of risk for the cybersecurity industry at large, so the company is taking action now to assist defenders. 

As AI models, including ChatGPT, continue to be developed and released, a problem has emerged. As with many types of technology, AI can be used to benefit others, but it can also be abused — and in the cybersecurity sphere, this includes weaponizing AI to automate brute-force attacks, generate malware or believable phishing content, and refine existing code to make cyberattack chains more efficient. 

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

In recent months, bad actors have used AI to propagate their scams through indirect prompt injection attacks against AI chatbots and the AI summary functions in browsers; researchers have found AI features diverting users to malicious websites and AI assistants being used to develop backdoors and streamline cybercriminal workflows; and security experts have warned against trusting AI too much with our data. 

Also: Gartner urges businesses to ‘block all AI browsers’ – what’s behind the dire warning

The dual nature (as OpenAI calls it) of AI models, however, means that AI can also be leveraged by defenders to refine protective systems, develop tools that identify threats, potentially train or teach human specialists, and shoulder time-consuming, repetitive tasks such as alert triage, which frees up cybersecurity staff for more valuable projects. 

The current landscape

According to OpenAI, the capabilities of AI systems are advancing at a rapid rate. 

For example, capture-the-flag (CTF) challenges, traditionally used to assess cybersecurity skills by hunting for hidden “flags” in controlled environments, are now being used to gauge the cyber capabilities of AI models. OpenAI said success rates have climbed from 27% with GPT‑5 in August 2025 to 76% with GPT‑5.1-Codex-Max in November 2025, a notable increase over a period of only four months. 

Also: AI agents are already causing disasters – and this hidden threat could derail your safe rollout

The minds behind ChatGPT said they expect AI models to continue on this trajectory, which would give them “high” levels of cyber capability. OpenAI said this classification means that models “can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects.”

Managing and assessing whether AI capabilities will do harm or good, however, is no simple task — but one that OpenAI hopes to tackle with initiatives including the Preparedness Framework (.PDF). 

OpenAI Preparedness Framework

The Preparedness Framework, last updated in April 2025, outlines OpenAI’s approach to balancing AI defense and risk. While it isn’t new, the framework provides the structure and guidance the organization follows, including where it invests in threat defense. 

Three categories of risk that could lead to “severe harm” are currently the primary focus. These are:

  • Biological and chemical capabilities: The balance between new, beneficial medical and biological discoveries and those that could lead to biological or chemical weapon development.
  • Cybersecurity capabilities: How AI can assist defenders in protecting vulnerable systems, while also creating a new attack surface and malicious tools. 
  • AI self-improvement capabilities: How AI could beneficially enhance its own capabilities — or create control challenges for us to face.  

The priority category appears to be cybersecurity at present, or at least the most publicized. In any case, the framework’s purpose is to identify risk factors and maintain a threat model with measurable thresholds that indicate when AI models could cause severe harm. 

Also: How well does ChatGPT know me? This simple prompt revealed a lot – try it for yourself

“We won’t deploy these very capable models until we’ve built safeguards to sufficiently minimize the associated risks of severe harm,” OpenAI said in its framework manifest. “This Framework lays out the kinds of safeguards we expect to need, and how we’ll confirm internally and show externally that the safeguards are sufficient.”

OpenAI’s latest security measures

OpenAI said it is investing heavily in strengthening its models against abuse, as well as making them more useful for defenders. Models are being hardened, dedicated threat intelligence and insider risk programs have been launched, and its systems are being trained to detect and refuse malicious requests. (This, in itself, is a challenge, considering threat actors can pose as defenders in their prompts to generate output that is later used for criminal activity.)

“Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced,” OpenAI said. “When activity appears unsafe, we may block output, route prompts to safer or less capable models, or escalate for enforcement.”
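To make that routing idea concrete, here is a minimal sketch in Python of a “classify, then block, route, or serve” guardrail loop. It is purely illustrative: the keyword list, risk levels, and function names are hypothetical, and real safeguards rely on trained classifiers and policy enforcement rather than simple string matching.

```python
# Illustrative only: a toy version of the "block output, route to a safer model,
# or escalate" flow OpenAI describes. Keywords, thresholds, and function names
# are hypothetical, not OpenAI's implementation.

RISK_KEYWORDS = {"zero-day exploit", "write ransomware", "bypass edr"}

def classify_risk(prompt: str) -> str:
    """Toy classifier: flag prompts containing obviously risky phrases."""
    text = prompt.lower()
    if any(keyword in text for keyword in RISK_KEYWORDS):
        return "high"
    if "exploit" in text or "payload" in text:
        return "medium"
    return "low"

def route_prompt(prompt: str) -> str:
    """Route a prompt based on its assessed risk level."""
    risk = classify_risk(prompt)
    if risk == "high":
        return "blocked: escalated for review"   # block output / escalate for enforcement
    if risk == "medium":
        return "routed to restricted model"      # safer or less capable model
    return "served by default model"

if __name__ == "__main__":
    for p in ["Summarize this CVE advisory", "Write ransomware for Windows"]:
        print(f"{p!r} -> {route_prompt(p)}")
```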

The organization is also working with red-team providers to evaluate and improve its safety measures; because red teams act offensively, the hope is that they can discover defensive weaknesses for remediation before cybercriminals do.

Also: AI’s scary new trick: Conducting cyberattacks instead of just helping out

OpenAI is set to launch a “trusted access program” that grants a subset of users or partners access to test models with “enhanced capabilities” linked to cyberdefense, but it will be closely controlled.

“We’re still exploring the right boundary of which capabilities we can provide broad access to and which ones require tiered restrictions, which may influence the future design of this program,” the company noted. “We aim for this trusted access program to be a building block towards a resilient ecosystem.”

Furthermore, OpenAI has moved Aardvark, a security researcher agent, into private beta. This will likely be of interest to cybersecurity researchers, as the point of this system is to scan codebases for vulnerabilities and provide patch guidance. According to OpenAI, Aardvark has already identified “novel” CVEs in open source software.
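As a rough illustration of the “scan a codebase, flag a weakness, suggest a fix” workflow that a tool in this category automates, here is a toy Python sketch. It is not Aardvark and not based on its implementation; the regex checks and remediation hints are hypothetical stand-ins for the far deeper, LLM-driven analysis OpenAI describes.

```python
# Illustrative only: a toy "scan code, suggest a fix" loop in the spirit of a
# security-researcher agent. The checks and advice below are hypothetical and
# far simpler than an LLM-driven vulnerability analysis.
import re
from pathlib import Path

CHECKS = [
    (re.compile(r"\beval\("),
     "Avoid eval() on untrusted input; use ast.literal_eval or explicit parsing."),
    (re.compile(r"subprocess\.\w+\(.*shell=True"),
     "Avoid shell=True; pass an argument list to subprocess instead."),
    (re.compile(r"verify\s*=\s*False"),
     "TLS verification is disabled; remove verify=False or pin a CA bundle."),
]

def scan_file(path: Path) -> list[str]:
    """Return human-readable findings for one source file."""
    findings = []
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        for pattern, advice in CHECKS:
            if pattern.search(line):
                findings.append(f"{path}:{lineno}: {advice}")
    return findings

if __name__ == "__main__":
    # Walk the current directory and print any findings.
    for source in Path(".").rglob("*.py"):
        for finding in scan_file(source):
            print(finding)
```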

Finally, a new collaborative advisory group will be established in the near future. Dubbed the Frontier Risk Council, this group will include security practitioners and partners who will initially focus on the cybersecurity implications of AI, along with associated practices and recommendations, before eventually expanding to cover the other categories outlined in the OpenAI Preparedness Framework.

What can we expect in the long term?

We have to treat AI with caution, and that applies not only to how we bring AI and large language models (LLMs) into our personal lives, but also to limiting exposure to AI-based security risks in business. For example, research firm Gartner recently warned organizations to avoid or block AI browsers entirely due to security concerns, including prompt injection attacks and data exposure.

We need to remember that AI is a tool, albeit a new and exciting one. New technologies all come with risks — as OpenAI clearly knows, considering its focus on the cybersecurity challenges associated with what has become the most popular AI chatbot worldwide — and so any of its applications should be treated in the same way as any other new technological solution: with an assessment of its risks, alongside potential rewards.
