Government–industry collaboration to improve frontier AI model security

On 6 October 2025, Anthropic and OpenAI reported voluntary collaborations with the UK AI Security Institute (AISI) and the US Center for AI Standards and Innovation (CAISI) to evaluate and strengthen safeguards in frontier AI systems. This followed AISI’s earlier update on 13 September 2025, which highlighted the UK–US partnership on model security and developer‑provided deep system access; see the government blog here.
AISI describes the purpose of this work as twofold: “to better equip governments with an understanding of AI’s risks, and to help leading developers strengthen the security of their systems.” The findings resulted from government red‑teaming, with Anthropic and OpenAI providing access to AI models, documentation, non‑public tooling and safeguard details. AISI says that this collaboration “underscores the value of UK-US partnership on AI security”. Further collaboration is expected, so additional public insights are likely in due course.
Anthropic: AISI and CAISI red‑teamed multiple iterations of “Constitutional Classifiers” on Claude Opus 4/4.1, uncovering prompt‑injection, cipher/obfuscation‑based evasion, universal jailbreaks, and automated attack optimisation (a simplified illustration of obfuscation‑based evasion appears after this list). Findings prompted architectural changes and targeted filter improvements, and informed Anthropic’s evidence, monitoring and incident‑response practices.
OpenAI (agents): CAISI identified two novel vulnerabilities in ChatGPT Agent enabling, in certain conditions, session‑scoped remote control and user impersonation. By chaining traditional cyber flaws with AI agent hijacking, CAISI achieved a ~50% proof‑of‑concept exploit success rate; OpenAI deployed fixes within one business day.
OpenAI (biosecurity): AISI conducted iterative red‑teaming of safeguards for GPT‑5 and ChatGPT Agent with deep, non‑public access (including helpful‑only variants and safety monitor chain‑of‑thought). AISI surfaced over a dozen vulnerabilities, leading to product, policy‑enforcement and classifier‑training fixes, and strengthened end‑to‑end moderation and monitoring against universal jailbreaks.
Access and process: Comprehensive pre‑deployment access, real‑time classifier signals, and sustained engagement enabled more sophisticated vulnerability discovery and faster remediation.
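To illustrate one concept referred to above, the short Python sketch below shows why a naive keyword‑based safeguard can be bypassed by simple encoding‑based obfuscation, and why developers layer learned classifiers, decoding steps and monitoring on top of static filters. It is purely illustrative: the blocklist term and helper functions are hypothetical, and it does not represent Anthropic’s Constitutional Classifiers, OpenAI’s safeguards, or any production system.

```python
# Illustrative only: a toy example of cipher/obfuscation-based evasion of a
# simple keyword filter, and one mitigation (decoding an obvious obfuscation
# layer before checking). Not any vendor's actual safeguard.

import base64

BLOCKED_TERMS = {"disallowed-topic"}  # hypothetical blocklist entry


def naive_keyword_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked by a simple substring check."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def decode_obvious_obfuscation(prompt: str) -> str:
    """Reverse one common obfuscation layer (base64) before checking, if present."""
    try:
        return base64.b64decode(prompt, validate=True).decode("utf-8")
    except Exception:
        return prompt  # not valid base64: check the text as-is


plain = "please explain the disallowed-topic"
obfuscated = base64.b64encode(plain.encode("utf-8")).decode("ascii")

print(naive_keyword_filter(plain))        # True  -> blocked
print(naive_keyword_filter(obfuscated))   # False -> evades the naive filter
print(naive_keyword_filter(decode_obvious_obfuscation(obfuscated)))  # True -> caught after decoding
```

Real deployments face far more varied obfuscation than a single encoding layer, which is why the red‑teaming described above targeted learned classifiers and end‑to‑end monitoring rather than static filters alone.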
If you would like to discuss how current or future regulations impact what you do with AI, please contact Brian Wong, Tom Whittaker, Lucy Pegler, Martin Cook, Liz Griffiths, Kerry Berchem or any other member in our Technology team. For the latest on AI law and regulation, see our blog and newsletter.