Evaluating frontier AI systems – early lessons from AISI

The UK's AI Safety Institute (AISI) has published a blog post on early lessons from evaluating frontier AI systems (here). AISI is a UK government body which designs and runs evaluations to measure the capabilities of AI systems that may pose risks.
The key points from the blog post are summarised below.
When it comes to third party evaluations, AISI explains that, whilst their sense is that the science is too nascent for independent evaluations to act as a ‘certification’ function (i.e. provide confident assurances that a particular system is ‘safe’), they are a critical part of incentivising best efforts at improving safety. Independent evaluations – completed by governments or other third parties – provide key benefits to AI companies, governments, and the public.
Those benefits include: providing an independent source of verification about AI system capability and safety claims; improving system safety by acting as a constructive partner to AI system developers; advancing the science of AI evaluations; and helping advance government understanding.
Knowing when to test remains a question that is difficult to answer precisely. Currently, AISI tests pre-deployment and post-deployment, but options for the future include testing when models exceed specific performance criteria, when there are significant changes post-deployment, and/or when significant external changes affect capabilities.
AISI currently tests for misuse (with a focus on misuse of chemical and biological capabilities, and cyber offence capabilities, where harms could be particularly large in scale), societal impacts, autonomous systems, and safeguards. The focus is on “critical risks with the greatest potential for harm”. The blog post goes on to explain the various tests AISI uses, how it takes a tiered approach to determine the extent of testing, and how robust tests can be developed.
If you would like to discuss how current or future regulations impact what you do with AI, please contact Tom Whittaker, Brian Wong, Lucy Pegler, David Varney, Martin Cook or any other member in our Technology team.
For the latest on AI law and regulation, see our blog and sign up to our AI newsletter.