How the AI Safety Institute is approaching evaluations

The AI Safety Institute (AISI) was established as part of the Department for Science, Innovation and Technology (DSIT), positioned as the first state-backed organisation focused on advanced AI safety for the public benefit. The AISI has three core functions: developing and conducting evaluations on advanced AI systems, driving foundational AI safety research, and facilitating information exchange.

The AISI has clarified its approach to evaluations in a recently published document.

Understanding the AISI evaluations

The AISI is beginning to put ethics principles into practice. Its first milestone is to build an evaluation process for assessing the capabilities of the next generation of advanced AI systems, in what remains a nascent and fast-developing field of science.

These evaluations will assess the capabilities of systems using a range of techniques, including: red teaming, in which experts interact with a model and test its capabilities by trying to break its safeguards; human uplift evaluations, which assess how bad actors could use systems to carry out real-world harms; and AI agent evaluations, which test AI agents' ability to operate semi-autonomously and use tools such as external databases to take actions in the world.

Prior to the AI Safety Summit, the government published a paper on the key risk areas of frontier AI: misuse, societal impacts, autonomous systems and safeguards. These are the focus areas for pre-deployment testing, although the AISI continues to survey and scope other risks.

The AISI is an independent evaluator. To prevent manipulation, the details of its methodology will be kept confidential; however, it will publish select portions of its evaluation results, with restrictions on proprietary, sensitive or national security-related information.

Importantly, the AISI does not intend these evaluations to act as stamps certifying a system as 'safe' or 'unsafe', but rather as early warning signs of potential harm, describing its work as a 'supplementary layer of oversight.' Ultimately, the AISI is not a regulator, and the decision to release a system will remain with the parties developing it.

Beyond evaluations, the AISI is focused on furthering technical advances in AI safety, and is therefore launching foundational research efforts across areas such as capabilities elicitation, jailbreaking, explainability and novel approaches to AI alignment.

AISI Criteria for Selecting Models to Evaluate

Models will be selected for evaluation based on the estimated risk of a system's harmful capabilities in relation to national security and societal impacts, including how accessible the system is. Access controls will not exempt companies from evaluation, as the AISI will evaluate both systems that are openly released and those that are not. During the AI Safety Summit, several AI companies committed to government evaluations of their advanced models.

Next Steps

As the AISI continues to research the transformative potential that responsible adoption of advanced AI systems holds for the UK's economic growth and public services, it is encouraging to see its efforts to address the associated risks through evaluations. While the AISI's methodology and full evaluation results will not be publicly disclosed, to reduce the risk of manipulation, periodic updates like this one are crucial for highlighting its activities.

The Institute's progress in developing and deploying evaluations gives companies insight into which risks it is prioritising. Incorporating these insights into development processes could strengthen safety measures and promote responsible AI adoption. For further reading, we have produced insights on the Institute's ambitions as well as its first, second and third progress reports.

techUK will continue to monitor AISI’s evaluation work to keep members informed of the latest developments in this field. To find out more and get involved in techUK’s programme work on AI Safety, please contact [email protected].

The original release can be referenced here.

Tess Buckley

Programme Manager - Digital Ethics and AI Safety, techUK

Authors

Tess is the Programme Manager for Digital Ethics and AI Safety at techUK.

Prior to techUK, Tess worked as an AI Ethics Analyst, a role that initially revolved around the first dataset on Corporate Digital Responsibility (CDR) and later around the development of a large language model focused on answering ESG questions for Chief Sustainability Officers. Alongside other responsibilities, she distributed the CDR dataset to investors who wanted to better understand the digital risks in their portfolios, drew narratives and patterns from the data, and collaborated with leading institutes to support academics in AI ethics. She has authored articles for outlets such as ESG Investor, the Montreal AI Ethics Institute, The FinTech Times and Finance Digest, covering topics such as CDR, AI ethics and tech governance and drawing on company insights to contribute industry perspectives. Tess is Vice Chair of the YNG Technology Group at YPO, an AI Literacy Advisor at Humans for AI, a Trustworthy AI Researcher at Z-Inspection Trustworthy AI Labs and an Ambassador for AboutFace.

Tess holds an MA in Philosophy and AI from Northeastern University London, where she specialised in biotechnologies and ableism, following a BA from McGill University, where she joint-majored in International Development and Philosophy with a minor in communications. Tess's primary research interests include AI literacy, AI music systems, the impact of AI on disability rights and the portrayal of AI in media (narratives). In particular, Tess seeks to operationalise AI ethics and use philosophical principles to make emerging technologies explainable and ethical.

Outside of work Tess enjoys kickboxing, ballet, crochet and jazz music.