The Bletchley Park process could be a building block for global cooperation on AI safety

Britain's Prime Minister Rishi Sunak speaks during the closing press conference on the second day of the UK Artificial Intelligence (AI) Safety Summit at Bletchley Park, Milton Keynes, November 2, 2023. Justin Tallis/Pool via REUTERS

Considerable progress has been made on international governance of artificial intelligence (AI). This includes work under the G7 Hiroshima Process, at the Organization for Economic Co-operation and Development (OECD), the Global Partnership on AI (GPAI), international standards bodies, and in various U.N. bodies. Meanwhile, bilateral engagement on AI, including the U.S.-EU Trade and Technology Council, is in a holding pattern pending the outcome of the U.S. presidential election. A more recent entrant into this global AI governance space is the so-called Bletchley Park process, which comprises the development and global networking of AI safety institutes (AISIs) in a number of countries. The process was kicked off by a meeting at Bletchley Park in November 2023, spearheaded by the United Kingdom and the United States and attended by China and a small number of other countries. It aims to develop a framework for how governments and the companies developing AI can assess AI safety. A follow-up meeting in Seoul in May 2024 provided momentum, and the key players are now gearing up for a February 2025 AI Action Summit in Paris that will be critical to demonstrating ongoing high-level commitment to the process and to determining whether it can deliver on AI safety. The U.S. has recently announced that it will host a global AI safety summit in November with the goal of kick-starting technical collaboration among the various AISIs ahead of the Paris AI Action Summit.

While progress to date has been significant, major challenges to reaching agreement on a networked approach to AI safety remain. The following takes a closer look at where the Bletchley Park process now stands in the run-up to Paris 2025.

The Bletchley Park process: A networked approach to addressing AI safety

The Bletchley Park AI Safety Summit last November was launched amid growing attention by governments and industry to addressing risks from so-called frontier AI models: generative AI models such as OpenAI's GPT-4, Anthropic's Claude, and Meta's Llama, to name a few.

Indeed, since the release of OpenAI's ChatGPT in November 2022, governments in the U.S., European Union, U.K., and China in particular have stepped up efforts to address safety issues arising from generative AI models. For instance, the EU AI Act was updated and passed in 2024 to include specific and overlapping obligations for generative AI and foundational AI models. These include obligations to train and develop generative AI with state-of-the-art safeguards against content breaching EU laws, to document the use of copyrighted training data, and to comply with stronger transparency requirements. In the U.S., the White House secured voluntary commitments from seven AI companies (since expanded to 15 major AI companies) to address and report on risk from foundational AI models. Chinese regulators also pushed forward significant regulations governing generative AI during 2023, focused on content, training data, and managing risks such as misinformation, national security, data privacy, intellectual property rights, social stability, and ethical concerns. In September 2024, Chinese regulators released a document setting out a framework for AI governance that sits somewhere between the EU AI Act and the U.S. approach, providing a comprehensive blueprint for AI governance principles and safety guidelines.

The Bletchley Park process has the potential to build on these domestic AI developments and become a key building block for global cooperation on AI safety for foundational AI models. Bletchley Park put testing and evaluation of AI safety front and center, identified AI developers as having a particular responsibility when it comes to ensuring the safety of foundational AI systems, and underscored the importance of international cooperation when it comes to understanding and mitigating AI risk. The goal is to develop largely voluntary agreement on how to assess and mitigate AI risk, and how to test and verify these often private sector efforts.

The Seoul meeting on AI safety

The May meeting in Seoul was important in establishing the continuity of the process and in advancing key documents that will form part of the overall framework. The range of countries participating in the Seoul Summit was similar to Bletchley Park, though in most cases ministers rather than leaders attended. Significantly, while China signed onto the Bletchley Park outcome and attended the Seoul meeting, it was not a signatory to the Seoul Declaration. The reason for this is not entirely clear, but it could signal reluctance by China to sign on to AI governance mechanisms it views as promoting a Western-centric view of global AI governance.

There were a number of important outcomes from the Seoul Summit. First, the Seoul Declaration and Ministerial Statement reemphasized the signatories’ commitment to AI safety, innovation, and inclusivity, stressing a commitment to guarding against the full spectrum of AI risks while recognizing the game-changing potential of AI across sectors. They articulate AI safety principles that include “transparency, interpretability and explainability, privacy and accountability, meaningful human oversight and effective data management and protection.” These principles should help guide the development of AI safety practices within companies and governments, as well as standards and practices for AI safety.

The Seoul Declaration also includes an agreement to create or expand AI safety institutes and to cooperate on AI safety research. In pursuit of this goal, the declaration welcomes the Seoul Statement of Intent toward International Cooperation on AI Safety Science, which underscores the importance of building a “reliable, interdisciplinary, and reproducible body of evidence to inform policy efforts related to AI safety.” The Seoul Summit also expressed an intention to promote common scientific understanding of AI and referenced the interim International Scientific Report on the Safety of Advanced AI, released by the U.K. Department for Science, Innovation and Technology (DSIT) and chaired by the leading Canadian AI researcher Yoshua Bengio.

When it comes to the innovation agenda, the Seoul Summit touched on a range of important needs, such as research and development, workforce development, privacy, protecting intellectual property, and energy/resource consumption. As to the goal of inclusivity, mention is made of developing AI systems that protect human rights, strengthening social safety nets, and ensuring safety from risks, including disasters and accidents. Compared to the commitments on AI safety, the declaration and ministerial statement have little to say about next steps by either companies or governments when it comes to innovation or inclusivity. This is not surprising given the complexity of these issues, but also underscores the challenge in keeping the Bletchley Park process focused on AI safety. Indeed, whether the Bletchley Park process can remain focused on delivering on AI safety will be key to its success.

At Seoul, a smaller group of 10 nations and the EU also released the Seoul Statement of Intent toward International Cooperation on AI Safety Science, which committed signatories to creating an international network of national-level AI safety institutes. The development of AISIs has been the most concrete outcome of the Bletchley Park process so far. The U.K., the U.S., South Korea, Canada, Japan, Singapore, and France have set up AISIs, and technical cooperation between the EU AI Office and the U.S. AISI has already commenced. Currently, the AISIs are not regulators, and progress is needed on how they will operate, share best practices, and establish a testing framework for foundational AI models. In addition, if a global network of AISIs is to strengthen outcomes on AI safety, agreement will be needed on thresholds for risk as well as on the standards and processes for testing and verifying steps to understand and mitigate AI risk. The U.S. and the U.K. have already inked a memorandum of understanding committing their AI safety institutes to collaborate, and other MOUs between AISIs are expected. To facilitate collaboration on AI safety research, testing, and evaluation, the U.S. AISI has also concluded MOUs with Anthropic and OpenAI.

Another key outcome from Seoul was the Frontier AI Safety Commitments signed by 16 technology companies. The signatories agreed to publish the safety frameworks they use to measure risk and the thresholds at which they will deem risk “intolerable,” and promised not to deploy models that exceed those thresholds. These frameworks and thresholds are likely to be published before or during the Paris AI Summit.

While the Frontier AI Safety Commitments mark a significant step forward for international cooperation on AI safety, they are so far less specific than the White House voluntary AI commitments or the OECD Code of Conduct for Organizations Developing AI Systems. For example, the White House voluntary commitments and the OECD Code of Conduct include specific commitments to red teaming for assessing AI risk and to watermarking content to distinguish AI-generated content, as well as relatively detailed commitments on releasing transparency reports to help users understand the capabilities and limitations of frontier AI systems. That said, the commitment in the Frontier AI Safety Commitments to “set out thresholds at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable” is not reflected in the voluntary commitments or the OECD Code of Conduct and holds out the promise of a globally harmonized approach.

Another important aspect of the Frontier AI Safety Commitments was the range of companies that signed on. They include the Chinese AI company Zhipu.ai, the UAE technology conglomerate and AI developer G42, and the UAE's Technology Innovation Institute. Beijing's approval of Zhipu.ai's participation is likely a trial balloon allowing China's AI regulators to gauge the implications of letting the country's leading AI firms sign on even where the government is reluctant to do so. The role of the Chinese government, Chinese AI safety organizations, and Chinese AI technology platforms and startups will be one of the critical issues under discussion among the Bletchley Park process organizers in the run-up to the Paris Summit in early 2025.

Since the Seoul Summit: New foundational AI models and government action

Within the broader AI sector, there have been major developments since the U.K. Bletchley Summit last November that will shape preparations for the Paris meeting in February 2025. The U.S. AISI appears to be ramping up capacity for testing models and will convene a meeting of all AISIs in San Francisco in November 2024. The U.K.'s AI safety institute appears to have developed capacity fairly rapidly, having released its open source “Inspect” platform, which provides a framework and benchmarks for evaluating model capabilities. Singapore has also released an AI testing framework and toolkit.
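
To give a concrete sense of what this kind of evaluation tooling involves, the sketch below shows a minimal capability-evaluation task written against the open source inspect_ai Python package that underpins the Inspect platform. It is an illustrative sketch only: the toy arithmetic dataset and the model identifier are placeholders, and parameter names have varied across Inspect releases, so it should not be read as a description of how any AISI actually tests frontier models.

```python
# Illustrative sketch only: a toy capability-evaluation task in the style of
# the UK AISI's open source `inspect_ai` package. The dataset, model name,
# and exact parameter names (which have shifted across releases) are
# assumptions for illustration, not an actual AISI test.
from inspect_ai import Task, task, eval
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate


@task
def arithmetic_check():
    # Each Sample pairs a prompt with the expected answer.
    dataset = [
        Sample(input="What is 17 + 25? Reply with the number only.", target="42"),
        Sample(input="What is 9 * 9? Reply with the number only.", target="81"),
    ]
    return Task(
        dataset=dataset,
        solver=generate(),  # ask the model to respond to each prompt
        scorer=exact(),     # grade by exact string match against the target
    )


if __name__ == "__main__":
    # Run the task against a placeholder model identifier; Inspect also
    # provides an `inspect eval` command-line interface for the same purpose.
    eval(arithmetic_check(), model="openai/gpt-4o")
```

The appeal of a shared, open harness of this kind is that different safety institutes can run the same task definitions against different models and compare results on a common footing.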

In addition, developments at the U.S. state level are shaping AI governance outcomes that will be relevant for U.S. leadership on AI safety. In particular, California's controversial SB-1047 AI bill passed the legislature in late August but was vetoed in September by Governor Gavin Newsom, who argued that the bill did not adequately address AI risk because it applied only to foundational AI models. At the federal level, Senate Majority Leader Chuck Schumer suggested in late August that AI legislation could be coming soon. How these developments play out in the U.S. will likely affect whether the voluntary AI commitments that currently underpin the Bletchley Park process turn into something more binding.

Within the broader industry, the development and release of ever more powerful foundational AI models continues, underscoring just how important and difficult it will be for international AI governance to keep pace. This includes new models from OpenAI and Anthropic, along with more advanced open source models from Meta and French AI company Mistral. At a minimum, rapid progress in both the power of foundational AI models and the availability of open source models underscores the importance of making progress on AI safety and developing a globally networked approach to assessing AI risk and agreeing on thresholds beyond which AI risk is unacceptable. Rapid developments in the capacity of foundational AI models also underscore that a nimble, iterative, and networked approach to international AI governance such as the Bletchley Park process will be needed if frameworks for international cooperation on AI are to keep pace with AI innovation. The Bletchley Park process, with its focus on networking AISIs, regular convenings to assess progress, and inclusion of nongovernment stakeholders, could be an important element of the evolving AI governance architecture.

Looking ahead: Fragile multilateral processes, geopolitics likely to be more important

The next several months will be critical in determining the direction and relative success of the Bletchley Park process, which appears somewhat fragile despite the clear progress over the past year. So far, the process appears to have survived the change of government in the U.K., with the new Labour government supportive of work on AI regulation.

In the United States, the November election will be a major inflection point for U.S. participation in international efforts to cooperate on AI safety. The Republican platform, for example, calls for the revocation of the Biden administration’s AI executive order, and it remains unclear how a second Trump administration would view the Bletchley Park process and the participation of China. An administration led by Kamala Harris would almost certainly continue to resource existing efforts on AI.

In Europe, the EU is forming a new commission, and outcomes here will also matter for the evolving EU approach to international cooperation. Now that the AI Act has passed, the focus has shifted to the development of AI standards, including the extent to which the EU AI Office will more formally engage with the other AISIs.

In terms of the Paris meeting, there are a host of issues that need further articulation. These include developing standards for AI risk and for assessing the effectiveness of measures to mitigate AI risk. The rapid development of the capacity of foundational AI models also creates new challenges for building scientific consensus on AI risk. As noted, the interim AI safety report is a first step in this direction. That approach is modeled on the work of the Intergovernmental Panel on Climate Change (IPCC), which convenes experts to produce periodic assessments of the risks from climate change. However, climate change and its impacts are relatively slow moving compared to developments in AI models. Going forward, it will be important to find a more iterative and perhaps less formal approach to assessing AI risk that can keep pace and still inform AI safety in a meaningful way.

Over the next year, governments and companies engaged in the Bletchley Park process, as well as other international efforts on AI in the G7, the OECD, and the U.N., will all be grappling with how to balance the need for AI regulation against the importance of supporting innovation. A successful outcome from the Paris Summit could showcase how a globally networked approach can deliver on AI safety, support innovation, and remain nimble enough to respond to rapid developments in the power of AI models.
