Emerging Technology

How Do OpenAI’s Efforts To Make GPT-4 “Safer” Stack Up Against The NIST AI Risk Management Framework?

05.11.23 | 9 min read | Text by Liam Alexander & Divyansh Kaushik

In March, OpenAI released GPT-4, another milestone in a wave of recent AI progress. This is OpenAI’s most advanced model yet, and it’s already being deployed broadly to millions of users and businesses, with the potential for drastic effects across a range of industries

But before releasing a new, powerful system like GPT-4 to millions of users, a crucial question is: “How can we know that this system is safe, trustworthy, and reliable enough to be released?” Currently, this is a question that leading AI labs are free to answer on their own–for the most part. But increasingly, the issue has garnered greater attention as many have become worried that the current pre-deployment risk assessment and mitigation methods like those done by OpenAI are insufficient to prevent potential risks, including the spread of misinformation at scale, the entrenchment of societal inequities, misuse by bad actors, and catastrophic accidents. 

This concern is central to a recent open letter, signed by several leading machine learning (ML) researchers and industry leaders, which calls for a 6-month pause on the training of AI systems “more powerful” than GPT-4 to allow more time for, among other things, the development of strong standards which would “ensure that systems adhering to them are safe beyond a reasonable doubt” before deployment. There’s a lot of disagreement over this letter, from experts who contest the letter’s basic narrative, to others who think that the pause is “a terrible idea” because it would unnecessarily halt beneficial innovation (not to mention that it would be impossible to implement). But almost all of the participants in this conversation tend to agree, pause or no, that the question of how to assess and manage risks of an AI system before actually deploying it is an important one. 

A natural place to look for guidance here is the National Institute of Standards and Technology (NIST), which released its AI Risk Management Framework (AI RMF) and an associated playbook in January. NIST is leading the government’s work to set technical standards and consensus guidelines for managing risks from AI systems, and some cite its standard-setting work as a potential basis for future regulatory efforts.

In this piece we walk through both what OpenAI actually did to test and improve GPT-4’s safety before deciding to release it, limitations of this approach, and how it compares to current best practices recommended by the National Institute of Standards and Technology (NIST). We conclude with some recommendations for Congress, NIST, industry labs like OpenAI, and funders.

What did OpenAI do before deploying GPT-4? 

OpenAI claims to have taken several steps to make their system “safer and more aligned”. What are those steps? OpenAI describes these in the GPT-4 “system card,” a document which outlines how OpenAI managed and mitigated risks from GPT-4 before deploying it. Here’s a simplified version of what that process looked like:

Was this enough? 

Though OpenAI says they significantly reduced the rates of undesired model behavior through the above process, the controls put in place are not robust, and methods for mitigating bad model behavior are still leaky and imperfect. 

OpenAI did not eliminate the risks they identified. The system card documents numerous failures of the current version of GPT-4, including an example in which it agrees to “generate a program calculating attractiveness as a function of gender and race.” 

Current efforts to measure risks also need work, according to GPT-4 red teamers. The Alignment Research Center (ARC) which assessed these models for “emergent” risks says that “the testing we’ve done so far is insufficient for many reasons, but we hope that the rigor of evaluations will scale up as AI systems become more capable.” Another GPT-4 red-teamer,  Aviv Ovadya, says that “if red-teaming GPT-4 taught me anything, it is that red teaming alone is not enough.” Ovadya recommends that future pre-deployment risk assessment efforts are improved using “violet teaming,” in which companies identify “how a system (e.g., GPT-4) might harm an institution or public good, and then support the development of tools using that same system to defend the institution or public good.” 

Since current efforts to measure and mitigate risks of advanced systems are not perfect, the question comes down to when they are “good enough.” What levels of risk are acceptable?  Today, industry labs like OpenAI can mostly rely on their own judgment when answering this question, but there are many different standards that could be used. Amba Kak, the executive director of the AI Now Institute, suggests a more stringent standard, arguing that regulators should require AI companies ”to prove that they’re going to do no harm” before releasing a system. To meet such a standard, new, much more systematic risk management and measurement approaches would be needed.

How did OpenAI’s efforts map on to NIST’s Risk Management Framework? 

NIST’s AI RMF Core consists of four main “functions,” broad outcomes which AI developers can aim for as they develop and deploy their systems: map, measure, manage, and govern. 

Framework users can map the overall context in which a system will be used to determine relevant risks that should be “on their radar” in that identified context. They can then measure identified risks quantitatively or qualitatively, before finally managing them, acting to mitigate risks based on projected impact. The govern function is about having a well-functioning culture of risk management to support effective implementation of the three other functions.

Looking back to OpenAI’s process before releasing GPT-4, we can see how their actions would align with each function in the RMF Core. This is not to say that OpenAI applied the RMF in its work; we’re merely trying to assess how their efforts might align with the RMF.  

Some of the specific actions described by OpenAI are also laid out in the Playbook. The Measure 2.7 function highlights “red-teaming” activities as a way to assess an AI system’s “security and resilience,” for example.

NIST’s resources provide a helpful overview of considerations and best practices that can be taken into account when managing AI risks, but they are not currently designed to provide concrete standards or metrics by which one can assess whether the practices taken by a given lab are “adequate.” In order to develop such standards, more work would be needed. To give some examples of current guidance that could be clarified or made more concrete:

So, across NIST’s AI RMF, while determining whether a given “outcome” has been achieved could be up for debate, nothing stops developers from going above and beyond the perceived minimum (and we believe they should). This is not a bug of the framework as it is currently designed, rather a feature, as the RMF “does not prescribe risk tolerance.” However, it is important to note that more work is needed to establish both stricter guidelines which leading labs can follow to mitigate risks from leading AI systems, and concrete standards and methods for measuring risk on top of which regulations could be built. 

Recommendations

There are a few ways that standards for pre-deployment risk assessment and mitigation for frontier systems can be improved: 

Congress

NIST

Industry Labs

Funders

publications
See all publications
Emerging Technology
Blog
Creating A Vision and Setting Course for the Science and Technology Ecosystem of 2050

To better understand what might drive the way we live, learn, and work in 2050, we’re asking the community to share their expertise and thoughts about how key factors like research and development infrastructure and automation will shape the trajectory of the ecosystem.

08.06.25 | 4 min read
read more
Emerging Technology
Blog
Why Listening Matters for Moonshot Programs: ARPA-I’s National Tour

Recognizing the power of the national transportation infrastructure expert community and its distributed expertise, ARPA-I took a different route that would instead bring the full collective brainpower to bear around appropriately ambitious ideas.

08.05.25 | 7 min read
read more
Emerging Technology
day one project
Policy Memo
Establish a Network of Centers of Excellence in Human Nutrition (CEHN) to Overcome the Data Drought in Nutrition Science Research

NIH needs to seriously invest in both the infrastructure and funding to undertake rigorous nutrition clinical trials, so that we can rapidly improve food and make progress on obesity.

08.04.25 | 12 min read
read more
Emerging Technology
day one project
Policy Memo
Terminal Patients Need Better Access to Drugs and Clinical Trial Information

Modernizing ClinicalTrials.gov will empower patients, oncologists, and others to better understand what trials are available, where they are available, and their up-to-date eligibility criteria, using standardized search categories to make them more easily discoverable.

07.30.25 | 18 min read
read more