Critical Thinking on Critical Minerals
How the U.S. Government Can Support the Development of Domestic Production Capacity for the Battery Supply Chain
Access to critical minerals supply chains will be crucial to the clean energy transition in the United States. Batteries for electric vehicles, in particular, will require the U.S. to consume an order of magnitude more lithium, nickel, cobalt, and graphite than it does today. These materials are currently sourced from around the world. Mining of critical minerals is concentrated in just a few countries for each material, though it is becoming increasingly geographically diverse as global demand incentivizes new exploration and development. Processing of critical minerals, however, is heavily concentrated in a single country—China—raising the risk of supply chain disruption.
To address this, the U.S. government has signaled its desire to onshore and diversify critical minerals supply chains through key legislation, such as the Bipartisan Infrastructure Law and the Inflation Reduction Act, and through trade policies. The development of new mining and processing projects entails significant costs, however, and project financiers require developers to demonstrate that projects will generate profit by securing long-term offtake agreements with buyers. Two factors make this difficult: critical minerals markets are volatile, and, without subsidies or trade protections, domestically produced critical minerals struggle to compete against low-priced imports, leaving producers and potential buyers unable to negotiate a mutually agreeable price (or price floor). As a result, the domestic critical minerals supply may not expand fast enough to catch up to the growing consumption of critical minerals.
To accelerate project financing and development, the Department of Energy (DOE) should help generate demand certainty through backstopping the offtake of processed, battery-grade critical minerals at a minimum price floor. Ideally, this would be accomplished by paying producers the difference between the market price and the price floor, allowing them to sign offtake agreements and sell their products at a competitive market price. Offtake agreements, in turn, allow developers to secure project financing and proceed at full speed with development.
While demand-side support can help address the challenges faced by individual developers, market-wide issues with price volatility and transparency require additional solutions. Currently, the pricing mechanisms available for battery-grade critical minerals are limited to either third-party price assessments with opaque sources or the market exchange traded price of imperfect proxies. Concerns have been raised about the reliability of these existing mechanisms, hindering market participation and complicating discussions on pricing.
As the North American critical minerals industry and market develop, DOE should support the parallel development of more transparent, North American-based pricing mechanisms to improve price discovery and reduce uncertainty. In the short and medium term, this could be accomplished through government-backed auctions, which could be combined with offtake backstop agreements. Auctions are effective mechanisms for price discovery, and data from them can help improve market price assessments. In the long term, DOE could support the creation of new market exchanges for trading critical minerals in North America. Exchange trading enables greater price transparency and provides opportunities for hedging against price volatility.
Through this two-pronged approach, DOE would simultaneously accelerate the development of the domestic critical minerals supply chain through addressing short-term market needs, while building a more transparent and reliable marketplace for the future.
Introduction
The global transportation system is undergoing a transition to electric vehicles (EVs) that will fundamentally transform not only transportation itself, but also domestic manufacturing and supply chains. Demand for lithium-ion batteries, the most important and expensive component of EVs, is expected to grow 600% by 2030 compared to 2023, and the U.S. currently imports the majority of its lithium batteries. To ensure a stable and successful transition to EVs, the U.S. needs to reduce its import dependence and build out its domestic supply chain for critical minerals and battery manufacturing.
Crucial to that will be securing access to battery-grade critical minerals. Lithium, nickel, cobalt, and graphite are the primary critical minerals used in EV batteries. All four were included in the 2023 Department of Energy (DOE) Critical Minerals List. Cobalt and graphite are considered at risk of shortage in the short-term (2020-2025), while all four materials are at risk in the medium-term (2025-2030).
As shown in Figure 1, the domestic supply chain for batteries and critical minerals consists primarily of downstream buyers like automakers and battery assemblers, though there are a growing number of battery cell manufacturers thanks to domestic sourcing requirements in the Inflation Reduction Act (IRA) incentives. The U.S. has major gaps in upstream and midstream activities—mining of critical minerals, refining/processing, and the production of active materials and battery components. These industries are concentrated globally in a small number of countries, presenting supply chain risks. By developing new domestic industries within these gaps, the federal government can help build out new, resilient clean energy supply chains.
This report is organized into three main sections. The first section provides an overview of current global supply chains and the process of converting different raw materials into battery-grade critical minerals. The second section delves into the pricing and offtake challenges that projects face and proposes demand-side support solutions to provide the price and volume certainty necessary to obtain project financing. The final section takes a look at existing pricing mechanisms and proposes two approaches that the government can take to facilitate price discovery and transparency, with an eye towards mitigating market volatility in the long term. Given DOE’s central role in supporting the development of domestic clean energy industries, the policies proposed in this report were designed with DOE in mind as the main implementer.

Figure 1 (adapted from Li-BRIDGE). Segments highlighted in light blue indicate gaps in U.S. supply chains. See the original graphic from Li-BRIDGE for more information.
Section 1. Understanding Critical Minerals Supply Chains
Global Critical Minerals Sources
Globally, 65% or more of processed lithium, cobalt, and graphite originates from a single country: China (Figure 2). This concentration is particularly acute for graphite, 91% of which was processed in China in 2023. This market concentration has made downstream buyers in the U.S. overly dependent on sourcing from a single country. The concentration of supply chains in any one country makes them vulnerable to disruptions within that country—whether natural disasters, pandemics, geopolitical conflict, or macroeconomic changes. Moreover, lithium, nickel, cobalt, and graphite are all expected to experience shortages over the next decade, and in the event of future shortages, concentration in any one country puts U.S. access to critical minerals at risk. Rocky foreign relations and competition between the U.S. and China over the past few years have put further strain on this dependence. In October 2023, in response to the U.S.'s export restrictions on semiconductor chips to China and other "foreign entities of concern" (FEOC), China announced new export controls on graphite, though it has not yet restricted supply.
Expanding domestic processing of critical minerals and manufacturing of battery components can help reduce dependence on Chinese sources and ensure access to critical minerals in future shortages. However, these efforts will hurt Chinese businesses, so the U.S. will also need to anticipate additional protectionist measures from China.
On the other hand, mining of critical minerals—with the exception of graphite and rare earth elements—occurs primarily outside of China. These operations are also concentrated in a small handful of countries, shown in Figure 3. Consequently, geopolitical disruptions affecting any of those primary countries can significantly affect the price and supply of the material globally. For example, Russia is the third largest producer of nickel. In the aftermath of Russia’s invasion of Ukraine at the beginning of 2022, expectations of shortages triggered a historic short squeeze of nickel on the London Metal Exchange (LME), the primary global trading platform, significantly disrupting the global market.
To address global supply chain concentration, new incentives and grant programs were passed in the IRA and the Bipartisan Infrastructure Law. These include the 30D clean vehicle tax credit, the 45X advanced manufacturing production credit, and the Battery Materials Processing Grants Program (see Domestic Price Premium section for further discussion). Thanks to these policies, there are now on the order of a hundred North American projects in mining, processing, and active1 material manufacturing in development. The success of these and future projects will help create new domestic sources of critical minerals and batteries to feed the EV transition in the U.S. However, success is not guaranteed. A number of challenges to investment in the critical minerals supply chain will need to be addressed first.
Battery Materials Supply Chain
Critical minerals are used to make battery electrodes. These electrodes require specific forms of critical minerals for their production processes: typically lithium hydroxide or carbonate, nickel sulfate, cobalt sulfate, and a blend of coated spherical graphite and synthetic graphite.2

Lithium hydroxide/carbonate typically comes from two sources: spodumene, a hard-rock ore mined primarily in Australia, and lithium brine, found primarily in South America (Figure 3). Traditionally, lithium brine must be evaporated in large open-air pools before the lithium can be extracted, but new technologies are emerging for direct lithium extraction that significantly reduce the need for evaporation. Whereas spodumene mining and refining are typically conducted by separate entities, lithium brine operations are typically fully integrated. A third source of lithium that has yet to be put into commercial production is lithium clay. The U.S. is leading the development of projects to extract and refine lithium from clay deposits.

Nickel sulfate can be made either from nickel metal, historically the preferred feedstock, or directly from nickel intermediate products, such as mixed hydroxide precipitate and nickel matte, the feedstocks that most Chinese producers have switched to in the past few years (Figure 4). Though demand from batteries is driving much of the nickel project development in the U.S., nickel metal has a much larger market than nickel sulfate, so developers are designing their projects with the flexibility to produce either product.

Cobalt is primarily produced in the Democratic Republic of the Congo from cobalt-copper ore. Cobalt can also be found in lesser amounts in nickel and other metallic ores. Cobalt concentrate is extracted from cobalt-bearing ore and then processed into cobalt hydroxide. At this point, the cobalt hydroxide can be further processed into either cobalt sulfate for batteries or cobalt metal and other chemicals for other purposes.

Battery cathodes come in a variety of chemistries: lithium nickel manganese cobalt (NMC) is the most common in lithium-ion batteries thanks to its higher energy density, while lithium iron phosphate is growing in popularity for its affordability and use of more abundantly available materials, though it is less energy dense. Cathode active material (CAM) manufacturers purchase lithium hydroxide/carbonate, nickel sulfate, and cobalt sulfate and convert them into CAM powders. These powders are then sold to battery cell manufacturers, who coat them onto aluminum foil current collectors to produce cathodes.

Graphite can be synthesized from petroleum needle coke, a fossil fuel waste material, or mined from natural deposits. Natural graphite typically comes in the form of flakes and is reshaped into spherical graphite to reduce its particle size and improve its material properties. Spherical graphite is then coated with a protective layer to prevent unwanted chemical reactions when charging and discharging the battery.

The majority of battery anodes on the market are made using just graphite, so there is no intermediate step between processors and battery cell manufacturers. Producers of battery-grade synthetic graphite and coated spherical graphite sell these materials directly to cell manufacturers, who coat them onto copper foil current collectors to make anodes. These battery-grade forms of graphite are also referred to as graphite anode powder or, more generally, as anode active materials. Thus, the terms graphite processor and graphite anode manufacturer are interchangeable.
Section 2. Building Out Domestic Production Capacity
Challenges Facing Project Developers
Offtake Agreements
Offtake agreements (a.k.a. supply agreements or contracts) are agreements between a producer and a buyer for the purchase of a future product. They are a key requirement for project financing because they provide lenders and investors with certainty that, if a project is built, sales revenue will pay back the loan and justify the valuation of the business. The vast majority of feedstocks and battery-grade materials are sold under offtake agreements, though small amounts are also sold on the spot market in one-off transactions. Offtake agreements are made at every step of the supply chain: between miners and processors (if they are not vertically integrated), between processors and component manufacturers, and between component manufacturers and cell manufacturers. Due to domestic automakers' concerns about potential material shortages upstream and their desire to secure IRA incentives, many have also been entering into offtake agreements directly with North American miners and processors. Tesla has even started constructing its own domestic lithium processing plant.
Historically, these offtake agreements were structured as fixed-price deals. However, when spot prices rise well above the contract price, sellers often find a way out of the contract; conversely, when spot prices fall well below it, buyers do the same. As a result, more and more offtake agreements for battery-grade lithium, nickel, and cobalt have become indexed to spot prices, with price floors and/or ceilings set as guardrails and adjustments for premiums and discounts based on other factors (e.g., IRA compliance, risk from a greenfield producer, etc.).
Graphite is the one exception where buyers and suppliers have mostly stuck to fixed-price agreements. There are two main reasons for this: graphite pricing is opaque and products exhibit much more variation, complicating attempts to index the price. As a result, cell manufacturers don’t consider the available price indexes to accurately reflect the value of the specific products they are buying.
Offtake agreements for battery cells are also typically partially indexed to the price of the critical minerals used to manufacture them. In other words, a certain amount of the price per unit of battery cell is fixed in the agreement, while the rest varies with the index price of critical minerals at the time of transaction.
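As a concrete illustration of this partial indexing, consider the following minimal sketch. The function name, the single-mineral structure, and all numbers are hypothetical simplifications; real agreements index to baskets of minerals under negotiated formulas.

```python
def cell_price(fixed_usd_per_kwh: float, index_weight: float,
               mineral_index: float, reference_index: float) -> float:
    """Illustrative partially indexed battery cell price ($/kWh).

    fixed_usd_per_kwh: the portion of the cell price fixed in the agreement.
    index_weight: sensitivity of the cell price to the mineral index,
        in ($/kWh) per ($/kg) of index movement.
    mineral_index: the mineral's index price at the time of transaction.
    reference_index: the index price assumed when the agreement was signed.
    """
    return fixed_usd_per_kwh + index_weight * (mineral_index - reference_index)

# Hypothetical: $80/kWh fixed, 0.5 ($/kWh)/($/kg) sensitivity, and a lithium
# index at $15/kg versus a $12/kg reference at signing -> $81.50/kWh.
print(cell_price(80.0, 0.5, 15.0, 12.0))
```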
Domestic critical minerals projects face two key challenges to securing investment and offtake agreements: market volatility and a lack of price competitiveness. The price difference between materials produced domestically and those produced internationally stems from two underlying causes: the current oversupply from Chinese-owned companies and the domestic price premium.
Market Volatility
Lithium, cobalt, and graphite have relatively low-volume markets with a small customer base compared to traditional commodities. Low-volume products experience low liquidity, meaning it can be difficult to buy or sell quickly, so slight changes in supply and demand can result in sharp price swings, creating a volatile market. Because of the higher risk and smaller market, companies and investors tend to prefer mining and processing of base metals, such as copper, which have much larger markets, resulting in underinvestment in production capacity.
In comparison, nickel is a base metal commodity, primarily used for stainless steel production. However, due to its rapidly growing use in battery production, its price has become increasingly linked to other battery materials, resulting in greater volatility than other base metals. Moreover, the short squeeze in 2022 forced LME to suspend trading and cancel transactions for the first time in three decades. As a result, trust in the price of nickel on LME faltered, many market participants dropped out, and volatility grew due to low trading volumes.
For all four of these materials, prices reached record highs in 2022 and subsequently crashed in 2023 (Figure 4). Nickel, cobalt, and graphite experienced price declines of 30-45%, while lithium prices dropped by an enormous 75%. As discussed above, market volatility discourages investment into critical minerals production capacity. The current low prices have caused some domestic projects to be paused or canceled. For example, Jervois halted operation of its Idaho cobalt mine in March 2023 due to cobalt prices dropping below its operating costs. In January 2024, lithium giant Albemarle announced that it was delaying plans to begin construction on a new South Carolina lithium hydroxide processing plant.
Retrospective analysis suggests that mining companies, battery investors, and automakers all made overly optimistic demand projections and ramped up production too quickly. These projections assumed that EV demand would keep growing as fast as it did immediately after the pandemic and that China's lifting of pandemic restrictions would unlock even faster growth in the largest EV market. Instead, China, which makes up over 60% of the EV market, entered an economic downturn, and global demand elsewhere did not grow as fast as projected, as backlogs built up during the pandemic were cleared. (It is important to note that the EV market is still growing at significant rates—global EV sales increased by 35% from 2022 to 2023—just not as fast as companies had hoped.) Consequently, supply has temporarily outpaced demand. Midstream and upstream companies stopped receiving new purchase orders while automakers worked through their stock build-up. Prices fell rapidly as a result and are now bottoming out. Some companies are waiting for prices to recover before they restart construction and operation of existing projects or invest in expanding production further.
While companies are responding to short-term market signals, the U.S. government needs to act in anticipation of long-term demand growth outpacing current planned capacity. Price volatility in critical minerals markets will need to be addressed to ensure that companies and financiers continue investing in expanding production capacity. Otherwise, demand projections suggest that the supply chain will experience new shortages later this decade.
Oversupply
The current oversupply of critical minerals has been exacerbated by below market-rate financing and subsidies from the Chinese government. Many of these policies began in 2009, incentivizing a wave of investment not just in China, but also in mineral-rich countries. These subsidies played a large role in the 2010s in building out nascent battery critical minerals supply chains. Now, however, they are causing overproduction from Chinese-owned companies, which threatens to push out competitors from other countries.
Overproduction begins with mining. Chinese companies are the primary financial backers for 80% of both the Democratic Republic of the Congo’s cobalt mines and Indonesia’s nickel mines. Chinese companies have also expanded their reach in lithium, buying half of all the lithium mines offered for sale since 2018, in addition to domestically mining 18% of global lithium. For graphite, 82% of natural graphite was mined directly in China in 2023, and nearly all natural and synthetic graphite is processed in China.
After the price crash in 2023, while other companies pulled back their production volume significantly, Chinese-owned companies pulled back much less and in some cases continued to expand their production, generating an oversupply of lithium, cobalt, nickel, and natural and synthetic graphite. Government policies enabled these decisions by making it financially viable for Chinese companies to sell materials at low prices that would otherwise be unsustainable.
Domestic Price Premium (and Current Policies Addressing It)
Domestically produced critical minerals and battery electrode active materials carry a higher cost of production than imported materials due to higher wages and stricter environmental regulations in the U.S. The IRA's new 30D and 45X tax credits and upcoming Section 301 tariffs help address this problem by creating financial incentives for using domestically produced materials, allowing them to compete on a more even playing field with imports.
The 30D New Clean Vehicle Tax Credit provides up to $7,500 per EV purchased, but it requires eligible EVs to be manufactured from critical minerals and battery components that are FEOC-compliant, meaning they cannot be sourced from companies with ties to China, North Korea, Russia, or Iran. It also requires that an increasing percentage of the critical minerals used to make the EV batteries be extracted or processed in the U.S. or a Free Trade Agreement country. These two requirements apply to lithium, nickel, cobalt, and graphite. For graphite, however, since nearly all processing occurs in China and there is currently no domestic supply, the U.S. Treasury has chosen to exempt it from the 30D tax credit's FEOC and domestic sourcing requirements until 2027 to give automakers time to develop alternative supply chains.
The 45X Advanced Manufacturing Production Tax Credit subsidizes 10% of the production cost for each unit of critical minerals processed. The Internal Revenue Service's proposed regulations for this tax credit interpret the legislation as applying only to the value-added production cost, meaning that the cost of purchasing raw materials and processing chemicals is not included in the covered production costs. This limits the amount of subsidy that will be provided to processors. The strength of 45X, though, is that unlike the 30D tax credit, there is no sunset clause for critical minerals, providing a long-term guarantee of support.
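A rough sketch of how the value-added interpretation limits the subsidy follows; all costs are hypothetical round numbers, not drawn from any actual project.

```python
def credit_45x(total_cost: float, purchased_inputs: float) -> float:
    """Illustrative 45X credit per unit under the IRS's value-added
    interpretation: 10% of production cost, excluding purchased raw
    materials and processing chemicals."""
    value_added = total_cost - purchased_inputs
    return 0.10 * value_added

# Hypothetical processor: $10,000/tonne total cost, $6,000/tonne of which
# is purchased feedstock and chemicals. The credit is $400/tonne, versus
# $1,000/tonne if the full production cost were covered.
print(credit_45x(10_000, 6_000))
```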
In terms of tariffs, the Biden administration announced in May 2024 a new set of Section 301 tariffs on Chinese products, including EVs, batteries, battery components, and critical minerals. The critical minerals tariffs include a 25% tariff on cobalt ores and concentrates that will go into effect in 2024 and a 25% tariff on natural flake graphite that will go into effect in 2026. In addition, there are preexisting 25% Section 301 tariffs on natural and synthetic graphite anode powder. These tariffs were previously waived to give automakers time to diversify their supply chains, but the U.S. Trade Representative (USTR) announced in May 2024 that the exemptions would expire for good on June 14, 2024, citing the lack of progress from automakers as a reason for not extending them.
Current State of Supply Chain Development
For lithium, despite market volatility, offtake demand for existing domestic projects has remained strong thanks to IRA incentives. Based on industry conversations, many of the projects that are developed enough to make offtake agreements have either signed away their full output capacity or are actively in the process of negotiating agreements. Strong demand combined with tax incentives has enabled producers to negotiate offtake agreements that guarantee a price floor at or above their capital and operating costs. Lithium is the only material for which the current planned mining and processing capacity for North America is expected to meet demand from planned U.S. gigafactories.
Graphite project developers report that the 25% tariff coming into force will be sufficient to close the price gap between domestically produced and imported materials, enabling them to secure offtake agreements at a sustainable price. Furthermore, the Internal Revenue Service will require 30D tax credit recipients to submit periodic reports on the progress they are making on sourcing graphite outside of China. If automakers take these reports and the 2027 exemption deadline seriously, there will be even more motivation to work with domestic graphite producers. However, the current planned production capacity for North America still falls significantly short of demand from planned U.S. battery gigafactories. Processing capacity is the bottleneck for production output, so there is room for additional investment in processing capacity.
Pricing has been a challenge for cobalt, though. Jervois briefly opened the only primary cobalt mine in the U.S. before shutting it down a few months later due to the price crash. Jervois has said that as soon as prices for standard-grade cobalt rise above $20/pound, it will be able to reopen the mine, but that has yet to happen. Moreover, the real bottleneck is in cobalt processing, which has attracted less attention and investment than other critical minerals in the U.S. There are currently no cobalt sulfate refineries in North America; only one or two are in development in the U.S. and a few more in Canada.3
Nickel sulfate is also facing pricing challenges, and, similar to cobalt, there is an insufficient amount of nickel sulfate processing capacity being developed domestically. There is one processing plant being developed in the U.S. that will be able to produce either nickel metal or nickel sulfate and a few more nickel sulfate refineries being developed in Canada.
Policy Solutions to Support the Development of Processing Capacity
The U.S. government should prioritize the expansion of processing capacity for lithium, graphite, cobalt, and nickel. Demand from domestic battery manufacturing is expected to outpace the current planned capacity for all of these materials, and processing capacity is the key bottleneck in the supply chain. Tariffs and tax incentives have resulted in favorable pricing for lithium and graphite project developers, but cobalt and nickel processing has received less support and attention.
DOE should provide demand-side support for processed, battery-grade critical minerals to accelerate the development of processing capacity and address cobalt and nickel pricing needs. The Office of Manufacturing and Energy Supply Chains (MESC) within DOE would be the ideal entity to administer such a program, given its mandate to address vulnerabilities in U.S. energy supply chains. In the immediate term, funding could come from MESC’s Battery Materials Processing Grants program, which has roughly $1.9B in remaining, uncommitted funds. Below we propose a few demand-support mechanisms that MESC could consider.
In the long term, the Bipartisan Policy Center proposes that Congress establish and appropriate funding for a new government corporation that would take on the responsibility of administering demand-support mechanisms as necessary to mitigate volume and price uncertainty and ensure that domestic processing capacity grows sufficiently to meet critical minerals needs.
Offtake Backstops
Offtake backstops would commit MESC to guaranteeing the purchase of a specific amount of materials at a minimum negotiated price if producers are unable to find buyers at that price. This essentially creates a price floor for specific producers while also providing a volume guarantee. Offtake backstops help derisk project development and enable developers to access project financing. Backstop agreements should be made for at least the first five years of a plant’s operations, similar to a regular offtake agreement. Ideally, MESC should prioritize funding for critical minerals with the largest expected shortages based on current planned capacity—i.e., nickel, cobalt, and graphite.
There are two primary ways that DOE could implement offtake backstops:
First. The simplest approach would be for DOE to pay processors the difference between the spot price index (adjusted for premiums and discounts) and the pre-negotiated price floor for each unit of material, similar to how a pay-for-difference or one-sided contract-for-difference would work.4 This would enable processors to sign offtake agreements with no price floor, accelerating negotiations and thus the pace of project development. Processors could also choose to keep some of their output capacity uncommitted so that they can sell their products on the spot market without worrying about prices collapsing in the future.
A more limited form of this could look like DOE subsidizing the price floor for specific offtake agreements between a processor and a buyer. This type of intervention requires a bit more preliminary work from processors, since they would have to identify and bring a buyer to the table before applying for support.
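A minimal sketch of the pay-for-difference payout underlying either variant appears below. The prices, adjustments, and volumes are hypothetical; a real program would define the index, premiums, and settlement period in the agreement itself.

```python
def backstop_payment(spot_index: float, adjustment: float,
                     price_floor: float, units_sold: float) -> float:
    """Illustrative one-sided contract-for-difference settlement.

    DOE pays the processor only when the adjusted market price falls
    below the pre-negotiated floor; when the market clears the floor,
    no payment is owed.
    """
    adjusted_price = spot_index + adjustment  # premiums/discounts applied
    shortfall = max(0.0, price_floor - adjusted_price)
    return shortfall * units_sold

# Hypothetical: an $18/kg floor, a $15/kg spot index with a $1/kg
# compliance premium, and 1,000,000 kg sold -> a $2M payment.
print(backstop_payment(15.0, 1.0, 18.0, 1_000_000))
```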
Second. Purchasing the actual materials would be a more complex route for DOE to take, since the agency would have to be ready to receive delivery of the materials. The agency could do this either by setting up a system of warehouses suitable for storing battery-grade critical minerals or by using "virtual warehousing," as proposed by the Bipartisan Policy Center. An actual warehousing system could be set up by contracting with existing U.S. warehouses, such as those in LME's and CME's networks, to expand or upgrade their facilities to store critical minerals. These warehouses could also be made available for companies to store their private stockpiles, increasing the utility of the warehousing system and justifying the cost of setting it up. Virtual warehousing would entail DOE paying producers to store materials on-site at their processing plants.
The physical reserve provides an additional opportunity for DOE to address market volatility by choosing when it sells materials from the reserve. For example, DOE could pause sales of a material when there is an oversupply on the market and prices dip or ramp up sales when there is a shortage and prices spike. However, this can only be used to address short-term fluctuations in supply and demand (e.g. a few months to a few years at most), since these chemicals have limited shelf lives.
A third way to implement offtake backstops that would also support price discovery and transparency is discussed in Section 3.
Section 3. Creating Stable and Transparent Markets
Concerns about Pricing Mechanisms
Market volatility in critical minerals markets has raised concerns about just how reliable the current pricing mechanisms for these markets are. There are two main ways that prices in a market are determined: third-party price assessments and market exchanges. A third approach that has attracted renewed attention this year is auctions. Below, we walk through these three approaches and propose potential solutions for addressing challenges in price discovery and transparency.
Index Pricing
Price reporting agencies like Fastmarkets and Benchmark Mineral Intelligence offer subscription services to help market participants assess the price of commodities in a region. These agencies develop rosters of companies for each commodity, which regularly contribute information on transaction prices. That information is then used to generate price indexes. Fastmarkets' and Benchmark's indexes are primarily based on prices provided by large, high-volume sellers and buyers. Smaller buyers may pay higher-than-index prices.
It can be hard to establish reliable price indexes in immature markets if there is an insufficient volume of transactions or if the majority of transactions are made by a small set of companies. For example, lithium processing is concentrated among a small number of companies in China, and spot transactions make up a minority share of the market. New entrants and smaller producers have raised concerns that these companies have significant control over the Asian spot prices reported by Fastmarkets and Benchmark, which are used to set offtake agreement prices, and that the price indexes are not sufficiently transparent.
Exchange Trading
Market exchanges are a key feature of mature markets that helps reduce market volatility. They allow for a wider range of participants, improving market liquidity, and enable price discovery and transparency. Companies up and down the supply chain can use physically delivered futures and options contracts to hedge against price volatility and gain visibility into expectations for the market's general direction to help inform decision-making. This can help derisk the effect of market volatility on investments in new production capacity.
Of the materials we’ve discussed, nickel and cobalt metal are the only two that are physically traded on a market exchange, specifically LME. Metals make good exchange commodities due to their fungibility. Other forms of nickel and cobalt are typically priced as a percentage of the payable price for nickel and cobalt metal. LME’s nickel price is used as the global benchmark for many nickel products, while the in-warehouse price of cobalt metal in Rotterdam, Europe’s largest seaport, is used as the global benchmark for many cobalt products. These pricing relationships enable companies to use nickel and cobalt metal as proxies for hedging related materials.
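A simple illustration of how such proxy pricing works in practice follows; the payable percentage and benchmark price are hypothetical, not quotes from any actual market.

```python
def payable_price(metal_benchmark: float, payable_pct: float) -> float:
    """Illustrative payables-based pricing: a chemical or intermediate
    product sells at a negotiated percentage of the metal benchmark
    price for its contained metal."""
    return metal_benchmark * payable_pct

# Hypothetical: cobalt hydroxide at a 60% payable against a $15/lb
# cobalt metal benchmark -> $9.00/lb of contained cobalt.
print(payable_price(15.0, 0.60))
```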
After nickel trading volumes plummeted on LME in the wake of the short squeeze, doubts were raised about LME’s ability to accurately benchmark its price, sparking interest in alternative exchanges. In April 2024, UK-based Global Commodities Holdings Ltd (GCHL) launched a new trading platform for nickel metal that is only available to producers, consumers, and merchants directly involved in the physical market, excluding speculative traders. The trading platform will deliver globally “from Baltimore to Yokohama.” GCHL is using the prices on the platform to publish its own price index and is also working with Intercontinental Exchange to create cash-settled derivatives contracts. This new platform could potentially expand to other metals and critical minerals.
In addition to LME’s troubles though, changes in the battery supply chain have led to a growing divergence between the nickel and cobalt metal traded on exchanges and the actual chemicals used to make batteries. Chinese processors who produce most of the global supply of nickel sulfate have mostly switched from nickel metal to cheaper nickel intermediate products as their primary feedstock. Consequently, market participants say that the LME exchange price for nickel metal, which is mostly driven by stainless steel, no longer reflects market conditions for the battery sector, raising the need for new tradeable contracts and pricing mechanisms. For the cobalt industry, 75% of demand comes from batteries, which use cobalt sulfate. Cobalt metal makes up only 18% of the market, of which only 10-15% is traded on the spot market. As a result, cobalt chemicals producers have transitioned away from using the metal reference price towards fixed-prices or cobalt sulfate payables.
These trends motivate the development of new exchange contracts for physically trading nickel and cobalt chemicals that can enable price discovery separate from the metals markets. There is also a need to develop exchange contracts for materials like lithium and graphite with immature markets that exhibit significant volatility.
However, exchange trading of these materials is complicated by their nature as specialty chemicals: they have limited shelf lives and more complex storage requirements, unlike metal commodities. Lithium and graphite products also exhibit significant variations that affect how buyers can use them. For example, depending on the types and level of impurities in lithium hydroxide/carbonate, manufacturers of cathode active materials may need to conduct different chemical processes to remove them. Offtakers may also require that products meet additional specifications based on the characteristics they need for their CAM and battery chemistries.
For these reasons, major exchanges like LME, the Chicago Mercantile Exchange (CME), and the Singapore Exchange (SGX) have instead chosen to launch cash-settled contracts for lithium hydroxide/carbonate and cobalt hydroxide that allow for financial trading, but require buyers and sellers to arrange physical delivery separately from the exchange. Large firms have begun to participate increasingly in these derivatives markets to hedge against market volatility, but the lack of physical settlement limits their utility to producers who still need to physically deliver their products in order to make a profit. Nevertheless, CME’s contracts for lithium and cobalt have seen significant growth in transaction volume. LME, CME, and SGX all use Fastmarkets’ price indexes as the basis for their cash-settled contracts.
As regional industries mature and products become more standardized, these exchanges may begin to add physically settled contracts for battery-grade critical minerals. For example, the Guangzhou Futures Exchange (GFEX) in China, where the vast majority of lithium refining currently occurs, began offering physically settled contracts for lithium carbonate in August 2023. Though the exchange exhibited significant volatility in its first few months, raising concerns, the first round of physical deliveries in January 2024 occurred successfully, and trading volumes have been substantial this year. Access to GFEX is currently limited to Chinese entities and their affiliates, but another trading platform could come to do the same for North America over the next few decades as lithium production volume grows and a spot market emerges. Abaxx Exchange, a Singapore-based startup, has also launched a physically settled futures contract for nickel sulfate with delivery points in Singapore and Rotterdam. A North American delivery point could be added as the North American supply chain matures.
No market exchange for graphite currently exists, since products in the industry vary even more than those of other materials. Even the currently available price indexes are not seen as sufficiently robust for offtake pricing.
Auctions
In the absence of a globally accessible market exchange for lithium, and amid concerns about the transparency of index pricing, Albemarle, the top producer of lithium worldwide, has turned to auctions of spodumene concentrate and lithium carbonate as a means to improve market transparency and an "approach to price discovery that can lead to fair product valuation." Albemarle's first auction, of spodumene concentrate in China in March, closed at a price of $1,200/ton, which was in line with spot prices reported by Asian Metal but about 10% higher than prices provided by other price reporting agencies like Fastmarkets. Plans are in place to continue conducting regular auctions, at a rate of about one per week, in China and other locations like Australia. Lithium hydroxide will be auctioned as well. Auction data will be provided to Fastmarkets and other price reporting agencies to be incorporated into publicly available price indexes.
Auctions are not a new concept: in 2021 and 2022, Pilbara Minerals regularly conducted auctions of spodumene on its own platform, Battery Metals Exchange, helping to improve market sentiment. Now, though, the company says that most of its material is committed to offtakers, so auctions have largely stopped, though it did hold an auction for spodumene concentrate in March. If other lithium producers join Albemarle in conducting auctions, the data could help improve the accuracy and transparency of price indexes. Auctions could also be used to inform the pricing of other battery-grade critical minerals.
Policy Solutions to Support Price Discovery and Transparency Across the Market
Right now, the only pricing mechanisms available to domestic project developers are spot price indexes for battery-grade critical minerals in Asia or global benchmarks for proxies like nickel and cobalt metal. Long-term, the development of new pricing mechanisms for North America will be crucial to price discovery and transparency in this new market. There are two ways that DOE could help facilitate this: one that could be implemented immediately for some materials and one that will require domestic production volume to scale up first.
First. Government-Backed Auctions: Auctions require project developers to keep a portion of their expected output uncommitted to any offtakers. However, there is a risk that future auctions won't generate a price sufficient to offset capital and operating expenses, so processors are unlikely to do this on their own, especially for their first domestic project. MESC could address this by providing a backstop guarantee for the portion of a producer's output that it commits to auctioning regularly over a set timespan. If, in the future, auctions are unable to generate a price above a pre-negotiated price floor, then DOE would pay sellers the difference between the highest auction price and the price floor for each unit sold. Such an agreement could be made using DOE's Other Transaction Authority. DOE could separately contract with a platform such as MetalsHub to conduct the auctions.
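The settlement logic is sketched below with hypothetical prices and volumes; a real agreement would also specify auction cadence, lot sizes, and qualifying bidders.

```python
def auction_backstop_payment(clearing_prices: list[float],
                             price_floor: float,
                             units_sold: float) -> float:
    """Illustrative backstop-auction settlement: if even the highest
    clearing price falls short of the pre-negotiated floor, DOE pays
    the seller the difference on each unit sold; otherwise nothing."""
    highest = max(clearing_prices)
    shortfall = max(0.0, price_floor - highest)
    return shortfall * units_sold

# Hypothetical: lots clearing at $14, $15, and $16/kg against an $18/kg
# floor, with 500,000 kg sold -> DOE pays $2/kg, or $1M in total.
print(auction_backstop_payment([14.0, 15.0, 16.0], 18.0, 500_000))
```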
Government-backed auctions would enable the discovery of a true North American price for different battery-grade critical minerals and the raw materials used to make them, generating a useful comparison point with Asian spot prices. Such a scheme would also help address developers’ price and demand needs for project financing. These backstop-auction agreements could be complementary to the other types of backstop agreements proposed earlier and potentially more appealing than physically offtaking materials since the government would not have to receive delivery of the materials and there would be a built-in mechanism to sell the materials to an appropriate buyer. If successful, companies could continue to conduct auctions independently after the agreements expire.
Second. New Benchmark Contracts: Employ America has proposed that the Loan Programs Office (LPO) could use Section 1703 to guarantee lending to a market exchange to develop new, physically settled benchmark contracts for battery-grade critical minerals. The development of new contracts should include producers in the entire North American region. Canada also has a significant number of mines and processing plants in development. Including those projects would increase the number of participants, market volume, and liquidity of new benchmark contracts.
In order for auctions or new benchmark contracts to operate successfully, three prerequisites must be met:
- There must be a sufficient volume of materials available for sale (i.e. production output that is not committed to an offtaker).
- There must be sufficient product standardization in the industry such that materials produced by different companies can be used interchangeably by a significant number of buyers.
- There must be a sufficient volume of demand from buyers, brokers, and traders.
Market exchanges typically conduct research among stakeholders to understand whether the market is mature enough to meet these requirements before they launch a new contract. Interest from buyers and sellers must indicate that there would be sufficient trading volume for the exchange to make a profit greater than the cost of setting up the new contract. A loan guaranteed by LPO under Section 1703 can help offset some of those upfront costs and potentially make it worthwhile for an exchange to launch a new contract in a less mature market than it typically would.
Government-backed auctions, on the other hand, solve the first prerequisite by offering guarantees to producers for keeping a portion of their production output uncommitted. Product standardization can also be less stringent, since each producer can hold separate auctions, with varying material specifications, unlike market exchanges where there must be a single set of product standards.
Given current market conditions, no battery-grade critical minerals can meet the above prerequisites for new benchmark contracts, primarily due to a lack of available volume, though there are also issues with product standardization for certain materials. However, nickel, cobalt, lithium, and graphite could be good candidates for government-backed auctions. DOE should start engaging with project developers that have yet to fully commit their output to offtakers and gauge their interest in backstop-auction agreements.
Nickel and Cobalt
As discussed above, there are only a handful of nickel and cobalt sulfate refineries currently being developed in North America, making it difficult to establish a benchmark contract for the region. None of the project developers have yet signed offtake agreements covering their full production capacity, so backstop-auction agreements could be appealing to them and their investors. Given that more than half of the projects in development are located in Canada, MESC and DOE's Office of International Affairs should collaborate with the Canadian government in designing and implementing government-backed auctions.
Lithium
Domestic companies have expressed interest in establishing North American-based spot markets and price indexes for lithium hydroxide and carbonate, but say that it will take quite a few years before production volume is large enough to warrant that. Product variation has also been a concern from lithium processors when the idea of a market exchange or public auction has been raised. Lessons could be learned from the GFEX battery-grade lithium carbonate contracts. GFEX set standards on purity, moisture, loss on ignition, and the maximum content of different impurities. Some Chinese companies were able to meet these standards, while others were not, preventing them from participating in the futures market or requiring them to trade their materials as lower-purity, industrial-grade lithium carbonate, which sells at a discount. Other companies, producing lithium of much higher quality than the GFEX standards, opted to continue selling on the spot market because they could charge a premium on the standard price. Despite some companies choosing not to participate, trading volumes on GFEX have been substantial, and the exchange was able to weather initial concerns of a short squeeze, suggesting that challenges with product variation can be overcome through standardization.
Analysts have proposed that spodumene could be a better candidate for exchange trading, since it is fungible and does not have the limited shelf-life or storage requirements of lithium salts. 60% of global lithium comes from spodumene, and the U.S. has some of the largest spodumene deposits in the world, so spodumene would be a good proxy for lithium salts in North America. However, the two domestic developers of spodumene mines are planning to construct processing plants to convert the spodumene into battery-grade lithium on-site. Similarly, the two Canadian mines that currently produce spodumene are also planning to build their own processing plants. These vertical integration plans mean that there is unlikely to be large amounts of spodumene available for sale on a market exchange in the near future.
DOE could, however, work with miners and processors to sign backstop-auction agreements for smaller amounts of lithium hydroxide/carbonate and spodumene that they have yet to commit to offtakers. This may be especially appealing to companies that have announced delays to project development due to current low market prices, and it could help derisk bringing timelines forward. Interest in these future auctions could also help gauge the potential for developing new benchmark contracts for lithium hydroxide/carbonate further down the line.
Graphite
Natural and synthetic graphite anode material products currently exhibit a great range of variation and insufficient product standardization, so a market exchange would not be viable at the moment. As the domestic graphite industry develops, DOE should work with graphite anode material producers and battery manufacturers to understand the types and degree of variations that exist across products and discuss avenues towards product standardization. Government-backed auctions could be a smaller-scale way to test the viability of product standards developed from that process, perhaps using several tiers or categories to group products. Natural and synthetic graphite would have to be treated separately, of course.
Conclusion
The current global critical minerals supply chain partially reflects the results of over a decade of focused industrial policy implemented by the Chinese government. If the U.S. wants to lead the clean energy transition, critical minerals will also need to become a cornerstone of U.S. industrial policy. Developing a robust North American critical minerals industry would bolster U.S. energy security and independence and ensure a smooth energy transition.
Promising progress has already been made in lithium, with planned processing capacity expected to meet demand from future battery manufacturing. However, market and pricing challenges remain for battery-grade nickel, cobalt, and graphite, which will fall far short of future demand without additional intervention. This report proposes that DOE take a two-pronged approach to supporting the critical minerals industry through offtake backstops, which address project developers’ current pricing dilemmas, and the development of more reliable and transparent pricing mechanisms such as government-backed auctions, which will set up markets for the future.
While the solutions proposed in this report focus on DOE as the primary implementer, Congress also has a role to play in authorizing and appropriating the new funding necessary to execute a cohesive industrial strategy on critical minerals. The policies proposed in this report can also be applied to other critical minerals crucial for the energy transition and our national security. Similar analysis of other critical minerals markets and end uses should be conducted to understand how these solutions can be tailored to those industries' needs.
Building a Whole-of-Government Strategy to Address Extreme Heat
Comprehensive recommendations from 85+ experts to enable a heat-resilient nation
From August 2023 to March 2024, the Federation of American Scientists (FAS) talked with 85+ experts to source 20 high-demand opportunity areas ready for policy innovation and 65 policy ideas. In response, FAS recruited 33 authors to work on 18+ policy memos through our Extreme Heat Policy Sprint from January 2024 to April 2024, generating an additional 100+ policy recommendations to address extreme heat. In total, FAS has collected 165+ recommendations for 34 offices and/or agencies; the full set of policy ideas developed through this expert engagement can be found here. Key opportunity areas are described below and link out to featured recommendations.
America is rapidly barreling toward what may be its hottest summer on record. While we wait for a national strategy, states, counties, and cities around the country have taken up the charge of addressing extreme heat in their communities and are experimenting on the fly. California has announced $200 million to build resilience centers that protect communities from extreme heat and has created an all-of-government action plan to address it. Arizona, New Jersey, and Maryland are all actively developing extreme heat action plans of their own. Miami-Dade County considered passing some of the strictest workplace heat rules in the country (although the measure ultimately failed). Additionally, New York City and Los Angeles have driven cool roof adoption through funding programs and local ordinances, which can reduce energy demand, improve indoor comfort, and potentially lower local outside air temperatures.
While state and local governments can make significant advances, national extreme heat resilience requires a “whole of government” federal approach, as it intersects health, energy, housing, homeland and national security, international relations, and many more policy domains. The federal government plays a critical role in scaling up heat resilience interventions through research and development, regulations, standards, guidance, funding sources, and other policy levers. But what are the transformational policy opportunities for action?
Sourcing Opportunities and Ideas for Policy Innovation
During Fall 2023, FAS engaged 85+ experts in conversations about the federal policies needed to address extreme heat. Our stakeholders included: 22 academic researchers, 33 non-profit organization leaders, 12 city and state government employees, 3 private company leaders, 2 current or former Congressional staffers, 3 National Labs leaders, and 10 current or former federal government employees. Our conversations were guided by the following four questions:
- What work are you currently doing to address extreme heat?
- What do you see as some of the opportunity areas to address extreme heat?
- What are the existing challenges to managing and responding to extreme heat?
- What actions should the federal government take to address extreme heat?
Our conversations with experts sourced 20 high-demand opportunity areas for policy innovation and 65 policy ideas. To go deeper, FAS recruited 33 authors to work on 18+ policy memos through our Extreme Heat Policy Sprint, generating an additional 100+ policy recommendations to address extreme heat's impacts and build community resilience. Our policy memos from the Extreme Heat Policy Sprint, published in April 2024, provide a more comprehensive dive into many of the key policy opportunities articulated in this report. Overall, FAS' work scoping the policy landscape, understanding the needs of key actors, identifying demand signals, and responding to these demands has generated 165+ policy recommendations for 34 offices and/or agencies.
Opportunities for Extreme Heat Policy Innovation
The following 20 “opportunity areas” are not exhaustive, yet can serve as inspiration for the building blocks of a future strategic initiative.
Facilitate Government-Wide Coordination
The first opportunity is an overarching call to action: the need for a government-wide extreme heat strategic initiative. This can build upon the National Integrated Heat Health Information System's (NIHHIS) National Heat Strategy, set to be released this year. This strategy would define the problems to solve, create targets and galvanizing goals, set and assign priorities for federal agencies, review available resources for financial assistance, assess regulatory and rulemaking authority where applicable, highlight legislative action, and include evaluation metrics and a timeline for review, adjustment, and renewal of programs. In creating this strategy, one interviewee recommended a comprehensive review of "heat exposure settings" and the federal actors that can safeguard Americans in these settings: homes, workplaces, schools and childcare facilities, transit, senior living facilities, correctional facilities, and outdoor public spaces. By scoping potential regulations, standards, guidelines, planning processes, research agendas, and financial assistance, the federal government will then be prepared to support its intergovernmental actors and communities.
Infrastructure And The Built Environment
Accelerate Resilient Cooling Technologies, Building Codes, and Urban Infrastructure
On average, Americans spend 90% of their time indoors, making the built environment a critical site for heat exposure mitigation. To keep cool, especially in parts of the U.S. not accustomed to extreme heat, buildings are increasingly reliant on mechanical cooling. While a life-saving necessity, air conditioning (AC) consumes significant amounts of electricity, putting high demands on aging grid infrastructure during the hottest days. Excess heat expelled by air conditioners can lead to higher outdoor temperatures and even more AC demand. Finally, ACs are useless if there is no power, an increasing risk given growing energy poverty and grid failures. In these scenarios, much of our current building stock is likely to "fail" in its ability to keep residents cool.
Resilient cooling strategies, like high-efficiency cooling systems, demand-response systems, and passive cooling interventions, need policy action to rapidly scale for a warming world. For example, cool roofs, walls, and surfaces can keep buildings cool and less reliant on mechanical cooling, but are often not considered part of weatherization audits and upgrades. District cooling, such as through networked geothermal, can keep entire neighborhoods cool while using little electricity, but is still in the demonstration phase in the United States. Heat pumps also remain out of reach for many Americans, making it essential to design technologies that work for different housing types (e.g., affordable housing construction). Initiatives like the Department of Energy’s (DOE) Affordable Home Energy Shot can bring these technologies within reach of millions of Americans, but only if given sufficient financial resources. DOE’s Office of Clean Energy Demonstrations and State and Community Energy Programs FY25 budget request to strengthen heat resilience in disadvantaged communities through energy solutions could be a step towards realizing innovative heat technologies. The Environmental Protection Agency’s Energy Star program can further incentivize low-power and resilient cooling technologies, but only if rebates are designed to take advantage of them.
Thermal resilience of buildings must also be considered, for both day-to-day operations and emergency blackout scenarios. DOE can work with stakeholders to create “cool” building standards and metrics with human health and safety in mind, and integrate them into building codes like ASHRAE 189.1 and the 90 series. These codes are “win-wins” for building designers, creating buildings that consume far less electricity while keeping inhabitants safe from the heat. DOE can also assist in conducting more demonstration projects for building strategies that ensure indoor survivability in everyday and extreme conditions.
Intervention efficacy and applicability are still evolving for extreme heat resilience interventions at the community scale, such as cool pavements, urban greening, shading, ventilation corridors, and development regulations (e.g., solar orientation). Individual interventions and their interactions need more evidence on their costs and benefits, potential tradeoffs, and risks of maladaptation. The National Institute of Standards and Technology (NIST) works on building and urban planning standards for other natural hazards, such as through its National Windstorm Impact Reduction Program (NWIRP) and its Community Resilience program, and could serve as a “technology test-bed” for heat resilience practices, advancing our understanding of their effectiveness as well as how to measure and account for benefits and costs. This could be done in partnership with the National Science Foundation, which has been dedicating funding to use-inspired research and technology development for climate resilience.
Finally, the U.S. government is the largest landlord in the nation. As the General Services Administration rapidly decarbonizes its buildings, it can also serve as a test site for new technologies, building designs, planning approaches, and resilience metrics development and analysis.
Adapt Transportation to the Heat
Public transportation is a site of high exposure to extreme heat. While the Department of Transportation’s Promoting Resilient Operations for Transformative, Efficient, and Cost-saving Transportation (PROTECT) grants are for “surface transportation resilience,” several of our local and regional government interviewees described difficulty successfully applying to these grants for “cooling” infrastructure, like water fountains, shade, and air-conditioned bus shelters. DOT should make extreme heat resilience explicit in its eligibility requirements and review how the benefit-cost analysis (BCA) formula might disadvantage cool infrastructure.
Asphalt and concrete roadways contribute to the urban heat island effect, and hotter weather makes asphalt in particular more vulnerable to cracking. DOT should leverage its research and development (R&D) capabilities to develop and deploy reflective, cool materials as part of transportation infrastructure improvements. DOT should also consider the levers available to incentivize cool surfaces and materials in transportation construction.
Create More Heat-Resilient Schools for Sustained Learning
Higher temperatures combined with minimal to no air conditioning in older school buildings have led to an increase in the number of “heat days,” or school closures due to dangerous temperatures. Pulling children out of the classroom not only negatively impacts them, but also strains families that rely on schools for childcare. Even when school is in session, many students are attempting to learn in classrooms exceeding 80°F, a threshold at which studies have repeatedly shown that students struggle to learn and underperform academically. This is because heat reduces cognitive function and the ability to concentrate, both essential to learning. Learning loss from rising heat will only compound the learning losses from the COVID-19 pandemic. The Environmental Protection Agency predicts that the total lost future income attributable to heat-related learning losses may reach $6.9 billion at 2°C of warming (a threshold we are well on the way to meeting) and $13.4 billion at 4°C. Schools need guidance on how to deal with the heat crisis at hand, while being supported as they plan the climate adaptations needed for a hotter world.
At a minimum, schools can be encouraged to formalize plans for heat preparedness that protect both students’ health and their learning. No federal heat safety recommendations for schools yet exist; they will need to be created by the Department of Education (Ed), EPA, FEMA, the National Oceanic and Atmospheric Administration (NOAA), and others. Title I Grants, in alignment with Justice40, could then assist schools in adapting to climate change, with researched guidance on ways to cool students indoors, outdoors, and through behavioral management. Further, school system leaders need a better system to track how schools are currently experiencing extreme heat and what strategies could be employed to respond to heat exposure (closing schools, informed behavioral interventions to manage heat exposure, green infrastructure to build resilience, etc.). Federal involvement is essential for creating this tool. Finally, to address the root causes of excessive classroom heat, schools will need to transform their infrastructure through HVAC investments and improvements, greening, playground material changes, and shading. HVAC costs alone are expected to reach $40 billion across all U.S. schools needing infrastructure improvements. While Inflation Reduction Act (IRA) tax credits are available for updating HVAC systems, many low-wealth schools will not be able to finance the gap between the credit coverage and the true cost and will need additional financial assistance.
Make Housing and Eviction Policy More Climate-Aware and Resilient
Most of the U.S. lacks minimum cooling requirements for buildings, including any requirement that a cooling device exist within the property. Adoption of the latest building energy codes, despite their previously described limitations, can still be a cost-saving and life-saving advancement, according to DOE research. For new properties, the Federal Housing Finance Agency could require adherence to the latest energy codes to receive a mortgage from Government Sponsored Enterprises, a step already under consideration by Housing and Urban Development (HUD) and the U.S. Department of Agriculture (USDA) for their mortgage products. For older construction, adequate cooling could be required at the point of sale.
For all property types, weatherization audits, through the Weatherization Assistance Program (WAP) and Low-Income Home Energy Assistance Program (LIHEAP), can be expanded to consider a property’s heat resilience and cooling efficiency and then identify upgrades such as more efficient HVAC, building envelope improvements, cool roofs, cool walls, shade, and other infrastructure. If cooling the entire property is infeasible or too costly, homeowners could benefit from creating “Climate Safe Rooms” designed to stay safe during a heat wave. DOE and HUD could collaborate to demonstrate climate safe rooms in affordable housing, where many residents lack access to consistent cooling.
Some housing types are riskier than others. People living in manufactured homes in Arizona were 6 to 8 times more likely to die indoors due to extreme heat, a consequence of poorly functioning or completely defunct cooling systems and/or inability to pay electric bills. Manufactured home park landlords can also set a variety of rules for homeowners, including banning cooling devices like window ACs and shade systems. While states like Arizona have now made these bans illegal, a nationwide policy for secure access to cooling is still needed. HUD does not regulate manufactured home parks, but it does finance them through Section 207 mortgages and could stipulate that park owners guarantee resident safety. Finally, HUD could update the Manufactured Home Construction and Safety Standards so that HVAC and other cooling regulations in local building codes apply to manufactured homes, as they do to other forms of housing, and could require that homes perform to a certain cooling level under high heat conditions.
Renters are another highly vulnerable population. Most states do not require landlords to provide cooling devices to tenants or keep housing below risky temperatures. HUD, for example, does not require cooling devices in public housing, although regulations exist for heating; HUD could implement similar guarantees of a “right to cool.” Evictions in the summer months are also on the rise, as rising rents compound with rising energy costs, putting people out in the deadly heat. Keeping people in housing should be of the utmost importance, yet implementation remains fractured across the nation. Eviction moratoriums at the national level face legal obstacles: the Supreme Court overturned the CDC’s COVID-19 moratorium.
Address Communities’ Needs for Long-Term Infrastructure Funding Support
Heat vulnerability mapping has advanced significantly in the past few years. Federal programs like NIHHIS’s Urban Heat Island Mapping Campaigns have mapped 60+ communities in the United States, and those maps have guided city policy. The Census Bureau’s new product, Community Resilience Estimates (CRE) for Heat, assesses vulnerability at the level of individuals and households. Finally, researchers and non-profit organizations have been developing tools that can assess risk and aid in individual or local decision-making, such as the Climate Health and Risk Tool and Heat Factor®.
Advancements in our understanding of heat’s impacts and potential interventions have not translated into sustained resources for transformative infrastructure development. As one interviewee put it, “communities that have mapped their urban heat islands are still waiting on funding opportunities to build relevant infrastructure projects.” Federal grants for mitigation and resilience may or may not consider heat resilience projects “cost-effective” and aligned with grant-making objectives, leading to rejections.
FEMA’s Hazard Mitigation Grant Program (HMGP), made available only after a federally-declared disaster, can only be used for extreme heat in specific circumstances, and FEMA recommends that cost-effective heat mitigation projects also “reduce risks of other hazards.” As another example, FEMA’s Building Resilient Infrastructure and Communities (BRIC) program has rejected cooling centers, HVAC upgrades, and weatherization activities, all strategies with some benefit in preventing morbidity and mortality. Green infrastructure projects, with co-benefits such as flood mitigation, have been more successful, often because the BCA is based on the property-damaging hazard, flooding. Only one FEMA BRIC project has been funded with heat as the main hazard, an urban greening project in Portland, Oregon. This uncertainty about grant success can lead communities not to apply with a heat-focused project at all, when time could be better spent securing grants for other community priorities. FEMA’s announcement that it will fund net-zero projects, including passive heating and cooling, through its HMGP and BRIC programs and Public Assistance could shift the paradigm, yet communities will likely need more guidance and technical assistance to execute these projects.
To invest in resilience to the growing risk of heat, policymakers will need to create a dedicated and reliable funding resource. Federal stakeholders can look to the states for models. California’s Integrated Climate Adaptation and Resiliency Program’s Extreme Heat and Community Resilience grants are currently slated to allocate $118 million to 20-40 communities for planning and implementation grants over three rounds. To start, FEMA could replicate this program, similar to its specific programs for wildfires, providing $50,000 to $5 million to a wide range of heat resilience projects, and make it eligible for joint funding through BRIC. DOE’s $105 million FY25 budget request for a program for planning, development, and demonstration of community-scale solutions to mitigate extreme heat in low-income communities is a step in the right direction. If funded, the program would benefit from coordinating with FEMA’s BRIC program on high-impact solutions.
Workforce Safety And Development
Set Indoor and Outdoor Temperature Standards and Workplace Protections to Protect Human Health
Our understanding of when heat becomes risky to human health and should alter daily governance is still developing. Our interviewees shared that there is not yet consensus on 1) the lower thresholds at which outdoor and indoor temperature risks begin and 2) the level of continued exposure that should trigger action, such as implementing breaks for workers or deploying rapid emergency cooling to residents. For workplaces, guidelines will come soon: the Occupational Safety and Health Administration (OSHA) is set to release its heat standard for indoor and outdoor workers by the end of 2024, which will advance heat safety for workers across the country. For all other settings (such as residential settings and schools), the jury is still out on a valid threshold and a regulatory mechanism to establish it.
Enforcement of standards is necessary for realizing their full potential. In preparation for a workplace heat standard, interviewees recommended the Department of Labor create an advanced Hazard Alert System for Heat (using an evolved data standard discussed in a later section) in order to better target regulatory enforcement. Small businesses will also need help preparing for compliance with the new standard. DOL and the Small Business Administration should consider setting up a navigator program for resourcing energy-efficient, worker-centric cooling strategies, leveraging IRA funds where applicable.
Build the Extreme Heat Resilience Workforce
Extreme heat is not just a challenge to worker health; it is also a challenge to workforce ability and capacity. As heat becomes a threat to the entire nation, many fields need to rapidly absorb entirely new knowledge bases. For example, much of the health workforce (doctors, nurses, and public health workers) receives little to no education on climate change and its health impacts. Programs are beginning to crop up, such as Harvard’s C-Change program, yet they will need support to scale. As the nation’s largest single funder of graduate medical education, the federal government has many levers at its disposal to develop, incentivize, and even require climate and health education. The U.S. Public Health Service Commissioned Corps is another program that could mobilize a climate-aware health workforce, placing professionals with a deep awareness of climate change’s impact on health in local communities.
The weatherization and decarbonization workforce must also be made aware of, and ready for, heat’s growing impacts and the emerging strategies for building- and community-scale resilience. While promising strategies exist for heat mitigation, such as cool walls and roofs, these interventions are largely not considered during weatherization and energy efficiency audits. Tax credits created by the IRA and the Bipartisan Infrastructure Law (BIL) could be used for passive or low-energy cooling interventions, yet a lack of clarity prevents their uptake and implementation. For example, EPA’s Energy Star program certified roofing products before that certification sunsetted in 2022. Stakeholders at DOE and EPA should consider their role in workforce readiness for extreme heat, collaborating with third-party entities to build awareness of these promising strategies.
Navigating all of the benefits of the IRA and BIL is challenging for resource-strapped communities and households. Program navigators for weatherization assistance and resilience could be an incredible asset to low-resource communities; such programs could leverage IRA technical assistance resources as well as the newly created American Climate Corps.
Finally, the federal government workforce is being stretched thin by the sheer number of new mandates in the IRA and BIL. To meet the moment, agencies have used flexible hiring mechanisms like Intergovernmental Personnel Act (IPA) assignments and, for some offices, the BIL- and IRA-connected Direct Hire Authority to make critical talent decisions and staff their agencies. DOE, for example, has exceeded its goals, hiring over 1,000 new employees to date. But not all agencies and offices have access to the Direct Hire Authority, and it is set to expire between 2025 (for IRA) and 2027 (for BIL). Congress should be encouraged to expand this authority, extend it beyond 2025 and 2027 respectively, and remove the limit on the number of staff allowed. Further, agencies should be encouraged to use other flexible hiring mechanisms like IPAs and other termed positions. The federal government should have the talent needed to meet its current mandates and be prepared to solve problems like extreme heat.
Public Health, Preparedness, And Health Security
Build Healthcare System Preparedness
Years of underinvestment in preparedness have degraded U.S. health infrastructure’s surveillance, data collection, and workforce capacity to respond to emerging climate threats like extreme heat. The Administration for Strategic Preparedness and Response’s Hospital Preparedness Program, which prepares healthcare systems for emergencies, saw its budget reduced by 67% from FY2002 to FY2022, adjusting for inflation. The Centers for Disease Control and Prevention (CDC) has likewise seen a 20% budget reduction from FY2002 to FY2022. The CDC’s Climate Ready States and Cities Initiative can only support nine states, one city, and one county, despite 40 jurisdictions having applied. The Trust for America’s Health (TFAH) found that increasing funding from $10 million to $110 million is required to support all states and improve climate surveillance. The TFAH also found that an additional $75 million is needed to extend the CDC’s National Environmental Public Health Tracking Program, which tracks threats and plans interventions, to every state. Finally, the Office of Climate Change and Health Equity, the sole office within Health and Human Services dedicated to the intersection of climate and health, has yet to receive direct appropriations to support its work.
The Centers for Medicare & Medicaid Services (CMS) and the Health Resources and Services Administration (HRSA) provide critical investments in healthcare facilities, operations, care provision, and the medical workforce, yet have no publicly available programs dedicated to building climate resilience in the face of rising temperatures. The Veterans Health Administration (VHA), the largest integrated healthcare system in the U.S., includes responding to heat wave exposure in its agency Climate Action Plan and has committed to developing biosurveillance systems that incorporate external data on air quality, temperature, heat index, and weather, as well as to upgrading medical center infrastructure. This is critical, as 62% of VHA medical centers are exposed to extreme heat and the VHA is seeing a rise in heat-related illness in the Veteran population. Given the VHA’s sheer size, systems changes like these can drive real change in healthcare practice.
To build resilience to extreme heat within healthcare systems, our interviews and literature review highlighted three actions as most critical: 1) increasing surveillance and tracking of heat-related illness through improvements to medical diagnosis and coding practices and technological systems (e.g., electronic health records); 2) leveraging healthcare financing for preventative treatments (e.g., cooling devices), incentives for climate-change preparedness, accurate coding and treatment, quality care delivery (CQIs), and requirements for accreditation and reimbursements; and 3) fostering capacity-building through grants, technical assistance, planning support and guidance, and emergency preparedness.
Design Activation Thresholds for Public Health, Medical, and Emergency Responses
Extreme heat events have overwhelmed local capacity and triggered local disaster declarations. Even so, heat is not explicitly required in healthcare preparedness efforts authorized under the Pandemic and All-Hazards Preparedness Act (PAHPA); it is insufficiently included, or not included at all, in the local and state hazard mitigation plans required by FEMA; and there has yet to be a federal disaster declaration for heat. All of this inhibits the deployment of the federal resources for mitigation, planning, and response that states and local jurisdictions rely on for other hazards. Our interviewees recommended better “activation thresholds” for heat, i.e., markers that the hazard has reached a level of impact requiring additional capacity and resources. Most current thresholds rely on high temperatures alone, not on the risk factors that exacerbate heat’s impacts. Data inputs into these locally-relevant thresholds can include wet-bulb globe temperature (which accounts for humidity), heat stress risk, level of acclimatization, nighttime temperatures, building conditions and cooling device uptake, work situations, compounding health risks like wildfire smoke, and other factors. These activation thresholds should also be designed around the most heat-vulnerable populations, such as children, the elderly, pregnant people, and those with comorbidities.
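To make “activation threshold” concrete, here is a minimal sketch of what a composite trigger could look like. The wet-bulb globe temperature weights follow the standard outdoor formula; the specific trigger values and the nighttime-temperature adjustment are placeholder assumptions a jurisdiction would need to calibrate locally, not values drawn from any federal guidance.

```python
# Sketch of a composite heat "activation threshold" check.
# WBGT weights follow the standard outdoor formula (ISO 7243);
# the 31/29 degC triggers and the nighttime adjustment are illustrative placeholders.

def wbgt_outdoor(natural_wet_bulb_c: float, globe_c: float, dry_bulb_c: float) -> float:
    """Outdoor wet-bulb globe temperature, which reflects humidity,
    radiant heat, and air temperature together."""
    return 0.7 * natural_wet_bulb_c + 0.2 * globe_c + 0.1 * dry_bulb_c

def should_activate(natural_wet_bulb_c: float, globe_c: float, dry_bulb_c: float,
                    nighttime_low_c: float, population_acclimatized: bool) -> bool:
    wbgt = wbgt_outdoor(natural_wet_bulb_c, globe_c, dry_bulb_c)
    trigger = 31.0 if population_acclimatized else 29.0  # placeholder trigger values
    if nighttime_low_c > 26.0:  # hot nights deny recovery time, compounding risk
        trigger -= 1.0
    return wbgt >= trigger

# Example: humid 38 degC day, hot night, unacclimatized population -> activate.
print(should_activate(28.0, 45.0, 38.0, nighttime_low_c=27.0, population_acclimatized=False))
```

A production version would fold in more of the inputs named above, such as cooling device uptake and wildfire smoke, each with locally validated weights.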
Increased transmission and spread of pathogens is another growing risk of hotter average temperatures that needs more attention. Expanding pathogen surveillance and correlating it with climate conditions would greatly strengthen U.S. pandemic and endemic disease surveillance. Finally, no program to date at the Biomedical Advanced Research and Development Authority has focused on creating climate-aware medical countermeasures, and its 2022-2026 strategic plan includes no mention of climate change.
Reduce Energy Burdens, Utility Insecurity, and Grid Insecurity
As temperatures rise, so do energy bills, and Americans are facing an ever-growing burden of energy debt. 16% of U.S. households (20.9 million) are behind on their energy bills, increasing the risk of utility shut-offs due to non-payment. The Low Income Home Energy Assistance Program (LIHEAP) exists to relieve energy burdens, yet was designed primarily for heating assistance; the LIHEAP formulas thus advantage states with historically frigid climates. Further, most states use their LIHEAP budgets for heating first, leaving what remains for cooling assistance (or offering no cooling assistance at all). As a result, only 5% of energy assistance nationally went to cooling from 2001 to 2019. Finally, the LIHEAP program is massively oversubscribed and can only serve a portion of needy families. To adapt to a hotter world, LIHEAP’s budget must increase, and allocation formulas will need to be made more “cooling”-aware and equitable for hot-weather states. The FY25 presidential budget keeps LIHEAP’s funding level at $4.1 billion while also proposing expanded eligible activities that will draw on available resources. A recent analysis by the National Energy Assistance Directors Association found that this funding level could cut ~1.5 million families from the program and cut program benefits like cooling.
Another key issue is that 31 states have no policy preventing energy shut-offs during excessive heat events, and even the states with policies vary widely in their cut-off points. These shut-off policies are all set at the state level, and there is an ongoing need to identify best practices that save lives. While the Public Utility Regulatory Policies Act of 1978 (PURPA) prohibits electric utilities from shutting off home electricity for overdue bills when doing so would endanger someone’s health, it has no explicit protections for extreme weather, hot or cold. Reforms to PURPA could be considered that require utilities to impose moratoriums on energy shut-offs during extreme heat seasons.
Finally, grid resilience will become even more essential in a hotter climate. Power outages and blackouts during extreme heat events are deadly: if a blackout were to occur in Phoenix, Arizona during the summer, roughly 800,000 people would need immediate medical attention. Rising use of AC is itself a risk factor for blackouts due to increases in energy demand. The North American Electric Reliability Corporation (NERC), a regulatory organization that works to reduce risks to power grid infrastructure, has issued a dire warning that two-thirds of the U.S. faces reliability challenges because of heatwaves. Ensuring grids are ready for the climate to come should be a top priority for DOE, the Federal Emergency Management Agency (FEMA), and the Federal Energy Regulatory Commission (FERC). Given the risks to human health, the Centers for Disease Control and Prevention (CDC) should work with public health organizations to prepare for blackouts and grid failure events.
Address Critical Needs of Confined Populations Facing Heat
Confined populations, whether confined because of their medical status or their legal status, are vulnerable to extreme heat indoors. Long-term care facilities are required by law to keep properties within 71-81°F. Yet long-term care facilities report challenges actually meeting residents’ needs in a disaster, such as a power outage, pointing to a need for more coordination with CMS.
Incarcerated populations, on the other hand, are not guaranteed any cooling, even as summers become more brutal. This directly leads to an increase in deaths: 45% of U.S. detention facilities saw spikes in deaths on hazardous heat days from 1982 to 2020. Despite arguments that this lack of sufficient cooling constitutes “cruel and unusual” punishment, there has been no public activity to date from the Department of Justice to secure cooling infrastructure for federal prisons or to work with state prisons to expand cooling infrastructure. The National Institute of Corrections does recommend ASHRAE Standard 55, Thermal Environmental Conditions for Human Occupancy, to corrections institutions, though this standard needs to be updated for our evolving understanding of extreme heat’s risks to human health.
Food Security And Multi Hazard Resilience
Anticipate and Prevent Supply Chain Disruptions
Hotter temperatures are changing the landscape of American and global food production. 70% of global agriculture is expected to be affected by heat stress by 2045. Recent heat waves have already killed crops and livestock en masse, leading to lower yields and even shortages of certain products, like olive oil, potatoes, coffee, rice, and fruits. Rising heat is also poised to reshape local and state economies that rely on their climatic suitability for producing certain crops. Oranges, a $5 billion industry for Florida, are struggling in the heat, which stresses the trees and provides fertile ground for pathogens. As a result, Florida is facing its worst citrus yield since the Great Depression. A decrease in winter chill is another growing risk, as many perennial crops have adapted to require certain amounts of accumulated winter chill to develop and bloom; winter-time heat is shaking up plants’ biological clocks, decreasing quality and yield. Overall, extreme heat is impacting American household bottom lines in the short and long term through heat-exacerbated earnings losses and spiking food prices.
Ensuring ongoing access to critical commodity and specialty agricultural products in a future of higher temperatures is a national security priority. Resilience of products to extreme heat could be included as a future requirement in the Federal Supplier Climate Risks and Resilience Rule that would amend the Federal Acquisition Regulation. Further, FAS’ work scoping the federal landscape has shown there are few federal research and development programs, financial assistance opportunities, and incentives for heat resilience, and our interviewees concurred with that assessment. The U.S. Department of Agriculture (USDA) can prepare farmers for future climate risks and hotter temperatures, ensuring consistent food production and reducing losses and the economic payouts USDA must make through crop insurance and disaster assistance. The USDA can accelerate advances in biotechnology and genetic engineering to improve the heat resilience of agricultural products while also encouraging practices like shade, effective water management, and soil regeneration that build system-wide resilience. As Congress continues to consider reauthorizations and appropriations for the Farm Bill, it should consider fully funding the Agriculture Advanced Research and Development Authority to advance resilient agriculture R&D while also increasing funding to the USDA Climate Hubs to support the roll-out of heat-resilient practices.
Connect Drought Resilience and Heat Resilience Strategies
Hotter winters have literal downstream consequences. Warming is shrinking the snowpack that feeds rivers, leading to greater groundwater reliance and straining aquifers to the brink of collapse. Warmer temperatures also lead to more surface water evaporating, leaving less to seep through the ground and replenish overstressed aquifers. Rising temperatures also mean that plants need more water, as they transpire at greater rates to keep their internal temperatures in check. All of these factors compound the growing risk of drought facing American communities. Drought, now made worse by high heat conditions, accounts for a significant portion of annual agricultural losses: 80% of 2023 emergency disaster designations declared by USDA were for drought and/or excessive heat. Water insecurity is an escalating catastrophe, and addressing it requires a national strategy that accounts for hotter future temperatures and the strain they will place on the water supplies necessary to sustain agricultural production and human habitation.
Heat and drought also combine to create prime conditions for megawildfires. The smoke generated by these fires compounds the health impacts of extreme heat, with research showing that the concurrent effects of heat and smoke drive up hospitalizations and deaths. More funding from Congress is needed to improve wildfire forecasting and threat intelligence in the era of compounding hazards.
Planning And Response
Reform the Benefit-Cost Analysis
Benefit-cost analysis (BCA) is a critical tool for guiding infrastructure investments, yet it is not set up to account for the benefits of heat mitigation investments. When the focus of the BCA is mitigating property damage and loss of life, it discounts impacts that go beyond those damages, such as economic losses, learning losses, wage losses, and healthcare costs. Research will likely be needed to generate pre-calculated benefits for heat mitigation infrastructure, such as avoiding heat illness, death, and wage losses and preventing widespread power failures (a growing risk). Further, strategies that enhance an equitable response, articulated in the recent update to the Office of Management and Budget’s Circular A-4, need to be quantified. This could include response efforts that protect the populations most vulnerable to extreme heat, such as checking in on heat-sensitive households identified by the CRE for Heat. Developing these metrics will take time and should be done in partnership with agencies like DOE, EPA, and CDC. Finally, FEMA’s BCA is often based on a single hazard, the one with the highest BCA ratio, making it more challenging to fund multi-hazard resilience. FEMA should develop BCA methods that allow an infrastructure investment (like a resilience hub) to be credited for building community resilience to many hazards.
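To see why the scope of counted benefits is decisive, consider a stylized example. The project and all dollar figures below are hypothetical, not drawn from any actual FEMA filing; a project clears the bar when the ratio of discounted benefits to discounted costs exceeds 1:

```latex
% Stylized BCA for a hypothetical $10M cooling-infrastructure project
% (all figures illustrative)
\[
\text{BCA ratio} \;=\; \frac{\sum_{t} B_t/(1+r)^t}{\sum_{t} C_t/(1+r)^t} \;\ge\; 1
\]
\[
\text{Counting avoided property damage only:}\qquad
\frac{\$4\text{M}}{\$10\text{M}} = 0.4 \quad\Rightarrow\quad \text{rejected}
\]
\[
\text{Adding avoided deaths, healthcare, and wage losses (\$9\text{M}):}\qquad
\frac{\$4\text{M}+\$9\text{M}}{\$10\text{M}} = 1.3 \quad\Rightarrow\quad \text{fundable}
\]
```

The same project flips from rejected to fundable purely on which benefit categories the methodology allows, which is the reform this section argues for.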
Create the “Plan” for How the Federal Emergency Management Agency and Others Should Respond to an Extreme Heat Disaster
Extreme heat’s extended duration, from a few days to several months, poses a significant challenge to existing disaster policy’s focus on acute events that damage property. FEMA’s acute focus on infrastructure damage has been an insurmountable barrier to all past attempts to declare extreme heat a disaster and receive federal disaster assistance, even though, in theory, FEMA can reimburse state and local governments for any disaster response effort that exceeds local resources, including responses to heat waves. Our interviewees acknowledged that federal recognition of heat waves as disasters will only come with extending the definition of what a disaster is.
New governance models will need to be created for climate and health hazards like extreme heat, focusing on an adaptation-forward, people-centered disaster response approach, given the outsized impact of heat hazards on human health and economic productivity. Such a shift will challenge the federal government’s existing authorities under national disaster law, the Stafford Act, which currently does not consider “human damages” beyond loss of life. As a result, we fail to capture how existing infrastructure stops providing critical functions during heat hazard events (secure learning, secure workplaces, secure municipal operations, secure healthcare delivery) and how that failure strains or exceeds local resources to respond. Quantifying more of these damages would create an incentive to design responses that address current impacts and to plan for and mitigate future ones.
Finally, there are high-risk heat disasters that we need to be running planning scenarios for, specifically an extended power outage in a city under high-heat conditions. A summer power outage in Phoenix would send 800,000 people to the emergency room, which would very likely overwhelm local resources and those of all surrounding jurisdictions. A power outage during an extended heat wave should be an included planning scenario in emergency management exercises led by state and local governments. FEMA should produce a comprehensive list of everything a city needs to be prepared for a catastrophic power outage.
Spur Insurance and Financing Innovation
While insurance is the country’s largest industry, few insurance products and services exist in the U.S. to cover the losses from extreme heat. The U.S. Department of the Treasury recently acknowledged this lack of comprehensive insurance for extreme heat’s impacts in its report on how climate change worsens household finances. Heat insurance for individuals could manifest in a variety of ways: security from utility cost spikes during extreme weather events, real-estate assessment and scoring for future heat risk, “worker safety” coverage to protect wages on extremely hot days when it might be unsafe to work, protections for household items and resources lost due to an extended blackout or power outage, and full coverage for healthcare expenses caused or exacerbated by heat waves. California is currently leading the country in thinking through the insurance industry’s role in mitigating extreme heat’s impacts; federal stakeholders should watch it as a model for what can be scaled and replicated across the nation.
Further, it is important that investments made today are resilient to the climate conditions of tomorrow. The Office of Management and Budget’s November 2023 memo on climate-smart infrastructure, currently being implemented, provides technical guidance on how federal financial assistance programs can and should invest in climate resilience. A yet-unexplored financial lever for climate resilience identified in our interviews is federally-backed municipal bonds. Climate change is undermining this once-stable investment, as cities and local governments struggle to pay back interest due to the rising costs of addressing hazards. The municipal bond market could price climate risk when deciding on interest payments and give beneficial rates to jurisdictions that have fully analyzed their risks and taken steps toward resilience.
Finally, there is a need to update the assessments of heat risk used to make insurance and financial decisions. Recent DOE research found that property damage data in FEMA’s National Risk Index (NRI) appear to be deficient, underestimating damages compared to published values for recent U.S. extreme temperature events. To start, FEMA should consider including metrics in the NRI that characterize the building stock (e.g., by adherence to certain building codes), its thermal comfort levels (even with cooling devices), and its thermal resilience.
Incorporate Future Climate Projections into Planning at All Levels
Recent research has shown that cities and counties are barreling toward temperature thresholds at which it would be dangerous to operate municipal services, affecting the operations of daily life. Yet little of this future risk is accounted for in the various planning activities (for public health, emergency preparedness, grid security, transportation, urban design, etc) done by local and state governments. Our interviewees expressed that because many plans are based on historical and current risk data, there is little anticipation of the future impacts of hotter temperatures when making current planning choices.
One example stood out around nature-based solutions (NBS): while NBS have received over a billion dollars in federal funding and are promoted as an approach to mitigating extreme heat’s impacts, planners are not always considering whether the trees planted today will survive 20 to 30 years of warming. Reporting has shown that Southern Nevada is at risk of losing many of its shade trees due to inadequate species selection, as temperatures exceed the heat tolerance of trees that once thrived in this climate.
Changes are being made to some federally-required planning processes to require assessment of future risk. FEMA’s National Mitigation Planning Program now requires state and local governments to plan for future risks caused by climate change, land use, and population change to receive emergency disaster funds and mitigation funding. While extreme heat is a noteworthy future risk, it is not explicitly required in the new guidelines. As of April 2023, only half of U.S. states had a section dedicated to extreme heat in their Hazard Mitigation Plans.
Climate.gov, operated by NOAA, was recommended as a starting place for a library of future climate files that can be brought into planning processes and resilience analyses. Technical assistance and decision-making tools that support planners in making predictive analyses based on future extreme temperature conditions can help inform the effective design of resilient transportation systems, infrastructure investments, public health activities, and grids, and ensure accurate estimates of investment cost-effectiveness over a measure’s lifetime.
Data And Indices
Set Standards for Data Collection and Analysis
While official CDC-reported deaths from heat, approximately 1,670 in 2022, exceed those from any other natural hazard, experts widely agree this number is an undercount; true mortality is likely closer to 10,000 deaths a year under current climate conditions. Many factors compound this systematic undercount: hospitals often do not consider extreme heat in their hazard preparedness plans, there is a lack of awareness around ICD-10 coding for heat illness, and deaths caused or exacerbated by heat are often attributed to other causes. Retraining the healthcare workforce and modernizing death counting for climate change will take time, our interviewees acknowledged. Thus, decision makers need better data and surveillance systems now to address this growing public health crisis. Excess-deaths analysis could provide a proxy for the true number of heat deaths, and has already been employed by California to assess the impact of past heat waves. The CDC has used excess-death methods in tracking the COVID-19 pandemic and could apply the same analysis to “climate killers” like extreme heat to inform healthcare system planning ahead of Summer 2024 (for example, alongside forecasting tools like HeatRisk). It will be critical to set a standard methodology so that heat’s impacts can be compared across communities in the United States. True mortality figures are also essential to enhancing the benefit-cost analysis for heat mitigation and resilience.
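At its core, the excess-deaths approach compares observed all-cause mortality against a historical baseline for the same period. The sketch below illustrates the arithmetic with made-up weekly counts; a real analysis would draw on vital statistics records and adjust for population change and seasonality.

```python
import statistics

# Illustrative weekly all-cause death counts for one jurisdiction
# (made-up numbers; real counts would come from vital statistics records).
baseline_weeks = [210, 198, 205, 214, 202]  # same calendar week in prior, non-heat-wave years
observed = 260                              # deaths observed during the heat wave week

# Expected deaths: the historical baseline for this time of year.
expected = statistics.mean(baseline_weeks)
spread = statistics.stdev(baseline_weeks)

# Excess deaths: observed minus expected. This proxy captures deaths that
# certificates attributed to other causes but that the heat wave likely drove.
excess = observed - expected
print(f"Expected {expected:.0f} (+/- {spread:.0f}), observed {observed}, excess ~{excess:.0f}")
```

Even this simple comparison makes visible the gap between certified heat deaths and total heat-attributable mortality, which is exactly the gap a standard methodology would need to close consistently across jurisdictions.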
Our conversations also highlighted the data gaps around counting worker injuries and deaths due to extreme heat. Work-related heat-health impacts are often only counted if there is a hospital admission that triggers a required report; heat-exacerbated injuries (e.g., falls) are often not counted as heat-related; and harms that manifest off the job (e.g., long-term kidney impacts) go unnoticed. Studies estimate that California alone sees 20,000 heat injuries a year, while the U.S. Department of Labor (DOL) reports only 3,400 injuries a year nationally. DOL could track how overall workplace injuries correlate with temperature to develop a methodology that would yield much more accurate numbers on true heat impacts, as sketched below.
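One minimal form such a correlation analysis could take is a linear fit of daily injury counts against daily maximum temperature. The numbers below are synthetic and the linear model is an assumption for illustration; an actual DOL methodology would join injury reports with local weather records and control for industry, season, and workforce size.

```python
import numpy as np

# Synthetic daily observations: maximum temperature (degC) and reported injuries.
# A real analysis would join DOL/OSHA injury reports with weather station data.
temps_c = np.array([22.0, 25.0, 28.0, 30.0, 33.0, 35.0, 38.0, 40.0])
injuries = np.array([98, 101, 103, 107, 110, 114, 118, 121])

# Fit injuries = slope * temperature + intercept.
slope, intercept = np.polyfit(temps_c, injuries, 1)

# Express the slope as a share of the average daily injury count.
pct_per_degree = slope / injuries.mean() * 100
print(f"~{slope:.2f} extra injuries per +1 degC ({pct_per_degree:.1f}% of the daily average)")
```

On these synthetic numbers the fit lands near a 1% increase per degree Celsius, in line with published estimates that workplace injuries rise roughly 1% per 1°C; the gap between such a temperature-based estimate and officially coded heat injuries would quantify the undercount.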
Finally, anticipating the full risks of heat, given factors like existing infrastructure, social vulnerability, and levels of community resilience, remains a work in progress. For example, FEMA’s National Risk Index (which informs environmental justice tools like the Climate and Economic Justice Screening Tool and the Community Disaster Resilience Zones program) has notable limitations due to its reliance on past weather data and its narrow focus on mortality reduction, leading to underestimates of damages compared to published values for recent U.S. extreme temperature events. There is a big opportunity to develop a standard data set for extreme heat risks and vulnerabilities under current and anticipated future climate conditions. This data set could then produce high-quality, relevant tools for community decision-making (like FEMA’s Flood Maps) and inform federal screening tools and funding decisions.
Create Regulatory Oversight Infrastructure for Extreme Heat
Only a few regulatory levers to protect Americans from growing heat and build more heat-resilient communities are currently in place or in the regulatory pipeline. These include the temperature standards for senior living facilities set by CMS and OSHA’s upcoming heat standard. Regulations are needed in many more common settings: homes, schools and childcare facilities, transit, correctional facilities, and outdoor public spaces. There will also need to be expanded enforcement of these regulations, including better monitoring of outdoor and indoor temperatures. HUD, EPA, and NOAA should work to identify opportunities to expand indoor and outdoor air temperature monitoring, seeking additional funding from Congress where needed.
Future regulations for mitigating extreme heat exposure can be conceptualized in three ways: technology standards (the required presence of a cooling and/or thermal-regulating technology); behavioral guidelines and expectations (required actions to avert overexposure); and performance standards (requirements that heat exposure cannot cross a certain threshold). These potential regulations will need to be conceptualized, reviewed, and implemented by several federal agencies, as authority for different aspects of heat exposure is fragmented across the federal government. Some examples of regulatory levers identified through our interviews (and introduced in previous sections) include:
- HUD could set building performance standards that include thermal comfort and safety for its properties, backed mortgages, and the public housing it supports, as well as requirements for reducing building waste heat.
- DOE could expand its performance assessment and certification of energy efficiency products to those that also enhance thermal comfort and resilience.
- FEMA could require individuals, local governments, and state governments to do mitigation planning for extreme heat, and then make resources available to build community-scale thermal resilience.
- DOT could implement requirements that infrastructure projects not increase urban areas’ urban heat island (UHI) effects.
- EPA could further its analysis of the compounding effects of hot air and air pollution, and consider hotter air temperatures (such as those in UHIs) a risk to guaranteeing clean air.
- The Administration for Strategic Preparedness and Response (ASPR) could require hospitals to plan for surges in heat illness during heat waves in order to receive Hospital Preparedness Program funding.
Conclusion
Extreme heat, both acute and chronic, is a growing threat to American livelihoods, affecting household incomes, students’ learning, worker safety, food security, and health and wellbeing. While the policy landscape for addressing heat is nascent, this report offers recommendations for near- and long-term solutions that policymakers can consider. Complementary to FAS’ Extreme Heat Policy Sprint, we hope this report can serve as a toolkit of realistic potential actions.
Heat Hazards and Migrant Rights: Protecting Agricultural Workers in a Changing Climate
KEY TAKEAWAYS
- Urgent Heat Risks: Climate change is leading to more frequent and intense heat waves, increasing the urgency for comprehensive heat safety regulations for agricultural workers.
- Vulnerable Migrant Workers: Migrant workers face heightened risks due to low wages, inadequate healthcare, and precarious working conditions. Fear of retaliation and deportation often prevents them from reporting violations.
- Economic Impact: Lack of heat safety measures endangers workers and results in significant economic costs, including lost productivity. Employers who fail to implement heat safety measures face high costs to their businesses, while investing in worker safety can yield substantial economic benefits.
- Regulatory Progress & Challenges: OSHA is developing federal heat safety regulations, with states like California and Oregon setting effective precedents. As efforts advance, the focus must shift to ensuring equitable protection, particularly for vulnerable groups like migrant laborers. Inclusive engagement and tailored implementation strategies are crucial to bridge gaps and create effective protections.
- Community & Stakeholder Engagement: True progress in regulation requires the active involvement of all stakeholders, including workers, employers, advocacy groups, and industry leaders. Transitioning to more inclusive and direct engagement methods is essential for comprehensive worker protection.
KEY FACTS
- Farmworkers are 20x more likely to die from heat than other workers
- Heat exposure is responsible for as many as 2,000 worker fatalities in the U.S. each year
- Up to 170,000 workers in the U.S. are injured in heat stress-related accidents annually; workplace injuries increase by 1% for every 1°C rise in temperature
- The failure of employers to implement simple heat safety measures costs the U.S. economy nearly $100 billion every year
In 2008, Maria Isabel Vasquez Jimenez, a 17-year-old pregnant farmworker, tragically died from heatstroke while working in the vineyards of California. Despite laboring for more than nine hours in the sweltering heat, Maria was denied access to shade and adequate water breaks. Management never called 911 and instructed her fiancé to lie about the events. To this day, her death underscores the dire need for robust protections for those who endure extreme conditions to feed our nation.
This heartbreaking incident is not isolated. With the United States shattering over a thousand temperature records last year, the crisis of heat-related illnesses in the agricultural sector is intensifying. Rising global temperatures are making heat waves more frequent and severe, posing a significant threat to farmworkers who are essential to our food supply. While progress is being made towards comprehensive heat safety regulations, we must now focus on ensuring these protections are equitably implemented to safeguard all farmworkers from the intensifying threats of climate change, especially vulnerable groups like migrants. As individual stories shed light on the real-life tragedies of neglecting climate resilience, broader climate trends reveal a significant rise in these risks, affecting agricultural workers nationwide.
Climate change & agriculture
Rising Temperatures
Climate change poses significant challenges to global agricultural systems, threatening food security, livelihoods, and the overall sustainability of farming practices. Among the various climate-related hazards, rising temperatures stand out as a primary concern for agricultural productivity and worker health and safety. The Environmental Protection Agency (EPA) reports that the average temperature in the United States has increased by 1.8°F over the past century, with the most significant increases occurring in the last few decades. According to the Intergovernmental Panel on Climate Change, global average temperatures have been steadily increasing due to the accumulation of greenhouse gases in the atmosphere, primarily from human activities such as burning fossil fuels and deforestation. This warming trend is expected to continue, with critical impacts on agricultural operations worldwide. The Union of Concerned Scientists predicts that by mid-century, the average number of days with a heat index above 100°F in the United States will more than double, severely impacting agricultural productivity and worker health. As the climate continues to change, the direct threats to those who supply our food become increasingly severe, particularly for farmworkers exposed to the elements.
Threats to Farmworkers
In agriculture, rising temperatures worsen challenges like water scarcity, soil degradation, and pest infestations, and introduce new risks like heat stress for farmworkers. As temperatures rise, heatwaves become more frequent, intense, and prolonged, posing serious threats to the health and well-being of agricultural workers who perform physically demanding tasks outdoors. Heat stress can lead to heat-related illnesses such as heat exhaustion and heatstroke, which can be life-threatening if not properly managed. Prolonged exposure to high temperatures can impair cognitive function, reduce productivity, and increase the risk of accidents and injuries in the workplace. According to Public Citizen, from 2000 to 2010, as many as 2,000 workers died each year from heat-related causes in the United States, and farmworkers are 20 times more likely to die from heat-related illnesses than other workers.
Given the critical role of agricultural workers in food production and supply chains, protecting their health and safety in the face of escalating heat risks is imperative. Comprehensive heat safety standards and regulations are essential to mitigate the adverse impacts of climate change on farmworkers and ensure the sustainability and resilience of agricultural operations. By implementing comprehensive heat safety measures such as heat acclimatization guidelines, shade access, and regular rest breaks, agricultural employers can minimize the risk of heat-related illnesses and injuries. Effective heat standard implementation requires collaboration among policymakers, industry stakeholders, and worker advocacy groups to address climate change challenges and protect agricultural workers. Beyond the direct effects of heat, farmworkers also face compounded environmental hazards that further jeopardize their health and safety.
Compounded Hazards
While the focus of this discussion is on heat safety regulations, it is important to recognize that these regulations intersect with broader environmental and health challenges faced by agricultural workers. High temperatures often coincide with wildfire seasons, leading to increased exposure to wildfire smoke. This overlap amplifies health risks like respiratory and cardiovascular diseases, disproportionately affecting workers with underlying health conditions. Effective protection against these compounded hazards requires coordination among policymakers and industry leaders. Comprehensive standards and holistic safety measures are crucial to mitigate the risks associated with heat and to address the broader spectrum of environmental pollutants. While environmental hazards are a significant concern, the specific vulnerabilities of migrant workers introduce additional layers of risk and complexity.
Challenges faced by migrant workers
Understanding the Vulnerabilities
Migrant agricultural workers face socioeconomic, legal, and environmental challenges that increase their vulnerability to heat hazards. Economically, many migrant workers endure low wages and lack access to adequate healthcare, which complicates their ability to cope with and recover from heat-related illnesses. A study by the National Center for Farmworker Health found that 85% of migrant workers earn incomes below the federal poverty level, making it difficult for them to access necessary medical care. Legally, the fragile status of many migrant workers, including those on temporary visas or without documentation, exacerbates their vulnerability. These workers often hesitate to report violations or seek help due to fear of retaliation, job loss, or deportation.
Harsh Working Conditions
Additionally, migrant workers frequently labor in conditions that provide minimal protection against the elements. Excessive heat exposure is compounded by inadequate access to water, shade, and breaks, making outdoor work particularly dangerous during heatwaves. Furthermore, many migrant workers return after work to substandard housing that lacks essential cooling or ventilation, preventing effective recovery from daily heat exposure and exacerbating dehydration and heat-related health risks. According to the National Center for Farmworker Health, about 40% of migrant farmworkers in the United States live in homes without air conditioning.
Barriers to Protection
The barriers to effective heat protection for migrant workers are extensive and complex, and they can prevent workers from accessing crucial protections and resources. They include:
Language Diversity. The migrant worker community is incredibly diverse, encompassing individuals from various cultural and linguistic backgrounds. In the U.S. agricultural sector, over 50% of workers report limited English proficiency. This diversity can make it significantly harder for workers to understand their rights and the safety measures available to them. Even when regulations and protections are in place, the communication of these policies often fails to reach non-English speaking workers effectively, leading to misunderstandings that can prevent them from advocating for their safety and well-being. The National Agricultural Workers Survey reports that 77% of farmworkers in the United States are foreign-born, with 68% primarily speaking Spanish, highlighting the language barriers that complicate effective communication of safety regulations.
Vulnerable Visas & Immigration Status. Visa statuses and undocumented immigration also play a critical role in the vulnerability of migrant workers. Workers holding temporary visas, such as H-2A visas, often face precarious employment conditions because these visas tie them to specific employers, limiting their ability to assert their rights without fear of retaliation. Undocumented workers are particularly susceptible to exploitation and abuse by employers who may use their immigration status as leverage. Fear of deportation and legal repercussions further discourages reporting workplace incidents, perpetuating a cycle of exploitation and vulnerability.
Farmworker Housing. Farmworker housing often lacks proper cooling or ventilation, increasing heat exposure risks during off-work hours. Many agricultural workers live in substandard housing characterized by overcrowding, poor insulation, and inadequate access to air conditioning or ventilation systems. Poor living conditions worsen heat-related illnesses, particularly during extreme weather. Limited access to cooling amenities after long hours of outdoor labor exacerbates heat stress and heightens the health risks associated with heat exposure.
Recognizing these challenges is only the first step; next, we must assess how current protections measure up and where they fall short in safeguarding these vulnerable populations.
Review of existing protections
Federal Efforts
Currently, there is no overarching federal mandate specifically addressing heat exposure, leaving significant gaps in worker protection, especially for vulnerable populations like migrant workers. However, the federal government has taken several critical steps to address heat safety in the interim. OSHA has moved beyond relying solely on the General Duty Clause, launching a National Emphasis Program that prioritizes inspections on high-heat days and increases outreach in vulnerable industries. The Biden administration’s Heat Hazard Alert in July 2023 further emphasized employers’ responsibilities, while the initiation of a federal heat standard through OSHA’s rulemaking process signals a commitment to sweeping, nationwide protections.
These efforts reflect progress, but it’s crucial that they evolve to address the unique challenges faced by workers, ensuring that no one is left behind in the implementation of heat safety measures. The true test of these regulations will be their ability to safeguard those most at risk, bridging gaps in protection and creating a more resilient workforce in the face of rising temperatures.
State-Level Protections
At the state level, the picture is mixed, with states like California, Washington, and Oregon having implemented their own heat safety regulations, which provide a model for other states and potentially for federal standards. Oregon’s regulations, for instance, require employers to provide drinking water, access to shade, and adequate rest periods during high heat conditions. These measures are designed not just to respond to the immediate needs of workers but also to educate them on the risks of heat exposure and the importance of self-care in high temperatures. When Oregon implemented stricter heat safety standards, it saw a significant reduction in heat-related illnesses reported among agricultural workers. By requiring more frequent breaks, adequate hydration, and access to shade, Oregon’s regulations demonstrate how well-designed policies can decrease the incidence of heat stress and related medical emergencies. California has also taken a comprehensive approach with its Heat Illness Prevention Program, which extends protections to both outdoor and indoor workers, reflecting the broad scope of heat hazards. This program is noted for its requirements, including training programs that educate workers on preventing heat illness, emergency response strategies, and the necessity of acclimatization.
Legislative Challenges & Need for Unified Approach
Conversely, legislative actions in states like Florida and Texas represent a significant challenge to advancements in occupational heat safety. For example, Florida’s HB 433, recently signed into law, expressly prohibits local governments from enacting regulations that would mandate workplace protections against heat exposure. This legislation stalls progress and endangers workers by blocking local standards tailored to communities’ specific needs.
The contradiction between states pushing for more stringent protections and those opposing regulatory measures illustrates a fragmented approach that could undermine worker safety nationwide. Without a federal standard, the protection a worker receives is largely dependent on state policies, which may not adequately address the specific risks associated with heat exposure in increasingly hot climates. This patchwork of regulations underscores the importance of a unified federal standard that could provide consistent and enforceable protections across all states, ensuring that no worker, regardless of geographical location, is left vulnerable to the dangers of heat exposure.
With an understanding of the gaps in current heat safety regulations, the next crucial step is fostering effective stakeholder engagement to drive meaningful changes.
Engaging Stakeholders: Beyond Public Comment
While progress has been made in recognizing the need for heat safety regulations, we must now focus on ensuring equitable representation in the policy-making process. Traditional engagement methods have often fallen short in capturing the voices of those most impacted by these policies, particularly vulnerable groups like migrant agricultural workers. Regulatory agencies must rethink their strategies to include more direct and inclusive approaches, empowering workers to contribute meaningfully to policies that directly affect their safety and well-being.
Challenges in Traditional Engagement
The traditional approaches to stakeholder engagement, particularly in regulatory settings, often rely heavily on formal mechanisms like public comment periods. While these methods are structured to gather feedback, they frequently fall short of engaging those most impacted by the policies—namely, the workers themselves. Many workers, especially in labor-intensive sectors like agriculture, may not have the time, resources, or knowledge to participate in these processes. Relying on online submissions or weekday meetings during work hours can exclude many workers whose insights are crucial for shaping effective regulations. A survey conducted by the Migrant Clinicians Network found that fewer than 10% of migrant workers had participated in any form of public comment or feedback process related to workplace safety.
The complexities of these workers’ lives—ranging from language barriers to fear of retaliation—mean that conventional engagement strategies may not effectively reach or address their concerns. This gap highlights a critical need for regulatory bodies to rethink and expand their engagement strategies to include more direct and inclusive methods.
As we push for broader and more inclusive engagement, we must also consider systemic improvements that can solidify these efforts into lasting safety standards.
Looking Forward: Systemic Improvements & Community Collaboration
Protecting migrant workers from extreme heat requires systemic improvements and a coordinated approach to address gaps in current regulations and foster collaborative efforts among stakeholders. By combining the strengths of government agencies, employers, and community advocates, we can develop robust heat safety solutions that protect the well-being of vulnerable workers while supporting the productivity and resilience of the agricultural industry.
Systemic Changes Needed
To effectively protect migrant workers from the dangers of extreme heat, systemic changes are required. On the regulatory side, this includes boosting the human resources and funding available to agencies like OSHA to ensure they can effectively implement and enforce new heat safety standards. Building robust infrastructure for enforcement and consultation is crucial, as is ensuring these bodies can handle the demands of new regulatory programs. From the employer and industry perspective, federal support is essential. Incentives such as tax breaks or reimbursement programs similar to those provided under the Families First Coronavirus Response Act during the COVID-19 pandemic could motivate employers to adhere more strictly to safety standards, knowing they can recoup some costs associated with implementing safety measures like paid sick leave.
Fostering a Safe Reporting Culture
Creating a workplace that encourages safe and open communication is vital. Employers must be encouraged to establish non-retaliatory policies and to offer regular training sessions that educate workers about their rights and the importance of reporting safety violations. Reporting mechanisms should protect employee anonymity to reduce fear of retaliation. These practices can improve safety, while also enhancing worker retention and morale, contributing to a healthier workplace culture.
Role of Community & Grassroots Advocacy
Grassroots organizations and community advocates play a pivotal role in shaping and enforcing heat safety regulations. These groups often have direct insights into the needs and challenges of workers on the ground and can help tailor educational and enforcement strategies to the community context. Collaborations with these organizations can facilitate the delivery of multilingual training and legal assistance, ensuring that workers are well-informed about their rights and the safety measures in place to protect them. Additionally, these partnerships can help to monitor compliance and gather grassroots feedback on the efficacy of the regulatory measures. A notable example is the partnership between California Rural Legal Assistance and local farming communities to develop heat stress prevention training tailored to the languages and cultures of the workers. This program has improved knowledge and awareness of heat stress risks among workers, and has also empowered them to take proactive steps in managing their health during extreme conditions. Evaluations of this initiative show a marked improvement in both the adoption of safety practices and worker satisfaction, highlighting the importance of community-driven approaches in policy implementation.
To support these systemic changes, strategic investments are essential, not only to enhance regulatory capacity but to ensure the long-term health and productivity of the agricultural workforce.
The Power of Investment
Investing in heat safety offers strategic, far-reaching benefits for both workers and employers alike. By funding regulatory frameworks and workplace safety programs, organizations can effectively mitigate the impact of heat-related illnesses and injuries. Such investments can enhance regulatory agencies’ capacity to enforce standards while creating safer, more productive work environments that benefit businesses and employees. An investment approach to heat safety strengthens economic sustainability, worker well-being, and industry compliance.
Envisioning Enhanced Regulatory Capacity
In the pursuit of more effective heat safety regulations, one critical yet often overlooked aspect is the role of increased investment in regulatory agencies like OSHA. An infusion of resources into these agencies is not merely a bureaucratic expansion but a potential lifesaver. Research consistently demonstrates that increased funding for regulatory enforcement can significantly enhance compliance and improve safety outcomes. This investment empowers agencies to provide greater education and outreach, conduct more inspections, and enforce compliance more effectively, all of which are essential for protecting workers from heat-related hazards. Enhancing the capacity of organizations like OSHA to enforce heat safety standards saves lives while supporting economic efficiency and sustainability in labor-intensive industries. These investments ensure that safety regulations evolve from paper to practice, significantly impacting the lives of those they are designed to protect.
Economic Benefit
Economic analyses further support the notion that investing in worker safety is not just a cost but a strategic benefit. Studies show that every dollar spent on improving workplace safety yields substantial returns in reducing the costs of workplace injuries and deaths. For instance, implementing stringent heat safety measures not only reduces the incidence of heat-related illnesses but also cuts down on associated costs such as medical expenses, workers’ compensation, and lost workdays. This is particularly relevant in sectors like agriculture, where the physical nature of the work increases vulnerability to heat stress. The economic benefit for employers extends beyond direct cost savings. Maintaining a safe work environment enhances a company’s reputation, aids in employee retention, and increases productivity. Workers are more likely to stay with an employer they trust to prioritize their health and safety, which is crucial in industries facing labor shortages. A culture that encourages reporting and promptly addresses safety concerns can significantly reduce the risk of severe injuries and fatalities, further lowering potential liabilities and insurance costs.
Employer Benefit
A compelling example of the benefits of proactive safety measures is the Gold Star Grower Program in North Carolina. This program recognizes agricultural employers who provide housing that meets and exceeds the requirements of the Migrant Housing Act of North Carolina. This recognition serves as a badge of honor, indicating to potential employees that these employers value worker well-being. Reports suggest that workers actively seek out employers with this certification, preferring to work in environments where their health and safety are a priority. A preference like this can drive more growers to participate in safety programs, fostering a broader culture of safety and compliance within the industry.
Call for Collaborative Action
As the climate crisis continues, so does the threat of heat exposure to agricultural workers, posing grave risks to their health and to the core of our food supply systems. The necessity for comprehensive heat safety measures is now both urgent and undeniable.
Governments at every level, employers across industries, community groups, and the workers themselves must unite to create resilient, practical strategies that prioritize safety and health. The cost of inaction is stark, exceeding $100 billion annually—not only affecting the economy but leading to the irreplaceable loss of life and well-being.
We are at a critical juncture which demands a unified, strong response to heat hazards. By adopting systemic improvements and fostering a culture of collaboration and proactive communication, we have the opportunity to safeguard those most vulnerable to the impacts of climate threats.
As we progress towards implementing rigorous heat safety regulations, our focus must now shift to ensuring these protections reach all workers equitably. Let’s mobilize, from grassroots movements to national policy reforms, to create inclusive implementation strategies that protect our most vulnerable workers, particularly migrants, and secure our collective future.
For resources on how you can support these critical efforts, please refer to the guides provided in Appendix A and B, which offer strategies for advocacy, community engagement, and policy development. Together, our collective efforts can protect our most vulnerable and build a resilient path forward in the face of climate change.
APPENDIX A: RESOURCE GUIDE
Further information and support on heat-related safety and worker rights
Resources for Migrant Workers
- National Center for Farmworker Health (NCFH) – Provides health information and advocacy resources for farmworkers. Website: ncfh.org
- Farmworker Justice – Legal support and resources focusing on improving living and working conditions for migrant farmworkers. Website: farmworkerjustice.org
- Heat Stress Prevention Training Materials – Educational resources provided by the Occupational Safety and Health Administration (OSHA). Website: OSHA Heat Stress
- Legal Aid Justice Center – Provides legal aid for low-income individuals, including migrant workers, focusing on civil rights and employment issues. Website: justice4all.org
- Migrant Clinicians Network (MCN) – Offers tools and training for clinicians serving migrant communities. Website: migrantclinician.org
Resources for Employers
- OSHA’s Heat Illness Prevention Campaign – Resources to help employers prevent heat illness in outdoor workers. Website: OSHA Campaign
- AgSafe – Organization offering training, consulting, and resources aimed at ensuring the safety and health of agricultural workers. Website: agsafe.org
- Gold Star Grower Program – North Carolina Department of Labor’s recognition program for employers who exceed migrant housing regulations. Website: NC Dept. of Labor
- Protecting Workers: Guidance on Mitigating and Preventing the Spread of COVID-19 in the Workplace – While specific to COVID-19, this guide from OSHA includes valuable information on maintaining a healthy workplace that can apply to heat safety.
Resources for Policymakers
- National Institute for Occupational Safety and Health (NIOSH) – Research and guidelines on occupational safety, including heat-related risks. Website: CDC – NIOSH
- U.S. Environmental Protection Agency (EPA) – Worker Protection Standard – Regulations designed to protect farm workers from pesticide exposures but can be extended to other environmental risks. Website: EPA Worker Protection
- Congressional Research Service Reports – Provides detailed reports and analysis useful for policymakers on various topics, including agricultural worker safety and climate impacts. Website: CRS Reports
- Rural Health Information Hub – Offers resources to improve healthcare and access to healthcare services in rural communities, which can include migrant farmworkers. Website: Rural Health Info
APPENDIX B: ACTION GUIDE
Support Legislative Changes
- Join Advocacy Campaigns: Engage with organizations like the United Farm Workers (UFW) and Farmworker Justice, which are actively lobbying for stronger heat protection laws. Sign up for their newsletters and participate in their advocacy campaigns.
- Contact Your Representatives: Urge your local, state, and federal representatives to support comprehensive heat safety standards and improved working conditions for agricultural workers. Personalized letters, emails, and phone calls can make an impact.
- Petition for Change: Sign and share petitions calling for better heat safety regulations and protections for migrant workers. Platforms like Change.org often host relevant petitions that need public support.
Participate in Advocacy Efforts
- Volunteer Your Time: Volunteer with grassroots organizations and advocacy groups that are working directly with farmworkers. Your involvement can help amplify their efforts and bring about meaningful change.
- Educate and Raise Awareness: Use social media platforms to spread awareness about the issue. Share articles, statistics, and personal stories to educate your network and encourage others to take action.
- Support Community Initiatives: Donate to or partner with local nonprofits that provide resources and support to farmworkers. Organizations like the National Center for Farmworker Health and Migrant Clinicians Network rely on community support to continue their vital work.
Engage in Policy Development
- Attend Public Hearings and Forums: Participate in public hearings and forums hosted by regulatory bodies like OSHA. Your voice and presence can influence policy decisions and ensure that the needs of agricultural workers are addressed.
- Collaborate with Employers: If you are an employer or part of an agricultural business, collaborate with worker advocacy groups to implement and promote heat safety measures. Encourage a culture of safety and open communication within your organization.
A Guide to Public Deliberation
Science is advancing at an unprecedented speed, and scientists are facing major ethical dilemmas daily. Unfortunately, the general public rarely gets opportunities to share their opinions and thoughts on these ethical challenges, moving us, as a society, towards a future that is not inclusive of most people’s ideas and beliefs. Scientists regularly call for public engagement opportunities to discuss cutting-edge research. In fact, “71% of scientists [associated with the American Association for the Advancement of Science (AAAS)] believe the public has either some or a lot of interest in their specialty area.” Sadly, scientists’ calls often go unnoticed and unanswered, as there continue to be inadequate mechanisms for these engagement opportunities to come to fruition.
To Deliberate or Not to Deliberate
Public deliberation, when performed well, can lead to more transparency, accountability to the public, and the emergence of ideas that would otherwise go unnoticed. Due to the direct involvement of participants from the public, decisions made through such initiatives can also be seen as more legitimate. On a societal level, public deliberation has been shown to encourage pluralism among participants.
Despite the importance of deliberation, it’s important to note that it is not always the best way to engage the public. Planning a public deliberation event — a citizens’ panel, for instance — takes a large amount of time and resources. Plus, incentivizing a random sample of citizens to participate (which is considered the gold standard of deliberation) is difficult. It’s therefore paramount to first assess whether the topic of focus is suitable for public deliberation.
To assess the appropriateness of a deliberation topic, consider the following criteria (inspired by criteria set forth by Stephanie Solomon and Julia Abelson and the Kettering Foundation):
- Does the issue involve conflicting public opinions? Issues that involve setting priorities in healthcare, for example, may benefit from public deliberation as there is no singular correct answer; deliberation may offer a more clear and holistic view of what is best for a community, according to the community.
- Is the issue controversial? If so, deliberation can be a good tool as it brings many opinions into view and can foster pluralism as mentioned previously.
- Does the issue lack a clear-cut solution, being “intractable, ongoing, or systemic”?
- Do all available solutions have significant drawbacks?
- Does the community at large have an interest in the problem?
- Would the discussion of the issue benefit from a combination of expert and real-world experience and knowledge (what Solomon and Abelson call “hybrid” topics)? Certain issues may solely require technical knowledge but many issues would benefit from the views of the public as well.
- Are citizens and the government on the same page about the issue? If not, public deliberation can foster trust, but only if the initiative is done with the intention of taking the public’s conclusions into account.
Setting Goals
If it’s deemed that the topic is suitable for public deliberation, the next step is to set goals for the public deliberation initiative. Julia Abelson, Lead of the Public Engagement in Health Policy Project and Professor at McMaster University, has explained that one of the significant differentiating factors between successful and unsuccessful initiatives is thoughtful planning and organization — including setting clear goals and objectives organizers would like to meet by the end of deliberation. Having an end goal not only helps with planning but also allows for a realistic goal to be shared with deliberation participants. Setting unrealistic expectations as to what the deliberation process is meant to achieve — and subsequently not achieving those goals — will lead participants and citizens, in general, to lose trust in the deliberation process (and organizational body).
Is the goal of deliberation to bring new ideas into view and share those with relevant agencies (governmental or otherwise)? Is the goal instead to enact change in current policies? Is the goal to help shape new policies? The aforementioned Citizens’ Reference Panel on Health Technologies in Canada did not directly impact the government’s decisions, but served to make experts aware of a viewpoint they had not previously explored. This stands in contrast to typical “sit and listen” initiatives, which have far less capacity to encourage new ideas to emerge. In another instance, a citizens’ jury in Buckinghamshire, England was formed to discuss how to tackle back pain in the county. The Buckinghamshire Health Authority promised to implement the citizens’ recommendations (as was mandated by a charity that was supporting this public deliberation effort) — and they did.
Expanding on the idea of making promises and accountability, it’s important for the organizing body — which may or may not include a federal agency — to consider its role in implementing the conclusions of the deliberation. Promising to implement the conclusions of the deliberations can serve to invigorate discussion and make participants more engaged, knowing that their discussions can have a direct impact on future decisions. For instance, the British Columbia Biobank Deliberation involved a “commitment at the outset of the deliberation from the leaders of a proposed BC BioLibrary (now funded by the Michael Smith Foundation for Health Research) that the Bio-Library’s policy discussions would consider suggestions from this deliberation.” Researchers have suggested this may have contributed to participants’ interest in the deliberation event. Despite some examples of implementation following deliberation (such as the Buckinghamshire and Ontario examples), there continues to be a lack of adequate change based on the public’s recommendations. Another positive instance comes from NASA’s 2014 effort to involve the public in the discussion around planetary defense (in the context of asteroids) through a participatory technology assessment (PTA); the PTA appears to have helped spur the creation of NASA’s Planetary Defense Coordination Office.
Furthermore, providing updates on implementation to participants, and the public at large, would provide another crucial aspect of accountability: “explanations and justifications.” However, these updates on their own would not fulfill an organization or agency’s duty to accountability as that requires an active dialogue with the public (which is precisely why implementing the conclusions of public deliberation initiatives is important).
When to Deliberate: Agenda Setting for Citizens
As mentioned above, deliberation can happen at various points during the policymaking pipeline. It has become increasingly popular to include the public early on in the process, such as in an agenda-setting role. This allows the public not only to engage in discussions about a topic but to also set the priorities and frame how the discussions will move forward. As Naomi Scheinerman writes, “with proper agenda setting and precedent creation, the resulting […] questions would be more reflective of what the public is interested in discussing rather than of the companies, industries, and other stakeholder groups.”
A trailblazing model in citizen agenda-setting has been the Ostbelgien Model. The model involves both a permanent Citizens’ Council and ad hoc Citizens’ Panels. Though the members of the Citizens’ Council rotate (and are chosen randomly), one of the permanent roles of the Council is to select topics for the ad hoc Citizens’ Panels, with citizens having a direct hand in what issues their fellow citizens and government should tackle. Since its inception in 2019, the Citizens’ Council has asked Citizens’ Panels to tackle issues such as “how to improve the working conditions of healthcare workers” and “inclusive education.”
Framing
One of the pillars of the success of public deliberation is a well-scoped question that is framed appropriately. Issues that are framed unfairly, meaning they place emphasis on a specific part of the issue while ignoring others, can lead to inaccurate results and a loss of trust between the public and the organizers. Though this depends on the goals of the deliberation, it’s often best for questions to be specific in their scope to allow for concrete results at the end of the deliberation initiative. For example, an online deliberation session in New York City aimed to assess the public’s views on who should be given priority access to COVID-19 vaccines. One of the questions asked participants to rank the order in which they think a pre-specified list of essential workers should get access to the vaccine. This allows for discussion while retaining a clear focus.
Another example comes from climate change. Climate change can be framed in many ways — through an economic frame, a public health frame, a justice frame, and others. These various framings impact how the public reacts to the issue; in the case of the economic frame, it has led to “political divisiveness.” Focusing instead on the public health frame, for instance, led to greater agreement on policy decisions. Similarly, according to a 2023 policy paper from the Organisation for Economic Co-operation and Development (OECD), an issue like COVID-19 can be less polarizing if the framing used is about solutions to the pandemic rather than solely vaccines. Importantly, the organizers of the public deliberation initiative do not have sole control over the framing of the issue. Citizens often have a pre-existing “frame of thought.” This makes frames tricky yet essential in making it possible to appropriately and productively deliberate a topic.
Framing is implicit: participants in a deliberation are typically unaware of it, which makes it all the more crucial for organizers to be wary of the framing they employ. It thus becomes clear how seemingly unimportant factors, such as setting, also affect deliberation. According to Mauro Barisione, the framing of the setting includes:
- Who is promoting the event and who the sponsors are: Whether the public trusts the promoters/sponsors and the feelings they associate with them (including any explicit views the sponsors promote) may impact the framing of the topic at hand.
- Where is/are the deliberative session(s) being held “(i.e., institutional, academic, civil society or business organization, etc.)”: Apart from accessibility concerns, the location of the deliberative session(s) sends implicit messages to participants who may have previously had negative experiences at governmental institutions, for instance.
- Who are the witnesses and experts brought on as part of the project: This point is most obvious. Especially when it comes to less well-known topics, witnesses and experts may be the first window into a new topic for participants. A biased selection of experts (or including experts who provide unscientific evidence) can significantly impact deliberation. Moreover, an oft-neglected point worth considering is the diversity of expert witnesses who present relevant information at public deliberation events. Selecting experts is discussed in more detail in a later section, but it’s worth noting here that a diverse group of expert presenters will ensure a diversity of views and may aid in building trust with participants as well.
Selecting a Type of Public Deliberation
Another factor that merits attention at this point is the type of public deliberation being undertaken. Though public deliberation has been referred to as one entity thus far, there are many different types, including, but not limited to, citizens’ juries, planning cells, consensus conferences, citizens’ assemblies, and deliberative polls. Below are some further details about various types of public deliberation (where a source is not included below, it was adapted from Smith & Setälä).
Citizens’ juries
- Description: “In common with the legal jury, citizens’ jury assumes that a small group of ordinary people, without special training, is willing and able to take important decisions in the public interest” (Coote & Mattinson)
- Selected jurists are meant to represent a microcosm of their community (Crosby)
- Jurors are selected through a quota process that often takes into account demographics (e.g., age, gender, education and race) or jurors’ prior opinions about the question at hand (Crosby)
- Neutral moderator moderates discussions (Crosby)
- Jurors are able to determine what questions are asked of the witnesses
- Similar to planning cells (below), jurors are paid for their time (e.g., $75 per day)
- Number of participants: 12–24
- Time: 2-4 days (Armour)
- Output: Jurors put together a written report of their conclusions (Armour)
- Examples: Ned Crosby of the independent Jefferson Institute in the US initially ran them with groups of 12-24 people (Smith & Wales)
- Another example is the Oregon Citizens’ Initiative Review
Planning cells
- Description: Deliberation includes 3 phases (Participedia):
- Citizens are presented with pertinent information from multiple perspectives and ask clarifying questions
- Cells are broken up into smaller groups of 5 and work together to come up with recommendations regarding the topic
- The full cell reconvenes and each mini group presents their top recommendation, which is then assessed by all members of the group, until some final recommendations are made
- Number of participants: 25 in each cell (multiple cells running simultaneously or one after another); often have 6-10 planning cells total (Dienel)
- Time: 3-4 days (Dienel)
- Each day is split into four chunks of time, each spent on a different “thematic focus”
- The final day is used to summarize the participants’ thoughts and come to conclusions (Planning Cells Database)
- Output: Citizens’ report
- Created by the planning committee based on quantitative data aggregated from all the cells (for instance, from how participants responded to which option they favored) (Dienel)
- The first draft of the report may also be discussed with participants in a follow-up meeting
- Example: Cologne Town Square
Consensus conferences/citizens’ conferences
- Description: These conferences have been described as a citizens’ jury plus a town hall meeting (Einsiedel & Eastlick)
- Citizens learn some basic facts about the issue at hand and formulate questions they’d like addressed (Kenyon). In other words, the topic is chosen by the organizers while the concrete problems are selected by the citizens (Nielsen et al.).
- Witnesses are called to answer these questions (Kenyon) and the experts are selected entirely by citizens (Nielsen et al.)
- A steering group is often involved that ensures “the conference is balanced and just” and provides support to the organizers (Nielsen et al.)
- Members include “scientists, representatives of non-governmental organizations, policy-makers and others, who are engaged and informed concerning the topic”
- It’s recommended that fact sheets be prepared that explain basic definitions necessary for understanding the topic as well as pros and cons for all sides; these sheets will be provided to participants in preparation for discussions and the questioning of expert witnesses (Nielsen et al.)
- Introductory materials should also clearly explain the role of the consensus conference participants and how their contributions will be used to inform future decisions
- Number of participants: 10-24
- Time: 4 days
- Output: Report written by citizen participants (Kenyon)
- Example: Danish Board of Technology consensus conferences
Citizens’ assemblies
- Description: Typically larger and longer than the deliberative processes discussed above, similar to legislatures composed of regular citizens (Lacelle-Webster & Warren)
- Organizers often aim to make assemblies as representative of the public as possible through methods like stratified random sampling (Lacelle-Webster & Warren)
- More limited participation opportunities so some suggest that it’s better to call citizens’ assemblies “representative institutions” rather than a form of participatory democracy (Lacelle-Webster & Warren)
- Participants compensated for their time (Ferejohn)
- Based on the initial citizens’ assembly in British Columbia, any interested expert witnesses or groups were able to testify in front of the participants (Ferejohn)
- Participants can ask any questions of the witnesses (Ferejohn)
- A Chair, who was the only public official directly involved in the process, is able to make binding rulings as to how the deliberations proceed
- Organizers set the agenda
- Formats can vary: the inaugural 2004 British Columbia citizens’ assembly followed the format of a legislature, whereas an Irish citizens’ assembly included roundtable discussions among groups of 7-8 participants, with a facilitator and notetaker at each table (Farrell et al.)
- Number of participants: 99-150
- Time: Meetings over several months (often during weekends) (Farrell et al.)
- Output: Report and recommendations from citizens’ assembly (Ferejohn)
- Example: The 2004 British Columbia Citizens’ Assembly
Deliberative polls
- Description: Combines traditional polling with the benefits of participatory processes
- Randomly selected participants are polled on their opinions about a topic (Stanford Deliberative Democracy Lab)
- Participants receive introductory materials about the issue prior to the event (Stanford Deliberative Democracy Lab)
- A weekend-long event where participants learn more about the issues from experts (and can ask them questions) and discuss the issues with one another, with trained facilitators present (Stanford Deliberative Democracy Lab)
- At the end, participants complete the initial questionnaire once more (Stanford Deliberative Democracy Lab)
- Participants often paid for their time ($75-$200) (Participedia)
- Number of participants: 200+
- Time: One weekend (Stanford Deliberative Democracy Lab)
- Output: Differences measured between participants’ opinions before and after the deliberative polling process (Stanford Deliberative Democracy Lab); a minimal sketch of this computation appears after this list
- Example: 2008 deliberative poll on housing shortages in San Mateo County, California
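To make this output concrete, the sketch below shows one simple way such pre/post differences might be computed. This is a minimal illustration under assumed conditions, not the Stanford Deliberative Democracy Lab’s actual methodology; the questions, response scale, and data are all hypothetical.

```python
# Minimal sketch of measuring opinion change in a deliberative poll.
# Assumes each participant answers the same questionnaire before and
# after the event on a 1-10 agreement scale; all data are hypothetical.
from statistics import mean

pre_responses = {
    "build_more_housing": [4, 6, 5, 7, 3, 5],
    "increase_density_near_transit": [5, 4, 6, 5, 4, 6],
}
post_responses = {
    "build_more_housing": [6, 7, 7, 8, 5, 6],
    "increase_density_near_transit": [5, 5, 7, 6, 5, 7],
}

for question in pre_responses:
    shift = mean(post_responses[question]) - mean(pre_responses[question])
    print(f"{question}: mean shift of {shift:+.2f} points after deliberation")
```

In practice, analysts would also report per-participant changes and statistical significance, but the core output of a deliberative poll is exactly this kind of before-and-after comparison.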
A note on online deliberation
The COVID-19 pandemic forced many initiatives to shift to a fully online modality. This highlighted many of the opportunities as well as challenges that online deliberation presents. One consideration is accessibility, a double-edged sword when it comes to deliberation. Virtual deliberation alleviates the need for a venue or hotel accommodations — decreasing costs for organizers — and may allow participants to continue to go to work at the same time. However, difficulties with using technology and a lack of access to a device or an internet connection are drawbacks. Another opportunity presented by virtual deliberation is to provide more balanced viewpoints on the topic of deliberation. For instance, there are no geographical barriers as to the experts organizers can invite to speak at an event.
A concern somewhat unique to online deliberation is data privacy and security. While this can also be an issue with in-person initiatives, many tools that participants are familiar with and may prefer to use do not have robust security.
A note on cost
While the cost of many deliberation initiatives is not publicly available, the available estimates range from $20,000 (citizens’ jury) to $95,000 (consensus conference) to $2.6 million (Europe-wide deliberative poll of 4300 people) to $5.5 million (citizens’ assembly). Note that these costs come from a range of time points and locations (though they have been adjusted for inflation) and only serve as rough estimates. A major contributor to these costs, particularly for longer deliberative initiatives, is hotel or venue costs as well as the reimbursement of participants. This reimbursement is costly but a part of the founding philosophy of many types of deliberation, including that of planning cells.
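As a rough back-of-the-envelope comparison, the sketch below pairs the cost estimates above with typical participant counts drawn from the format descriptions earlier in this guide. The pairings are illustrative assumptions, since the sources do not report costs and participant counts together.

```python
# Back-of-the-envelope cost per participant for each deliberation format.
# Costs come from the estimates above; participant counts are assumed
# from the typical ranges described earlier (upper ends used here).
estimates = {
    "citizens' jury": (20_000, 24),
    "consensus conference": (95_000, 24),
    "deliberative poll (Europe-wide)": (2_600_000, 4_300),
    "citizens' assembly": (5_500_000, 150),
}

for format_name, (cost, participants) in estimates.items():
    print(f"{format_name}: ~${cost / participants:,.0f} per participant")
```

By this crude measure, large deliberative polls can be among the cheapest formats per participant, while citizens’ assemblies are by far the most expensive, reflecting their length and depth of engagement rather than inefficiency.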
Selecting Participants
Many different approaches can be taken to selecting participants for deliberative forums. Unfortunately, there are inherent trade-offs in selecting a sampling method or approach. For instance, random sampling is more in line with the principle of “equal opportunity” and may promote “cognitive diversity”— the diversity of ideas, experiences, and approaches participants bring to the event — but is prone to creating deliberation groups that are not representative of the population at large. This is particularly true when the deliberative forum has few participants. This is why, depending on the type of deliberation event (and therefore number of participants chosen), a different type of sampling may be appropriate.
Another approach is random-stratified sampling, where participants are randomly chosen and invited to participate in the deliberative event. There is often an unequal distribution among those who accept the invitation — for instance, individuals with higher socio-economic statuses may respond disproportionately more. In this case, a more representative sample may be chosen from those who responded. Quotas may also be set, such as ensuring that a certain number of female-identifying participants are included in a deliberative event. For this method, the organizers must decide on groups of individuals who are primarily affected by the topic being discussed, as well as groups often excluded from such deliberations. A deliberative forum on immigration, for instance, may call for the presence of a participant who is an immigrant to ensure polarization does not take place. In certain instances, purposive sampling — where individuals from groups whose views are specifically being sought are purposefully chosen — may also be appropriate. Furthermore, some researchers suggest including a “critical mass” of individuals from typically underserved groups. This can serve to make participants more comfortable in speaking up, ensure that the diversity of discussions is retained when participants are broken up into smaller groups (in certain forms of public deliberation), and provide a step in avoiding tokenism.
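The quota idea can be illustrated with a minimal sketch: draw random panels from the pool of people who accepted the invitation and keep the first draw that satisfies every quota. The pool, attributes, and quota numbers below are hypothetical, and this naive rejection-sampling approach does not equalize each individual’s chance of selection; that refinement is what the algorithms discussed next provide.

```python
import random

# Hypothetical pool: everyone who accepted the invitation, tagged with
# the attributes organizers stratify on (all values are illustrative).
pool = [
    {"id": i,
     "gender": random.choice(["woman", "man", "nonbinary"]),
     "age_band": random.choice(["18-34", "35-54", "55+"])}
    for i in range(500)
]

PANEL_SIZE = 24
# Minimum seats per group (illustrative quota numbers).
quotas = {("gender", "woman"): 10, ("age_band", "55+"): 6}

def draw_panel(pool, quotas, size, max_tries=10_000):
    """Rejection sampling: redraw random panels until every quota is met."""
    for _ in range(max_tries):
        panel = random.sample(pool, size)
        if all(sum(1 for p in panel if p[attr] == value) >= minimum
               for (attr, value), minimum in quotas.items()):
            return panel
    raise RuntimeError("No quota-satisfying panel found; relax quotas or grow the pool")

panel = draw_panel(pool, quotas, PANEL_SIZE)
print(f"Selected {len(panel)} participants meeting all quotas")
```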
Furthermore, there are newer methods of selecting participants that combine both random and stratified sampling — namely algorithms that try to maximize both representation and equal opportunity of participation. One instance is the LEXIMIN algorithm, which “choose[s] representative panels while selecting individuals with probabilities as close to equal as mathematically possible.” This algorithm is open-access and can be used at panelot.org.
Aside from considerations for selecting participants, it’s important to consider the selected individuals’ ability and willingness to participate. Several factors can dissuade selected individuals from taking part, including, but not limited to, the cost of missing work, the cost of childcare, transportation costs, and lack of trust in the organizing body or agency. Prohibitive costs are addressed by several of the deliberation models discussed in the “Selecting a Type of Public Deliberation” section. These models strongly suggest stipends which, at minimum, cover incidental expenses. A lack of trust is a particularly important issue to address as it can hinder the organizer’s ability to reach individuals typically left out of policymaking discussions. One approach to addressing this once again brings us to making — and critically, keeping — promises regarding the implementation of participants’ conclusions. Framing (as discussed in an earlier section) can also contribute to building trust, though, importantly, this is not a gap that can be bridged overnight. More extensive discussions of inclusion in public deliberation forums are available in the literature.
Bringing On Experts & Creating Materials
Prior to selecting the group who will participate in the public deliberation activity, steps need to be taken to organize which experts will be part of the event and create the informational material that will be provided to participants before deliberations begin.
Here, efforts must be made to ensure sufficient and balanced information is presented without creating a framing effect whereby participants enter discussions with a biased perspective. It has been found that participants readily integrate the facts and opinions presented by experts/witnesses prior to deliberation and critically engage with their points. A deliberative engagement initiative in British Columbia, Canada about biobanking brought on a variety of experts and stakeholders to present to participants. To ensure fairness, presenters were “given specific topics, limited presentation times, and asked to use terms as defined in the information booklet” that was previously provided. A unique component included in this initiative was the ability for participants to ask presenters questions in between the two deliberative session weekends, which were two weeks apart, through a website.
In addition, participants were provided with booklets and readings. In the case of the British Columbia initiative, a literature review was performed to create the booklets and background materials. Once more, the materials should provide a balance of opinions. They should include the most important facts relevant to the question at hand, some of the most common/salient approaches and points with regards to the question, and the weaknesses of each approach/point (Mauro Barisione). It is also best to keep materials succinct, with some deliberative initiatives keeping their materials to a single page.
Though the traditional approach is to have experts present prior to deliberation, other methods have also been used. For instance, a Colorado deliberation initiative focused on future water supply used an “on tap but not on top” expert approach. Rather than call experts to present information, they instead provided one-page information sheets, followed directly by deliberation. Experts were present during the deliberation session. When prompted by a participant, a facilitator would ask an expert to briefly join the group to answer the participant’s question. The approach was largely successful, though one “rogue expert” frequently interjected in a group’s discussion, providing his own opinions. One limiting factor to this approach is time; the deliberative sessions mentioned above were two hours long. But many other forms of deliberation are significantly longer, making coordinating with experts for long durations of time difficult. Despite these challenges, this approach provides an interesting way of integrating experts into the deliberation process so their expertise is best used and the participants’ questions are best answered as they arise.
Facilitation
A good facilitator or moderator is critical to the deliberation process. As explained by Kara N. Dillard, moderators set the ground rules for the discussion and prevent any one participant from dominating the session; this is called presentation. It has been found that clearly setting expectations for the discussion can lead to greater deliberative functioning — which, for our purposes, includes the exchange of ideas/reasons, equality, and freedom to speak and be heard — according to participants. Moderators also guide the discussion in two main ways: asking questions that challenge what participants have already discussed (elicitation); and connecting ideas that were previously brought up to new topics and “play[ing] devil’s advocate” to bring forth new ideas (interpretation). At the end of the session, moderators also help participants produce conclusions by asking what areas of consensus and contention were present throughout the discussion.
Moderators can take multiple approaches to facilitating, with one framework proposed by Kara N. Dillard separating moderators into three groups: passive, moderate, and involved. Passive moderators take a “backseat” approach to moderating. They often describe their role to participants as only being there to prevent a participant from dominating the conversation, rather than actively leading it. This has led to unfocused discussions and unclear conclusions. Participants often jumped around and went off-topic. Though this passive approach may work in some instances, a moderate or involved approach often leads to better deliberation.
Involved facilitators actively lead the discussion by asking questions that challenge participants to think in new ways, sometimes acting as a “quasi-participant.” In line with this, these moderators often play devil’s advocate to move the discussion in new, albeit related, directions. These moderators ask follow-up questions and “editorialize” to help participants flesh out their ideas together and aim to pinpoint points of contention so participants can further discuss them. If participants begin to veer off-topic, involved moderators will move the group back into a more focused direction while also connecting this new topic to the main question, allowing for new thoughts to emerge. These moderators take the time to sum up the main points brought up by participants after each point so conclusions become clear. Once more, this approach may not work in all instances but often leads to deeper conversations and more focused conclusions.
As implied by the name, moderate facilitators are somewhere in between passive and involved facilitators. These moderators ask questions to guide the discussion but rarely challenge the participants, instead letting them take the wheel. These moderators use the elicitation strategy frequently, an important difference between moderate and passive moderators.
Due to the skills needed to facilitate a deliberation event well, organizers or government agencies looking to organize these events may require would-be facilitators to undergo brief training.
What Comes Next
After deliberation has taken place, the next step is to write a report summarizing the conclusions of the deliberative forum. As we have seen several times with other topics, there are multiple approaches to this. One approach is to leave the report writing to the facilitators, organizers, or researchers who use their own takeaways from the deliberation (in the case of facilitators) or summarize based on recordings or transcripts (in the case of organizers or researchers). However, this method introduces bias into the process and doesn’t allow participants to be directly involved in creating conclusions or next steps.
An alternative is to allocate time towards coming up with conclusions together with participants both throughout and at the end of the deliberative session. Recall that involved facilitators frequently summarize the conclusions of the group throughout the deliberation, making this final task both more efficient and more participant-led. Participants can directly and immediately add on to or push back against the facilitator’s summary. As a guideline, Public Agenda, an organization conducting public engagement research, divides the summary into the following sections: areas of agreement, areas of disagreement, questions requiring further research, and high-priority action steps.
ALI Task Force Findings to Improve Education R&D
The Alliance for Learning Innovation (ALI) coalition, which includes the Federation of American Scientists, EdCounsel, and InnovateEdu, today celebrates the release of three task force briefs aimed at enhancing education research and development (“ed R&D”). With pressing issues such as declining literacy and math scores, chronic absenteeism, and the rise of technologies like AI, a strong ed R&D infrastructure is vital. In 2023, ALI convened three task forces to recommend ways to bolster ed R&D. The task forces focused on state and local ed R&D infrastructure, inclusive ed R&D, and the critical role of Historically Black Colleges and Universities (HBCUs), Minority-Serving Institutions (MSIs), and Tribal Colleges and Universities (TCUs) in this ecosystem.
State and Local Education R&D Infrastructure
Supporting R&D at the local level encourages an environment of continuous learning, accelerating improvements to educational methods based on new evidence and pioneering research. Therefore, given that over 90% of K-12 education funding comes from state and local sources, the ALI task force recommends that capacity-building, vision alignment, and investment in state and local education agencies (SEAs and LEAs) be prioritized. Preparing these entities to leverage R&D resources within their specific locales, in both rural and urban contexts, will enable the infrastructure to best meet the unique needs of communities and students across the country. Additionally, supporting human capacity and development, modernizing data systems, and strengthening collaborative partnerships and fellowships across research institutions and key stakeholders in the ecosystem will set the stage for more context-specific and effective ed R&D infrastructure at the state and local levels.
Inclusive Education R&D
Traditional education R&D is often dominated by privileged institutions and individuals with outsized access to capital and opportunities, sidelining the needs and perspectives of historically marginalized communities. To address this imbalance, intentional efforts are needed to create a more inclusive R&D ecosystem. The task force recommends that government actors implement multidimensional measures of progress and simplify application processes for R&D funding. Continuing dialogue on equity and inclusion will create space for identifying possible biases in approaches and processes. In sum, inclusion is imperative to achieving greater equity in education and supporting all learners of diverse backgrounds and communities.
The Role of HBCUs, MSIs, & TCUs in Education R&D
Achieving collaborative infrastructure and inclusion in ed R&D requires the strong participation of Historically Black Colleges and Universities (HBCUs), Minority-Serving Institutions (MSIs), and Tribal Colleges and Universities (TCUs). An equitable education R&D ecosystem must focus on the representation of these institutions and diverse student populations in research topics, grants, and funding to support learners from all backgrounds, particularly those of disadvantaged circumstances. Actionable steps include establishing diverse peer review panels, incentivizing grant proposals from minority-serving institutions, and creating specialized scholar programs. Additionally, programs should explicitly outline resource accessibility, leadership dynamics, funder relationships, grant processes, and inclusive language to dismantle structural inequalities and make the invisible visible.
Conclusion
Recommendations from the ALI task forces propose that sufficient funding, inclusivity, and diverse representation of higher education institutions are strong first steps on the path toward a more equitable and effective education system. The education R&D ecosystem must become a learning-oriented network committed to the same principles of innovation that it strives to promote across best practices in education and learning.
K-12 STEM Education For the Future Workforce: A Wish List for the Next Five Year Plan
This report was prepared in partnership with the Alliance for Learning Innovation (ALI), to advocate for building a better research and development (R&D) infrastructure in education. The Federation of American Scientists believes that evolving STEM education is necessary to prepare today’s students for tomorrow’s in-demand scientific and technological careers, and that doing so is also a matter of national security.
American STEM Education in Context
“This country is in the midst of a STEM and data literacy crisis,” opined Elena Gerstmann and Laura Albert in a recent piece for The Hill. Their sentiment represents a widely held concern that America’s global leadership in scientific and technological innovation, anchored in educational excellence, is being relinquished, thereby jeopardizing our economy and national security. Their message recycles a 65-year-old warning to U.S. policy makers, educators, and employers when the USSR seemingly eclipsed our innovation pace with the launch of Sputnik.
Life magazine devoted their March 1958 edition to a scathing comparison of the playful approach to STEM education in U.S. schools versus the no-nonsense rigor of Russian classrooms. The issue’s theme, “Crisis in Education,” was summed up soberly: “The outcomes of the arms race will depend eventually on our schools and those of the Russians.” America answered the bell and came out swinging. Under President Eisenhower, the National Aeronautics and Space Administration (NASA) and the Defense Advanced Research Projects Agency (DARPA) were both established in 1958, as was the National Defense Education Act that channeled billions of dollars into K-12 and collegiate STEM education. By innumerable metrics (the Apollo program, the internet, GPS, and manufacturing dominance, all fueled by an internationally envied higher education system), the United States reclaimed preeminence in STEM innovation.

Over the next four decades, tectonic shifts in demographics, economics, and politics rearranged continental competition such that complacent U.S. education systems were once again called on the carpet. In 2001, shortly before terrorists struck the World Trade Center and Pentagon, a U.S. Senate report on homeland vulnerability echoed that of Life magazine decades prior: “The inadequacies of our systems of research and education pose a greater threat to U.S. national security over the next quarter century than any potential conventional war that we might imagine.” The painfully prescient study, product of the Hart-Rudman Commission on National Security/21st Century, identified the advancement of information technology, bioscience, energy production, and space science, all overlain by economic and geopolitical destabilization, as the nation’s greatest challenge and our new Sputnik. The Commission called on reformed education systems to quadruple the number of scientists and engineers and to dramatically increase the number and skills of science and mathematics teachers. As in 1958, leaders responded boldly, creating the Department of Homeland Security in 2002 and planting the seeds for the 2007 America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science (COMPETES) Act.
Funding for research and development across federal agencies significantly increased over the decade, including a budget boost for the National Science Foundation’s grant programs supporting emergent scholars (Faculty Early Career Development Program, or CAREER), the research capacities of targeted jurisdictions (Established Program to Stimulate Competitive Research, or EPSCoR), Graduate Research Fellowships (GRF), the Robert Noyce Teacher Scholarships, the Advanced Technological Education (ATE) program, and others designed to bolster diverse talent pipelines to STEM careers. Despite increases in the number of students studying science and engineering in the U.S., there is still a significant gap in diverse representation and equitable access to opportunities in the STEM field; ensuring greater inclusion and diversity in the American science and engineering landscape is essential to engaging the “missing millions” – persistently underrepresented minority groups and women – in the nation’s STEM workforce and education programs.
Nearly a quarter century later, America is once again in a STEM talent crisis. The solutions of Hart-Rudman and of the Eisenhower era need an update. This latest Sputnik moment, unlike the space race that motivated the National Defense Education Act and the terrorism that spawned Homeland Security, is more pervasive and profound, permeating every aspect of our lives: artificial intelligence and machine learning, CRISPR (clustered regularly interspaced short palindromic repeats), quantum computing, 6G and 7G communications, semiconductors, hydrogen and other energy sources, lithium and other ionic energy storage, robotics, big data, blockchain, biopharmaceuticals, and other emergent technologies.
To relinquish the lead in these arenas would put the U.S. economy, national security, and social fabric in the hands of other nations. Our new USSR is a roulette wheel of friends and foes vying for STEM supremacy, including Singapore, Japan, China, Germany, the UK, Taiwan, Saudi Arabia, India, South Korea, and many more. Not unlike the education crises that came to a head in 1958 and in 2001, our educational Achilles heel is a lack of exposure to STEM and under-preparedness for STEM career pursuit among the majority of America’s diverse young people. Further, the U.S. Bureau of Labor Statistics projects that STEM career opportunities will grow 10.8% by 2032, more than four times faster than non-STEM occupations.
What the United States has going for it in 2024 (and was comparatively lacking in the 1950s and the early 2000s) are STEM-rich local schools, communities, and states. Powered by investments of federal agencies (e.g., Smithsonian, NSF, NASA, DOL, ED and others), state governments (governors in Massachusetts, Iowa, Alabama, for example), nonprofits (Project Lead The Way and the Teaching Institute for Excellence in STEM for example), and industries (Regeneron, Collins Aerospace, John Deere, Google, etc.), STEM is now seen as an imperative field by most Americans.
Today’s STEM education landscape presents significant opportunities and challenges. Existing models of excellence demonstrate readiness to scale. To focus on what works and to channel resources in the direction of broader impacts for maximal benefit is to answer the call of our omnipresent 2024 Sputnik.
The Current State: Future STEM Workforce Cultivation
At its root, STEM education is about workforce cultivation for high-demand and high-skill occupations of fundamental importance to American economic vitality and national security. In the ideal state, STEM education also prepares all learners to be critical thinkers who make evidence-based decisions by equipping them with analytical, computational, and scientific ways of knowing. STEM students should learn effective collaboration and problem-solving skills with an interdisciplinary approach, and feel prepared to apply STEM skills and knowledge to everyday life as voters, consumers, parents, and citizens.
Target Audiences and Service Providers
The early childhood education community (pre-K-grade 3), both in school and out-of-school (at informal learning centers), has emerged over the last decade as a prime target for boosting STEM education as research findings accumulate around the importance of early exposure to and comfort with STEM concepts and processes. Popular providers of kits and activities, curricula, software platforms, and professional development for educators include Hand2Mind (Numberblocks), Robo Wunderkind, StoryTimeSTEM (Dragonland), NewBoCo (Tiny Techies), BirdBrain Tech (Finch robot), FIRST Lego League (Discover), Museum of Science Boston (Wee Engineer), Iowa Regents’ Center for Early Developmental Education (Light & Shadow), and Mind Research (Spatial-Temporal Math).
Elementary and middle school students, both in and out of school, enjoy the richest menu of STEM programming on the market, reflecting the stronger curricular freedom to integrate content at these levels compared to high school. Popular STEM programs include Blackbird Code, Derivita Math, FUSE Studio, Positive Physics, Micro:bit, Nepris (now Pathful), Project Lead The Way (Launch and Gateway), FIRST Tech Challenge, Code.org (CS Discoveries), Bootstrap Data Science, and many more.
The secondary education STEM landscape differs from pre-K-8 in a significant way: although discrete STEM activities and programs are plentiful for integration into secondary science, mathematics, and other classes, the adoption of packaged courses or after-school enrichment opportunities is more common. Project Lead The Way and Code.org offer an array of stand-alone elective STEM courses, as do local community colleges and universities. Nonprofits and industry sources offer STEM enrichment programs such as the Society of Women Engineers’ SWEnext Leadership Academy, Google’s CodeNext, the Society of Hispanic Professional Engineers’ Virtual STEM Labs, and Girls Who Code’s Summer Immersion. Finally, a number of federal, state, nonprofit and business organizations conduct future workforce programs for targeted students, including the federal TRIO program, Advancement Via Individual Determination (AVID), Jobs for America’s Graduates (JAG), and Jobs For the Future (JFF).
Investment in STEM Education
A modestly conservative estimate of the total American investment in STEM education annually is $12 billion, nearly the equivalent of the entire budget of the National Science Foundation or the Environmental Protection Agency.
For fiscal year 2023, the White House budgeted $4.0424 billion for STEM education across the 16 agencies that make up the Subcommittee on Federal Coordination in STEM Education (FC-STEM). Total nonprofit and philanthropic investments are more elusive: funders are numerous, the origins of their dollars often overlap with state or local government (grants, for example), and definitions of STEM investment vary wildly. That said, U.S. charitable giving to the education sector totaled $64 billion in 2019; a reasonable assumption that two percent made its way to STEM education equates to over $1 billion contributed to the overall funding pie. Business and industry in the United States contribute well over $5 billion annually, a conservative estimate of the U.S. proportion of the total annual STEM education market across ten nations, according to a recent study. K-12 schools spend well over $1 billion on STEM, a modest fraction of the $870 billion total spent on K-12 across the U.S. The same would likely be true of America’s annual $700 billion higher education expenditure: minimally, $1 billion to STEM. Elusive as definitive figures can be in this space, a glaring reality is that funds are streaming into STEM education at a level where measurable results should be expected. Are resources being distributed for maximal impact? Are measures capturing that impact? Is it enough money?
There are approximately 55.4 million K-12 students across the nation. At $12 billion per year on STEM, that comes to about $217 worth of STEM education annually per young American. Is that enough to move the needle? The answer is a qualified “yes” based on Iowa’s experience. The state launched a legislatively funded STEM education program in 2012, investing on average about $4.2 million annually to provide enrichment opportunities for about one-fifth of all K-12 students, or 100,000 per year. To date, about 1.2 million youth have been served through a total investment of about $50 million. That calculates to $42 per student. The result? Among participants: increased standardized test scores in math and science; increased interest in STEM study and careers; a near doubling of post-secondary STEM majors at community colleges and universities. Thus, from Iowa’s experience, the amount of funding toward American STEM education is adequate to expect systemic gains. The qualifier is that Iowa funds flow toward increased equity (most needy are top priority), school-work alignment (career linked curriculum, professional development), and proof of effectiveness (rigorously vetted and carefully monitored programs). Variance in these three factors can separate ambitions from realities.
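As a rough cross-check of the arithmetic above, the following sketch reproduces the component sum behind the roughly $12 billion estimate and the two per-student figures. It is illustrative only; all inputs are the report’s own round numbers, expressed in Python for convenience.

```python
# Back-of-the-envelope checks using only the round estimates cited above.

# 1) Component sum behind the ~$12 billion national estimate
federal = 4.04e9        # FY23 federal STEM education budget (FC-STEM agencies)
philanthropy = 1.28e9   # ~2% of $64B charitable giving to education
industry = 5e9          # business and industry contributions
k12_schools = 1e9       # K-12 school spending on STEM
higher_ed = 1e9         # higher education spending on STEM

total = federal + philanthropy + industry + k12_schools + higher_ed
print(f"Component sum: ${total / 1e9:.1f}B")  # ~$12.3B, consistent with ~$12B

# 2) Per-student figures
print(f"National: ${12e9 / 55.4e6:,.0f} per K-12 student per year")  # ~$217
print(f"Iowa: ${50e6 / 1.2e6:,.0f} per participant served to date")  # ~$42
```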
Ambitions vs. Realities
The federal STEM education strategic plan, Charting a Course for Success: America’s Strategy for STEM Education, identified three consensus goals for U.S. STEM education: a strong STEM literacy foundation for all Americans; increased diversity, equity, and inclusion in STEM study and work; and preparation of the STEM workforce of the future. Three challenges lie between those goals and reality.
Elusive equity. The provision of quality STEM education opportunities to the Americans most in need is universally embraced yet difficult to achieve at the program level. Unequal funding of school STEM programs across urban, rural, and suburban public and private school districts equates to less experienced educators and diminished material resources (laboratories, computers, transportation to enrichment experiences) in socioeconomically disadvantaged communities. The challenge is compounded by the lack of role models to inspire and support youth of underserved subpopulations by race, ability, ethnicity, gender, and geography. Bias, whether implicit or explicit, fuels stereotype threat and identity doubt for too many individuals in schools, colleges, and workplaces, countering diversity and equity efforts.
School-work misalignment. For most learners, the school experience can seem quite different from the higher education that follows and the work and life experiences beyond. Employer and learner polls unearth misalignment in priorities: employers value in new hires skills such as relationship building, dealing with complexity and ambiguity, balancing opposing views, collaboration, co-creativity, and cultural sensitivity, in addition to expectations of work-related experience. Schools typically proclaim missions like “Educating each student to be a lifelong learner and a caring, responsible citizen,” omitting the importance of employability. Learners feel that school taught them time management, academic knowledge, and analytical skills, while experiential learning remains limited.
Elusive proof. Evidence of effect can be vexingly elusive. The 2022 progress report of the federal STEM plan clarified the difficulty in verifying reach to those most in need: the identification of participants in STEM programs can be restricted for privacy and legal reasons. The gathering of racial, ethnic, and demographic data on STEM participants may often be unreliable given self-reported or observational identifications, as well as the fleeting, often anonymous encounters typical of “STEM Night” or informal experiences at science centers, zoos, and museums.
Participant profiles aside, variability in program assessments – design and objectives – makes meaningful meta-analysis challenging, which creates difficulties in scaling promising STEM programs. “We recommend that states and programs prioritize research and evaluation using a common framework, common language, and common tools,” advised a group of evaluators recently.
Exemplars
Plentiful success stories exist at the local, regional, and national levels. The following six exemplars are each funded in whole or in part by federal and/or state grants. The first pair profiles local education systems (one in-school, one out-of-school) masterfully aligning learning experiences to career preparation. The second pair profiles a regional out-of-school STEM program powerfully documenting effects on participants and an in-school enrichment course demonstrating success. The final pair profiles a nationwide equity program successfully preparing STEM educators to effectively serve students of diversity and an exciting consortium effort aimed at refocusing the entire educational enterprise on the skills that matter most.
1.a. School-work alignment at the local level
The Barrow Community School District (BCSD) in Georgia is strongly committed to work-based learning (WBL). All 15,000 students are required to take a sequence of exploratory STEM career classes beginning in ninth grade. Fifteen career pathways are available, ranging from computing to health, manufacturing to engineering. It all culminates in an optional senior year internship serving 400 students annually. Interns earn dual-enrollment credits in partnership with local colleges and are paid by the employer host. Interns spend 7.5 to 15 hours per week at work experiences in a hospital, on a construction site, or in a production plant. The district employs a full-time WBL coordinator to oversee, administer, and evaluate the program, as well as to cultivate community employer partners. Teachers are expected to spend one week in an industry externship every three to five years. The BCSD commitment to a school experience aligned to future careers is something that every student in any district ought to be able to experience.
1.b. Diverse workforce of the future – local-to-global level
The World Smarts STEM Challenge is a community-based, after-school, real-world problem-solving experience for student workforce development. Funded by a 2021 National Science Foundation ITEST (Innovative Technology Experiences for Students and Teachers) grant in partnership with North Carolina State University, the program assigns students in the Washington, D.C. area to binational groups (arranged through a partnership with the International Research and Exchanges Board) that collaborate in solving local/global STEM issues via virtual communications. Groups are mentored by industry professionals. In the process, students develop skills in innovation, investigation, problem-solving, and global citizenship for careers in STEM. Participant diversity is a primary objective: learners of underrepresented backgrounds, including Black, Hispanic, economically disadvantaged, and female students, are actively recruited from local schools. Educator-facilitators receive professional development opportunities to build mentorship skills that support students. The end product is a World Smarts STEM Activation Kit for implementing the model elsewhere.
2.a. Proof of effect at the regional level out-of-school
NE STEM 4U is an after-school program serving elementary school youth in the Omaha, Nebraska region. Programs are hands-on, problem-based challenges relevant to children. The staff were interested in the effect of their activities on the excitement, curiosity, and STEM concept gains of participants. The instrument they chose is the Dimensions of Success (DoS) observational tool of the P.E.A.R. Institute (Program in Education, Afterschool & Resiliency). The DoS is conducted by a certified administrator who observes and rates four groups of criteria: the learning environment itself, level of engagement in the activity, STEM knowledge and skills gained, and relevancy. Through multiple cohorts over two years, the DoS findings validated the learning approach at NE STEM 4U across dimensions, though with natural variations in positive effect. The upshot is not only that this after-school model is readily replicable, but that the DoS observation tool is a thoroughly vetted, powerful, and readily available instrument that could become a “common tool” in the STEM education program evaluation community.
2.b. Proof of effect at the regional school level
From a modest New York origin in 1997, Project Lead The Way (PLTW) has blossomed into a nationwide tour de force in STEM education, funded by the Kern Foundation, Chevron, and other philanthropies. Adopted at the community school level, where trained educators integrate units at the pre-K-5 and middle school levels (Launch and Gateway, respectively) or offer courses at the secondary level (Algebra, Computer Science, Engineering, Biomedical), PLTW programs share a common focus on developing in-demand, transportable skills like problem solving, critical and creative thinking, collaboration, and communication. Career connections are a mainstay. To that end, PLTW is notable for expecting schools to form advisory boards of local employers for feedback and connections. Attitudinal surveys attest to increased student interest in STEM careers.
3.a. Equity at the national level – diversity and inclusion
The National Alliance for Partnerships in Equity (NAPE) offers a wide array of professional development programs related to STEM equity. One module is called Micromessaging to Reach and Teach Every Student. Educators in and out of school convey micro-messages to students at every encounter. Micro-messages are subtle and typically unconscious. Sometimes they are helpful – a smile or eye contact. Sometimes they can be harmful towards individuals or reveal bias towards a group to which a student may belong – a furrowed brow or a stereotypical comment. Exceedingly rare is micro-message expertise in the teacher preparatory pipeline or in standard professional development. Yet micro-messaging is tremendously influential in the self-perceptions of learners as welcome in STEM.
3.b. Equity at the national level – leveling the playing field
Durable skills – e.g., teamwork, collaboration, negotiation, empathy, critical thinking, initiative, risk-taking, creativity, adaptability, leadership, and problem-solving – define jobs of the future. AI and automation cannot replace durable skills. The nonprofit America Succeeds has championed a list of 100 durable skills grouped into 10 competencies, based on industry input. They studied state standards for college and career readiness against those competencies and prescribed remedies to states whose standards fall short (most U.S. states). Durable Skills, packaged by America Succeeds, is an equity service par excellence – every learner can command these 100 enduring skills, setting them up for success.

The Case for Increased Investment in STEM Education R&D at the Federal and State Level
Billions of dollars pour into American STEM education each year. Millions of learners and employers benefit from the investment. Outstanding programs produce undeniably successful results for individuals and organizations. And yet, “This country is in the midst of a STEM and data literacy crisis.” How can that be? Here are some of the factors in play.
Recent STEM Education/Workforce Investment Trends
The biennial Science and Engineering Indicators compiled by the National Science Board (NSB) were released in March 2024. Noteworthy findings (necessarily a couple of years old given the retrospective analysis) include:
- Local K–12 education and STEM workforce outcomes both vary widely across regions of the United States.
- Mathematics scores on national tests for U.S. elementary and secondary students declined sharply between 2019 and 2022.
- Among bachelor’s degree holders in science and engineering fields, Hispanic or Latino, Black or African American, and American Indian or Alaska Native individuals are all underrepresented.
- About one-third of master’s and doctoral degree earners in science and engineering fields at U.S. colleges and universities in 2021 were international students on temporary visas.
- Of all STEM workers, 19% are foreign-born.
- Women accounted for 35% of all STEM workers in 2021, versus 47% of all workers.
- The U.S. science, technology, engineering, and mathematics (STEM) workforce – 36.6 million people – accounts for 24% of the total U.S. workforce, up from 21% a decade ago; see the illustrative arithmetic following this list. (The federal definition of STEM does not include medicine/health.)
- The federal government funds 52% of all academic research and development taking place at colleges and universities (2021).
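These shares also imply some useful absolute numbers. A minimal illustrative sketch (our own back-of-the-envelope arithmetic, not computations from the Indicators report):

```python
# Illustrative arithmetic linking the NSB shares above. These are our own
# back-of-the-envelope computations, not figures from the Indicators report.

stem_workers = 36.6e6   # STEM workers, 2021
stem_share = 0.24       # STEM share of the total U.S. workforce

total_workforce = stem_workers / stem_share
print(f"Implied total U.S. workforce: {total_workforce / 1e6:.1f} million")  # ~152.5M

# Women are 35% of STEM workers but 47% of all workers.
women_in_stem = 0.35 * stem_workers
print(f"Women in STEM: {women_in_stem / 1e6:.1f} million")  # ~12.8M
```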
Contrasting the findings of the NSB against current federal budgets is difficult because the FY2024 appropriation for STEM education research and development is a work in progress. In comparison to FY23, the budget presented to Congress by the executive branch called for increases in STEM spending across many agencies, but not all, while the U.S. House and Senate generally propose reductions in spending. The Defense Department’s STEM education line, the National Defense Education Program, is slated for significant reduction (-7.3 percent to -20 percent). The Department of Energy’s Office of Science, which funds STEM education, is slated for a slight increase (+1.7 percent), as are the NSF’s STEM education programs (+1.6 percent). NASA’s Office of STEM Engagement is on track for a slight decrease (-0.3 percent). The Department of Agriculture’s Research and Education budget is down slightly (-1.7 percent), as is the U.S. Geological Survey’s Science Support budget, which includes human capital development (-1.2 percent). The Department of Education’s Institute of Education Sciences was slated for a significant increase by the executive branch, though slated for reduction in both the House and Senate budgets. The Department of Homeland Security’s Science and Technology budget, which includes funding for university-based centers and minority institution programs, is set for reduction (-1.3 percent to -19 percent).
Significant STEM education and workforce development support resides within the CHIPS and Science Act of 2022, which has yet to be fully funded by Congress. An overall trend of shifting R&D, including education, from federal to private sector support means greater reliance on business and industry to invest in STEM program development. The NSB Indicators report highlights this shift in R&D investment: the federal government funded 19 percent of U.S. R&D in 2021 (down from 30 percent in 2011), while the business sector now funds 75 percent.
A bottom-line interpretation is that federal investment in STEM education/workforce development, though significant, can hardly be described as a generational response to an economic and national security crisis.
Emergent Frontiers
Meanwhile, economic Sputniks are circling the globe, all driven by semiconducting silicon and germanium chips. Yet another testament to American STEM education is the home-grown invention of the chip – but chips are built mostly elsewhere: Taiwan, South Korea, and Japan. Semiconductors lie at the heart of our communications (e.g., cell phones, satellites), transportation (e.g., planes, trains, automobiles), defense (e.g., guidance systems and risk analytics), health (e.g., pacemakers, insulin pumps), lifestyle (e.g., dishwashers, Siri and Alexa), and virtually every other aspect of life and commerce. The federal government committed $53 billion through the 2022 CHIPS and Science Act to expand semiconductor talent development, research, and manufacturing in the U.S., amplified by $231 billion in commitments to semiconductor development by business and industry. Guidance through the National Strategy on Microelectronics Research was recently released by the White House Office of Science and Technology Policy. When fully realized, the CHIPS Act may come to be a generational response to an international adversarial threat far more profound than Sputnik.
Equally compelling and weighty in terms of life, liberty, and the pursuit of happiness is leadership in research, development, and governance of artificial intelligence. Extraordinary workplace and home-life evolution is underway as a result of applications of this new technology. For example, AI dramatically increases precision and thus reduces error in health care: machine learning is far superior to human eyes at image analysis – MRI or x-ray – for detecting cancer early. On a lighter note, machine learning can dramatically increase the likely appeal of new movies by compressing millions of historic data points and a sea of YouTube videos into a sure box office hit. Conversely, there are misuses of AI, both present and potential. The displacement of radiologists, movie script writers, and countless others whose routine, analytical, or creative skills can be performed by robots and neural-networked sensors is troublesome, yes, but a mild effect of AI compared to the vulnerability of our privacy, our democratic systems, business and financial integrity, and national defense structures, for starters.
The White House Blueprint for an AI Bill of Rights plants an important stake in the ground around AI safeguards. But it does not speak to the cultivation of future managers of AI. Similarly, the U.S. Department of Education report Artificial Intelligence and the Future of Teaching and Learning advises on risks of and uses for AI in diagnostics and descriptive statistics. However, guidance for preparing the upcoming generation to manage AI is not included. The National Science Foundation supports several AI-education studies that may prove worthy of scaling.
A potpourri of additional emergent trends fuels the current STEM crisis. Many are technological innovations unearthing powers of manipulation and control that society is ill-prepared to manage. Quantum computing is one such innovation, using the quantum states of subatomic particles – qubits – to store information. Computers will become exponentially faster and more powerful, possibly solving climate change while also deciphering everyone’s passwords. Relatedly, revolutions in cybersecurity and data analytics may be out ahead of societal grasp. Many educational programs at the local and national levels have emerged in this space, including eCybermission from the Army Education Outreach Program (AEOP), and Data Science Foundations from EverFi, which uses sports, finance, and other contexts for sense-making.
Not everyone needs to know how a microwave oven works in order to use it effectively. But U.S. citizens bear the responsibility for weighing the ethical, equitable, and legal dimensions of STEM advancements as voters, educators, parents, and consumers. Whether it be CRISPR alterations of individuals’ genetics, socioeconomic dimensions of factory automation, the morality of Directed Energy Weaponry (DEW), or the cost/benefit balance of climate mitigation technologies such as carbon sequestration, STEM education and workforce development need to be out front. That requires additional investment.
Supply-Demand Imbalance
Emergent technologies will drive job opportunities in the STEM arena that are expected to grow at four times the rate of jobs in other sectors in the coming decade. While it is encouraging that post-secondary STEM certificates and degrees have increased over the last decade (growing from 982,000 in 2012 to 1,310,000 in 2021), this growth is a ripple when the field needs a wave; a quick calculation below puts it in perspective. Further, significant subpopulations of Americans are underrepresented in STEM majors and jobs. Women make up just about one-third of the science and engineering workforce. While racial and ethnic subgroups including Alaska Native, Black or African American, American Indian, and Hispanic or Latino individuals comprise 30% of the total workforce, they hold just 23% of STEM jobs. Rural residency exacerbates those disparities for all subpopulations in the STEM education pipeline: while 40% of urban adults have at least a bachelor’s degree, only 25% of rural residents do.
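The “ripple” can be quantified from the degree figures just cited. A minimal sketch (our own arithmetic on the numbers above, not from the cited sources):

```python
# Growth in annual post-secondary STEM certificates and degrees, using the
# figures cited above. Our own illustrative arithmetic.

degrees_2012 = 982_000
degrees_2021 = 1_310_000
years = 2021 - 2012  # 9-year span

total_growth = degrees_2021 / degrees_2012 - 1
annual_growth = (degrees_2021 / degrees_2012) ** (1 / years) - 1

print(f"Total growth, 2012-2021: {total_growth:.0%}")   # ~33%
print(f"Average annual growth:   {annual_growth:.1%}")  # ~3.3% per year
```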
The commitment to diversify the STEM talent pipeline is a universal consensus across federal, state, local, corporate, nonprofit, and philanthropic investors in STEM education and workforce development. Numerous programs devoted to equity and inclusion are at work today with promising results, ripe for scaling.
Impact on Individuals and Society
Of all the arguments supporting increased investment in STEM education R&D to solve our current STEM crisis – tepid federal spending, ominously powerful inventions, and the dearth of talent for advancing and managing those inventions – a fourth argument eclipses each of them: STEM education improves the lives of individuals irrespective of their occupation. And in so doing, STEM education improves communities and the country at large.
Learners fortunate enough to enjoy quality STEM education develop creativity through imaginative design, interpretation, and representation of investigations. The tools they use strengthen technology literacy. The mode of discovery is highly social, honing communication and cooperation skills. With no sage-on-the-stage, they develop independence of thought. Failure happens, forging perseverance and resilience in its wake. Asking and answering questions nurtures curiosity. Defending and refuting ideas cultivates critical thinking. Truth and facts are evidence-based yet always tentative. Empathy is cultivated through alternative interpretations or points of view. And confidence to pursue STEM as a career comes from doing STEM.
The prospect of an entire population of Americans thus equipped is the most compelling case for strategically increased R&D investment in STEM education.

Policy Recommendations for Increasing the Efficacy of Education R&D to Support STEM Education
Where do federal, state, local, corporate, nonprofit, and philanthropic STEM investors look for guidance in the alignment and leveraging of their dollars to nationwide priorities? The closest we have to a “master plan” is the federal STEM education strategic plan mandated by the America COMPETES Act. Updated every five years by the White House Office of Science and Technology Policy in close collaboration with federal agencies, the 2018-2023 plan is due for an update, and it is likely the next iteration will be released soon.
While the STEM community waits, valuable input on the next iteration was recently provided to the OSTP by the STEM Education Coalition. Coalition members (numbering over 600) represent the spectrum of STEM advocates – business and industry, higher education, formal and informal K-12 education, nonprofits, and national/state policy groups – and collectively hold great sway in matters of STEM education nationally. The expiring federal STEM plan closely reflects their input, as its successor likely will as well.
Six of the following ten recommendations build upon the STEM Education Coalition’s priorities, while the remaining four recommendations address gaps in the pipeline from STEM education to workforce pathways.
In order to maximize research and development to improve STEM education, we have distilled ten recommendations:
- Devote resources (human and financial) to both the scaling of, and continued research and development in, interventions that disrupt the status quo when it comes to rural under-reach and under-service in STEM education.
- Devote resources to both the scaling of, and continued research and development in transdisciplinary (a.k.a. Convergent) STEM teaching and learning, formally and informally.
- STEM teacher recruitment and training to support learning is a high-value target for investment in both the scaling of existent models as well as research and development on this essential frontier.
- Expand authentic career-linked or work-based learning experiences to all students – earning credits while acquiring job skills – by improving coordination capacity and crediting, especially the earning of core (graduation) credits.
- Devote resources to research and development on coordination across components of the STEM education system – in school and out of school, educator preparation – at the local, state and national levels.
- Devote resources to research and development toward improved awareness/communication systems of Federal STEM education agencies.
- Devote resources to research and development on supporting the training of STEM teachers and professionals for career coaching on a real-time, as-needed basis for all youth.
- Devote resources to research and development on the expansion of local/global challenge-solution learning opportunities and how they influence student self-efficacy and STEM career trajectories.
- Devote resources to research and development of a readily accessible, easily navigable, and comprehensive digital platform for education providers to harvest effective, vetted STEM programs from across the entire producer spectrum.
- Devote resources to the design and development of a catalog of STEM/workforce education “discoveries” funded by federal grant agencies (e.g., NSF’s I-Test, DR-K12, INCLUDES, CSforAll, etc.) to be used by STEM educators, developers and practitioners.
Recommendation 1. Devote resources (human and financial) to both the scaling of, and continued research and development in, interventions that disrupt the status quo when it comes to rural under-reach and under-service in STEM education.
Aligning to the STEM Ed Coalition’s priority of “Achieving Equity in STEM Education Must Be a National Priority,” this recommendation is central to the success of STEM education. The economic and moral imperative to broaden access to quality STEM education and to high-demand STEM careers is a national consensus. Lack of access and opportunity across rural America, where 20% of all youth attend half of all school districts and where persistent inequality hits members of racial and ethnic minority groups hardest, creates a high-value target.
STEM Excellence and Leadership Project
Identifying and nurturing STEM talent in rural K-12 settings can be a challenge. The Belin-Blank Center for Gifted Education and Talent Development successfully designed and implemented the “STEM Excellence and Leadership Project” at the middle school level. Funded by the NSF’s Advancing Informal STEM Learning program, the project combined flexible professional development, wide-net-casting of students, networking within the community, and career counseling, resulting in increased creativity, critical thinking, and positive perceptions of mathematics and science.
Recommendation 2. Devote resources to both the scaling of, and continued research and development in transdisciplinary (a.k.a. Convergent) STEM teaching and learning, formally and informally.
Aligning to the STEM Ed Coalition’s priority “Science Education Must Be Elevated as a National Priority within a Transdisciplinary Well-Rounded STEM Education,” we need more investment in R&D to understand the transdisciplinary STEM teaching and learning models that improve student outcomes. America’s formal education model remains largely reflective of the 1894 recommendations of the Committee of Ten: annually teach all students History, English, Mathematics, Physics, Chemistry, etc. This prevailing “layer cake” approach serves transdisciplinary education poorly. Even the Next Generation Science Standards, upon which state and district science standards are largely based, focus on developing “an in-depth understanding of content and develop key skills…” All modern STEM-related challenges facing Generations Z, Alpha, and Beta require an entirely different brand of education – one of transdisciplinary inquiry.
USPTO Motivates Young Innovators and Entrepreneurs
The United States Patent and Trademark Office (USPTO)’s National Summer Teacher Institute (NSTI) on Innovation, STEM, and Intellectual Property (IP) trains teachers to incorporate concepts of making, inventing, and intellectual property creation and protection into classroom instruction, with the goal of inspiring and motivating young innovators and entrepreneurs. To date, the program claims 22,000 hours of IP and invention education training for 444 teachers in 50 states – 110 of whom have inventions of their own – now equipped to spread the power of invention education and IP to hundreds of thousands of learners across the country and the world. We should better understand the program components that enable this kind of transdisciplinary learning.
Recommendation 3. STEM teacher recruitment and training to support learning is a high-value target for investment in both the scaling of existent models as well as research and development on this essential frontier.
Aligning to the STEM Ed Coalition’s priority “Increase the Number of STEM Teachers in Our Nation’s Classrooms,” we need to deploy more education R&D to address America’s well-documented STEM teacher shortage. But the shortage is only half of the challenge we face. The other half is equipping teachers to authentically teach STEM, not merely a discipline underneath the STEM umbrella. Efforts such as the NSF’s Robert Noyce Teacher Scholarship program and the UTeach model support the production of excellent teachers of mathematics and science, but not STEM overall. Teaching in a convergent (transdisciplinary) fashion, through collaborative community partnerships, on local/global complex issues is beyond the scope and capacity of traditional teacher preparatory models.
Example Programs
Educators can be equipped to teach STEM (1) during their pre-professional preparation and (2) through in-service professional development for disciplinary instructors. Promising examples of both are flourishing.
- STEM Teaching Certificate. A few U.S. states and some national organizations have built STEM licenses and endorsements. Georgia State University’s STEM Certificate program trains teachers to bring a convergent STEM approach to whatever course they teach: “[candidates] figure out how to work across their schools, with the arts, with connections to other subjects.”
- Iowa now offers STEM teaching endorsements featuring integrated methodology coursework, and a field experience in a STEM job internship or research.
- The National Institute for STEM Education offers a certificate based on 15 competencies including “argumentation” and “data utilization.”
- In-service STEM Externships. Teachers in industry externships discover workplace connections and durable skills important to build in classrooms. Numerous businesses (e.g., 3M), organizations (e.g. Aerospace/NASA), and states (e.g., Iowa’s NSF ITEST funded externships) conduct variations on the concept, with compelling results.
Recommendation 4. Expand authentic career-linked or work-based learning experiences to all students – earning credits while acquiring job skills – by improving coordination capacity and crediting, especially the earning of core (graduation) credits.
Aligning to the STEM Ed Coalition’s priority to “Support Partnerships with Community Based STEM Organizations, Out of School Providers and Informal Learning Providers,” education R&D needs to better understand career-based learning models that work and deploy these evidence-based practices at scale.
Example Programs
With all 50 U.S. states aggressively pursuing work-based learning (WBL) policies and support, there is an opportunity to study and codify what states are learning in order to improve and iterate faster. According to the Education Commission of the States, 33 states have a definition for WBL, though definitions vary. Nearly all states report WBL as a state strategy in their Workforce Innovation and Opportunity Act (WIOA) profile. Twenty-eight states legislate funding to support WBL. Fewer than half of all states permit WBL to count for graduation credits. Of all states, Tennessee presents a particularly aggressive WBL profile worthy of scaling and replication.
Recommendation 5. Devote resources to research and development on coordination across components of the STEM education system – in school and out of school, educator preparation – at the local, state and national levels.
Aligning to the STEM Ed Coalition’s priority to “Take a Systemic Approach to Future STEM Education Interventions,” more R&D should be deployed to study ecosystem models and understand the components that lead to student outcomes.
The STEM learning that takes place during the K-12 school day may or may not mesh well with the STEM learning that takes place at museum nights or at summer camp. In both instances, it may or may not align well with local, state, or national assessments. The preparation of educators is widely variable. The curricular content classroom-to-classroom and state-to-state varies. To drop novel grant-funded interventions into the mix is a random act of hope.
Example Programs
STEM Learning Ecosystems now number over 100 across the U.S., providing vertebral backbone to a national coordinative skeleton for STEM education. Formally designated by their membership in the STEM Learning Ecosystems Community of Practice supported by the Teaching Institute for Excellence in STEM (TIES), they each unite “…pre-K-16 schools; community-based organizations, such as after-school and summer programs; institutions of higher education; STEM-expert organizations, such as science centers, museums, corporations, intermediary and non-profit organizations and professional associations; businesses; funders; and informal experiences at home and in a variety of environments” to “…spark young people’s engagement, develop their knowledge, strengthen their persistence and nurture their sense of identity and belonging in STEM disciplines.” Every one of America’s 20,000 cities and towns ought to have a STEM Ecosystem. Just 19,900 to go.
Recommendation 6. Devote resources to research and development toward improved awareness/communication systems of Federal STEM education agencies.
Aligning to the STEM Ed Coalition’s priority to “Clarify and Define the Role of Federal Agencies and OSTP in Supporting STEM Education,” we should utilize R&D and inspiration from other fields to ensure we are propagating knowledge and systems in ways that foster increased transparency and evidence-use.
Awareness is the weak link in the chain of federal STEM education outreach to consumers at local levels. Seventeen federal agencies engage in STEM education via 156 programs spanning pre-K-12 formal and informal, higher education, and adult education.
In 2018-19, a strong push was put forth by the OSTP and the Federal Coordination in STEM subcommittee (FC-STEM) to build STEM.gov or STEMeducation.gov in the spirit of AI.gov and Grants.gov – a one-stop clearinghouse through which Americans could explore and discover funding, programs, and expertise in STEM. To date, the closest analog is https://www.ed.gov/stem.
Example Programs
Discrete programs of various federal agencies have employed clever tactics for awareness and communication, as described in the 2022 Progress Report on the Implementation of the Federal STEM Education Strategic Plan. The AmeriCorps program, for example, partnered with Mathematica to build a web-based interactive SCALER tool usable by education professionals, local education agencies, state education agencies, nonprofits, state and local government agencies, universities and colleges, tribal nations, and others to request participants to address local challenges they have identified, including in STEM. Similarly, the National Institute of Standards and Technology launched the NIST Educational STEM Resource registry (NEST-R) to provide wide access to NIST educational and workforce development content, including STEM resource records. Can the concept be broadened to a grand unifying collective?
Recommendation 7. Devote resources to research and development on supporting the training of STEM teachers and professionals for career coaching on a real-time, as-needed basis for all youth.
Gen Z and Gen Alpha may end up in jobs like machine learning tech, molecular medical therapist, cryptocurrency auditor, big data distiller, climate change mitigator, or jetpack mechanic. From whom can they expect good career coaching? It is unrealistic to expect their school counselors to keep up; with an average caseload of 385 students across all disciplines, their hands are full. STEM teachers, both the disciplinary and the integrated type, are best positioned to take on more responsibility for career coaching, with the help of counselors, administrators, and librarians – in fact, it is an all-hands-on-deck challenge.
Example Programs
Meaningful Career Conversations is a program begun in Colorado and now spreading to other states. It is a light, four-hour training experience that equips educators, and others with whom youth come into contact, to conduct conversations that steer students toward reflection, exploration, and consideration of career pathways of interest. Trainings are based upon starters and prompts that get students talking about and reflecting on their strengths and interests, such as “What activities or places make you feel safe and valued? Why?” It is not a silver bullet, but a model of distributed responsibility which, by engaging core teachers and other adults in career guidance, can help more students find their way toward a STEM career.
Recommendation 8. Devote resources to research and development on the expansion of local/global challenge-solution learning opportunities and how they influence student self-efficacy and STEM career trajectories.
The standardization of a vision for STEM in classrooms across America will take time and resources. In the meantime, programs like MIT Solve can fast-track authentic learning experiences in school and after school. It is the ultimate student-centeredness to invite groups of youth to think big – to identify challenges about which they are enthusiastic and tap all imaginable resources in dreaming up solutions – to command their own learning.
Example Programs
Common in higher education are capstone projects, applied coursework, even entire college missions (e.g., Olin College) that center the student learning experience around local/global challenges and solutions.
For citizens of all ages there are opportunities like Changemakers Challenges, and the “Reinvent the Toilet” competition of the Gates Foundation.
At the K-12 level, FIRST Lego League teams learn about robotics through humanitarian themes such as adaptive technologies for the disabled. The World Food Prize offers student group projects focused on global food security challenges. Of similar format are Future Cities and Invention Convention. These well-evaluated programs are prime for expansion or replication.
Recommendation 9. Devote resources to research and development of a readily accessible, easily navigable, and comprehensive digital platform for education providers to harvest effective, vetted STEM programs from across the entire producer spectrum.
More than 50 different programs are named in this paper, each an exemplar, a mere snapshot of the STEM programs available to the pre-K-12 community in and out of school. Therein lies a challenge/opportunity uniquely defining this moment in American educational history compared to the 1958 and 2001 crises: an embarrassment of riches.
Example Programs
The number of databases and resource catalogs on STEM education programs available to educators is almost as overwhelming as the number of programs themselves. A few standouts help dampen the decibels (though none are perfect):
- What Works Clearinghouse (WWC). Established in 2002 under the Institute of Education Sciences at the U.S. Department of Education, the WWC does the hard work for educators of reviewing the research to make evidence-based recommendations about instruction – a priceless service. The trick is distillation: its goal of digesting and disseminating education research gets the material down to the level of curriculum developers, publishers, teacher-trainers, etc., but it can be overwhelming for casual-shopping educators.
- STEMworks Database. Born under Change The Equation in 2012 and acquired by WestEd in 2017, STEMworks is a tool for sifting through the noise, using a rigorous rubric (Design Principles) to present sure-fire winning STEM programs to educators and organizations. Providers of programs (kits, courses, software, lessons) submit applications for expert review; the result is a “searchable honor-roll” of high-quality STEM. The hitch? Relatively few providers apply, especially not emergent or experimental programs that have yet to acquire robust impact evidence.
Recommendation 10. Devote resources to the design and development of a catalog of STEM/workforce education “discoveries” funded by federal grant agencies (e.g., NSF’s I-Test, DR-K12, INCLUDES, CSforAll, etc.) to be used by STEM educators, developers and practitioners.
This recommendation relates to recommendation #9, except that it expressly regards federal programs, and to recommendation #6, except that it proposes not a mere roster of offerings but a vetted (and user-friendly) What Works Clearinghouse-style catalog of all prior grants that yielded empirical support for preK-12 STEM, across all agencies. What a treasure-trove of proven interventions and innovations across NSF, ED, DOE, DoD, and others – mostly unknown to practitioners across the United States.
Each federal agency currently posts STEM opportunities at its website (e.g., http://www.ed.gov/stem, http://dodstem.us/, http://www.nsf.gov/funding, http://www.nasa.gov/education, https://science.education.nih.gov/). These tools are valuable, but a desperate need remains for a singular, STEM.gov-style searchable landing page.
There must be a way to view what worked across the thousands of R&D projects funded by these agencies – an online shopping mall for successful preK-12 STEM curricula, teaching approaches, equity practices, virtual platforms, and more. CoSTEM could create a “STEM Ideas that Work” landing page to ensure that emerging research insights are captured in systematic and accessible ways.
Example Programs
The Ideas That Work resource is an analog. Curated by the Office of Special Education Programs at the U.S. Department of Education, it is a searchable database that includes all of the office’s grants, past and current. Special educators and families can search, e.g., “behavioral challenge,” yielding resources and toolkits, training modules, tip sheets, etc.

Recommended Actions of ALI and Other Stakeholders
While we hope to see many of these recommendations in the forthcoming Five Year STEM Plan, to actualize these recommendations, it will take multiple actors working together to advance the STEM education field.
The Alliance for Learning Innovation has perhaps the most potent of tools among STEM/workforce stakeholders to effect change: communication.
ALI should host events, publish white papers, develop convenings, and deploy mass media and other awareness and advocacy modes to rally its august collective of member organizations toward amplifying America’s rural STEM equity opportunity, career-coaching capacity, educator-employer partnership potential, and convergence approach to learning, along with the six other recommendations of this report – doing more to prepare the future STEM workforce than any other action, including investment.
Investment is a close second-most impactful action ALI can take. If all STEM investors – federal, state, corporate, and philanthropic – aligned around a finite array of pressing priorities served by a proven set of interventions (the very function of this report), the collective impact would transform systems. What it would take is an aggregator. ALI or a designee organization, functioning as an agent for businesses, philanthropies, and other STEM investors, could make funding recommendations (or, more ambitiously, pool investor funds) based on consensus goals of the STEM cooperative, acting to focus investments accordingly.
Federal Agencies have made significant gains toward cooperative and complementary STEM education support by sustaining interagency working groups on Computational Literacy, Convergence, Strategic Partnerships, Transparency & Accountability, Inclusion in STEM, and Veterans and Military Spouses in STEM. As a result, improvements are being made in coordination and in transparency about federal education R&D investments, especially between the National Science Foundation and the Department of Education. And yet, more needs to be done.
- As detailed in the ALI FY 2024 Appropriations Letter to Congress, the Institute of Education Sciences should establish a National Center for Advanced Development in Education (NCADE) for nurturing innovations supporting the rural STEM diversity pipeline, STEM educator preparation, school-business partnerships, and other recommendations of this report. Programs supported by the future NCADE, as well as existent agency grants, must be presented to the nation’s STEM community – whether successes or failures – in a transparent, readily accessible dashboard fashion to inform advancement of the field at large.
- All member agencies of the Subcommittee on Federal Coordination in Science, Technology, Engineering, and Mathematics Education (FC-STEM) should incentivize research and development around the ten recommendations of this report given their alignment to the goals and priorities of the federal STEM education strategic plan. Especially, the National Science Foundation and Department of Education should lead in driving the advancement of transdisciplinary (convergent) STEM education, work-based or career-linked learning, the synchronization of in-school and out-of-school STEM education, educator career-coaching capacity, and the development of rural, diverse STEM workforce talent.
Business, Industry, and Philanthropic Organizations have the ability to pilot or expand proven programs to national scale, as many examples herein attest. However, the impact of the investments of the private sector may fall short of systemic change due to a smorgasbord of pet programs chosen by each entity, leading to incremental rather than wholesale progress.
Business, industry, and philanthropic investors in STEM education should pool their resources around a finite array of proven programs for maximal, collective impact. A functional intermediary such as the Alliance for Learning Innovation could represent the interests of all non-government STEM funders by winnowing the horde of pre-K-12 STEM education programs to only those most effective at achieving consensus goals and priorities. The outcome might be a Consumer Reports-style top-rated performers menu that concentrates investments, amplifying impact. Like federal agencies, non-government funders should consider driving the advancement of transdisciplinary (convergent) STEM education, work-based or career-linked learning, the synchronization of in-school and out-of-school STEM education, educator career-coaching capacity, and the development of rural, diverse STEM workforce talent.
States are best positioned to help local education/workforce organizations meet the human resource challenges and the material challenges inhibiting full production of future workers for high-demand careers. It is state government that sets the policies that determine practices.
- The qualification to teach K-12 STEM, for example, should include mandatory experience in the world of commerce, such as summer externships.
- The certification of higher education institutions to train teachers should include a similar expectation that faculty demonstrate competency in career coaching and employer partnerships, as well as disciplinary convergence around local/global challenges that inspire youth.
- State education regulations governing seat time, graduation requirements, testing and grades, educator licensing, field experience liability, and other rules known to inhibit experiential, convergent, equitable learning should be revised.
- All such changes should be accompanied by the creation of Statewide Longitudinal Data Systems from Pre-K through workforce for transparent assessment, as called for by the Alliance for Learning Innovation.
K-12 formal and informal education at the daily practical level bears the greatest responsibility to act on behalf of the future STEM workforce. Insofar as government and non-government funded programs support them, state policies empower them, and preparatory trainings equip them, educators should seize this moment in history to strengthen American economic vitality and national security one student at a time:
- Practice equitable behavior by which every learner belongs in STEM;
- Coordinate learning activities across in-school and out-of-school experiences;
- Connect classroom to careers;
- Empower learners to cross disciplinary boundaries, exploring complex local and global challenges;
- Determine curricular inclusions based on evidence of effect.
Others at the table include post-secondary institutions, media outlets, faith communities, local trade and professional societies, social service providers, families, and citizens at-large. Each should contribute to the goal of producing a vibrant future workforce by advocating for education research and development policies at the state and federal levels and by partnering with formal and nonformal learning organizations to inspire tomorrow’s innovators in today’s classrooms.

Conclusion
American competitiveness through innovation is driven by leading-edge education systems. Legitimate concern about whether those systems can maintain their lead surfaces during periods of vulnerability: when the nation was eclipsed in the space race, comparatively under-armored in military advancement, or surpassed in the advancement of information technology. To relinquish leadership in innovation is a threat to the U.S. economy and national security. In response to periodic threats to American innovation preeminence, bold investments in STEM education have produced waves of talent for securing the helm.
This era is different. Myriad fronts for innovation advancement – automation, machine learning, molecular medicine, energy transformation, cybersecurity – each harboring an existential challenge, heighten the imperative for action to an unprecedented level. And yet, the U.S. has never been more prepared to act. A wealth of pre-K-12 STEM programs and infrastructure stands in testament to legacy investments by the federal government and the private sector. This time, the challenge is to engage a broader swath of the population, especially those underserved and underrepresented in STEM programs of the past. And in tight budgetary times, broadened opportunities must utilize evidence-based solutions proven to work, whether in the realm of teacher preparation, equity and inclusion, early learning, informal education, community engagement, mathematics, coding, quantum physics, or all of the above and more.
The best time to invest is when the pathway to success is clear. The tools and the know-how for producing tomorrow’s STEM workforce reside within pre-K-12 systems today. For public and private investors alike, there is an opportunity for amplification through collective impact. By collectively identifying high-impact solutions transparent in design and indisputable in effect, aligning resources for surgical precision rather than shotgun spray, and scaling known winners to all young Americans, the current challenge to U.S. innovation leadership will be met. Enough with moving the needle. It is time to pin the needle, shattering the gauge.
Predicting Progress: A Pilot of Expected Utility Forecasting in Science Funding
Read more about expected utility forecasting and science funding innovation here.
The current process that federal science agencies use for reviewing grant proposals is known to be biased against riskier proposals. As such, the metascience community has proposed many alternate approaches to evaluating grant proposals that could improve science funding outcomes. One such approach was proposed by Chiara Franzoni and Paula Stephan in a paper on how expected utility — a formal quantitative measure of predicted success and impact — could be a better metric for assessing the risk and reward profile of science proposals. Inspired by their paper, the Federation of American Scientists (FAS) collaborated with Metaculus to run a pilot study of this approach. In this working paper, we share the results of that pilot and its implications for future implementation of expected utility forecasting in science funding review.
Brief Description of the Study
In fall 2023, we recruited a small cohort of subject matter experts to review five life science proposals by forecasting their expected utility. For each proposal, this consisted of defining two research milestones in consultation with the project leads and asking reviewers to make three forecasts for each milestone:
- The probability of success;
- The scientific impact of the milestone, if it were reached; and
- The social impact of the milestone, if it were reached.
These predictions can then be used to calculate the expected utility, or likely impact, of a proposal, and to design and compare potential portfolios.
Key Takeaways for Grantmakers and Policymakers
The three main strengths of using expected utility forecasting to conduct peer review are:
- For reviewers, it’s a relatively light-touch approach that encourages rigor and reduces anti-risk bias in scientific funding.
- The review criteria allow program managers to better understand the risk-reward profile of their grant portfolios and more intentionally shape them according to programmatic goals.
- Quantitative forecasts are resolvable, meaning that program officers can compare the actual outcomes of funded proposals with reviewers’ predictions. This generates a feedback/learning loop within the peer review process that incentivizes reviewers to improve the accuracy of their assessments over time.
Despite the apparent complexity of this process, we found that first-time users were able to successfully complete their review according to the guidelines without any additional support. Most of the complexity occurs behind-the-scenes, and either aligns with the responsibilities of the program manager (e.g., defining milestones and their dependencies) or can be automated (e.g., calculating the total expected utility). Thus, grantmakers and policymakers can have confidence in the user friendliness of expected utility forecasting.
How Can NSF or NIH Run an Experiment on Expected Utility Forecasting?
An initial pilot study could be conducted by NSF or NIH by adding a short, non-binding expected utility forecasting component to a selection of review panels. In addition to the evaluation of traditional criteria, reviewers would be asked to predict the success and impact of select milestones for the proposals assigned to them. The rest of the review process and the final funding decisions would be made using the traditional criteria.
Afterwards, study facilitators could take the expected utility forecasting results, construct an alternate portfolio of proposals that would have been funded had that approach been used, and compare the two portfolios. Such a comparison would yield valuable insights into whether—and how—the types of proposals selected by each approach differ, and whether their use leads to different considerations arising during review. Additionally, a pilot assessment of reviewers’ prediction accuracy could be conducted by asking program officers to assess milestone achievement and study impact upon completion of funded projects.
Findings and Recommendations
Reviewers in our study were new to the expected utility forecasting process and gave generally positive reactions. In their feedback, reviewers said that they appreciated how the framing of the questions prompted them to think about the proposals in a different way and pushed them to ground their assessments with quantitative forecasts. The focus on just three review criteria – probability of success, scientific impact, and social impact – was seen as a strength because it simplified the process, disentangled feasibility from impact, and eliminated biased metrics. Overall, reviewers found this new approach interesting and worth investigating further.
In designing this pilot and analyzing the results, we identified several important considerations for planning such a review process. While these considerations are complex, engaging with them tended to provide value by making implicit project details explicit and encouraging clear definition and communication of evaluation criteria to reviewers. Two key examples are defining the proposal milestones and creating impact scoring systems. In both cases, reducing ambiguities in the goals to be achieved, developing an understanding of how outcomes depend on one another, and creating interpretable and resolvable criteria for assessment will help ensure that the desired information is solicited from reviewers.
Questions for Further Study
Our pilot only simulated the individual review phase of grant proposals and did not simulate a full review committee. The typical review process at a funding agency consists of first, individual evaluations by assigned reviewers, then discussion of those evaluations by the whole review committee, and finally, the submission of final scores from all members of the committee. This is similar to the Delphi method, a structured process for eliciting forecasts from a panel of experts, so we believe that it would work well with expected utility forecasting. The primary change would therefore be in the definition and approach for eliciting criterion scores, rather than the structure of the review process. Nevertheless, future implementations may uncover additional considerations that need to be addressed or better ways to incorporate forecasting into a panel environment.
Further investigation into how best to define proposal milestones is also needed. This includes questions such as, who should be responsible for determining the milestones? If reviewers are involved, at what part(s) of the review process should this occur? What is the right balance between precision and flexibility of milestone definitions, such that the best outcomes are achieved? How much flexibility should there be in the number of milestones per proposal?
Lastly, more thought should be given to how to define social impact and how to calibrate reviewers’ interpretation of the impact score scale. In our report, we propose a couple of different options for calibrating impact, in addition to describing the one we took in our pilot.
Interested grantmakers, both public and private, and policymakers are welcome to reach out to our team if interested in learning more or receiving assistance in implementing this approach.
Introduction
The fundamental concern of grantmakers, whether governmental or philanthropic, is how to make the best funding decisions. All funding decisions come with inherent uncertainties that may pose risks to the investment. Thus, a certain level of risk-aversion is natural and even desirable in grantmaking institutions, especially federal science agencies, which are responsible for managing taxpayer dollars. However, without risk, there is no reward, so the trade-off must be balanced. In mathematics and economics, expected utility is the common metric assumed to underlie all rational decision making. Expected utility has two components: the probability of an outcome occurring if an action is taken and the value of that outcome, which roughly correspond to risk and reward. Thus, expected utility would seem to be a logical choice for evaluating science funding proposals.
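Formally (a standard textbook formulation, not a quotation from Franzoni and Stephan), the expected utility of an action sums the value of each possible outcome weighted by its probability:

$$\mathrm{EU}(a) = \sum_i P(o_i \mid a)\, U(o_i),$$

where $P(o_i \mid a)$ is the probability that outcome $o_i$ occurs if action $a$ is taken and $U(o_i)$ is the value of that outcome.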
In the debates around funding innovation, though, expected utility has largely flown under the radar compared to other ideas. Nevertheless, Chiara Franzoni and Paula Stephan have proposed using expected utility in peer review. Building on their paper, the Federation of American Scientists (FAS) developed a detailed framework for how to incorporate expected utility into a peer review process. We chose to frame the review criteria as forecasting questions, since determining the expected utility of a proposal inherently requires making some predictions about the future. Forecasting questions also have the added benefit of being resolvable – i.e., the true outcome can be determined after the fact and compared to the prediction – which provides a learning opportunity for reviewers to improve their abilities and identify biases. In addition to forecasting, we incorporated other unique features, like an exponential scale for scoring impact, that we believe help reduce biases against risky proposals.
With the theory laid out, we conducted a small pilot in fall of 2023. The pilot was run in collaboration with Metaculus, a crowd forecasting platform and aggregator, to leverage their expertise in designing resolvable forecasting questions and to use their platform to collect forecasts from reviewers. The purpose of the pilot was to test the mechanics of this approach in practice, identify any additional considerations that need to be thought through, and surface potential issues that need to be solved. We were also curious whether any interesting or unexpected results would arise from how we chose to calculate impact and total expected utility. It is important to note that this pilot was not an experiment, so we did not have a control group against which to compare the results of the review.
Since FAS is not a grantmaking institution, we did not have a ready supply of traditional grant proposals to use. Instead, we used a set of two-page research proposals for Focused Research Organizations (FROs) that we had sourced through separate advocacy work in that area.1 With the proposal authors’ permission, we recruited a cohort of twenty subject matter experts to each review one of five proposals. For each proposal, we defined two research milestones in consultation with the proposal authors. Reviewers were asked to make three forecasts for each milestone:
- The probability of success;
- The scientific impact, conditional on success; and
- The social impact, conditional on success.
Reviewers submitted their forecasts on Metaculus’ platform; in a separate form they provided explanations for their forecasts and responded to questions about their experience and impression of this new approach to proposal evaluation. (See Appendix A for details on the pilot study design.)
Insights from Reviewer Feedback
Overall, reviewers liked the framing and criteria provided by the expected utility approach, while their main critique was of the structure of the research proposals. Excluding critiques of the research proposal structure, which are unlikely to apply to an actual grant program, two-thirds of the reviewers expressed positive opinions of the review process and/or thought it was worth pursuing further given drawbacks with existing review processes. Below, we delve into the details of the feedback we received from reviewers and its implications for future implementation.
Feedback on Review Criteria
Disentangling Impact from Feasibility
Many of the reviewers said that this model prompted them to think differently about how they assess the proposals and that they liked the new questions. Reviewers appreciated that the questions focused their attention on what they think funding agencies really want to know and nothing more: “can it occur?” and “will it matter?” This approach explicitly disentangles impact from feasibility: “Often, these two are taken together, and if one doesn’t think it is likely to succeed, the impact is also seen as lower.” Additionally, the emphasis on big picture scientific and social impact “is often missing in the typical review process.” Reviewers also liked that this approach eliminates what they consider biased metrics, such as the principal investigator’s reputation, track record, and “excellence.”
Reducing Administrative Burden
The small set of questions was seen as more efficient and less burdensome on reviewers. One reviewer said, “I liked this approach to scoring a proposal. It reduces the effort to thinking about perceived impact and feasibility.” Another reviewer said, “On the whole it seems a worthwhile exercise as the current review processes for proposals are onerous.”
Quantitative Forecasting
Reviewers saw benefits to being asked to quantify their assessments, but also found it challenging at times. A number of reviewers enjoyed taking a quantitative approach and thought that it helped them be more grounded and explicit in their evaluations of the proposals. However, some reviewers were concerned that it felt like guesswork and expressed low confidence in their quantitative assessments, primarily due to proposals lacking details on their planned research methods, an issue discussed in the section “Feedback on Proposals.” Nevertheless, some of these reviewers still saw benefits to taking a quantitative approach: “It is interesting to try to estimate probabilities, rather than making flat statements, but I don’t think I guess very well. It is better than simply classically reviewing the proposal [though].” Since not all academics have experience making quantitative predictions, we expect that there will be a learning curve for those new to the practice. Forecasting is a skill that can be learned, though, and we think that with training and feedback, reviewers can become better, more confident forecasters.
Defining Social Impact
Of the three types of questions that reviewers were asked to answer, the question about social impact seemed the hardest for reviewers to interpret. Reviewers noted that they would have liked more guidance on what was meant by social impact and whether it included indirect impacts. Since questions like these are ultimately subjective, the “right” definition of social impact and what types of outcomes are considered most valuable will depend on the grantmaking institution, its domain area, and its theory of change, so we leave this open to future implementers to clarify in their instructions.
Calibrating Impact
While the impact score scale (see Appendix A) defines the relative difference in impact between scores, it does not define the absolute impact conveyed by a score. For this reason, a calibration mechanism is necessary to provide reviewers with a shared understanding of the use and interpretation of the scoring system. Note that this is a challenge that rubric-based peer review criteria used by science agencies also face. Discussion and aggregation of scores across a review committee helps align reviewers and average out some of this natural variation.2
To address this, we surveyed a small, separate set of academics in the life sciences about how they would score the social and scientific impact of the average NIH R01 grant, which many life science researchers apply to and review proposals for. We then provided the average scores from this survey to reviewers to orient them to the new scale and help them calibrate their scores.
One reviewer suggested an alternative approach: “The other thing I might change is having a test/baseline question for every reviewer to respond to, so you can get a feel for how we skew in terms of assessing impact on both scientific and social aspects.” One option would be to ask reviewers to score the social and scientific impact of the average grant proposal for a grant program that all reviewers would be familiar with; another would be to ask reviewers to score the impact of the average funded grant for a specific grant program, which could be more accessible for new reviewers who have not previously reviewed grant proposals. A third option would be to provide all reviewers on a committee with one or more sample proposals to score and discuss, in a relevant and shared domain area.
When deciding on an approach for calibration, a key consideration is the specific resolution criteria that are being used — i.e., the downstream measures of impact that reviewers are being asked to predict. One option, which was used in our pilot, is to predict the scores that a comparable, but independent, panel of reviewers would give the project some number of years following its successful completion. For a resolution criterion like this one, collecting and sharing calibration scores can help reviewers get a sense for not just their own approach to scoring, but also those of their peers.
Making Funding Decisions
In scoring the social and scientific impact of each proposal, reviewers were asked to assess the value of the proposal to society or to the scientific field. That alone is insufficient to determine whether a proposal should be funded, though, since impact must be considered in conjunction with feasibility and compared across proposals. To do so, we calculated the total expected utility of each proposal (see Appendix C). In a real funding scenario, this final metric could then be used to compare proposals and determine which ones get funded. Additionally, unlike a traditional scoring system, the expected utility approach allows for the detailed comparison of portfolios — including considerations like the expected proportion of milestones reached and the range of likely impacts.
In our pilot, reviewers were not informed that we would be doing this additional calculation based on their submissions. As a result, one reviewer thought that the questions they were asked failed to include other important questions, like “should it occur?” and “is it worth the opportunity cost?” Though these questions were not asked of reviewers explicitly, we believe that they would be answered once the expected utility of all proposals is calculated and considered, since the opportunity cost of one proposal would be the expected utility of the other proposals. Since each reviewer only provided input on one proposal, they may have felt like the scores they gave would be used to make a binary yes/no decision on whether to fund that one proposal, rather than being considered as a part of a larger pool of proposals, as it would be in a real review process.
Feedback on Proposals
Missing Information Impedes Forecasting
The primary critique that reviewers expressed was that the research proposals lacked details about their research plans, what methods and experimental protocols would be used, and what preliminary research the author(s) had done so far. This hindered their ability to properly assess the technical feasibility of the proposals and their probability of success. A few reviewers expressed that they also would have liked to have had a better sense of who would be conducting the research and each team member’s responsibilities. These issues arose because the FRO proposals used in our pilot had not originally been submitted for funding purposes, and thus lacked the requirements of traditional grant proposals, as we noted above. We assume this would not be an issue with proposals submitted to actual grantmakers.3
Improving Milestone Design
A few reviewers pointed out that some of the proposal milestones were too ambiguous or were not worded specifically enough, such that there were ways that researchers could technically say that they had achieved the milestone without accomplishing the spirit of its intent. This made it more challenging for reviewers to assess milestones, since they weren’t sure whether to focus on the ideal (i.e., more impactful) interpretation of the milestone or to account for these “loopholes.” Moreover, loopholes skew the forecasts, since they increase the probability of achieving a milestone, while lowering the impact of doing so if it is achieved through a loophole.
One reviewer suggested, “I feel like the design of milestones should be far more carefully worded – or broken up into sub-sentences/sub-aims, to evaluate the feasibility of each. As the questions are currently broken down, I feel they create a perverse incentive to create a vaguer milestone, or one that can be more easily considered ‘achieved’ for some ‘good enough’ value of achieved.” For example, they proposed that one of the proposal milestones, “screen a library of tens of thousands of phage genes for enterobacteria for interactions and publish promising new interactions for the field to study,” could be expanded to
- “Generate a library of tens of thousands of genes from enterobacteria, expressed in E. coli
- “Validate their expression under screenable conditions
- “Screen the library for their ability to impede phage infection with a panel of 20 type phages
- “Publish …
- “Store and distribute the library, making it as accessible to the broader community”
We agree with the need for careful consideration and design of milestones, given that “loopholes” in milestones can detract from their intended impact and make it harder for reviewers to accurately assess their likelihood. In our theoretical framework for this approach, we identified three potential parties that could be responsible for defining milestones: (1) the proposal author(s), (2) the program manager, with or without input from proposal authors, or (3) the reviewers, with or without input from proposal authors. This critique suggests that the first approach of allowing proposal authors to be the sole party responsible for defining proposal milestones is vulnerable to being gamed, and the second or third approach would be preferable. Program managers who take on the task of defining milestones should have enough expertise to think through the different potential ways of fulfilling a milestone and make sure that they are sufficiently precise for reviewers to assess.
Benefits of Flexibility in Milestones
Some flexibility in milestones may still be desirable, especially with respect to the actual methodology, since experimentation may be necessary to determine the best technique to use. For example, speaking about the feasibility of a different proposal milestone – “demonstrate that Pro-AG technology can be adapted to a single pathogenic bacterial strain in a 300 gallon aquarium of fish and successfully reduce antibiotic resistance by 90%” – a reviewer noted that
“The main complexity and uncertainty around successful completion of this milestone arises from the native fish microbiome and whether a CRISPR delivery tool can reach the target strain in question. Due to the framing of this milestone, should a single strain be very difficult to reach, the authors could simply switch to a different target strain if necessary. Additionally, the mode of CRISPR delivery is not prescribed in reaching this milestone, so the authors have a host of different techniques open to them, including conjugative delivery by a probiotic donor or delivery by engineered bacteriophage.”
Peer Review Results
Sequential Milestones vs. Independent Outcomes
In our expected utility forecasting framework, we defined two different ways that a proposal could structure its outcomes: as sequential milestones where each additional milestone builds off of the success of the previous one, or as independent outcomes where the success of one is not dependent on the success of the other(s). For proposals with sequential milestones in our pilot, we would expect the probability of success of milestone 2 to be less than the probability of success of milestone 1 and for the opposite to be true of their impact scores. For proposals with independent outcomes, we do not expect there to be a relationship between the probability of success and the impact scores of milestones 1 and 2. There are different equations for calculating the total expected utility, depending on the relationship between outcomes (see Appendix C).
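As a purely illustrative calculation with invented numbers (using the sequential-milestone equation from Appendix C): suppose milestone 1 has total utility $u_1 = 2^4 = 16$ and success probability $P(m_1) = 0.8$, while milestone 2, defined inclusively of milestone 1, has $u_{2,\mathrm{seq}} = 2^6 = 64$ and $P(m_2) = 0.4$. Then

$$\mathrm{TEU} = u_1 P(m_1) + (u_{2,\mathrm{seq}} - u_1) P(m_2) = 16(0.8) + 48(0.4) = 32,$$

which corresponds to a score of $\log_2 32 = 5$ on the impact scale.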
We categorized each proposal in our study based on whether it had sequential milestones or independent outcomes. This information was not shared with reviewers. Table 1 presents the average reviewer forecasts for each proposal. In general, milestones received higher scientific impact scores than social impact scores, which makes sense given the primarily academic focus of research proposals. For proposals 1 to 3, the probability of success of milestone 2 was roughly half the probability of success of milestone 1; reviewers also gave milestone 2 higher scientific and social impact scores than milestone 1. This is consistent with our categorization of proposals 1 to 3 as having sequential milestones.
Further Discussion on Designing and Categorizing Milestones
We originally categorized proposal 4’s milestones as sequential, but one reviewer gave milestone 2 a lower scientific impact score than milestone 1 and two reviewers gave it a lower social impact score. One reviewer also gave milestone 2 roughly the same probability of success as milestone 1. This suggests that proposal 4’s milestones can’t be considered strictly sequential.
The two milestones for proposal 4 were
- Milestone 1: Develop a tool that is able to perturb neurons in C. elegans and record from all neurons simultaneously, automated w/ microfluidics, and
- Milestone 2: Develop a model of the C. elegans nervous system that can predict what every neuron will do when stimulating one neuron with R² > 0.8
The reviewer who gave milestone 2 a lower scientific impact score explained: “Given the wording of the milestone, I do not believe that if the scientific milestone was achieved, it would greatly improve our understanding of the brain.” Unlike proposals 1-3, in which milestone 2 was a scaled-up or improved-upon version of milestone 1, these milestones represent fundamentally different categories of output (general-purpose tool vs specific model). Thus, despite the necessity of milestone 1’s tool for achieving milestone 2, the reviewer’s response suggests that the impact of milestone 2 was being considered separately rather than cumulatively.
To properly address this case of sequential milestones with different types of outputs, we recommend that for all sequential milestones, latter milestones should be explicitly defined as inclusive of prior milestones. In the above example, this would imply redefining milestone 2 as “Complete milestone 1 and develop a model of the C. elegans nervous system…” This way, reviewers know to include the impact of milestone 1 in their assessment of the impact of milestone 2.
To help ensure that reviewers are aligned with program managers in how they interpret the proposal milestones (if they aren’t directly involved in defining milestones), we suggest that reviewers either be informed of how program managers are categorizing the proposal outputs, so they can conduct their review accordingly, or be allowed to decide the category themselves (and thus how the total expected utility is calculated), whether individually, collectively, or both.
We chose to use only two of the goals that proposal authors provided because we wanted to standardize the number of milestones across proposals. However, this may have provided an incomplete picture of the proposals’ goals, and thus an incomplete assessment of the proposals. We recommend that future implementations be flexible and allow the number of milestones to be determined based on each proposal’s needs. This would also help accommodate one reviewer’s suggestion that some milestones be broken down into intermediary steps.
Importance of Reviewer Explanations
As one can tell from the above discussion, reviewers’ explanations of their forecasts were crucial to understanding how they interpreted the milestones. Reviewers’ explanations varied in length and detail, but the most insightful responses broke down their reasoning into detailed steps and addressed (1) ambiguities in the milestone and how they chose to interpret any that existed, (2) the state of the scientific field and the maturity of different techniques that the authors propose to use, and (3) factors that improve the likelihood of success versus potential barriers or challenges that would need to be overcome.
Exponential Impact Scales Better Reflect the Real Distribution of Impact
The distribution of NIH and NSF proposal peer review scores tends to be skewed such that most proposals are rated above the center of the scale and few proposals are rated poorly. However, other markers of scientific impact, such as citations (even with all their imperfections), suggest a long tail of studies with high impact. This discrepancy suggests that traditional peer review scoring systems are not well-structured to capture the nonlinearity of scientific impact, resulting in score inflation. The clustering of scores at the top end of the scale also means that very negative scores have a greater influence than very positive scores when averaged together, since there is more room between the average score and the bottom end of the scale. This can generate systemic bias against more controversial or risky proposals.
In our pilot, we chose to use an exponential scale with a base of 2 for impact to better reflect the real distribution of scientific impact. Using this exponential impact scale, we surveyed a small pool of academics in the life sciences about how they would rate the impact of the average funded NIH R01 grant. They responded with an average scientific impact score of 5 and an average social impact score of 3, which are much lower on our scale compared to traditional peer review scores4, suggesting that the exponential scale may be beneficial for avoiding score inflation and bunching at the top. In our pilot, the distribution of scientific impact scores was centered higher than 5, but still less skewed than NIH peer review scores for significance and innovation typically are. This partially reflects the fact that proposals were expected to be funded at levels one to two orders of magnitude higher than NIH R01 proposals, so impact should also be greater. The distribution of social impact scores exhibited a much wider spread and a lower center.

Conclusion
In summary, expected utility forecasting presents a promising approach to improving the rigor of peer review and quantitatively defining the risk-reward profile of science proposals. Our pilot study suggests that this approach can be quite user-friendly for reviewers, despite its apparent complexity. Further study into how best to integrate forecasting into panel environments, define proposal milestones, and calibrate impact scales will help refine future implementations of this approach.
More broadly, we hope that this pilot will encourage more grantmaking institutions to experiment with innovative funding mechanisms. Reviewers in our pilot were more open-minded and quick-to-learn than one might expect and saw significant value in this unconventional approach. Perhaps this should not be so much of a surprise given that experimentation is at the heart of scientific research.
Interested grantmakers, both public and private, and policymakers are welcome to reach out to our team if interested in learning more or receiving assistance in implementing this approach.
Acknowledgements
Many thanks to Jordan Dworkin for being an incredible thought partner in designing the pilot and providing meticulous feedback on this report. Your efforts made this project possible!
Appendix A: Pilot Study Design
Our pilot study consisted of five proposals for life science-related Focused Research Organizations (FROs). These proposals were solicited from academic researchers by FAS as part of our advocacy for the concept of FROs. As such, these proposals were not originally intended as proposals for direct funding and did not have content requirements as strict as those of traditional grant proposals. Researchers were asked to submit one- to two-page proposals discussing (1) their research concept, (2) the motivation and its expected social and scientific impact, and (3) the rationale for why this research cannot be accomplished through traditional funding channels and thus requires a FRO to be funded.
Permission was obtained from proposal authors to use their proposals in this study. We worked with proposal authors to define two milestones for each proposal that reviewers would assess: one that the authors felt confident they could achieve and one that was more ambitious but still feasible in their view. In addition, due to the brevity of the proposals, we included an additional 1-2 pages of supplementary information and scientific context. Final drafts of the milestones and supplementary information were provided to authors to edit and approve. Because this pilot study could not provide any actual funding to proposal authors, it was not possible to solicit full-length research proposals from them.
We recruited four to six reviewers for each proposal based on their subject matter expertise. Potential participants were recruited over email with a request to help review a FRO proposal related to their area of research. They were informed that the review process would be unconventional but were not informed of the study’s purpose. Participants were offered a small monetary compensation for their time.
Confirmed participants were all sent instructions and materials for the review process on the same day and were asked to complete their review by a shared deadline a month and a half later. Reviewers were told to assume that, if funded, each proposal would receive $50 million in funding over five years to conduct the research, consistent with the proposed model for FROs. Each proposal had two technical milestones, and reviewers were asked to answer the following questions for each milestone:
- Assuming that the proposal is funded by 2025, will the milestone be achieved before 2031?
- What will be the average scientific impact score, as judged in 2032, of accomplishing the milestone?
- What will be the average social impact score, as judged in 2032, of accomplishing the milestone?
The impact scoring system was explained to reviewers as follows:
Please consider the following in determining the impact score: the current and expected long-term social or scientific impact of a funded FRO’s outputs if a funded FRO accomplishes this milestone before 2030.
The impact score we are using ranges from 1 (low) to 10 (high). It is base 2 exponential, meaning that a proposal that receives a score of 5 has double the impact of a proposal that receives a score of 4, and quadruple the impact of a proposal that receives a score of 3. In a small survey we conducted of SMEs in the life sciences, they rated the scientific and social impact of the average NIH R01 grant — a federally funded research grant that provides $1-2 million for a 3-5 year endeavor — on this scale to be 5.2 ± 1.5 and 3.1 ± 1.3, respectively. The median scores were 4.75 and 3.00, respectively.
Below is an example of how a predicted impact score distribution (left) would translate into an actual impact distribution (right). You can try it out yourself with this interactive version (in the menu bar, click Runtime > Run all) to get some further intuition on how the impact score works. Please note that this is meant solely for instructive purposes, and the interface is not designed to match Metaculus’ interface.

The choice of an exponential impact scale reflects the tendency in science for a small number of research projects to have an outsized impact. For example, studies have shown that the relationship between the number of citations for a journal article and its percentile rank scales exponentially.
Scientific impact aims to capture the extent to which a project advances the frontiers of knowledge, enables new discoveries or innovations, or enhances scientific capabilities or methods. Though each is imperfect, one could consider citations of papers, patents on tools or methods, or users of software or datasets as proxies of scientific impact.
Social impact aims to capture the extent to which a project contributes to solving important societal problems, improving well-being, or advancing social goals. Some proxy metrics that one might use to assess a project’s social impact are the value of lives saved, the cost of illness prevented, the number of job-years of employment generated, economic output in terms of GDP, or the social return on investment.
You may consider any or none of these proxy metrics as a part of your assessment of the impact of a FRO accomplishing this milestone.
Reviewers were asked to submit their forecasts on Metaculus’ website and to provide their reasoning in a separate Google form. For question 1, reviewers were asked to respond with a single probability. For questions 2 and 3, reviewers were asked to provide their median, 25th percentile, and 75th percentile predictions, in order to generate a probability distribution. Metaculus’ website also included information on the resolution criteria of each question, which provided guidance to reviewers on how to answer the question. Individual reviewers were blind to other reviewers’ responses until after the submission deadline, at which point the aggregated results of all of the responses were made public on Metaculus’ website.
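Metaculus constructs a full probability distribution from these three quantile inputs. Its exact fitting procedure is its own; the sketch below is only a minimal illustration of the idea, assuming a normal fit, and the forecast values are hypothetical:

```python
# Minimal sketch: turn a reviewer's median and quartile forecasts into a
# probability distribution. A normal fit is assumed for illustration only;
# Metaculus' actual fitting procedure differs.
from scipy.stats import norm

median, q25, q75 = 5.0, 4.0, 6.0  # hypothetical impact-score forecasts

mu = median                                  # a normal's mean equals its median
sigma = (q75 - q25) / (2 * norm.ppf(0.75))   # interquartile range -> std. dev.

fitted = norm(loc=mu, scale=sigma)
print(fitted.cdf(7.0))  # e.g., implied probability the score resolves below 7
```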
Additionally, in the Google form, reviewers were asked to answer a survey question about their experience: “What did you think about this review process? Did it prompt you to think about the proposal in a different way than when you normally review proposals? If so, how? What did you like about it? What did you not like? What would you change about it if you could?”
Some participants did not complete their review. We received 19 complete reviews in the end, with each proposal receiving three to six reviews.
Study Limitations
Our pilot study had certain limitations that should be noted. Since FAS is not a grantmaking institution, we could not completely reproduce the same types of research proposals that a grantmaking institution would receive nor the entire review process. We will highlight these differences in comparison to federal science agencies, which are our primary focus.
- Review Process: There are typically two phases to peer review at NIH and NSF. First, at least three individual reviewers with relevant subject matter expertise are assigned to read and evaluate a proposal independently. Then, a larger committee of experts is convened. There, the assigned reviewers present the proposal and their evaluation, and then the committee discusses and determines the final score for the proposal. Our pilot study only attempted to replicate the first phase of individual review.
- Sample Size: In our pilot, the sample size was quite small: only five proposals were reviewed, and they were all in different subfields, so different reviewers were assigned to each proposal. NIH and NSF peer review committees typically focus on one subfield and review on the order of twenty proposals. The number of reviewers per proposal – three to six – in our pilot was consistent with the number of reviewers typically assigned to a proposal by NIH and NSF. Peer review committees are typically larger, ranging from six to twenty people, depending on the agency and the field.
- Proposals: The FRO proposals plus supplementary information were only two to four pages long, which is significantly shorter than the 12 to 15 page proposals that researchers submit for NIH and NSF grants. Proposal authors were asked to generally describe their research concept, but were not explicitly required to describe the details of the research methodology they would use or any preliminary research. Some proposal authors volunteered more information on this for the supplementary information, but not all authors did.
- Grant Size: For the FRO proposals, reviewers were asked to assume that funded proposals would receive $50 million over five years, which is one to two orders of magnitude more funding than typical NIH and NSF proposals.
Appendix B: Feedback on Study-Specific Implementation
In addition to feedback about the review framework, we received feedback on how we implemented our pilot study, specifically the instructions and materials for the review process and the submission platforms. This feedback isn’t central to this paper’s investigation of expected utility forecasting, but we wanted to include it in the appendix for transparency.
Reviewers were sent instructions over email that outlined the review process and linked to Metaculus’ webpage for this pilot. On Metaculus’ website, reviewers could find links to the proposals on FAS’ website and the supplementary information in Google docs. Reviewers were expected to read those first and then read through the resolution criteria for each forecasting question before submitting their answers on Metaculus’ platform. Reviewers were asked to submit the explanations behind their forecasts in a separate Google form.
Some reviewers had no problem navigating the review process and found Metaculus’ website easy to use. However, feedback from other reviewers suggested that the different components necessary for the review were spread out over too many different websites, making it difficult for reviewers to keep track of where to find everything they needed.
Some had trouble locating the different materials and pieces of information needed to conduct the review on Metaculus’ website. Others found it confusing to have to submit their forecasts and explanations in two separate places. One reviewer suggested that the explanation of the impact scoring system should have been included within the instructions sent over email rather than in the resolution criteria on Metaculus’ website so that they could have read it before reading the proposal. Another reviewer suggested that it would have been simpler to submit their forecasts through the same Google form that they used to submit their explanations rather than through Metaculus’ website.
Based on this feedback, we would recommend that future implementations streamline their submission process to a single platform and provide a more extensive set of instructions up front rather than scattering information across different steps of the review process. Training sessions, which science funding agencies typically conduct, would be a good supplement to written instructions.
Appendix C: Total Expected Utility Calculations
To calculate the total expected utility, we first converted each impact score into utility by raising two to the power of the impact score, since the impact scoring system is base-2 exponential:

$$\text{Utility} = 2^{\text{Impact Score}}.$$

We were then able to average the utilities for each milestone and conduct additional calculations.
To calculate the total utility of each milestone, $u_i$, we averaged the social utility and the scientific utility of the milestone:

$$u_i = \frac{\text{Social Utility} + \text{Scientific Utility}}{2}.$$
The total expected utility (TEU) of a proposal with two milestones can be calculated according to the general equation:

$$\mathrm{TEU} = u_1 P(m_1 \cap \lnot m_2) + u_2 P(m_2 \cap \lnot m_1) + (u_1 + u_2) P(m_1 \cap m_2),$$

where $P(m_i)$ represents the probability of success of milestone $i$ and

$$P(m_1 \cap \lnot m_2) = P(m_1) - P(m_1 \cap m_2)$$
$$P(m_2 \cap \lnot m_1) = P(m_2) - P(m_1 \cap m_2).$$
For sequential milestones, milestone 2 is defined as inclusive of milestone 1 and wholly dependent on the success of milestone 1, so

$$u_{2,\mathrm{seq}} = u_1 + u_2$$
$$P(m_2) = P_{\mathrm{seq}}(m_1 \cap m_2)$$
$$P(m_2 \cap \lnot m_1) = 0.$$

Thus, the total expected utility of sequential milestones can be simplified as

$$\mathrm{TEU} = u_1 P(m_1) - u_1 P(m_2) + u_{2,\mathrm{seq}} P(m_2)$$
$$\mathrm{TEU} = u_1 P(m_1) + (u_{2,\mathrm{seq}} - u_1) P(m_2).$$

This can be generalized to

$$\mathrm{TEU}_{\mathrm{seq}} = \sum_i (u_{i,\mathrm{seq}} - u_{i-1,\mathrm{seq}}) P(m_i),$$

where $u_{0,\mathrm{seq}} = 0$.
Otherwise, substituting the identities above into the general equation causes the joint-probability terms to cancel, so the total expected utility simplifies to

$$\mathrm{TEU} = u_1 P(m_1) + u_2 P(m_2).$$

For independent outcomes, where we assume

$$P_{\mathrm{ind}}(m_1 \cap m_2) = P(m_1) P(m_2),$$

the same simplification holds:

$$\mathrm{TEU}_{\mathrm{ind}} = u_1 P(m_1) + u_2 P(m_2).$$
To present the results in Tables 1 and 2, we converted all of the utility values back into the impact score scale by taking the log base 2 of the results.
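For concreteness, the following is a minimal sketch of these calculations in Python. It is our illustration rather than the pilot’s actual code, and the input scores and probabilities are hypothetical:

```python
# A minimal sketch of the Appendix C calculations (our illustration, not the
# authors' code). Assumes two milestones per proposal and base-2 impact scores.
import math

def utility(impact_score: float) -> float:
    """Convert a base-2 exponential impact score into utility."""
    return 2.0 ** impact_score

def milestone_utility(social_score: float, scientific_score: float) -> float:
    """Total utility of one milestone: average of social and scientific utility."""
    return (utility(social_score) + utility(scientific_score)) / 2

def teu_sequential(u1: float, u2_seq: float, p1: float, p2: float) -> float:
    """TEU for sequential milestones: milestone 2 is inclusive of milestone 1
    (u2_seq = u1 + u2) and cannot succeed unless milestone 1 does (p2 <= p1)."""
    return u1 * p1 + (u2_seq - u1) * p2

def teu_nonsequential(u1: float, u2: float, p1: float, p2: float) -> float:
    """TEU when the joint-probability terms of the general equation cancel,
    leaving a simple probability-weighted sum over milestones."""
    return u1 * p1 + u2 * p2

# Hypothetical inputs (not pilot data): average scores and probabilities.
u1 = milestone_utility(social_score=3.0, scientific_score=5.0)
u2_seq = u1 + milestone_utility(social_score=4.0, scientific_score=6.0)
teu = teu_sequential(u1, u2_seq, p1=0.7, p2=0.3)
print(f"TEU on the impact-score scale: {math.log2(teu):.2f}")
```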
Scaling AI Safely: Can Preparedness Frameworks Pull Their Weight?
A new class of risk mitigation policies has recently come into vogue for frontier AI developers. Known alternately as Responsible Scaling Policies or Preparedness Frameworks, these policies outline commitments to risk mitigations that developers of the most advanced AI models will implement as their models display increasingly risky capabilities. While the idea for these policies is less than a year old, already two of the most advanced AI developers, Anthropic and OpenAI, have published initial versions of these policies. The U.K. AI Safety Institute asked frontier AI developers about their “Responsible Capability Scaling” policies ahead of the November 2023 UK AI Safety Summit. It seems that these policies are here to stay.
The National Institute of Standards & Technology (NIST) recently sought public input on its assignments regarding generative AI risk management, AI evaluation, and red-teaming. The Federation of American Scientists was happy to provide input; this is the full text of our response. NIST’s request for information (RFI) highlighted several potential risks and impacts of potentially dual-use foundation models, including: “Negative effects of system interaction and tool use…chemical, biological, radiological, and nuclear (CBRN) risks…[e]nhancing or otherwise affecting malign cyber actors’ capabilities…[and i]mpacts to individuals and society.” This RFI presented a good opportunity for us to discuss the benefits and drawbacks of these new risk mitigation policies.
This report will provide some background on this class of risk mitigation policies (we use the term Preparedness Framework, for reasons described below). We outline suggested criteria for robust Preparedness Frameworks (PFs) and evaluate two key documents, Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework, against these criteria. We claim that these policies are net-positive and should be encouraged. At the same time, we identify shortcomings of current PFs, chiefly that they are underspecified, insufficiently conservative, and address structural risks poorly. Improvement in the state of the art of risk evaluation for frontier AI models is a prerequisite for a meaningfully binding PF. Most importantly, PFs, as unilateral commitments by private actors, cannot replace public policy.
Motivation for Preparedness Frameworks
As AI labs develop potentially dual-use foundation models (as defined by Executive Order No. 14110, the “AI EO”) with capability, compute, and efficiency improvements, novel risks may emerge, some of them potentially catastrophic. Today’s foundation models can already cause harm and pose some risks, especially as they are more broadly used. Advanced large language models at times display unpredictable behaviors.
To this point, these harms have not risen to the level of posing catastrophic risks, defined here broadly as “devastating consequences for vast numbers of people.” The capabilities of models at the current state of the art simply do not imply levels of catastrophic risk above current non-AI related margins.1 However, as these models continue to scale in training compute, some speculate that they may develop novel capabilities that could be misused. The specific capabilities that will emerge from further scaling remain difficult to predict with confidence. Some analysis indicates that as training compute for AI models has doubled approximately every six months since 2015, performance on capability benchmarks has steadily improved. And while bigger models have tended to perform better, it would not be surprising if smaller models emerged with comparable or better capabilities: despite years of research by machine learning theorists, our understanding of how the number of model parameters relates to model capabilities remains uncertain.
Nonetheless, as capabilities increase, risks may also increase, and new risks may appear. The AI EO detailed some novel risks of potentially dual-use foundation models, including chemical, biological, radiological, or nuclear (CBRN) risks and advanced cybersecurity risks. Other risks are more speculative, such as model autonomy, loss of control of AI systems, or negative impacts on users, including risks of persuasion.2 Without robust risk mitigations, it is plausible that increasingly powerful AI systems will eventually pose greater societal risks.
Other technologies that pose catastrophic risks, such as nuclear technologies, are heavily regulated in order to prevent those risks from resulting in serious harms. There is a growing movement to regulate development of potentially dual-use biotechnologies, particularly gain-of-function research on the most pathogenic microbes. Given the rapid pace of progress at the AI frontier, comprehensive government regulation has yet to catch up; private companies that develop these models are starting to take it upon themselves to prevent or mitigate the risks of advanced AI development.
Prevention of such novel and consequential risks requires developers to implement policies that address potential risks iteratively. That is where preparedness frameworks come in. A preparedness framework is used to assess risk levels across key categories and outline associated risk mitigations. As the introduction to OpenAI’s PF states, “The processes laid out in each version of the Preparedness Framework will help us rapidly improve our understanding of the science and empirical texture of catastrophic risk, and establish the processes needed to protect against unsafe development.” Without such processes and commitments, the tendency to prioritize speed over safety concerns might prevail. While the exact consequences of failing to mitigate these risks are uncertain, they could potentially be significant.
Preparedness frameworks are limited in scope to catastrophic risks. These policies aim to prevent the worst conceivable outcomes of the development of future advanced AI systems; they are not intended to cover risks from existing systems. We acknowledge that this is an important limitation of preparedness frameworks. Developers can and should address both today’s risks and future risks at the same time; preparedness frameworks attempt to address the latter, while other “trustworthy AI” policies attempt to address a broader swathe of risks. For instance, OpenAI’s “Preparedness” team sits alongside its “Safety Systems” team, which “focuses on mitigating misuse of current models and products like ChatGPT.”
A note about terminology: The term “Responsible Scaling Policy” (RSP) is the term that took hold first, but it presupposes scaling of compute and capabilities by default. “Preparedness Framework” (PF) is a term coined by OpenAI, and it communicates the idea that the company needs to be prepared as its models approach the level of artificial general intelligence. Of the two options, “Preparedness Framework” communicates the essential idea more clearly: developers of potentially dual-use foundation models must be prepared for and mitigate potential catastrophic risks from development of these models.
The Industry Landscape
In September of 2023, ARC Evals (now METR, “Model Evaluation & Threat Research”) published a blog post titled “Responsible Scaling Policies (RSPs).” This post outlined the motivation and basic structure of an RSP, and revealed that ARC Evals had helped Anthropic write its RSP (version 1.0) which had been released publicly a few days prior. (ARC Evals had also run pre-deployment evaluations on Anthropic’s Claude model and OpenAI’s GPT-4.) And in December 2023, OpenAI published its Preparedness Framework in beta; while using new terminology, this document is structurally similar to ARC Evals’ outline of the structure of an RSP. Both OpenAI and Anthropic have indicated that they plan to update their PFs with new information as the frontier of AI development advances.
Not every AI company should develop or maintain a preparedness framework. Since these policies relate to catastrophic risk from models with advanced capabilities, only those developers whose models could plausibly attain those capabilities should use PFs. Because these advanced capabilities are associated with high levels of training compute, a good interim threshold for who should develop a PF could be the same as the AI EO threshold for potentially dual-use foundation models; that is, developers of models trained on over 10^26 floating-point operations (or an October 2023-equivalent level of compute, adjusted for compute efficiency gains).3 Currently, only a handful of developers have models that even approach this threshold. This threshold should be subject to change, like that of the AI EO, as developers continue to push the frontier (e.g., by developing more efficient algorithms or realizing other compute efficiency gains).
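To make this concrete, the following is a minimal sketch of an efficiency-adjusted threshold check. The function names and the example efficiency multiplier are illustrative assumptions of ours; only the 10^26 figure comes from the AI EO.

```python
# Minimal sketch: should a developer adopt a preparedness framework,
# based on an efficiency-adjusted training-compute threshold?
# The 1e26 figure is from the AI EO; everything else is illustrative.

AI_EO_THRESHOLD_FLOP = 1e26  # floating-point operations, October 2023 baseline

def effective_training_compute(raw_flop: float, efficiency_gain: float) -> float:
    """Scale raw training compute to its October 2023 equivalent.

    efficiency_gain > 1 means algorithms/hardware now extract more
    capability per operation than the 2023 baseline, so raw compute
    is scaled up before comparison.
    """
    return raw_flop * efficiency_gain

def should_adopt_pf(raw_flop: float, efficiency_gain: float = 1.0) -> bool:
    return effective_training_compute(raw_flop, efficiency_gain) >= AI_EO_THRESHOLD_FLOP

# Example: a 3e25-operation training run with a hypothetical 4x
# efficiency gain over the 2023 baseline crosses the adjusted threshold.
assert should_adopt_pf(3e25, efficiency_gain=4.0)
```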
While several other companies published “Responsible Capability Scaling” documents ahead of the UK AI Safety Summit, including DeepMind, Meta, Microsoft, Amazon, and Inflection AI, the rest of this report focuses primarily on OpenAI’s PF and Anthropic’s RSP.
Weaknesses of Preparedness Frameworks
Preparedness frameworks are not panaceas for AI-associated risks. Even with improvements in specificity, transparency, and strengthened risk mitigations, there are important weaknesses to the use of PFs. Here we outline two weaknesses of PFs and possible responses to them.
1. Spirit vs. text: PFs are voluntary commitments whose success depends on developers’ faithfulness to their principles.
Current risk thresholds and mitigations are defined loosely. In Anthropic’s RSP, for instance, the jump from the current risk level posed by Claude 2 (its state of the art model) to the next risk level is defined in part by the following: “Access to the model would substantially increase the risk of catastrophic misuse, either by proliferating capabilities, lowering costs, or enabling new methods of attack….” A “substantial increase” is not well-defined. This ambiguity leaves room for interpretation; since implementing risk mitigations can be costly, developers could have an incentive to take advantage of such ambiguity if they do not follow the spirit of the policy.
This concern about the gap between following the spirit of the PF and following the text might be somewhat eased with more specificity about risk thresholds and associated mitigations, and especially with more transparency and public accountability to these commitments.
To their credit, OpenAI’s PF and Anthropic’s RSP show a serious approach to the risks of developing increasingly advanced AI systems. OpenAI’s PF includes a commitment to fine-tune its models to better elicit capabilities along particular risk categories, then evaluate “against these enhanced models to ensure we are testing against the ‘worst case’ scenario we know of.” They also commit to triggering risk mitigations “when any of the tracked risk categories increase in severity, rather than only when they all increase together.” And Anthropic “commit[s] to pause the scaling and/or delay the deployment of new models whenever our scaling ability outstrips our ability to comply with the safety procedures for the corresponding ASL [AI Safety Level].” These commitments are costly signals that these developers are serious about their PFs.
2. Private commitment vs. public policy: PFs are unilateral commitments that individual developers take on; we might prefer more universal policy (or regulatory) approaches.
Private companies developing AI systems may not fully account for broader societal risks. Consider an analogy to climate change—no single company’s emissions are solely responsible for risks like sea level rise or extreme weather. The risk comes from the aggregate emissions of all companies. Similarly, AI developers may not consider how their systems interact with others across society, potentially creating structural risks. Like climate change, the societal risks from AI will likely come from the cumulative impact of many different systems. Unilateral commitments are poor tools to address such risks.
Furthermore, PFs might reduce the urgency for government intervention. By appearing safety-conscious, developers could diminish the perceived need for regulatory measures. Policymakers might over-rely on self-regulation by AI developers, potentially compromising the public interest for private gain.
Policy can and should step into the gap left by PFs. Policy is more aligned with the public good, and as such is less subject to competing incentives. And policy can be enforced, unlike voluntary commitments. In general, preparedness frameworks and similar policies help hold private actors accountable to their public commitments; this effect is stronger with more specificity in defining risk thresholds, better evaluation methods, and more transparency in reporting. However, these policies cannot and should not replace government action to reduce the catastrophic risks (especially structural risks) of frontier AI systems.
Suggested Criteria for Robust Preparedness Frameworks
These criteria are adapted from the ARC Evals post, Anthropic’s RSP, and OpenAI’s PF. Broadly, they are aspirational; no existing preparedness framework meets all or most of these criteria.
For each criterion, we explain the key considerations for developers adopting PFs. We analyze OpenAI’s PF and Anthropic’s RSP to illustrate the strengths and shortcomings of their approaches. Again, these policies are net-positive and should be encouraged. They demonstrate costly unilateral commitments to measuring and addressing catastrophic risk from their models; they meaningfully improve on the status quo. However, these initial PFs are underspecified and insufficiently conservative. Improvement in the state of the art of risk evaluation and mitigation, and subsequent updates, would make them more robust.
1. Preparedness frameworks should cover the breadth of potential catastrophic risks of developing frontier AI models.
These risks may include:
- CBRN risks. Advanced AI models might enable or aid the creation of chemical, biological, radiological, and/or nuclear threats. OpenAI’s PF includes CBRN risks as their own category; Anthropic’s RSP includes CBRN risks within risks from misuse.
- Model autonomy. Anthropic’s RSP defines this as: “risk that a model is capable of accumulating resources (e.g. through fraud), navigating computer systems, devising and executing coherent strategies, and surviving in the real world while avoiding being shut down.” OpenAI’s PF defines this as: “[enabling] actors to run scaled misuse that can adapt to environmental changes and evade attempts to mitigate or shut down operations. Autonomy is also a prerequisite for self-exfiltration, self-improvement, and resource acquisition.” OpenAI’s definition includes risk from misuse of a model in model autonomy; Anthropic’s focuses on risks from the model itself.
- Potential for misuse, including cybersecurity and critical infrastructure. OpenAI’s PF defines cybersecurity risk (in their own category) as “risks related to use of the model for cyber-exploitation to disrupt confidentiality, integrity, and/or availability of computer systems.” Anthropic’s RSP mentions cyber risks in the context of risks from misuse.
- Adverse impact on human users. OpenAI’s PF includes a tracked risk category for persuasion: “Persuasion is focused on risks related to convincing people to change their beliefs (or act on) both static and interactive model-generated content.” Anthropic’s RSP does not mention persuasion per se.
- Unknown future risks. As developers create and evaluate more highly capable models, new risk vectors might become clear. PFs should acknowledge that unknown future risks are possible with any jump in capabilities. OpenAI’s PF includes a commitment to tracking “currently unknown categories of catastrophic risk as they emerge.”
Preparedness frameworks should apply to catastrophic risks in particular because they govern the scaling of capabilities of the most advanced AI models, and because catastrophic risks are of the highest consequence to such development. PFs are one tool among many that developers of the most advanced AI models should use to prevent harm. Developers of advanced AI models tend to also have other “trustworthy AI” policies, which seek to prevent and address already-existing risks such as harmful outputs, disinformation, and synthetic sexual content. Despite PFs’ focus on potentially catastrophic risks, faithfully applying PFs may help developers catch many other kinds of risks as well, since they involve extensive evaluation for misuse potential and adverse human impacts.
2. Preparedness frameworks should define the developer’s acceptable risk level (“risk appetite”) in terms of likelihood and severity of risk, in accordance with the NIST AI Risk Management Framework, section Map 1.5.
Neither OpenAI nor Anthropic has publicly declared its risk appetite. This is a nascent field of research, as these risks are novel and perhaps less predictable than, e.g., nuclear accident risk.5 NIST and other standard-setting bodies will be crucial in developing AI risk metrology. For now, PFs should state developers’ risk appetites as clearly as possible, and update them regularly with research advances.6
AI developers’ risk appetites might be different than a regulatory risk appetite. Developers should elucidate their risk appetite in quantitative terms so their PFs can be evaluated accordingly. As in the case of nuclear technology, regulators may eventually impose risk thresholds on frontier AI developers. At this point, however, there is no standard, scientifically-grounded approach to measuring the potential for catastrophic AI risk; this has to start with the developers of the most capable AI models.
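To illustrate, a quantitative risk appetite statement might look something like the sketch below. Every number and category name is a hypothetical placeholder of ours, not a value drawn from any published framework.

```python
# Hypothetical sketch of a quantitative risk appetite, in the spirit of
# NIST AI RMF Map 1.5: a maximum acceptable annual probability for each
# severity tier. All values are illustrative placeholders.

RISK_APPETITE = {
    "major incident (serious but recoverable harm)": 1e-3,
    "catastrophic incident (mass casualties or equivalent)": 1e-6,
}

def within_appetite(severity_tier: str, estimated_annual_probability: float) -> bool:
    """True if the estimated risk falls within the stated appetite."""
    return estimated_annual_probability <= RISK_APPETITE[severity_tier]

# A model whose estimated probability of enabling a catastrophic incident
# is 1e-5 per year would exceed this appetite and require mitigation.
print(within_appetite("catastrophic incident (mass casualties or equivalent)", 1e-5))  # False
```

Stating the appetite this way would make a PF auditable: evaluators could ask whether a model’s estimated risk exceeds the declared numbers, rather than debating adjectives.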
3. Preparedness frameworks should clearly define capability levels and risk thresholds. Risk thresholds should be quantified robustly enough to hold developers accountable to their commitments.
OpenAI and Anthropic both outline qualitative risk thresholds corresponding with different categories of risk. For instance, in OpenAI’s PF, the High risk threshold in the CBRN category reads: “Model enables an expert to develop a novel threat vector OR model provides meaningfully improved assistance that enables anyone with basic training in a relevant field (e.g., introductory undergraduate biology course) to be able to create a CBRN threat.” And Anthropic’s RSP defines the ASL-3 [AI Safety Level] threshold as: “Low-level autonomous capabilities, or access to the model would substantially increase the risk of catastrophic misuse, either by proliferating capabilities, lowering costs, or enabling new methods of attack, as compared to a non-LLM baseline of risk.”
These qualitative thresholds are under-specified; reasonable people are likely to differ on what “meaningfully improved assistance” looks like, or a “substantial increase [in] the risk of catastrophic misuse.” In PFs, these thresholds should be quantified to the extent possible.
To be sure, the AI development research community currently lacks a good empirical understanding of the likelihood or quantification of frontier AI-related risks. Again, this is a novel science that needs to be developed with input from both the private and public sectors. Since this science is still developing, it is natural to want to avoid too much quantification. A conceivable failure mode is that developers merely check boxes that quickly become obsolete, instead of using their judgment to determine when capabilities are dangerous enough to warrant stronger risk mitigations. Again, as research improves, we should expect to see improvements in PFs’ specification of risk thresholds.
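As one example of what quantification could look like, a threshold like “meaningfully improved assistance” could be operationalized as measured uplift in a controlled trial. The sketch below assumes a hypothetical 20-percentage-point trigger; it is our illustration, not a threshold from either company’s framework.

```python
# Sketch: quantifying "meaningfully improved assistance" as the uplift a
# model gives trial participants on a proxy task, relative to a no-model
# baseline (e.g., web search only). The 0.20 trigger is an arbitrary
# illustration, not a value from OpenAI's PF or Anthropic's RSP.

def uplift(assisted_success_rate: float, baseline_success_rate: float) -> float:
    """Absolute uplift in task success attributable to model access."""
    return assisted_success_rate - baseline_success_rate

def crosses_threshold(assisted: float, baseline: float, trigger: float = 0.20) -> bool:
    return uplift(assisted, baseline) >= trigger

# If 55% of model-assisted participants complete the proxy task versus
# 30% of the baseline group, the 25-point uplift crosses the trigger.
print(crosses_threshold(0.55, 0.30))  # True
```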
4. Preparedness frameworks should include detailed evaluation procedures for AI models, ensuring comprehensive risk assessment within a developer’s tolerance.
Anthropic and OpenAI both have room for improvement in detailing their evaluation procedures. Anthropic’s RSP includes evaluation procedures for model autonomy and misuse risks. Its evaluation procedures for model autonomy are impressively detailed, including clearly defined tasks on which it will evaluate its models. Its evaluation procedures for misuse risk are much less well-defined, though it does include the following note: “We stress that this will be hard and require iteration. There are fundamental uncertainties and disagreements about every layer…It will take time, consultation with experts, and continual updating.” And OpenAI’s PF includes a “Model Scorecard,” a mock evaluation of an advanced AI model. This model scorecard includes the hypothetical results of various evaluations in all four of their tracked risk categories; it does not appear to be a comprehensive list of evaluation procedures.
Again, the science of AI model evaluation is young. The AI EO directs NIST to develop red-teaming guidance for developers of potentially dual-use foundation models. NIST, along with private actors such as METR and other AI evaluators, will play a crucial role in creating and testing red-teaming practices and model evaluations that elicit all relevant capabilities.
5. For different risk thresholds, preparedness frameworks should identify and commit to pre-specified risk mitigations.
Classes of risk mitigations may include:
- Restricting development and/or deployment of models at different risk thresholds
- Enhanced cybersecurity measures, to prevent exfiltration of model weights
- Internal compartmentalization and tiered access
- Interacting with the model only in restricted environments
- Deleting model weights8
Both OpenAI’s PF and Anthropic’s RSP commit to a number of pre-specified risk mitigations for different thresholds. For example, for what Anthropic calls “ASL-2” models (including its most advanced model, Claude 2), they commit to measures including publishing model cards, providing a vulnerability reporting mechanism, enforcing an acceptable use policy, and more. Models at higher risk thresholds (what Anthropic calls “ASL-3” and above) have different, more stringent risk mitigations, including “limit[ing] access to training techniques and model hyperparameters…” and “implement[ing] measures designed to harden our security…”
Risk mitigations can and should differ in approaches to development versus deployment. There are different levels of risk associated with possessing models internally and allowing external actors to interact with them. Both OpenAI’s PF and Anthropic’s RSP include different risk mitigation approaches for development and deployment. For example, OpenAI’s PF restricts deployment of models such that “Only models with a post-mitigation score of ‘medium’ or below can be deployed,” whereas it restricts development such that “Only models with a post-mitigation score of ‘high’ or below can be developed further.”
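The deployment and development gates just quoted reduce to simple comparisons on an ordered risk scale. The sketch below is our rendering of that rule with illustrative names; it is not code from OpenAI.

```python
# Sketch of the gating logic described in OpenAI's PF: deployment and
# further development are gated on different post-mitigation risk levels.
from enum import IntEnum

class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

def may_deploy(post_mitigation: Risk) -> bool:
    # "Only models with a post-mitigation score of 'medium' or below
    # can be deployed."
    return post_mitigation <= Risk.MEDIUM

def may_continue_development(post_mitigation: Risk) -> bool:
    # "Only models with a post-mitigation score of 'high' or below
    # can be developed further."
    return post_mitigation <= Risk.HIGH

model_risk = Risk.HIGH
print(may_deploy(model_risk))                # False: too risky to deploy
print(may_continue_development(model_risk))  # True: development may continue
```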
Mitigations should be defined as specifically as possible, with the understanding that as the state of the art changes, this too is an area that will require periodic updates. Developers should include some room for judgment here.
6. Preparedness frameworks’ pre-specified risk mitigations must effectively address potentially catastrophic risks.
Having confidence that the risk mitigations do in fact address potential catastrophic risks is perhaps the most important and difficult aspect of a PF to evaluate. Catastrophic risk from AI is a novel and speculative field; evaluating AI capabilities is a science in its infancy; and there are no empirical studies of the effectiveness of risk mitigations preventing such risks. Given this uncertainty, frontier AI developers should err on the side of caution.
Both OpenAI and Anthropic should be more conservative in their risk mitigations. Consider OpenAI’s commitment to restricting development: “[I]f we reach (or are forecasted to reach) ‘critical’ pre-mitigation risk along any risk category, we commit to ensuring there are sufficient mitigations in place…for the overall post-mitigation risk to be back at most to ‘high’ level.” To understand this commitment, we have to look at their threshold definitions. Under the Model Autonomy category, the “critical” threshold in part includes: “model can self-exfiltrate under current prevailing security.” Setting aside that this threshold is still quite vague and difficult to evaluate (and setting aside the novelty of this capability), a model that approaches or exceeds this threshold by definition can self-exfiltrate, rendering all other risk mitigations ineffective. A more robust approach to restricting development would not permit training or possessing a model that comes close to exceeding this threshold.
As for Anthropic, consider their threshold for “ASL-3,” which reads in part: “Access to the model would substantially increase the risk of catastrophic misuse…” The risk mitigations for ASL-3 models include the following: “Harden security such that non-state attackers are unlikely to be able to steal model weights and advanced threat actors (e.g. states) cannot steal them without significant expense.” This is an admirable approach to the development of potentially dual-use foundation models. But state actors are precisely the kind of actors likely to seek out tools whose misuse carries catastrophic risk; a more conservative mitigation would harden security such that no actor, state or non-state, is likely to be able to steal the weights of such a model.9
7. Preparedness frameworks should combine credible risk mitigation commitments with governance structures that ensure these commitments are fulfilled.
Preparedness Frameworks should detail governance structures that incentivize actually undertaking pre-committed risk mitigations when thresholds are met. Other incentives, including profit and shareholder value, sometimes conflict with risk management.
Anthropic’s RSP includes a number of procedural commitments meant to enhance the credibility of its risk mitigation commitments. For example, Anthropic commits to proactively planning to pause scaling of its models,10 publicly sharing evaluation results, and appointing a “Responsible Scaling Officer.” However, Anthropic’s RSP also includes the following clause: “[I]n a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped…we could envisage a substantial loosening of these restrictions as an emergency response…” This clause potentially undermines the credibility of Anthropic’s other commitments in the RSP, since at any time Anthropic could point to another actor that, in its view, is scaling recklessly.
OpenAI’s PF also outlines commendable governance measures, including procedural commitments, meant to enhance its risk mitigation credibility. It summarizes its operational structure: “(1) [T]here is a dedicated team ‘on the ground’ focused on preparedness research and monitoring (Preparedness team), (2) there is an advisory group (Safety Advisory Group) that has a sufficient diversity of perspectives and technical expertise to provide nuanced input and recommendations, and (3) there is a final decision-maker (OpenAI Leadership, with the option for the OpenAI Board of Directors to overrule).”
8. Preparedness frameworks should include a mechanism for regular updates to the framework itself, in light of ongoing research and advances in AI.
Both OpenAI’s PF and Anthropic’s RSP acknowledge the importance of regular updates. This is reflected in both of these documents’ names: Anthropic labels its RSP as “Version 1.0,” while OpenAI’s PF is labeled as “(Beta).”
Anthropic’s RSP includes an “Update Process” that reads in part: “We expect most updates to this process to be incremental…as we learn more about model safety features or unexpected capabilities…” This language directly commits Anthropic to changing its RSP as the state of the art changes. OpenAI references updates throughout its PF, notably committing to updating its evaluation methods and rubrics (“The Scorecard will be regularly updated by the Preparedness team to help ensure it reflects the latest research and findings”).
9. For models with risk above the lowest level, most evaluation results and methods should be public, including any performed mitigations.
Publishing model evaluations and mitigations is an important tool for holding developers accountable to their PF commitments. Striking the right level of transparency is key, however; for example, full information about evaluation methodology and risk mitigations could be exploited by malicious actors. Anthropic’s RSP takes a balanced approach in committing to “[p]ublicly share evaluation results after model deployment where possible, in some cases in the initial model card, in other cases with a delay if it serves a broad safety interest.” OpenAI’s PF does not commit to publishing its Model Scorecards, but OpenAI has since published related research on whether its models aid the creation of biological threats.
Conclusion
Preparedness frameworks represent a promising approach for AI developers to voluntarily commit to robust risk management practices. However, current versions have weaknesses—particularly their lack of specificity in risk thresholds, insufficiently conservative risk mitigation approaches, and inadequacy in addressing structural risks. Frontier AI developers without PFs should consider adopting them, and OpenAI and Anthropic should update their policies to strengthen risk mitigations and include more specificity.
Strengthening preparedness frameworks will require advancing the science of AI safety to enable precise risk quantification and to develop new mitigations. NIST, academia, and industry should collaborate to measure and model frontier AI risks. Policymakers have a crucial opportunity to adapt regulatory approaches from other high-risk technologies, like nuclear power, to balance AI innovation against catastrophic risk prevention. Furthermore, standards bodies could develop more robust best practices for AI evaluations, including guidance for third-party auditors.
Overall, the AI community, not just the private actors creating preparedness frameworks, must view safety as an intrinsic priority. All stakeholders, including private companies, academics, policymakers, and civil society organizations, have roles to play in steering AI development toward societally beneficial outcomes. Preparedness frameworks are one tool, but they are not sufficient absent more comprehensive, multi-stakeholder efforts to scale AI safely and for the public good.
Many thanks to Madeleine Chang, Di Cooke, Thomas Woodside, and Felipe Calero Forero for providing helpful feedback.
Working with academics: A primer for U.S. government agencies
Collaboration between federal agencies and academic researchers is an important tool for public policy. By facilitating the exchange of knowledge, ideas, and talent, these partnerships can help address pressing societal challenges. But because it is rarely in either party’s job description to conduct outreach and build relationships with the other, many important dynamics are often hidden from view. This primer provides an initial set of questions and topics for agencies to consider when exploring academic partnership.
Why should agencies consider working with academics?
- Accessing the frontier of knowledge: Academics are at the forefront of their fields, and their insights can provide fresh perspectives on agency work.
- Leveraging innovative methods: From data collection to analysis, academics may have access to new technologies and approaches that can enhance governmental efforts.
- Enhancing credibility: By incorporating research and external expertise, policy decisions gain legitimacy and trust, and align with evidence-based policy guidelines.
- Generating new insights: Collaboration between agencies and outside researchers can lead to discoveries that advance both knowledge and practice.
- Developing human capital: Collaboration can enhance the skills of both public servants and academics, creating a more robust workforce and potentially leading to longer-term talent exchange.
What considerations may arise when working with academics?
- Designing collaborative relationships that are targeted to the incentives of both the agency and the academic partners;
- Navigating different rules and regulations that may impact academic-government collaboration, e.g. rules on external advisory groups, university guidelines, and data/information confidentiality;
- Understanding the different structures and mechanisms that enable academic-government collaboration, such as sabbaticals, fellowships, consultancies, grants, or contracts;
- Identifying and approaching the right academics for different projects and needs.
Academic faculty progress through different stages of professorship — typically assistant, associate, and full — that affect their research and teaching expectations and opportunities. Assistant professors are tenure-track faculty who need to secure funding, publish papers, and meet the standards for tenure. Associate professors have job security and academic freedom, but also more mentoring and leadership responsibilities; associate professors are typically tenured, though this is not always the case. Full professors are senior faculty who have a high reputation and recognition in their field, but also more demands for service and supervision. The nature of agency-academic collaboration may depend on the seniority of the academic. For example, junior faculty may be more available to work with agencies, but primarily in contexts that lead to traditional academic outputs, while senior faculty may be more selective but have the academic freedom to take on less formal, more impact-oriented work.
Soft money positions are those that depend largely or entirely on external funding sources, typically research grants, to support the salary and expenses of the faculty. Hard money positions are those that are supported by the academic institution’s central funds, typically tied to more explicit (and more expansive) expectations for teaching and service than soft-money positions. Faculty in soft money positions may face more pressure to secure funding for research, while faculty in hard money positions may have more autonomy in their research agenda but more competing academic activities. Federal agencies should be aware of the funding situation of the academic faculty they collaborate with, as it may affect their incentives and expectations for agency engagement.
A sabbatical is a period of leave from regular academic duties, usually for one or two semesters, that allows faculty to pursue an intensive and unstructured scope of work — this can include research in their own field or others, as well as external engagements or tours of service with non-academic institutions. Faculty accrue sabbatical credits based on their length and type of service at the university, and may apply for a sabbatical once they have enough credits. The amount of salary received during a sabbatical depends on the number of credits and the duration of the leave. Federal agencies may benefit from collaborating with academic faculty who are on sabbatical, as they may have more time and interest to devote to impact-focused work.
Consulting limits & outside activity limits are policies that regulate the amount of time that academic faculty can spend on professional activities outside their university employment. These policies are intended to prevent conflicts of commitment or interest that may interfere with the faculty’s primary obligations to the university, such as teaching, research, and service, and the specific limits vary by university. Federal agencies may need to consider these limits when engaging academic faculty in ongoing or high-commitment collaborations.
Some academic faculty are paid on a 9-month basis, meaning that they receive their annual salary over nine months and have the option to supplement their income with external funding or other activities during the summer months. Other faculty are paid on a 12-month basis, meaning that they receive their annual salary over twelve months and have less flexibility to pursue outside opportunities. Federal agencies may need to consider the salary structure of the academic faculty they work with, as it may affect their availability to engage on projects and the optimal timing with which they can do so.
Informal advising
Advisory relationships consist of an academic providing occasional or periodic guidance to a federal agency on a specific topic or issue, without being formally contracted or compensated. This type of collaboration can be useful for agencies that need access to cutting-edge expertise or perspectives, but do not have a formal deliverable in mind.
Academic considerations
- Career stage: Informal advising can be done by faculty at any level of seniority, as long as they have relevant knowledge and experience. However, junior faculty may be more cautious about engaging in informal advising, as it may not count towards their tenure or promotion criteria. Senior faculty, who have established expertise and secured tenure, may be more willing to engage in impact-focused advisory relationships.
- Incentives: Advisory relationships can offer some benefits for faculty regardless of career stage, such as expanding their network, increasing their visibility, and influencing policy or practice. Informal advising can also stimulate new research questions, and create opportunities for future access to data or resources. Some agencies may also acknowledge the contributions of academic advisors in their reports or publications, which may enhance researchers’ academic reputation.
- Conflicts of interest: Informal advising may pose potential conflicts of interest or commitment for faculty, especially if they have other sources of funding or collaboration related to the same topic or issue. Faculty may need to consult with their department chair or dean before engaging in formal conversations, and should also avoid any activities that may compromise their objectivity, integrity, or judgment in conducting or reporting their university research.
- Timing: Faculty on 9-month salaries might be more willing/able to engage during summer months, when they have minimal teaching requirements and are focused on research and impact outputs.
Regulatory & structural considerations
- Contracting: An advisory relationship may not require a formal agreement or contract between the agency and the academic. For some topics or agencies, however, it may require a non-disclosure agreement or consulting agreement if the agency wants to ensure the exclusivity or confidentiality of the conversation.
- Advisory committee rules: Depending on the scope and scale of the academic engagement, agencies should be sure to abide by Federal Advisory Committee Act regulations. With informal one-on-one conversations that are focused on education and knowledge exchange, this is unlikely to be an issue.
- University approval: An NDA or consulting agreement may require approval from the university’s office of sponsored programs or office of technology transfer before the academic engages in informal advising. These offices may review and approve the agreement between the agency and the academic institution, ensuring compliance with university policies and regulations.
- Compensation: Informal advising typically does not involve compensation for the academic, but it may involve reimbursement for travel or other expenses related to the advisory role. This work is unlikely to count towards the consulting limit for faculty, but it may count towards the outside professional activity limit, depending on the nature and frequency of the advising.
Federal agencies and academic institutions are subject to various laws and regulations that affect their research collaboration, and the ownership and use of the research outputs. Key legislation includes the Federal Advisory Committee Act (FACA), which governs advisory committees and ensures transparency and accountability; the Federal Acquisition Regulation (FAR), which controls the acquisition of supplies and services with appropriated funds; and the Federal Grant and Cooperative Agreement Act (FGCAA), which provides criteria for distinguishing between grants, cooperative agreements, and contracts. Agencies should ensure that collaborations are structured in accordance with these and other laws.
Federal agencies may use various contracting mechanisms to engage researchers from non-federal entities in collaborative roles. These mechanisms include the IPA Mobility Program, which allows the temporary assignment of personnel between federal and non-federal organizations; the Experts & Consultants authority, which allows the appointment of qualified experts and consultants to positions that require only intermittent and/or temporary employment; and Cooperative Research and Development Agreements (CRADAs), which allow agencies to enter into collaborative agreements with non-federal partners to conduct research and development projects of mutual interest.
Offices of Sponsored Programs are units within universities that provide administrative support and oversight for externally funded research projects. OSPs are responsible for reviewing and approving proposals, negotiating and accepting awards, ensuring compliance with sponsor and university policies and regulations, and managing post-award activities such as reporting, invoicing, and auditing. Federal agencies typically interact with OSPs as the authorized representative of the university in matters related to sponsored research.
When engaging with academics, federal agencies may use NDAs to safeguard sensitive information. Agencies each have their own rules and procedures for using and enforcing NDAs involving their grantees and contractors. These rules and procedures vary, but generally require researchers to sign an NDA outlining rights and obligations relating to classified information, data, and research findings shared during collaborations.
Study groups
A study group is a type of collaboration where an academic participates in a group of experts convened by a federal agency to conduct analysis or education on a specific topic or issue. The study group may produce a report or hold meetings to present their findings to the agency or other stakeholders. This type of collaboration can be useful for agencies that need to gather evidence or insights from multiple sources and disciplines with expertise relevant to their work.
Academic considerations
- Career stage: Faculty at any level of seniority can participate in a study group, but junior faculty may be more selective about joining, as they have limited time and resources to devote to activities that may not count towards their tenure or promotion criteria. Senior faculty may be more willing to join a study group, as they have more established expertise and reputation, and may seek to have more impact on policy or practice.
- Soft vs. hard money: Faculty in soft money positions, where their salary and research expenses depend largely on external funding sources, may be more interested in joining a study group if it provides funding or other resources that support their research. Faculty in hard money positions, where their salary and research expenses are supported by institutional funds, may be less motivated by funding, but more by the recognition and impact that comes from participating.
- Incentives: Study groups can offer some benefits for faculty, such as expanding their network, increasing their visibility, and influencing policy or practice. Study groups can also stimulate new research ideas or questions for faculty, and create opportunities for future access to data or resources. Some study groups may also result in publication of output or other forms of recognition (e.g., speaking engagements) that may enhance the academic reputation of the faculty.
- Conflicts of interest: Study groups may pose potential conflicts of interest or commitment for academics, especially if they have other sources of funding related to the same topic. Faculty may also be cautious about entering into more formal agreements if it may impact their ability to apply for & receive federal research funding in the future. Agencies should be aware of any such impacts of academic participation, and faculty should be encouraged to consult with their department chair or dean before joining a study group.
Regulatory & structural considerations
- Contracting and compensation: The optimal contracting mechanism for a study group will depend on the agency, the topic, and the planned output of the group. Some possible contracting mechanisms are extramural grants, service contracts, cooperative agreements, or memoranda of understanding. The mechanism will determine the amount and type of compensation that participants (or the organizing body) receive, and could include salary support, travel reimbursement, honoraria, or overhead costs.
- Advisory committee rules: When setting up study groups, agencies should work carefully to ensure that the structure abides by Federal Advisory Committee Act regulations. To ensure that study groups are distinct from Advisory Committees, these groups should be limited in size, and should be tasked with providing knowledge, research, and education — rather than specific programmatic guidance — to agency partners.
- University approval: Depending on the contracting mechanism and the compensation involved, academic participants may need to obtain approval from their university’s office of sponsored programs or office of technology transfer before joining a study group. These offices may review the terms and conditions of the agreement between the agency and the academic institution, such as the scope of work, the budget, and the reporting requirements.
Case study
In 2022, the National Science Foundation (NSF) awarded the National Bureau of Economic Research (NBER) a grant to create the EAGER: Place-Based Innovation Policy Study Group. This group, led by two economists with expertise in entrepreneurship, innovation, and regional development — Jorge Guzman from Columbia University and Scott Stern from MIT — aimed to provide “timely insight for the NSF Regional Innovation Engines program.” During Fall 2022, the group met regularly with NSF staff to i) provide an assessment of the “state of knowledge” of place-based innovation ecosystems, ii) identify the insights of this research to inform NSF staff on design of their policies, and iii) surface potential means by which to measure and evaluate place-based innovation ecosystems on a rigorous and ongoing basis. Several of the academic leads then completed a paper synthesizing the opportunities and design considerations of the regional innovation engine model, based on the collaborative exploration and insights developed throughout the year. In this case, the study group was structured as a grant, with funding provided to the organizing institution (NBER) for personnel and convening costs. Yet other approaches are possible; for example, NSF recently launched a broader study group with the Institute for Progress, which is structured as a no-cost Other Transaction Authority contract.
Collaborative research
Active collaboration covers scenarios in which an academic engages in joint research with a federal agency, either as a co-investigator, a subrecipient, a contractor, or a consultant. This type of collaboration can be useful for agencies that need to leverage the expertise, facilities, data, or networks of academics to conduct research that advances their mission, goals, or priorities.
Academic considerations
- Career stage: Collaborative research is likely to be attractive to junior faculty, who are seeking opportunities to access data that might not be otherwise available, and to foster new relationships with partners. This is particularly true if there is a commitment that findings or evaluations will be publishable, and if the collaboration does not interfere with teaching and service obligations. Collaborative projects are also likely to be of interest to senior faculty — if work aligns with their established research agenda — and publication of findings may be (slightly) less of a requirement.
- Soft vs. hard money: Researchers on hard money contracts, where their salary and research expenses are supported by institutional funds, may be more motivated by the opportunity to use and publish internal data from the agency. Researchers on soft money contracts, where their salary and research expenses depend largely on external funding sources, may be more motivated by the availability of grants and financial support from the agency.
- Timing: Depending on the scope of the collaboration, and the availability of funding for the researcher, efforts could be targeted for academics’ summer months or their sabbaticals. Alternatively, collaborative research could be integrated into the regular academic year, as part of the researcher’s ongoing research activities.
- Incentives: As mentioned above, collaborative research can offer some benefits for faculty, such as access to data and information, publication opportunities, funding sources, and partnership networks. Collaborative research can also provide faculty with more direct and immediate impact on policy or practice, as well as recognition from the agency and stakeholders (and, perhaps to a lesser extent, the academic community).
Regulatory & structural considerations
- Contracting: The contracting requirements for collaborative research will vary greatly depending on the structure and scope of the collaboration, the partnering agency, and the use of internal government data or resources. Readers are encouraged to explore agency-specific guidance when considering the ideal mechanism for a given project. Some possible contracting mechanisms are extramural grants, service contracts, or cooperative research and development agreements. Each mechanism has different terms and conditions regarding the scope of work, the budget, the intellectual property rights, the reporting requirements, and the oversight responsibilities.
- Regulatory compliance: Collaborative research involving both governmental and non-governmental partners will require compliance with various laws, regulations, and authorities. These include but are not limited to:
- Federal Acquisition Regulation (FAR), which establishes the policies and procedures for acquiring supplies and services with appropriated funds;
- Federal Grant and Cooperative Agreement Act (FGCAA), which provides criteria for determining whether to use a grant or a cooperative agreement to provide assistance to non-federal entities;
- Other Transaction Authority (OTA), a contracting mechanism that provides (most) agencies with the ability to enter into flexible research & development agreements that are not subject to the regulations governing standard contracts, grants, or cooperative agreements;
- OMB’s Uniform Guidance, which sets forth the administrative requirements, cost principles, and audit requirements for federal awards;
- Bayh-Dole Act, which allows academic institutions to retain title to inventions made with federal funding, subject to certain conditions and obligations.
- Collaborative research may also require compliance with ethical standards and guidelines for human subjects research, such as the Belmont Report and the Common Rule.
Case studies
External collaboration between academic researchers and government agencies has repeatedly proven fruitful for both parties. For example, in May 2020, the Rhode Island Department of Health partnered with researchers at Brown University’s Policy Lab to conduct a randomized controlled trial evaluating the effectiveness of different letter designs in encouraging COVID-19 testing. This study identified design principles that improved uptake of testing by 25–60% without increasing cost, and led to follow-on collaborations between the institutions. The North Carolina Office of Strategic Partnerships provides a prime example of how government agencies can take steps to facilitate these collaborations. The office recently launched the North Carolina Project Portal, which serves as a platform for the agency to share their research needs, and for external partners — including academics — to express interest in collaborating. Researchers are encouraged to contact the relevant project leads, who then assess interested parties on their expertise and capacity, extend an offer for a formal research partnership, and initiate the project.
Short-term placements
Short-term placements allow for an academic researcher to work at a federal agency for a limited period of time (typically one year or less), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to fill temporary gaps in expertise, capacity, or leadership, or to foster cross-sector exchange and learning.
Academic considerations
- Career stage: Short-term placements may be more appealing to senior faculty, who have more established and impact-focused research agendas, and who may seek to influence policy or practice at the highest levels. Junior faculty may be less interested in placements, particularly if they are still progressing towards tenure — unless the position offers opportunities for publication, funding, or recognition that are relevant to their tenure or promotion criteria.
- Soft vs. hard money: Faculty in soft money positions may face more challenges in arranging short-term placement if they have ongoing grants or labs to maintain; but placements where external resources are available (e.g., established fellowships), could be an attractive option when ongoing commitments are manageable. The impact of hard money will depend largely on the type of placement and the expectations for whether institutional support or external resources will cover a faculty member’s time away from the university.
- Timing: Sabbaticals are an ideal time for short-term placements, as they allow faculty to pursue intensive research or external engagement, without interfering with their regular academic duties. However, convincing faculty to use their sabbaticals for short-term placement may require a longer discovery and recruitment period, as well as a strong value proposition that highlights the benefits and incentives of the collaboration. Because most faculty are subject to the academic calendar, June and January tend to be ideal start dates for this type of engagement.
- Incentives: Short-term placements can offer benefits for academics, such as having an impact on policy or practice, gaining access to new data or research areas, and building relationships with agency officials and other stakeholders. However, short-term placements can also involve some costs and/or risks for participating faculty, including logistical complications, relocation, confidentiality constraints, and publication restrictions.
Regulatory & structural considerations
- Contracting: Short-term placements require a formal agreement or contract between the agency and the academic. There are several contracting & hiring mechanisms that can facilitate short-term placement, such as the Intergovernmental Personnel Act (IPA) Mobility Program, the Experts & Consultants authority, Schedule A(r), or the Special Government Employee (SGE) designation. Each mechanism has different eligibility criteria, terms and conditions, and administrative processes. Alternatively, many fellowship programs already exist within agencies or through outside organizations, which can streamline the process and handle logistics on behalf of both the academic institution and the agency.
- Compensation: The payment of salary support, travel, overhead, etc. will depend on the contracting mechanism and the agreement between the agency and the academic institution. Costs are generally covered by the organization that is expected to benefit most from the placement, which is often the agency itself; though some authorities for facilitating cross-sector exchange (e.g., the IPA program and Experts and Consultants authority) allow research institutions to cost-share or cover the expense of an expert’s compensation when appropriate. External fellowship programs also occasionally provide external resources to cover costs.
- Role and expectations: Placements, more so than more informal collaborations, require clear communication and understanding of the role and expectations. The academic should also be prepared to adapt to the agency’s norms and processes, which will differ from those in academia, and to perform work that may not reflect their typical contribution. The academic should also be aware of their rights and obligations as a federal employee or contractor.
- Confidentiality: Placements may involve access to confidential or sensitive information from the agency, such as classified data or personal information. Academics will likely be required to sign a non-disclosure agreement (NDA) that defines the scope and terms of confidentiality, and will often be subject to security clearance or background check procedures before entering their role.
Case studies
Various programs exist throughout government to facilitate short-term rotations of outside experts into federal agencies and offices. One of the most well-known examples is the American Association for the Advancement of Science (AAAS) Science & Technology Policy Fellowship (STPF) program, which places scientists and engineers from various disciplines and career stages in federal agencies for one year to apply their scientific knowledge and skills to inform policy making and implementation. The Schedule A(r) hiring authority tends to be well-suited for these kinds of fellowships; it is used, for example, by the Bureau of Economic Analysis to bring on early career fellows through the American Economic Association’s Summer Economics Fellows Program. In some circumstances, outside experts are brought into government “on loan” from their home institution to do a tour of service in a federal office or agency; in these cases, the IPA program can be a useful mechanism. IPAs are used by the National Science Foundation (NSF) in its Rotator Program, which brings outside scientists into the agency to serve as temporary Program Directors and bring cutting-edge knowledge to the agency’s grantmaking and priority-setting. IPA is also used for more ad-hoc talent needs; for example, the Office of Evaluation Sciences (OES) at GSA often uses it to bring in fellows and academic affiliates.
Long-term rotations
Long-term rotations allow an academic to work at a federal agency for an extended period of time (more than one year), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to recruit and retain expertise, capacity, or leadership in areas that are critical to their mission, goals, or priorities.
Academic considerations
- Career stage: Long-term rotations may be more feasible for senior faculty, who have more experience in their discipline and are likely to have more flexibility and support from their institutions to take a leave of absence. Junior faculty may face more barriers and risks in pursuing long-term rotations, such as losing momentum in their research productivity, missing opportunities for tenure or promotion, or losing connection with their academic peers and mentors.
- Soft vs. hard money: Faculty in soft money positions may have more ability to seek longer-term rotations, as the provision of external support is more in line with their institutions’ expectations. Faculty in hard money positions may face difficulties seeking long-term rotations, as institutional provision of resources comes with expectations for teaching and service that administrations may be wary of pausing for extended periods of time.
- Timing: Long-term rotations require careful planning and coordination with the academic institution and the federal agency, as they may involve significant changes in the academic’s schedule, workload, and responsibilities. These rotations may be easier to arrange during sabbaticals or other periods of leave from the academic institution, but will often still require approval from the institution’s administration. Because most faculty are subject to the academic calendar, June and January tend to be ideal start dates for sabbatical or secondment engagements.
- Incentives: Long-term rotations offer an opportunity for faculty to gain valuable experience and insight into the impact frontier — both in terms of policy and practice — of their field or discipline. These experiences can yield new skills or competencies that enhance their academic performance or career advancement, can help academics build strong relationships and networks with agency officials and other stakeholders, and can provide a lasting impact on public good. However, long-term roles involve challenges for faculty, such as adjusting to a different organizational structure, balancing expectations from both the agency and the academy, and transitioning back into academic work and productivity following the rotation.
Regulatory & structural considerations
- Regulatory and structural considerations — including contracting, compensation, and expectations — are similar to those of short-term placements, and tend to involve the same mechanisms and processes.
- The desired length of a long-term rotation will affect how agencies select and apply the appropriate mechanism. For example, IPA assignments are initially made for up to two years, and can then be extended for another two years when relevant — yielding a maximum continuous term length of four years.
- Longer time frames typically require additional structural considerations. Specifically, extensions of mechanisms like the IPA may be required, or more formal governmental employment may be prioritized at the outset. Given that these types of placements are often bespoke, these considerations should be explored in depth for the agency’s specific needs and regulatory context.
Case study
One example of a long-term rotation that draws experts from academia into federal agency work is the Advanced Research Projects Agency (ARPA) Program Manager (PM) role. ARPA PMs — across DARPA, IARPA, ARPA-E, and now ARPA-H — are responsible for leading high-risk, high-reward research programs, and have considerable autonomy and authority in defining their research vision, selecting research performers, managing their research budget, and overseeing their research outcomes. PMs are typically recruited from academia, industry, or government for a term of three to five years, and are expected to return to their academic institutions or pursue other career opportunities after their term at the agency. PMs coming from academia or nonprofit organizations are often brought on through the IPA mobility program, and some entities also have unique term-limited hiring authorities for this purpose. PMs can also be hired as full government employees; this mechanism is primarily used for candidates coming from the private sector.
Laying the Foundation for the Low-Carbon Cement and Concrete Industry
This report is part of a series on underinvested clean energy technologies, the challenges they face, and how the Department of Energy can use its Other Transaction Authority to implement programs custom-tailored to those challenges.
Cement and concrete production is one of the hardest industries to decarbonize. Solutions for low-emissions cement and concrete are much less mature than those for other green technologies like solar and wind energy and electric vehicles. Nevertheless, over the past few years, young companies have achieved significant milestones in piloting their technologies and certifying their performance and emissions reductions. In order to finance new manufacturing facilities and scale promising solutions, companies will need to demonstrate consistent demand for their products at a financially sustainable price. Demand support from the Department of Energy (DOE) can help companies meet this requirement and unlock private financing for commercial-scale projects. Using its Other Transaction Authority, DOE could design a demand-support program involving double-sided auctions, contracts for difference, or price and volume guarantees. To run such a program using existing funds, DOE could incorporate it into the Industrial Demonstrations Program. However, additional funding from Congress would allow DOE to implement a more robust program. Through such an initiative, the government would accelerate the adoption of low-emissions cement and concrete, providing emissions reduction benefits across the country while setting the United States up for success in the future clean industrial economy.
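As a sketch of one of these designs, a contract for difference would pay a producer the shortfall between an agreed strike price and the prevailing market price. All prices and volumes below are hypothetical illustrations.

```python
# Minimal sketch of a contract-for-difference payout for a low-carbon
# cement producer. Strike price, market price, and volume are
# hypothetical illustrations.

def cfd_payment(strike_price: float, market_price: float, volume_tons: float) -> float:
    """Pays the producer the per-ton shortfall between the agreed
    strike price and the market price; if the market price meets or
    exceeds the strike, no payment is owed."""
    return max(0.0, strike_price - market_price) * volume_tons

# Example: with a $180/ton strike and a $140/ton market price, the
# program would pay $40/ton on 10,000 tons, letting the producer sell
# at the market rate while still realizing the strike price.
print(cfd_payment(180.0, 140.0, 10_000))  # 400000.0
```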
Introduction
Besides water, concrete is the most consumed material in the world. It is the material of choice for construction thanks to its durability, versatility, and affordability. As of 2022, the cement and concrete sector accounted for nine percent of global carbon emissions. The vast majority of the embodied emissions of concrete come from the production of Portland cement. Cement production emits carbon both through the burning of fossil fuels to heat kilns (40% of emissions) and through the chemical reaction, known as calcination, that turns limestone and clay into cement using that heat (60% of emissions). Electrifying production facilities and making them more energy efficient can help decarbonize the former but not the latter, which requires deeper innovation.
Current solutions on the market substitute a portion of the cement used in concrete mixtures with Supplementary Cementitious Materials (SCMs) like fly ash, slag, or unprocessed limestone, reducing the embodied emissions of the resulting concrete. But these SCMs cannot replace all of the cement in concrete, and currently there is an insufficient supply of readily usable fly ash and slag for wider adoption across the industry.
The next generation of ultra-low-carbon, carbon-neutral, and even carbon-negative solutions seeks to develop alternative feedstocks and processes for producing cement or cementitious materials that can replace cement entirely and to capture carbon in aggregates and wet concrete. The DOE reports that testing and scaling these new technologies is crucial to fully eliminate emissions from concrete by 2050. Bringing these new technologies to the market will not only help the United States meet its climate goals but also promote U.S. leadership in manufacturing.
A number of companies have established pilot facilities or are in the process of constructing them. These companies have successfully produced near-carbon-neutral and even carbon-negative concrete. Building off of these milestones, companies will need to secure financing to build full-scale commercial facilities and increase their manufacturing capacity.
Challenges Facing Low-Carbon Cement and Concrete
A key requirement for accessing both private-sector and government financing for new facilities is that companies obtain long-term offtake agreements, which assure financiers that there will be a steady source of revenue once the facility is built. But the boom-and-bust nature of the construction industry discourages construction companies and intermediaries from entering into long-term financial commitments, for fear that there will be no project in which to use the materials. Cement, aggregates, and other concrete inputs also take up significant volume, so it would be difficult and costly for potential offtakers to store excess amounts during construction lulls. For these reasons, construction contractors procure concrete on an as-needed, project-specific basis.
Adding to the complexity, structural features of the cement and concrete market increase the difficulty of securing long-term offtake agreements:
- Long, fragmented supply chain: While the supply chain is highly concentrated at either end, there are multiple intermediaries between the actual producers of cement, aggregates, and other inputs and the final construction customers. These include the thousands of ready-mix concrete producers, along with materials dealers, construction contractors, and subcontractors. As a result, construction customers usually aren’t buying materials themselves, and their contractors or subcontractors often aren’t buying materials directly from cement producers.
- Regional fragmentation: Cement, aggregates, and other concrete inputs are heavy products, which entail high freight costs and embodied emissions from transportation, so producers have a limited range in which they are willing to ship their product. After these products are shipped to a ready-mix concrete facility, the fresh concrete must then be delivered to the construction site within 60 to 90 minutes or the concrete will harden. As a result, the localization of supply chains limits the potential customers for a new manufacturing plant.
- Low margins: The cement and concrete markets operate with very low margins, so buyers are highly sensitive to price. Consequently, low-carbon cement and concrete may struggle to compete against conventional options due to their green premiums.
Luckily, private construction is not the only customer for concrete. The U.S. government (federal, state, and local combined) accounts for roughly 50% of all concrete procurement in the country. Used correctly, the government’s purchasing power can be a powerful lever for spurring the adoption of decarbonized cement and concrete. However, the government faces many of the same barriers as the private sector when it comes to entering into long-term offtake agreements. Government procurement of concrete goes through multiple intermediaries and operates on an as-needed, project-specific basis: government agencies like the General Services Administration (GSA) enter into agreements with construction contractors for specific projects, and then the contractors or their subcontractors make the ultimate purchasing decisions for concrete.
Federal Support
The Federal Buy Clean Initiative, launched in 2021 by the Biden Administration, is starting to address the procurement challenge for low-carbon cement and concrete. Among the initiative’s programs is the allocation of $4.5 billion from the Inflation Reduction Act (IRA) for the GSA and the Department of Transportation (DOT) to use lower-carbon construction materials. Under the initiative, the GSA is piloting the direct procurement of low-embodied-carbon materials for federal construction projects. To qualify as low-embodied-carbon concrete under the GSA’s interim requirements, concrete mixtures only have to achieve a roughly 25–50% reduction in embodied carbon,1 depending on the compressive strength. The requirement may be even less stringent if no concrete meeting this standard is available near the project site. Since the bar is only slightly below that of traditional concrete, young companies developing the solutions to fully decarbonize concrete will have trouble competing on price for procurement contracts against companies producing more well-established but higher-emission solutions like fly ash, slag, and limestone concrete mixtures. Moreover, the just-in-time, project-specific nature of these procurement contracts means they still don’t address young companies’ need for the long-term price and customer certainty required to scale up.
The ideal solution for this is a demand-support program. The DOE Office of Clean Energy Demonstrations (OCED) is developing a demand-support program for the Hydrogen Hubs initiative, setting aside $1 billion for demand-support to accompany the $7 billion in direct funding to regional Hydrogen Hubs. In its request for proposals, OCED says that the hydrogen demand-support program will address the “fundamental mismatch in [the market] between producers, who need long-term certainty of high-volume demand in order to secure financing to build a project, and buyers, who often prefer to buy on a short-term basis at more modest volumes, especially for products that have yet to be produced at scale and [are] expected to see cost decreases.”
A demand-support program could do the same for low-carbon cement and concrete, addressing the market challenges that grants alone cannot. OCED is reviewing applications for the $6.3 billion Industrial Demonstrations Program. As it did for the Hydrogen Hubs, OCED could consider setting aside $500 million to $1 billion of the program funds to implement demand-support programs for the two highest-emitting heavy industries, low-carbon cement/concrete and steel, at $250 million to $500 million each.
Additional funding from Congress would allow DOE to implement a more robust demand-support program. Federal investment in industrial decarbonization grew from $1.5 billion in FY21 to over $10 billion in FY23, thanks largely to new funding from the Bipartisan Infrastructure Law (BIL) and the IRA. However, the sector remains underfunded relative to its emissions, contributing 23% of the country’s emissions while receiving less than 12% of federal climate innovation funding. A promising piece of recently introduced legislation is the Concrete and Asphalt Innovation Act of 2023, which would, among other things, direct the DOE to establish a program of research, development, demonstration, and commercial application for low-emissions cement, concrete, asphalt binder, and asphalt mixture. This would include a demonstration initiative authorized at $200 million and the production of a five-year strategic plan identifying new programs and the resources needed to carry out the mission. If the legislation is passed, the DOE could propose a demand-support program in its strategic plan and request funding from Congress to set it up. The faster route, however, would be for Congress to add a section to the Act that directly establishes a demand-support program within DOE and authorizes funding for it.
Other Transactions Authority
BIL and IRA gave DOE an expanded mandate to support innovative technologies from early-stage research through commercialization. In order to do so, DOE must be just as innovative in its use of its available authorities and resources. Tackling the challenge of bringing technologies from pilot to commercialization requires DOE to look beyond traditional grant, loan, and procurement mechanisms. Previously, we have identified the DOE’s Other Transaction Authority (OTA) as an underleveraged tool for accelerating clean energy technologies.
OTA is defined in legislation as the authority to enter into transactions that are not government grants or contracts in order to advance an agency’s mission. This negative definition provides DOE with significant freedom to design and implement flexible financial agreements that can be tailored to address the unique challenges that different technologies face. DOE plans to use OTA to implement the hydrogen demand-support program, and it could also be used for a demand-support program for low-carbon cement and concrete. The DOE’s new Guide to Other Transactions provides official guidance on how DOE personnel can use the flexibilities provided by OTA.
Defining Products for Demand Support
Before setting up a demand-support program, DOE first needs to define what a low-carbon cement or concrete product is and the value it provides in emissions avoided. This is not straightforward due to (1) the heterogeneity of solutions, which prevents apples-to-apples comparisons in price, and (2) variations in the amount of avoided emissions that different solutions can provide. To address the first issue, for products that are not ready-mix concrete, the DOE should calculate the cost of a unit of concrete made using the product, based on a standardized mix ratio of a specific compressive strength and market prices for the other components of the concrete mix. To address the second issue, the DOE should then divide the calculated price per unit of concrete (e.g., $/m3) by the amount of CO2 emissions avoided per unit of concrete relative to the NRMCA’s industry average (e.g., kg/m3) to determine the effective price per unit of CO2 emissions avoided. The DOE can then fairly compare bids from different projects using this metric. Such an approach would result in the government providing demand support for the products that are most cost-effective at reducing carbon emissions, rather than simply the cheapest.
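To illustrate the arithmetic, here is a minimal sketch of this normalization. The baseline figure, prices, and emissions numbers are hypothetical, and in practice the industry-average baseline would vary by compressive strength class.

```python
# Sketch: normalize heterogeneous bids to an effective price per kg of CO2 avoided.
# All figures are illustrative assumptions, not real prices or baselines.

NRMCA_BASELINE_KG_CO2_PER_M3 = 300.0  # hypothetical industry-average embodied carbon

def effective_price_per_kg_co2(
    concrete_price_usd_per_m3: float,  # cost of 1 m^3 of concrete made with the product
    embodied_kg_co2_per_m3: float,     # embodied carbon of that concrete, from its LCA/EPD
) -> float:
    """Dollars per kg of CO2 avoided relative to the industry-average baseline."""
    avoided = NRMCA_BASELINE_KG_CO2_PER_M3 - embodied_kg_co2_per_m3
    if avoided <= 0:
        raise ValueError("Product does not avoid emissions relative to the baseline.")
    return concrete_price_usd_per_m3 / avoided

# Two hypothetical bids: a cheaper mix with modest savings vs. a pricier near-zero mix.
bid_a = effective_price_per_kg_co2(140.0, 200.0)  # $140/m^3, 100 kg avoided -> $1.40/kg
bid_b = effective_price_per_kg_co2(180.0, 30.0)   # $180/m^3, 270 kg avoided -> ~$0.67/kg
# Bid B wins despite its higher sticker price: it avoids CO2 more cost-effectively.
```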
Furthermore, the DOE should set an upper limit on the embodied carbon that a concrete product, or concrete made with the product, may contain in order to qualify as “low carbon.” We suggest that the DOE use the limits established by the First Movers Coalition, an international corporate advance market commitment for concrete and other hard-to-abate industries organized by the World Economic Forum. The limits were developed through conversations with incumbent suppliers, start-ups, nonprofits, and intergovernmental organizations on what would be achievable by 2030. The limits were designed to help move the needle towards commercializing solutions that enable full decarbonization.
Companies that participate in a DOE demand-support program should be required, after one or two years of operations, to confirm that their product meets these limits through an Environmental Product Declaration.2 Using carbon offsets to reach that limit should not be allowed, since the goal is to spur the innovation and scaling of technologies that can eventually fully decarbonize the cement and concrete industry.
Below are some ideas for how DOE can set up a demand-support program for low-carbon cement and concrete.
Program Proposals
Double-Sided Auction
Double-sided auctions are designed to support the development of production capacity for green technologies and products and the creation of a market by providing long-term price certainty to suppliers and facilitating the sale of their products to buyers. As the name suggests, a double-sided auction consists of two phases: First, the government or an intermediary organization holds a reverse auction for long-term purchase agreements (e.g., 10 years) for the product from suppliers, who are incentivized to bid the lowest possible price in order to win. Next, the government conducts annual auctions of short-term sales agreements to buyers of the product. Once sales agreements are finalized, the product is delivered directly from the supplier to the buyer, with the government acting as a transparent intermediary. The government thus serves as a market maker by coordinating the purchase and sale of the product from producers to buyers. Government funding covers the difference between the original purchase price and the final sale price, reducing the impact of the green premium for buyers and sellers.
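As a minimal sketch of the mechanics, the code below clears one round under simplifying assumptions: a single fungible product, pay-as-bid pricing, and purchase and sales volumes matched one-for-one (a real program, like H2Global’s, runs long-term purchase agreements against annual sales auctions). All bids and the budget figure are hypothetical.

```python
# Sketch of a double-sided auction's cash flows, under simplifying assumptions:
# one fungible product, pay-as-bid pricing, and volumes matched one-for-one.

def clear_double_sided_auction(supplier_bids, buyer_bids, budget):
    """supplier_bids: (price, volume) offers to sell; buyer_bids: (price, volume)
    offers to buy. Returns matched volume and the subsidy the government pays."""
    sells = sorted(supplier_bids)              # cheapest supply first
    buys = sorted(buyer_bids, reverse=True)    # highest willingness to pay first
    matched, subsidy = 0.0, 0.0
    while sells and buys:
        (sp, sv), (bp, bv) = sells[0], buys[0]
        vol = min(sv, bv)
        gap = max(0.0, (sp - bp) * vol)        # green premium the government covers
        if subsidy + gap > budget:
            break                              # funding exhausted
        matched += vol
        subsidy += gap
        sells[0] = (sp, sv - vol)              # decrement partially filled bids
        buys[0] = (bp, bv - vol)
        if sells[0][1] == 0:
            sells.pop(0)
        if buys[0][1] == 0:
            buys.pop(0)
    return matched, subsidy

vol, cost = clear_double_sided_auction(
    supplier_bids=[(120.0, 1000), (150.0, 500)],  # $/t and tonnes, hypothetical
    buyer_bids=[(100.0, 800), (90.0, 900)],
    budget=50_000.0,
)
# -> 1,000 t matched at a total subsidy of $22,000; the last, widest gap
#    (a $60/t premium on 500 t) would exceed the budget and goes unfilled.
```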
While the federal government has not yet implemented a double-sided auction program, OCED is considering setting up the hydrogen demand-support measure as a “market maker” that provides a “ready purchaser/seller for clean hydrogen.” Such a market maker program could be implemented most efficiently through double-sided auctions.
Germany was the first to conceive of and develop the double-sided auction scheme. The H2Global initiative was established in 2021 to support the development of production capacity for green hydrogen and its derivative products. The program is implemented by Hintco, an intermediary company, which is currently evaluating bids for its first auction for the purchase of green ammonia, methanol, and e-fuels, with final contracts expected to be announced as soon as this month. Products will start to be delivered by the end of 2024.

(Source: H2Global)
A double-sided auction scheme for low-carbon cement and concrete would address producers’ need for long-term offtake agreements while matching buyers’ short-term procurement needs. The auctions would also help develop transparent market prices for low-carbon cement and concrete products.
All bids for purchase agreements should include detailed technical specifications and/or certifications for the product, the desired price per unit, and a robust, third-party life-cycle assessment of the amount of embodied carbon per unit of concrete made with the product, at different compressive strengths. Additionally, bids for ready-mix concrete should include the location(s) of the bidder’s production facility or facilities, and bids for cement and other concrete inputs should include information on the locations of ready-mix concrete facilities capable of producing concrete using their products. The DOE should then select bids through a pure reverse auction using the calculated effective price per unit of CO2 emissions avoided. To account for regional fragmentation, the DOE could conduct separate auctions for each region of the country.
A double-sided auction offers the low-carbon cement and concrete industry benefits similar to those of an advance market commitment. However, the addition of an efficient, built-in system for the government to then sell that cement or concrete allotment to a buyer means that the government is not obligated to use the cement or concrete itself. This is important because the logistics of matching cement or concrete production to a suitable government construction project can be difficult due to regional fragmentation, and the DOE is not a major procurer of cement and concrete.3 Instead, under this scheme, federal, state, or local agencies working on a construction project, or their contractors, could check the double-sided auction program each year to see if there is a product offering in their region that matches their project needs and sustainability goals for that year, and if so, submit a bid to procure it. In fact, this should be encouraged as a part of the Federal Buy Clean Initiative, since the government is such an important consumer of cement and concrete products.
Contracts for Difference
Contracts for difference (CfD, or sometimes called two-way CfD) programs aim to provide price certainty for green technology projects and close the gap between the price that producers need and the price that buyers are willing to offer. CfD have been used by the United Kingdom and France primarily to support the development of large-scale renewable energy projects. However, CfD can also be used to support the development of production capacity for other green technologies. OCED is considering CfD (also known as pay-for-difference contracts) for its hydrogen demand-support program.
CfD are long-term contracts signed between the government or a government-sponsored entity and companies looking to expand production capacity for a green product.4 The contract guarantees that once the production facility comes online, the government will ensure a steady price by paying suppliers the difference between the market price for which they are able to sell their product and a predetermined “strike price.” On the other hand, if the market price rises above the strike price, the supplier will pay the difference back to the government. This prevents the public from funding any potential windfall profits.


(Source: Canadian Climate Institute)
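The settlement logic described above is simple enough to sketch in a few lines. This assumes settlement against an agreed market-price index over a contracted volume; the two_way flag is a hypothetical parameter anticipating the one-way backstop variant discussed below.

```python
# Sketch of contract-for-difference settlement, assuming settlement against an
# agreed market-price index. two_way=False turns it into a one-way price backstop.

def cfd_payment(strike: float, market: float, volume: float, two_way: bool = True) -> float:
    """Payment from the government to the supplier (negative = supplier pays back)."""
    if market < strike:
        return (strike - market) * volume   # top up to the strike price
    if two_way:
        return (strike - market) * volume   # claw back windfall above the strike
    return 0.0                              # one-way backstop: supplier keeps the upside

# Hypothetical settlement: strike $130/t, 10,000 t delivered.
print(cfd_payment(130.0, 110.0, 10_000))                  #  200,000 -> government pays
print(cfd_payment(130.0, 150.0, 10_000))                  # -200,000 -> supplier pays back
print(cfd_payment(130.0, 150.0, 10_000, two_way=False))   #        0 -> backstop variant
```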
A CfD program could provide a source of demand certainty for low-carbon cement and concrete companies looking to finance the construction of pilot- and commercial-scale manufacturing plants or the retrofitting of existing plants. The selection of recipients and strike prices should be determined through annual reverse auctions. In a typical reverse auction for CfD, the government sets a cap on the maximum number of units of product and the maximum strike price it is willing to accept. Each project candidate then places a sealed bid for a unit price and the amount of product it plans to produce. The bids are ranked by unit price, and projects are accepted from low to high unit price until either the total capacity cap or the maximum strike price is reached. The last project accepted sets the strike price for all accepted projects. The strike price is adjusted annually for inflation but otherwise fixed over the course of the contract. Compared to traditional subsidy programs, a CfD program can be much more cost-efficient thanks to the reverse auction process. The UK’s CfD program has seen the strike price fall with each successive round of auctions.
Applying this to the low-carbon cement and concrete industry requires some adjustments, since there are a variety of products for decarbonizing cement and concrete. As discussed above, the DOE should compare project bids according to the effective price per unit of CO2 abated when the product is used to make concrete. The DOE should also set a cap on the maximum volume of CO2 it wishes to abate and the maximum effective price per unit of CO2 abated that it is willing to pay. Bids can then be accepted from low to high price until one of those caps is hit. Instead of establishing a single strike price, the DOE should use each accepted project’s own bid price as its strike price, to account for the variation in types of products. A sketch of this adjusted acceptance rule follows.
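This is a minimal pay-as-bid sketch under the assumptions just stated; the project names, caps, and bid prices are invented for illustration.

```python
# Sketch of the adjusted reverse auction: rank bids by effective $/t of CO2 abated,
# accept until the abatement-volume cap or the price cap binds, and use each
# accepted project's own bid as its strike price (pay-as-bid).

def run_cfd_auction(bids, max_tonnes_co2, max_price_per_tonne):
    """bids: list of (project, effective_price_per_tonne, tonnes_co2_per_year)."""
    accepted, total = [], 0.0
    for project, price, tonnes in sorted(bids, key=lambda b: b[1]):
        if price > max_price_per_tonne:
            break                                # price cap binds
        take = min(tonnes, max_tonnes_co2 - total)
        if take <= 0:
            break                                # volume cap binds
        accepted.append((project, price, take))  # project's bid is its strike price
        total += take
    return accepted

winners = run_cfd_auction(
    bids=[("alt-binder", 95.0, 40_000), ("carbon-cure", 70.0, 25_000),
          ("scm-blend", 130.0, 60_000)],         # hypothetical projects
    max_tonnes_co2=70_000,
    max_price_per_tonne=120.0,
)
# -> carbon-cure (25,000 t at $70) and alt-binder (40,000 t at $95) are accepted;
#    scm-blend exceeds the price cap and is rejected.
```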
Backstop Price Guarantee
A CfD program could be designed as a backstop price guarantee if one removes the requirement that suppliers pay the government back when market prices rise above the strike price. In this case, the DOE would set a lower maximum strike price for CO2 abatement, knowing that suppliers will be willing to bid lower strike prices, since there is now the opportunity for unrestricted profits above the strike price. The DOE would then pay only in the downside scenario, when the market price falls below the strike price, which would operate as an effective price floor.
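In the hypothetical CfD settlement sketch above, this variant corresponds to setting two_way=False: the government pays out only when the market price falls below the strike price, and suppliers keep the upside, which is precisely what allows them to bid lower strike prices in the first place.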
Backstop Volume Guarantee
Alternatively, the DOE could address demand uncertainty by providing a volume guarantee. In this case, the DOE could conduct a reverse auction for volume guarantee agreements with manufacturers, wherein the DOE would commit to purchasing, at a set price, any units that a company is unable to sell each year, up to the guaranteed volume, and the company would commit to a ceiling on the price it will charge buyers.5 Using OTA, the DOE could implement such a program in collaboration with DOT or GSA, wherein DOE would purchase the materials and DOT or GSA would use them for their construction needs. A sketch of the annual settlement follows.
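Here is a minimal sketch of the year-end settlement under such an agreement; the guaranteed volume and prices are hypothetical and would in practice be set through the reverse auction.

```python
# Sketch of annual settlement under a backstop volume guarantee. All parameters
# are hypothetical; the guarantee price and volume would be set at auction.

def volume_guarantee_purchase(guaranteed_tonnes: float, tonnes_sold: float,
                              guarantee_price: float) -> tuple[float, float]:
    """Units DOE must buy this year and what it pays for them."""
    shortfall = max(0.0, guaranteed_tonnes - tonnes_sold)
    return shortfall, shortfall * guarantee_price

# Guarantee of 50,000 t at $120/t; the company sold only 42,000 t commercially.
tonnes, outlay = volume_guarantee_purchase(50_000, 42_000, 120.0)
# -> DOE buys 8,000 t for $960,000, which GSA or DOT could then put to use
#    in federal construction projects.
```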
Other Considerations for Implementation
Rather than directly managing a demand-support program, the DOE should enter into an OT agreement with an external nonprofit entity to administer the contracts.6 The nonprofit entity would then hold auctions and select, manage, and fulfill the contracts. DOE is currently in the process of doing this for the hydrogen demand-support program.
A nonprofit entity could provide two main benefits. First, the logistics of implementing such a program would not be trivial, given the number of different suppliers, intermediaries, and offtakers involved. An external entity could hire staff with the necessary expertise more quickly and easily than the DOE, which must contend with the federal hiring process and a limited budget for program direction. Second, the entity’s independent nature would make it easier to gain lasting bipartisan support for the demand-support program, since the entity would not be directly associated with any one administration.
Coordination with Other DOE Programs
The green premium for near-zero-carbon cement and concrete products is steep, and demand-support programs like the ones proposed in this report should not be considered a cure-all for the industry, since it may be difficult to secure a budget large enough for any one program to fully cover the green premium across the industry. Rather, demand-support programs can complement the multiple existing funding authorities within the DOE by closing the residual gap between emerging technologies and conventional alternatives after other programs have helped to lower the green premium.
The DOE’s Loan Programs Office (LPO) received a significant increase in its lending authority from the IRA and has the ability to provide loans or loan guarantees to innovative clean cement facilities, resulting in cheaper capital financing and providing an effective subsidy. In addition, the IRA and the Bipartisan Infrastructure Law provided substantial new funding for the demonstration of industrial decarbonization technologies through OCED.
Policies like these can be chained together. For example, a clean cement start-up could simultaneously apply to OCED for funding to demonstrate its technology at scale and to LPO for a loan or loan guarantee, subject to due diligence on its business plan. Together, these two programs drive down the green premium and derisk the companies that successfully receive their support, leaving a much more modest price premium that a mechanism like a double-sided auction could affordably cover with less risk.
Successfully chaining policies like this requires deep coordination across DOE offices. OCED and LPO would need to work in lockstep in conducting technical evaluations and due diligence of projects that apply to both and prioritize funding of projects that meet both offices’ criteria for success. The best projects should be offered both demonstration funding from OCED and conditional commitments from LPO, which would provide companies with the confidence that they will receive follow-on funding if the demonstration is successful and other conditions are met, while posing no added risk to LPO since companies will need to meet their conditions first before receiving funds. The assessments should also consider whether the project would be a strong candidate for receiving demand support through a double-sided auction, CfD program, or price/volume guarantee, which would help further derisk the loan/loan guarantee and justify the demonstration funding.
Candidates for receiving support from all three public funding instruments would of course need to be evaluated especially rigorously, since the fiscal risk and potential political backlash of such a project failing are also much greater. If successful, such coordination would ensure that the combination of these programs substantially moves the needle on bringing emerging technologies in green cement and concrete to commercial scale.
Conclusion
Demand support can help address the key barrier that low-carbon cement and concrete companies face in scaling their technologies and financing commercial-scale manufacturing facilities. Whichever approach the DOE chooses to take, the agency should keep in mind (1) the importance of setting an ambitious standard for what qualifies as low-carbon cement and concrete and comparing proposals using a metric that accounts for the range of different product types and embodied emissions, (2) the complex implementation logistics, and (3) the benefits of coordinating a demand-support program with the agency’s demonstration and loan programs. Implemented successfully, such a program would crowd in private investment, accelerate commercialization, and lay the foundation for the clean industrial economy in the United States.
Breaking Ground on Next-Generation Geothermal Energy
This report is part one of a series on underinvested clean energy technologies, the challenges they face, and how the Department of Energy can use its Other Transaction Authority to implement programs custom tailored to those challenges.
The United States has been gifted with an abundance of clean, firm geothermal energy lying below our feet – tens of thousands of times more than the country has in untapped fossil fuels. Geothermal technology is entering a new era, with innovative approaches on their way to commercialization that will unlock access to more types of geothermal resources. However, the development of commercial-scale geothermal projects is an expensive affair, and the U.S. government has severely underinvested in this technology. The Inflation Reduction Act and the Bipartisan Infrastructure Law concentrated clean energy investments in solar and wind, which are great near-term solutions for decarbonization, but neglected to invest sufficiently in solutions like geothermal energy, which are necessary to reach full decarbonization in the long term. With new funding from Congress or potentially the creative (re)allocation of existing funding, the Department of Energy (DOE) could take a number of different approaches to accelerating progress in next-generation geothermal energy, from leasing agency land for project development to providing milestone payments for the costly drilling phases of development.
Introduction
As the United States power grid transitions towards clean energy, the increasing mix of intermittent renewable energy sources like solar and wind must be balanced by sources of clean firm power that are available around the clock in order to ensure grid reliability and reduce the need to overbuild solar, wind, and battery capacity. Geothermal power is a leading contender for addressing this issue.
Conventional geothermal (also known as hydrothermal) power plants tap into existing hot underground aquifers and circulate the hot water to the surface to generate electricity. Thanks to an abundance of geothermal resources close to the earth’s surface in the western part of the country, the United States currently leads the world in geothermal power generation. Conventional geothermal power plants are typically located near geysers and steam vents, which indicate the presence of hydrothermal resources belowground. However, these hydrothermal sites represent just a small fraction of the total untapped geothermal potential beneath our feet — more than the potential of fossil fuel and nuclear fuel reserves combined.
Next-generation geothermal technologies, such as enhanced geothermal systems (EGS), closed-loop or advanced geothermal systems (AGS), and other novel designs, promise to allow access to a wider range of geothermal resources. Some designs can potentially also serve double duty as long-duration energy storage. Rather than tapping into existing hydrothermal reservoirs underground, these technologies drill into hot dry rock, engineer independent reservoirs using either hydraulic stimulation or extensive horizontal drilling, and then introduce new fluids to bring geothermal energy to the surface. These new technologies have benefited from advances in the oil and gas industry, resulting in lower drilling costs and higher success rates. Furthermore, some companies have been developing designs for retrofitting abandoned oil and gas wells to convert them into geothermal power plants. The commonalities between these two sectors present an opportunity not only to leverage the existing workforce, engineering expertise, and supply chain from the oil and gas industry to grow the geothermal industry but also to support a just transition such that current workers employed by the oil and gas industry have an opportunity to help build our clean energy future.
Over the past few years, a number of next-generation geothermal companies have had successful pilot demonstrations, and some are now developing commercial-scale projects. As a result of these successes and the growing demand for clean firm power, power purchase agreements (PPAs) for an unprecedented 1 GW of geothermal power were signed with utilities, community choice aggregators (CCAs), and commercial customers in the United States in 2022 and 2023 combined. In 2023, PPAs for next-generation geothermal projects surpassed those for conventional geothermal projects in terms of capacity. While this is promising, barriers remain to the development of commercial-scale geothermal projects. To meet its goal of net-zero emissions by 2050, the United States will need to invest in overcoming these barriers for next-generation geothermal energy now, lest the technology fail to scale to the level necessary for a fully decarbonized grid.
Meanwhile, conventional hydrothermal still has a role to play in the clean energy transition. The United States needs all the clean firm power that it can get, whether that comes from conventional or next-generation geothermal, in order to retire baseload coal and natural gas plants. Conventional hydrothermal power plants are also less expensive to build and cheaper to finance, since the technology is tried and tested, and there are still plenty of untapped hydrothermal resources in the western part of the country.
Challenges Facing Geothermal Projects
Funding is the biggest barrier to commercial development of next-generation geothermal projects. Private financing comes in two forms: equity and debt. Equity financing is more risk tolerant and is typically the source of funding for start-ups as they move from the R&D to demonstration phases of their technology. But because equity financing has a dilutive effect on the company, debt financing is preferred for the construction of commercial-scale projects. However, first-of-a-kind commercial projects are almost always precluded from accessing debt financing. It is commonly understood within the industry that private lenders will not take on technology risk, meaning that technologies must be at a Technology Readiness Level (TRL) of 9, where they have been proven to operate at commercial scale, and government lenders like the DOE Loan Programs Office (LPO) generally will not take on any risk that private lenders won’t. Manifestations of technology risk in next-generation geothermal include the possibility of underproduction, which would impact the plant’s profitability, or that capacity will decline faster than expected, reducing the plant’s operating lifetime. Moving next-generation technologies from the current TRL-7 level to TRL-9 will be key to establishing the reliability of these emerging technologies and unlocking debt financing for future commercial-scale projects.
Underproduction will likely remain a risk, though to a lesser extent, for next-generation projects even after technologies reach TRL-9. This is because uncertainty in the exploration and subsurface characterization process makes it possible for developers to overestimate the temperature gradient and thus the production capacity of a project. Hydrothermal projects also share this risk: the factors determining the production capacity for hydrothermal projects include not only the temperature gradient but also the flow rate and enthalpy of the natural reservoir. In the worst-case scenario, drilling can result in a dry hole that produces no hot fluids at all. Underproduction becomes a financial issue if the project is unable to generate as much revenue as expected, or if additional wells must be drilled to compensate, driving up the total project cost. Thus, underproduction is a risk shared by both next-generation and conventional geothermal projects. Research into improvements to the accuracy and cost of geothermal exploration and subsurface characterization can help mitigate this risk but may not eliminate it entirely, since there is a risk-cost trade-off in how much time is spent on exploration and subsurface characterization.
Another challenge for both next-generation and conventional geothermal projects is that they are more expensive to develop than solar or wind projects. Drilling requires significant upfront capital expenditures, making up about half of the total capital costs of developing a geothermal project, if not more. For example, in EGS projects, the first few wells can cost around $10 million each, while conventional hydrothermal wells, which are shallower, can cost around $3–7 million each. While conventional hydrothermal plants only consist of two to six wells on average, designs for commercial EGS projects can require several times that number of wells. Luckily, EGS projects benefit from the fact that wells can be drilled identically, so projects expect to move down the learning curve as they drill more wells, resulting in faster and cheaper drilling. Initial data from commercial-scale projects currently being developed suggest that the learning curves may be even steeper than expected. Nevertheless, this will need to be proven at scale across different locations. Some companies have managed to forgo expensive drilling altogether by focusing on developing technologies that can be installed within idle hydrothermal wells or abandoned oil and gas wells to convert them into productive geothermal wells.
Beyond funding, geothermal projects need to obtain land where there are suitable geothermal resources, as well as permits for each stage of project development. The best geothermal resources in the United States are concentrated in the West, where the federal government owns most of the land. The Bureau of Land Management (BLM) manages much of that land, in addition to all subsurface resources on federal land. However, there is inconsistency in how the BLM leases its land, depending on the state. While Nevada BLM has been very consistent about holding regular lease sales each year, California BLM has not held a lease sale since 2016. Adding to the complexity, although BLM manages all subsurface resources on federal land, surface land may sometimes be managed by a different agency, in which case both agencies will need to be involved in the leasing and permitting process.
Lastly, next-generation geothermal companies face a green premium on electricity produced using their technology, though the green premium does not appear to be as significant a challenge for next-generation geothermal as it is for other green technologies. In states with high renewables penetration, utilities and their regulators are beginning to recognize the extra value that clean firm power provides in terms of grid reliability. For example, the California Public Utility Commission has issued an order for utilities to procure 1 GW of clean, firm power by 2026, motivating a wave of new demand from utilities and community choice aggregators. As a result of this demand and California’s high electricity prices in general, geothermal projects have successfully signed a flurry of PPAs over the past year, including projects located in Nevada and Utah that can transmit electricity to California customers. In most other western states, however, electricity prices are much lower, so utility companies can be reluctant to sign PPAs for next-generation geothermal projects if they aren’t required to, due to the high cost and technology risk. As a result, next-generation geothermal projects in those states have turned to commercial customers, like those operating data centers, who are willing to pay more to meet their sustainability goals.
Federal Support
The federal government is beginning to recognize the important role of next-generation geothermal power in the clean energy transition. In 2023, geothermal energy became eligible for the first time for the renewable energy investment and production tax credits, thanks to technology-neutral language introduced in the Inflation Reduction Act (IRA). Within the DOE, the agency launched the Enhanced Geothermal Shot in 2022, led by the Geothermal Technologies Office (GTO), to reduce the cost of EGS by 90% to $45/MWh by 2035 and make geothermal widely available. In 2020, the Frontier Observatory for Research in Geothermal Energy (FORGE), a dedicated underground field laboratory for EGS research, drilling, and technology testing that GTO established in 2014, drilled its first well using new approaches and tools the lab had developed. This year, GTO announced funding for seven EGS pilot demonstrations from the Bipartisan Infrastructure Law (BIL), for which GTO is currently reviewing the first round of applications. GTO also awarded the Geothermal Energy from Oil and gas Demonstrated Engineering (GEODE) grant to a consortium formed by Project Innerspace, the Society of Petroleum Engineers International, and Geothermal Rising, with over 100 partner entities, to transfer best practices from the oil and gas industry to geothermal, support demonstrations and deployments, identify barriers to growth in the industry, and encourage workforce adoption.
While these initiatives are a good start, significantly more funding from Congress is necessary to support the development of pilot demonstrations and commercial-scale projects and enable wider adoption of geothermal energy. The BIL notably expanded the DOE’s mission area in supporting the deployment of clean energy technologies, including establishing the Office of Clean Energy Demonstrations (OCED) and funding demonstration programs from the Energy Division of BIL and the Energy Act of 2020. However, the $84 million in funding authorized for geothermal pilot demonstrations was only a fraction of the funding that other programs received from BIL and not commensurate with the actual cost of next-generation geothermal projects. Congress should be investing an order of magnitude more into next-generation geothermal projects, in order to maintain U.S. leadership in geothermal energy and reap the many benefits to the grid, the climate, and the economy.
Another key issue is that DOE has, to date, limited all of its funding for next-generation geothermal to EGS technologies. As a result, companies pursuing closed-loop/AGS and other next-generation technologies cannot qualify, leading some projects to be moved abroad. Given GTO’s historically limited budget, it is possible that this was the result of a strategic decision to focus its funding on one technology rather than diluting it across several. However, given that none of these technologies have been successfully commercialized at a wide scale yet, DOE may be missing the opportunity to invest in the full range of viable approaches. DOE appears to be aware of this, as the agency currently has a working group on AGS. New funding from Congress would allow DOE to diversify its investments to support the demonstration and commercial application of other next-generation geothermal technologies.
Alternatively, there are a number of OCED programs with funding from BIL that have not yet been fully spent (Table 1). Congress could reallocate some of that funding towards a new program supporting next-generation geothermal projects within OCED. Though not ideal, this may be a more palatable near-term solution for the current Congress than appropriating new funding.
A third option is that DOE could use some of the funding for the Energy Improvements in Rural and Remote Areas program, of which $635 million remains unallocated, to support geothermal projects. Though the program’s authorization does not explicitly mention geothermal energy, geothermal is a good candidate given the abundance of geothermal production potential in rural and remote areas in the West. Moreover, as a clean firm power source, geothermal has a comparative advantage over other renewable energy sources in improving energy reliability.
Other Transactions Authority
BIL and IRA gave DOE an expanded mandate to support innovative technologies from early-stage research through commercialization. To do so, DOE will need to be just as innovative in its use of its available authorities and resources. Tackling the challenge of scaling technologies from pilot to commercialization will require DOE to look beyond traditional grant, loan, and procurement mechanisms. Previously, we identified the DOE’s Other Transaction Authority (OTA) as an underleveraged tool for accelerating clean energy technologies.
OTA is defined in legislation as the authority to enter into any transaction that is not a government grant or contract. This negative definition provides DOE with significant freedom to design and implement flexible financial agreements that can be tailored to the unique challenges that different technologies face. OT agreements allow DOE to be more creative, and potentially more cost-effective, in how it supports the commercialization of new technologies, such as facilitating the development of new markets, mitigating risks and market failures, and providing innovative new types of demand-side “pull” funding and supply-side “push” funding. The DOE’s new Guide to Other Transactions provides official guidance on how DOE personnel can use the flexibilities provided by OTA.
With additional funding from Congress, the DOE could use OT agreements to address the unique barriers that geothermal projects face in ways that may not be possible through other mechanisms. Below are four proposals for how the DOE can do so. We chose to focus on supporting next-generation geothermal projects, since the young industry currently requires more governmental support to grow, but we included ideas that would benefit conventional hydrothermal projects as well.
Program Proposals
Geothermal Development on Agency Land
This year, the Defense Innovation Unit issued its first funding opportunity specifically for geothermal energy. The four winning projects will aim to develop innovative geothermal power projects on Department of Defense (DoD) bases for both direct consumption by the base and sale to the local grid. OT agreements were used for this program to develop mutually beneficial custom terms. For project developers, DoD provided funding for surveying, design, and proposal development in addition to land for the actual project development. The agreement terms also gave companies permission to use the technology and information gained from the project for other commercial use. For DoD, these projects are an opportunity to improve the energy resilience and independence of its bases while also reducing emissions. By implementing the prototype agreement using OTA, DoD will have the option to enter into a follow-on OT agreement with project developers without further competition, expediting future processes.
DOE could implement a similar program for its 2.4 million acres of land. In particular, the DOE’s land in Idaho and other western states has favorable geothermal resources, which the DOE has considered leasing. By providing some funding for surveying and proposal development like the DoD, the DOE can increase the odds of successful project development, compared to simply leasing the land without funding support. The DOE could also offer technical support to projects from its national labs.
With such a program, much of the value that the DOE would provide is the land itself, which the agency currently has in greater supply than funding for geothermal energy. The funding needed for surveying and proposal development is much less than would be needed to support the actual construction of demonstration projects, so GTO could feasibly request funding for such a program through the annual appropriations process. Depending on the program outcomes and the resulting proposals, the DOE could then go back to Congress to request follow-on funding to support actual project construction.
Drilling Cost-Share Program
To help defray the high cost of drilling, the DOE could implement a milestone-based cost-share program. There is precedent for government cost-share programs for geothermal: in 1973, before the DOE was even established, Congress passed the Geothermal Loan Guarantee Program to provide “investment security to the public and private sectors to exploit geothermal resources” in the early days of the industry. Later, the DOE funded the Cascades I and II Cost Shared Programs. Then, from 2000 to 2007, the DOE ran the Geothermal Resource Exploration and Definitions (GRED) I, II, and III Cost-Share Programs. This year, the DOE launched its EGS Pilot Demonstrations program.
A milestone payment structure could be favorable for supporting expensive, next-generation geothermal projects because the government takes on less risk compared to providing all of the funding upfront. Initial funding could be provided for drilling the first few wells. Successful and on-time completion of drilling could then unlock additional funding to drill more wells, and so on. In the past, both the DoD and the National Aeronautics and Space Administration (NASA) have structured their OT agreements using milestone payments, most famously between NASA and SpaceX for the development of the Falcon 9 space launch vehicle. The NASA and SpaceX agreement included not just technical but also financial milestones for the investment of additional private capital into the project. The DOE could do the same and include both technical and financial milestones in a geothermal cost-share program, as sketched below.
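Here is a minimal sketch of how such a schedule might be encoded; the milestones, tranche amounts, and criteria are invented for illustration, not drawn from any actual agreement.

```python
# Sketch of a milestone-based cost-share schedule mixing technical milestones
# (wells drilled on schedule) and financial ones (private capital raised).
# Tranche amounts and criteria are invented for illustration.

from dataclasses import dataclass

@dataclass
class Milestone:
    description: str
    payment_usd: float
    is_met: bool = False

schedule = [
    Milestone("Complete first 2 wells on time and on budget", 8_000_000),
    Milestone("Raise $20M in matching private capital", 4_000_000),
    Milestone("Complete wells 3-6 with >=15% drilling-time reduction", 10_000_000),
    Milestone("Sustain a 30-day flow test at rated capacity", 6_000_000),
]

def disburse(schedule: list[Milestone]) -> float:
    """Pay tranches in order, stopping at the first unmet milestone."""
    total = 0.0
    for m in schedule:
        if not m.is_met:
            break
        total += m.payment_usd
    return total

paid_so_far = disburse(schedule)  # 0.0 until the first milestone is verified
```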
Risk Insurance Program
Longer term, the DOE could implement a risk insurance program for conventional hydrothermal and next-generation geothermal projects. Insuring against underproduction could make it easier and cheaper for projects to be financed, since the potential downside for investors would be capped. The DOE could initially offer insurance just for conventional hydrothermal, since there is already extensive data on past commercial projects that can inform how the insurance is designed. In order to design insurance for next-generation technologies, more commercial-scale projects will first need to be built to collect the data necessary to assess the underproduction risk of different approaches.
France has administered a successful Geothermal Public Risk Insurance Fund for conventional hydrothermal projects since 1982. The insurance originally consisted of two parts: a Short-Term Fund to cover the risk of underproduction and a Long-Term Fund to cover uncertain long-term behavior over the operating lifetime of the geothermal plant. The Short-Term Fund asked project owners to pay a premium of 1.5% of the maximum guaranteed amount. In return, the Short-Term Fund provided a 20% subsidy for the cost of drilling the first well and, in the case of reduced output or a dry hole, compensation of between 20% and 90% of the maximum guaranteed amount (inclusive of the subsidy already paid). The exact compensation was determined by a formula for the amount necessary to restore the project’s profitability at its reduced output. The Short-Term Fund relied on a high success rate, especially in the Paris Basin, where there are known to be good hydrothermal resources, to fund the costs of failures. Geothermal developers that chose to get coverage from the Short-Term Fund were required to also get coverage from the Long-Term Fund, which was designed to hedge against unexpected geological or geothermal changes within the wells over the plant’s operating lifetime, such as output declining faster than expected or severe corrosion or scaling. The Long-Term Fund ended in 2015, but a new iteration of the Short-Term Fund was approved in 2023.
The Netherlands has successfully run a program similar to the Short-Term Fund since the 2000s. Private-sector attempts at setting up geothermal risk insurance packages in Europe and around the world have mostly failed, though. The premiums were often too high, at up to 25–30% of the cost of drilling, and the schemes were set up in nascent markets where too few projects were being developed to mutualize the risk.
To implement such a program at the DOE, projects seeking coverage would first submit an application consisting of the technical plan, timeline, expected costs, and expected output. The DOE would then conduct rigorous due diligence to ensure that the project’s proposal is reasonable. Once accepted, projects would pay a small premium upfront; the DOE should keep in mind the failed attempts at private-sector insurance packages and ensure that the premium is affordable. If either the installed capacity is much lower than expected or the output capacity declines significantly over the course of the first year of operations, the fund would compensate the project based on the level of underproduction and the amount necessary to restore the project’s profitability at its reduced output. The French Short-Term Fund calculated compensation based on characteristics of the hydrothermal wells; the DOE would need to develop its own formulas reflecting the costs and characteristics of different next-generation geothermal technologies once commercial data actually exists.
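As a rough illustration, the sketch below wires together the Short-Term Fund parameters described above (a 1.5% premium, a 20% drilling subsidy, and compensation of 20–90% of the guaranteed amount). The linear payout curve and the project figures are assumptions; the actual French formulas are based on detailed well characteristics.

```python
# Sketch of underproduction insurance, borrowing the French Short-Term Fund's
# published parameters: 1.5% premium, 20% drilling subsidy, and compensation
# capped between 20% and 90% of the guaranteed amount (inclusive of the subsidy).
# The linear payout curve is an assumption; DOE would need its own formulas.

PREMIUM_RATE = 0.015
SUBSIDY_RATE = 0.20
MIN_COMP, MAX_COMP = 0.20, 0.90

def premium(max_guaranteed_usd: float) -> float:
    return PREMIUM_RATE * max_guaranteed_usd

def compensation(max_guaranteed_usd: float, expected_mw: float, actual_mw: float) -> float:
    """Total payout (including the upfront drilling subsidy) after year one."""
    shortfall = max(0.0, 1.0 - actual_mw / expected_mw)  # 0 = on target, 1 = dry hole
    rate = SUBSIDY_RATE + (MAX_COMP - SUBSIDY_RATE) * shortfall
    return min(MAX_COMP, max(MIN_COMP, rate)) * max_guaranteed_usd

# Hypothetical project: $30M guaranteed, 25 MW expected, 15 MW delivered.
payout = compensation(30_000_000, 25.0, 15.0)  # 40% shortfall -> 48% of $30M = $14.4M
```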
Before setting up a geothermal insurance fund, the DOE should investigate whether there are enough geothermal projects being developed across the country to ensure the mutualization of risk and whether there is enough commercial data to properly evaluate the risk. Another concern for next-generation geothermal is that a high failure rate could exhaust the fund. To mitigate this, the DOE will need to analyze future commercial data for different next-generation technologies to assess whether each technology is mature enough for a sustainable insurance program. Lastly, poor state capacity could impede the feasibility of implementing such a program. The DOE will need personnel on staff who are sufficiently knowledgeable about the range of emerging technologies to properly evaluate technical plans, understand their risks, and design an appropriate insurance package.
Production Subsidy
While the green premium for next-generation geothermal has not been an issue in California, it may be slowing down project development in other states with lower electricity prices. The Inflation Reduction Act introduced a new clean energy Production Tax Credit that included geothermal energy for the first time. However, due to the higher development costs of next-generation geothermal projects compared to other renewable energy projects, that subsidy is insufficient to fully bridge the green premium. DOE could use OTA to introduce a production subsidy for next-generation geothermal energy with rates that vary depending on the state in which the electricity is sold and that state’s average baseload electricity price (e.g., the production subsidy likely would not apply to California). This would help address variations in the green premium across states and expand the number of states in which it is financially viable to develop next-generation geothermal projects. A sketch of such a state-indexed rate follows.
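This is a minimal sketch under stated assumptions: the target viable price and state price figures are invented, and the tax credit value is approximate.

```python
# Sketch of a state-indexed production subsidy: top up each state's average
# baseload price toward a target viable price for next-generation geothermal,
# net of the IRA production tax credit. All figures are illustrative.

TARGET_PRICE = 85.0   # $/MWh a first-of-a-kind project might need (assumption)
IRA_PTC = 27.5        # $/MWh, approximate full-rate production tax credit

def subsidy_per_mwh(avg_baseload_price: float) -> float:
    return max(0.0, TARGET_PRICE - IRA_PTC - avg_baseload_price)

avg_prices = {"CA": 65.0, "NV": 40.0, "NM": 35.0}   # hypothetical $/MWh
rates = {state: subsidy_per_mwh(p) for state, p in avg_prices.items()}
# -> {'CA': 0.0, 'NV': 17.5, 'NM': 22.5}: no subsidy in high-price California,
#    larger top-ups where electricity is cheap.
```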
Conclusion
The United States is well-positioned to lead the next-generation geothermal industry, with its abundance of geothermal resources and opportunities to leverage the knowledge and workforce of the domestic oil and gas industry. The responsibility is on Congress to ensure that DOE has the necessary funding to support the full range of innovative technologies being pursued by this young industry. With more funding, DOE can take advantage of the flexibility offered by OTA to create agreements tailored to the unique challenges that the geothermal industry faces as it begins to scale. Successful commercialization would pave the way to unlocking access to 24/7 clean energy almost anywhere in the country and help future-proof the transition to a fully decarbonized power grid.
FAS Annual Report 2023
Friends and Colleagues,
In today’s political climate in Washington, it is sometimes hard to believe that change is possible. Yet, at the Federation of American Scientists (FAS), we know firsthand that progress happens when the science community has a seat at the policymaking table. At our core, we believe that when passionate advocates join forces and share a commitment to ongoing learning, adaptation, and a drive toward action – science and technology progress can both solve the toughest challenges and uncover new ways to deliver the greatest impact.
In 2023, we remained steadfast in our ability to spur collective action. FAS supported our federal partners on the most significant investments in science and technology in decades with the Creating Helpful Incentives to Produce Semiconductors (CHIPS) and Science Act and the Inflation Reduction Act. Our Talent Hub team placed 71 Impact Fellows on tours of service in government and secured a first-of-its-kind partnership with the U.S. Department of Agriculture (USDA) to place 35 Impact Fellows in key positions within USDA over the next five years. Our expert network published 47 actionable policy memos through our Day One Project platform and drove impact by working with the U.S. Department of Transportation (USDOT) to launch the new Advanced Research Projects Agency-Infrastructure (ARPA-I). And our renowned Nuclear Information Project continues to inform the public and challenge assumptions about nuclear weapons arsenals and trends, attracting record-breaking public attention. I hope you’ll read more about all of our wins in this year’s FAS Impact Report.
FAS remains focused on honoring our 80-year legacy as a leading voice on global risk while seeking out new policy areas and domains that advance and support science and technology priorities. To support this new era for FAS, we completed a full rebrand—modernizing our look and retelling our story—and rolled out organization-wide strategic goals to drive and define the impact we seek to instill across government. Together, we focus on more than progress for its own sake—we intentionally create the systems and paradigms that make such progress sustainable and tangible.
We have continued to build our team and expertise, and with that growth we are inspired by the caliber of our new teammates. We also remain committed to fulfilling our expectations on Diversity, Equity, Inclusion and Belonging (DEIB) and continue to advocate for stronger commitments to social equality with all of our partners.
It is impossible for me to fit the entire year’s successes into a single letter, but I hope our annual report brings my update to life.
Thank you for your continued support,
Dan Correa, FAS CEO

Our Commitment to Diversity, Equity, and Inclusion
FAS is committed—both in principle and in practice—to creating a diverse, equitable, and inclusive environment for all individuals interested in addressing contemporary issues where science, technology, and innovation policy can deliver dramatic progress.
In 2023, FAS expanded its DEIB strategy beyond its initial pledge to:
- Conduct a first-of-its-kind cultural audit
- Implement changes to create transparent hiring and financial practices
- Produce accountability mechanisms to track progress
Much like our work advancing policy change, FAS approaches with urgency the mission of infusing DEIB principles into our organizational culture and broadening our team’s perspectives. We also recognize that as a science organization with national reach, we can model forward-thinking approaches to these issues that others can emulate. We acknowledge that we still have a long way to go before claiming success, but FAS is committed to this journey for the long run.

Policy Entrepreneurship in Action
For several years, FAS has been evangelizing the power of policy entrepreneurship to galvanize policy change, helping an entire community of experts and practitioners embrace the tools, mindsets, and networks needed to get results. The power of policy entrepreneurship is two-fold:
- It encourages those with lived-experience and expertise to champion the still-underappreciated ways in which science and technology are central to policy solutions.
- Time and again, it yields tangible policy change.
In FY23, FAS advanced policy entrepreneurship across all of its core issue domains by convening change agents, crafting and curating policy memos, and seeding countless actionable policy ideas. Below are just some of our highlights from the past year.
Championing Critical Funding across the Science and Technology (S&T) Ecosystem—FY23 Omnibus Spending Bill
Public investments in science and technology have declined precipitously since the Cold War, when two percent of the U.S. gross domestic product (GDP) went to research and development (R&D). With estimates of R&D investment currently below one percent of GDP and challenges from peer competitors like China threatening U.S. leadership in emerging technologies, FAS advocates for strong investments in critical and emerging technologies as well as science, technology, engineering, and math (STEM) education to maintain America’s edge in innovation.
In December 2022, President Biden signed the FY23 Omnibus appropriations package into law, funding a broad range of new science and technology priorities. This funding will strengthen our country’s ability to invest in better science and technology education, stay globally competitive and ensure that innovation opportunities are available across the country. The bill included provisions that stemmed from a number of ideas that FAS staff and Day One Project contributors helped seed, including:
- A moonshot platform for education R&D. FAS co-leads the Alliance for Learning Innovation (ALI) coalition, bringing together education nonprofits, philanthropy, and the private sector to advocate boosting investment in research to transform K–12 student outcomes. In FY23, ALI championed the inclusion in the Omnibus of a $30 million pilot program to establish a National Center for Advanced Development in Education (NCADE), a new platform for transformative research in education (including several ideas outlined in Day One Project memos). NCADE represents a true milestone for the education community which has been calling for a moonshot education R&D platform for more than a decade.
- Initial appropriations of $3.2 million for the Advanced Research Projects Agency-Infrastructure (ARPA-I), which will help USDOT build an ambitious moonshot research agenda to address stubborn challenges across transportation, safety, equity, and more. The ARPA-I concept, refined in a 2020 Day One Project USDOT transition workshop, was first authorized in the infrastructure package. FAS is helping USDOT with initial agenda-setting for this transformative research agenda.
- An investment of $500 million for Regional Tech Hubs, a major downpayment to fund planning for a nationwide network of regional innovation clusters that expand U.S. innovation capacity in key technology areas. The $1 billion Build Back Better Regional Challenge inspired the design of the Tech Hubs; Day One Project authors contributed to this initiative and repeatedly made the case for regional technology hubs. As part of a multi-organization campaign for Omnibus appropriations, our team undertook targeted outreach to more than 30 entrepreneurial ecosystem groups and influential economic actors across 10 strategically important states.
Reversing Megafire through Science and Data
Against a backdrop of the growing scourge of megafires, FAS has helped to put wildfires on the policy agenda in a bipartisan way that would have seemed impossible only a year ago. FAS organized more than 30 experts to contribute actionable policy ideas that have been shared directly with the Congressionally-mandated Wildland Fire Mitigation and Management Commission. Through this effort, we are advancing our goal of reducing the risks of catastrophic uncontrolled fires, protecting people from the health risks of wildfire smoke, and promoting beneficial controlled fire to improve ecosystem health. FAS policy recommendations influenced the recommendations in the Commission’s report to Congress, which guides a legislative implementation strategy that has included $1.6 billion in appropriations requests for smoke and public health.
Addressing Inequities in Medical Devices
The COVID-19 public health emergency revealed deep disparities in medical device use, specifically with pulse oximeters—devices widely used to measure oxygen saturation in blood. Medical researchers and policymakers had overlooked this issue for years until the COVID-19 pandemic revealed a large disparity in the diagnosis and treatment of severe respiratory conditions in Black and Brown communities. Through policy entrepreneurship, FAS identified an opportunity on a previously under-examined health policy issue and achieved two major wins.
First, FAS brought together more than 60 stakeholders to highlight policy opportunities to address racial bias in pulse oximeters and to cultivate a comprehensive strategy to address biases and inequities in medical innovation from industry to philanthropy and government by hosting an in-person Forum on Bias in Pulse Oximetry in November 2022.
Second, recognizing the importance of continuing the conversation on the disparate impacts of technology and the COVID-19 pandemic on underrepresented communities, FAS developed a research and policy agenda, drawing on the Forum’s findings, for near-term mitigation of inequities in pulse oximetry and other medical technologies as well as long-term solutions. FAS’s research and convening on this issue prompted the Veterans Health Administration (VHA), a major health agency within the U.S. government, to evaluate the use of all pulse oximeters (~50 types) and to understand the impact of these technologies on the more than nine million patients served by the VHA system.

Empowering Expert Talent to Drive Impact Through Government Service
An effective and innovative federal government cannot exist without access to diverse scientific and technical experts to help solve large-scale challenges that impact the public. Yet, too often, the federal government struggles to identify and recruit the expertise needed and to navigate the flexible hiring authorities appropriate for expert tours of service. To support government agency partners, FAS created the Talent Hub to find the best and brightest individuals interested in federal tours of service and provide the needed technical assistance to place them in high-impact roles.
In FY23, the Talent Hub team placed 71 Impact Fellows in government tours of service and built programming to train and equip these fellows with the skills necessary to deliver effectively on their host agencies’ missions. Below are a few highlights of FAS’s talent innovation model in action.
Delivering science and technical talent to strengthen the capacity of USDA
Climate change diminishes land and water resources, reducing agricultural productivity and leading to conflict over scarce resources. To blunt the effects of climate change on our food systems, the government needs experts fast.
FAS’s success in delivering highly qualified scientific and technical talent shaped by agencies’ needs resulted in our greatest success to date: a partnership with USDA to place 35 Impact Fellows within the department over the next five years to build an efficient, resilient, and sustainable food supply chain.
Helping to Create the First U.S. National Nature Assessment
Following the 27th Conference of the Parties (COP27) in Egypt in November 2022, the U.S. government released a Nature-Based Solutions Roadmap, the first U.S. government strategy developed to scale up nature-based solutions. Nature-based solutions combat climate change at lower costs than traditional infrastructure. To contribute to governmental expertise and implement policy within this area, FAS placed a fellow inside the White House’s Office of Science and Technology Policy (OSTP) in 2021 to design, coordinate, and implement this key Biden administration initiative.
Heather Tallis, an Impact Fellow serving at OSTP for the last two years, helped to establish the National Nature Assessment. This new effort of the U.S. Global Change Research Program will assess the status, trends, and future projections regarding the health of U.S. lands, waters, and wildlife and the benefits they provide to the economy, climate mitigation and adaptation, equity, health, and national security. In her role, Heather also co-chaired the interagency process behind the Nature-Based Solutions Roadmap.
Within the Executive Office of the President (EOP), she generated the idea for the Nature-Based Solutions Roadmap with EOP colleagues, advocated to get it included in an executive order, and co-led a government-wide effort across 15 agencies to implement the first National Nature Assessment. This comprehensive effort, led by Heather, resulted in an important new tool for the U.S. government to forecast how nature might change and what those changes may mean for the economy and the lives of Americans.
Empowering New Voices to Start Their Careers in the Nuclear Weapons Field
FAS launched the New Voices on Nuclear Weapons (NVNW) Fellowship in the summer of 2023. The Fellowship was specifically created to address the high barriers to entry into the nuclear field by providing young nuclear scholars with financial support, mentorship, and opportunities for publication. During the four-month pilot program, the four inaugural NVNW fellows worked with a senior academic or policy expert outside of FAS to co-author research projects that provide a creative perspective on rethinking nuclear deterrence policy. Through the NVNW program, FAS is fostering the next generation of talent in the nuclear field, which is critical as nuclear tensions continue to rise and as experienced talent exits the field.
Developing Leaders with Cross-Sector Knowledge and Bolstering the DOE Pipeline for the Clean Energy Transition
Investing in a robust talent pipeline is critical at the Department of Energy (DOE), where roughly four percent of employees are under 30. Building this pipeline is crucial for the clean energy transition that’s already underway, not only for the federal government but for the entire ecosystem. To meet clean energy deployment estimates across the country, clean energy jobs will need to increase threefold by 2025 and almost sixfold by 2030.
FAS is taking a multifaceted approach to supporting the clean energy talent pipeline. Following the publication of our report, FAS hosted a workshop with internal talent and human capital champions to share recommendations for how DOE can strengthen its recruitment of technical experts while retaining and diversifying the skills of its existing workforce. FAS is now engaging with various DOE program offices to execute on those recommendations, including leading a communications effort to profile successful DOE innovators while continuing to build sustainable pathways for onboarding technical talent. Last year, FAS expanded our DOE fellowships in a partnership with the Oak Ridge Institute for Science and Education (ORISE) to recruit and place an initial cohort of fellows funded by DOE into DOE program offices. Building on this initial success, DOE contracted with FAS to recruit mid- to senior-level career technical talent to implement a broad range of ambitious priorities to stimulate a clean energy transition.

Enhancing Government’s Capacity to Implement
FAS experts frequently collaborate with stakeholders in Congress and the executive branch to help solve complex science and technology policy challenges that align with government priorities and needs. In FY23, FAS’s unique ability to coordinate actors across the legislative and executive branches and facilitate crucial discourse and planning efforts across government agencies yielded tangible successes as described below.
Accelerating Technology Deployment through Flexible Financial Mechanisms to Maximize Spending from the Bipartisan Infrastructure Law (BIL) and the Inflation Reduction Act (IRA)
Promising technologies and opportunities for innovation exist across health, clean energy, and other domains but often lack an existing market—or guarantee of a future market—to support their creation and commercialization. The federal government can play a unique role in signaling and even guaranteeing demand for these solutions, including using its power as a buyer.
FAS worked with the DOE front office to diffuse flexible financial mechanisms that support and accelerate the deployment of novel clean energy technologies that lower greenhouse gas emissions, while supporting the implementation of BIL and IRA. FAS compiled a set of policy recommendations for how DOE could leverage its Other Transactions Authority (OTA) to accelerate commercialization and scale high-impact clean energy technologies. Specifically, FAS recommended that DOE establish a formal internal process for using OTA that encourages the formation of consortia to promote efficiency and collaboration across technology areas, while still appropriately mitigating risk.
These recommendations informed guidance that DOE released in September 2023 on how program offices and leaders across the agency can leverage other transactions to catalyze demand for clean energy. DOE continues to engage FAS in ongoing discussions on deploying OTAs and other flexible financial mechanisms to stimulate demand and accelerate deployment of promising technologies.
Creating stronger infrastructure through innovation
The United States faces multiple challenges in using innovation not only to deliver transportation infrastructure that is more resilient against climate change, but also to deliver on the clean energy transition and advance equity for communities that have historically been excluded from decision-making on these projects. To address these challenges, in November 2021 Congress passed the Infrastructure Investment and Jobs Act (IIJA), which included $550 billion in new funding for dozens of new programs across USDOT.
The bill created the Advanced Research Projects Agency-Infrastructure (ARPA-I) and made historic investments in America’s roads and bridges. ARPA-I’s mission is to unlock the full potential of public and private innovation ecosystems to improve U.S. infrastructure by accelerating climate game-changers across the entire U.S. R&D ecosystem. Since its authorization, USDOT has invited FAS to use our expertise to scope advanced research priorities across diverse infrastructure topics where targeted research can yield innovative new infrastructure technologies, materials, systems, capabilities, or processes through ARPA-I.
For example, this year FAS has engaged more than 160 experts in ARPA-I program idea generation and created 50 wireframes for ARPA-I’s initial set of programs, building a powerful coalition of stakeholders and laying a strong foundation for what ARPA-I can achieve as it evolves. ARPA-I’s authorization and subsequent initial appropriation in December 2022 provide an opportunity to tackle monumental challenges across transportation and infrastructure through breakthrough innovation. FAS’s programming is helping shape the future of the ARPA-I office.
Providing Government with the Tools to Assess Risks in Artificial Intelligence (AI) and Biosecurity
With increased warnings that AI may support the development of chemical and biological weapons, the federal government must act to protect the public from malicious actors. Senators Ed Markey (D-MA) and Ted Budd (R-NC) introduced the Artificial Intelligence and Biosecurity Risk Assessment Act and the Strategy for Public Health Preparedness and Response to Artificial Intelligence Threats Act with FAS’s technical assistance. These two pieces of legislation empower the federal government to better understand public health security risks associated with AI by directing the Assistant Secretary for Preparedness and Response of the U.S. Department of Health and Human Services (HHS) to conduct comprehensive risk assessments of advances in AI.
Helping International STEM Students and Workers in the United States
Sixty percent of computer science PhDs and nearly half of STEM PhDs are foreign-born, and these workers have contributed to America’s continuing scientific and technological leadership. FAS has worked across the legislative and executive branches of government to keep the best and brightest science and technology minds in the United States.
In the legislative branch, interest in keeping scientific and technical talent in the United States has grown as a national security concern. Recognizing the importance of this moment, FAS provided technical assistance to the offices of Senators Dick Durbin (D-IL) and Mike Rounds (R-SD) and Representatives Bill Foster (D-IL11) and Mike Lawler (R-NY17) in introducing the Keep STEM Talent Act of 2023, a bill that would make it easier for international students with advanced STEM degrees to stay in the United States after graduation.
An executive branch rule states that most nonimmigrants (i.e., non-green card holders) must renew visas outside the United States at an American embassy or consulate overseas. This rule requires students and workers to leave the United States during school or employment and bear the costs of going back to their country of origin; it also creates an administrative burden for consular officers who have heavy caseloads. FAS experts published a policy document that provides specific recommendations for how to reinstate domestic visa renewal. The State Department implemented some of these recommendations through a pilot program. This pilot program, the first step to solving this challenge, allows high-skilled immigrants to renew their work visas in the United States rather than having to travel to their home country to do so.

Driving Accountability through Expert Analysis
FAS has a legacy of pursuing a vision of eradicating the global risks that threaten human civilization. To achieve this vision, boosting the collective understanding of real and perceived threats is a critical public good. FAS remains committed to enhancing transparency around nuclear risks, which underpins informed debate and provides a means of defusing those risks. Here are a few examples of FAS’s work to drive accountability through expert analysis.
Enhancing transparency around nuclear risks
Since 2003, the Nuclear Information Project (NIP) team at FAS has published the most comprehensive and accurate transparency information on nuclear weapon arsenals. Three major wins this year include:
- FAS discovered increasing evidence that the U.S. Air Force’s nuclear mission may be returning to UK soil for the first time in 15 years. Based on satellite imagery and budgetary documents, the NIP team concluded that the United States is currently preparing the infrastructure at Royal Air Force (RAF) Lakenheath in the UK to potentially receive nuclear weapons in the future. This discovery was picked up by national and local news outlets and led to protests at Lakenheath, a UK National Day of Action against nuclear weapons, and op-eds and parliamentary questions from members of Parliament.
- Following Putin’s March 2023 announcement about Russia’s intent to establish a North Atlantic Treaty Organization (NATO)-style nuclear sharing relationship with Belarus, the Nuclear Information Project team published several investigations identifying the most likely candidate bases for Belarus’s new “nuclear sharing” mission. The NIP team also explored the knowns and unknowns surrounding Putin’s March 2023 statement and continuously analyzed satellite imagery for updates regarding the claimed deployment of these nuclear weapons. These assessments contributed timely and thorough analyses to the ongoing debate about the strategic implications and costs of this type of nuclear-sharing arrangement.
- Over the past year, the Nuclear Information Project team contributed chapters about global nuclear forces to both the Nuclear Weapons Ban Monitor and the annual SIPRI Yearbook. The former was used to inform state policy during the First Meeting of States Parties to the Treaty on the Prohibition of Nuclear Weapons, and the latter was translated into nearly a dozen languages and covered in more than 6,000 articles and broadcasts worldwide. Together, these two contributions by our team constituted some of the most widely-sourced and heavily-cited publications about global nuclear forces over the past year.

Converting New Ideas into Action through Fiscal Sponsorship
Recognizing that the best ideas often need a place to incubate, FAS hosts a fiscal sponsorship program that supports burgeoning entrepreneurs in science and technology policy. FAS provides sponsorship and support to give important ideas life, forge new initiatives, and expand impact in the science community. Below is just one example of the power of our fiscal sponsorship in action.
FLi Sci: Supporting young scholars’ career paths in science
FAS supports the fiscal sponsorship of FLi Sci, an education nonprofit initiative that builds pipelines for high school and college students who are first-generation or from low-income backgrounds and sets them up for success in navigating a scientific career.
In FY23, FLi Sci had a breakthrough year, making significant strides in its effort to change the face of science. The program:
- Expanded its reach, recruiting 20 FLi Sci Scholars from seven states, with over 95% identifying as Black or Hispanic and 80% as female or nonbinary.
- Established financial stability, growing program revenue by about 407% (from ~$65,000 to $330,000) in FY23.
- Increased program offerings, launching two more programs focused on psychology and data science research.

What’s Next for FAS
At FAS, we are proud of our impact and realize there is still more to be done. While we are working to expand the breadth and depth of our work above, we also see three major opportunities for FAS in the next fiscal year.
Expanding Government’s Capacity
The U.S. government is critical to solving the largest problems of the 21st century. While significant progress has been made, institutional complexity challenges the government’s ability to quickly innovate and deliver on its mission. Lackluster incentives, bureaucratic bottlenecks, and missing feedback loops slow progress and hinder capacity building across four key areas: financial mechanisms, evidence, talent, and culture. This work is especially important in an election year, when either a second-term or a new administration will bring new people and ideas to Washington, DC, and the government’s ability to execute these ideas hinges on its capacity.
FAS is in a unique position to support the federal government in building federal capacity. Since delivering 100 implementation-ready policy proposals for the 2020 presidential transition, FAS has grown and matured, expanding our capabilities as an organization. We are working to diagnose key science and technology policy issues ripe for bipartisan innovation and support. As we move forward with our findings, FAS will use our Day One platform to publicize grand challenges in this space and gather the best ideas from experts across the country on how best to solve these issues.
Mitigating Global Risk
FAS was founded to address the new, human-created nuclear danger that threatened global extinction. Today, in a world vastly more complicated than the one into which nuclear weapons were introduced, FAS supports the development and execution of sound public policy based on proven and effective technical skills to improve the human condition and, increasingly, to reduce global risks.
FAS’s new Global Risk program is focused on both the promise and peril posed by evolving AI capabilities in the nuclear landscape and beyond. Dedicated to reducing nuclear dangers and ensuring that qualified technical experts are integral parts of the policy process, FAS seeks to advance its work in support of U.S. and global security at the intersection between nuclear weapons, AI, and global risk. By drawing on technical experts, engaging the policy community, convening across multiple skill sets and sectors, and developing joint projects and collaborations with the government, FAS seeks to drive positive policy outcomes and shape the security landscape for the better.
Deepening Knowledge of Emerging Technologies across All Branches of Government
AI’s rapid evolution, combined with a lack of understanding of how it works, makes today’s policy decisions incredibly important but fraught with misconceptions. This is a pivotal moment, and FAS seeks to engage, educate, and inspire congressional staff, executive branch personnel, military decision makers, and state lawmakers on AI’s substantial potential—and risks. Our mission is to translate this transformative technology for lawmakers by advancing impactful policy development and promoting positive and productive discourse.
FAS finds itself in an unprecedented position to directly inform and influence crucial decisions that will shape AI governance. Our nonpartisan expertise and ability to move rapidly have made us the go-to resource for members of Congress across party lines when they require technical advice on AI-related issues. In the 118th Congress, FAS’s AI team has provided support on six vital AI bills and received requests for assistance and briefings on AI-related topics from over 40 congressional offices.
We recognize that this momentum offers FAS a unique opportunity to not only continue guiding policymakers with much-needed perspectives but also strive for actionable and equitable policy change that addresses the challenges linked with advancements in artificial intelligence.

FAS Values
At FAS, we accomplish our work by practicing four values that animate our theory of change: Impact Driven, Customer Focused, Entrepreneurial, and Growth Oriented. Our incredible staff lives and breathes these values in their daily work, and we’re honored to have them tell you, in their own words, why these values matter to them.
“I see our team embody our ‘Impact Driven’ value every day as we work not only in service of those who bring science and tech into the public domain, but also of those whom their innovations impact. We relentlessly seek to identify, understand, and fill the gaps between our nation’s reality and our collective ambition for a better future.”
Faith Savaiano, Associate Director, Social Innovation
“For me, customer focused work at FAS means understanding our audience. How are they helped with our policy analysis? What issues do they care about and want to see advanced in science and technology policy? In order to have the greatest impact, I strive to understand what is the why for our customers in government and philanthropy in shaping the policy conversation.”
Daniel Robinson, Manager, Grants and Development
“Securing equitable health and well-being for Americans requires navigating a complex continuum of upstream (the social determinants of health) and downstream (how healthcare is delivered) factors while also building a big tent of stakeholders working on all aspects of health security. Embodying an entrepreneurial mindset is essential as I’ve tackled thorny health topics that lack a constituency organizing around high-impact policy solutions at the federal level – like racism in medical technologies and climate change’s impact on human health – with a strong commitment that systemic change is possible if the right people are mobilized to act.”
Grace Wickerson, Health Equity Policy Manager
“Being growth-oriented means being curious and motivated to learn while intentionally cultivating an environment where others can do the same.”
Eliana Johns, Senior Associate, Nuclear Information Project
Fundraising and Development
The Federation of American Scientists continued our fundraising momentum from FY22 into FY23, securing $51 million in new commitments across 47 total awards and 31 unique funders, representing a 46% increase in funding allocations from last year. These investments by FAS’s philanthropic and agency partners reflect a sustained focus by FAS staff to continue diversifying and expanding our funding portfolio while simultaneously deepening our connections with existing partners and positioning FAS as an indispensable voice for evidence-based, scientifically-driven policy analysis and research.
The majority of the funding FAS receives (99.6%) is restricted to specific projects and initiatives, while unrestricted funding (the remaining 0.4%) bolsters the organization’s operational capacity.
The critical work being done at FAS would not be possible without the generous support of its philanthropic partners who continue to invest in the organization’s vision for the future.