Understanding the U.S. Bioeconomy: Agency Perspectives

The U.S. bioeconomy, defined by the National Institute of Standards and Technology (NIST) as “economic activity derived from the life sciences, particularly in the areas of biotechnology and biomanufacturing, including industries, products, services, and the workforce” and valued by some at ~$1 trillion, has been a major focus of policy development over the past few years. These policy advances include the White House Executive Order on “Advancing Biotechnology and Biomanufacturing Innovation for a Sustainable, Safe, and Secure American Bioeconomy” (Bioeconomy EO), the CHIPS & Science Act, and the Inflation Reduction Act (IRA). In March 2024, the Office of Science and Technology Policy (OSTP) announced the launch of the National Bioeconomy Board (NBB). The board will “partner across the public and private sectors to advance societal well-being, national security, sustainability, economic productivity, and competitiveness through biotechnology and biomanufacturing,” highlighting the Biden Administration’s commitment to future-proofing an economically sustainable U.S. bioeconomy.

Despite these advances, the vast intersectionality inherent to the bioeconomy (e.g., with health, clean energy, national security, climate change, economic development) poses unique challenges for the U.S. government. This complexity makes it difficult for the various agencies to coordinate and even more difficult for the general public to understand the government’s approach to the bioeconomy. Nonetheless, to maintain the continued growth within the bioeconomy that has resulted from these policy advances, it will be imperative to clarify a strategic vision that coordinates and publicizes governmental efforts that support the burgeoning U.S. bioeconomy.

The NBB can play an important role in promoting this strategic vision. As directed by the Bioeconomy EO, interagency partners working through the Executive Office of the President set up the NBB to promote interagency coordination and collaboration on the bioeconomy. The NBB is co-chaired by OSTP, the Department of Commerce (DOC), and the Department of Defense (DOD), and nine other agencies make up the rest of the board. Other agencies not represented on the NBB itself, including the Environmental Protection Agency (EPA), work with the NBB through various working groups and play an integral role.

To understand the range of governmental priorities for the bioeconomy, the overarching strategy, the work underway, the various programs within the agencies, and the role of environmental sustainability, our team at the Federation of American Scientists (FAS) spoke with key agencies represented on the NBB to collect their perspectives.

The perspectives summarized below demonstrate that the agencies align bioeconomy-related initiatives to their varied mission areas and, through the NBB and other interagency activities, are working together to develop a shared vision. However, the summaries also show the diversity in focus that informs how agencies approach the bioeconomy. The agency views encompass the broader bioeconomy landscape, including biotechnologies from commodity fuels and agriculture to individualized therapeutics, and biomanufacturing solutions from biomass production to final product. This range highlights both the important role that each agency plays in supporting the U.S. bioeconomy and the challenge of coordinating their activities and programs across the federal government.

Approach

To collect perspectives on the U.S. bioeconomy from the agencies represented on the NBB, FAS conducted semi-structured interviews with key NBB officials from the OSTP, DOC, DOD, Department of Energy (DOE), Department of Health and Human Services (HHS), and the U.S. Department of Agriculture (USDA) from May 2024 through June 2024. With the exception of USDA, all agency interviews were conducted over Zoom and answers were documented by note-taking. All summaries have been reviewed by agency representatives to confirm consent and validity. The USDA perspective was summarized using publicly available reports and has also been confirmed for validity by an agency representative.

Perspectives from these agencies on the Bioeconomy EO deliverables, bioeconomy-related programs, coordination, goals, hurdles, and the role of environmental sustainability are summarized below. The full list of questions used in the semi-structured interviews can be found in Appendix A.

Agency Perspectives

Office of Science & Technology Policy

For OSTP’s perspective, FAS conducted a semi-structured interview with Dr. Sarah Glaven, principal assistant director for biotechnology and biomanufacturing.

The Office of Science & Technology Policy plays an important role in interagency coordination for topics, like the bioeconomy, that cut across many different agencies, and is one of the co-chairs for the NBB. In adherence with the Bioeconomy EO, OSTP has coordinated interagency efforts and published several reports on the bioeconomy: Bold Goals for U.S. Biotechnology and Biomanufacturing, Building the Bioworkforce of the Future, and Visions, Needs & Proposed Actions for Data for the Bioeconomy Initiative. It is currently working with interagency groups on several activities, including one that recently published a report, in conjunction with USDA and other agencies, that recommended revisions to the North American Industry Classification System (NAICS) and the North American Product Classification System (NAPCS) to better capture economic activity related to the bioeconomy. The creation of the NBB itself fulfills directives from both the Bioeconomy EO and the CHIPS & Science Act, which called on OSTP to establish a coordination office on these topics. Currently, due to a lack of funding, OSTP is not an official coordinating office but will function to coordinate activities through the NBB.

According to OSTP, the Bioeconomy EO reflects the whole-of-government approach that will be needed to support the bioeconomy. For the near term, OSTP plans to show the value and utility of the NBB, execute policy from the Bioeconomy EO, prioritize specific actions from the resulting Bioeconomy EO reports, highlight significant investments, and produce a report on the NAICS and NAPCS codes. In the long term, OSTP hopes that the NBB will become a sustainable government entity that drives a clear national strategy to move the bioeconomy forward and enables the United States to work collaboratively with global partners. 

A key challenge is measuring the bioeconomy. It is difficult to prioritize, strategize, or advocate for additional resources in the absence of baseline economic metrics to track impact or estimate the potential return on investment. Ultimately, OSTP believes it is important to clarify the definition of the bioeconomy in order to create measurements and classifications.

A challenge for OSTP is continuity as it experiences staff turnover and administration changes. However, the NBB and coordination of the bioeconomy portfolio will be well positioned to persist, in part by relying on the NBB’s co-chairs. Also, the Bioeconomy EO allowed OSTP to create principal and assistant director positions for the bioeconomy portfolio, which can help ensure that it remains a high priority. At OSTP, this portfolio sits within the Industrial Innovation Group, which also houses coordination efforts for semiconductors and clean energy. OSTP leadership understands the importance of the bioeconomy and is keen to see the intersections of biomanufacturing with other initiatives, like the DOE’s Earthshot programs and other clean energy initiatives.

On the issue of environmental sustainability and the bioeconomy, OSTP highlights efforts by DOE to push for sustainable aviation fuels and USDA’s sustainable biomass supply chain framework as initiatives that are setting the pace for sustainability. There is also an opportunity to consider how biomanufacturing and biosynthesis fit into the broader sustainable chemistry landscape.

Department of Commerce

For DOC’s perspective, FAS conducted a semi-structured interview with Dr. Christopher Szakal, acting director, program coordination office at the National Institute of Standards and Technology.

The Department of Commerce is one of the NBB co-chairs. DOC is sector-agnostic and is interested in the bioeconomy as a way to support the broader economy, remain competitive, and solve broader challenges, such as those related to supply chain resilience. In response to the Bioeconomy EO, the DOC has released the bioeconomy lexicon from NIST and the Feasibility Study for measuring the bioeconomy from the Bureau of Economic Analysis. It has also participated in several interagency activities, including development of OSTP’s Bold Goals report and USDA’s Biomass Supply Chain report, as well as ongoing working groups focusing on updating systems for measuring economic activities (e.g., the NAICS and NAPCS codes) and on biological data and cybersecurity. Separate from the executive order, the CHIPS & Science Act provided significant investments for the Economic Development Administration in biotechnology-related regional technology hubs. Other ongoing activities at DOC in support of the bioeconomy include efforts to support biotechnology and biomanufacturing standards development at NIST, supply chain analyses at the International Trade Administration, work at the Bureau of Industry and Security and at the Patent and Trademark Office to ensure a safe and fair market, and the Workforce Development Strategy.

By nature, DOC keeps a broad perspective and tries to understand how the bioeconomy intersects with other parts of the economy and how technological developments may impact progress. There are important intersections of the bioeconomy with artificial intelligence (AI) and with data security, and policy development in these other areas will have implications for the bioeconomy. For example, the October 2023 Executive Order on AI called for significant new requirements for providers of synthetic nucleic acids to conduct biosecurity screening, which will have implications for biotechnology and biomanufacturing. NIST is tasked with developing standards for this new policy. The intersectional nature of the bioeconomy requires coordination both within the DOC and across the U.S. government. A key challenge is the need for sustained funding because coordination requires time and effort. 

On environmental sustainability, the DOC prioritizes the market and what U.S. companies will find profitable in both the near term and the long term. Elevating sustainability has been challenging because there is uncertainty in how sustainability is measured. Additionally, market drivers have been inconsistent relative to the level needed to address the uncertainty. DOC is looking to utilize the NBB to help provide clarity on how to achieve more consistent market forces in support of sustainability to drive growth of the bioeconomy. 

Department of Defense

For DOD’s perspective, FAS conducted a semi-structured interview with Dr. Peter Emanuel, senior research scientist, bioengineering at U.S. Army Combat Capabilities Development Command.

The Department of Defense is one of the NBB co-chairs. In September 2022 (shortly after the Bioeconomy EO was announced), DOD announced a $1.2 billion investment in biomanufacturing. In March 2023, DOD released a Biomanufacturing Strategy, which was informed by both the National Defense Authorization Act for Fiscal Year 2023 and the Bioeconomy EO. In support of this strategy and the investments made by DOD, the Department’s Defense Production Act Investments (DPAI) Office published an open Request for Information that sought input from industry on biomanufactured products and process capabilities that could help address defense needs. Significant additional investments in biomanufacturing are likely to be forthcoming.

The bioeconomy portfolio is a tiny portion of the overall programmatic budget for DOD. Previously, the DOD’s interest in biology and biotechnology was limited to military medicine and chemical and biological defense, but the department is increasingly focused on nonmedical biomanufacturing applications and believes that they will be key for ensuring national security. The department also acknowledges the importance of workforce development and the need for standardization and infrastructure for the bioeconomy and strongly supports these areas. This commitment can be seen with DOD’s large investments in 2020 in BioMADE, a Manufacturing Innovation Institute focused on creating a sustainable, domestic end-to-end bioindustrial manufacturing ecosystem. 

In the future, DOD hopes to take advantage of biomanufacturing’s potential to support defense objectives that go beyond medical countermeasures and other human health-related advances, for example through production of bio-based materials, chemicals, and foods. However, DOD faces challenges both internally and externally in communicating the full potential of the bioeconomy and biomanufacturing for DOD.

On environmental sustainability, DOD believes that economic and environmental sustainability for the bioeconomy go hand-in-hand. For example, a company that could make chemicals without waste would have a significant economic advantage and would support environmental sustainability. Historically, DOD has seen significant costs due to polluted sites, and so understands the value of cleaner products and processes. In addition, DOD is investing in different technologies that would valorize waste streams.

Department of Energy

For DOE’s perspective, FAS conducted a semi-structured interview with Dr. Valerie Reed, director, Bioenergy Technologies Office.

The Department of Energy has many goals for advancing the bioeconomy, with the common denominator being to decarbonize America’s transportation and fuel sectors and to build resilient clean energy for generations to come. In response to the Bioeconomy EO, the DOE contributed to the OSTP Bold Goals report and was tasked to work with other agencies to write reports on National Security Recommendations for Federal Procurement (forthcoming) and best practices for cybersecurity documentation. DOE also played a large role in an upcoming biotechnology and biomanufacturing report mandated by the Bioeconomy EO. Outside of the direct requirements from the EO, the DOE plays a crucial role in supporting industrial biotechnology through additional reports and its involvement in ongoing interagency activities. For example, the Billion Ton Report provides a comprehensive assessment of biomass availability today and how to sustainably produce more than one billion tons of biomass per year to meet the demand for sustainable aviation fuel production.

DOE’s bioeconomy efforts are concentrated within the Bioenergy Technologies Office (BETO) and the Office of Science. BETO aims to utilize biomass for sustainable and renewable fuel and chemical production, while the Office of Science supports fundamental research that enables the bioeconomy, including synthetic biology and thermochemical conversion. Under the Inflation Reduction Act and CHIPS & Science Act, significant support was given to bioenergy solutions and clean energy demonstrations, including DOE tax incentives aimed at carbon reduction in fuel production.

In the short term, DOE is focused on prioritizing the use of biomass for Sustainable Aviation Fuel (SAF) and marine fuel production, as well as supporting renewable diesel and ethanol for medium- and heavy-duty vehicles. Long-term goals include transitioning to electrification using biomass, achieving substantial SAF production by 2035 through the SAF Grand Challenge, and scaling up the production of specific chemicals by 2035 as part of the industrial decarbonization strategy. Additionally, in coordination with the USDA, there are focused efforts to increase cultivation of purpose-grown energy crops.

One of the major hurdles the DOE currently faces, and may continue to face in the future, is ensuring sustained funding levels that support ongoing development. Currently, biomass is seen as an expensive feedstock. While the IRA provided an initial policy bridge through the 40B (SAF production) and 45Z (clean fuel production) tax credits, it is essential to establish a longer-term incentive to meet market demand.

On environmental sustainability, the DOE is very focused on goals for decarbonization of transportation and fuels, including replacing petroleum-based products with sustainable biomass solutions and conducting life cycle assessments (LCAs) to measure sustainability impacts throughout the supply chain. DOE created the GREET Model for LCAs, which was updated recently, to reduce ambiguity and to help standardize the process for measuring carbon emissions. Additionally, DOE’s Clean Fuels and Products Earthshot is an important cross-agency collaboration that supports accelerating bio-based fuels and chemicals production and decarbonizing both the fuel and chemical industries.

Department of Health & Human Services

For HHS’s perspective, FAS conducted a semi-structured interview with Dr. Lyric Jorgenson, associate director for science policy and the director of the Office of Science Policy at the National Institutes of Health (NIH), and Dr. Julia Limage, director, Office of Strategy, Policy, and Requirements in the Administration for Strategic Preparedness and Response (ASPR).

The Department of Health & Human Services has two representatives on the NBB, one from NIH and one from ASPR. HHS has a broad mission in support of human health, and many of its programs could be considered part of the bioeconomy. However, the Bioeconomy EO outlined a set of priorities that called for additional focus at HHS on advances specific to biotechnology and biomanufacturing, many of which were included in the OSTP Bold Goals report. The EO also tasked HHS with leading the establishment of a Biosafety and Biosecurity Innovation Initiative; a strategic plan for this initiative will be available soon. Another area of intersection of HHS and the Bioeconomy EO is on the regulatory side: the Food and Drug Administration worked with USDA and EPA to provide updates on the regulatory system as deliverables for the EO. Many of the activities related to the EO draw on interagency working groups and other ongoing activities—for example, the work toward pandemic preparedness and biodefense, as well as collaborations between NIH and NSF on health-relevant research.

In the near future, HHS will focus on advancing biotechnologies such as multi-omic medicine, gene editing, and other therapeutics tailored to individual patients. Biomanufacturing and scale-up are another key focus to increase speed and availability of key medicines. With regard to public health, the COVID pandemic highlighted the need for fast and secure biomanufacturing for vaccine production. The Biomedical Advanced Research and Development Authority (BARDA) in ASPR has made significant investments in biomanufacturing for this reason. ASPR also has an Office of Industrial Base Management and Supply Chain to support domestic biomanufacturing in case of public health emergencies.

For HHS, activities related to the bioeconomy directly and unambiguously support the department’s mission and will continue to be prioritized. A key challenge for HHS is the need for sustained funding, especially for coordination, which requires time and effort above and beyond programmatic work. To be effective, activities initiated by the Bioeconomy EO will need to be funded. Some HHS activities, including some related to biomanufacturing of medical countermeasures, were funded with COVID supplemental funding that will soon run out.

On environmental sustainability, HHS has not had any significant focus. However, there have been efforts to decrease the use of single-use plastics and equipment in research and public health activities.

United States Department of Agriculture

For USDA’s perspective, FAS gathered information from publicly available reports and documents, with guidance and direction from Herrick Fox, USDA’s bioeconomy coordinator in the Office of the Chief Economist, and Greg Jaffe, senior advisor in the Office of the Secretary.

The Bioeconomy EO tasked USDA with a wide range of deliverables, and USDA has released many related reports and products that reflect its bioeconomy-related priorities. One set of deliverables focuses on biomass and feedstocks, and supports the strategic vision outlined for agriculture in OSTP’s Bold Goals report. This includes the report on Building a Resilient Biomass Supply: A Plan to Enable the Bioeconomy in America, along with an Implementation Framework. USDA also has a long-standing focus on bio-based products, including support of the BioPreferred Program, a program created by the 2002 Farm Bill to increase the purchase of bio-based products and reauthorized in the 2018 Farm Bill. Its recent Economic Impact Analysis of the U.S. Biobased Products Industry report summarizes the status of bio-based products, an important component of the bioeconomy.

USDA also plays a central role in regulating biotechnology products, and the Bioeconomy EO called for updates to the regulatory system. In response, USDA (along with the FDA and the EPA) conducted stakeholder outreach, which is summarized in a report on Ambiguities, Gaps, and Uncertainties in Regulation of Biotechnology Under the Coordinated Framework. USDA also released a Plan for Regulatory Reform under the Coordinated Framework for the Regulation of Biotechnology and produced an updated Coordinated Framework website. Activities to improve coordination across the three major regulatory agencies are ongoing.

Unlike at most federal departments and agencies, most programs and activities at USDA have a link with the life sciences, including those that support food and fiber, forests and grasslands, and other natural resources, as well as the manufacturing of numerous bio-based products and biofuels from these resources and the R&D and infrastructure that support it. From USDA’s perspective, the department has served the bioeconomy since its founding in 1862. This broad focus provides many opportunities for strategic partnerships with other parts of the U.S. government working on the bioeconomy, and there are many different ways that USDA can contribute to the NBB.

On environmental sustainability, USDA has demonstrated its commitment to developing a circular bioeconomy, which is reflected in its Biomass Plan, and in its support for bio-based products and sustainable agriculture initiatives.

Conclusion

The agencies that make up the NBB highlight the complex nature of the U.S. bioeconomy and the various sectors that fall under it. Despite this complexity, the NBB is providing a whole-of-government approach to enable agencies to better support the burgeoning U.S. bioeconomy. The work underway is underpinned by the agencies’ priorities and programmatic expertise, but comes together to build the foundational base needed to support and grow the U.S. bioeconomy. Most agencies also have a focus on environmental sustainability, with some, like DOE, DOD, and USDA, having a stronger focus due to their direct connections with the environment. Finally, agencies also agree on the need for more data on the bioeconomy’s impact as the different sectors evolve and the need for sustained funding to promote coordination, which takes time and effort beyond just programmatic work.


Appendix A. Interview Questions

  1. In response to the September 2022 Bioeconomy EO, your agency has produced some reports and other deliverables on the bioeconomy.
    • Are we missing any deliverables? Are there any other reports or activities that are already completed or still to come in response to the EO?
  2. Are there programs or other deliverables relevant to the bioeconomy that your agency has pursued under the Inflation Reduction Act or the CHIPS & Science Act?
  3. Are there other activities within your agency that you believe support the bioeconomy? Is the bioeconomy broader than what was captured by the EO and these other efforts?
  4. What does your agency hope to achieve in the foreseeable future and in the more distant future regarding the U.S. bioeconomy?
    • Are these goals related to the OSTP Bold Goals Report or other deliverables for the Bioeconomy EO?
    • To what extent will this progress be prioritized within your agency? How central to your agency is progress in the bioeconomy, now and into the future?
  5. What are the major hurdles your agency currently faces or may face in the future in reaching these goals?
  6. How does your agency tackle the issue of creating an environmentally sustainable bioeconomy and/or a circular bioeconomy?
    • Are there any initiatives in place currently or coming up in the near future that speak towards this?

Critical Thinking on Critical Minerals

Access to critical minerals supply chains will be crucial to the clean energy transition in the United States. Batteries for electric vehicles, in particular, will require the U.S. to consume an order of magnitude more lithium, nickel, cobalt, and graphite than it currently consumes. Currently, these materials are sourced from around the world. Mining of critical minerals is concentrated in just a few countries for each material, but is becoming increasingly geographically diverse as global demand incentivizes new exploration and development. Processing of critical minerals, however, is heavily concentrated in a single country—China—raising the risk of supply chain disruption. 

To address this, the U.S. government has signaled its desire to onshore and diversify critical minerals supply chains through key legislation, such as the Bipartisan Infrastructure Law and the Inflation Reduction Act, and trade policies. The development of new mining and processing projects entails significant costs, however, and project financiers require developers to demonstrate that projects will generate profit by securing long-term offtake agreements with buyers. This is made difficult by two factors: critical minerals markets are volatile, and, without subsidies or trade protections, domestically produced critical minerals have trouble competing against low-priced imports, making it difficult for producers and potential buyers to negotiate a mutually agreeable price (or price floor). As a result, progress in expanding the domestic critical minerals supply may not occur fast enough to catch up to the growing consumption of critical minerals.

To accelerate project financing and development, the Department of Energy (DOE) should help generate demand certainty through backstopping the offtake of processed, battery-grade critical minerals at a minimum price floor. Ideally, this would be accomplished by paying producers the difference between the market price and the price floor, allowing them to sign offtake agreements and sell their products at a competitive market price. Offtake agreements, in turn, allow developers to secure project financing and proceed at full speed with development.
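The backstop mechanism described above reduces to simple arithmetic, and a minimal sketch may help make it concrete. The Python below assumes a per-unit top-up equal to the shortfall between the price floor and the market price, paid only when the market price falls below the floor. The function names, prices, and volumes are hypothetical illustrations, not figures or program rules from DOE.

```python
def backstop_payment(market_price: float, price_floor: float, volume: float) -> float:
    """Top-up paid to the producer: per-unit shortfall below the floor, times volume.

    Assumption: no payment is made when the market price is at or above the floor.
    """
    shortfall_per_unit = max(0.0, price_floor - market_price)
    return shortfall_per_unit * volume


def producer_revenue(market_price: float, price_floor: float, volume: float) -> float:
    """Total producer revenue: market sales plus any backstop top-up."""
    market_sales = market_price * volume
    return market_sales + backstop_payment(market_price, price_floor, volume)


if __name__ == "__main__":
    # Hypothetical numbers: floor of $20/kg, 1,000 kg sold.
    for market in (15.0, 25.0):
        top_up = backstop_payment(market, 20.0, 1000.0)
        revenue = producer_revenue(market, 20.0, 1000.0)
        print(f"market ${market}/kg: top-up ${top_up}, producer revenue ${revenue}")
```

In this sketch, a market price of $15/kg against a $20/kg floor yields a $5,000 top-up on 1,000 kg, so the producer earns what it would have earned selling at the floor while the buyer still pays the market price. When the market price is above the floor, the top-up is zero and the backstop costs nothing.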

While demand-side support can help address the challenges faced by individual developers, market-wide issues with price volatility and transparency require additional solutions. Currently, the pricing mechanisms available for battery-grade critical minerals are limited to either third-party price assessments with opaque sources or the market exchange traded price of imperfect proxies. Concerns have been raised about the reliability of these existing mechanisms, hindering market participation and complicating discussions on pricing. 

As the North American critical minerals industry and market develop, DOE should support the parallel development of more transparent, North American-based pricing mechanisms to improve price discovery and reduce uncertainty. In the short and medium term, this could be accomplished through government-backed auctions, which could be combined with offtake backstop agreements. Auctions are effective mechanisms for price discovery, and data from them can help improve market price assessments. In the long term, DOE could support the creation of new market exchanges for trading critical minerals in North America. Exchange trading enables greater price transparency and provides opportunities for hedging against price volatility.

Through this two-pronged approach, DOE would simultaneously accelerate the development of the domestic critical minerals supply chain through addressing short-term market needs, while building a more transparent and reliable marketplace for the future.

Introduction

The global transportation system is currently undergoing a transition to electric vehicles (EVs) that will fundamentally transform not only our transportation system, but also domestic manufacturing and supply chains. Demand for lithium-ion batteries, the most important and expensive component of EVs, is expected to grow 600% by 2030 compared to 2023, and the U.S. currently imports a majority of its lithium-ion batteries. To ensure a stable and successful transition to EVs, the U.S. needs to reduce its import dependence and build out its domestic supply chain for critical minerals and battery manufacturing.

Crucial to that will be securing access to battery-grade critical minerals. Lithium, nickel, cobalt, and graphite are the primary critical minerals used in EV batteries. All four were included in the 2023 Department of Energy (DOE) Critical Materials List. Cobalt and graphite are considered at risk of shortage in the short term (2020-2025), while all four materials are at risk in the medium term (2025-2035).

As shown in Figure 1, the domestic supply chain for batteries and critical minerals consists primarily of downstream buyers like automakers and battery assemblers, though there are a growing number of battery cell manufacturers thanks to domestic sourcing requirements in the Inflation Reduction Act (IRA) incentives. The U.S. has major gaps in upstream and midstream activities—mining of critical minerals, refining/processing, and the production of active materials and battery components. These industries are concentrated globally in a small number of countries, presenting supply chain risks. By developing new domestic industries within these gaps, the federal government can help build out new, resilient clean energy supply chains. 

This report is organized into three main sections. The first section provides an overview of current global supply chains and the process of converting different raw materials into battery-grade critical minerals. The second section delves into the pricing and offtake challenges that projects face and proposes demand-side support solutions to provide the price and volume certainty necessary to obtain project financing. The final section takes a look at existing pricing mechanisms and proposes two approaches that the government can take to facilitate price discovery and transparency, with an eye towards mitigating market volatility in the long term. Given DOE’s central role in supporting the development of domestic clean energy industries, the policies proposed in this report were designed with DOE in mind as the main implementer.

Figure 1. Lithium-ion battery supply chain

Adapted from Li-BRIDGE

Segments highlighted in light blue indicate gaps in U.S. supply chains. See original graphic from Li-BRIDGE for more information.

Section 1. Understanding Critical Minerals Supply Chains

Global Critical Minerals Sources

Globally, 65% or more of processed lithium, cobalt, and graphite originates from a single country: China (Figure 2). This concentration is particularly acute for graphite, 91% of which was processed by China in 2023. This market concentration has made downstream buyers in the U.S. overly dependent on sourcing from a single country. The concentration of supply chains in any one country makes them vulnerable to disruptions within that country—whether they be natural disasters, pandemics, geopolitical conflict, or macroeconomic changes. Moreover, lithium, nickel, cobalt, and graphite are all expected to experience shortages over the next decade. In the case of future shortages, concentration in other countries puts U.S. access to critical minerals at risk. Rocky foreign relations and competition between the U.S. and China over the past few years have put further strain on this dependence. In October 2023, China announced new export controls on graphite, though it has not yet restricted supply, in response to the U.S.’s export restrictions on semiconductor chips to China and other “foreign entities of concern” (FEOC).

Expanding domestic processing of critical minerals and manufacturing of battery components can help reduce dependence on Chinese sources and ensure access to critical minerals in future shortages. However, these efforts will hurt Chinese businesses, so the U.S. will also need to anticipate additional protectionist measures from China.

On the other hand, mining of critical minerals, with the exception of graphite and rare earth elements, occurs primarily outside of China. These operations are also concentrated in a handful of countries, shown in Figure 3. Consequently, geopolitical disruptions affecting any of those primary countries can significantly affect the price and supply of the material globally. For example, Russia is the third largest producer of nickel. In the aftermath of Russia’s invasion of Ukraine at the beginning of 2022, expectations of shortages triggered a historic short squeeze of nickel on the London Metal Exchange (LME), the primary global trading platform, significantly disrupting the global market.

To address global supply chain concentration, new incentives and grant programs were passed in the IRA and the Bipartisan Infrastructure Law. These include the 30D clean vehicle tax credit, the 45X advanced manufacturing production credit, and the Battery Materials Processing Grants Program (see Domestic Price Premium section for further discussion). Thanks to these policies, there are now on the order of a hundred North American projects in mining, processing, and active1 material manufacturing in development. The success of these and future projects will help create new domestic sources of critical minerals and batteries to feed the EV transition in the U.S. However, success is not guaranteed. A number of challenges to investment in the critical minerals supply chain will need to be addressed first.

Battery Materials Supply Chain

Critical minerals are used to make battery electrodes. These electrodes require specific forms of critical minerals for their production processes: typically lithium hydroxide or carbonate, nickel sulfate, cobalt sulfate, and a blend of coated spherical graphite and synthetic graphite.2

Lithium Hydroxide and Lithium Carbonate

Lithium hydroxide/carbonate typically comes from two sources: spodumene, a hard rock ore that is mined primarily in Australia, and lithium brine, which is primarily found in South America (Figure 3). Traditionally, lithium brine must be evaporated in large open-air pools before the lithium can be extracted, but new technologies are emerging for direct lithium extraction that significantly reduce the need for evaporation. Whereas spodumene mining and refining are typically conducted by separate entities, lithium brine operations are typically fully integrated. A third source of lithium that has yet to be put into commercial production is lithium clay. The U.S. is leading the development of projects to extract and refine lithium from clay deposits.

Nickel Sulfate

Nickel sulfate can be made from either nickel metal, which was historically the preferred feedstock, or directly from nickel intermediate products, such as mixed hydroxide precipitate and nickel matte, which are the feedstocks that most Chinese producers have switched to in the past few years (Figure 4). Though demand from batteries is driving much of the nickel project development in the U.S., since nickel metal has a much larger market than nickel sulfate, developers are designing their projects with the flexibility to produce either nickel metal or nickel sulfate.

Cobalt Sulfate

Cobalt is primarily produced in the Democratic Republic of the Congo from cobalt-copper ore. Cobalt can also be found in lesser amounts in nickel and other metallic ores. Cobalt concentrate is extracted from cobalt-bearing ore and then processed into cobalt hydroxide. At this point, the cobalt hydroxide can be further processed into either cobalt sulfate for batteries or cobalt metal and other chemicals for other purposes.

Cathode Active Materials

Battery cathodes come in a variety of chemistries. Lithium nickel manganese cobalt (NMC) is the most common in lithium-ion batteries thanks to its higher energy density, while lithium iron phosphate, though less energy dense, is growing in popularity for its affordability and use of more abundantly available materials. Cathode active material (CAM) manufacturers purchase lithium hydroxide/carbonate, nickel sulfate, and cobalt sulfate and then convert them into CAM powders. These powders are then sold to battery cell manufacturers, who coat them onto aluminum current collectors to produce cathodes.

Natural and Synthetic Graphite

Graphite can be synthesized from petroleum needle coke, a fossil fuel waste material, or mined from natural deposits. Natural graphite typically comes in the form of flakes and is reshaped into spherical graphite to reduce its particle size and improve its material properties. Spherical graphite is then coated with a protective layer to prevent unwanted chemical reactions when charging and discharging the battery.

Anode Active Material

The majority of battery anodes on the market are made using just graphite, so there is no intermediate step between processors and battery cell manufacturers. Producers of battery-grade synthetic graphite and coated spherical graphite sell these materials directly to cell manufacturers, who coat them onto electrodes to make anodes. These battery-grade forms of graphite are also referred to as graphite anode powder or, more generally, as anode active materials. Thus, the terms graphite processor and graphite anode manufacturer are interchangeable.

Section 2. Building Out Domestic Production Capacity

Challenges Facing Project Developers

Offtake Agreements

Offtake agreements (a.k.a. supply agreements or contracts) are agreements between a producer and a buyer for the purchase of future output. They are a key requirement for project financing because they provide lenders and investors with the certainty that if a project is built, there will be revenue generated from sales to pay back the loan and justify the valuation of the business. The vast majority of feedstocks and battery-grade materials are sold under offtake agreements, though small amounts are also sold on the spot market in one-off transactions. Offtake agreements are made at every step of the supply chain: between miners and processors (if they’re not vertically integrated), between processors and component manufacturers, and between component manufacturers and cell manufacturers. Due to domestic automakers’ concerns about potential material shortages upstream and the desire to secure IRA incentives, many of them have also been entering into offtake agreements directly with North American miners and processors. Tesla has gone further and started constructing its own domestic lithium processing plant.

Historically, these offtake agreements were structured as fixed-price deals. However, when spot prices climb too high, sellers often find a way to exit the contract; conversely, when spot prices fall too low, buyers often find a way out. As a result, more and more offtake agreements for battery-grade lithium, nickel, and cobalt have become indexed to spot prices, with price floors and/or ceilings set as guardrails and adjustments for premiums and discounts based on other factors (e.g., IRA compliance or the risk of buying from a greenfield producer).
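The indexed pricing structure described above can be illustrated with a minimal sketch. The function name, the order in which the adjustment and the guardrails are applied, and all numbers are hypothetical assumptions for illustration; actual contracts negotiate these terms individually.

```python
def indexed_offtake_price(spot_index, floor=None, ceiling=None, adjustment=0.0):
    """Per-unit contract price for an index-linked offtake agreement.

    Assumption: the premium/discount is applied to the spot index first,
    and the floor and ceiling are then applied as guardrails.
    """
    price = spot_index + adjustment
    if floor is not None:
        price = max(price, floor)    # buyer pays at least the floor
    if ceiling is not None:
        price = min(price, ceiling)  # buyer pays at most the ceiling
    return price


# Hypothetical numbers in arbitrary currency units per unit of material.
low = indexed_offtake_price(10.0, floor=12.0, ceiling=20.0, adjustment=1.0)
mid = indexed_offtake_price(15.0, floor=12.0, ceiling=20.0, adjustment=1.0)
high = indexed_offtake_price(25.0, floor=12.0, ceiling=20.0, adjustment=1.0)
print(low, mid, high)
```

In this sketch the contract price follows the index only between the two guardrails, which is what gives both parties less reason to walk away when spot prices swing.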

Graphite is the one exception where buyers and suppliers have mostly stuck to fixed-price agreements. There are two main reasons for this: graphite pricing is opaque and products exhibit much more variation, complicating attempts to index the price. As a result, cell manufacturers don’t consider the available price indexes to accurately reflect the value of the specific products they are buying.

Offtake agreements for battery cells are also typically partially indexed on the price of the critical minerals used to manufacture them. In other words, a certain amount of the price per unit of battery cell is fixed in the agreement, while the rest is variable based on the index price of critical minerals at the time of transaction.
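As a rough illustration of partial indexing, the sketch below splits a cell price into a fixed component and a variable component tied to mineral index prices. The function name, the per-kWh mineral contents, and the prices are hypothetical placeholders, not real market data or an actual contract formula.

```python
def cell_price_per_kwh(fixed_component, mineral_kg_per_kwh, index_price_per_kg):
    """Cell price = fixed part + sum over minerals of (content x index price).

    Assumption: the variable part scales linearly with the index price of
    each mineral at the time of the transaction.
    """
    variable_component = sum(
        mineral_kg_per_kwh[mineral] * index_price_per_kg[mineral]
        for mineral in mineral_kg_per_kwh
    )
    return fixed_component + variable_component


# Hypothetical inputs for illustration only.
content = {"lithium": 0.5, "nickel": 1.0}   # kg of material per kWh of cell
prices = {"lithium": 20.0, "nickel": 10.0}  # index price per kg
print(cell_price_per_kwh(60.0, content, prices))
```

Under this structure the cell manufacturer passes mineral price movements through to the buyer while keeping its own manufacturing margin in the fixed component.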

Domestic critical minerals projects face two key challenges to securing investment and offtake agreements: market volatility and a lack of price competitiveness. The price difference between materials produced domestically and those produced internationally stems from two underlying causes: the current oversupply from Chinese-owned companies and the domestic price premium. 

Market Volatility

Lithium, cobalt, and graphite have relatively low-volume markets with a small customer base compared to traditional commodities. Low-volume products experience low liquidity, meaning it can be difficult to buy or sell quickly, so slight changes in supply and demand can result in sharp price swings, creating a volatile market. Because of the higher risk and smaller market, companies and investors tend to prefer mining and processing of base metals, such as copper, which have much larger markets, resulting in underinvestment in production capacity. 

In comparison, nickel is a base metal commodity, primarily used for stainless steel production. However, due to its rapidly growing use in battery production, its price has become increasingly linked to other battery materials, resulting in greater volatility than other base metals. Moreover, the short squeeze in 2022 forced LME to suspend trading and cancel transactions for the first time in three decades. As a result, trust in the price of nickel on LME faltered, many market participants dropped out, and volatility grew due to low trading volumes.

For all four of these materials, prices reached record highs in 2022 and subsequently crashed in 2023 (Figure 4). Nickel, cobalt, and graphite experienced price declines of 30-45%, while lithium prices dropped by an enormous 75%. As discussed above, market volatility discourages investment into critical minerals production capacity. The current low prices have caused some domestic projects to be paused or canceled. For example, Jervois halted operation of its Idaho cobalt mine in March 2023 due to cobalt prices dropping below its operating costs. In January 2024, lithium giant Albemarle announced that it was delaying plans to begin construction on a new South Carolina lithium hydroxide processing plant.

Retrospective analysis suggests that mining companies, battery investors, and automakers had all made overly optimistic demand projections and ramped up their production a bit too fast. These projections assumed that EV demand would keep growing as fast as it did immediately after the pandemic and that China’s lifting of pandemic restrictions would unlock even faster growth in the largest EV market. Instead, China, which makes up over 60% of the EV market, emerged into an economic downturn, and global demand elsewhere didn’t grow quite as fast as projected, as backlogs built up during the pandemic were cleared. (It is important to note that the EV market is still growing at significant rates—global EV sales increased by 35% from 2022 to 2023—just not as fast as companies had wished.) Consequently, supply has temporarily outpaced demand. Midstream and upstream companies stopped receiving new purchase orders while automakers worked through their stock build-up. Prices fell rapidly as a result and are now bottoming out. Some companies are waiting for prices to recover before they restart construction and operation of existing projects or invest in expanding production further. 

While companies are responding to short-term market signals, the U.S. government needs to act in anticipation of long-term demand growth outpacing current planned capacity. Price volatility in critical minerals markets will need to be addressed to ensure that companies and financiers continue investing in expanding production capacity. Otherwise, demand projections suggest that the supply chain will experience new shortages later this decade. 

Oversupply

The current oversupply of critical minerals has been exacerbated by below market-rate financing and subsidies from the Chinese government. Many of these policies began in 2009, incentivizing a wave of investment not just in China, but also in mineral-rich countries. These subsidies played a large role in the 2010s in building out nascent battery critical minerals supply chains. Now, however, they are causing overproduction from Chinese-owned companies, which threatens to push out competitors from other countries.

Overproduction begins with mining. Chinese companies are the primary financial backers for 80% of both the Democratic Republic of the Congo’s cobalt mines and Indonesia’s nickel mines. Chinese companies have also expanded their reach in lithium, buying half of all the lithium mines offered for sale since 2018, in addition to domestically mining 18% of global lithium.  For graphite, 82% of natural graphite was mined directly in China in 2023, and nearly all natural and synthetic graphite is processed in China.

After the price crash in 2023, while other companies pulled back their production volume significantly, Chinese-owned companies pulled back much less and in some cases continued to expand their production, generating an oversupply of lithium, cobalt, nickel, and natural and synthetic graphite. Government policies enabled these decisions by making it financially viable for Chinese companies to sell materials at low prices that would otherwise be unsustainable. 

Domestic Price Premium (and Current Policies Addressing It) 

Domestically produced critical minerals and battery electrode active materials come with a higher cost of production than imported materials due to higher wages and stricter environmental regulations in the U.S. The IRA’s new 30D and 45X tax credits and the upcoming section 301 tariffs help address this problem by creating financial incentives for using domestically produced materials, allowing them to compete on a more even playing field with imported materials.

The 30D New Clean Vehicle Tax Credit provides up to $7,500 per EV purchased, but it requires eligible EVs to be manufactured from critical minerals and battery components that are FEOC-compliant, meaning they cannot be sourced from companies with relationships to China, North Korea, Russia, or Iran. It also requires that an increasing percentage of critical minerals used to make the EV batteries be extracted or processed in the U.S. or a Free Trade Agreement country. These two requirements apply to lithium, nickel, cobalt, and graphite. For graphite, however, since nearly all processing occurs in China and there is currently no domestic supply, the U.S. Treasury has chosen to exempt it from the 30D tax credit’s FEOC and domestic sourcing requirements until 2027 to give automakers time to develop alternate supply chains.

The 45X Advanced Manufacturing Production Tax Credit subsidizes 10% of the production cost for each unit of critical minerals processed. The Internal Revenue Service’s proposed regulations for this tax credit interpret the legislation for 45X as applying only to the value-added production cost, meaning that the cost of purchasing raw materials and processing chemicals is not included in the covered production costs. This limits the amount of subsidy that will be provided to processors. The strength of 45X, though, is that unlike the 30D tax credit, there is no sunset clause for critical minerals, providing a long-term guarantee of support.
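To see why the value-added interpretation limits the subsidy, consider the simplified sketch below. It is an illustration of the reading described above, not tax guidance; the function name and the cost figures are hypothetical.

```python
def credit_45x(total_production_cost, raw_material_cost,
               processing_chemical_cost, rate=0.10):
    """Approximate 45X credit per unit under the value-added interpretation.

    Assumption: covered cost = total cost minus raw materials and
    processing chemicals, and the credit is 10% of that covered cost.
    """
    value_added = total_production_cost - raw_material_cost - processing_chemical_cost
    return rate * max(value_added, 0.0)


# Hypothetical per-unit costs: 100 total, of which 60 is feedstock
# and 15 is processing chemicals.
credit = credit_45x(100.0, 60.0, 15.0)
naive_credit = 0.10 * 100.0  # what 10% of the full production cost would be
print(credit, naive_credit)
```

With these made-up numbers, the credit covers only a quarter of what a 10% credit on full production cost would, which shows how feedstock-heavy processors see a much smaller benefit.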

In terms of tariffs, the Biden administration announced in May 2024 a new set of section 301 tariffs on Chinese products, including EVs, batteries, battery components, and critical minerals. The critical minerals tariffs include a 25% tariff on cobalt ores and concentrates that will go into effect in 2024 and a 25% tariff on natural flake graphite that will go into effect in 2026. In addition, there are preexisting 25% tariffs in section 301 for natural and synthetic graphite anode powder. These tariffs were previously waived to give automakers time to diversify their supply chains, but the U.S. Trade Representative (USTR) announced in May 2024 that the exemptions would expire for good on June 14th, 2024, citing the lack of progress from automakers as a reason for not extending them.

Current State of Supply Chain Development

For lithium, despite market volatility, offtake demand for existing domestic projects has remained strong thanks to IRA incentives. Based on industry conversations, many of the projects that are developed enough to make offtake agreements have either signed away their full output capacity or are actively in the process of negotiating agreements. Strong demand combined with tax incentives has enabled producers to negotiate offtake agreements that guarantee a price floor at or above their capital and operating costs. Lithium is the only material for which the current planned mining and processing capacity for North America is expected to meet demand from planned U.S. gigafactories.

Graphite project developers report that the 25% tariff coming into force will be sufficient to close the price gap between domestically produced materials and imported materials, enabling them to secure offtake agreements at a sustainable price. Furthermore, the Internal Revenue Service will require 30D tax credit recipients to submit periodic reports on the progress they are making on sourcing graphite outside of China. If automakers take these reports and the 2027 exemption deadline seriously, there will be even more motivation to work with domestic graphite producers. However, the current planned production capacity for North America still falls significantly short of demand from planned U.S. battery gigafactories. Processing capacity is the bottleneck for production output, so there is room for additional investment in processing capacity.

Pricing has been a challenge for cobalt though. Jervois briefly opened the only primary cobalt mine in the U.S. before shutting down a few months later due to the price crash. Jervois has said that as soon as prices for standard-grade cobalt rise above $20/pound, they will be able to reopen the mine, but that has yet to happen. Moreover, the real bottleneck is in cobalt processing, which has attracted less attention and investment than other critical minerals in the U.S. There are currently no cobalt sulfate refineries in North America; only one or two are in development in the U.S. and a few more in Canada.3

Nickel sulfate is also facing pricing challenges, and, similar to cobalt, there is an insufficient amount of nickel sulfate processing capacity being developed domestically. There is one processing plant being developed in the U.S. that will be able to produce either nickel metal or nickel sulfate and a few more nickel sulfate refineries being developed in Canada.

Policy Solutions to Support the Development of Processing Capacity

The U.S. government should prioritize the expansion of processing capacity for lithium, graphite, cobalt, and nickel. Demand from domestic battery manufacturing is expected to outpace the current planned capacity for all of these materials, and processing capacity is the key bottleneck in the supply chain. Tariffs and tax incentives have resulted in favorable pricing for lithium and graphite project developers, but cobalt and nickel processing has gotten less support and attention. 

DOE should provide demand-side support for processed, battery-grade critical minerals to accelerate the development of processing capacity and address cobalt and nickel pricing needs. The Office of Manufacturing and Energy Supply Chains (MESC) within DOE would be the ideal entity to administer such a program, given its mandate to address vulnerabilities in U.S. energy supply chains. In the immediate term, funding could come from MESC’s Battery Materials Processing Grants program, which has roughly $1.9B in remaining, uncommitted funds. Below we propose a few demand-support mechanisms that MESC could consider.

Long term, the Bipartisan Policy Center proposes that Congress establish and appropriate funding for a new government corporation that would take on the responsibility of administering demand-support mechanisms as necessary to mitigate volume and price uncertainty and ensure that domestic processing capacity grows to sufficiently meet critical minerals needs.

Offtake Backstops

Offtake backstops would commit MESC to guaranteeing the purchase of a specific amount of materials at a minimum negotiated price if producers are unable to find buyers at that price. This essentially creates a price floor for specific producers while also providing a volume guarantee. Offtake backstops help derisk project development and enable developers to access project financing. Backstop agreements should be made for at least the first five years of a plant’s operations, similar to a regular offtake agreement. Ideally, MESC should prioritize funding for critical minerals with the largest expected shortages based on current planned capacity—i.e., nickel, cobalt, and graphite.

There are two primary ways that DOE could implement offtake backstops:

First. The simplest approach would be for DOE to pay processors the difference between the spot price index (adjusted for premiums and discounts) and the pre-negotiated price floor for each unit of material, similar to how a pay-for-difference or one-sided contract-for-difference would work.4 This would enable processors to sign offtake agreements with no price floor, accelerating negotiations and thus the pace of project development. Processors could also choose to keep some of their output capacity uncommitted so that they can sell their products on the spot market without worrying about prices collapsing in the future.

A more limited form of this could look like DOE subsidizing the price floor for specific offtake agreements between a processor and a buyer. This type of intervention requires a bit more preliminary work from processors, since they would have to identify and bring a buyer to the table before applying for support.
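The pay-for-difference version of the backstop described above can be sketched as follows. The function name, the way the premium/discount adjustment is applied, and the figures are hypothetical assumptions, not a description of any existing DOE program.

```python
def backstop_payment(spot_index, price_floor, volume, adjustment=0.0):
    """One-sided contract-for-difference payout to a processor.

    Assumption: the reference price is the spot index plus a negotiated
    premium/discount, and a payment is owed only when that reference
    price falls below the pre-negotiated floor.
    """
    reference_price = spot_index + adjustment
    shortfall_per_unit = max(price_floor - reference_price, 0.0)
    return shortfall_per_unit * volume


# Hypothetical case: floor of 10 per unit on 1,000 units.
print(backstop_payment(8.0, 10.0, 1000.0))   # index below floor: payout owed
print(backstop_payment(12.0, 10.0, 1000.0))  # index above floor: no payout
```

Because the payout is one-sided, the processor keeps any upside when prices rise, and the government's exposure is limited to the gap between the floor and the reference price.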

Second. Purchasing the actual materials would be a more complex route for DOE to take, since the agency would have to be ready to receive delivery of the materials. The agency could do this by either setting up a system of warehouses suitable for storing battery-grade critical minerals or using “virtual warehousing,” as proposed by the Bipartisan Policy Center. An actual warehousing system could be set up by contracting with existing U.S. warehouses, such as those in LME and CME’s networks, to expand or upgrade their facilities to store critical minerals. These warehouses could also be made available for companies to store their private stockpiles, increasing the utility of the warehousing system and justifying the cost of setting it up. Virtual warehousing would entail DOE paying producers to store materials on-site at their processing plants.

The physical reserve provides an additional opportunity for DOE to address market volatility by choosing when it sells materials from the reserve. For example, DOE could pause sales of a material when there is an oversupply on the market and prices dip or ramp up sales when there is a shortage and prices spike. However, this can only be used to address short-term fluctuations in supply and demand (e.g. a few months to a few years at most), since these chemicals have limited shelf lives. 

A third way to implement offtake backstops that would also support price discovery and transparency is discussed in Section 3. 


Section 3. Creating Stable and Transparent Markets

Concerns about Pricing Mechanisms

Market volatility in critical minerals markets has raised concerns about just how reliable the current pricing mechanisms for these markets are. There are two main ways that prices in a market are determined: third-party price assessments and market exchanges. A third approach that has attracted renewed attention this year is auctions. Below, we walk through these three approaches and propose potential solutions for addressing challenges in price discovery and transparency. 

Index Pricing

Price reporting agencies like Fastmarkets and Benchmark Mineral Intelligence offer subscription services to help market participants assess the price of commodities in a region. These agencies develop rosters of companies for each commodity, who regularly contribute information on transaction prices. That intel is then used to generate price indexes. Fastmarkets and Benchmark’s indexes are primarily based on prices provided by large, high-volume sellers and buyers. Smaller buyers may pay higher than index prices. 

It can be hard to establish reliable price indexes in immature markets if there is an insufficient volume of transactions or if the majority of transactions are made by a small set of companies. For example, lithium processing is concentrated among a small number of companies in China, and spot transactions are a minority share of the market. New entrants and smaller producers have raised concerns that these companies have significant control over the Asian spot prices reported by Fastmarkets and Benchmark, which are used to set offtake agreement prices, and that the price indexes are not sufficiently transparent.

Exchange Trading

Market exchanges are a key feature of mature markets and help reduce market volatility. They allow for a wider range of participants, improving market liquidity, and enable price discovery and transparency. Companies up and down the supply chain can use physically delivered futures and options contracts to hedge against price volatility and gain visibility into expectations for the market’s general direction to help inform decision-making. This can help derisk the effect of market volatility on investments in new production capacity.

Of the materials we’ve discussed, nickel and cobalt metal are the only two that are physically traded on a market exchange, specifically LME. Metals make good exchange commodities due to their fungibility. Other forms of nickel and cobalt are typically priced as a percentage of the payable price for nickel and cobalt metal. LME’s nickel price is used as the global benchmark for many nickel products, while the in-warehouse price of cobalt metal in Rotterdam, Europe’s largest seaport, is used as the global benchmark for many cobalt products. These pricing relationships enable companies to use nickel and cobalt metal as proxies for hedging related materials.
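The payable-based pricing relationship mentioned above can be shown with a short sketch. The function name, the payable percentage, and the metal price are hypothetical; actual payables vary by product and contract.

```python
def payable_price(metal_reference_price, payable_fraction):
    """Price of a related product quoted as a share of the metal benchmark.

    Assumption: the product price is simply the benchmark metal price
    multiplied by a negotiated payable fraction between 0 and 1.
    """
    if not 0.0 <= payable_fraction <= 1.0:
        raise ValueError("payable_fraction must be between 0 and 1")
    return metal_reference_price * payable_fraction


# Hypothetical: a metal benchmark of 20,000 per tonne and a 75% payable.
print(payable_price(20000.0, 0.75))
```

Because the product price moves in proportion to the metal benchmark in this simplified model, a position in the metal contract can offset price changes in the related product, which is the basis for proxy hedging.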

After nickel trading volumes plummeted on LME in the wake of the short squeeze, doubts were raised about LME’s ability to accurately benchmark its price, sparking interest in alternative exchanges. In April 2024, UK-based Global Commodities Holdings Ltd (GCHL) launched a new trading platform for nickel metal that is only available to producers, consumers, and merchants directly involved in the physical market, excluding speculative traders. The trading platform will deliver globally “from Baltimore to Yokohama.” GCHL is using the prices on the platform to publish its own price index and is also working with Intercontinental Exchange to create cash-settled derivatives contracts. This new platform could potentially expand to other metals and critical minerals. 

In addition to LME’s troubles, changes in the battery supply chain have led to a growing divergence between the nickel and cobalt metal traded on exchanges and the actual chemicals used to make batteries. Chinese processors who produce most of the global supply of nickel sulfate have mostly switched from nickel metal to cheaper nickel intermediate products as their primary feedstock. Consequently, market participants say that the LME exchange price for nickel metal, which is mostly driven by stainless steel, no longer reflects market conditions for the battery sector, raising the need for new tradeable contracts and pricing mechanisms. For the cobalt industry, 75% of demand comes from batteries, which use cobalt sulfate. Cobalt metal makes up only 18% of the market, of which only 10-15% is traded on the spot market. As a result, cobalt chemicals producers have transitioned away from using the metal reference price towards fixed prices or cobalt sulfate payables.

These trends motivate the development of new exchange contracts for physically trading nickel and cobalt chemicals that can enable price discovery separate from the metals markets. There is also a need to develop exchange contracts for materials like lithium and graphite with immature markets that exhibit significant volatility. 

However, exchange trading of these materials is complicated by their nature as specialty chemicals: unlike metal commodities, they have limited shelf lives and more complex storage requirements. Lithium and graphite products also exhibit significant variations that affect how buyers can use them. For example, depending on the types and levels of impurities in lithium hydroxide/carbonate, manufacturers of cathode active materials (CAM) may need to conduct different chemical processes to remove them. Offtakers may also require that products meet additional specifications based on the characteristics they need for their CAM and battery chemistries.

For these reasons, major exchanges like LME, the Chicago Mercantile Exchange (CME), and the Singapore Exchange (SGX) have instead chosen to launch cash-settled contracts for lithium hydroxide/carbonate and cobalt hydroxide that allow for financial trading, but require buyers and sellers to arrange physical delivery separately from the exchange. Large firms have begun to participate increasingly in these derivatives markets to hedge against market volatility, but the lack of physical settlement limits their utility to producers who still need to physically deliver their products in order to make a profit. Nevertheless, CME’s contracts for lithium and cobalt have seen significant growth in transaction volume. LME, CME, and SGX all use Fastmarkets’ price indexes as the basis for their cash-settled contracts. 

As regional industries mature and products become more standardized, these exchanges may begin to add physically settled contracts for battery-grade critical minerals. For example, the Guangzhou Futures Exchange (GFEX) in China, where the vast majority of lithium refining currently occurs, began offering physically settled contracts for lithium carbonate in August 2023. Though the exchange exhibited significant volatility in its first few months, raising concerns, the first round of physical deliveries in January 2024 occurred successfully, and trading volumes have been substantial this year. Access to GFEX is currently limited to Chinese entities and their affiliates, but another trading platform could come to do the same for North America over the next few decades as lithium production volume grows and a spot market emerges. Abaxx Exchange, a Singapore-based startup, has also launched a physically settled futures contract for nickel sulfate with delivery points in Singapore and Rotterdam. A North American delivery point could be added as the North American supply chain matures. 

No market exchange for graphite currently exists, since products in the industry vary even more than those of other materials. Even the currently available price indexes are not seen as sufficiently robust for offtake pricing. 

Auctions

Given the absence of a globally accessible market exchange for lithium and concerns about the transparency of index pricing, Albemarle, the top producer of lithium worldwide, has turned to auctions of spodumene concentrate and lithium carbonate as a means to improve market transparency and an “approach to price discovery that can lead to fair product valuation.” Albemarle’s first auction, held in China in March for spodumene concentrate, closed at a price of $1200/ton, which was in line with spot prices reported by Asian Metal, but about 10% greater than prices provided by other price reporting agencies like Fastmarkets. Plans are in place to continue conducting regular auctions at the rate of about one per week in China and other locations like Australia. Lithium hydroxide will be auctioned as well. Auction data will be provided to Fastmarkets and other price reporting agencies to be formulated into publicly available price indexes.

Auctions are not a new concept: in 2021 and 2022, Pilbara Minerals regularly conducted auctions of spodumene on its own platform, Battery Metals Exchange, helping to improve market sentiment. The company says that most of its material is now committed to offtakers, so auctions have mostly stopped, though it did hold an auction for spodumene concentrate in March. If other lithium producers join Albemarle in conducting auctions, the data could help improve the accuracy and transparency of price indexes. Auctions could also be used to inform the pricing of other battery-grade critical minerals. 

Policy Solutions to Support Price Discovery and Transparency Across the Market

Right now, the only pricing mechanisms available to domestic project developers are spot price indexes for battery-grade critical minerals in Asia or global benchmarks for proxies like nickel and cobalt metal. Long-term, the development of new pricing mechanisms for North America will be crucial to price discovery and transparency in this new market. There are two ways that DOE could help facilitate this: one that could be implemented immediately for some materials and one that will require domestic production volume to scale up first.

First. Government-Backed Auctions: Auctions require project developers to keep a portion of their expected output uncommitted to any offtakers. However, there is a risk that future auctions won’t generate a price sufficient to offset capital and operating expenses, so processors are unlikely to do this on their own, especially for their first domestic project. MESC could address this by providing a backstop guarantee for the portion of a producer’s output that they commit to regularly auctioning for a set timespan. If, in the future, auctions are unable to generate a price above a pre-negotiated price floor, then DOE would pay sellers the difference between the highest auction price and the price floor for each unit sold. Such an agreement could be made using DOE’s Other Transaction Authority. DOE could separately contract with a platform such as MetalsHub to conduct the auction. 
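The backstop payment described above reduces to a simple rule: no payment when the auction clears at or above the floor, and otherwise the per-unit shortfall times the units sold. The sketch below illustrates that rule only; the function name, argument names, and example prices are hypothetical and not drawn from any actual DOE agreement.

```python
def backstop_payment(highest_auction_price, price_floor, units_sold):
    """Payment owed to the seller under a backstop-auction agreement.

    Pays the difference between the pre-negotiated price floor and the
    highest auction price for each unit sold, and nothing if the auction
    price meets or exceeds the floor.
    """
    if units_sold < 0:
        raise ValueError("units_sold must be non-negative")
    shortfall_per_unit = max(0.0, price_floor - highest_auction_price)
    return shortfall_per_unit * units_sold


# Hypothetical numbers for illustration only (price per ton, tons sold).
print(backstop_payment(highest_auction_price=900.0, price_floor=1000.0, units_sold=500))
print(backstop_payment(highest_auction_price=1200.0, price_floor=1000.0, units_sold=500))
```

In the first hypothetical case the auction falls 100 per ton short of the floor, so the payment is 50,000; in the second the auction clears above the floor and no payment is due. The government's exposure is therefore capped at the price floor times the committed volume.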

Government-backed auctions would enable the discovery of a true North American price for different battery-grade critical minerals and the raw materials used to make them, generating a useful comparison point with Asian spot prices. Such a scheme would also help address developers’ price and demand needs for project financing. These backstop-auction agreements could be complementary to the other types of backstop agreements proposed earlier and potentially more appealing than physically offtaking materials since the government would not have to receive delivery of the materials and there would be a built-in mechanism to sell the materials to an appropriate buyer. If successful, companies could continue to conduct auctions independently after the agreements expire.

Second. New Benchmark Contracts: Employ America has proposed that the Loan Programs Office (LPO) could use Section 1703 to guarantee lending to a market exchange to develop new, physically settled benchmark contracts for battery-grade critical minerals. The development of new contracts should include producers in the entire North American region. Canada also has a significant number of mines and processing plants in development. Including those projects would increase the number of participants, market volume, and liquidity of new benchmark contracts.

In order for auctions or new benchmark contracts to operate successfully, three prerequisites must be met:

  1. There must be a sufficient volume of materials available for sale (i.e. production output that is not committed to an offtaker).
  2. There must be sufficient product standardization in the industry such that materials produced by different companies can be used interchangeably by a significant number of buyers.
  3. There must be a sufficient volume of demand from buyers, brokers, and traders.

Market exchanges typically conduct research into stakeholders to understand whether or not the market is mature enough to meet these requirements before they launch a new contract. Interest from buyers and sellers must indicate that there would be sufficient trading volume for the exchange to make a profit greater than the cost of setting up the new contract. A loan from LPO under Section 1703 can help offset some of those upfront costs and potentially make it worthwhile for an exchange to launch a new contract in a less mature market than they typically would. 

Government-backed auctions, on the other hand, solve the first prerequisite by offering guarantees to producers for keeping a portion of their production output uncommitted. Product standardization can also be less stringent, since each producer can hold separate auctions, with varying material specifications, unlike market exchanges where there must be a single set of product standards.

Given current market conditions, no battery-grade critical minerals can meet the above prerequisites for new benchmark contracts, primarily due to a lack of available volume, though there are also issues with product standardization for certain materials. However, nickel, cobalt, lithium, and graphite could be good candidates for government-backed auctions. DOE should start engaging with project developers that have yet to fully commit their output to offtakers and gauge their interest in backstop-auction agreements. 

Nickel and Cobalt

As discussed earlier, there are only a handful of nickel and cobalt sulfate refineries currently being developed in North America, making it difficult to establish a benchmark contract for the region. None of the project developers have yet signed offtake agreements covering their full production capacity, so backstop-auction agreements could be appealing to project developers and their investors. Given that more than half of the projects in development are located in Canada, MESC and DOE’s Office of International Affairs should collaborate with the Canadian government in designing and implementing government-backed auctions. 

Lithium

Domestic companies have expressed interest in establishing North American-based spot markets and price indexes for lithium hydroxide and carbonate, but say that it will take quite a few years before production volume is large enough to warrant that. Lithium processors have also raised product variation as a concern when the idea of a market exchange or public auction has come up. Lessons could be learned from the GFEX battery-grade lithium carbonate contracts. GFEX set standards on the purity, moisture, loss on ignition, and maximum content of different impurities. Some Chinese companies were able to meet these standards, while others were not, preventing them from participating in the futures market or requiring them to trade their materials as lower-purity industrial-grade lithium carbonate, which sells at a discount. Other companies producing lithium of much higher quality than the GFEX standards opted to continue selling on the spot market because they could charge a premium on the standard price. Despite some companies choosing not to participate, trading volumes on GFEX have been substantial, and the exchange was able to weather initial concerns of a short squeeze, suggesting that challenges with product variation can be overcome through standardization.

Analysts have proposed that spodumene could be a better candidate for exchange trading, since it is fungible and does not have the limited shelf life or storage requirements of lithium salts. 60% of global lithium comes from spodumene, and the U.S. has some of the largest spodumene deposits in the world, so spodumene would be a good proxy for lithium salts in North America. However, the two domestic developers of spodumene mines are planning to construct processing plants to convert the spodumene into battery-grade lithium on-site. Similarly, the two Canadian mines that currently produce spodumene are also planning to build their own processing plants. These vertical integration plans mean that there are unlikely to be large amounts of spodumene available for sale on a market exchange in the near future.

DOE could, however, work with miners and processors to sign backstop-auction agreements for smaller amounts of lithium hydroxide/carbonate and spodumene that they have yet to commit to offtakers. This may be especially appealing to companies that have announced delays to project development due to current low market prices and help derisk bringing timelines forward. Interest in these future auctions could also help gauge the potential for developing new benchmark contracts for lithium hydroxide/carbonate further down the line.

Graphite

Natural and synthetic graphite anode material products currently exhibit a great range of variation and insufficient product standardization, so a market exchange would not be viable at the moment. As the domestic graphite industry develops, DOE should work with graphite anode material producers and battery manufacturers to understand the types and degree of variations that exist across products and discuss avenues towards product standardization. Government-backed auctions could be a smaller-scale way to test the viability of product standards developed from that process, perhaps using several tiers or categories to group products. Natural and synthetic graphite would have to be treated separately, of course. 

Conclusion

The current global critical minerals supply chain partially reflects the results of over a decade of focused industrial policies implemented by the Chinese government. If the U.S. wants to lead the clean energy transition, critical minerals will also need to become a cornerstone of U.S. industrial policy. Developing a robust North American critical minerals industry would bolster U.S. energy security and independence and ensure a smooth energy transition. 

Promising progress has already been made in lithium, with planned processing capacity expected to meet demand from future battery manufacturing. However, market and pricing challenges remain for battery-grade nickel, cobalt, and graphite, which will fall far short of future demand without additional intervention. This report proposes that DOE take a two-pronged approach to supporting the critical minerals industry through offtake backstops, which address project developers’ current pricing dilemmas, and the development of more reliable and transparent pricing mechanisms such as government-backed auctions, which will set up markets for the future.

While the solutions proposed in this report focus on DOE as the primary implementer, Congress also has a role to play in authorizing and appropriating new funding necessary to execute a cohesive industrial strategy on critical minerals. The policies proposed in this report can also be applied to other critical minerals crucial for the energy transition and our national security. Similar analysis of other critical minerals markets and end uses should be conducted to understand how these solutions can be tailored to those industry needs. 

Building a Whole-of-Government Strategy to Address Extreme Heat

Comprehensive recommendations from +85 experts to enable a heat-resilient nation

From August 2023 to March 2024, the Federation of American Scientists (FAS) talked with +85 experts to source 20 high-demand opportunity areas for ready policy innovation and 65 policy ideas. In response, FAS recruited 33 authors to work on +18 policy memos through our Extreme Heat Policy Sprint from January 2024 to April 2024, generating an additional +100 policy recommendations to address extreme heat. Our experts’ full recommendations can be found here. In total, FAS has collected +165 recommendations for 34 offices and/or agencies. Key opportunity areas are described below and link out to a set of featured recommendations. Find the 165 policy ideas developed through expert engagement here.


America is rapidly barreling towards its next hottest summer on record. While we wait for a national strategy, states, counties, and cities around the country have taken up the charge of addressing extreme heat in their communities and are experimenting on the fly. California has announced $200 million to build resilience centers that protect communities from extreme heat and has created an all-of-government action plan to address extreme heat. Arizona, New Jersey, and Maryland are all actively developing extreme heat action plans of their own. Miami-Dade County considered passing some of the strictest workplace heat rules (although the measure ultimately failed). Additionally, New York City and Los Angeles have driven cool roof adoption through funding programs and local ordinances, which can reduce energy demands, improve indoor comfort, and potentially lower local outside air temperatures.  

While state and local governments can make significant advances, national extreme heat resilience requires a “whole of government” federal approach, as it intersects health, energy, housing, homeland and national security, international relations, and many more policy domains. The federal government plays a critical role in scaling up heat resilience interventions through research and development, regulations, standards, guidance, funding sources, and other policy levers. But what are the transformational policy opportunities for action?

Sourcing Opportunities and Ideas for Policy Innovation

During Fall 2023, FAS engaged +85 experts in conversations around federal policies needed to address extreme heat. Our stakeholders included: 22 academic researchers, 33 non-profit organization leaders, 12 city and state government employees, 3 private company leaders, 2 current or former Congressional staffers, 3 National Labs leaders, and 10 current or former federal government employees. Our conversations were guided by the following four questions:

Our conversations with experts sourced 20 high-demand opportunity areas for policy innovation and 65 policy ideas. To go deeper, FAS recruited 33 authors to work on +18 policy memos through our Extreme Heat Policy Sprint, generating an additional +100 policy recommendations to address extreme heat’s impacts and build community resilience. Our policy memos from the Extreme Heat Policy Sprint, published in April 2024, provide a more comprehensive dive into many of the key policy opportunities articulated in this report. Overall, FAS’ work scoping the policy landscape, understanding the needs of key actors, identifying demand signals, and responding to these demands has generated +165 policy recommendations for 34 offices and/or agencies.

Opportunities for Extreme Heat Policy Innovation

The following 20 “opportunity areas” are not exhaustive, yet can serve as inspiration for the building blocks of a future strategic initiative.

Facilitate Government-Wide Coordination

The first opportunity is an overarching call to action: the need for a government-wide extreme heat strategic initiative. This can build upon the National Integrated Heat Health Information System’s (NIHHIS) National Heat Strategy, set for release this year. This strategy would define the problems to solve, create targets and galvanizing goals, set and assign priorities for federal agencies, review available resources for financial assistance, assess regulatory and rulemaking authority where applicable, highlight legislative action, and include evaluation metrics and a timeline for review, adjustment, and renewal of programs. In creating this strategy, one interviewee recommended a comprehensive review of “heat exposure settings” and the federal actors that can safeguard Americans in these settings: homes, workplaces, schools and childcare facilities, transit, senior living facilities, correctional facilities, and outdoor public spaces. Through scoping potential regulations, standards, guidelines, planning processes, research agendas, and financial assistance, the federal government will then be prepared to support its intergovernmental actors and communities.

Accelerate Resilient Cooling Technologies, Building Codes, and Urban Infrastructure

On average, Americans spend 90% of their time indoors, making the built environment a critical site for heat exposure mitigation. To keep cool, especially in parts of the U.S. not used to extreme heat, buildings are increasingly reliant on mechanical cooling interventions. While a life-saving necessity, air conditioning (AC) consumes significant amounts of electricity, putting high demands on aging grid infrastructure during the hottest days. Excess heat from air conditioners can lead to higher outdoor temperatures and even more AC demand. Finally, ACs are useless if there’s no power, an increasing risk due to growing energy poverty and grid failure. In these scenarios, our current construction is likely to widely “fail” in its ability to cool residents.

Resilient cooling strategies, like highly energy-efficient cooling systems, demand/response systems, and passive cooling interventions, need policy actions to rapidly scale for a warming world. For example, cool roofs, walls, and surfaces can keep buildings cool and less reliant on mechanical cooling, but are often not considered a part of weatherization audits and upgrades. District cooling, such as through networked geothermal, can keep entire neighborhoods cool while relying on little electricity, but is still in the demonstration project phase in the United States. Heat pumps are also still out of reach for many Americans, making it essential to design technologies that work for different housing types (e.g., affordable housing construction). Initiatives like the Department of Energy’s (DOE) Affordable Home Energy Shot can bring these technologies into reach for millions of Americans, but only if they are given sufficient financial resources. DOE’s Office of Clean Energy Demonstrations and State and Community Energy Programs FY25 budget request to strengthen heat resilience in disadvantaged communities through energy solutions could be a step towards realizing innovative heat technologies. Further, the Environmental Protection Agency’s Energy Star program can incentivize low-power and resilient cooling technologies, provided rebates are designed to take advantage of these technologies.

Thermal resilience of buildings must also be considered, for both day-to-day operations and emergency blackout scenarios. DOE can work with stakeholders to create “cool” building standards and metrics with human health and safety in mind, and integrate them into building codes like the ASHRAE 189.1 and 90 series. These codes are “win-wins” for building designers, creating buildings that consume far less electricity while keeping inhabitants safe from the heat. DOE can assist in conducting more demonstration projects for building strategies that ensure indoor survivability in everyday and extreme conditions. 

Intervention efficacy and applicability are still evolving for extreme heat resilience interventions at the community scale, such as cool pavements, urban greening, shading, ventilation corridors, and development regulations (e.g., solar orientation). Individual interventions and their interactions need more evidence of their costs and benefits, potential tradeoffs, and maladaptations. The National Institute of Standards and Technology works on building and urban planning standards for other natural hazards, such as through its National Windstorm Impact Reduction Program (NWIRP) and its Community Resilience program, and could serve as a “technology test-bed” for heat resilience practices and advance our understanding of their effectiveness as well as how to measure and account for benefits and costs. This could be done in partnership with the National Science Foundation, which has been dedicating funding for use-inspired research and technology development for climate resilience.

Finally, the U.S. government is the largest landlord in the nation. As the General Services Administration is rapidly decarbonizing its buildings, it can also be a test site for new technologies, building designs, planning, and resilience metrics development and analysis.

Adapt Transportation to the Heat

Public transportation is a site of high exposure to extreme heat. While the Department of Transportation’s Promoting Resilient Operations for Transformative, Efficient, and Cost-saving Transportation (PROTECT) grants are for “surface transportation resilience,” several of our local and regional government interviewees expressed difficulty successfully applying for these grants for “cooling” infrastructure, like water fountains, shade, and air-conditioned bus shelters. DOT should make extreme heat resilience explicit in its eligibility requirements as well as review the benefit-cost analysis (BCA) formula and how it might disadvantage cool infrastructure. 

Asphalt and concrete roadways contribute to the urban heat island effect, and hotter weather makes asphalt in particular more vulnerable to cracking. DOT should leverage its research and development (R&D) capabilities to develop and deploy reflective and cool materials as a part of transportation infrastructure improvements. Finally, DOT should also consider the levers available to incentivize cool surfaces and cool materials as a part of transportation construction.

Create More Heat-Resilient Schools for Sustained Learning

Higher temperatures combined with minimal to no air conditioning in older school buildings have led to an increase in the number of “heat days”, or school closures due to dangerous temperatures. Pulling children out of the classroom not only negatively impacts them, but also puts increasing strain on families that rely on schools as childcare. Even when school is in session, many students are attempting to learn in classrooms exceeding 80°F, a temperature threshold where studies have repeatedly shown that students struggle to learn and fall short of true academic performance. This is because heat reduces cognitive function and ability to concentrate – both essential to learning. Learning loss from rising heat will only compound the learning losses from the COVID-19 pandemic. The Environmental Protection Agency predicts that the total lost future income attributable to heat-related learning losses may reach $6.9 billion at 2°C (a threshold we are well on the way to meeting) and $13.4 billion at 4°C. Schools need guidance on how to deal with the heat crisis currently at hand, while being supported as they plan necessary climate adaptations needed for a hotter world. 

At a minimum, schools can be encouraged to formalize plans for school heat preparedness to both protect the health of students and safeguard their learning. No federal heat safety recommendations yet exist and thus will need to be created by the Department of Education (Ed), EPA, FEMA, the National Oceanic and Atmospheric Administration (NOAA), and others. Title I Grants, in alignment with Justice40, could then help schools adapt to climate change, drawing on researched guidance on ways to cool students indoors, outdoors, and through behavioral management. Further, school system leaders need a better system to track how schools are currently experiencing extreme heat and what strategies could be employed to respond to heat exposure (closing schools, informed behavioral interventions to manage heat exposure, green infrastructure to build resilience, etc.). Federal involvement is essential for creating this tool. Finally, to address the root causes of excessive classroom heat, schools will need to transform their infrastructure through HVAC investments and improvements, greening, playground material changes, and shading. HVAC costs alone are expected to be $40 billion for all U.S. schools that need infrastructure improvements. While Inflation Reduction Act (IRA) tax credits are available for updating HVAC systems, many low-wealth schools will not be able to finance the gap between the credit coverage and the true cost and will need additional financial assistance.

Make Housing and Eviction Policy More Climate-Aware and Resilient

Most of the U.S. lacks minimum cooling requirements for buildings, including any requirement that a cooling device be present in the property. Adoption of the latest building energy codes, despite their previously described limitations, can still be a cost-saving and life-saving advancement, according to research by DOE. For new properties, the Federal Housing Finance Agency could require adherence to the latest energy codes as a condition of receiving a mortgage from Government Sponsored Enterprises, a requirement already under consideration by the Department of Housing and Urban Development (HUD) and the U.S. Department of Agriculture (USDA) for their mortgage products. For older construction, there could be requirements for adequate cooling to exist in the property at the point of sale. 

For all property types, weatherization audits, through the Weatherization Assistance Program (WAP) and Low-Income Home Energy Assistance Program (LIHEAP), can be expanded to consider heat resilience and cooling efficiency of the property and then identify upgrades such as more efficient HVAC, building envelope improvements, cool roofs, cool walls, shade, and other infrastructure. If cooling the entire property is infeasible or costly, homeowners could benefit from creating “Climate Safe Rooms,” which are guaranteed to be safe during a heat wave. DOE and HUD could collaborate to demonstrate climate safe rooms in affordable housing, where many residents lack access to consistent cooling.

Some housing types are riskier than others. People living in manufactured homes in Arizona were 6 to 8 times more likely to die indoors due to extreme heat. This is because of poorly functioning or completely defunct cooling systems and/or inability to pay electric bills. Manufactured home park landlords can also set a variety of rules for homeowners, including banning cooling devices like window ACs and shade systems. While states like Arizona have now passed laws making these bans illegal, there is a need for a nationwide policy for secure access to cooling. HUD does not regulate manufactured home parks, but does finance the parks through Section 207 mortgages and could stipulate that park owners must guarantee resident safety. Finally, HUD could also update the Manufactured Home Construction and Safety Standards to allow HVAC and other cooling regulations in local building codes to apply to manufactured homes, as they do for other forms of housing, as well as require that homes perform to a certain level of cooling under high heat conditions. 

Renters are another highly vulnerable population. Most states do not require landlords to provide cooling devices to tenants or keep housing below risky temperatures. HUD, for example, does not require cooling devices in public housing, although regulations exist for heating. HUD could implement similar guarantees of a “right to cool”. Evictions in the summer months are also on the rise, due to rising rents compounded with rising energy costs, putting people out in the deadly heat. Keeping people in housing should be of the utmost importance, yet implementation remains fractured across the nation. Eviction moratoriums at a national level have been challenged by the Supreme Court, which overturned the CDC’s COVID-19 moratorium.

Address Communities’ Needs for Long-Term Infrastructure Funding Support

Heat vulnerability mapping has advanced significantly in the past few years. Federal programs like the NIHHIS’s Urban Heat Island Mapping Campaigns have mapped +60 communities in the United States, and the resulting maps have guided city policy. The Census’ new product, Community Resilience Estimates (CRE) for Heat, assesses vulnerability at the level of individuals and households. Finally, researchers and non-profit organizations have been developing tools that can assess risk and also aid in individual or local decision-making, such as the Climate Health and Risk Tool and Heat Factor.

Advancements in our understanding of heat’s impacts and potential interventions have not translated to sustained resources to support transformative infrastructure development. As one interviewee put it, “communities that have mapped their urban heat islands are still waiting on funding opportunities to build relevant infrastructure projects”. Federal grants for mitigation and resilience may or may not consider heat resilience projects “cost-effective” and aligned with grant-making objectives, leading to rejection. 

FEMA’s Hazard Mitigation Grant Program (HMGP), made available only after a federally declared disaster, can only be used for extreme heat in specific circumstances, and FEMA recommends that cost-effective heat mitigation projects also “reduce risks of other hazards”. In another example, FEMA’s BRIC grant program has rejected cooling centers, HVAC upgrades, and weatherization activities, all strategies with some benefit for preventing morbidity and mortality. Green infrastructure projects, with co-benefits such as flood mitigation, have been more successful, often because the benefit-cost analysis (BCA) is based on the property-damaging hazard, flooding. Only one FEMA BRIC project has been funded with heat as the main hazard, an urban greening project in Portland, Oregon. This uncertainty about grant success can deter communities from applying with a heat-focused project when their time could be better spent securing grants for other community priorities. FEMA’s announcement that it will fund net-zero projects, including passive heating and cooling, through its HMGP and BRIC programs and Public Assistance could shift the paradigm, yet communities will likely need more guidance and technical assistance to execute these projects.

To invest in resilience to the growing risk of heat, policymakers will need to create a dedicated and reliable funding source. Federal stakeholders can look to the states for models. California’s Integrated Climate Adaptation and Resiliency Program’s Extreme Heat and Community Resilience grants are currently slated to allocate $118 million to 20-40 communities for planning and implementation grants over three rounds. To start, FEMA could replicate this program, similar to its specific programs for wildfires, providing $50,000 to $5 million to a wide range of heat resilience projects, and make it eligible for joint funding through BRIC. DOE’s $105 million FY25 budget request for a program for planning, development, and demonstration of community-scale solutions to mitigate extreme heat in low-income communities is a step in the right direction. If funded, the program would benefit from coordinating with FEMA’s BRIC program on high-impact solutions.

Set Indoor and Outdoor Temperature Standards and Workplace Protections to Protect Human Health

Our understanding of when heat becomes risky to human health and impacts daily governance is still in development. Our interviewees shared that there is not yet consensus on the lower threshold for 1) when outdoor and indoor temperature risks begin and 2) at what level of continued exposure there should be cause for action, such as implementing breaks for workers or deploying rapid emergency cooling to residents. For workplaces, guidelines will come soon: the Occupational Safety and Health Administration (OSHA) is set to release its heat standard for indoor and outdoor workers by the end of 2024, which will advance heat safety for workers across the country. For all other settings (such as residential settings and schools), the jury is still out on a valid threshold and a regulatory mechanism to establish it.

Enforcement of standards is necessary for realizing their full potential. In preparation for a workplace heat standard, interviewees recommended that the Department of Labor create an advanced Hazard Alert System for Heat (using an evolved data standard discussed in a later section) in order to better pinpoint regulatory enforcement. Small businesses will also need help preparing to comply with the new standard. DOL and the Small Business Administration should consider setting up a navigator program for resourcing energy-efficient, worker-centric cooling strategies, leveraging IRA funds where applicable.

Build the Extreme Heat Resilience Workforce

Extreme heat is not just a challenge to worker health; it is also a challenge to workforce ability and capacity. As heat becomes a threat to the entire nation, many fields need to adapt rapidly to entirely new knowledge bases. For example, much of the health workforce (doctors, nurses, public health workers) receives little to no education on climate change and climate’s health impacts. Programs are beginning to crop up, such as Harvard’s C-Change Program, yet they will need support to scale. With the federal government being the nation’s largest single source funder of graduate medical education, there are many levers at its disposal to develop, incentivize, and even require climate and health education. The U.S. Public Health Service Commissioned Corps is another program that could mobilize a climate-aware health workforce, placing professionals with a deep awareness of climate change’s impact on health in local communities.

The weatherization and decarbonization workforce must also be made aware of, and ready for, heat’s growing impacts and the emerging strategies for building- and community-scale resilience. While promising strategies exist for heat mitigation, such as cool walls and roofs, these interventions are largely not considered during weatherization audits and energy efficiency audits. Tax credits that have been created by the IRA/BIL could be used for interventions for passive or low-energy cooling, yet a lack of clarity prevents their uptake and implementation. For example, EPA’s EnergyStar program used to certify roofing products before that certification sunset in 2022. Stakeholders at DOE and EPA should consider their role in workforce readiness for extreme heat, collaborating with third-party entities to build awareness about these promising strategies.

Navigating all of the benefits of the IRA and BIL is challenging for resource-strapped communities and households. Program navigators for weatherization assistance and resilience could be an incredible asset to low-resource communities, and could leverage IRA resources for technical assistance as well as the newly created American Climate Corps.

Finally, the federal government workforce is being stretched thin by the sheer number of new mandates in IRA and BIL. To meet the moment, agencies have used flexible hiring mechanisms like Intergovernmental Personnel Act (IPA) assignments and, for some offices, their BIL- and IRA-connected Direct Hire Authority to make those critical talent decisions and staff their agencies. DOE, for example, has exceeded its goals, hiring over 1,000 new employees to date. But not all agencies and offices have access to the Direct Hire Authority, and it is set to expire between 2025 (for IRA) and 2027 (for BIL). Congress should be encouraged to expand this authority, extend it beyond 2025 and 2027 respectively, and remove the limit on the number of staff allowed. Further, agencies should be encouraged to use other flexible hiring mechanisms like IPAs and other termed positions. The federal government should have the talent needed to meet its current mandates and be prepared to solve problems like extreme heat.

Build Healthcare System Preparedness

Years of underinvestment in preparedness have impacted U.S. health infrastructure’s surveillance, data collection, and workforce capacity to respond to emerging climate threats like extreme heat. The Administration for Strategic Preparedness and Response’s Hospital Preparedness Program, which prepares healthcare systems for emergencies, had its budget reduced by 67% from FY 2002 to FY 2022, adjusting for inflation. Further, the Centers for Disease Control and Prevention (CDC) saw a 20% budget reduction from FY 2002 to FY 2022. The CDC’s Climate Ready States and Cities Initiative can only support nine states, one city, and one county, despite 40 jurisdictions having applied. The Trust for America’s Health (TFAH) found that increasing funding from $10 million to $110 million is required to support all states and improve climate surveillance. The TFAH also found that an additional $75 million is needed to extend the CDC’s National Environmental Public Health Tracking Program, a program that tracks threats and plans interventions, to every state. Finally, the Office of Climate Change and Health Equity, the only office within the Department of Health and Human Services dedicated solely to the intersection of climate and health, has yet to receive direct appropriations to support its work.

The Centers for Medicare & Medicaid Services (CMS) and the Health Resources and Services Administration (HRSA) provide critical investments to healthcare facilities, operations, care provision, and the medical workforce, yet have no publicly available programs dedicated to building climate resilience in the face of rising temperatures. The Veterans Health Administration (VHA), the largest integrated healthcare system in the U.S., includes responding to heat wave exposure in its agency Climate Action Plan and has made commitments to developing biosurveillance systems that incorporate external data on air quality, temperature, heat index, and weather as well as upgrading medical center infrastructure. This is critical, as 62% of VHA medical centers are exposed to extreme heat and the VHA is seeing a rise in heat-related illness in the Veteran population. Given the VHA’s sheer size, systems changes like these can drive real change in healthcare practice.

To build resilience to extreme heat within healthcare systems, our interviews and literature review highlighted three actions as most critical: 1) increasing surveillance and tracking of heat-related illness through improvements to medical diagnosis and coding practices and technological systems (e.g., EHRs); 2) leveraging healthcare financing for preventative treatments (e.g., cooling devices), incentives for climate-change preparedness, accurate coding and treatment, and quality care delivery (CQIs), and requirements for accreditation and reimbursements; and 3) fostering capacity-building through grants, technical assistance, planning support and guidance, and emergency preparedness.

Design Activation Thresholds for Public Health, Medical, and Emergency Responses

Extreme heat events have overwhelmed local capacity and triggered local disaster declarations. Even so, heat is not explicitly required in healthcare preparedness efforts authorized under the Pandemic and All-Hazards Preparedness Act (PAHPA), it is insufficiently included or not included at all in local and state hazard mitigation plans required by FEMA, and there has yet to be a federal disaster declaration for heat. This all inhibits the deployment of federal resources for mitigation, planning, and response that states and local jurisdictions rely on for other hazards. Our interviewees recommended better “activation thresholds” for heat, i.e., markers that the hazard has reached a level of impact that needs additional capacity and resources. Most thresholds set right now rely only on high temperatures, not the risk factors that exacerbate the impacts of heat. Data inputs into these locally relevant thresholds can include wet-bulb globe temperature (which accounts for humidity), heat stress risk, level of acclimatization, nighttime temperatures, building conditions and cooling device uptake, work situations, other compounding health risks like wildfire smoke, and other factors. These activation thresholds should also be designed around the most heat-vulnerable populations, such as children, the elderly, pregnant people, and those with comorbidities.

Increased transmission and spread of viral pathogens is also a growing risk of hotter average temperatures that needs more attention. Increased pathogen surveillance and correlation with existing climate conditions would greatly enable U.S. pandemic and endemic disease surveillance. Finally, no program to date at the Biomedical Advanced Research and Development Authority has focused on creating climate-aware medical countermeasures, and the 2022-2026 strategic plan includes no mention of climate change.

Reduce Energy Burdens, Utility Insecurity, and Grid Insecurity

As temperatures rise, so do energy bills. Americans are facing an ever-growing burden of energy debt. Sixteen percent of U.S. households (20.9 million people) find themselves behind on their energy bills, increasing the risk of utility shut-offs due to non-payment. The Low Income Home Energy Assistance Program (LIHEAP) exists to relieve energy burdens, yet was designed primarily for heating assistance. Thus, the LIHEAP formulas advantage states with historically frigid climates. Further, most states use their LIHEAP budgets for heating first, leaving what remains for cooling assistance, or offer no cooling assistance at all. As a result, nationally from 2001 to 2019, only 5% of energy assistance went to cooling. Finally, LIHEAP is massively oversubscribed and can only serve a portion of the families in need. To adapt to a hotter world, LIHEAP’s budgets must increase, and allocation formulas will need to be made more “cooling”-aware and equitable for hot-weather states. The FY25 presidential budget keeps LIHEAP’s funding levels at $4.1 billion, while also proposing expanding eligible activities that will draw on available resources. A recent analysis by the National Energy Assistance Directors Association found that this funding level could cut ~1.5 million families from the program and cut program benefits like cooling.

Another key issue is that 31 states have no policy preventing energy shut-offs during excessive heat events and even the states that have policies vary widely in their cut-off points. These cut-off policies are all set at the state level, and there is still an ongoing need to identify best practices that save lives. While the Public Utility Regulatory Policies Act of 1978 (PURPA) prohibits electric utilities from shutting off home electricity for overdue bills when doing so would be dangerous for someone’s health, it does not have explicit protections for extreme weather (hot/cold). Reforms to PURPA could be considered that require utilities to have moratoriums on energy shut-offs during extreme heat seasons.

Finally, grid resilience will become even more essential in a hotter climate. Power outages and blackouts during extreme heat events are deadly. If a blackout were to occur in Phoenix, Arizona during the summer, nearly 900,000 people would need immediate medical attention. Rising use of AC is itself a risk factor for blackouts due to increases in energy demand. The North American Electric Reliability Corporation (NERC), a regulatory organization that works to reduce risks to power grid infrastructure, issued a dire warning that two-thirds of the U.S. is facing reliability challenges because of heatwaves. Ensuring grids are ready for the climate to come should be a top priority for DOE, the Federal Emergency Management Agency (FEMA), and the Federal Energy Regulatory Commission (FERC). Given the risks to human health, the Centers for Disease Control and Prevention (CDC) should work with public health organizations to prepare for blackouts and grid failure events.

Address Critical Needs of Confined Populations Facing Heat

Confined populations, whether because of their medical status or legal status, are vulnerable to extreme heat indoors. Long-term care facilities are required by law to keep properties within 71-81℉. Yet long-term care facilities are reporting challenges in actually meeting residents’ needs in a disaster, such as a power outage, pointing to a need for more coordination with CMS.

Incarcerated populations, on the other hand, are not guaranteed any cooling, even as summers become more brutal. This directly leads to an increase in deaths: 45% of U.S. detention facilities saw spikes in deaths on hazardous heat days from 1982 to 2020. Despite this lack of sufficient cooling being “cruel and unusual” punishment, there has been no public activity to date from the Department of Justice to secure cooling infrastructure for federal prisons or work with state prisons to expand cooling infrastructure. The National Institute of Corrections does recommend ASHRAE 55 Thermal Environmental Conditions for Human Occupancy to corrections institutions, though this standard needs to be updated to reflect our evolving understanding of extreme heat’s risks to human health.

Anticipate and Prevent Supply Chain Disruptions 

Hotter temperatures are changing the landscape of American and global food production. 70% of global agriculture is expected to be affected by heat stress by 2045. Recent heat waves have already killed crops and livestock en masse, leading to lower yields and even shortages for certain products, like olive oil, potatoes, coffee, rice, and fruits. Rising heat is also poised to reshape local and state economies that rely on their changing climatic capabilities to produce certain crops. Oranges, a $5 billion industry for Florida, are struggling in the heat, which stresses the trees and provides fertile ground for pathogens. As a result, Florida is facing its worst citrus yield since the Great Depression. A decrease in winter chill is another growing risk, as many perennial crops have adapted to certain amounts of accumulated winter chill to develop and bloom. Winter-time heat is shaking up plants’ biological clocks, decreasing quality and yield. Overall, extreme heat is impacting American household bottom lines in the short term and long term through heat-exacerbated earning losses and spiking food prices.

Ensuring ongoing access to critical commodity and specialty agricultural products in a future of higher temperatures is a national security priority. Resilience of products to extreme heat could be included as a future requirement in the Federal Supplier Climate Risks and Resilience Rule that governs Federal Acquisition Regulations. Further, FAS’ work scoping the federal landscape has shown there are few federal research and development programs, financial assistance opportunities, and incentives for heat resilience, and our interviewees concurred with that assessment. The U.S. Department of Agriculture (USDA) can prepare farmers for future climate risks and hotter temperatures, ensuring consistent food production and reducing the losses and needed economic pay-outs from the USDA through crop insurance and disaster assistance. The USDA can accelerate advances in biotechnology and genetic engineering to improve heat resilience of agricultural products while also encouraging practices like shade, effective water management, and soil regeneration that build system-wide resilience. As Congress continues to consider reauthorizations and appropriations for the Farm Bill, they should consider fully funding the Agriculture Advanced Research and Development Authority to advance resilient agriculture R&D while also increasing funding to the USDA Climate Hubs to support roll-out of heat resilient practices.

Connect Drought Resilience and Heat Resilience Strategies

Hotter winters have literal downstream consequences. Warming is shrinking the snowpack that feeds rivers, leading to further groundwater reliance and straining aquifers to the brink of complete collapse. Warmer temperatures also lead to more surface water evaporating, leaving less to seep through the ground to replenish overstressed aquifers. Rising temperatures also mean that plants need more water, as they evapotranspirate at greater rates to keep their internal temperatures in check. All of these factors compound the growing risk of drought facing American communities. Drought, now made worse by high heat conditions, accounts for a significant portion of annual agricultural losses. 80% of 2023 emergency disaster designations declared by the United States Department of Agriculture (USDA) were for drought and/or excessive heat. Secure access to water is an escalating catastrophe, and addressing it requires a national strategy that accounts for future hotter temperatures and how they will put strain on water accounts necessary to sustain agricultural production and human habitation.

Heat and dry weather/drought also combine to make prime conditions for megawildfires. The smoke then generated by these fires compounds the health impacts of extreme heat, with research showing that concurrent effects of heat and smoke drive up the number of hospitalizations and deaths. More funding from Congress is needed to improve wildfire forecasting and threat intelligence in the era of compounding hazards.

Reform the Benefit-Cost Analysis

Benefit-cost analysis (BCA) is a critical tool for guiding infrastructure investments, and yet it is not set up to account for the benefits of heat mitigation investments. When the focus of the BCA is mitigating property damage and loss of life, it will discount impacts that go beyond those damages, such as economic losses, learning losses, wage losses, and healthcare costs. Research will likely be needed to generate the pre-calculated benefits of heat mitigation infrastructure, such as avoiding heat illness, death, and wage losses and preventing widespread power failures (a growing risk). Further, strategies that enhance an equitable response, articulated in the recent update to the Office of Management and Budget’s Circular A-4, need to be quantified. This could include response efforts that protect the populations most vulnerable to extreme heat, such as checking in on heat-sensitive households identified by the CRE for Heat. Developing these metrics will take time, and should be done in partnership with agencies like the DOE, EPA, and CDC. Finally, FEMA’s BCA is often based on a single hazard, the one with the highest BCA ratio, making it more challenging to work on multi-hazard resilience. FEMA should develop BCA methods that account for an infrastructure investment’s contribution to community resilience against many hazards (as with resilience hubs).
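The difference between a single-hazard and a multi-hazard BCA can be illustrated with simple arithmetic. The sketch below is a minimal illustration, not FEMA’s actual methodology: the function name, the resilience hub example, and all dollar figures are hypothetical assumptions chosen only to show how counting benefits across hazards can change the ratio.

```python
def benefit_cost_ratio(benefits_by_hazard, cost):
    """Return total benefits across all hazards divided by project cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return sum(benefits_by_hazard.values()) / cost


# Hypothetical present-value benefits (in dollars) for one resilience hub.
# These numbers are illustrative assumptions, not published estimates.
benefits_by_hazard = {
    "flood": 600_000,
    "extreme_heat": 500_000,
    "power_outage": 300_000,
}
project_cost = 1_000_000

# Single-hazard approach: only the hazard with the largest benefit counts.
single_hazard_bcr = max(benefits_by_hazard.values()) / project_cost

# Multi-hazard approach: benefits from every hazard are summed.
multi_hazard_bcr = benefit_cost_ratio(benefits_by_hazard, project_cost)

print(f"Single-hazard BCR: {single_hazard_bcr:.2f}")
print(f"Multi-hazard BCR:  {multi_hazard_bcr:.2f}")
```

Under these assumed numbers, the project falls below a ratio of 1.0 when judged on one hazard alone but exceeds it once benefits across hazards are combined.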

Create the “Plan” for How the Federal Emergency Management Agency and Others Should Respond to an Extreme Heat Disaster

Extreme heat’s extended duration, from a few days to several months, poses a significant challenge to existing disaster policy’s focus on acute events that damage property. FEMA’s focus on infrastructure damage has been an insurmountable barrier to all past attempts to have extreme heat declared a disaster and to receive federal disaster assistance, even though, in theory, FEMA can reimburse state and local governments for any disaster response effort that exceeds local resources, including responses to heat waves. Our interviewees acknowledged that federal recognition of heat waves as disasters will only come with extending the definition of what a disaster is.

New governance models will need to be created for climate and health hazards like extreme heat, focusing on an adaptation-forward, people-centered disaster response approach given the outsized impact of heat hazards on human health and economic productivity. Such a shift will challenge the federal government’s existing authorities under national disaster law, the Stafford Act, which at this moment does not consider “human damages” beyond loss of life. As a result, current assessments do not capture how existing infrastructure fails to provide critical functions during heat hazard events, such as secure learning, secure workplaces, secure municipal operations, and secure healthcare delivery, and consequently strains or exceeds local resources to respond. Quantifying more of these damages would create an incentive to design responses that address current impacts and plan for and mitigate future impacts.

Finally, there are high-risk heat disasters for which we need to be running planning scenarios, specifically an extended power outage in a city under high-heat conditions. A power outage during the summer in Phoenix would send 800,000 people to the emergency room, which would very likely overwhelm local resources and those of all surrounding jurisdictions. A power outage during an extended heat wave should be an included planning scenario for emergency management exercises led by state and local governments. FEMA should produce a comprehensive list of everything a city needs to be prepared for a catastrophic power outage.

Spur Insurance and Financing Innovation

While insurance is the country’s largest industry, few insurance products and services exist in the U.S. to cover the losses from extreme heat. The U.S. Department of the Treasury recently acknowledged this lack of comprehensive insurance for extreme heat’s impacts in its report on how climate change worsens household finances. Heat insurance for individuals could manifest in a variety of ways: security from utility cost spikes during extreme weather events, real-estate assessment and scoring for future heat risk, “worker safety” coverage to protect wages during extremely hot days when it might be unsafe to work, protections for household items/resources lost due to an extended blackout or power outage, and full coverage for healthcare expenses caused by or exacerbated by heat waves. California is currently leading the country in thinking through the role of the insurance industry in mitigating extreme heat’s impacts, and federal stakeholders should watch it as a model to see what can be scaled and replicated across the nation.

Further, it is important that investments made today are resilient to the climate conditions of tomorrow. The Office of Management and Budget’s November 2023 memo on climate-smart infrastructure, currently being implemented, provides technical guidance on how federal financial assistance programs can and should invest in climate resilience. A yet unexplored financial lever for climate resilience identified in our interviews is federally-backed municipal bonds. Climate change is undermining this once stable investment, as cities and local governments struggle to pay back interest due to the rising costs of addressing hazards. The municipal bond market could price climate risk when deciding on interest payments, and give beneficial rates to jurisdictions that have done a full analysis of their risks and taken steps towards resilience.

Finally, there is a need to update the assessments of heat risk that are used to make insurance and financial decisions. Recent research by the DOE has found that the FEMA NRI property damage data appear to be deficient and underestimate damages when compared to published values for recent U.S. extreme temperature events. To start, FEMA should consider including metrics in its NRI that characterize the building stock (e.g., by adherence to certain building codes) and its thermal comfort levels (even with cooling devices) as well as thermal resilience.

Incorporate Future Climate Projections into Planning at All Levels

Recent research has shown that cities and counties are barreling toward temperature thresholds at which it would be dangerous to operate municipal services, affecting the operations of daily life. Yet little of this future risk is accounted for in the various planning activities (for public health, emergency preparedness, grid security, transportation, urban design, etc.) done by local and state governments. Our interviewees expressed that because many plans are based on historical and current risk data, there is little anticipation of the future impacts of hotter temperatures when making current planning choices.

One example stood out around nature-based solutions (NBS): while NBS has received over a billion dollars in federal funding and is promoted as an approach to mitigate extreme heat’s impacts, planners are not always considering whether the trees planted today will survive 20-30 years of warming. Reporting has shown that Southern Nevada is at risk of losing many of its shade trees due to inadequate species selection, as the trees that once thrived in this climate exceed their zones of heat tolerance.

Changes are being made to some federally-required planning processes to require assessment of future risk. FEMA’s National Mitigation Planning Program now requires state and local governments to plan for future risks caused by climate change, land use, and population change to receive emergency disaster funds and mitigation funding. While extreme heat is a noteworthy future risk, it is not explicitly required in the new guidelines. As of April 2023, only half of U.S. states had a section dedicated to extreme heat in their Hazard Mitigation Plans.

Climate.gov, operated by NOAA, was a recommended starting place for a library of future climate files that can be brought into planning processes and resilience analysis. Technical assistance and decision-making tools that support planners in making predictive analyses based on future extreme temperature conditions can help inform the effective design of resilient transportation systems, infrastructure investments, public health activities, and grids, and ensure accurate estimations of investment cost effectiveness over the measure lifetime.

Set Standards for Data Collection and Analysis

While official CDC-reported deaths from heat, approximately 1,670 in 2022, exceed those from any other natural hazard, experts widely agree this number is an undercount. True mortality is likely closer to 10,000 deaths a year from extreme heat under current climate conditions. Many factors compound this systematic undercount: hospitals often do not consider extreme heat in their hazard preparedness plans, there is a lack of awareness around ICD-10 coding for heat illness, and deaths caused or exacerbated by heat are often attributed to other causes. Retraining the healthcare workforce and modernizing death counting for climate change will take time, our interviewees acknowledged. Thus, decision makers need better data and surveillance systems now to address this growing public health crisis. Excess deaths analysis could provide a proxy data point for the true number of heat deaths, and has already been employed by California to assess the impact of past heat waves. The CDC has utilized excess death methods in tracking the COVID-19 pandemic, and could apply this analysis to “climate killers” like extreme heat to inform healthcare system planning ahead of Summer 2024, including through forecasting tools like HeatRisk. It will be critical to set a standard methodology in order to compare heat’s impacts in different communities across the United States. True mortality is also essential to enhancing the benefit-cost analysis for heat mitigation and resilience.
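At its simplest, an excess deaths estimate compares deaths observed during a heat event with the deaths expected for the same period based on prior years. The sketch below is a minimal illustration under that assumption; it is not the CDC’s or California’s methodology, the function name is hypothetical, and the counts are made-up values for a single county-week. Real analyses typically adjust for population change, seasonality, and other concurrent events.

```python
from statistics import mean


def excess_deaths(observed, baseline_counts):
    """Observed deaths minus the average for the same period in baseline years."""
    if not baseline_counts:
        raise ValueError("at least one baseline year is required")
    expected = mean(baseline_counts)
    return observed - expected


# Hypothetical all-cause death counts for the same July week in four prior years.
baseline_july_week = [100, 104, 96, 100]

# Hypothetical all-cause deaths during the heat wave week.
observed_heat_wave_week = 130

estimate = excess_deaths(observed_heat_wave_week, baseline_july_week)
print(f"Estimated excess deaths: {estimate}")
```

Because the estimate uses all-cause mortality, it does not depend on heat being recorded on a death certificate, which is why it can serve as a proxy where cause-of-death coding is incomplete.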

Our conversations also highlighted the data gaps that exist around counting worker injuries and deaths due to extreme heat. For work-related heat-health impacts, injuries or deaths are often only counted if there is a hospital admission that is a required report, heat-exacerbated injuries (e.g., falls) often are not counted as heat-related, and harms off the job (e.g., long-term kidney impacts) go unnoticed. Studies estimate that California alone saw 20,000 heat injuries a year, while the U.S. Department of Labor (DOL) reports only 3,400 injuries a year nationally. DOL could track how overall workplace injuries correlate with temperature to develop a methodology that would yield much more accurate numbers around true heat impacts.
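One simple starting point for relating overall workplace injuries to temperature is a correlation coefficient between daily temperature and daily injury counts. The sketch below computes a Pearson correlation by hand; it is an illustrative assumption about how such tracking could begin, not a DOL method, and the temperatures and injury counts are hypothetical. Correlation alone does not establish causation, so a real methodology would need to control for factors like workforce size and season.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical daily maximum temperatures (degrees F) and total reported
# workplace injuries (all causes) on those days.
daily_max_temp_f = [70, 80, 90, 100, 105]
daily_injuries = [10, 11, 14, 19, 23]

r = pearson_r(daily_max_temp_f, daily_injuries)
print(f"Correlation between temperature and injuries: {r:.2f}")
```

Using all injuries rather than only those coded as heat-related is the point of the exercise: it can surface heat-exacerbated incidents, such as falls, that current reporting misses.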

Finally, anticipating the full risks of heat due to factors like existing infrastructure, social vulnerability, and levels of community resilience, remains a work in progress. For example, FEMA’s National Risk Index (which informs environmental justice tools like the Climate and Economic Justice Screening Tool and the Community Disaster Resilience Zones program) has notable limitations due to its reliance on previous weather data and narrow focus on mortality reduction, leading to underestimates of damages when compared to published values for recent U.S. extreme temperature events. There is a big opportunity to develop a standard data set for extreme heat risks and vulnerabilities in current and future anticipated climate conditions. This data set can then produce high-quality and relevant tools for community decision making (like FEMA’s Flood Maps) and inform federal screening tools and funding decisions. 

Create Regulatory Oversight Infrastructure for Extreme Heat

There are only a few regulatory levers currently in place or in the regulatory pipeline to protect Americans from growing heat and build more heat-resilient communities. These include the temperature standards for senior living facilities set by CMS and OSHA’s upcoming heat standard. Regulations are needed in many more common settings: homes, schools and childcare facilities, transit, correctional facilities, and outdoor public spaces. Enforcement of the regulations will also need to expand, including better monitoring of temperatures outdoors and indoors. HUD, EPA, and NOAA should work to identify opportunities to expand indoor and outdoor air temperature monitoring, seeking additional funding from Congress where needed.

Future regulations for mitigating extreme heat exposure can be conceptualized in the following three ways: technology standards (the required presence of a cooling and/or thermal-regulating technology); behavioral guidelines and expectations (required actions to avert overexposure); and performance standards (requirements that heat exposure cannot cross a certain threshold). These potential regulations will need to be conceptualized, reviewed, and implemented by several federal agencies, as authority for different aspects of heat exposure is fragmented across the federal government. Some examples of regulatory levers identified through our interviews (and introduced in previous sections) include:

Conclusion

Extreme heat, both acute and chronic, is a growing threat to American livelihoods, affecting household incomes, students’ learning, worker safety, food security, and health and wellbeing. While the policy landscape for addressing heat is nascent, this report offers recommendations for near- and long-term solutions that policymakers can consider. Complementary to FAS’ Extreme Heat Policy Sprint, we hope this report can be a toolkit for potential realistic actions.

Heat Hazards and Migrant Rights: Protecting Agricultural Workers in a Changing Climate

KEY TAKEAWAYS

KEY FACTS


In 2008, Maria Isabel Vasquez Jimenez, a 17-year-old pregnant farmworker, tragically died from heatstroke while working in the vineyards of California. Despite laboring for more than nine hours in the sweltering heat, Maria was denied access to shade and adequate water breaks. Management never called 911 and instructed her fiancé to lie about the events. To this day, her death underscores the dire need for robust protections for those who endure extreme conditions to feed our nation.

This heartbreaking incident is not isolated. With the United States shattering over a thousand temperature records last year, the crisis of heat-related illnesses in the agricultural sector is intensifying. Rising global temperatures are making heat waves more frequent and severe, posing a significant threat to farmworkers who are essential to our food supply. While progress is being made towards comprehensive heat safety regulations, we must now focus on ensuring these protections are equitably implemented to safeguard all farmworkers from the intensifying threats of climate change, especially vulnerable groups like migrants. As individual stories shed light on the real-life tragedies of neglecting climate resilience, broader climate trends reveal a significant rise in these risks, affecting agricultural workers nationwide.

Climate change & agriculture

Rising Temperatures

Climate change poses significant challenges to global agricultural systems, threatening food security, livelihoods, and the overall sustainability of farming practices. Among the various climate-related hazards, rising temperatures stand out as a primary concern for agricultural productivity and worker health and safety. The Environmental Protection Agency (EPA) reports that the average temperature in the United States has increased by 1.8°F over the past century, with the most significant increases occurring in the last few decades. According to the Intergovernmental Panel on Climate Change, global average temperatures have been steadily increasing due to the accumulation of greenhouse gases in the atmosphere, primarily from human activities such as burning fossil fuels and deforestation. This warming trend is expected to continue, critically impacting agricultural operations worldwide. The Union of Concerned Scientists predicts that by mid-century, the average number of days with a heat index above 100°F in the United States will more than double, severely impacting agricultural productivity and worker health. As the climate continues to change, the direct threats to those who supply our food become increasingly severe, particularly for farmworkers exposed to the elements.

Threats to Farmworkers

In agriculture, rising temperatures worsen challenges like water scarcity, soil degradation, and pest infestations, and introduce new risks like heat stress for farmworkers. As temperatures rise, heatwaves become more frequent, intense, and prolonged, posing serious threats to the health and well-being of agricultural workers who perform physically demanding tasks outdoors. Heat stress can lead to heat-related illnesses such as heat exhaustion and heatstroke, which can be life-threatening if not properly managed. Prolonged exposure to high temperatures can impair cognitive function, reduce productivity, and increase the risk of accidents and injuries in the workplace. According to Public Citizen, from 2000 to 2010, as many as 2,000 workers died each year from heat-related causes in the United States, and farmworkers are 20 times more likely to die from heat-related illnesses than other workers.

Given the critical role of agricultural workers in food production and supply chains, protecting their health and safety in the face of escalating heat risks is critical. Comprehensive heat safety standards and regulations are essential to mitigate the adverse impacts of climate change on farmworkers and ensure the sustainability and resilience of agricultural operations. By implementing comprehensive heat safety measures such as heat acclimatization guidelines, shade access, and regular rest breaks, agricultural employers can minimize the risk of heat-related illnesses and injuries. Effective heat standard implementation requires collaboration among policymakers, industry stakeholders, and worker advocacy groups to address climate change challenges and protect agricultural workers. Beyond the direct effects of heat, farmworkers also face compounded environmental hazards that further jeopardize their health and safety.

Compounded Hazards

While the focus of this discussion is on heat safety regulations, it’s important to recognize that these regulations intersect with broader environmental and health challenges faced by agricultural workers. High temperatures often coincide with wildfire seasons, leading to increased exposure to wildfire smoke. This overlap amplifies health risks like respiratory and cardiovascular diseases, disproportionately affecting workers with vulnerable conditions. Effective protection against these compounded hazards requires coordination among policymakers and industry leaders. Comprehensive standards and holistic safety measures are crucial to mitigate the risks associated with heat and to address the broader spectrum of environmental pollutants. While environmental hazards are a significant concern, the specific vulnerabilities of migrant workers introduce additional layers of risk and complexity.

Challenges faced by migrant workers

Understanding the Vulnerabilities

Migrant agricultural workers face socioeconomic, legal, and environmental challenges that increase their vulnerability to heat hazards. Economically, many migrant workers endure low wages and lack access to adequate healthcare, which complicates their ability to cope with and recover from heat-related illnesses. A study by the National Center for Farmworker Health found that 85% of migrant workers earn less than the federal poverty level, making it difficult for them to access necessary medical care. Legally, the fragile status of many migrant workers, including those on temporary visas or without documentation, exacerbates their vulnerability. These workers often hesitate to report violations or seek help due to fear of retaliation, job loss, or deportation.

Harsh Working Conditions

Additionally, migrant workers frequently labor in conditions that provide minimal protection against the elements. Excessive heat exposure is compounded by inadequate access to water, shade, and breaks, making outdoor work particularly dangerous during heatwaves. Furthermore, many migrant workers return after work to substandard housing that lacks essential cooling or ventilation, preventing effective recovery from daily heat exposure and exacerbating dehydration and heat-related health risks. According to the National Center for Farmworker Health, about 40% of migrant farmworkers in the United States live in homes without air conditioning.

Barriers to Protection

Migrant workers face extensive and complex barriers to effective heat protection, which may prevent them from accessing crucial protections and resources. These barriers include:

Language Diversity. The migrant worker community is incredibly diverse, encompassing individuals from various cultural and linguistic backgrounds. In the U.S. agricultural sector, over 50% of workers report limited English proficiency. This diversity can make it significantly harder for workers to understand their rights and the safety measures available to them. Even when regulations and protections are in place, the communication of these policies often fails to reach non-English speaking workers effectively, leading to misunderstandings that can prevent them from advocating for their safety and well-being. The National Agricultural Workers Survey reports that 77% of farmworkers in the United States are foreign-born, with 68% primarily speaking Spanish, highlighting the language barriers that complicate effective communication of safety regulations.

Vulnerable Visas & Immigration Status. Visa statuses and undocumented immigration also play a critical role in the vulnerability of migrant workers. Workers holding temporary visas, such as H-2A visas, often face precarious employment conditions because these visas tie them to specific employers, limiting their ability to assert their rights without fear of retaliation. Undocumented workers are particularly susceptible to exploitation and abuse by employers who may use their immigration status as leverage. Fear of deportation and legal repercussions further discourages reporting workplace incidents, perpetuating a cycle of exploitation and vulnerability.

Farmworker Housing. Farmworker housing often lacks proper cooling or ventilation, increasing heat exposure risks during off-work hours. Many agricultural workers live in substandard housing characterized by overcrowding, poor insulation, and inadequate access to air conditioning or ventilation systems. Poor living conditions worsen heat-related illnesses, particularly during extreme weather. Limited access to cooling amenities after long hours of outdoor labor exacerbates heat stress and heightens the health risks associated with heat exposure.

Recognizing these challenges is only the first step; next, we must assess how current protections measure up and where they fall short in safeguarding these vulnerable populations.

Review of existing protections

Federal Efforts

Currently, there is no overarching federal mandate specifically addressing heat exposure, leaving significant gaps in worker protection, especially for vulnerable populations like migrant workers. However, the federal government has taken several critical steps to address heat safety in the interim. OSHA has moved beyond relying solely on the General Duty Clause, launching a National Emphasis Program that prioritizes inspections on high-heat days and increases outreach in vulnerable industries. The Biden administration’s Heat Hazard Alert in July 2023 further emphasized employers’ responsibilities, while the initiation of a federal heat standard through OSHA’s rulemaking process signals a commitment to sweeping, nationwide protections.

These efforts reflect progress, but they must evolve to address the unique challenges faced by workers, ensuring that no one is left behind in the implementation of heat safety measures. The true test of these regulations will be their ability to safeguard those most at risk, bridging gaps in protection and creating a more resilient workforce in the face of rising temperatures.

State-Level Protections

At the state level, the scenario is mixed, with states like California, Washington, and Oregon having implemented their own heat safety regulations, which provide a model for other states and potentially for federal standards. Oregon’s regulations, for instance, require employers to provide drinking water, access to shade, and adequate rest periods during high heat conditions. These measures are designed not just to respond to the immediate needs of workers but also to educate them on the risks of heat exposure and the importance of self-care in high temperatures. When Oregon implemented stricter heat safety standards, it saw a significant reduction in heat-related illnesses reported among agricultural workers. By requiring more frequent breaks, adequate hydration, and access to shade, Oregon’s regulations demonstrate how well-designed policies can decrease the incidence of heat stress and related medical emergencies. California has also taken a comprehensive approach with its Heat Illness Prevention Program, which extends protections to both outdoor and indoor workers, reflecting the broad scope of heat hazards. This program is noted for its requirements, including training programs that educate workers on preventing heat illness, emergency response strategies, and the necessity of acclimatization.

Legislative Challenges & Need for Unified Approach

Conversely, legislative actions in states like Florida and Texas represent a significant challenge to advancements in occupational heat safety. For example, Florida’s HB 433, recently signed into law, expressly prohibits local governments from enacting regulations that would mandate workplace protections against heat exposure. This legislation stalls progress and endangers workers by blocking local standards tailored to the state’s specific needs.

The contradiction between states pushing for more stringent protections and those opposing regulatory measures illustrates a fragmented approach that could undermine worker safety nationwide. Without a federal standard, the protection a worker receives is largely dependent on state policies, which may not adequately address the specific risks associated with heat exposure in increasingly hot climates. This patchwork of regulations underscores the importance of a unified federal standard that could provide consistent and enforceable protections across all states, ensuring that no worker, regardless of geographical location, is left vulnerable to the dangers of heat exposure.

With an understanding of the gaps in current heat safety regulations, the next crucial step is fostering effective stakeholder engagement to drive meaningful changes.

Engaging Stakeholders: Beyond Public Comment

While progress has been made in recognizing the need for heat safety regulations, we must now focus on ensuring equitable representation in the policy-making process. Traditional engagement methods have often fallen short in capturing the voices of those most impacted by these policies, particularly vulnerable groups like migrant agricultural workers. Regulatory agencies must rethink their strategies to include more direct and inclusive approaches, empowering workers to contribute meaningfully to policies that directly affect their safety and well-being.

Challenges in Traditional Engagement

The traditional approaches to stakeholder engagement, particularly in regulatory settings, often rely heavily on formal mechanisms like public comment periods. While these methods are structured to gather feedback, they frequently fall short of engaging those most impacted by the policies—namely, the workers themselves. Many workers, especially in labor-intensive sectors like agriculture, may not have the time, resources, or knowledge to participate in these processes. Relying on online submissions or weekday meetings during work hours can exclude many workers whose insights are crucial for shaping effective regulations. A survey conducted by the Migrant Clinicians Network found that fewer than 10% of migrant workers had participated in any form of public comment or feedback process related to workplace safety.

The complexities of these workers’ lives—ranging from language barriers to fear of retaliation—mean that conventional engagement strategies may not effectively reach or address their concerns. This gap highlights a critical need for regulatory bodies to rethink and expand their engagement strategies to include more direct and inclusive methods.

As we push for broader and more inclusive engagement, we must also consider systemic improvements that can solidify these efforts into lasting safety standards.

Looking Forward: Systemic Improvements & Community Collaboration

Protecting migrant workers from extreme heat requires systemic improvements and a coordinated approach to address gaps in current regulations and foster collaborative efforts among stakeholders. By combining the strengths of government agencies, employers, and community advocates, we can develop robust heat safety solutions that protect the well-being of vulnerable workers while supporting the productivity and resilience of the agricultural industry.

Systemic Changes Needed

To effectively protect migrant workers from the dangers of extreme heat, systemic changes are required. On the regulatory side, this includes boosting the human resources and funding available to agencies like OSHA to ensure they can effectively implement and enforce new heat safety standards. Building robust infrastructure for enforcement and consultation is crucial, as is ensuring these bodies can handle the demands of new regulatory programs. From the employer and industry perspective, federal support is essential. Incentives such as tax breaks or reimbursement programs similar to those provided under the Families First Coronavirus Response Act during the COVID-19 pandemic could motivate employers to adhere more strictly to safety standards, knowing they can recoup some costs associated with implementing safety measures like paid sick leave.

Fostering a Safe Reporting Culture

Creating a workplace that encourages safe and open communication is vital. Employers must be encouraged to establish non-retaliatory policies and to offer regular training sessions that educate workers about their rights and the importance of reporting safety violations. Reporting mechanisms should protect employee anonymity to reduce fear of retaliation. These practices can improve safety, while also enhancing worker retention and morale, contributing to a healthier workplace culture.

Role of Community & Grassroots Advocacy

Grassroots organizations and community advocates play a pivotal role in shaping and enforcing heat safety regulations. These groups often have direct insights into the needs and challenges of workers on the ground and can help tailor educational and enforcement strategies to the community context. Collaborations with these organizations can facilitate the delivery of multilingual training and legal assistance, ensuring that workers are well-informed about their rights and the safety measures in place to protect them. Additionally, these partnerships can help to monitor compliance and gather grassroots feedback on the efficacy of the regulatory measures. A notable example is the partnership between California Rural Legal Assistance and local farming communities to develop heat stress prevention training tailored to the languages and cultures of the workers. This program has improved knowledge and awareness of heat stress risks among workers, and has also empowered them to take proactive steps in managing their health during extreme conditions. Evaluations of this initiative show a marked improvement in both the adoption of safety practices and worker satisfaction, highlighting the importance of community-driven approaches in policy implementation.

To support these systemic changes, strategic investments are essential, not only to enhance regulatory capacity but to ensure the long-term health and productivity of the agricultural workforce.

The Power of Investment

Investing in heat safety offers strategic, far-reaching benefits for both workers and employers alike. By funding regulatory frameworks and workplace safety programs, organizations can effectively mitigate the impact of heat-related illnesses and injuries. Such investments can enhance regulatory agencies’ capacity to enforce standards while creating safer, more productive work environments that benefit businesses and employees. An investment approach to heat safety strengthens economic sustainability, worker well-being, and industry compliance.

Envisioning Enhanced Regulatory Capacity

In the pursuit of more effective heat safety regulations, one critical and often overlooked aspect is the role of increased investment in regulatory agencies like OSHA. Adding resources to these bodies is not merely a bureaucratic expansion but a potential lifesaver. Research consistently demonstrates that increased funding for regulatory enforcement can significantly enhance compliance and improve safety outcomes. This investment empowers agencies to provide greater education and outreach, conduct more inspections, and enforce compliance more effectively, which are essential for protecting workers from heat-related hazards. Enhancing the capacity of organizations like OSHA to enforce heat safety standards saves lives, while supporting economic efficiency and sustainability in labor-intensive industries. These investments ensure that safety regulations evolve from paper to practice, significantly impacting the lives of those they are designed to protect.

Economic Benefit

Economic analyses further support the notion that investing in worker safety is not just a cost but a strategic benefit. Studies show that every dollar spent on improving workplace safety yields substantial returns in reducing the costs of workplace injuries and deaths. For instance, implementing stringent heat safety measures not only reduces the incidence of heat-related illnesses but also cuts down on associated costs such as medical expenses, workers’ compensation, and lost workdays. This is particularly relevant in sectors like agriculture, where the physical nature of the work increases vulnerability to heat stress. The economic benefit for employers extends beyond direct cost savings. Maintaining a safe work environment enhances a company’s reputation, aids in employee retention, and increases productivity. Workers are more likely to stay with an employer they trust to prioritize their health and safety, which is crucial in industries facing labor shortages. A culture that encourages reporting and promptly addresses safety concerns can significantly reduce the risk of severe injuries and fatalities, further lowering potential liabilities and insurance costs.

Employer Benefit

A compelling example of the benefits of proactive safety measures is the Gold Star Grower Program in North Carolina. This program recognizes agricultural employers who provide housing that meets and exceeds the requirements of the Migrant Housing Act of North Carolina. This recognition serves as a badge of honor, indicating to potential employees that these employers value worker well-being. Reports suggest that workers actively seek out employers with this certification, preferring to work in environments where their health and safety are a priority. A preference like this can drive more growers to participate in safety programs, fostering a broader culture of safety and compliance within the industry.

Call for Collaborative Action

As the climate crisis continues, so does the threat of heat exposure to agricultural workers, posing grave risks to their health and to the core of our food supply systems. The necessity for comprehensive heat safety measures is now both urgent and undeniable. 

Governments at every level, employers across industries, community groups, and the workers themselves must unite to create resilient, practical strategies that prioritize safety and health. The cost of inaction is stark: it exceeds $100 billion annually, affecting the economy and leading to the irreplaceable loss of life and well-being.

We are at a critical juncture which demands a unified, strong response to heat hazards. By adopting systemic improvements and fostering a culture of collaboration and proactive communication, we have the opportunity to safeguard those most vulnerable to the impacts of climate threats.  

As we progress towards implementing rigorous heat safety regulations, our focus must now shift to ensuring these protections reach all workers equitably. Let’s mobilize, from grassroots movements to national policy reforms, to create inclusive implementation strategies that protect our most vulnerable workers, particularly migrants, and secure our collective future.

For resources on how you can support these critical efforts, please refer to the guides provided in Appendices A and B, which offer strategies for advocacy, community engagement, and policy development. Together, our collective efforts can protect our most vulnerable and build a resilient path forward in the face of climate change.


APPENDIX A: RESOURCE GUIDE

Further information and support on heat-related safety and worker rights

Resources for Migrant Workers

Resources for Employers

Resources for Policymakers


APPENDIX B: ACTION GUIDE

Support Legislative Changes

Participate in Advocacy Efforts

Engage in Policy Development

A Guide to Public Deliberation

Science is advancing at an unprecedented speed, and scientists are facing major ethical dilemmas daily. Unfortunately, the general public rarely gets opportunities to share their opinions and thoughts on these ethical challenges, moving us, as a society, towards a future that is not inclusive of most people’s ideas and beliefs. Scientists regularly call for public engagement opportunities to discuss cutting-edge research. In fact, “71% of scientists [associated with the American Association for the Advancement of Science (AAAS)] believe the public has either some or a lot of interest in their specialty area.” Sadly, scientists’ calls often go unnoticed and unanswered, as there continue to be inadequate mechanisms for these engagement opportunities to come to fruition.

To Deliberate or Not to Deliberate

Public deliberation, when performed well, can lead to more transparency, accountability to the public, and the emergence of ideas that would otherwise go unnoticed. Due to the direct involvement of participants from the public, decisions made through such initiatives can also be seen as more legitimate. On a societal level, public deliberation has been shown to encourage pluralism among participants.

Important as deliberation is, it is not always the best way to engage the public. Planning a public deliberation event (a citizens’ panel, for instance) takes a large amount of time and resources. Plus, incentivizing a random sample of citizens to participate (which is considered the gold standard of deliberation) is difficult. It’s therefore paramount to first assess whether the topic of focus is suitable for public deliberation.

To assess the appropriateness of a deliberation topic, consider the following criteria (inspired by criteria set forth by Stephanie Solomon and Julia Abelson and the Kettering Foundation):

  1. Does the issue involve conflicting public opinions? Issues that involve setting priorities in healthcare, for example, may benefit from public deliberation as there is no singular correct answer; deliberation may offer a clearer and more holistic view of what is best for a community, according to the community.
  2. Is the issue controversial? If so, deliberation can be a good tool as it brings many opinions into view and can foster pluralism as mentioned previously.
  3. Does the issue lack a clear-cut solution, and is it “intractable, ongoing, or systemic”?
  4. Do all available solutions have significant drawbacks?
  5. Does the community at large have an interest in the problem?
  6. Would the discussion of the issue benefit from a combination of expert and real-world experience and knowledge (what Solomon and Abelson call “hybrid” topics)? Certain issues may solely require technical knowledge but many issues would benefit from the views of the public as well.
  7. Are citizens and the government on the same page about the issue? If not, public deliberation can foster trust, but only if the initiative is done with the intention of taking the public’s conclusions into account.

Setting Goals

If it’s deemed that the topic is suitable for public deliberation, the next step is to set goals for the public deliberation initiative. Julia Abelson, Lead of the Public Engagement in Health Policy Project and Professor at McMaster University, has explained that one of the significant differentiating factors between successful and unsuccessful initiatives is thoughtful planning and organization — including setting clear goals and objectives organizers would like to meet by the end of deliberation. Having an end goal not only helps with planning but also allows for a realistic goal to be shared with deliberation participants. Setting unrealistic expectations as to what the deliberation process is meant to achieve — and subsequently not achieving those goals — will lead participants and citizens, in general, to lose trust in the deliberation process (and organizational body).

Is the goal of deliberation to bring new ideas into view and share those with relevant agencies (governmental or otherwise)? Is the goal instead to enact change in current policies? Is the goal to help shape new policies? The aforementioned Citizens’ Reference Panel on Health Technologies in Canada did not directly impact the government’s decisions, but served to make experts aware of a viewpoint they had not previously explored. This is in contrast to the typical “sit and listen” initiatives that don’t have as much of a capacity to encourage new ideas to emerge. In another instance, a citizens’ jury in Buckinghamshire, England was formed to discuss how to tackle back pain in the county. The Buckinghamshire Health Authority promised to implement the citizens’ recommendations (as was mandated by a charity that was supporting this public deliberation effort) — and they did.

Expanding on the idea of making promises and accountability, it’s important for the organizing body — which may or may not include a federal agency — to consider its role in implementing the conclusions of the deliberation. Promising to implement the conclusion of the deliberations can serve to invigorate discussion and make participants more engaged, knowing that their discussions can have a direct impact on future decisions. For instance, the British Columbia Biobank Deliberation involved a “commitment at the outset of the deliberation from the leaders of a proposed BC BioLibrary (now funded by the Michael Smith Foundation for Health Research) that the Bio-Library’s policy discussions would consider suggestions from this deliberation.” Researchers have suggested this may have contributed to participants’ interest in the deliberation event. Despite some examples of implementation following deliberation (such as the Buckinghamshire and Ontario examples), there continues to be a lack of adequate change based on the public’s recommendations. One other instance comes from NASA’s 2014 efforts to involve the public in the discussion around planetary defense (in the context of asteroids) through a participatory technology assessment (PTA). It seems that the PTA helped to spur the creation of NASA’s Planetary Defense Coordination Office. 

Furthermore, providing updates on implementation to participants, and the public at large, would provide another crucial aspect of accountability: “explanations and justifications.” However, these updates on their own would not fulfill an organization or agency’s duty to accountability as that requires an active dialogue with the public (which is precisely why implementing the conclusions of public deliberation initiatives is important).  

When to Deliberate: Agenda Setting for Citizens

As mentioned above, deliberation can happen at various points during the policymaking pipeline. It has become increasingly popular to include the public early on in the process, such as in an agenda-setting role. This allows the public not only to engage in discussions about a topic but to also set the priorities and frame how the discussions will move forward. As Naomi Scheinerman writes, “with proper agenda setting and precedent creation, the resulting […] questions would be more reflective of what the public is interested in discussing rather than of the companies, industries, and other stakeholder groups.”

A trailblazing model in citizen agenda-setting has been the Ostbelgien Model. The model involves both a permanent Citizens’ Council and ad hoc Citizens’ Panels. Though the members of the Citizens’ Council rotate (and are chosen randomly), one of the permanent roles of the Council is to select topics for the ad hoc Citizens’ Panels, with citizens having a direct hand in what issues their fellow citizens and government should tackle. Since its inception in 2019, the Citizens’ Council has asked Citizens’ Panels to tackle issues such as “how to improve the working conditions of healthcare workers” and “inclusive education.” 

Framing

One of the pillars of the success of public deliberation is a well-scoped question that is framed appropriately. Issues that are framed unfairly, meaning they place emphasis on a specific part of the issue while ignoring others, can lead to inaccurate results and a loss of trust between the public and the organizers. Though this depends on the goals of the deliberation, it’s often best for questions to be specific in their scope to allow for concrete results at the end of the deliberation initiative. For example, an online deliberation session in New York City aimed to assess the public’s views on who should be given priority access to COVID-19 vaccines. One of the questions asked participants to rank the order in which they think a pre-specified list of essential workers should get access to the vaccine. This allows for discussion while retaining a clear focus.

Another example comes from climate change. Climate change can be framed in many ways —  through an economic frame, a public health frame, a justice frame, and others. These various framings impact how the public reacts to the issue; in the case of the economic frame, it has led to “political divisiveness.” Focusing instead on the public health frame, for instance, led to greater agreement on policy decisions. Similarly, according to a 2023 policy paper from the Organisation for Economic Co-operation and Development (OECD), an issue like COVID-19 can be less polarizing if the framing used is about solutions to the pandemic rather than solely vaccines. Importantly, the organizers of the public deliberation initiative do not have sole control over the framing of the issue. Citizens often have a pre-existing “frame of thought.” This makes frames tricky yet essential in making it possible to appropriately and productively deliberate a topic. 

Framing is implicit in that participants in deliberation are not aware of it, making it all the more crucial to be wary of the framing. Thus, it becomes clear how seemingly unimportant factors, such as setting, also affect deliberation. According to Mauro Barisione, the framing of the setting includes:

Selecting a Type of Public Deliberation

Another factor that merits attention at this point is the type of public deliberation being undertaken. Though public deliberation has been referred to as one entity thus far, there are many different types, including, but not limited to, citizens’ juries, planning cells, consensus conferences, citizens’ assemblies, and deliberative polls. Below are some further details about various types of public deliberation (where a source is not included below, it was adapted from Smith & Setälä).

Citizens’ juries


Planning cells


Consensus conferences/citizens’ conferences


Citizens’ assemblies


Deliberative polls


A note on online deliberation

The COVID-19 pandemic forced many initiatives to shift to a fully online modality. This highlighted many of the opportunities as well as challenges that online deliberation presents. One consideration is accessibility, a double-edged sword when it comes to deliberation. Virtual deliberation alleviates the need for a venue or hotel accommodations — decreasing costs for organizers — and may allow participants to continue to go to work at the same time. However, difficulties with using technology and a lack of access to a device or an internet connection are drawbacks. Another opportunity presented by virtual deliberation is to provide more balanced viewpoints on the topic of deliberation. For instance, there are no geographical barriers as to the experts organizers can invite to speak at an event. 

A concern somewhat unique to online deliberation is data privacy and security. While this can also be an issue with in-person initiatives, many tools that participants are familiar with and may prefer to use do not have robust security.


A note on cost

While the cost of many deliberation initiatives is not publicly available, the available estimates range from $20,000 (citizens’ jury) to $95,000 (consensus conference) to $2.6 million (Europe-wide deliberative poll of 4300 people) to $5.5 million (citizens’ assembly). Note that these costs come from a range of time points and locations (though they have been adjusted for inflation) and only serve as rough estimates. A major contributor to these costs, particularly for longer deliberative initiatives, is hotel or venue costs as well as the reimbursement of participants. This reimbursement is costly but a part of the founding philosophy of many types of deliberation, including that of planning cells.


Selecting Participants

Many different approaches can be taken to selecting participants for deliberative forums. Unfortunately, there are inherent trade-offs in selecting a sampling method or approach. For instance, random sampling is more in line with the principle of “equal opportunity” and may promote “cognitive diversity”— the diversity of ideas, experiences, and approaches participants bring to the event — but is prone to creating deliberation groups that are not representative of the population at large. This is particularly true when the deliberative forum has few participants. This is why, depending on the type of deliberation event (and therefore number of participants chosen), a different type of sampling may be appropriate. 

Another approach is random-stratified sampling, where participants are randomly chosen and invited to participate in the deliberative event. There is often an unequal distribution among those who accept the invitation — for instance, individuals with higher socio-economic statuses may respond disproportionately more. In this case, a more representative sample may be chosen from those who responded. Quotas may also be set, such as ensuring that a certain number of female-identifying participants are included in a deliberative event. For this method, the organizers must decide on groups of individuals who are primarily affected by the topic being discussed, as well as groups often excluded from such deliberations. A deliberative forum on immigration, for instance, may call for the presence of a participant who is an immigrant to ensure polarization does not take place. In certain instances, purposive sampling — where individuals from groups whose views are specifically being sought are purposefully chosen — may also be appropriate. Furthermore, some researchers suggest including a “critical mass” of individuals from typically underserved groups. This can serve to make participants more comfortable in speaking up, ensure that the diversity of discussions is retained when participants are broken up into smaller groups (in certain forms of public deliberation), and provide a step in avoiding tokenism.
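The quota-based selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method drawn from any of the deliberation models discussed here: the function name `stratified_panel`, the `income` field, and the respondent pool are all hypothetical. It shows how a pool skewed toward one group can still yield a balanced panel when seats are reserved per group and filled at random within each group.

```python
import random
from collections import defaultdict

def stratified_panel(respondents, stratum_key, quotas, seed=None):
    """Fill a fixed number of seats per stratum by random draw.

    respondents: list of dicts for people who accepted the invitation
    stratum_key: dict key that defines the strata (hypothetical field name)
    quotas: mapping of stratum value -> number of seats reserved for it
    """
    rng = random.Random(seed)

    # Group respondents by stratum.
    by_stratum = defaultdict(list)
    for person in respondents:
        by_stratum[person[stratum_key]].append(person)

    panel = []
    for stratum, seats in quotas.items():
        pool = by_stratum.get(stratum, [])
        if len(pool) < seats:
            raise ValueError(f"not enough respondents in stratum {stratum!r}")
        # Random draw without replacement inside the stratum.
        panel.extend(rng.sample(pool, seats))
    return panel

# Hypothetical respondent pool: 70 higher-income and 30 lower-income responders.
respondents = (
    [{"id": i, "income": "higher"} for i in range(70)]
    + [{"id": i, "income": "lower"} for i in range(70, 100)]
)

# Reserve equal seats for both groups despite the skewed response.
panel = stratified_panel(respondents, "income", {"higher": 6, "lower": 6}, seed=1)
print(len(panel))
```

In practice organizers would stratify on several characteristics at once, and the choice of strata and quotas is itself a judgment call of the kind discussed above.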

Furthermore, there are newer methods of selecting participants that combine both random and stratified sampling, namely algorithms that try to maximize both representation and equal opportunity of participation. One instance is the LEXIMIN algorithm which “choose[s] representative panels while selecting individuals with probabilities as close to equal as mathematically possible.” This algorithm is open-access and can be used at panelot.org.

Aside from considerations for selecting participants, it’s important to consider the selected individuals’ ability and willingness to participate. Several factors can dissuade selected individuals from taking part, including but not limited to, the cost of missing work, the cost of childcare, transportation costs, and lack of trust in the organizing body or agency. Prohibitive costs are addressed by several of the deliberation models discussed in the “Selecting a Type of Public Deliberation” section. These models strongly suggest stipends which, at minimum, cover incidental expenses. A lack of trust is a particularly important issue to address as it can hinder the organizer’s ability to reach individuals typically left out of policymaking discussions. One approach to addressing this once again brings us to making — and critically, keeping — promises regarding the implementation of the conclusions of participants. Framing (as discussed in an earlier section) can also contribute to building trust, though, importantly, this is not a gap that can be bridged overnight. A more extensive discussion on inclusion in public deliberation forums can be found here.

Bringing On Experts & Creating Materials

Prior to selecting the group who will participate in the public deliberation activity, steps need to be taken to organize which experts will be part of the event and create the informational material that will be provided to participants before deliberations begin. 

Here, efforts must be made to ensure sufficient and balanced information is presented without creating a framing event where participants enter discussions with a biased perspective. It has been found that participants readily integrate the facts and opinions presented by experts/witnesses prior to deliberation and critically engage with their points. A deliberative engagement initiative in British Columbia, Canada about biobanking brought on a variety of experts and stakeholders to present to participants. To ensure fairness, presenters were “given specific topics, limited presentation times, and asked to use terms as defined in the information booklet” that was previously provided. A unique component included in this initiative was the ability for participants to ask presenters questions in between the two deliberative session weekends, which were two weeks apart, through a website. 

In addition, participants were provided with booklets and readings. In the case of the British Columbia initiative, to create booklets and background materials, a literature review was performed. Once more, the materials should provide a balance of opinions. They should include the most important facts relevant to the question at hand, some of the most common/salient approaches and points with regards to the question, and the weaknesses of each approach/point (Mauro Barisione). It is also best to keep materials succinct, with some deliberative initiatives keeping their materials to one page long.

Though the traditional approach is to have experts present prior to deliberation, other methods have also been used. For instance, a Colorado deliberation initiative focused on future water supply used an “on tap but not on top” expert approach. Rather than call experts to present information, they instead provided one-page information sheets, followed directly by deliberation. Experts were present during the deliberation session. When prompted by a participant, a facilitator would ask an expert to briefly join the group to answer the participant’s question. The approach was largely successful, though one “rogue expert” frequently interjected in a group’s discussion, providing his own opinions. One limiting factor to this approach is time; the deliberative sessions mentioned above were two hours long. But many other forms of deliberation are significantly longer, making coordinating with experts for long durations of time difficult. Despite these challenges, this approach provides an interesting way of integrating experts into the deliberation process so their expertise is best used and the participants’ questions are best answered as they arise.

Facilitation

A good facilitator or moderator is critical to the deliberation process. As explained by Kara N. Dillard, moderators set the ground rules for the discussion and prevent any one participant from dominating the session; this is called presentation. It has been found that clearly setting expectations for the discussion can lead to greater deliberative functioning — which, for our purposes, includes the exchange of ideas/reasons, equality, and freedom to speak and be heard — according to participants. Moderators also guide the discussion in two main ways: asking questions that challenge what participants have already discussed (elicitation); and connecting ideas that were previously brought up to new topics and “play[ing] devil’s advocate” to bring forth new ideas (interpretation). At the end of the session, moderators also help participants produce conclusions by asking what areas of consensus and contention were present throughout the discussion.

Moderators can take multiple approaches to facilitating, with one framework proposed by Kara N. Dillard separating moderators into three groups: passive, moderate, and involved. Passive moderators take a “backseat” approach to moderating. They often describe their role to participants as only being there to prevent a participant from dominating the conversation, rather than actively leading it. This has led to unfocused discussions and unclear conclusions. Participants often jumped around and went off-topic. Though this passive approach may work in some instances, a moderate or involved approach often leads to better deliberation.

Involved facilitators actively lead the discussion by asking questions that challenge participants to think in new ways, sometimes acting as a “quasi-participant.” In line with this, these moderators often play devil’s advocate to move the discussion in new, albeit related, directions. These moderators ask follow-up questions and “editorialize” to help participants flesh out their ideas together and aim to pinpoint points of contention so participants can further discuss them. If participants begin to veer off-topic, involved moderators will move the group back into a more focused direction while also connecting this new topic to the main question, allowing for new thoughts to emerge. These moderators take the time to sum up the main points brought up by participants after each point so conclusions become clear. Once more, this approach may not work in all instances but often leads to deeper conversations and more focused conclusions.

As implied by the name, moderate facilitators are somewhere in between passive and involved facilitators. These moderators ask questions to guide the discussion, but don’t often challenge the participants and let them take the wheel. These moderators use the elicitation strategy frequently, an important difference between moderate and passive moderators.

Due to the skills needed to facilitate a deliberation event well, organizers or government agencies looking to organize these events may require would-be facilitators to undergo brief training.

What Comes Next

After deliberation has taken place, the next step is to write a report summarizing the conclusions of the deliberative forum. As we have seen several times with other topics, there are multiple approaches to this. One approach is to leave the report writing to the facilitators, organizers, or researchers who use their own takeaways from the deliberation (in the case of facilitators) or summarize based on recordings or transcripts (in the case of organizers or researchers). However, this method introduces bias into the process and doesn’t allow participants to be directly involved in creating conclusions or next steps.

An alternative is to allocate time towards coming up with conclusions together with participants both throughout and at the end of the deliberative session. Recall that involved facilitators frequently summarize the conclusions of the group throughout the deliberation, making this final task both more efficient and more participant-led. Participants can directly and immediately add on to or push back against the facilitator’s summary. As a guideline, Public Agenda, an organization conducting public engagement research, divides the summary into the following sections: areas of agreement, areas of disagreement, questions requiring further research, and high-priority action steps.

ALI Task Force Findings to Improve Education R&D

The Alliance for Learning Innovation (ALI) coalition, which includes the Federation of American Scientists, EdCounsel, and InnovateEdu, today celebrates the release of three task force briefs aimed at enhancing education research and development (“ed R&D”). With pressing issues such as declining literacy and math scores, chronic absenteeism, and the rise of technologies like AI, a strong ed R&D infrastructure is vital. In 2023, ALI convened three task forces to recommend ways to bolster ed R&D. The task forces focused on state and local ed R&D infrastructure, inclusive ed R&D, and the critical role of Historically Black Colleges and Universities (HBCUs), Minority-Serving Institutions (MSIs), and Tribal Colleges and Universities (TCUs) in this ecosystem.

State and Local Education R&D Infrastructure

Supporting R&D at the local level encourages an environment of continuous learning, accelerating improvements to educational methods based on new evidence and pioneering research. Therefore, given that over 90% of K-12 education funding comes from state and local sources, the ALI task force recommends that capacity-building, vision alignment, and investment in state and local education agencies (SEAs and LEAs) be prioritized. Preparing these entities to leverage R&D resources within their specific locales, in rural and urban contexts, will enable the infrastructure to best meet the unique needs of communities and students across the country. Additionally, supporting human capacity and development, modernizing data systems, and strengthening collaborative partnerships and fellowships across research institutions and key stakeholders in the ecosystem will set the stage for more context-specific and effective ed R&D infrastructure at the state and local levels.

Inclusive Education R&D

Traditional education R&D is often dominated by privileged institutions and individuals with outsized access to capital and opportunities, sidelining the needs and perspectives of historically marginalized communities. To address this imbalance, intentional efforts are needed to create a more inclusive R&D ecosystem. The task force recommends that government actors implement multidimensional measures of progress and simplify application processes for R&D funding. Continuing dialogue on equity and inclusion will create space for identifying possible biases in approaches and processes. In sum, inclusion is imperative to achieving greater equity in education and supporting all learners of diverse backgrounds and communities.

The Role of HBCUs, MSIs, & TCUs in Education R&D

Achieving collaborative infrastructure and inclusion in ed R&D requires the strong participation of Historically Black Colleges and Universities (HBCUs), Minority-Serving Institutions (MSIs), and Tribal Colleges and Universities (TCUs). An equitable education R&D ecosystem must focus on the representation of these institutions and diverse student populations in research topics, grants, and funding to support learners from all backgrounds, particularly those of disadvantaged circumstances. Actionable steps include establishing diverse peer review panels, incentivizing grant proposals from minority-serving institutions, and creating specialized scholar programs. Additionally, programs should explicitly outline resource accessibility, leadership dynamics, funder relationships, grant processes, and inclusive language to dismantle structural inequalities and make the invisible visible.

Conclusion

Recommendations from the ALI task forces propose that sufficient funding, inclusivity, and diverse representation of higher education institutions are strong first steps in a path toward a more equitable and effective education system. The education R&D ecosystem must be a learning-oriented network committed to the principles of innovation that the system itself strives to promote across best practices in education and learning.

K-12 STEM Education For the Future Workforce: A Wish List for the Next Five Year Plan

This report was prepared in partnership with the Alliance for Learning Innovation (ALI), to advocate for building a better research and development (R&D) infrastructure in education. The Federation of American Scientists believes that the evolution of STEM education is necessary to prepare today’s students for tomorrow’s in-demand scientific and technological careers, and that it is also a matter of national security.

American STEM Education in Context

“This country is in the midst of a STEM and data literacy crisis,” opined Elena Gerstmann and Laura Albert in a recent piece for The Hill. Their sentiment represents a widely held concern that America’s global leadership in scientific and technological innovation, anchored in educational excellence, is being relinquished, thereby jeopardizing our economy and national security. Their message recycles a 65-year-old warning issued to U.S. policy makers, educators, and employers when the USSR seemingly eclipsed our innovation pace with the launch of Sputnik.

Life magazine devoted their March 1958 edition to a scathing comparison of the playful approach to STEM education in U.S. schools versus the no-nonsense rigor of Russian classrooms. The issue’s theme, “Crisis in Education” was summed up soberly: “The outcomes of the arms race will depend eventually on our schools and those of the Russians.” America answered the bell and came out swinging. Under President Eisenhower, the National Aeronautics and Space Administration (NASA) and the Defense Advanced Research Projects Agency (DARPA) were both established in 1958, as was the National Defense Education Act that channeled billions of dollars into K-12 and collegiate STEM education. By innumerable metrics (the Apollo program, the internet, GPS, and manufacturing dominance, all fueled by an internationally envied higher education system), the United States reclaimed preeminence in STEM innovation.

LIFE March 24, 1958

Over the next four decades tectonic shifts in demographics, economics, and politics rearranged continental competition such that complacent U.S. education systems were once again called on the carpet. In 2001, shortly before terrorists struck the World Trade Center and Pentagon, a U.S. Senate report on homeland vulnerability echoed that of Life magazine decades prior: “The inadequacies of our systems of research and education pose a greater threat to U.S. national security over the next quarter century than any potential conventional war that we might imagine.” The painfully prescient study, a product of the Hart-Rudman Commission on National Security/21st Century, identified the advancement of information technology, bioscience, energy production, and space science, all overlain by economic and geopolitical destabilization, as the nation’s greatest challenge and our new Sputnik. The Commission called on reformed education systems to quadruple the number of scientists and engineers and to dramatically increase the number and skills of science and mathematics teachers. As in 1958, leaders responded boldly, creating the Office of Homeland Security in 2001 (the precursor to the Department of Homeland Security, established in 2002), and planting the seeds for the 2007 America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science (COMPETES) Act.

Funding for research and development across federal agencies significantly increased over the decade, including a budget boost for the National Science Foundation’s grant programs supporting emergent scholars (Faculty Early Career Development Program, or CAREER), the research capacities of targeted jurisdictions (Established Program to Stimulate Competitive Research, or EPSCoR), Graduate Research Fellowships (GRF), the Robert Noyce Teacher Scholarships, the Advanced Technological Education (ATE) program, and others designed to bolster diverse talent pipelines to STEM careers. Despite increases in the number of students studying science and engineering in the U.S., there is still a significant gap in diverse representation and equitable access to opportunities in the STEM field; ensuring greater inclusion and diversity in the American science and engineering landscape is essential to engaging the “missing millions,” or persistently underrepresented minority groups and women, in the nation’s STEM workforce and education programs.

Nearly a quarter century later, America is once again in a STEM talent crisis. The solutions of Hart-Rudman and of the Eisenhower era need an update. This latest Sputnik moment, unlike the space race that motivated the National Defense Education Act, and the terrorism that spawned Homeland Security, is more pervasive and profound, permeating every aspect of our lives: artificial intelligence and machine learning, CRISPR (clustered regularly interspaced short palindromic repeats), quantum computing, 6G and 7G communications, semiconductors, hydrogen and other energy sources, lithium and other ionic energy storage, robotics, big data, blockchain, biopharmaceuticals, and other emergent technologies.

To relinquish the lead in these arenas would put the U.S. economy, national security, and social fabric in the hands of other nations. Our new USSR is a roulette wheel of friends and foes vying for STEM supremacy including Singapore, Japan, China, Germany, the UK, Taiwan, Saudi Arabia, India, South Korea and many more. Not unlike the education crises that came to a head in 1958 and in 2001, our educational Achilles heel is a lack of exposure to and under-preparedness for STEM career pursuit for the majority of diverse young Americans. Further, the U.S. Bureau of Labor Statistics projects that STEM career opportunities will grow 10.8% by 2032, more than four times faster than non-STEM occupations.

What the United States has going for it in 2024 (and was comparatively lacking in the 1950s and the early 2000s) are STEM-rich local schools, communities, and states. Powered by investments of federal agencies (e.g., Smithsonian, NSF, NASA, DOL, ED and others), state governments (governors in Massachusetts, Iowa, Alabama, for example), nonprofits (Project Lead The Way and the Teaching Institute for Excellence in STEM for example), and industries (Regeneron, Collins Aerospace, John Deere, Google, etc.), STEM is now seen as an imperative field by most Americans.  

Today’s STEM education landscape presents significant opportunities and challenges. Existing models of excellence demonstrate readiness to scale. To focus on what works and to channel resources in the direction of broader impacts for maximal benefit is to answer the call of our omnipresent 2024 Sputnik.

The Current State: Future STEM Workforce Cultivation

At its root, STEM education is about workforce cultivation for high-demand and high-skill occupations of fundamental importance to American economic vitality and national security. In the ideal state, STEM education also prepares all learners to be critical thinkers who make evidence-based decisions by equipping them with analytical, computational, and scientific ways of knowing. STEM students should learn effective collaboration and problem-solving skills with an interdisciplinary approach, and feel prepared to apply STEM skills and knowledge to everyday life as voters, consumers, parents, and citizens.1

Target Audiences and Service Providers 

The early childhood education community (pre-K-grade 3), both in school and out-of-school (at informal learning centers), has emerged over the last decade as a prime target for boosting STEM education as research findings accumulate around the importance of early exposure to and comfort with STEM concepts and processes. Popular providers of kits and activities, curricula, software platforms, and professional development for educators include Hand2Mind (Numberblocks), Robo Wunderkind, StoryTimeSTEM (Dragonland), NewBoCo (Tiny Techies), BirdBrain Tech (Finch robot), FIRST Lego League (Discover), Museum of Science Boston (Wee Engineer), Iowa Regents’ Center for Early Developmental Education (Light & Shadow), and Mind Research (Spatial-Temporal Math).  

The elementary to middle school level of STEM education options both in and out of school enjoys the richest menu of STEM programming on the market, reflecting stronger curricular freedom to integrate content compared to high schools. Popular STEM programs include Blackbird Code, Derivita Math, FUSE Studio, Positive Physics, Micro:bit, Nepris (now Pathful), Project Lead The Way (Launch and Gateway), FIRST Tech Challenge, Code.org (CS Discoveries), Bootstrap Data Science, and many more.    

The secondary education STEM landscape differs from pre-K-8 in a significant way: although discrete STEM activities and programs are plentiful for integration into secondary science, mathematics, and other classes, the adoption of packaged courses or after-school enrichment opportunities is more common. Project Lead The Way and Code.org offer an array of stand-alone elective STEM courses2, as do local community colleges and universities. Nonprofits and industry sources offer STEM enrichment programs such as the Society of Women Engineers’ SWEnext Leadership Academy, Google’s CodeNext, the Society of Hispanic Professional Engineers’ Virtual STEM Labs, and Girls Who Code’s Summer Immersion. Finally, a number of federal, state, nonprofit and business organizations conduct future workforce programs for targeted students including the federal TRIO program, Advancement Via Individual Determination (AVID), Jobs for America’s Graduates (JAG), and Jobs For the Future (JFF). 

Investment in STEM Education

A modestly conservative estimate of the total American investment in STEM education annually is $12 billion, nearly the equivalent of the entire budget of the National Science Foundation or the Environmental Protection Agency. 

For fiscal year 2023 the White House budgeted $4.0424 billion for STEM education across the 16 agencies that make up the Subcommittee on Federal Coordination in STEM Education (FC-STEM). Total nonprofit and philanthropic investments are more elusive: there are many funders, their dollars often overlap with state or local government sources (grants, for example), and definitions of STEM investment vary wildly. That said, U.S. charitable giving to the education sector totaled $64 billion in 2019. On the reasonable assumption that two percent made its way to STEM education, philanthropy contributes over $1 billion to the overall funding pie. Business and industry in the United States contribute well over $5 billion annually, a conservative estimated proportion of total annual STEM education market share among ten nations, according to a recent study. K-12 schools spend well over $1 billion on STEM, a small fraction of the $870 billion total spent on K-12 across the U.S. The same is likely true of America’s annual $700 billion higher education expenditure, with at least $1 billion going to STEM. Elusive as definitive figures can be in this space, a glaring reality is that funds are streaming into STEM education at a level where measurable results should be expected. Are resources being distributed for maximal impact? Are measures capturing that impact? Is it enough money?
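As a rough check on the $12 billion figure, the components named above can be added up directly. The sketch below uses only the numbers in this section and treats each "well over" or "minimally" figure as a floor, which is an assumption made here for illustration; the variable names are likewise illustrative.

```python
# All figures in billions of U.S. dollars, taken from the paragraph above.
federal = 4.0424            # FY2023 budget across the FC-STEM agencies
philanthropy = 0.02 * 64    # assumed 2% of $64B in charitable giving to education
industry = 5.0              # "well over $5 billion", treated as a floor
k12_schools = 1.0           # "well over $1 billion", treated as a floor
higher_ed = 1.0             # at least $1 billion, treated as a floor

total = federal + philanthropy + industry + k12_schools + higher_ed
print(f"Estimated floor: ${total:.1f} billion")  # prints "Estimated floor: $12.3 billion"
```

Because several inputs are lower bounds, the sum is best read as a minimum that is consistent with the roughly $12 billion estimate rather than as a precise total.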

There are approximately 55.4 million K-12 students across the nation. At $12 billion per year on STEM, that comes to about $217 worth of STEM education annually per young American. Is that enough to move the needle? The answer is a qualified “yes” based on Iowa’s experience. The state launched a legislatively funded STEM education program in 2012, investing on average about $4.2 million annually to provide enrichment opportunities for about one-fifth of all K-12 students, or 100,000 per year. To date, about 1.2 million youth have been served through a total investment of about $50 million. That calculates to $42 per student. The result? Among participants: increased standardized test scores in math and science; increased interest in STEM study and careers; a near doubling of post-secondary STEM majors at community colleges and universities. Thus, from Iowa’s experience, the amount of funding toward American STEM education is adequate to expect systemic gains. The qualifier is that Iowa funds flow toward increased equity (most needy are top priority), school-work alignment (career linked curriculum, professional development), and proof of effectiveness (rigorously vetted and carefully monitored programs). Variance in these three factors can separate ambitions from realities.
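The per-student figures above follow from simple division, sketched here in Python. The function name is illustrative, and the inputs are the rounded totals quoted in the text. Note that the national figure is per student per year, while the Iowa figure divides cumulative investment by cumulative participants.

```python
def per_student(total_dollars: float, students: float) -> float:
    """Return dollars spent per student."""
    return total_dollars / students

# National: about $12 billion per year across 55.4 million K-12 students.
national = per_student(12e9, 55.4e6)

# Iowa: about $50 million in total across 1.2 million youth served to date.
iowa = per_student(50e6, 1.2e6)

print(f"National: ${national:.0f} per student per year")
print(f"Iowa: ${iowa:.0f} per student served")
```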

Ambitions vs. Realities

The federal STEM education strategic plan Charting a Course for Success: America’s Strategy for STEM Education, identified three consensus goals for U.S. STEM education: a strong STEM literacy foundation for all Americans; increased diversity, equity, and inclusion in STEM study and work; and preparation of the STEM workforce of the future. Three challenges lie between those goals and reality.

Elusive equity. The provision of quality STEM education opportunities to Americans most in need is universally embraced yet difficult to achieve at the program level. Unequal funding of school STEM programs across urban, rural, and suburban public and private school districts equates to less experienced educators and diminished material resources (laboratories, computers, transportation to enrichment experiences) in socioeconomically disadvantaged communities. The challenge is then compounded by the lack of role models to inspire and support youth of underserved subpopulations by race, ability, ethnicity, gender, and geography. Bias, whether implicit or explicit, fuels stereotype threat and identity doubt for too many individuals in schools, colleges, and workplaces, countering diversity and equity efforts.

School-work misalignment. For most learners, the school experience can seem quite different from the higher education which follows, and the work and life experiences beyond. Employer and learner polls unearth misalignment in priorities: employers value in new hires skills such as relationship building, dealing with complexity and ambiguity, balancing opposing views, collaboration, co-creativity, and cultural sensitivity, in addition to expectations of work-related experiences. Schools typically proclaim missions like “Educating each student to be a lifelong learner and a caring, responsible citizen,” omitting the importance of employability. Learners feel that school taught them time management, academic knowledge, and analytical skills, while experiential learning remains limited.

Elusive proof. Evidence of effect can be vexingly evasive. The 2022 progress report of the federal STEM plan clarified the difficulty in verifying reach to those most in need: the identification of participants in STEM programs can be restricted for privacy/legality reasons. The gathering of racial, ethnic, and demographic data on STEM participants may often be unreliable given self-reported or observational identifications as well as the fleeting, often anonymous encounters typical of “STEM Night” or informal experiences at science centers, zoos, and museums.  

Participant profiles aside, variability in program assessments – design and objectives – makes meaningful meta-analysis challenging, which creates difficulties in scaling promising STEM programs. “We recommend that states and programs prioritize research and evaluation using a common framework, common language, and common tools,” advised a group of evaluators recently.

Exemplars 

Plentiful success stories exist at the local, regional, and national levels. The following six exemplars are each funded in whole or in part by federal and/or state grants. The first examples are local education systems (one in-school, one out-of-school) masterfully aligning learning experiences to career preparation. The second pair of examples profile a regional out-of-school STEM program powerfully documenting effects on participants, and an in-school enrichment course demonstrating success. And the final pair of examples are a nationwide equity program successfully preparing STEM educators to effectively serve students of diversity, and an exciting consortium effort aimed at refocusing the entire educational enterprise on skills that matter most.

1.a. School-work alignment at the local level

The Barrow Community School District (BCSD) in Georgia is strongly committed to work-based learning (WBL). All 15,000 students are required to take a sequence of exploratory STEM career classes beginning in ninth grade. Fifteen career pathways are available ranging from computing to health, manufacturing to engineering. It all culminates in an optional senior year internship serving 400 students annually. Interns earn dual-enrollment credits in partnership with local colleges and are paid by the employer host. Interns spend 7.5 to 15 hours per week at work experiences in a hospital, on a construction site, or in a production plant. The district employs a full-time WBL coordinator to oversee, administer, and evaluate, as well as to cultivate community employer partners. Teachers are expected to spend one week in an industry externship every three to five years. The BCSD commitment to a school experience aligned to future careers is something that every student in any district ought to be able to experience.

1.b. Diverse workforce of the future – local-to-global level

The World Smarts STEM Challenge is a community-based, after-school, real-world problem-solving experience for student workforce development. Funded by a 2021 National Science Foundation ITEST (Innovative Technology Experiences for Students and Teachers) grant in partnership with North Carolina State University, students in the Washington D.C. area are assigned bi-national groups (arranged through a partnership with the International Research and Exchanges Board) to collaborate in solving local/global STEM issues via virtual communications. Groups are mentored by industry professionals. In the process, students develop skills in innovation, investigation, problem-solving, and global citizenship for careers in STEM. Participant diversity is a primary objective. Learners of underrepresented backgrounds, including Black, Hispanic, economically disadvantaged, and female students, are actively recruited from local schools. Educator-facilitators are treated to professional development opportunities to build mentorship skills that support students. The end-product is a World Smarts STEM Activation Kit for implementing the model elsewhere.

2.a. Proof of effect at the regional level out-of-school

NE STEM 4U is an after-school program serving Omaha, Nebraska regional elementary school youth. Programs are hands-on problem-based challenges relevant to children. The staff were interested in the effect of their activities on the excitement, curiosity, and STEM concept gains of participants. The instrument they chose to use is the Dimensions of Success (DoS) observational tool of the P.E.A.R. Institute (Program in Education, Afterschool & Resiliency). The DoS is conducted by a certified administrator who observes and rates four groups of criteria: the learning environment itself, level of engagement in the activity, STEM knowledge and skills gained, and relevancy. Through multiple cohorts over two years, the DoS findings validated the learning approach at NE STEM 4U across dimensions, though with natural variations in positive effect. The upshot is not only that this after-school model is readily replicable, but that the DoS observation tool is a thoroughly vetted, powerful, and readily available instrument that could become a “common tool” in the STEM education program evaluation community.  

2.b. Proof of effect at the regional school level

From a modest New York origin in 1997, Project Lead The Way (PLTW) has blossomed into a nationwide tour de force in STEM education, funded by the Kern Foundation, Chevron, and other philanthropies. PLTW is adopted at the community school level, where trained educators integrate units at the pre-K-5 and middle school levels (Launch and Gateway, respectively) or offer courses at the secondary level (Algebra, Computer Science, Engineering, Biomedical). All share a common focus on developing in-demand, transportable skills like problem solving, critical and creative thinking, collaboration, and communication. Career connections are a mainstay. To that end, PLTW is notable for expecting schools to form advisory boards of local employers for feedback and connections. Attitudinal surveys attest to increased student interest in STEM careers.

3.a. Equity at the national level – diversity and inclusion

The National Alliance for Partnerships in Equity (NAPE) offers a wide array of professional development programs related to STEM equity. One module is called Micromessaging to Reach and Teach Every Student. Educators in and out of school convey micro-messages to students at every encounter. Micro-messages are subtle and typically unconscious. Sometimes they are helpful – a smile or eye contact. Sometimes they can be harmful towards individuals or reveal bias towards a group to which a student may belong – a furrowed brow or a stereotypical comment. Exceedingly rare is micro-message expertise in the teacher preparatory pipeline or in standard professional development. Yet micro-messaging is tremendously influential in the self-perceptions of learners as welcome in STEM. 

3.b. Equity at the national level – leveling the playing field

Durable skills – e.g., teamwork, collaboration, negotiation, empathy, critical thinking, initiative, risk-taking, creativity, adaptability, leadership, and problem-solving – define jobs of the future. AI and automation cannot replace durable skills. The nonprofit America Succeeds has championed a list of 100 durable skills grouped into 10 competencies, based on industry input. They studied state standards for college and career readiness against those competencies and prescribe remedies to states whose standards fall short (most U.S. states). Durable Skills, packaged by America Succeeds, is an equity service par excellence – every learner can command these 100 enduring skills, setting them up for success.

Black and white photo of early 20th century science class

The Case for Increased Investment in STEM Education R&D at the Federal and State Level

Billions of dollars pour into American STEM education each year. Millions of learners and employers benefit from the investment. Outstanding programs produce undeniably successful results for individuals and organizations. And yet, “This country is in the midst of a STEM and data literacy crisis.” How can that be? Here are some of the factors in play.

Recent STEM Education/Workforce Investment Trends

The biennial Science and Engineering Indicators compiled by the National Science Board were released in March 2024. Noteworthy findings (necessarily a couple of years old given the retrospective analysis) include:

The federal government funds 52% of all academic research and development taking place at colleges and universities (2021).

Contrasting the findings of the NSB against current federal budgets, the FY2024 appropriation for STEM education research and development is a work in progress. In comparison to FY23, the budget presented to Congress by the executive branch called for increases for STEM spending across many agencies but not all. The U.S. House and Senate generally propose reductions in spending. The Defense Department’s STEM education line, the National Defense Education Program, for example, is slated for significant reduction (-7.3 percent to -20 percent). The Department of Energy’s Office of Science, which funds STEM education, is slated for a slight increase (+1.7 percent). The same is true for the NSF’s STEM education programs (+1.6 percent). NASA’s Office of STEM Engagement is on track for a slight decrease (-0.3 percent). The Department of Agriculture’s Research and Education budget is down slightly (-1.7 percent). The U.S. Geological Survey’s Science Support budget, which includes human capital development, is down slightly (-1.2 percent). The Department of Education’s Institute for Education Sciences was slated for a significant increase by the executive branch but for reduction in both the House and Senate budgets. The Department of Homeland Security’s Science and Technology budget, which includes funding for university-based centers and minority institution programs, is set for reduction (-1.3 percent to -19 percent).

Significant STEM education and workforce development support resides within the CHIPS and Science Act of 2022, which has yet to be fully funded by the Congress. An overall trend in shifting R&D, including education, from federal to private sector support means greater reliance on business and industry to invest in STEM program development. The NSB Indicators report highlights this shift in R&D investment: the federal government’s share of R&D investment was 19 percent in 2021 (down from 30 percent in 2011), while the business sector now funds 75 percent of U.S. R&D.

A bottom-line interpretation is that federal investment in STEM education/workforce development, though significant, can hardly be described as a generational response to an economic and national security crisis.

Emergent Frontiers

Meanwhile, economic Sputniks are circling the globe. All driven by semiconducting silicon and germanium chips. Yet another testament to American STEM education is the home-grown invention of chips. But they are built mostly elsewhere – Taiwan, South Korea, and Japan. Semiconductors lie at the heart of our communications (e.g. cell phones, satellites), transportation (e.g. planes, trains, automobiles), defense (e.g. guidance systems and risk analytics), health (e.g. pacemakers, insulin pumps), lifestyle (e.g. dishwashers, Siri and Alexa), and virtually every other aspect of life and commerce. The federal government committed $53 billion through the 2022 CHIPS and Science Act to expand semiconductor talent development, research, and manufacture in the U.S., amplified by $231 billion in commitments to semiconductor development by business and industry. Guidance through the National Strategy on Microelectronics Research was recently released by the White House Office of Science and Technology Policy. When fully realized, the CHIPS Act may come to be a generational response to an international adversarial threat far more profound than Sputnik. 

Equally compelling and weighty in terms of life, liberty, and the pursuit of happiness is the need to lead in research and development as well as governance around artificial intelligence. An extraordinary evolution of workplace and home life is underway as a result of applications of this new technology. For example, AI dramatically increases precision and thus reduces error in health care. Machine learning is far superior to human eyes at image analysis – MRI or x-ray – for detecting cancer early. On a lighter note, machine learning can dramatically increase the likely appeal of new movies by compressing millions of historic data points and a sea of YouTube videos into a sure box office hit. On the other side are the misuses of AI, both present and potential. The displacement of radiologists, movie script writers, and countless others whose routine, analytical, or creative skills can be performed by robots and neural networked sensors is troublesome, yes, but a mild effect of AI compared to the vulnerability of our privacy, our democratic systems, business and finance integrity, and national defense structures, for starters.

The White House Blueprint for an AI Bill of Rights plants an important stake in the ground around AI safeguards. But it does not speak to the cultivation of future managers of AI. Similarly, the U.S. Department of Education report Artificial Intelligence and the Future of Teaching and Learning advises on risks of and uses for AI in diagnostics and descriptive statistics. However, guidance for preparing the upcoming generation to manage AI is not included. The National Science Foundation supports several AI-education studies that may prove worthy of scaling.

A potpourri of additional emergent trends fuels the current STEM crisis. Many are technological innovations, unearthing powers of manipulation and control that society is ill-prepared to manage. Quantum computing is one such innovation – using subatomic particle positioning, qubits, to store information. Computers will become exponentially faster and more powerful, possibly solving climate change while also deciphering everyone’s passwords. Relatedly, revolutions in cybersecurity and data analytics may be out ahead of societal grasp. Many educational programs at the local and national levels have emerged in this space, including eCybermission from the Army Education Outreach Program (AEOP), and Data Science Foundations using sports, finance, and other contexts for sense-making, from EverFi.

Not everyone needs to know how a microwave oven works in order to use it effectively. But U.S. citizens bear the responsibility for weighing ethical, equitable, and legal dimensions of STEM advancements as voters, educators, parents, and consumers. Whether it be CRISPR alterations of individuals’ genetics, socioeconomic dimensions of factory automation, morality aspects of Directed Energy Weaponry (DEW), the cost/benefit balance of climate mitigation technologies such as carbon sequestration, and so on, STEM education and workforce development need to be out front. That requires additional investment.

Supply-Demand Imbalance

Emergent technologies will drive job opportunities in the STEM arena that are expected to grow at four times the rate of jobs in other sectors in the coming decade. While it is encouraging that post-secondary STEM certificates and degrees have increased over the last decade (growing from 982,000 in 2012 to 1,310,000 in 2021), this growth is a ripple when the field needs a wave. Further, significant subpopulations of Americans are underrepresented in STEM majors and jobs. Women make up just about one-third of the science and engineering workforce. While racial and ethnic subgroups including Alaska Native, Black or African American, American Indian, and Hispanic or Latino comprise 30% of the total workforce, they hold just 23% of STEM jobs. Rural residency exacerbates those disparities for all subpopulations regarding the STEM education pipeline. While 40% of urban adults have at least a bachelor’s degree, only 25% of rural residents do.

The commitment to diversify the STEM talent pipeline is a universal consensus across federal, state, local, corporate, nonprofit, and philanthropic investors in STEM education and workforce development. Numerous programs devoted to equity and inclusion are at work today with promising results, ripe for scaling.

Impact on Individuals and Society

Of all the arguments supporting increased investment in STEM education R&D to solve our current STEM crisis – tepid federal spending, ominously powerful inventions, and the dearth of talent for advancing and managing those inventions – a fourth argument eclipses each of them: STEM education improves the lives of individuals irrespective of their occupation. And in so doing, STEM education improves communities and the country at large.

Learners fortunate to enjoy quality STEM education develop creativity through imaginative design, interpretation, and representation of investigations. The tools they use strengthen technology literacy. The mode of discovery is highly social, honing communication and cooperation skills. With no sage-on-the-stage they develop independence of thought. Failure happens, forging perseverance and resilience in its wake. Asking and answering questions nurtures curiosity. Defending and refuting ideas cultivates critical thinking. Truth and facts are evidence-based yet always tentative. Empathy is cultivated through alternative interpretations or points of view. And confidence to pursue STEM as a career comes from doing STEM.

The prospect of an entire population of Americans thus equipped is the most compelling case for strategically increased R&D investment in STEM education.

Photo of 2008 Ethics in the Science Classroom

Policy Recommendations for Increasing the Efficacy of Education R&D to Support STEM Education

Where do federal, state, local, corporate, nonprofit, and philanthropic STEM investors look for guidance in the alignment and leveraging of their dollars to nationwide priorities? The closest we have to a “master plan” is the federal STEM education strategic plan mandated by the America COMPETES Act. Updated every five years by the White House Office of Science and Technology Policy in close collaboration with federal agencies, the 2018-2023 plan is due for an update, and it is likely the next iteration will be released soon. 

While the STEM community waits, valuable input on the next iteration was recently provided to the OSTP from the STEM Education Coalition. Coalition members (numbering over 600) represent the spectrum of STEM advocates – business and industry, higher education, formal and informal K-12 education, nonprofits, and national/state policy groups – and collectively hold great sway in matters of STEM education nationally. The expiring federal STEM plan closely reflects their input, as its successor likely will as well.

Six of the following ten recommendations build upon the STEM Education Coalition’s priorities, while the remaining four recommendations address gaps in the pipeline from STEM education to workforce pathways.

In order to maximize research and development to improve STEM education, we have distilled ten recommendations:

  1. Devote resources (human and financial) to both the scaling of, and continued research and development in, interventions that disrupt the status quo when it comes to rural under-reach and under-service in STEM education.
  2. Devote resources to both the scaling of, and continued research and development in transdisciplinary (a.k.a. Convergent) STEM teaching and learning, formally and informally.
  3. STEM teacher recruitment and training to support learning is a high-value target for investment in both the scaling of existent models as well as research and development on this essential frontier.
  4. Expand student authentic career-linked or work-based learning experiences to all, earning credits while acquiring job skills, by improving coordination capacity, and crediting – especially earning core (graduation) credits. 
  5. Devote resources to research and development on coordination across components of the STEM education system – in school and out of school, educator preparation – at the local, state and national levels.
  6. Devote resources to research and development toward improved awareness/communication systems of Federal STEM education agencies.
  7. Devote resources to research and development on supporting the training of STEM teachers and professionals for career coaching on a real-time, as-needed basis for all youth.
  8. Devote resources to research and development on the expansion of local/global challenge-solution learning opportunities and how they influence student self-efficacy and STEM career trajectories.
  9. Devote resources to research and development of a digital platform readily accessible, easily navigable, and comprehensively thorough, for education-providers to harvest effective, vetted STEM programs from across the entire producer spectrum.
  10. Devote resources to the design and development of a catalog of STEM/workforce education “discoveries” funded by federal grant agencies (e.g., NSF’s ITEST, DRK-12, INCLUDES, CSforAll, etc.) to be used by STEM educators, developers and practitioners.

Recommendation 1. Devote resources (human and financial) to both the scaling of, and continued research and development in, interventions that disrupt the status quo when it comes to rural under-reach and under-service in STEM education.

Aligning to the STEM Ed Coalition’s priority of “Achieving Equity in STEM Education Must Be a National Priority,” this recommendation is central to the success of STEM education. The economic and moral imperative to broaden access to quality STEM education and to high-demand STEM careers is a national consensus. Lack of access and opportunity across rural America, where 20% of all youth attend half of all school districts and where persistent inequality hits members of racial and ethnic minority groups hardest, creates a high-value target.

STEM Excellence and Leadership Project

Identifying and nurturing STEM talent in rural K-12 settings can be a challenge. The Belin-Blank Center for Gifted Education and Talent Development successfully designed and implemented the “STEM Excellence and Leadership Project” at the middle school level. Funded by the NSF’s Advancing Informal STEM Learning program, the project combined flexible professional development, wide-net casting for students, networking within the community, and career counseling, resulting in increased creativity, critical thinking, and positive perceptions of mathematics and science.

Recommendation 2. Devote resources to both the scaling of, and continued research and development in transdisciplinary (a.k.a. Convergent) STEM teaching and learning, formally and informally. 

Aligning to the STEM Ed Coalition’s priority “Science Education Must Be Elevated as a National Priority within a Transdisciplinary Well-Rounded STEM Education,” we need more investment in R&D to understand the transdisciplinary STEM teaching and learning models that improve student outcomes. America’s formal education model remains largely reflective of the 1894 recommendations of the Committee of Ten: annually teach all students History, English, Mathematics, Physics, Chemistry, etc. This prevailing “layer cake” approach serves transdisciplinary education poorly. Even the Next Generation Science Standards, upon which state and district science standards are largely based, focus on developing “an in-depth understanding of content and develop key skills…” All modern STEM-related challenges facing Generations Z, Alpha, and Beta require an entirely different brand of education – one of transdisciplinary inquiry.

USPTO Motivates Young Innovators and Entrepreneurs

The United States Patent and Trademark Office (USPTO)’s National Summer Teacher Institute (NSTI) on Innovation, STEM, and Intellectual Property (IP) trains teachers to incorporate concepts of making, inventing, and intellectual property creation and protection into classroom instruction, with the goal of inspiring and motivating young innovators and entrepreneurs. To date the program claims 22,000 hours of IP and invention education training of 444 teachers in 50 states – 110 of whom have inventions – now equipped to spread the power of invention education and IP to hundreds of thousands of learners across the country and the world. We should better understand the program components that enable this kind of transdisciplinary learning.

Recommendation 3. STEM teacher recruitment and training to support learning is a high-value target for investment in both the scaling of existent models as well as research and development on this essential frontier. 

Aligning to the STEM Ed Coalition’s priority “Increase the Number of STEM Teachers in Our Nation’s Classrooms,” we need to deploy more education R&D to address America’s well-documented STEM teacher shortage. But the shortage is only half of the challenge we face. The other half is equipping teachers to authentically teach STEM, not merely a discipline underneath the STEM umbrella. Efforts such as the NSF’s Robert Noyce Teacher Scholarship program and the UTeach model support the production of excellent teachers of mathematics and science, but not STEM overall. To teach in a convergent (transdisciplinary) fashion, through collaborative community partnerships and on complex local/global issues, is beyond the scope and capacity of traditional teacher preparatory models.

Example Programs

Two means for equipping educators to teach STEM are (1) in their pre-professional preparation, and (2) as in-service professional development for disciplinary instructors. Promising examples are flourishing.

  1. STEM Teaching Certificate. A few U.S. states and some national organizations have built STEM licenses and endorsements. Georgia State University’s STEM Certificate program trains teachers to bring a convergent STEM approach to whatever course they teach: “[candidates] figure out how to work across their schools, with the arts, with connections to other subjects.”
  2. In-service STEM Externships. Teachers in industry externships discover workplace connections and durable skills important to build in classrooms. Numerous businesses (e.g., 3M), organizations (e.g. Aerospace/NASA), and states (e.g., Iowa’s NSF ITEST funded externships) conduct variations on the concept, with compelling results.

Recommendation 4. Expand student authentic career-linked or work-based learning experiences to all, earning credits while acquiring job skills, by improving coordination capacity, and crediting – especially earning core (graduation) credits.

Aligning to the STEM Ed Coalition’s priority to “Support Partnerships with Community Based STEM Organizations, Out of School Providers and Informal Learning Providers,” education R&D needs to better understand career-based learning models that work and deploy these evidence-based practices at scale.

Example Programs

With all 50 U.S. states aggressively pursuing work-based learning (WBL) policies and support, there is an opportunity to study and codify what states are learning in order to improve and iterate faster. According to the Education Commission of the States, 33 states have a definition for WBL, though the definitions vary. Nearly all states report WBL as a state strategy for their Workforce Innovation and Opportunity Act (WIOA) profile. Twenty-eight states legislate funding to support WBL. Fewer than half of all states permit WBL to count for graduation credits. Of all states, Tennessee presents a particularly aggressive WBL profile worthy of scale/replication.

Recommendation 5. Devote resources to research and development on coordination across components of the STEM education system – in school and out of school, educator preparation – at the local, state and national levels.

Aligning to the STEM Ed Coalition’s priority to “Take a Systemic Approach to Future STEM Education Interventions,” more R&D should be deployed to study ecosystem models to understand the components that lead to student outcomes.

The STEM learning that takes place during the K-12 school day may or may not mesh well with the STEM learning that takes place at museum nights or at summer camp. In both instances, it may or may not align well with local, state, or national assessments. The preparation of educators is widely variable. The curricular content classroom-to-classroom and state-to-state varies. To drop novel grant-funded interventions into the mix is a random act of hope.

Example Programs

STEM Learning Ecosystems now number over 100 across the U.S., providing vertebral backbone to a national coordinative skeleton for STEM education. Formally designated by their membership in the STEM Learning Ecosystems Community of Practice supported by the Teaching Institute for Excellence in STEM (TIES), they each unite “…pre-K-16 schools; community-based organizations, such as after-school and summer programs; institutions of higher education; STEM-expert organizations, such as science centers, museums, corporations, intermediary and non-profit organizations and professional associations; businesses; funders; and informal experiences at home and in a variety of environments” to “…spark young people’s engagement, develop their knowledge, strengthen their persistence and nurture their sense of identity and belonging in STEM disciplines.” Every one of America’s 20,000 cities and towns ought to have a STEM Ecosystem. Just 19,900 to go.

Recommendation 6. Devote resources to research and development toward improved awareness/communication systems of Federal STEM education agencies.

Aligning to the STEM Ed Coalition’s priority to “Clarify and Define the Role of Federal Agencies and OSTP in Supporting STEM Education,” we should utilize R&D and inspiration from other fields to ensure we are propagating knowledge and systems in ways that foster increased transparency and evidence-use.

Awareness is the weak link in the chain of federal STEM education outreach to consumers at local levels. Seventeen federal agencies engage in STEM education via 156 programs spanning pre-K-12 formal and informal, higher education, and adult education.

In 2018-19, OSTP and the Federal Coordination in STEM subcommittee (FC-STEM) pushed strongly to build STEM.gov or STEMeducation.gov in the spirit of AI.gov and Grants.gov: a one-stop clearinghouse through which Americans can explore and discover funding, programs, and expertise in STEM. To date, the closest analog is https://www.ed.gov/stem.

Example Programs

Discrete programs of various federal agencies have employed clever tactics for awareness and communication, as described in the 2022 Progress Report on the Implementation of the Federal STEM Education Strategic Plan. The AmeriCorps program, for example, partnered with Mathematica to build a web-based interactive SCALER tool usable by education professionals, local education agencies, state education agencies, nonprofits, state and local government agencies, universities and colleges, tribal nations, and others to request participants to address local challenges they have identified, including STEM. Similarly, the National Institute of Standards and Technology launched its NIST Educational STEM Resource registry (NEST-R) to provide wide access to NIST educational and workforce development content including STEM resource records. Can the concept be broadened to a grand unifying collective?

Recommendation 7. Devote resources to research and development on supporting the training of STEM teachers and professionals for career coaching on a real-time, as-needed basis for all youth. 

Gen Z and Gen Alpha may end up in jobs like machine learning tech, molecular medical therapist, cryptocurrency auditor, big data distiller, climate change mitigator, or jetpack mechanic. From whom can they expect good career coaching? It is unrealistic to expect that their school counselors can keep up: with an average caseload of 385 students across all disciplines, their hands are full. STEM teachers, both the disciplinary and the integrated type, are best positioned to take on more responsibility for career coaching, with the help of counselors, administrators, and librarians. In fact, it is an all-hands-on-deck challenge.

Example Programs

Meaningful Career Conversations is a program begun in Colorado and now spreading to other states. It is a light, four-hour training experience that equips educators and other adults with whom youth come into contact to conduct conversations that steer students toward reflection, exploration, and consideration of career pathways of interest. Trainings are based upon starters and prompts that get students talking about and reflecting on their strengths and interests, such as “What activities or places make you feel safe and valued? Why?” Not a silver bullet, but a model of distributed responsibility which, by engaging core teachers and other adults in career guidance, can help more students find their way toward a STEM career.

Recommendation 8. Devote resources to research and development on the expansion of local/global challenge-solution learning opportunities and how they influence student self-efficacy and STEM career trajectories.

The standardization of a vision for STEM in classrooms across America will take time and resources. In the meantime, programs like MIT Solve can fast-track authentic learning experiences in school and after school. It is the ultimate student-centeredness to invite groups of youth to think big – identify challenges for which they are enthused and tap all imaginable resources in dreaming up solutions – to command their own learning.

Example Programs

Common in higher education are capstone projects, applied coursework, even entire college missions (e.g., Olin College) that center the student learning experience around local/global challenges and solutions. 

For citizens of all ages there are opportunities like Changemakers Challenges, and the “Reinvent the Toilet” competition of the Gates Foundation.

At the K-12 level, FIRST Lego League teams learn about robotics through humanitarian themes such as adaptive technologies for the disabled. The World Food Prize offers student group projects focused on global food security challenges. Future Cities and Invention Convention follow a similar format. These well-evaluated programs are prime for expansion or replication.

Recommendation 9. Devote resources to research and development of a digital platform that is readily accessible, easily navigable, and comprehensive, from which education providers can harvest effective, vetted STEM programs from across the entire producer spectrum.

More than 50 different programs are named in this paper, each an exemplar, a mere snapshot of the STEM programs available to the pre-K-12 community in and out of school. Therein lies a challenge/opportunity uniquely defining this moment in American educational history compared to the 1958 and 2001 crises: an embarrassment of riches.

Example Programs

The number of databases and resource catalogs on STEM education programs available to educators is almost as overwhelming as the number of programs themselves. A few standouts help dampen the decibels (though none are perfect):  

  1. What Works Clearinghouse (WWC). Established in 2002 under the Institute of Education Sciences at the U.S. Department of Education, the WWC does the hard work for educators of reviewing the research to make evidence-based recommendations about instruction. A priceless service. The trick is distillation. Their goal to digest and disseminate education research gets the material down to the level of curriculum developers, publishers, teacher-trainers, etc. Overwhelming though, for casual-shopping educators.
  2. STEMworks Database. Born under Change The Equation in 2012 and acquired by WestEd in 2017, STEMworks is a tool to sift through the noise, using a rigorous rubric (Design Principles) to present sure-fire winning STEM programs to educators and organizations. Programs (kits, courses, software, lessons) submit applications for expert review. The result is a “Searchable honor-roll” of high-quality STEM. The hitch? Relatively few providers apply, especially not the emergent or experimental ones that have yet to acquire robust impact evidence.

Recommendation 10. Devote resources to the design and development of a catalog of STEM/workforce education “discoveries” funded by federal grant agencies (e.g., NSF’s ITEST, DRK-12, INCLUDES, CSforAll, etc.) to be used by STEM educators, developers and practitioners.

This recommendation resembles Recommendation 9 but focuses expressly on federal programs, and resembles Recommendation 6 but calls for more than a mere roster of offerings: a vetted (and user-friendly) What Works Clearinghouse for all prior grants that yielded empirical support for preK-12 STEM, across all agencies. What a treasure-trove of proven interventions and innovations across NSF, ED, DOE, DoD and on, mostly unknown to practitioners across the United States.

Each federal agency currently posts STEM opportunities on its own website (e.g., http://www.ed.gov/stem, http://dodstem.us/, http://www.nsf.gov/funding, http://www.nasa.gov/education, https://science.education.nih.gov/). These tools are valuable, but a desperate need remains for a single STEM.gov-style searchable landing page.

There must be a way to view what worked for the thousands of R&D projects funded by these agencies. An online shopping mall for successful preK-12 STEM curricula, teaching approaches, equity practices, virtual platforms, etc. CoSTEM could create a “STEM Ideas that Work” landing page to ensure that emerging research insights are captured in systematic and accessible ways.

Example Programs

The Ideas That Work resource is an analog. Curated by the Office of Special Education Programs (OSEP) at the U.S. Department of Education, it is a searchable database that includes all grants, past and current, exclusive to OSEP. Special educators and families can search a term such as “behavioral challenge” to find resources and toolkits, training modules, tip sheets, etc.

Recommended Actions of ALI and Other Stakeholders

While we hope to see many of these recommendations in the forthcoming Five Year STEM Plan, actualizing them will take multiple actors working together to advance the STEM education field.

The Alliance for Learning Innovation has perhaps the most potent of tools among STEM/workforce stakeholders to effect change: communication.

ALI should host events, publish white papers, develop convenings, and deploy mass media and other awareness and advocacy modes to rally the august collective of member organizations toward amplifying America’s rural STEM equity opportunity, career coaching capacity, educator-employer partnership potential, and convergence approach to learning, along with the six other recommendations. Doing so would do more to prepare the future STEM workforce than any other action, including investment.

Investment is a close second in impact among the actions ALI can take. If all STEM investors – federal, state, corporate, and philanthropic – aligned around a finite array of pressing priorities served by a proven set of interventions (the very function of this report), the collective impact would transform systems. What it would take is an aggregator. ALI or a designee organization, functioning as an agent for businesses, philanthropies and other STEM investors, can make funding recommendations (or more ambitiously, pool investor funds) based on consensus goals of the STEM cooperative, acting to focus investments accordingly.

Federal Agencies have made significant gains toward cooperative and complementary STEM education support through the sustenance of interagency working groups on Computational Literacy, Convergence, Strategic Partnerships, Transparency & Accountability, Inclusion in STEM, and Veterans and Military Spouses in STEM. As a result, improvements are being made in coordination and increased transparency about federal education R&D investments, especially between the National Science Foundation and the Department of Education. And yet, more needs to be done. 

Business, Industry, and Philanthropic Organizations have the ability to pilot or expand proven programs to national scale, as many examples herein attest. However, the impact of the investments of the private sector may fall short of systemic change due to a smorgasbord of pet programs chosen by each entity, leading to incremental rather than wholesale progress.

Business, industry, and philanthropic investors in STEM education should pool their resources around a finite array of proven programs for maximal, collective impact. A functional intermediary such as the Alliance for Learning Innovation could represent the interests of all non-government STEM funders by winnowing the horde of pre-K-12 STEM education programs to only those most effective at achieving consensus goals and priorities. The outcome might be a Consumer Reports-style top-rated performers menu that concentrates investments, amplifying impact. Like federal agencies, non-government funders should consider driving the advancement of transdisciplinary (convergent) STEM education, work-based or career-linked learning, the synchronization of in-school and out-of-school STEM education, educator career-coaching capacity, and the development of rural, diverse STEM workforce talent.     

States are best positioned to help local education/workforce organizations meet the human resource challenges and the material challenges inhibiting full production of future workers for high demand careers. It is state government that sets the policies that determine practices.

K-12 formal and informal education at the daily practical level bears the greatest responsibility to act on behalf of the future STEM workforce. Insofar as government and non-government funded programs support, and state policies empower, and preparatory trainings equip, educators should seize this moment in history to help American economic vitality and national security one student at a time.

Others at the table include post-secondary institutions, media outlets, faith communities, local trade and professional societies, social service providers, families, and citizens at-large. Each should contribute to the goal of producing a vibrant future workforce by advocating for education research and development policies at the state and federal levels and by partnering with formal and nonformal learning organizations to inspire tomorrow’s innovators in today’s classrooms. 

Conclusion

American competitiveness through innovation is driven by leading-edge education systems. Legitimate concern for whether those systems can maintain their lead surfaces during periods of vulnerability, whether the nation is eclipsed in the space race, comparatively under-armored in military advancement, or surpassed in the advancement of information technology. To relinquish leadership in innovation is a threat to the U.S. economy and national security. In response to periodic threats to American innovation preeminence, bold investments in STEM education have produced waves of talent for securing the helm.

This era is different. A myriad of fronts for innovation advancement – automation, machine learning, molecular medicine, energy transformation, cybersecurity – each harboring an existential challenge, heighten the imperative for action to an unprecedented level. And yet, the U.S. has never been more prepared to act. A wealth of pre-K-12 STEM programs and infrastructure stand in testament to legacy investments by the federal government and the private sector. This time, the challenge is to engage a broader swath of the population, especially those underserved and underrepresented in STEM programs of the past. And in tight budgetary times, broadened opportunities must utilize evidence-based solutions proven to work, whether they be in the realm of teacher preparation, equity and inclusion, early learning, informal education, community engagement, mathematics, coding, quantum physics, or all the above and more.

The best time to invest is when the pathway to success is clear. The tools and the know-how for producing tomorrow’s STEM workforce reside within pre-K-12 systems today. For public and private investors alike, there is an opportunity for amplification through collective impact. By collectively identifying high-impact solutions transparent in design and indisputable in effect, aligning resources for surgical precision rather than shotgun spray, and scaling known winners to all young Americans, the current challenge to U.S. innovation leadership will be met. Enough with moving the needle. It is time to pin the needle, shattering the gauge.  

Predicting Progress: A Pilot of Expected Utility Forecasting in Science Funding

The current process that federal science agencies use for reviewing grant proposals is known to be biased against riskier proposals. As such, the metascience community has proposed many alternate approaches to evaluating grant proposals that could improve science funding outcomes. One such approach was proposed by Chiara Franzoni and Paula Stephan in a paper on how expected utility — a formal quantitative measure of predicted success and impact — could be a better metric for assessing the risk and reward profile of science proposals. Inspired by their paper, the Federation of American Scientists (FAS) collaborated with Metaculus to run a pilot study of this approach. In this working paper, we share the results of that pilot and its implications for future implementation of expected utility forecasting in science funding review. 

Brief Description of the Study

In fall 2023, we recruited a small cohort of subject matter experts to review five life science proposals by forecasting their expected utility. For each proposal, this consisted of defining two research milestones in consultation with the project leads and asking reviewers to make three forecasts for each milestone:

  1. The probability of success;
  2. The scientific impact of the milestone, if it were reached; and
  3. The social impact of the milestone, if it were reached.

These predictions can then be used to calculate the expected utility, or likely impact, of a proposal and design and compare potential portfolios.

Key Takeaways for Grantmakers and Policymakers

The three main strengths of using expected utility forecasting to conduct peer review are

Despite the apparent complexity of this process, we found that first-time users were able to successfully complete their review according to the guidelines without any additional support. Most of the complexity occurs behind-the-scenes, and either aligns with the responsibilities of the program manager (e.g., defining milestones and their dependencies) or can be automated (e.g., calculating the total expected utility). Thus, grantmakers and policymakers can have confidence in the user friendliness of expected utility forecasting. 

How Can NSF or NIH Run an Experiment on Expected Utility Forecasting?

An initial pilot study could be conducted by NSF or NIH by adding a short, non-binding expected utility forecasting component to a selection of review panels. In addition to the evaluation of traditional criteria, reviewers would be asked to predict the success and impact of select milestones for the proposals assigned to them. The rest of the review process and the final funding decisions would be made using the traditional criteria. 

Afterwards, study facilitators could take the expected utility forecasting results, construct an alternate portfolio of proposals that would have been funded had that approach been used, and compare the two portfolios. Such a comparison would yield valuable insights into whether, and how, the types of proposals selected by each approach differ, and whether their use leads to different considerations arising during review. Additionally, a pilot assessment of reviewers’ prediction accuracy could be conducted by asking program officers to assess milestone achievement and study impact upon completion of funded projects.
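
The portfolio comparison described above could be sketched roughly as follows. The proposal IDs, the scores, and the fixed number of awards are made-up values for illustration, and ranking proposals by a single score is a simplifying assumption rather than a description of any agency's process.

```python
def top_k(scores: dict[str, float], k: int) -> set[str]:
    """Return the IDs of the k highest-scoring proposals."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical panel scores (traditional criteria) and expected utility scores.
traditional = {"A": 4.5, "B": 4.1, "C": 3.8, "D": 3.0, "E": 2.7}
eu_scores = {"A": 12.0, "B": 3.5, "C": 9.0, "D": 2.0, "E": 15.0}

funded = top_k(traditional, k=3)    # portfolio chosen by traditional criteria
alternate = top_k(eu_scores, k=3)   # portfolio expected utility would choose

print("funded under both:", sorted(funded & alternate))
print("only traditional:", sorted(funded - alternate))
print("only expected utility:", sorted(alternate - funded))
```

The proposals that appear in only one of the two portfolios are the ones worth examining most closely, since they show where the two approaches disagree.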

Findings and Recommendations

Reviewers in our study were new to the expected utility forecasting process and gave generally positive reactions. In their feedback, reviewers said that they appreciated how the framing of the questions prompted them to think about the proposals in a different way and pushed them to ground their assessments with quantitative forecasts. The focus on just three review criteria–probability of success, scientific impact, and social impact–was seen as a strength because it simplified the process, disentangled feasibility from impact, and eliminated biased metrics. Overall, reviewers found this new approach interesting and worth investigating further. 

In designing this pilot and analyzing the results, we identified several important considerations for planning such a review process. While complex, engaging with these considerations tended to provide value by making implicit project details explicit and encouraging clear definition and communication of evaluation criteria to reviewers. Two key examples are defining the proposal milestones and creating impact scoring systems. In both cases, reducing ambiguities in terms of the goals that are to be achieved, developing an understanding of how outcomes depend on one another, and creating interpretable and resolvable criteria for assessment will help ensure that the desired information is solicited from reviewers. 

Questions for Further Study

Our pilot only simulated the individual review phase of grant proposals and did not simulate a full review committee. The typical review process at a funding agency consists of first, individual evaluations by assigned reviewers, then discussion of those evaluations by the whole review committee, and finally, the submission of final scores from all members of the committee. This is similar to the Delphi method, a structured process for eliciting forecasts from a panel of experts, so we believe that it would work well with expected utility forecasting. The primary change would therefore be in the definition and approach for eliciting criterion scores, rather than the structure of the review process. Nevertheless, future implementations may uncover additional considerations that need to be addressed or better ways to incorporate forecasting into a panel environment. 

Further investigation into how best to define proposal milestones is also needed. This includes questions such as, who should be responsible for determining the milestones? If reviewers are involved, at what part(s) of the review process should this occur? What is the right balance between precision and flexibility of milestone definitions, such that the best outcomes are achieved? How much flexibility should there be in the number of milestones per proposal? 

Lastly, more thought should be given to how to define social impact and how to calibrate reviewers’ interpretation of the impact score scale. In our report, we propose a couple of different options for calibrating impact, in addition to describing the one we took in our pilot. 

Grantmakers, both public and private, and policymakers who are interested in learning more or in receiving assistance with implementing this approach are welcome to reach out to our team.


Introduction

The fundamental concern of grantmakers, whether governmental or philanthropic, is how to make the best funding decisions. All funding decisions come with inherent uncertainties that may pose risks to the investment. Thus, a certain level of risk-aversion is natural and even desirable in grantmaking institutions, especially federal science agencies which are responsible for managing taxpayer dollars. However, without risk, there is no reward, so the trade-off must be balanced. In mathematics and economics, expected utility is the common metric assumed to underlie all rational decision making. Expected utility has two components: the probability of an outcome occurring if an action is taken and the value of that outcome, which roughly corresponds with risk and reward. Thus, expected utility would seem to be a logical choice for evaluating science funding proposals. 
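
To make the two components concrete, here is a minimal sketch of expected utility as probability times value. The function name and the example numbers are illustrative assumptions, not figures from the pilot.

```python
def expected_utility(probability: float, value: float) -> float:
    """Expected utility of one outcome: probability of success times its value."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * value


# Hypothetical proposals: a safe one with a modest payoff,
# and a risky one with a large payoff.
safe = expected_utility(probability=0.75, value=10.0)
risky = expected_utility(probability=0.25, value=100.0)

# The riskier proposal has the higher expected utility despite its lower odds.
print(safe, risky)  # 7.5 25.0
```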

In the debates around funding innovation, though, expected utility has largely flown under the radar compared to other ideas. Nevertheless, Chiara Franzoni and Paula Stephan have proposed using expected utility in peer review. Building on their paper, the Federation of American Scientists (FAS) developed a detailed framework for how to implement expected utility in a peer review process. We chose to frame the review criteria as forecasting questions, since determining the expected utility of a proposal inherently requires making some predictions about the future. Forecasting questions also have the added benefit of being resolvable–i.e., the true outcome can be determined after the fact and compared to the prediction–which provides a learning opportunity for reviewers to improve their abilities and identify biases. In addition to forecasting, we incorporated other unique features, like an exponential scale for scoring impact, that we believe help reduce biases against risky proposals.
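
One way to picture an exponential impact scale is that each one-point increase in score multiplies the implied impact by a fixed factor. The base of 2 and the function name below are hypothetical choices for illustration only; the scale actually used in the pilot is the one defined in Appendix A.

```python
def impact_value(score: float, base: float = 2.0) -> float:
    """Convert an impact score into an impact value on an exponential scale.

    Assumption: each additional point multiplies impact by `base`.
    base=2 is a hypothetical choice, not the pilot's documented scale.
    """
    return base ** score


# On a linear scale, a score of 8 would be twice a score of 4.
# On this exponential scale it is 2**4 = 16 times as large.
ratio = impact_value(8) / impact_value(4)
print(ratio)  # 16.0
```

Under a scale like this, a low-probability proposal with a very high impact score can still outrank a safe, incremental one once probability and impact are multiplied together.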

With the theory laid out, we conducted a small pilot in fall of 2023. The pilot was run in collaboration with Metaculus, a crowd forecasting platform and aggregator, to leverage their expertise in designing resolvable forecasting questions and to use their platform to collect forecasts from reviewers. The purpose of the pilot was to test the mechanics of this approach in practice, see if there are any additional considerations that need to be thought through, and surface potential issues that need to be solved. We were also curious whether any interesting or unexpected results would arise from how we chose to calculate impact and total expected utility. It is important to note that this pilot was not an experiment, so we did not have a control group to compare the results of the review with.

Since FAS is not a grantmaking institution, we did not have a ready supply of traditional grant proposals to use. Instead, we used a set of two-page research proposals for Focused Research Organizations (FROs) that we had sourced through separate advocacy work in that area.1 With the proposal authors’ permission, we recruited a cohort of twenty subject matter experts to each review one of five proposals. For each proposal, we defined two research milestones in consultation with the proposal authors. Reviewers were asked to make three forecasts for each milestone:

  1. The probability of success;
  2. The scientific impact, conditional on success; and
  3. The social impact, conditional on success.

Reviewers submitted their forecasts on Metaculus’ platform; in a separate form they provided explanations for their forecasts and responded to questions about their experience and impression of this new approach to proposal evaluation. (See Appendix A for details on the pilot study design.)

Insights from Reviewer Feedback

Overall, reviewers liked the framing and criteria provided by the expected utility approach, while their main critique was of the structure of the research proposals. Excluding critiques of the research proposal structure, which are unlikely to apply to an actual grant program, two thirds of the reviewers expressed positive opinions of the review process and/or thought it was worth pursuing further given drawbacks with existing review processes. Below, we delve into the details of the feedback we received from reviewers and their implications for future implementation.

Feedback on Review Criteria

Disentangling Impact from Feasibility

Many of the reviewers said that this model prompted them to think differently about how they assess the proposals and that they liked the new questions. Reviewers appreciated that the questions focused their attention on what they think funding agencies really want to know and nothing more: “can it occur?” and “will it matter?” This approach explicitly disentangles impact from feasibility: “Often, these two are taken together, and if one doesn’t think it is likely to succeed, the impact is also seen as lower.” Additionally, the emphasis on big picture scientific and social impact “is often missing in the typical review process.” Reviewers also liked that this approach eliminates what they consider biased metrics, such as the principal investigator’s reputation, track record, and “excellence.” 

Reducing Administrative Burden

The small set of questions was seen as more efficient and less burdensome on reviewers. One reviewer said, “I liked this approach to scoring a proposal. It reduces the effort to thinking about perceived impact and feasibility.” Another reviewer said, “On the whole it seems a worthwhile exercise as the current review processes for proposals are onerous.” 

Quantitative Forecasting

Reviewers saw benefits to being asked to quantify their assessments, but also found it challenging at times. A number of reviewers enjoyed taking a quantitative approach and thought that it helped them be more grounded and explicit in their evaluations of the proposals. However, some reviewers were concerned that it felt like guesswork and expressed low confidence in their quantitative assessments, primarily due to proposals lacking details on their planned research methods, which is an issue discussed in the section “Feedback on Proposals.” Nevertheless, some of these reviewers still saw benefits to taking a quantitative approach: “It is interesting to try to estimate probabilities, rather than making flat statements, but I don’t think I guess very well. It is better than simply classically reviewing the proposal [though].” Since not all academics have experience making quantitative predictions, we expect that there will be a learning curve for those new to the practice. Forecasting is a skill that can be learned though, and we think that with training and feedback, reviewers can become better, more confident forecasters.
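
Because the forecasts are resolvable, one way to give reviewers feedback would be a standard accuracy measure such as the Brier score, the mean squared difference between forecast probabilities and 0/1 outcomes. This is a common forecasting metric, not one the pilot reports using, and the forecasts and outcomes below are invented for illustration.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes.

    0.0 is a perfect score; always forecasting 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical reviewer: probability-of-success forecasts for four milestones,
# and whether each milestone was later judged reached (1) or not (0).
forecasts = [0.5, 1.0, 0.0, 0.5]
outcomes = [1, 1, 0, 0]
print(brier_score(forecasts, outcomes))  # 0.125
```

Tracking a score like this over several review cycles would show a reviewer whether their probability estimates are improving.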

Defining Social Impact

Of the three types of questions that reviewers were asked to answer, the question about social impact seemed the hardest for reviewers to interpret. Reviewers noted that they would have liked more guidance on what was meant by social impact and whether that included indirect impacts. Since questions like these are ultimately subjective, the “right” definition of social impact and what types of outcomes are considered most valuable will depend on the grantmaking institution, their domain area, and their theory of change, so we leave this open to future implementers to clarify in their instructions.

Calibrating Impact

While the impact score scale (see Appendix A) defines the relative difference in impact between scores, it does not define the absolute impact conveyed by a score. For this reason, a calibration mechanism is necessary to provide reviewers with a shared understanding of the use and interpretation of the scoring system. Note that this is a challenge that rubric-based peer review criteria used by science agencies also face. Discussion and aggregation of scores across a review committee helps align reviewers and average out some of this natural variation.2

To address this, we surveyed a small, separate set of academics in the life sciences about how they would score the social and scientific impact of the average NIH R01 grant, which many life science researchers apply to and review proposals for. We then provided the average scores from this survey to reviewers to orient them to the new scale and help them calibrate their scores. 

One reviewer suggested an alternative approach: “The other thing I might change is having a test/baseline question for every reviewer to respond to, so you can get a feel for how we skew in terms of assessing impact on both scientific and social aspects.” One option would be to ask reviewers to score the social and scientific impact of the average grant proposal for a grant program that all reviewers would be familiar with; another would be to ask reviewers to score the impact of the average funded grant for a specific grant program, which could be more accessible for new reviewers who have not previously reviewed grant proposals. A third option would be to provide all reviewers on a committee with one or more sample proposals to score and discuss, in a relevant and shared domain area.
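
The baseline-question idea could be used to estimate how each reviewer skews. In the sketch below, which is an illustrative assumption rather than the pilot's method, a reviewer's skew is their baseline score minus the panel's mean baseline score, and that offset is subtracted from their proposal scores. All reviewer names and scores are invented.

```python
from statistics import mean

# Hypothetical scores each reviewer gave to the same baseline question.
baseline = {"r1": 6.0, "r2": 4.0, "r3": 5.0}
# Hypothetical impact scores the same reviewers gave to one proposal.
proposal = {"r1": 8.0, "r2": 5.0, "r3": 7.0}

panel_baseline = mean(baseline.values())  # shared reference point

# Positive skew: the reviewer scores higher than the panel on the same item.
skew = {r: s - panel_baseline for r, s in baseline.items()}
adjusted = {r: proposal[r] - skew[r] for r in proposal}

print(skew)      # {'r1': 1.0, 'r2': -1.0, 'r3': 0.0}
print(adjusted)  # {'r1': 7.0, 'r2': 6.0, 'r3': 7.0}
```

A simple offset like this only corrects for reviewers who are uniformly generous or harsh; it would not capture reviewers who use a narrower or wider range of the scale.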

When deciding on an approach for calibration, a key consideration is the specific resolution criteria that are being used — i.e., the downstream measures of impact that reviewers are being asked to predict. One option, which was used in our pilot, is to predict the scores that a comparable, but independent, panel of reviewers would give the project some number of years following its successful completion. For a resolution criterion like this one, collecting and sharing calibration scores can help reviewers get a sense for not just their own approach to scoring, but also those of their peers.

Making Funding Decisions

In scoring the social and scientific impact of each proposal, reviewers were asked to assess the value of the proposal to society or to the scientific field. That alone would be insufficient to determine whether a proposal should be funded, though, since its impact would need to be weighed together with its feasibility and compared against other proposals. To do so, we calculated the total expected utility of each proposal (see Appendix C). In a real funding scenario, this final metric could then be used to compare proposals and determine which ones get funded. Additionally, unlike a traditional scoring system, the expected utility approach allows for the detailed comparison of portfolios, including considerations like the expected proportion of milestones reached and the range of likely impacts.

In our pilot, reviewers were not informed that we would be doing this additional calculation based on their submissions. As a result, one reviewer thought that the questions they were asked failed to include other important questions, like “should it occur?” and “is it worth the opportunity cost?” Though these questions were not asked of reviewers explicitly, we believe that they would be answered once the expected utility of all proposals is calculated and considered, since the opportunity cost of one proposal would be the expected utility of the other proposals. Since each reviewer only provided input on one proposal, they may have felt like the scores they gave would be used to make a binary yes/no decision on whether to fund that one proposal, rather than being considered as a part of a larger pool of proposals, as they would be in a real review process.

Feedback on Proposals

Missing Information Impedes Forecasting

The primary critique that reviewers expressed was that the research proposals lacked details about their research plans, what methods and experimental protocols would be used, and what preliminary research the author(s) had done so far. This hindered their ability to properly assess the technical feasibility of the proposals and their probability of success. A few reviewers expressed that they also would have liked to have had a better sense of who would be conducting the research and each team member’s responsibilities. These issues arose because the FRO proposals used in our pilot had not originally been submitted for funding purposes, and thus lacked the requirements of traditional grant proposals, as we noted above. We assume this would not be an issue with proposals submitted to actual grantmakers.3  

Improving Milestone Design

A few reviewers pointed out that some of the proposal milestones were too ambiguous or were not worded specifically enough, such that there were ways that researchers could technically say that they had achieved the milestone without accomplishing the spirit of its intent. This made it more challenging for reviewers to assess milestones, since they weren’t sure whether to focus on the ideal (i.e., more impactful) interpretation of the milestone or to account for these “loopholes.” Moreover, loopholes skew the forecasts, since they increase the probability of achieving a milestone, while lowering the impact of doing so if it is achieved through a loophole.

One reviewer suggested, “I feel like the design of milestones should be far more carefully worded – or broken up into sub-sentences/sub-aims, to evaluate the feasibility of each. As the questions are currently broken down, I feel they create a perverse incentive to create a vaguer milestone, or one that can be more easily considered ‘achieved’ for some ‘good enough’ value of achieved.” For example, they proposed that one of the proposal milestones, “screen a library of tens of thousands of phage genes for enterobacteria for interactions and publish promising new interactions for the field to study,” could be expanded to

  1. “Generate a library of tens of thousands of genes from enterobacteria, expressed in E. coli
  2. “Validate their expression under screenable conditions
  3. “Screen the library for their ability to impede phage infection with a panel of 20 type phages
  4. “Publish … 
  5. “Store and distribute the library, making it as accessible to the broader community”

We agree with the need for careful consideration and design of milestones, given that “loopholes” in milestones can detract from their intended impact and make it harder for reviewers to accurately assess their likelihood. In our theoretical framework for this approach, we identified three potential parties that could be responsible for defining milestones: (1) the proposal author(s), (2) the program manager, with or without input from proposal authors, or (3) the reviewers, with or without input from proposal authors. This critique suggests that the first approach of allowing proposal authors to be the sole party responsible for defining proposal milestones is vulnerable to being gamed, and the second or third approach would be preferable. Program managers who take on the task of defining milestones should have enough expertise to think through the different potential ways of fulfilling a milestone and make sure that they are sufficiently precise for reviewers to assess.

Benefits of Flexibility in Milestones

Some flexibility in milestones may still be desirable, especially with respect to the actual methodology, since experimentation may be necessary to determine the best technique to use. For example, speaking about the feasibility of a different proposal milestone – “demonstrate that Pro-AG technology can be adapted to a single pathogenic bacterial strain in a 300 gallon aquarium of fish and successfully reduce antibiotic resistance by 90%” – a reviewer noted that 

“The main complexity and uncertainty around successful completion of this milestone arises from the native fish microbiome and whether a CRISPR delivery tool can reach the target strain in question. Due to the framing of this milestone, should a single strain be very difficult to reach, the authors could simply switch to a different target strain if necessary. Additionally, the mode of CRISPR delivery is not prescribed in reaching this milestone, so the authors have a host of different techniques open to them, including conjugative delivery by a probiotic donor or delivery by engineered bacteriophage.”

Peer Review Results

Sequential Milestones vs. Independent Outcomes

In our expected utility forecasting framework, we defined two different ways that a proposal could structure its outcomes: as sequential milestones where each additional milestone builds off of the success of the previous one, or as independent outcomes where the success of one is not dependent on the success of the other(s). For proposals with sequential milestones in our pilot, we would expect the probability of success of milestone 2 to be less than the probability of success of milestone 1 and for the opposite to be true of their impact scores. For proposals with independent outcomes, we do not expect there to be a relationship between the probability of success and the impact scores of milestones 1 and 2. There are different equations for calculating the total expected utility, depending on the relationship between outcomes (see Appendix C).

We categorized each proposal in our study based on whether it had sequential milestones or independent outcomes. This information was not shared with reviewers. Table 1 presents the average reviewer forecasts for each proposal. In general, milestones received higher scientific impact scores than social impact scores, which makes sense given the primarily academic focus of research proposals. For proposals 1 to 3, the probability of success of milestone 2 was roughly half of the probability of success of milestone 1; reviewers also gave milestone 2 higher scientific and social impact scores than milestone 1. This is consistent with our categorization of proposals 1 to 3 as sequential milestones.

Table 1. Mean forecasts for each proposal. M1 = Milestone 1; M2 = Milestone 2.
See next section for discussion about the categorization of proposal 4’s milestones.

| Proposal | Milestone Category | M1 Probability of Success | M1 Scientific Impact Score | M1 Social Impact Score | M2 Probability of Success | M2 Scientific Impact Score | M2 Social Impact Score |
|---|---|---|---|---|---|---|---|
| 1 | sequential | 0.80 | 7.83 | 7.35 | 0.41 | 8.22 | 8.25 |
| 2 | sequential | 0.88 | 6.41 | 3.72 | 0.36 | 8.21 | 7.62 |
| 3 | sequential | 0.68 | 7.07 | 6.45 | 0.34 | 8.20 | 7.50 |
| 4 | ? | 0.72 | 6.58 | 3.92 | 0.47 | 7.06 | 4.19 |
| 5 | independent | 0.55 | 7.14 | 2.37 | 0.40 | 6.66 | 2.25 |
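
The pattern expected of sequential milestones (milestone 2 less likely than milestone 1, but scored as more impactful) can be checked mechanically against the mean forecasts in Table 1. The following is a minimal sketch rather than part of the pilot's analysis; the function name `looks_sequential` and the strict-inequality rule are our own assumptions for illustration.

```python
# Mean forecasts copied from Table 1:
# (probability of success, scientific impact, social impact)
# for milestone 1 and milestone 2 of each proposal.
TABLE_1 = {
    1: ((0.80, 7.83, 7.35), (0.41, 8.22, 8.25)),
    2: ((0.88, 6.41, 3.72), (0.36, 8.21, 7.62)),
    3: ((0.68, 7.07, 6.45), (0.34, 8.20, 7.50)),
    4: ((0.72, 6.58, 3.92), (0.47, 7.06, 4.19)),
    5: ((0.55, 7.14, 2.37), (0.40, 6.66, 2.25)),
}


def looks_sequential(m1, m2):
    """Hypothetical check: milestone 2 is less likely than milestone 1
    but scores higher on both impact dimensions."""
    p1, sci1, soc1 = m1
    p2, sci2, soc2 = m2
    return p2 < p1 and sci2 > sci1 and soc2 > soc1


pattern = {pid: looks_sequential(m1, m2) for pid, (m1, m2) in TABLE_1.items()}
for pid, matches in pattern.items():
    verdict = "matches" if matches else "does not match"
    print(f"Proposal {pid}: {verdict} the sequential pattern")
```

Because this check uses panel means, it cannot reveal disagreement among individual reviewers, so it is best read as a first-pass screen rather than a categorization rule.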

Further Discussion on Designing and Categorizing Milestones

We originally categorized proposal 4’s milestones as sequential, but one reviewer gave milestone 2 a lower scientific impact score than milestone 1 and two reviewers gave it a lower social impact score. One reviewer also gave milestone 2 roughly the same probability of success as milestone 1. This suggests that proposal 4’s milestones can’t be considered strictly sequential. 

The two milestones for proposal 4 were

The reviewer who gave milestone 2 a lower scientific impact score explained: “Given the wording of the milestone, I do not believe that if the scientific milestone was achieved, it would greatly improve our understanding of the brain.” Unlike proposals 1-3, in which milestone 2 was a scaled-up or improved-upon version of milestone 1, these milestones represent fundamentally different categories of output (general-purpose tool vs specific model). Thus, despite the necessity of milestone 1’s tool for achieving milestone 2, the reviewer’s response suggests that the impact of milestone 2 was being considered separately rather than cumulatively.

Milestone Design Recommendations
Recommendation 1: Explicitly define sequential milestones

To properly address this case of sequential milestones with different types of outputs, we recommend that for all sequential milestones, latter milestones should be explicitly defined as inclusive of prior milestones. In the above example, this would imply redefining milestone 2 as “Complete milestone 1 and develop a model of the C. elegans nervous system…” This way, reviewers know to include the impact of milestone 1 in their assessment of the impact of milestone 2.

Recommendation 2: Clarify milestone category with reviewers

To help ensure that reviewers are aligned with program managers in how they interpret the proposal milestones (if they aren’t directly involved in defining milestones), we suggest one of two options: either inform reviewers of how program managers are categorizing the proposal outputs so they can conduct their review accordingly, or allow reviewers to decide the category (and thus how the total expected utility is calculated), whether individually, collectively, or both.

Recommendation 3: Allow for a flexible number of milestones

We chose to use only two of the goals that proposal authors provided because we wanted to standardize the number of milestones across proposals. However, this may have provided an incomplete picture of the proposals’ goals, and thus an incomplete assessment of the proposals. We recommend that future implementations be flexible and allow the number of milestones to be determined based on each proposal’s needs. This would also help accommodate one reviewer’s suggestion that some milestones should be broken down into intermediary steps.

Importance of Reviewer Explanations

As one can tell from the above discussion, reviewers’ explanations of their forecasts were crucial to understanding how they interpreted the milestones. Reviewers’ explanations varied in length and detail, but the most insightful responses broke down their reasoning into detailed steps and addressed (1) ambiguities in the milestone and how they chose to interpret ambiguities if they existed, (2) the state of the scientific field and the maturity of different techniques that the authors propose to use, and (3) factors that improve the likelihood of success versus potential barriers or challenges that would need to be overcome.

Exponential Impact Scales Better Reflect the Real Distribution of Impact 

The distribution of NIH and NSF proposal peer review scores tends to be skewed such that most proposals are rated above the center of the scale and there are few proposals rated poorly. However, other markers of scientific impact, such as citations (even with all of their imperfections), tend to suggest a long tail of studies with high impact. This discrepancy suggests that traditional peer review scoring systems are not well-structured to capture the nonlinearity of scientific impact, resulting in score inflation. The aggregation of scores at the top end of the scale also means that very negative scores have a greater impact than very positive scores when averaged together, since there’s more room between the average score and the bottom end of the scale. This can generate systemic bias against more controversial or risky proposals.
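
The averaging asymmetry described above is simple arithmetic, and a small example makes it concrete. The sketch below uses an entirely hypothetical panel on a linear 1 to 10 scale where higher is better; the scores are invented for illustration and are not drawn from NIH, NSF, or our pilot.

```python
from statistics import mean

# Hypothetical panel on a linear 1-10 scale, bunched near the top.
panel = [8, 8, 8, 8]
baseline = mean(panel)

# Add one dissenting reviewer at either extreme of the scale.
with_low_outlier = mean(panel + [1])
with_high_outlier = mean(panel + [10])

drop = baseline - with_low_outlier   # pull from a very negative score
rise = with_high_outlier - baseline  # pull from a very positive score
print(f"drop from a score of 1: {drop:.1f}; rise from a score of 10: {rise:.1f}")
```

In this example a single very negative score lowers the panel average by 1.4 points, while a single very positive score raises it by only 0.4 points.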

In our pilot, we chose to use an exponential scale with a base of 2 for impact to better reflect the real distribution of scientific impact. Using this exponential impact scale, we conducted a survey of a small pool of academics in the life sciences about how they would rate the impact of the average funded NIH R01 grant. They responded with an average scientific impact score of 5 and an average social impact score of 3, which are much lower on our scale compared to traditional peer review scores4, suggesting that the exponential scale may be beneficial for avoiding score inflation and bunching at the top. In our pilot, the distribution of scientific impact scores was centered higher than 5, but still less skewed than NIH peer review scores for significance and innovation typically are. This partially reflects the fact that proposals were expected to be funded at one to two orders of magnitude more than NIH R01 proposals are, so impact should also be greater. The distribution of social impact scores exhibits a much wider spread and lower center.

Figure 1. Distribution of Impact scores for milestone 1 (top) and 2 (bottom)

Conclusion

In summary, expected utility forecasting presents a promising approach to improving the rigor of peer review and quantitatively defining the risk-reward profile of science proposals. Our pilot study suggests that this approach can be quite user-friendly for reviewers, despite its apparent complexity. Further study into how best to integrate forecasting into panel environments, define proposal milestones, and calibrate impact scales will help refine future implementations of this approach. 

More broadly, we hope that this pilot will encourage more grantmaking institutions to experiment with innovative funding mechanisms. Reviewers in our pilot were more open-minded and quick-to-learn than one might expect and saw significant value in this unconventional approach. Perhaps this should not be so much of a surprise given that experimentation is at the heart of scientific research. 

Grantmakers, both public and private, and policymakers who are interested in learning more or receiving assistance in implementing this approach are welcome to reach out to our team. 

Acknowledgements

Many thanks to Jordan Dworkin for being an incredible thought partner in designing the pilot and providing meticulous feedback on this report. Your efforts made this project possible!


Appendix A: Pilot Study Design

Our pilot study consisted of five proposals for life science-related Focused Research Organizations (FROs). These proposals were solicited from academic researchers by FAS as part of our advocacy for the concept of FROs. As such, these proposals were not originally intended as proposals for direct funding, and did not have as strict content requirements as traditional grant proposals typically do. Researchers were asked to submit one to two page proposals discussing (1) their research concept, (2) the motivation and its expected social and scientific impact, and (3) the rationale for why this research cannot be accomplished through traditional funding channels and thus requires a FRO to be funded.

Permission was obtained from proposal authors to use their proposals in this study. We worked with proposal authors to define two milestones for each proposal that reviewers would assess: one that they felt confident that they could achieve and one that was more ambitious but that they still thought was feasible. In addition, due to the brevity of the proposals, we included an additional 1-2 pages of supplementary information and scientific context. Final drafts of the milestones and supplementary information were provided to authors to edit and approve. Because this pilot study could not provide any actual funding to proposal authors, it was not possible to solicit full length research proposals from proposal authors.

We recruited four to six reviewers for each proposal based on their subject matter expertise. Potential participants were recruited over email with a request to help review a FRO proposal related to their area of research. They were informed that the review process would be unconventional but were not informed of the study’s purpose. Participants were offered a small monetary compensation for their time.

Confirmed participants were sent instructions and materials for the review process on the same day and were asked to complete their review by the same deadline a month and a half later. Reviewers were told to assume that, if funded, each proposal would receive $50 million in funding over five years to conduct the research, consistent with the proposed model for FROs. Each proposal had two technical milestones, and reviewers were asked to answer the following questions for each milestone: 

  1. Assuming that the proposal is funded by 2025, will the milestone be achieved before 2031?
  2. What will be the average scientific impact score, as judged in 2032, of accomplishing the milestone?
  3. What will be the average social impact score, as judged in 2032, of accomplishing the milestone?

The impact scoring system was explained to reviewers as follows:

Please consider the following in determining the impact score: the current and expected long-term social or scientific impact of a funded FRO’s outputs if a funded FRO accomplishes this milestone before 2030.

The impact score we are using ranges from 1 (low) to 10 (high). It is base 2 exponential, meaning that a proposal that receives a score of 5 has double the impact of a proposal that receives a score of 4, and quadruple the impact of a proposal that receives a score of 3. In a small survey we conducted of SMEs in the life sciences, they rated the scientific and social impact of the average NIH R01 grant — a federally funded research grant that provides $1-2 million for a 3-5 year endeavor — on this scale to be 5.2 ± 1.5 and 3.1 ± 1.3, respectively. The median scores were 4.75 and 3.00, respectively.

Below is an example of how a predicted impact score distribution (left) would translate into an actual impact distribution (right). You can try it out yourself with this interactive version (in the menu bar, click Runtime > Run all) to get some further intuition on how the impact score works. Please note that this is meant solely for instructive purposes, and the interface is not designed to match Metaculus’ interface.

The choice of an exponential impact scale reflects the tendency in science for a small number of research projects to have an outsized impact. For example, studies have shown that the relationship between the number of citations for a journal article and its percentile rank scales exponentially.

Scientific impact aims to capture the extent to which a project advances the frontiers of knowledge, enables new discoveries or innovations, or enhances scientific capabilities or methods. Though each is imperfect, one could consider citations of papers, patents on tools or methods, or users of software or datasets as proxies of scientific impact. 

Social impact aims to capture the extent to which a project contributes to solving important societal problems, improving well-being, or advancing social goals. Some proxy metrics that one might use to assess a project’s social impact are the value of lives saved, the cost of illness prevented, the number of job-years of employment generated, economic output in terms of GDP, or the social return on investment. 

You may consider any or none of these proxy metrics as a part of your assessment of the impact of a FRO accomplishing this milestone.
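
The translation from a predicted impact score distribution into an actual impact distribution, as described in the instructions above, can be sketched in a few lines. This is not the interactive version that reviewers were given. It assumes, purely for illustration, that predicted scores are normally distributed, and it borrows the surveyed R01 scientific impact figures (5.2 and 1.5), treating 1.5 as a standard deviation, which is also an assumption.

```python
from statistics import NormalDist, mean, median

# Assumption for illustration only: predicted scores are normally distributed.
score_dist = NormalDist(mu=5.2, sigma=1.5)

# Evenly spaced quantiles stand in for a sample of predicted scores.
quantiles = [i / 100 for i in range(1, 100)]
scores = [score_dist.inv_cdf(q) for q in quantiles]

# Base 2 exponential scale: impact = 2 ** score.
impacts = [2 ** s for s in scores]

print(f"median score: {median(scores):.2f}, median impact: {median(impacts):.1f}")
print(f"mean impact: {mean(impacts):.1f}")
print(f"impact implied by the mean score: {2 ** mean(scores):.1f}")
```

A symmetric score distribution becomes a right-skewed impact distribution: the mean impact exceeds the impact implied by the mean score, because high scores contribute disproportionately on an exponential scale.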

Reviewers were asked to submit their forecasts on Metaculus’ website and to provide their reasoning in a separate Google form. For question 1, reviewers were asked to respond with a single probability. For questions 2 and 3, reviewers were asked to provide their median, 25th percentile, and 75th percentile predictions, in order to generate a probability distribution. Metaculus’ website also included information on the resolution criteria of each question, which provided guidance to reviewers on how to answer the question. Individual reviewers were blind to other reviewers’ responses until after the submission deadline, at which point the aggregated results of all of the responses were made public on Metaculus’ website. 

Additionally, in the Google form, reviewers were asked to answer a survey question about their experience: “What did you think about this review process? Did it prompt you to think about the proposal in a different way than when you normally review proposals? If so, how? What did you like about it? What did you not like? What would you change about it if you could?” 

Some participants did not complete their review. We received 19 complete reviews in the end, with each proposal receiving three to six reviews. 

Study Limitations

Our pilot study had certain limitations that should be noted. Since FAS is not a grantmaking institution, we could not completely reproduce the same types of research proposals that a grantmaking institution would receive nor the entire review process. We will highlight these differences in comparison to federal science agencies, which are our primary focus.

  1. Review Process: There are typically two phases to peer review at NIH and NSF. First, at least three individual reviewers with relevant subject matter expertise are assigned to read and evaluate a proposal independently. Then, a larger committee of experts is convened. There, the assigned reviewers present the proposal and their evaluation, and then the committee discusses and determines the final score for the proposal. Our pilot study only attempted to replicate the first phase of individual review.
  2. Sample Size: In our pilot, the sample size was quite small, since only five proposals were reviewed, and they were all in different subfields, so different reviewers were assigned to each proposal. NIH and NSF peer review committees typically focus on one subfield and review on the order of twenty or so proposals. The number of reviewers per proposal (three to six) in our pilot was consistent with the number of reviewers typically assigned to a proposal by NIH and NSF. Peer review committees are typically larger, ranging from six to twenty people, depending on the agency and the field.
  3. Proposals: The FRO proposals plus supplementary information were only two to four pages long, which is significantly shorter than the 12 to 15 page proposals that researchers submit for NIH and NSF grants. Proposal authors were asked to generally describe their research concept, but were not explicitly required to describe the details of the research methodology they would use or any preliminary research. Some proposal authors volunteered more information on this for the supplementary information, but not all authors did. 
  4. Grant Size: For the FRO proposals, reviewers were asked to assume that funded proposals would receive $50 million over five years, which is one to two orders of magnitude more funding than typical NIH and NSF proposals.

Appendix B: Feedback on Study-Specific Implementation

In addition to feedback about the review framework, we received feedback on how we implemented our pilot study, specifically the instructions and materials for the review process and the submission platforms. This feedback isn’t central to this paper’s investigation of expected utility forecasting, but we wanted to include it in the appendix for transparency.

Reviewers were sent instructions over email that outlined the review process and linked to Metaculus’ webpage for this pilot. On Metaculus’ website, reviewers could find links to the proposals on FAS’ website and the supplementary information in Google docs. Reviewers were expected to read those first and then read through the resolution criteria for each forecasting question before submitting their answers on Metaculus’ platform. Reviewers were asked to submit the explanations behind their forecasts in a separate Google form.

Some reviewers had no problem navigating the review process and found Metaculus’ website easy to use. However, feedback from other reviewers suggested that the different components necessary for the review were spread out over too many different websites, making it difficult for reviewers to keep track of where to find everything they needed.

Some had trouble locating the different materials and pieces of information needed to conduct the review on Metaculus’ website. Others found it confusing to have to submit their forecasts and explanations in two separate places. One reviewer suggested that the explanation of the impact scoring system should have been included within the instructions sent over email rather than in the resolution criteria on Metaculus’ website so that they could have read it before reading the proposal. Another reviewer suggested that it would have been simpler to submit their forecasts through the same Google form that they used to submit their explanations rather than through Metaculus’ website. 

Based on this feedback, we would recommend that future implementations streamline their submission process to a single platform and provide a more extensive set of instructions rather than seeding information across different steps of the review process. Training sessions, which science funding agencies typically conduct, would be a good supplement to written instructions.

Appendix C: Total Expected Utility Calculations

To calculate the total expected utility, we first converted all of the impact scores into utility by raising two to the power of the impact score, since the impact scoring system is base 2 exponential:

$$\text{Utility} = 2^{\text{Impact Score}}.$$

We then were able to average the utilities for each milestone and conduct additional calculations. 

To calculate the total utility of each milestone, $u_i$, we averaged the social utility and the scientific utility of the milestone:

$$u_i = \frac{\text{Social Utility} + \text{Scientific Utility}}{2}.$$
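
The conversion and averaging steps above can be sketched as follows. The function names are our own, chosen for illustration, and the example scores of 5 and 3 are hypothetical inputs rather than results from the pilot.

```python
import math


def utility(score):
    """Convert an impact score to utility: Utility = 2 ** score."""
    return 2.0 ** score


def milestone_utility(scientific_score, social_score):
    """Total utility of a milestone: the average of its social
    and scientific utilities."""
    return (utility(social_score) + utility(scientific_score)) / 2.0


def to_score(u):
    """Convert a utility back to the impact score scale (log base 2)."""
    return math.log2(u)


# Hypothetical milestone: scientific impact 5, social impact 3.
u = milestone_utility(scientific_score=5, social_score=3)
print(f"utility = {u}, equivalent impact score = {to_score(u):.2f}")
```

Averaging in utility space gives (32 + 8) / 2 = 20, which corresponds to an impact score of about 4.32. This is higher than the arithmetic mean of the two scores (4), because the larger score dominates on an exponential scale.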

The total expected utility (TEU) of a proposal with two milestones can be calculated according to the general equation:

$$\text{TEU} = u_1 P(m_1 \cap \neg m_2) + u_2 P(m_2 \cap \neg m_1) + (u_1 + u_2) P(m_1 \cap m_2),$$

where $P(m_i)$ represents the probability of success of milestone $i$ and

$$P(m_1 \cap \neg m_2) = P(m_1) - P(m_1 \cap m_2)$$
$$P(m_2 \cap \neg m_1) = P(m_2) - P(m_1 \cap m_2).$$

For sequential milestones, milestone 2 is defined as inclusive of milestone 1 and wholly dependent on the success of milestone 1, so this means that

$$u_{2,\text{seq}} = u_1 + u_2$$
$$P(m_2) = P_{\text{seq}}(m_1 \cap m_2)$$
$$P(m_2 \cap \neg m_1) = 0.$$

Thus, the total expected utility of sequential milestones can be simplified as

$$\text{TEU} = u_1 P(m_1) - u_1 P(m_2) + u_{2,\text{seq}} P(m_2)$$
$$\text{TEU} = u_1 P(m_1) + (u_{2,\text{seq}} - u_1) P(m_2).$$

This can be generalized to

$$\text{TEU}_{\text{seq}} = \sum_i (u_{i,\text{seq}} - u_{i-1,\text{seq}}) P(m_i).$$
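
As a sanity check on the sequential simplification, the general equation and the generalized sequential form can be compared numerically. The sketch below is illustrative only: the function names and input values are hypothetical, and it assumes that the utility before any milestone is reached is zero.

```python
import math


def teu_general(u1, u2, p1, p2, p_both):
    """General two-milestone equation:
    u1*P(m1 and not m2) + u2*P(m2 and not m1) + (u1 + u2)*P(m1 and m2)."""
    p1_only = p1 - p_both
    p2_only = p2 - p_both
    return u1 * p1_only + u2 * p2_only + (u1 + u2) * p_both


def teu_sequential(u_seq, probs):
    """Generalized sequential form: sum over i of
    (u_i,seq - u_(i-1),seq) * P(m_i), where u_seq holds cumulative
    (inclusive) utilities and the utility before milestone 1 is assumed 0."""
    total, previous = 0.0, 0.0
    for u, p in zip(u_seq, probs):
        total += (u - previous) * p
        previous = u
    return total


# Hypothetical inputs: incremental utilities and success probabilities.
u1, u2 = 200.0, 100.0
p1, p2 = 0.8, 0.4

# Sequential case: milestone 2 implies milestone 1, so P(m1 and m2) = P(m2).
general = teu_general(u1, u2, p1, p2, p_both=p2)
sequential = teu_sequential([u1, u1 + u2], [p1, p2])
print(math.isclose(general, sequential))
```

Passing cumulative utilities to `teu_sequential` mirrors the definition of $u_{2,\text{seq}}$ as inclusive of milestone 1, and the same function extends to more than two milestones without modification.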

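Otherwise, substituting the expressions for P(m1 ∩ not m2) and P(m2 ∩ not m1) into the general equation, the terms involving the joint probability cancel, and the total expected utility simplifies to

$$\text{TEU} = u_1 P(m_1) + u_2 P(m_2).$$

For independent outcomes, we assume

$$P_{\text{ind}}(m_1 \cap m_2) = P(m_1) P(m_2),$$

but because the joint term cancels, the result does not depend on this assumption:

$$\text{TEU}_{\text{ind}} = u_1 P(m_1) + u_2 P(m_2).$$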

To present the results in Tables 1 and 2, we converted all of the utility values back into the impact score scale by taking the log base 2 of the results.

Scaling AI Safely: Can Preparedness Frameworks Pull Their Weight?

A new class of risk mitigation policies has recently come into vogue for frontier AI developers. Known alternately as Responsible Scaling Policies or Preparedness Frameworks, these policies outline commitments to risk mitigations that developers of the most advanced AI models will implement as their models display increasingly risky capabilities. While the idea for these policies is less than a year old, already two of the most advanced AI developers, Anthropic and OpenAI, have published initial versions of these policies. The U.K. AI Safety Institute asked frontier AI developers about their “Responsible Capability Scaling” policies ahead of the November 2023 UK AI Safety Summit. It seems that these policies are here to stay.

The National Institute of Standards & Technology (NIST) recently sought public input on its assignments regarding generative AI risk management, AI evaluation, and red-teaming. The Federation of American Scientists was happy to provide input; this is the full text of our response. NIST’s request for information (RFI) highlighted several potential risks and impacts of potentially dual-use foundation models, including: “Negative effects of system interaction and tool use…chemical, biological, radiological, and nuclear (CBRN) risks…[e]nhancing or otherwise affecting malign cyber actors’ capabilities…[and i]mpacts to individuals and society.” This RFI presented a good opportunity for us to discuss the benefits and drawbacks of these new risk mitigation policies.

This report will provide some background on this class of risk mitigation policies (we use the term Preparedness Framework, for reasons to be described below). We outline suggested criteria for robust Preparedness Frameworks (PFs) and evaluate two key documents, Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework, against these criteria. We claim that these policies are net-positive and should be encouraged. At the same time, we identify shortcomings of current PFs, chiefly that they are underspecified, insufficiently conservative, and address structural risks poorly. Improvement in the state of the art of risk evaluation for frontier AI models is a prerequisite for a meaningfully binding PF. Most importantly, PFs, as unilateral commitments by private actors, cannot replace public policy.

Motivation for Preparedness Frameworks

As AI labs develop potentially dual-use foundation models (as defined by Executive Order No. 14110, the “AI EO”) with capability, compute, and efficiency improvements, novel risks may emerge, some of them potentially catastrophic. Today’s foundation models can already cause harm and pose some risks, especially as they are more broadly used. Advanced large language models at times display unpredictable behaviors.

To this point, these harms have not risen to the level of posing catastrophic risks, defined here broadly as “devastating consequences for vast numbers of people.” The capabilities of models at the current state of the art simply do not imply levels of catastrophic risk above current non-AI related margins.1 However, as these models continue to scale in training compute, some speculate they may develop novel capabilities that could potentially be misused. The specific capabilities that will emerge from further scaling remain difficult to predict with confidence. Some analysis indicates that as training compute for AI models has doubled approximately every six months since 2015, performance on capability benchmarks has also steadily improved. While bigger models may well perform better, it would not be surprising if smaller models emerged with better capabilities: despite years of research by machine learning theorists, how the number of model parameters relates to model capabilities remains uncertain. 

Nonetheless, as capabilities increase, risks may also increase, and new risks may appear. The AI EO detailed some novel risks of potentially dual-use foundation models, including chemical, biological, radiological, or nuclear (CBRN) risks and advanced cybersecurity risks. Other risks are more speculative, such as risks of model autonomy, loss of control of AI systems, or negative impacts on users including risks of persuasion.2 Without robust risk mitigations, it is plausible that increasingly powerful AI systems will eventually pose greater societal risks.

Other technologies that pose catastrophic risks, such as nuclear technologies, are heavily regulated in order to prevent those risks from resulting in serious harms. There is a growing movement to regulate development of potentially dual-use biotechnologies, particularly gain-of-function research on the most pathogenic microbes. Given the rapid pace of progress at the AI frontier, comprehensive government regulation has yet to catch up; private companies that develop these models are starting to take it upon themselves to prevent or mitigate the risks of advanced AI development.

Prevention of such novel and consequential risks requires developers to implement policies that address potential risks iteratively. That is where preparedness frameworks come in. A preparedness framework is used to assess risk levels across key categories and outline associated risk mitigations. As the introduction to OpenAI’s PF states, “The processes laid out in each version of the Preparedness Framework will help us rapidly improve our understanding of the science and empirical texture of catastrophic risk, and establish the processes needed to protect against unsafe development.” Without such processes and commitments, the tendency to prioritize speed over safety concerns might prevail. While the exact consequences of failing to mitigate these risks are uncertain, they could potentially be significant.

Preparedness frameworks are limited in scope to catastrophic risks. These policies aim to prevent the worst conceivable outcomes of the development of future advanced AI systems; they are not intended to cover risks from existing systems. We acknowledge that this is an important limitation of preparedness frameworks. Developers can and should address both today’s risks and future risks at the same time; preparedness frameworks attempt to address the latter, while other “trustworthy AI” policies attempt to address a broader swathe of risks. For instance, OpenAI’s “Preparedness” team sits alongside its “Safety Systems” team, which “focuses on mitigating misuse of current models and products like ChatGPT.”

A note about terminology: “Responsible Scaling Policy” (RSP) is the term that took hold first, but it presupposes scaling of compute and capabilities by default. “Preparedness Framework” (PF) is a term coined by OpenAI, and it communicates the idea that the company needs to be prepared as its models approach the level of artificial general intelligence. Of the two options, “Preparedness Framework” communicates the essential idea more clearly: developers of potentially dual-use foundation models must be prepared for and mitigate potential catastrophic risks from development of these models.

The Industry Landscape

In September of 2023, ARC Evals (now METR, “Model Evaluation & Threat Research”) published a blog post titled “Responsible Scaling Policies (RSPs).” This post outlined the motivation and basic structure of an RSP, and revealed that ARC Evals had helped Anthropic write its RSP (version 1.0) which had been released publicly a few days prior. (ARC Evals had also run pre-deployment evaluations on Anthropic’s Claude model and OpenAI’s GPT-4.) And in December 2023, OpenAI published its Preparedness Framework in beta; while using new terminology, this document is structurally similar to ARC Evals’ outline of the structure of an RSP. Both OpenAI and Anthropic have indicated that they plan to update their PFs with new information as the frontier of AI development advances.

Not every AI company should develop or maintain a preparedness framework. Since these policies relate to catastrophic risk from models with advanced capabilities, only those developers whose models could plausibly attain those capabilities should use PFs. Because these advanced capabilities are associated with high levels of training compute, a good interim threshold for who should develop a PF could be the same as the AI EO threshold for potentially dual-use foundation models; that is, developers of models trained using more than 10^26 floating-point operations (or the October 2023-equivalent level of compute, adjusted for compute efficiency gains).3 Currently, only a handful of developers have models that even approach this threshold. This threshold should be subject to change, like that of the AI EO, as developers continue to push the frontier (e.g., by developing more efficient algorithms or realizing other compute efficiency gains).
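As a rough illustration of how a developer might check itself against this interim threshold, the Python sketch below compares an estimate of total training compute with the 10^26 figure. The estimation rule (about 6 operations per parameter per training token) is a common rule of thumb for dense transformer models and is our assumption; it does not appear in the AI EO or in either company's framework. The function names and example model sizes are likewise hypothetical.

```python
# Interim threshold discussed in the text: total operations used in training.
AI_EO_THRESHOLD_OPS = 1e26


def estimate_training_ops(n_parameters: float, n_tokens: float) -> float:
    """Rough estimate of total training operations.

    ASSUMPTION: uses the ~6 * parameters * tokens rule of thumb for dense
    transformers. It is an approximation, not an official methodology.
    """
    return 6.0 * n_parameters * n_tokens


def should_maintain_pf(training_ops: float,
                       threshold: float = AI_EO_THRESHOLD_OPS) -> bool:
    """True if training compute exceeds the interim threshold."""
    return training_ops > threshold


if __name__ == "__main__":
    # Hypothetical models, chosen only to fall on either side of the threshold.
    small = estimate_training_ops(n_parameters=7e10, n_tokens=2e12)
    large = estimate_training_ops(n_parameters=1e12, n_tokens=2e13)
    print(should_maintain_pf(small), should_maintain_pf(large))
```

Keeping the threshold as a parameter reflects the point above that it should be revisited as compute efficiency improves.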

While several other companies published “Responsible Capability Scaling” documents ahead of the UK AI Safety Summit, including DeepMind, Meta, Microsoft, Amazon, and Inflection AI, the rest of this report focuses primarily on OpenAI’s PF and Anthropic’s RSP. 

Weaknesses of Preparedness Frameworks

Preparedness frameworks are not panaceas for AI-associated risks. Even with improvements in specificity, transparency, and strengthened risk mitigations, there are important weaknesses to the use of PFs. Here we outline two weaknesses of PFs and possible answers to them.

1. Spirit vs. text: PFs are voluntary commitments whose success depends on developers’ faithfulness to their principles.

Current risk thresholds and mitigations are defined loosely. In Anthropic’s RSP, for instance, the jump from the current risk level posed by Claude 2 (its state-of-the-art model) to the next risk level is defined in part by the following: “Access to the model would substantially increase the risk of catastrophic misuse, either by proliferating capabilities, lowering costs, or enabling new methods of attack….” A “substantial increase” is not well-defined. This ambiguity leaves room for interpretation; since implementing risk mitigations can be costly, developers could have an incentive to take advantage of such ambiguity if they do not follow the spirit of the policy.

This concern about the gap between following the spirit of the PF and following the text might be somewhat eased with more specificity about risk thresholds and associated mitigations, and especially with more transparency and public accountability to these commitments.

To their credit, OpenAI’s PF and Anthropic’s RSP show a serious approach to the risks of developing increasingly advanced AI systems. OpenAI’s PF includes a commitment to fine-tune its models to better elicit capabilities along particular risk categories, then evaluate “against these enhanced models to ensure we are testing against the ‘worst case’ scenario we know of.” They also commit to triggering risk mitigations “when any of the tracked risk categories increase in severity, rather than only when they all increase together.” And Anthropic “commit[s] to pause the scaling and/or delay the deployment of new models whenever our scaling ability outstrips our ability to comply with the safety procedures for the corresponding ASL [AI Safety Level].” These commitments are costly signals that these developers are serious about their PFs.

2. Private commitment vs. public policy: PFs are unilateral commitments that individual developers take on; we might prefer more universal policy (or regulatory) approaches.

Private companies developing AI systems may not fully account for broader societal risks. Consider an analogy to climate change—no single company’s emissions are solely responsible for risks like sea level rise or extreme weather. The risk comes from the aggregate emissions of all companies. Similarly, AI developers may not consider how their systems interact with others across society, potentially creating structural risks. Like climate change, the societal risks from AI will likely come from the cumulative impact of many different systems. Unilateral commitments are poor tools to address such risks.

Furthermore, PFs might reduce the urgency for government intervention. By appearing safety-conscious, developers could diminish the perceived need for regulatory measures. Policymakers might over-rely on self-regulation by AI developers, potentially compromising public interest for private gains.

Policy can and should step into the gap left by PFs. Policy is more aligned to the public good, and as such is less subject to competing incentives. And policy can be enforced, unlike voluntary commitments. In general, preparedness frameworks and similar policies help hold private actors accountable to their public commitments; this effect is stronger with more specificity in defining risk thresholds, better evaluation methods, and more transparency in reporting. However, these policies cannot and should not replace government action to reduce catastrophic risks (especially structural risks) of frontier AI systems.

Suggested Criteria for Robust Preparedness Frameworks

These criteria are adapted from the ARC Evals post, Anthropic’s RSP, and OpenAI’s PF. Broadly, they are aspirational; no existing preparedness framework meets all or most of these criteria.

For each criterion, we explain the key considerations for developers adopting PFs. We analyze OpenAI’s PF and Anthropic’s RSP to illustrate the strengths and shortcomings of their approaches. Again, these policies are net-positive and should be encouraged. They demonstrate costly unilateral commitments to measuring and addressing catastrophic risk from their models; they meaningfully improve on the status quo. However, these initial PFs are underspecified and insufficiently conservative. Improvement in the state of the art of risk evaluation and mitigation, and subsequent updates, would make them more robust.

Table 1: Summary of suggested criteria for robust preparedness frameworks.
| Criterion | Description | Guiding question |
| --- | --- | --- |
| Breadth | Preparedness frameworks should cover the breadth of potential catastrophic risks of developing frontier AI models. | “What risks are covered?” |
| Risk appetite | Preparedness frameworks should define the developer’s acceptable risk level (“risk appetite”) in terms of likelihood and severity of risk. | “What is an acceptable level of risk?” |
| Clarity | Preparedness frameworks should clearly define capability levels and risk thresholds. | “How will developers know they have hit capability levels associated with particular risks?” |
| Evaluation | Preparedness frameworks should include detailed evaluation procedures for AI models, ensuring comprehensive risk assessment. | “What tests will developers run on their models?” |
| Mitigation | For different risk thresholds, preparedness frameworks should identify and commit to pre-specified risk mitigations. | “What will developers do when their models reach particular levels of risk?” |
| Robustness | Preparedness frameworks’ pre-specified risk mitigations must effectively address potentially catastrophic risks. | “How do developers know their risk mitigations will work?” |
| Accountability | Preparedness frameworks should combine credible risk mitigation commitments with governance structures that ensure these commitments are fulfilled. | “How can developers hold themselves accountable to their commitment to safety?” |
| Amendments | Preparedness frameworks should include a mechanism for regular updates to the framework itself, in light of ongoing research and advances in AI. | “How will developers change their PFs over time?” |
| Transparency | For models with risk above the lowest level, both pre- and post-mitigation evaluation results and methods should be public, including any performed mitigations. | “How will developers communicate about their models’ capabilities and risks?” |

1. Preparedness frameworks should cover the breadth of potential catastrophic risks of developing frontier AI models. 

These risks may include:

Preparedness frameworks should apply to catastrophic risks in particular because they govern the scaling of capabilities of the most advanced AI models, and because catastrophic risks are of the highest consequence to such development. PFs are one tool among many that developers of the most advanced AI models should use to prevent harm. Developers of advanced AI models tend to also have other “trustworthy AI” policies, which seek to prevent and address already-existing risks such as harmful outputs, disinformation, and synthetic sexual content. Despite PFs’ focus on potentially catastrophic risks, faithfully applying PFs may help developers catch many other kinds of risks as well, since they involve extensive evaluation for misuse potential and adverse human impacts.

2. Preparedness frameworks should define the developer’s acceptable risk level (“risk appetite”) in terms of likelihood and severity of risk, in accordance with the NIST AI Risk Management Framework, section Map 1.5.

Neither OpenAI nor Anthropic has publicly declared their risk appetite. This is a nascent field of research, as these risks are novel and perhaps less predictable than, e.g., nuclear accident risk.5 NIST and other standard-setting bodies will be crucial in developing AI risk metrology. For now, PFs should state developers’ risk appetites as clearly as possible, and update them regularly with research advances.6

AI developers’ risk appetites might be different from a regulatory risk appetite. Developers should elucidate their risk appetite in quantitative terms so their PFs can be evaluated accordingly. As in the case of nuclear technology, regulators may eventually impose risk thresholds on frontier AI developers. At this point, however, there is no standard, scientifically grounded approach to measuring the potential for catastrophic AI risk; this has to start with the developers of the most capable AI models.
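As a minimal sketch of what a quantitative risk appetite could look like, the Python example below expresses an appetite as a maximum tolerable annual probability for each severity tier and checks a risk estimate against it. Every tier name and number here is hypothetical and chosen purely for illustration; neither developer has published figures of this kind, and producing defensible probability estimates remains the open research problem described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEstimate:
    scenario: str
    annual_probability: float  # estimated likelihood per year, in [0, 1]
    severity: str              # must match a tier in the appetite table


# HYPOTHETICAL appetite: maximum tolerable annual probability per severity tier.
# These values are placeholders, not recommendations.
HYPOTHETICAL_APPETITE = {
    "serious": 1e-2,
    "severe": 1e-4,
    "catastrophic": 1e-6,
}


def within_appetite(estimate: RiskEstimate,
                    appetite: dict = HYPOTHETICAL_APPETITE) -> bool:
    """True if the estimated likelihood is at or below the tolerated level."""
    if not 0.0 <= estimate.annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if estimate.severity not in appetite:
        raise KeyError(f"unknown severity tier: {estimate.severity}")
    return estimate.annual_probability <= appetite[estimate.severity]


if __name__ == "__main__":
    example = RiskEstimate("illustrative misuse scenario", 1e-5, "catastrophic")
    print(within_appetite(example))
```

Stating the appetite as a table like this, even with wide uncertainty around the inputs, would give outside reviewers something specific to evaluate a PF against.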

3. Preparedness frameworks should clearly define capability levels and risk thresholds. Risk thresholds should be quantified robustly enough to hold developers accountable to their commitments.

OpenAI and Anthropic both outline qualitative risk thresholds corresponding with different categories of risk. For instance, in OpenAI’s PF, the High risk threshold in the CBRN category reads: “Model enables an expert to develop a novel threat vector OR model provides meaningfully improved assistance that enables anyone with basic training in a relevant field (e.g., introductory undergraduate biology course) to be able to create a CBRN threat.” And Anthropic’s RSP defines the ASL-3 [AI Safety Level] threshold as: “Low-level autonomous capabilities, or access to the model would substantially increase the risk of catastrophic misuse, either by proliferating capabilities, lowering costs, or enabling new methods of attack, as compared to a non-LLM baseline of risk.”

These qualitative thresholds are under-specified; reasonable people are likely to differ on what “meaningfully improved assistance” looks like, or a “substantial increase [in] the risk of catastrophic misuse.” In PFs, these thresholds should be quantified to the extent possible.

To be sure, the AI development research community currently lacks a good empirical understanding of the likelihood or quantification of frontier AI-related risks. Again, this is a novel science that needs to be developed with input from both the private and public sectors. Since this science is still developing, it is natural to want to avoid too much quantification. A conceivable failure mode is that developers “check the boxes,” which may become obsolete quickly, in lieu of using their judgment to determine when capabilities are dangerous enough to warrant higher risk mitigations. Again, as research improves, we should expect to see improvements in PFs’ specification of risk thresholds.

4. Preparedness frameworks should include detailed evaluation procedures for AI models, ensuring comprehensive risk assessment within a developer’s tolerance. 

Anthropic and OpenAI both have room for improvement on detailing their evaluation procedures. Anthropic’s RSP includes evaluation procedures for model autonomy and misuse risks. Its evaluation procedures for model autonomy are impressively detailed, including clearly defined tasks on which it will evaluate its models. Its evaluation procedures for misuse risk are much less well-defined, though it does include the following note: “We stress that this will be hard and require iteration. There are fundamental uncertainties and disagreements about every layer…It will take time, consultation with experts, and continual updating.” And OpenAI’s PF includes a “Model Scorecard,” a mock evaluation of an advanced AI model. This model scorecard includes the hypothetical results of various evaluations in all four of their tracked risk categories; it does not appear to be a comprehensive list of evaluation procedures.

Again, the science of AI model evaluation is young. The AI EO directs NIST to develop red-teaming guidance for developers of potentially dual-use foundation models. NIST, along with private actors such as METR and other AI evaluators, will play a crucial role in creating and testing red-teaming practices and model evaluations that elicit all relevant capabilities.

5. For different risk thresholds, preparedness frameworks should identify and commit to pre-specified risk mitigations.

Classes of risk mitigations may include:

Both OpenAI’s PF and Anthropic’s RSP commit to a number of pre-specified risk mitigations for different thresholds. For example, for what Anthropic calls “ASL-2” models (including its most advanced model, Claude 2), they commit to measures including publishing model cards, providing a vulnerability reporting mechanism, enforcing an acceptable use policy, and more. Models at higher risk thresholds (what Anthropic calls “ASL-3” and above) have different, more stringent risk mitigations, including “limit[ing] access to training techniques and model hyperparameters…” and “implement[ing] measures designed to harden our security…”

Risk mitigations can and should differ in approaches to development versus deployment. There are different levels of risk associated with possessing models internally and allowing external actors to interact with them. Both OpenAI’s PF and Anthropic’s RSP include different risk mitigation approaches for development and deployment. For example, OpenAI’s PF restricts deployment of models such that “Only models with a post-mitigation score of ‘medium’ or below can be deployed,” whereas it restricts development of models such that “Only models with a post-mitigation score of ‘high’ or below can be developed further.”
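The two gating rules quoted above can be sketched in a few lines of Python. This is our illustrative reading, not OpenAI's implementation: we assume an ordered scale with a “low” level below the “medium,” “high,” and “critical” levels named in the text, and we assume the overall post-mitigation score is the highest score in any tracked category. The category names in the example are placeholders.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 0       # ASSUMPTION: label for the lowest level
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def overall_level(scores: dict[str, RiskLevel]) -> RiskLevel:
    """ASSUMPTION: the overall score is the highest score in any category."""
    if not scores:
        raise ValueError("at least one category score is required")
    return max(scores.values())


def can_deploy(post_mitigation: dict[str, RiskLevel]) -> bool:
    """Deployment rule: post-mitigation score of 'medium' or below."""
    return overall_level(post_mitigation) <= RiskLevel.MEDIUM


def can_develop_further(post_mitigation: dict[str, RiskLevel]) -> bool:
    """Development rule: post-mitigation score of 'high' or below."""
    return overall_level(post_mitigation) <= RiskLevel.HIGH


if __name__ == "__main__":
    # Placeholder category names and scores, for illustration only.
    scores = {"cbrn": RiskLevel.MEDIUM, "model_autonomy": RiskLevel.HIGH}
    print(can_deploy(scores), can_develop_further(scores))
```

In this example a single category at “high” is enough to block deployment while still permitting further development, which shows how the development rule is the looser of the two.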

Mitigations should be defined as specifically as possible, with the understanding that as the state of the art changes, this too is an area that will require periodic updates. Developers should include some room for judgment here.

6. Preparedness frameworks’ pre-specified risk mitigations must effectively address potentially catastrophic risks.

Having confidence that the risk mitigations do in fact address potential catastrophic risks is perhaps the most important and difficult aspect of a PF to evaluate. Catastrophic risk from AI is a novel and speculative field; evaluating AI capabilities is a science in its infancy; and there are no empirical studies of the effectiveness of risk mitigations preventing such risks. Given this uncertainty, frontier AI developers should err on the side of caution.

Both OpenAI and Anthropic should be more conservative in their risk mitigations. Consider OpenAI’s commitment to restricting development: “[I]f we reach (or are forecasted to reach) ‘critical’ pre-mitigation risk along any risk category, we commit to ensuring there are sufficient mitigations in place…for the overall post-mitigation risk to be back at most to ‘high’ level.” To understand this commitment, we have to look at their threshold definitions. Under the Model Autonomy category, the “critical” threshold in part includes: “model can self-exfiltrate under current prevailing security.” Setting aside that this threshold is still quite vague and difficult to evaluate (and setting aside the novelty of this capability), a model that approaches or exceeds this threshold by definition can self-exfiltrate, rendering all other risk mitigations ineffective. A more robust approach to restricting development would not permit training or possessing a model that comes close to exceeding this threshold.

As for Anthropic, consider their threshold for “ASL-3,” which reads in part: “Access to the model would substantially increase the risk of catastrophic misuse…” The risk mitigations for ASL-3 models include the following: “Harden security such that non-state attackers are unlikely to be able to steal model weights and advanced threat actors (e.g. states) cannot steal them without significant expense.” While an admirable approach to development of potentially dual-use foundation models, assuming state actors seek out tools whose misuse involves catastrophic risk, a more conservative mitigation would entail hardening security such that it is unlikely that any actor, state or non-state, could steal the model weights of such a model.9

7. Preparedness frameworks should combine credible risk mitigation commitments with governance structures that ensure these commitments are fulfilled.

Preparedness frameworks should detail governance structures that incentivize actually undertaking pre-committed risk mitigations when thresholds are met. Other incentives, including profit and shareholder value, sometimes conflict with risk management.

Anthropic’s RSP includes a number of procedural commitments meant to enhance the credibility of its risk mitigation commitments. For example, Anthropic commits to proactively planning to pause scaling of its models,10 publicly sharing evaluation results, and appointing a “Responsible Scaling Officer.” However, Anthropic’s RSP also includes the following clause: “[I]n a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped…we could envisage a substantial loosening of these restrictions as an emergency response…” This clause potentially undermines the credibility of Anthropic’s other commitments in the RSP, if at any time it can point to another actor who in its view is scaling recklessly.

OpenAI’s PF also outlines commendable governance measures, including procedural commitments, meant to enhance its risk mitigation credibility. It summarizes its operation structure: “(1) [T]here is a dedicated team ‘on the ground’ focused on preparedness research and monitoring (Preparedness team), (2) there is an advisory group (Safety Advisory Group) that has a sufficient diversity of perspectives and technical expertise to provide nuanced input and recommendations, and (3) there is a final decision-maker (OpenAI Leadership, with the option for the OpenAI Board of Directors to overrule).”

8. Preparedness frameworks should include a mechanism for regular updates to the framework itself, in light of ongoing research and advances in AI.

Both OpenAI’s PF and Anthropic’s RSP acknowledge the importance of regular updates. This is reflected in both of these documents’ names: Anthropic labels its RSP as “Version 1.0,” while OpenAI’s PF is labeled as “(Beta).”

Anthropic’s RSP includes an “Update Process” that reads in part: “We expect most updates to this process to be incremental…as we learn more about model safety features or unexpected capabilities…” This language directly commits Anthropic to changing its RSP as the state of the art changes. OpenAI references updates throughout its PF, notably committing to updating its evaluation methods and rubrics (“The Scorecard will be regularly updated by the Preparedness team to help ensure it reflects the latest research and findings”).

9. For models with risk above the lowest level, most evaluation results and methods should be public, including any performed mitigations.

Publishing model evaluations and mitigations is an important tool for holding developers accountable to their PF commitments. The appropriate level of transparency requires care, however: full information about evaluation methodology and risk mitigations could be exploited by malicious actors. Anthropic’s RSP takes a balanced approach in committing to “[p]ublicly share evaluation results after model deployment where possible, in some cases in the initial model card, in other cases with a delay if it serves a broad safety interest.” OpenAI’s PF does not commit to publishing its Model Scorecards, but OpenAI has since published related research on whether its models aid the creation of biological threats.

Conclusion

Preparedness frameworks represent a promising approach for AI developers to voluntarily commit to robust risk management practices. However, current versions have weaknesses—particularly their lack of specificity in risk thresholds, insufficiently conservative risk mitigation approaches, and inadequacy in addressing structural risks. Frontier AI developers without PFs should consider adopting them, and OpenAI and Anthropic should update their policies to strengthen risk mitigations and include more specificity.

Strengthening preparedness frameworks will require advancing AI safety science to enable precise risk quantification and develop new mitigations. NIST, academics, and companies plan to collaborate to measure and model frontier AI risks. Policymakers have a crucial opportunity to adapt regulatory approaches from other high-risk technologies like nuclear power to balance AI innovation and catastrophic risk prevention. Furthermore, standards bodies could develop more robust best practices for AI evaluations, including guidance for third-party auditors.

Overall, safety must be an intrinsic priority for the entire AI community, not only for the private actors creating preparedness frameworks. All stakeholders, including private companies, academics, policymakers, and civil society organizations, have roles to play in steering AI development toward societally beneficial outcomes. Preparedness frameworks are one tool, but they are not sufficient absent more comprehensive, multi-stakeholder efforts to scale AI safely and for the public good.

Many thanks to Madeleine Chang, Di Cooke, Thomas Woodside, and Felipe Calero Forero for providing helpful feedback.

Working with academics: A primer for U.S. government agencies

Collaboration between federal agencies and academic researchers is an important tool for public policy. By facilitating the exchange of knowledge, ideas, and talent, these partnerships can help address pressing societal challenges. But because it is rarely in either party’s job description to conduct outreach and build relationships with the other, many important dynamics are often hidden from view. This primer provides an initial set of questions and topics for agencies to consider when exploring academic partnership.

Why should agencies consider working with academics?

What considerations may arise when working with academics?

Table (Of Contents)
Characteristics of discussed collaborative structures
| Structure | Primary need | Potential mechanisms | Structural complexity | Level of effort |
| --- | --- | --- | --- | --- |
| Informal advising | Knowledge >> Capacity | Ad-hoc engagement; formal consulting agreement | Low | Occasional work, over the short- to long-term |
| Study groups | Knowledge > Capacity | Informal working group; formal extramural award | Moderate | Occasional to part-time work, over the short- to medium-term |
| Collaborative research | Capacity ~= Knowledge | Informal research partnership, formal grant, or cooperative agreement / contract | Variable | Part-time work, over the medium- to long-term |
| Short-term placements | Capacity > Knowledge | IPA, OPM Schedule A(r), or expert contract; either ad-hoc or through a formal program | Moderate | Part- to full-time work, over the short- to medium-term |
| Long-term rotations | Capacity >> Knowledge | IPA, OPM Schedule A(r), or SGE designation; typically through a formal program | High | Full-time work, over the medium- to long-term |
BOX 1. Key academic considerations
Academic career stages.

Academic faculty progress through different stages of professorship — typically assistant, associate, and full — that affect their research and teaching expectations and opportunities. Assistant professors are tenure-track faculty who need to secure funding, publish papers, and meet the standards for tenure. Associate professors have job security and academic freedom, but also more mentoring and leadership responsibilities; associate professors are typically tenured, though this is not always the case. Full professors are senior faculty who have a high reputation and recognition in their field, but also more demands for service and supervision. The nature of agency-academic collaboration may depend on the seniority of the academic. For example, junior faculty may be more available to work with agencies, but primarily in contexts that will lead to traditional academic outputs; while senior faculty may be more selective, but their academic freedom will allow for less formal and more impact-oriented work.

Soft vs. hard money positions.

Soft money positions are those that depend largely or entirely on external funding sources, typically research grants, to support the salary and expenses of the faculty. Hard money positions are those that are supported by the academic institution’s central funds, typically tied to more explicit (and more expansive) expectations for teaching and service than soft-money positions. Faculty in soft money positions may face more pressure to secure funding for research, while faculty in hard money positions may have more autonomy in their research agenda but more competing academic activities. Federal agencies should be aware of the funding situation of the academic faculty they collaborate with, as it may affect their incentives and expectations for agency engagement.

Sabbatical credits.

A sabbatical is a period of leave from regular academic duties, usually for one or two semesters, that allows faculty to pursue an intensive and unstructured scope of work; this can include research in their own field or others, as well as external engagements or tours of service with non-academic institutions. Faculty accrue sabbatical credits based on their length and type of service at the university, and may apply for a sabbatical once they have enough credits. The amount of salary received during a sabbatical depends on the number of credits and the duration of the leave. Federal agencies may benefit from collaborating with academic faculty who are on sabbatical, as they may have more time and interest to devote to impact-focused work.

Consulting/outside activity limits.

Consulting limits & outside activity limits are policies that regulate the amount of time that academic faculty can spend on professional activities outside their university employment. These policies are intended to prevent conflicts of commitment or interest that may interfere with the faculty’s primary obligations to the university, such as teaching, research, and service, and the specific limits vary by university. Federal agencies may need to consider these limits when engaging academic faculty in ongoing or high-commitment collaborations.

9 vs. 12 month salaries.

Some academic faculty are paid on a 9-month basis, meaning that they receive their annual salary over nine months and have the option to supplement their income with external funding or other activities during the summer months. Other faculty are paid on a 12-month basis, meaning that they receive their annual salary over twelve months and have less flexibility to pursue outside opportunities. Federal agencies may need to consider the salary structure of the academic faculty they work with, as it may affect their availability to engage on projects and the optimal timing with which they can do so.

Advisory relationships consist of an academic providing occasional or periodic guidance to a federal agency on a specific topic or issue, without being formally contracted or compensated. This type of collaboration can be useful for agencies that need access to cutting-edge expertise or perspectives, but do not have a formal deliverable in mind.

Academic considerations

Regulatory & structural considerations

Box 2. Key structural considerations

Regulatory guidance.

Federal agencies and academic institutions are subject to various laws and regulations that affect their research collaboration, and the ownership and use of the research outputs. Key legislation includes the Federal Advisory Committee Act (FACA), which governs advisory committees and ensures transparency and accountability; the Federal Acquisition Regulation (FAR), which controls the acquisition of supplies and services with appropriated funds; and the Federal Grant and Cooperative Agreement Act (FGCAA), which provides criteria for distinguishing between grants, cooperative agreements, and contracts. Agencies should ensure that collaborations are structured in accordance with these and other laws.

Contracting mechanisms.

Federal agencies may use various contracting mechanisms to engage researchers from non-federal entities in collaborative roles. These mechanisms include the IPA Mobility Program, which allows the temporary assignment of personnel between federal and non-federal organizations; the Experts & Consultants authority, which allows the appointment of qualified experts and consultants to positions that require only intermittent and/or temporary employment; and Cooperative Research and Development Agreements (CRADAs), which allow agencies to enter into collaborative agreements with non-federal partners to conduct research and development projects of mutual interest.

University Office of Sponsored Programs.

Offices of Sponsored Programs are units within universities that provide administrative support and oversight for externally funded research projects. OSPs are responsible for reviewing and approving proposals, negotiating and accepting awards, ensuring compliance with sponsor and university policies and regulations, and managing post-award activities such as reporting, invoicing, and auditing. Federal agencies typically interact with OSPs as the authorized representative of the university in matters related to sponsored research.

Non-disclosure agreements.

When engaging with academics, federal agencies may use NDAs to safeguard sensitive information. Agencies each have their own rules and procedures for using and enforcing NDAs involving their grantees and contractors. These rules and procedures vary, but generally require researchers to sign an NDA outlining rights and obligations relating to classified information, data, and research findings shared during collaborations.

A study group is a type of collaboration where an academic participates in a group of experts convened by a federal agency to conduct analysis or education on a specific topic or issue. The study group may produce a report or hold meetings to present their findings to the agency or other stakeholders. This type of collaboration can be useful for agencies that need to gather evidence or insights from multiple sources and disciplines with expertise relevant to their work.

Academic considerations

Regulatory & structural considerations

Case study

In 2022, the National Science Foundation (NSF) awarded the National Bureau of Economic Research (NBER) a grant to create the EAGER: Place-Based Innovation Policy Study Group. This group, led by two economists with expertise in entrepreneurship, innovation, and regional development — Jorge Guzman from Columbia University and Scott Stern from MIT — aimed to provide “timely insight for the NSF Regional Innovation Engines program.” During Fall 2022, the group met regularly with NSF staff to i) provide an assessment of the “state of knowledge” of place-based innovation ecosystems, ii) identify the insights of this research to inform NSF staff on design of their policies, and iii) surface potential means by which to measure and evaluate place-based innovation ecosystems on a rigorous and ongoing basis. Several of the academic leads then completed a paper synthesizing the opportunities and design considerations of the regional innovation engine model, based on the collaborative exploration and insights developed throughout the year. In this case, the study group was structured as a grant, with funding provided to the organizing institution (NBER) for personnel and convening costs. Yet other approaches are possible; for example, NSF recently launched a broader study group with the Institute for Progress, which is structured as a no-cost Other Transaction Authority contract.

Active collaboration covers scenarios in which an academic engages in joint research with a federal agency, either as a co-investigator, a subrecipient, a contractor, or a consultant. This type of collaboration can be useful for agencies that need to leverage the expertise, facilities, data, or networks of academics to conduct research that advances their mission, goals, or priorities.

Academic considerations

Regulatory & structural considerations

Case studies

External collaboration between academic researchers and government agencies has repeatedly proven fruitful for both parties. For example, in May 2020, the Rhode Island Department of Health partnered with researchers at Brown University’s Policy Lab to conduct a randomized controlled trial evaluating the effectiveness of different letter designs in encouraging COVID-19 testing. This study identified design principles that improved uptake of testing by 25–60% without increasing cost, and led to follow-on collaborations between the institutions. The North Carolina Office of Strategic Partnerships provides a prime example of how government agencies can take steps to facilitate these collaborations. The office recently launched the North Carolina Project Portal, which serves as a platform for the agency to share their research needs, and for external partners — including academics — to express interest in collaborating. Researchers are encouraged to contact the relevant project leads, who then assess interested parties on their expertise and capacity, extend an offer for a formal research partnership, and initiate the project.

Short-term placements allow for an academic researcher to work at a federal agency for a limited period of time (typically one year or less), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to fill temporary gaps in expertise, capacity, or leadership, or to foster cross-sector exchange and learning.

Academic considerations

Regulatory & structural considerations

Case studies

Various programs exist throughout government to facilitate short-term rotations of outside experts into federal agencies and offices. One of the most well-known examples is the American Association for the Advancement of Science (AAAS) Science & Technology Policy Fellowship (STPF) program, which places scientists and engineers from various disciplines and career stages in federal agencies for one year to apply their scientific knowledge and skills to inform policy making and implementation. The Schedule A(r) hiring authority tends to be well-suited for these kinds of fellowships; it is used, for example, by the Bureau of Economic Analysis to bring on early career fellows through the American Economic Association’s Summer Economics Fellows Program. In some circumstances, outside experts are brought into government “on loan” from their home institution to do a tour of service in a federal office or agency; in these cases, the IPA program can be a useful mechanism. IPAs are used by the National Science Foundation (NSF) in its Rotator Program, which brings outside scientists into the agency to serve as temporary Program Directors and bring cutting-edge knowledge to the agency’s grantmaking and priority-setting. IPA is also used for more ad-hoc talent needs; for example, the Office of Evaluation Sciences (OES) at GSA often uses it to bring in fellows and academic affiliates.

Long-term rotations allow an academic to work at a federal agency for an extended period of time (more than one year), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to recruit and retain expertise, capacity, or leadership in areas that are critical to their mission, goals, or priorities.

Academic considerations

Regulatory & structural considerations

Case study

One example of a long-term rotation that draws experts from academia into federal agency work is the Advanced Research Projects Agency (ARPA) Program Manager (PM) role. ARPA PMs (across DARPA, IARPA, ARPA-E, and now ARPA-H) are responsible for leading high-risk, high-reward research programs, and have considerable autonomy and authority in defining their research vision, selecting research performers, managing their research budget, and overseeing their research outcomes. PMs are typically recruited from academia, industry, or government for a term of three to five years, and are expected to return to their home institutions or pursue other career opportunities after their term at the agency. PMs coming from academia or nonprofit organizations are often brought on through the IPA mobility program, and some entities also have unique term-limited hiring authorities for this purpose. PMs can also be hired as full government employees; this mechanism is primarily used for candidates coming from the private sector.

Laying the Foundation for the Low-Carbon Cement and Concrete Industry

This report is part of a series on underinvested clean energy technologies, the challenges they face, and how the Department of Energy can use its Other Transaction Authority to implement programs custom tailored to those challenges.

Cement and concrete production is one of the hardest industries to decarbonize. Solutions for low-emissions cement and concrete are much less mature than those for other green technologies like solar and wind energy and electric vehicles. Nevertheless, over the past few years, young companies have achieved significant milestones in piloting their technologies and certifying their performance and emissions reductions. In order to finance new manufacturing facilities and scale promising solutions, companies will need to demonstrate consistent demand for their products at a financially sustainable price. Demand support from the Department of Energy (DOE) can help companies meet this requirement and unlock private financing for commercial-scale projects. Using its Other Transactions Authority, DOE could design a demand-support program involving double-sided auctions, contracts for difference, or price and volume guarantees. To fund such a program using existing funds, the DOE could incorporate it into the Industrial Demonstrations Program. However, additional funding from Congress would allow the DOE to implement a more robust program. Through such an initiative, the government would accelerate the adoption of low-emissions cement and concrete, providing emissions reductions benefits across the country while setting the United States up for success in the future clean industrial economy.

Besides water, concrete is the most consumed material in the world. It is the material of choice for construction thanks to its durability, versatility, and affordability. As of 2022, the cement and concrete sector accounted for nine percent of global carbon emissions. The vast majority of the embodied emissions of concrete come from the production of Portland cement. Cement production emits carbon through the burning of fossil fuels to heat kilns (40% of emissions) and the chemical process of turning limestone and clay into cement using that heat (60% of emissions). Electrifying production facilities and making them more energy efficient can help decarbonize the former but not the latter, which requires deeper innovation.

Current solutions on the market substitute a portion of the cement used in concrete mixtures with Supplementary Cementitious Materials (SCMs) like fly ash, slag, or unprocessed limestone, reducing the embodied emissions of the resulting concrete. But these SCMs cannot replace all of the cement in concrete, and currently there is an insufficient supply of readily usable fly ash and slag for wider adoption across the industry.

The next generation of ultra-low-carbon, carbon-neutral, and even carbon-negative solutions seeks to develop alternative feedstocks and processes for producing cement or cementitious materials that can replace cement entirely and to capture carbon in aggregates and wet concrete. The DOE reports that testing and scaling these new technologies is crucial to fully eliminate emissions from concrete by 2050. Bringing these new technologies to the market will not only help the United States meet its climate goals but also promote U.S. leadership in manufacturing. 

A number of companies have established pilot facilities or are in the process of constructing them. These companies have successfully produced near-carbon-neutral and even carbon-negative concrete. Building off of these milestones, companies will need to secure financing to build full-scale commercial facilities and increase their manufacturing capacity. 

A key requirement for accessing both private-sector and government financing for new facilities is that companies obtain long-term offtake agreements, which assure financiers that there will be a steady source of revenue once the facility is built. But the boom-and-bust nature of the construction industry discourages construction companies and intermediaries from entering into long-term financial commitments, since there may be no project in which to use the materials. Cement, aggregates, and other concrete inputs also take up significant volume, so it would be difficult and costly for potential offtakers to store excess amounts during construction lulls. For these reasons, construction contractors procure concrete on an as-needed, project-specific basis.

Adding to the complexity, structural features of the cement and concrete market increase the difficulty of securing long-term offtake agreements:

Luckily, private construction is not the only customer for concrete. The U.S. government (federal, state, and local combined) accounts for roughly 50% of all concrete procurement in the country. Used correctly, the government’s purchasing power can be a powerful lever for spurring the adoption of decarbonized cement and concrete. However, the government faces barriers similar to those facing the private sector when it comes to entering into long-term offtake agreements. Government procurement of concrete goes through multiple intermediaries and operates on an as-needed, project-specific basis: government agencies like the General Services Administration (GSA) enter into agreements with construction contractors for specific projects, and then the contractors or their subcontractors make the ultimate purchasing decisions for concrete.

The Federal Buy Clean Initiative, launched in 2021 by the Biden Administration, is starting to address the procurement challenge for low-carbon cement and concrete. Among the initiative’s programs is the allocation of $4.5 billion from the Inflation Reduction Act (IRA) for the GSA and the Department of Transportation (DOT) to use lower-carbon construction materials. Under the initiative, the GSA is piloting direct procurement of low-embodied-carbon materials for federal construction projects. To qualify as low-embodied-carbon concrete under the GSA’s interim requirements, concrete mixtures only have to achieve a roughly 25–50% reduction in carbon content,1 depending on the compressive strength. The requirement may be even lower if no concrete meeting this standard is available near the project site. Since the bar is only slightly below the emissions of traditional concrete, young companies developing the solutions to fully decarbonize concrete will have trouble competing on price for procurement contracts against companies producing better-established but higher-emission solutions like fly ash, slag, and limestone concrete mixtures. Moreover, the just-in-time and project-specific nature of these procurement contracts means they still don’t address young companies’ need for long-term price and customer security in order to scale up.

The ideal solution for this is a demand-support program. The DOE Office of Clean Energy Demonstrations (OCED) is developing a demand-support program for the Hydrogen Hubs initiative, setting aside $1 billion for demand-support to accompany the $7 billion in direct funding to regional Hydrogen Hubs. In its request for proposals, OCED says that the hydrogen demand-support program will address the “fundamental mismatch in [the market] between producers, who need long-term certainty of high-volume demand in order to secure financing to build a project, and buyers, who often prefer to buy on a short-term basis at more modest volumes, especially for products that have yet to be produced at scale and [are] expected to see cost decreases.” 

A demand-support program could do the same for low-carbon cement and concrete, addressing the market challenges that grants alone cannot. OCED is reviewing applications for the $6.3 billion Industrial Demonstrations Program. Following the Hydrogen Hubs model, OCED could consider setting aside $500 million to $1 billion of the program funds to implement demand-support programs for low-carbon products in the two highest-emitting heavy industries, cement/concrete and steel, at $250 million to $500 million each.

Additional funding from Congress would allow DOE to implement a more robust demand-support program. Federal investment in industrial decarbonization grew from $1.5 billion in FY21 to over $10 billion in FY23, thanks largely to new funding from the Bipartisan Infrastructure Law (BIL) and IRA. However, the sector remains underfunded relative to its emissions, contributing 23% of the country’s emissions while receiving less than 12% of federal climate innovation funding. A promising piece of legislation that was recently introduced is the Concrete and Asphalt Innovation Act of 2023, which would, among other things, direct the DOE to establish a program of research, development, demonstration, and commercial application of low-emissions cement, concrete, asphalt binder, and asphalt mixture. This would include a demonstration initiative authorized at $200 million and the production of a five-year strategic plan to identify new programs and resources needed to carry out the mission. If the legislation is passed, the DOE could propose a demand-support program in its strategic plan and request funding from Congress to set it up. The faster route, however, would be for Congress to add a section to the Act, before passing it, that directly establishes a demand-support program within DOE and authorizes funding for it.

BIL and IRA gave DOE an expanded mandate to support innovative technologies from early-stage research through commercialization. In order to do so, DOE must be just as innovative in its use of its available authorities and resources. Tackling the challenge of bringing technologies from pilot to commercialization requires DOE to look beyond traditional grant, loan, and procurement mechanisms. Previously, we have identified the DOE’s Other Transaction Authority (OTA) as an underleveraged tool for accelerating clean energy technologies. 

OTA is defined in legislation as the authority to enter into transactions that are not government grants or contracts in order to advance an agency’s mission. This negative definition provides DOE with significant freedom to design and implement flexible financial agreements that can be tailored to address the unique challenges that different technologies face. DOE plans to use OTA to implement the hydrogen demand-support program, and it could also be used for a demand-support program for low-carbon cement and concrete. The DOE’s new Guide to Other Transactions provides official guidance on how DOE personnel can use the flexibilities provided by OTA. 

Before setting up a demand-support program, DOE first needs to define what a low-carbon cement or concrete product is and the value it provides in emissions avoided. This is not straightforward due to (1) the heterogeneity of solutions, which prevents apples-to-apples comparisons in price, and (2) variations in the amount of avoided emissions that different solutions can provide. To address the first issue, for products that are not ready-mix concrete, the DOE should calculate the cost of a unit of concrete made using the product, based on a standardized mix ratio of a specific compressive strength and market prices for the other components of the concrete mix. To address the second issue, the DOE should then divide the calculated price per unit of concrete (e.g., $/m3) by the amount of CO2 emissions avoided per unit of concrete compared to the NRMCA’s industry average (e.g., kg/m3) to determine the effective price per unit of CO2 emissions avoided. The DOE can then fairly compare bids from different projects using this metric. Such an approach would result in the government providing demand support for the products that are most cost-effective at reducing carbon emissions, rather than solely the cheapest.
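The comparison metric described above can be sketched in a few lines of Python. This is a minimal illustration, not an official DOE method: the function names, the 300 kg/m3 baseline, and all prices are hypothetical placeholders rather than real market or NRMCA figures.

```python
def concrete_cost_per_m3(product_price_per_kg, product_kg_per_m3, other_inputs_cost_per_m3):
    """Cost of one cubic meter of concrete made with the product in a standardized mix."""
    return product_price_per_kg * product_kg_per_m3 + other_inputs_cost_per_m3


def effective_price_per_kg_co2(concrete_price_per_m3, embodied_kg_co2_per_m3,
                               baseline_kg_co2_per_m3):
    """Concrete price ($/m3) divided by CO2 avoided versus the baseline (kg/m3)."""
    avoided = baseline_kg_co2_per_m3 - embodied_kg_co2_per_m3
    if avoided <= 0:
        raise ValueError("product does not avoid emissions relative to the baseline")
    return concrete_price_per_m3 / avoided


# Hypothetical numbers for illustration only.
BASELINE = 300.0  # kg CO2 per m3; placeholder for the industry-average benchmark

# Bid A: cheaper concrete, smaller emissions reduction.
bid_a = effective_price_per_kg_co2(180.0, 150.0, BASELINE)
# Bid B: pricier concrete, larger emissions reduction.
bid_b = effective_price_per_kg_co2(200.0, 50.0, BASELINE)

# Rank bids by dollars per kg of CO2 avoided, not by sticker price.
cheapest_abatement = min([("A", bid_a), ("B", bid_b)], key=lambda pair: pair[1])[0]
```

In this made-up example, bid B costs more per cubic meter but less per kilogram of CO2 avoided, so it would rank ahead of bid A under the proposed metric.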

Furthermore, the DOE should put an upper limit on the amount of embodied carbon that the concrete product or concrete made with the product must meet in order to qualify as “low carbon.” We suggest that the DOE use the limits established by the First Movers Coalition, an international corporate advanced market commitment for concrete and other hard-to-abate industries organized by the World Economic Forum. The limits were developed through conversations with incumbent suppliers, start-ups, nonprofits, and intergovernmental organizations on what would be achievable by 2030. The limits were designed to help move the needle towards commercializing solutions that enable full decarbonization.

Companies that participate in a DOE demand-support program should be required after one or two years of operations to confirm that their product meets these limits through an Environmental Product Declaration.2 Using carbon offsets to reach that limit should not be allowed, since the goal is to spur the innovation and scaling of technologies that can eventually fully decarbonize the cement and concrete industry.

Below are some ideas for how DOE can set up a demand-support program for low-carbon cement and concrete.

Double-Sided Auction 

Double-sided auctions are designed to support the development of production capacity for green technologies and products and the creation of a market by providing long-term price certainty to suppliers and facilitating the sale of their products to buyers. As the name suggests, a double-sided auction consists of two phases: First, the government or an intermediary organization holds a reverse auction for long-term purchase agreements (e.g., 10 years) for the product from suppliers, who are incentivized to bid the lowest possible price in order to win. Next, the government conducts annual auctions of short-term sales agreements to buyers of the product. Once sales agreements are finalized, the product is delivered directly from the supplier to the buyer, with the government acting as a transparent intermediary. The government thus serves as a market maker by coordinating the purchase and sale of the product from producers to buyers. Government funding covers the difference between the original purchase price and the final sale price, reducing the impact of the green premium for buyers and sellers. 
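As a rough sketch of the mechanics, the Python snippet below runs one simplified round of each auction and computes the per-unit gap that government funding would cover. It assumes a single winning supplier and a single winning buyer, and all names and prices are hypothetical; a real program would involve multi-year purchase contracts and repeated annual sales auctions.

```python
def run_double_sided_auction(supplier_bids, buyer_bids):
    """Match the lowest-priced supplier with the highest-paying buyer.

    supplier_bids and buyer_bids map a bidder name to a price per unit.
    Returns the winners and the per-unit gap the government would fund.
    """
    # Reverse auction: the supplier asking the lowest price wins.
    supplier, purchase_price = min(supplier_bids.items(), key=lambda item: item[1])
    # Sales auction: the buyer offering the highest price wins.
    buyer, sale_price = max(buyer_bids.items(), key=lambda item: item[1])
    return {
        "supplier": supplier,
        "buyer": buyer,
        "purchase_price": purchase_price,
        "sale_price": sale_price,
        # Difference between purchase price and sale price, per unit.
        "government_cost_per_unit": purchase_price - sale_price,
    }


# Hypothetical bids in dollars per unit of product.
result = run_double_sided_auction(
    supplier_bids={"Supplier A": 220, "Supplier B": 190},
    buyer_bids={"Buyer X": 140, "Buyer Y": 160},
)
```

Here the government would buy from Supplier B at 190, sell to Buyer Y at 160, and fund the 30-per-unit difference.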

While the federal government has not yet implemented a double-sided auction program, OCED is considering setting up the hydrogen demand-support measure as a “market maker” that provides a “ready purchaser/seller for clean hydrogen.” Such a market maker program could be implemented most efficiently through double-sided auctions.

Germany was the first to conceive of and develop the double-sided auction scheme. The H2Global initiative was established in 2021 to support the development of production capacity for green hydrogen and its derivative products. The program is implemented by Hintco, an intermediary company, which is currently evaluating bids for its first auction for the purchase of green ammonia, methanol, and e-fuels, with final contracts expected to be announced as soon as this month. Products will start to be delivered by the end of 2024.

A double-sided auction scheme for low-carbon cement and concrete would address producers’ need for long-term offtake agreements while matching buyers’ short-term procurement needs. The auctions would also help develop transparent market prices for low-carbon cement and concrete products.

(Source: H2Global)


All bids for purchase agreements should include detailed technical specifications and/or certifications for the product, the desired price per unit, and a robust, third-party life-cycle assessment of the amount of embodied carbon per unit of concrete made with the product, at different compressive strengths. Additionally, bids of ready-mix concrete should include the location(s) of their production facility or facilities, and bids of cement and other concrete inputs should include information on the locations of ready-mix concrete facilities capable of producing concrete using their products. The DOE should then select bids through a pure reverse auction using the calculated effective price per unit of CO2 emissions avoided. To account for regional fragmentation, the DOE could conduct separate auctions for each region of the country.

A double-sided auction presents similar benefits to the low-carbon cement and concrete industry as an advance market commitment would. However, the addition of an efficient, built-in system for the government to then sell that cement or concrete allotment to a buyer means that the government is not obligated to use the cement or concrete itself. This is important because the logistics of matching cement or concrete production to a suitable government construction project can be difficult due to regional fragmentation, and the DOE is not a major procurer of cement and concrete.3 Instead, under this scheme, federal, state, or local agencies working on a construction project or their contractors could check the double-sided auction program each year to see if there is a product offering in their region that matches their project needs and sustainability goals for that year, and if so, submit a bid to procure it. In fact, this should be encouraged as a part of the Federal Buy Clean Initiative, since the government is such an important consumer of cement and concrete products.

Contracts for Difference

Contract for difference (CfD) programs, sometimes called two-way CfD programs, aim to provide price certainty for green technology projects and close the gap between the price that producers need and the price that buyers are willing to offer. CfDs have been used by the United Kingdom and France primarily to support the development of large-scale renewable energy projects. However, CfDs can also be used to support the development of production capacity for other green technologies. OCED is considering CfDs (also known as pay-for-difference contracts) for its hydrogen demand-support program.

CfDs are long-term contracts signed between the government or a government-sponsored entity and companies looking to expand production capacity for a green product.4 The contract guarantees that once the production facility comes online, the government will ensure a steady price by paying suppliers the difference between the market price for which they are able to sell their product and a predetermined “strike price.” Conversely, if the market price rises above the strike price, the supplier will pay the difference back to the government. This prevents the public from funding any potential windfall profits.
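The two-way settlement rule can be written as a single signed payment. The sketch below is illustrative only; the function names and prices are assumptions, not terms from any actual CfD contract.

```python
def cfd_settlement(strike_price, market_price, units):
    """Signed two-way CfD payment for one settlement period.

    Positive: the government pays the supplier (market below strike).
    Negative: the supplier pays the government back (market above strike).
    """
    return (strike_price - market_price) * units


def supplier_revenue(strike_price, market_price, units):
    """Market sales plus the CfD settlement; always equals strike_price * units."""
    return market_price * units + cfd_settlement(strike_price, market_price, units)


# Hypothetical strike price of 200 per unit and 10 units sold.
payment_when_market_low = cfd_settlement(200, 150, 10)   # government tops up
payment_when_market_high = cfd_settlement(200, 230, 10)  # supplier pays back
```

Whatever the market price, the supplier's total revenue per unit comes out at the strike price, which is the price certainty the contract is meant to provide.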

A CfD program could provide a source of demand certainty for low-carbon cement and concrete companies looking to finance the construction of pilot- and commercial-scale manufacturing plants or the retrofitting of existing plants. The selection of recipients and strike prices should be determined through annual reverse auctions. In a typical reverse auction for CfDs, the government sets a cap on the maximum number of units of product and the maximum strike price it is willing to accept. Each project candidate then places a sealed bid for a unit price and the amount of product they plan to produce. The bids are ranked by unit price, and projects are accepted from low to high unit price until either the maximum total capacity or maximum strike price is reached. The last project accepted sets the strike price for all accepted projects. The strike price is adjusted annually for inflation but otherwise fixed over the course of the contract. Compared to traditional subsidy programs, a CfD program can be much more cost-efficient thanks to the reverse auction process. The UK’s CfD program has seen the strike price fall with each successive round of auctions.
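The clearing procedure described above can be sketched as follows. This is a simplified illustration with hypothetical bids; in particular, it assumes that a bid which would exceed the capacity cap is rejected outright rather than partially accepted, which is a design choice the text does not specify.

```python
def clear_uniform_price_auction(bids, max_units, max_strike_price):
    """Clear a sealed-bid reverse auction with a single strike price.

    bids: list of (project_name, unit_price, units) tuples.
    Returns (accepted_project_names, strike_price); strike_price is None
    if no bid is accepted.
    """
    accepted = []
    total_units = 0
    strike_price = None
    # Rank bids from the lowest to the highest unit price.
    for name, unit_price, units in sorted(bids, key=lambda bid: bid[1]):
        if unit_price > max_strike_price:
            break  # price cap reached
        if total_units + units > max_units:
            break  # capacity cap reached (assumption: no partial awards)
        accepted.append(name)
        total_units += units
        # The last accepted bid sets the strike price for every winner.
        strike_price = unit_price
    return accepted, strike_price


# Hypothetical bids: (project, unit price, units offered).
bids = [("A", 90, 100), ("B", 110, 150), ("C", 100, 120), ("D", 130, 80)]
winners, strike = clear_uniform_price_auction(bids, max_units=300, max_strike_price=120)
```

With these made-up bids, projects A and C are accepted, B is turned away by the capacity cap, and both winners receive C's price of 100 as their strike price.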

Applying this to the low-carbon cement and concrete industry requires some adjustments, since there are a variety of products for decarbonizing cement and concrete. As discussed above, the DOE should compare project bids according to the effective price per unit of CO2 abated when the product is used to make concrete. The DOE should also set a cap on the maximum volume of CO2 it wishes to abate and the maximum effective price per unit of CO2 abated that it is willing to pay. Bids can then be accepted from low to high price until one of those caps is hit. Instead of establishing a single strike price, the DOE should use each accepted project’s bid price as that project’s strike price to account for the variation in types of products.
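A minimal sketch of this adjusted selection rule is shown below, with bids ranked by effective price per tonne of CO2 abated and each winner keeping its own bid as its strike price. The bid values and caps are hypothetical, and the sketch assumes a bid that would overshoot the abatement cap is rejected rather than partially accepted.

```python
def select_pay_as_bid(bids, max_tonnes_co2, max_price_per_tonne):
    """Accept bids from low to high effective price until a cap is hit.

    bids: list of (project_name, price_per_tonne_co2, tonnes_co2_abated).
    Returns a dict mapping each accepted project to its own strike price.
    """
    strike_prices = {}
    total_tonnes = 0
    for name, price, tonnes in sorted(bids, key=lambda bid: bid[1]):
        if price > max_price_per_tonne:
            break  # effective price cap reached
        if total_tonnes + tonnes > max_tonnes_co2:
            break  # abatement volume cap reached (assumption: no partial awards)
        # Each project keeps its own bid as its strike price.
        strike_prices[name] = price
        total_tonnes += tonnes
    return strike_prices


# Hypothetical bids: (project, $ per tonne CO2 abated, tonnes abated).
co2_bids = [("X", 80, 1000), ("Y", 60, 2000), ("Z", 150, 500)]
awards = select_pay_as_bid(co2_bids, max_tonnes_co2=3500, max_price_per_tonne=120)
```

In this example Y and X are accepted at 60 and 80 per tonne respectively, while Z is excluded because its bid exceeds the price cap.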

Backstop Price Guarantee 

A CfD program could be designed as a backstop price guarantee if one removes the requirement that suppliers pay the government back when market prices rise above the strike price. In this case, the DOE would set a lower maximum strike price for CO2 abatement, knowing that suppliers will be willing to bid lower strike prices, since there is now the opportunity for unrestricted profits above the strike price. The DOE would then only pay in the worst-case scenario when the market price falls below the strike price, which would operate as an effective price floor.
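The one-way payout can be sketched by flooring the settlement at zero. The function name and numbers below are illustrative assumptions, not terms of an actual program.

```python
def backstop_price_payment(strike_price, market_price, units):
    """Government payment under a one-way price guarantee.

    The government pays the shortfall when the market price is below the
    strike price and pays nothing otherwise; the supplier keeps any upside.
    """
    return max(0, strike_price - market_price) * units


# Hypothetical strike price of 200 per unit and 10 units sold.
paid_when_market_low = backstop_price_payment(200, 150, 10)
paid_when_market_high = backstop_price_payment(200, 230, 10)
```

Unlike a two-way contract, the supplier never pays anything back, so the strike price acts purely as a price floor.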

Backstop Volume Guarantee

Alternatively, the DOE could address demand uncertainty by providing a volume guarantee. In this case, the DOE could conduct a reverse auction for volume guarantee agreements with manufacturers, wherein the DOE would commit to purchasing any units of product short of the volume guarantee that the company is unable to sell each year for a certain price, and the company would commit to a ceiling on the price they will charge buyers.5 Using OTA, the DOE could implement such a program in collaboration with DOT or GSA, wherein DOE would purchase the materials and DOT or GSA would use the materials for their construction needs.
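The volume guarantee can be sketched as a purchase of whatever falls short of the guaranteed amount each year. The names and figures below are hypothetical, and the price ceiling on sales to other buyers is left out of this simplified sketch.

```python
def volume_guarantee_purchase(guaranteed_units, units_sold, guarantee_price):
    """Units the DOE would buy, and what it would pay, for one year.

    The DOE buys only the shortfall between the guaranteed volume and the
    volume the company actually sold to other buyers.
    """
    shortfall = max(0, guaranteed_units - units_sold)
    return shortfall, shortfall * guarantee_price


# Hypothetical year: 1,000 units guaranteed, 700 sold on the market.
units_bought, doe_cost = volume_guarantee_purchase(1000, 700, 150)
```

If the company sells its full guaranteed volume on its own, the shortfall is zero and the DOE pays nothing that year.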

Rather than directly managing a demand-support program, the DOE should enter into an OT agreement with an external nonprofit entity to administer the contracts.6 The nonprofit entity would then hold auctions and select, manage, and fulfill the contracts. DOE is currently in the process of doing this for the hydrogen demand-support program. 

A nonprofit entity could provide two main benefits. First, the logistics of implementing such a program would not be trivial, given the number of different suppliers, intermediaries, and offtakers involved. An external entity would have an easier and faster time hiring staff with the necessary expertise compared to the federal hiring process and limited budget for program direction that the DOE has to contend with. Second, the entity’s independent nature would make it easier to gain lasting bipartisan support for the demand-support program, since the entity would not be directly associated with any one administration.

The green premium for near-zero-carbon cement and concrete products is steep, and demand-support programs like the ones proposed in this report should not be considered a cure-all for the industry, since it may be difficult to secure a large enough budget for any one such program to fully address the green premium across the industry. Rather, demand-support programs can complement the multiple existing funding authorities within the DOE by closing the residual gap between emerging technologies and conventional alternatives after other programs have helped to lower the green premium. 

The DOE’s Loan Programs Office (LPO) received a significant increase in its lending authorities from the IRA and has the ability to provide loans or loan guarantees to innovative clean cement facilities, resulting in cheaper capital financing and providing an effective subsidy. In addition, the IRA and the Bipartisan Infrastructure Law provided substantial new funding for the demonstration of industrial decarbonization technologies through OCED.

Policies like these can be chained together. For example, a clean cement start-up could simultaneously apply to OCED for funding to demonstrate its technology at scale and to LPO for a loan or loan guarantee, subject to due diligence on its business plan. Together, these two programs drive down the green premium and derisk the companies that successfully receive their support, leaving a much more modest price premium that a mechanism like a double-sided auction could affordably cover with less risk.

Successfully chaining policies like this requires deep coordination across DOE offices. OCED and LPO would need to work in lockstep when conducting technical evaluations and due diligence of projects that apply to both, and they would need to prioritize funding projects that meet both offices’ criteria for success. The best projects should be offered both demonstration funding from OCED and conditional commitments from LPO, which would provide companies with the confidence that they will receive follow-on funding if the demonstration is successful and other conditions are met, while posing no added risk to LPO since companies will need to meet their conditions first before receiving funds. The assessments should also consider whether the project would be a strong candidate for receiving demand support through a double-sided auction, CfD program, or price/volume guarantee, which would help further derisk the loan/loan guarantee and justify the demonstration funding.

Candidates for receiving support from all three public funding instruments would of course need to be especially rigorously evaluated, since the fiscal risk and potential political backlash of such a project failing are also much greater. If successful, such coordination would ensure that the combination of these programs substantially moves the needle on bringing emerging technologies in green cement and concrete to commercial scale.

Demand support can help address the key barrier that low-carbon cement and concrete companies face in scaling their technologies and financing commercial-scale manufacturing facilities. Whichever approach the DOE chooses to take, the agency should keep in mind (1) the importance of setting an ambitious standard for what qualifies as low-carbon cement and concrete and comparing proposals using a metric that accounts for the range of different product types and embodied emissions, (2) the complex implementation logistics, and (3) the benefits of coordinating a demand-support program with the agency’s demonstration and loan programs. Implemented successfully, such a program would crowd in private investment, accelerate commercialization, and lay the foundation for the clean industrial economy in the United States.

Breaking Ground on Next-Generation Geothermal Energy

This report is part one of a series on underinvested clean energy technologies, the challenges they face, and how the Department of Energy can use its Other Transaction Authority to implement programs custom-tailored to those challenges.

The United States has been gifted with an abundance of clean, firm geothermal energy lying below our feet – tens of thousands of times more than the country has in untapped fossil fuels. Geothermal technology is entering a new era, with innovative approaches on their way to commercialization that will unlock access to more types of geothermal resources. However, the development of commercial-scale geothermal projects is an expensive affair, and the U.S. government has severely underinvested in this technology. The Inflation Reduction Act and the Bipartisan Infrastructure Law concentrated clean energy investments in solar and wind, which are great near-term solutions for decarbonization, but neglected to invest sufficiently in solutions like geothermal energy, which are necessary to reach full decarbonization in the long term. With new funding from Congress or potentially the creative (re)allocation of existing funding, the Department of Energy (DOE) could take a number of different approaches to accelerating progress in next-generation geothermal energy, from leasing agency land for project development to providing milestone payments for the costly drilling phases of development.

As the United States power grid transitions towards clean energy, the increasing mix of intermittent renewable energy sources like solar and wind must be balanced by sources of clean firm power that are available around the clock in order to ensure grid reliability and reduce the need to overbuild solar, wind, and battery capacity. Geothermal power is a leading contender for addressing this issue. 

Conventional geothermal (also known as hydrothermal) power plants tap into existing hot underground aquifers and circulate the hot water to the surface to generate electricity. Thanks to an abundance of geothermal resources close to the earth’s surface in the western part of the country, the United States currently leads the world in geothermal power generation. Conventional geothermal power plants are typically located near geysers and steam vents, which indicate the presence of hydrothermal resources belowground. However, these hydrothermal sites represent just a small fraction of the total untapped geothermal potential beneath our feet — more than the potential of fossil fuel and nuclear fuel reserves combined.

Next-generation geothermal technologies, such as enhanced geothermal systems (EGS), closed-loop or advanced geothermal systems (AGS), and other novel designs, promise to allow access to a wider range of geothermal resources. Some designs can potentially also serve double duty as long-duration energy storage. Rather than tapping into existing hydrothermal reservoirs underground, these technologies drill into hot dry rock, engineer independent reservoirs using either hydraulic stimulation or extensive horizontal drilling, and then introduce new fluids to bring geothermal energy to the surface. These new technologies have benefited from advances in the oil and gas industry, resulting in lower drilling costs and higher success rates. Furthermore, some companies have been developing designs for retrofitting abandoned oil and gas wells to convert them into geothermal power plants. The commonalities between these two sectors present an opportunity not only to leverage the existing workforce, engineering expertise, and supply chain from the oil and gas industry to grow the geothermal industry but also to support a just transition such that current workers employed by the oil and gas industry have an opportunity to help build our clean energy future. 

Over the past few years, a number of next-generation geothermal companies have had successful pilot demonstrations, and some are now developing commercial-scale projects. As a result of these successes and the growing demand for clean firm power, power purchase agreements (PPAs) for an unprecedented 1 GW of geothermal power were signed with utilities, community choice aggregators (CCAs), and commercial customers in the United States in 2022 and 2023 combined. In 2023, PPAs for next-generation geothermal projects surpassed those for conventional geothermal projects in terms of capacity. While this is promising, barriers remain to the development of commercial-scale geothermal projects. To meet its goal of net-zero emissions by 2050, the United States will need to invest in overcoming these barriers for next-generation geothermal energy now, lest the technology fail to scale to the level necessary for a fully decarbonized grid.

Meanwhile, conventional hydrothermal still has a role to play in the clean energy transition. The United States needs all the clean firm power that it can get, whether that comes from conventional or next-generation geothermal, in order to retire baseload coal and natural gas plants. Conventional hydrothermal power plants are less expensive to build and cheaper to finance, since the technology is tried and tested, and there are still plenty of untapped hydrothermal resources in the western part of the country.

Funding is the biggest barrier to commercial development of next-generation geothermal projects. There are two types of private financing: equity financing and debt financing. Equity financing is more risk tolerant and is typically the source of funding for start-ups as they move from the R&D to demonstration phases of their technology. But because equity financing has a dilutive effect on the company, when it comes to the construction of commercial-scale projects, debt financing is preferred. However, first-of-a-kind commercial projects are almost always precluded from accessing debt financing. It is commonly understood within industry that private lenders will not take on technology risk, meaning that technologies must be at a Technology Readiness Level (TRL) of 9, where they have been proven to operate at commercial scale, and government lenders like the DOE Loan Programs Office (LPO) generally will not take on any risk that private lenders won’t. Manifestations of technology risk in next-generation geothermal include the possibility of underproduction, which would impact the plant’s profitability, or that capacity will decline faster than expected, reducing the plant’s operating lifetime. Moving next-generation technologies from the current TRL-7 level to TRL-9 will be key to establishing the reliability of these emerging technologies and unlocking debt financing for future commercial-scale projects.

Underproduction will likely remain a risk, though to a lesser extent, for next-generation projects even after technologies reach TRL-9. This is because uncertainty in the exploration and subsurface characterization process makes it possible for developers to overestimate the temperature gradient and thus the production capacity of a project. Hydrothermal projects also share this risk: the factors determining the production capacity for hydrothermal projects include not only the temperature gradient but also the flow rate and enthalpy of the natural reservoir. In the worst-case scenario, drilling can result in a dry hole that produces no hot fluids at all. This becomes a financial issue if the project is unable to generate as much revenue as expected due to underproduction or additional wells must be drilled to compensate, driving up the total project cost. Thus, underproduction is a risk shared by both next-generation and conventional geothermal projects. Research into improvements to the accuracy and cost of geothermal exploration and subsurface characterization can help mitigate this risk but may not eliminate it entirely, since there is a risk-cost trade-off in how much time is spent on exploration and subsurface characterization.

Another challenge for both next-generation and conventional geothermal projects is that they are more expensive to develop than solar or wind projects. Drilling requires significant upfront capital expenditures, making up about half of the total capital costs of developing a geothermal project, if not more. For example, in EGS projects, the first few wells can cost around $10 million each, while conventional hydrothermal wells, which are shallower, can cost around $3–7 million each. While conventional hydrothermal plants only consist of two to six wells on average, designs for commercial EGS projects can require several times that number of wells. Luckily, EGS projects benefit from the fact that wells can be drilled identically, so projects expect to move down the learning curve as they drill more wells, resulting in faster and cheaper drilling. Initial data from commercial-scale projects currently being developed suggest that the learning curves may be even steeper than expected. Nevertheless, this will need to be proven at scale across different locations. Some companies have managed to avoid expensive drilling costs by focusing on developing technologies that can be installed within idle hydrothermal wells or abandoned oil and gas wells to convert them into productive geothermal wells.

Beyond funding, geothermal projects need to obtain land where there are suitable geothermal resources and permits for each stage of project development. The best geothermal resources in the United States are concentrated in the West, where the federal government owns most of the land. The Bureau of Land Management (BLM) manages a lot of that land, in addition to all subsurface resources on federal land. However, there is inconsistency in how the BLM leases its land, depending on the state. While Nevada BLM has been very consistent about holding regular lease sales each year, California BLM has not held a lease sale since 2016. Adding to the complexity is the fact that although BLM manages all subsurface resources on federal land, surface land may sometimes be managed by a different agency, in which case both agencies will need to be involved in the leasing and permitting process.

Last, next-generation geothermal companies face a green premium on electricity produced using their technology, though the green premium does not appear to be as significant a challenge for next-generation geothermal as it is for other green technologies. In states with high renewables penetration, utilities and their regulators are beginning to recognize the extra value that clean firm power provides in terms of grid reliability. For example, the California Public Utilities Commission has issued an order for utilities to procure 1 GW of clean, firm power by 2026, motivating a wave of new demand from utilities and community choice aggregators. As a result of this demand and California’s high electricity prices in general, geothermal projects have successfully signed a flurry of PPAs over the past year. These have included projects located in Nevada and Utah that can transmit electricity to California customers. In most other western states, however, electricity prices are much lower, so utility companies can be reluctant to sign PPAs for next-generation geothermal projects if they aren’t required to, due to the high cost and technology risk. As a result, next-generation geothermal projects in those states have turned to commercial customers, like those operating data centers, who are willing to pay more to meet their sustainability goals.

The federal government is beginning to recognize the important role of next-generation geothermal power for the clean energy transition. For the first time in 2023, geothermal energy became eligible for the renewable energy investment and production tax credits, thanks to technology-neutral language introduced in the Inflation Reduction Act (IRA). In 2022, the DOE launched the Enhanced Geothermal Shot, led by the Geothermal Technologies Office (GTO), to reduce the cost of EGS by 90% to $45/MWh by 2035 and make geothermal widely available. In 2020, the Frontier Observatory for Research in Geothermal Energy (FORGE), a dedicated underground field laboratory for EGS research, drilling, and technology testing established by GTO in 2014, drilled its first well using new approaches and tools the lab had developed. This year, GTO announced funding for seven EGS pilot demonstrations from the Bipartisan Infrastructure Law (BIL), for which GTO is currently reviewing the first round of applications. GTO also awarded the Geothermal Energy from Oil and gas Demonstrated Engineering (GEODE) grant to a consortium formed by Project InnerSpace, the Society of Petroleum Engineers International, and Geothermal Rising, with over 100 partner entities, to transfer best practices from the oil and gas industry to geothermal, support demonstrations and deployments, identify barriers to growth in the industry, and encourage workforce adoption.

While these initiatives are a good start, significantly more funding from Congress is necessary to support the development of pilot demonstrations and commercial-scale projects and enable wider adoption of geothermal energy. The BIL notably expanded the DOE’s mission area in supporting the deployment of clean energy technologies, including establishing the Office of Clean Energy Demonstrations (OCED) and funding demonstration programs from the Energy Division of BIL and the Energy Act of 2020. However, the $84 million in funding authorized for geothermal pilot demonstrations was only a fraction of the funding that other programs received from BIL and not commensurate to the actual cost of next-generation geothermal projects. Congress should be investing an order of magnitude more into next-generation geothermal projects, in order to maintain U.S. leadership in geothermal energy and reap the many benefits to the grid, the climate, and the economy.

Another key issue is that DOE has historically limited, and continues to limit, all of its funding for next-generation geothermal to EGS technologies. As a result, companies pursuing closed-loop/AGS and other next-generation technologies cannot qualify, leading some projects to be moved abroad. Given GTO’s historically limited budget, it’s possible that this was the result of a strategic decision to focus its funding on one technology rather than diluting it across multiple technologies. However, given that none of these technologies have been successfully commercialized at a wide scale yet, DOE may be missing the opportunity to invest in the full range of viable approaches. DOE appears to be aware of this, as the agency currently has a working group on AGS. New funding from Congress would allow DOE to diversify its investments to support the demonstration and commercial application of other next-generation geothermal technologies.

Alternatively, there are a number of OCED programs with funding from BIL that have not yet been fully spent (Table 1). Congress could reallocate some of that funding towards a new program supporting next-generation geothermal projects within OCED. Though not ideal, this may be a more palatable near-term solution for the current Congress than appropriating new funding.

Table 1. OCED programs that have remaining unspent funding from BIL as of publication in January 2024.
| OCED Program | Total Funding | Committed Funding | Unspent Funding |
| --- | --- | --- | --- |
| Carbon Capture Demonstration Projects | $2.547 billion | $1.889 billion | $658 million |
| Carbon Capture Large Scale Pilot Projects | $937 million | $820 million | $117 million |
| Energy Improvements in Rural and Remote Areas | $1 billion | $365 million | $635 million |
| Clean Energy Demonstration Program on Current and Former Mine Land | $500 million | $450 million | $50 million |
| Energy Storage Demonstration Projects and Pilot Grant Program | $355 million | $349 million | $6 million |
| Long-Duration Demonstration Program and Joint Initiative | $150 million | $30 million | $120 million |

A third option is that DOE could use some of the funding for the Energy Improvements in Rural and Remote Areas program, of which $635 million remains unallocated, to support geothermal projects. Though the program’s authorization does not explicitly mention geothermal energy, geothermal is a good candidate given the abundance of geothermal production potential in rural and remote areas in the West. Moreover, as a clean firm power source, geothermal has a comparative advantage over other renewable energy sources in improving energy reliability. 

Other Transactions Authority

BIL and IRA gave DOE an expanded mandate to support innovative technologies from early-stage research through commercialization. To do so, DOE will need to be just as innovative in its use of its available authorities and resources. Tackling the challenge of scaling technologies from pilot to commercialization will require DOE to look beyond traditional grant, loan, and procurement mechanisms. Previously, we identified the DOE’s Other Transaction Authority (OTA) as an underleveraged tool for accelerating clean energy technologies.

OTA is defined in legislation as the authority to enter into any transaction that is not a government grant or contract. This negative definition provides DOE with significant freedom to design and implement flexible financial agreements that can be tailored to the unique challenges that different technologies face. OT agreements allow DOE to be more creative, and potentially more cost-effective, in how it supports the commercialization of new technologies, such as facilitating the development of new markets, mitigating risks and market failures, and providing innovative new types of demand-side “pull” funding and supply-side “push” funding. The DOE’s new Guide to Other Transactions provides official guidance on how DOE personnel can use the flexibilities provided by OTA. 

With additional funding from Congress, the DOE could use OT agreements to address the unique barriers that geothermal projects face in ways that may not be possible through other mechanisms. Below are four proposals for how the DOE can do so. We chose to focus on supporting next-generation geothermal projects, since the young industry currently requires more governmental support to grow, but we included ideas that would benefit conventional hydrothermal projects as well.

Geothermal Development on Agency Land

This year, the Defense Innovation Unit issued its first funding opportunity specifically for geothermal energy. The four winning projects will aim to develop innovative geothermal power projects on Department of Defense (DoD) bases for both direct consumption by the base and sale to the local grid. OT agreements were used for this program to develop mutually beneficial custom terms. For project developers, DoD provided funding for surveying, design, and proposal development in addition to land for the actual project development. The agreement terms also gave companies permission to use the technology and information gained from the project for other commercial use. For DoD, these projects are an opportunity to improve the energy resilience and independence of its bases while also reducing emissions. By implementing the prototype agreement using OTA, DoD will have the option to enter into a follow-on OT agreement with project developers without further competition, expediting future processes.

DOE could implement a similar program for its 2.4 million acres of land. In particular, the DOE’s land in Idaho and other western states has favorable geothermal resources, which the DOE has considered leasing. By providing some funding for surveying and proposal development like the DoD, the DOE can increase the odds of successful project development, compared to simply leasing the land without funding support. The DOE could also offer technical support to projects from its national labs. 

With such a program, a lot of the value that the DOE would be providing is the land itself, which the DOE currently has more of than actual funding for geothermal energy. The funding needed for surveying and proposal development is much less than would be needed to support the actual construction of demonstration projects, so GTO could feasibly request funding for such a program through the annual appropriations process. Depending on the program outcomes and the resulting proposals, the DOE could then go back to Congress to request follow-on funding to support actual project construction. 

Drilling Cost-Share Program

To help defray the high cost of drilling, the DOE could implement a milestone-based cost-share program. There is precedent for government cost-share programs for geothermal: in 1973, before the DOE was even established, Congress passed the Geothermal Loan Guarantee Program to provide “investment security to the public and private sectors to exploit geothermal resources” in the early days of the industry. Later, the DOE funded the Cascades I and II Cost Shared Programs. Then, from 2000 to 2007, the DOE ran the Geothermal Resource Exploration and Definitions (GRED) I, II, and III Cost-Share Programs. This year, the DOE launched its EGS Pilot Demonstrations program.

A milestone payment structure could be favorable for supporting expensive, next-generation geothermal projects because the government takes on less risk compared to providing all of the funding upfront. Initial funding could be provided for drilling the first few wells. Successful and on-time completion of drilling could then unlock additional funding to drill more wells, and so on. In the past, both the DoD and the National Aeronautics and Space Administration (NASA) have structured their OT agreements using milestone payments, most famously between NASA and SpaceX for the development of the Falcon 9 space launch vehicle. The NASA and SpaceX agreement included not just technical but also financial milestones for the investment of additional private capital into the project. The DOE could do the same and include both technical and financial milestones in a geothermal cost-share program.
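A minimal Python sketch of such a milestone schedule is shown below. The milestone names, dollar amounts, and the rule that funding stops at the first unmet milestone are illustrative assumptions rather than terms from any actual DOE, DoD, or NASA agreement.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    name: str
    kind: str        # "technical" or "financial"
    payment: float   # hypothetical tranche released on completion, in dollars
    met: bool = False


def released_funding(milestones):
    """Sum the tranches for milestones met in sequence.

    Funding stops at the first unmet milestone, so later tranches stay
    locked until earlier work is completed.
    """
    total = 0.0
    for milestone in milestones:
        if not milestone.met:
            break
        total += milestone.payment
    return total


# Hypothetical schedule mixing technical and financial milestones.
schedule = [
    Milestone("Drill wells 1-2 on time", "technical", 10_000_000.0, met=True),
    Milestone("Raise matching private capital", "financial", 5_000_000.0, met=True),
    Milestone("Drill wells 3-6 on time", "technical", 15_000_000.0, met=False),
    Milestone("Commission power plant", "technical", 20_000_000.0, met=False),
]
released = released_funding(schedule)
exposure_if_abandoned = released  # the most the government has paid so far
```

In this example only $15 million of a possible $50 million has been released, which shows how staging limits the government's exposure if a project stalls.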

Risk Insurance Program

Longer term, the DOE could implement a risk insurance program for conventional hydrothermal and next-generation geothermal projects. Insuring against underproduction could make it easier and cheaper for projects to be financed, since the potential downside for investors would be capped. The DOE could initially offer insurance just for conventional hydrothermal, since there is already extensive data on past commercial projects that can inform how the insurance is designed. In order to design insurance for next-generation technologies, more commercial-scale projects will first need to be built to collect the data necessary to assess the underproduction risk of different approaches.

France has administered a successful Geothermal Public Risk Insurance Fund for conventional hydrothermal projects since 1982. The insurance originally consisted of two parts: a Short-Term Fund to cover the risk of underproduction and a Long-Term Fund to cover uncertain long-term behavior over the operating lifetime of the geothermal plant. The Short-Term Fund asked project owners to pay a premium of 1.5% of the maximum guaranteed amount. In return, the Short-Term Fund provided a 20% subsidy for the cost of drilling the first well and, in the case of reduced output or a dry hole, a compensation between 20% and 90% of the maximum guaranteed amount (inclusive of the subsidy that had already been paid). The exact compensation was determined based on a formula for the amount necessary to restore the project’s profitability with its reduced output. The Short-Term Fund relied on a high success rate, especially in the Paris Basin where there are known to be good hydrothermal resources, to fund the costs of failures. Geothermal developers that chose to get coverage from the Short-Term Fund were required to also get coverage from the Long-Term Fund, which was designed to hedge against the possibility of unexpected geological or geothermal changes within the wells, such as if their output declined faster than expected or severe corrosion or scaling occurred, over the geothermal plant’s operating lifetime. The Long-Term Fund ended in 2015, but a new iteration of the Short-Term Fund was approved in 2023.
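The Short-Term Fund's percentages can be turned into a small worked example in Python. The rates come from the description above, while the dollar figures are hypothetical. The formula that picks the exact compensation share is not reproduced here, so that share is passed in as an input. Treating "inclusive of the subsidy" as meaning the subsidy is netted out of the final payout is one reading of the text, not a confirmed rule.

```python
PREMIUM_RATE = 0.015     # premium: 1.5% of the maximum guaranteed amount
SUBSIDY_RATE = 0.20      # subsidy: 20% of the cost of drilling the first well
MIN_COMPENSATION = 0.20  # compensation band, as a share of the
MAX_COMPENSATION = 0.90  # maximum guaranteed amount


def premium(max_guaranteed):
    return PREMIUM_RATE * max_guaranteed


def first_well_subsidy(first_well_cost):
    return SUBSIDY_RATE * first_well_cost


def additional_payout(max_guaranteed, compensation_share, subsidy_paid):
    """Payout still owed after a dry hole or reduced output.

    Assumption: total compensation includes the subsidy already paid,
    so only the remainder is paid out.
    """
    if not MIN_COMPENSATION <= compensation_share <= MAX_COMPENSATION:
        raise ValueError("compensation share must be between 20% and 90%")
    return max(compensation_share * max_guaranteed - subsidy_paid, 0.0)


# Hypothetical project: 10 million guaranteed, 5 million first well, dry hole.
paid_in = premium(10_000_000)
subsidy = first_well_subsidy(5_000_000)
owed = additional_payout(10_000_000, 0.90, subsidy)
```

Under these assumed figures the developer pays a premium of 150,000, receives a 1,000,000 drilling subsidy, and in the worst case receives a further 8,000,000 in compensation.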

The Netherlands has successfully run a program similar to the Short-Term Fund since the 2000s. Private-sector attempts at setting up geothermal risk insurance packages in Europe and around the world have mostly failed, though. The premiums were often too high, costing up to 25–30% of the cost of drilling, and the packages were established in developing markets where not enough projects were being developed to mutualize the risk.

To implement such a program at the DOE, projects seeking coverage would first submit an application consisting of the technical plan, timeline, expected costs, and expected output. The DOE would then conduct rigorous due diligence to ensure that the project’s proposal is reasonable. Once accepted, projects would pay a small premium upfront; the DOE should keep in mind the failed attempts at private-sector insurance packages and ensure that the premium is affordable. In the case that either the installed capacity is much lower than expected or the output capacity declines significantly over the course of the first year of operations, the Fund would compensate the project based on the level of underproduction and the amount necessary to restore the project’s profitability with a reduced output. The French Short-Term Fund calculated compensation based on characteristics of the hydrothermal wells; the DOE would need to develop its own formulas reflective of the costs and characteristics of different next-generation geothermal technologies once commercial data actually exists. 

Before setting up a geothermal insurance fund, the DOE should investigate whether there are enough geothermal projects being developed across the country to ensure the mutualization of risk and whether there is enough commercial data to properly evaluate the risk. Another concern for next-generation geothermal is that a high failure rate could cause the fund to run out. To mitigate this, the DOE will need to analyze future commercial data for different next-generation technologies to assess whether each technology is mature enough for a sustainable insurance program. Last, poor state capacity could impede the feasibility of implementing such a program. The DOE will need personnel on staff who are sufficiently knowledgeable about the range of emerging technologies in order to properly evaluate technical plans, understand their risks, and design an appropriate insurance package.
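The solvency concern can be illustrated with a simple expected-value calculation in Python. This is a deliberately crude sketch: it assumes every project pays the same premium, failures are independent, and payouts are uniform, and all of the numbers are hypothetical.

```python
def expected_fund_balance(n_projects, premium, failure_rate, avg_payout,
                          seed_capital=0.0):
    """Expected balance of a premium-funded pool after one cohort of projects."""
    premiums_collected = n_projects * premium
    expected_payouts = n_projects * failure_rate * avg_payout
    return seed_capital + premiums_collected - expected_payouts


def breakeven_failure_rate(premium, avg_payout):
    """Failure rate above which premiums alone no longer cover payouts."""
    return premium / avg_payout


# Hypothetical cohort: 20 projects, $150,000 premium, $2 million average payout.
balance = expected_fund_balance(20, 150_000.0, 0.10, 2_000_000.0)
threshold = breakeven_failure_rate(150_000.0, 2_000_000.0)
```

With these made-up figures, a 10% failure rate leaves the pool $1 million short, because premiums alone only cover failure rates up to 7.5%. A fund in that position would need seed capital, higher premiums, or a more mature technology with fewer failures.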

Production Subsidy

While the green premium for next-generation geothermal has not been an issue in California, it may be slowing down project development in other states with lower electricity prices. The Inflation Reduction Act introduced a new clean energy Production Tax Credit that included geothermal energy for the first time. However, due to the higher development costs of next-generation geothermal projects compared to other renewable energy projects, that subsidy is insufficient to fully bridge the green premium. DOE could use OTA to introduce a production subsidy for next-generation geothermal energy with rates that vary by the state in which the electricity is sold and that state’s average baseload electricity price (e.g., the production subsidy likely would not apply to California). This would help address variations in the green premium across different states and expand the number of states in which it is financially viable to develop next-generation geothermal projects.
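One hypothetical way to set state-specific rates is sketched in Python below. The report does not specify a formula, so the rule used here is an assumption made purely for illustration: the subsidy covers whatever gap remains between a project's break-even price and the state's baseload price after the tax credit. All prices are invented placeholders in $/MWh.

```python
def production_subsidy(breakeven_price, state_baseload_price, tax_credit):
    """Per-MWh subsidy needed to close the remaining green premium.

    Assumed rule: the subsidy is zero wherever the baseload price plus
    the tax credit already meets or exceeds the break-even price.
    """
    gap = breakeven_price - state_baseload_price - tax_credit
    return max(gap, 0.0)


# Placeholder prices, not actual market data.
state_baseload_prices = {"High-price state": 95.0, "Low-price state": 50.0}
rates = {
    state: production_subsidy(90.0, price, tax_credit=25.0)
    for state, price in state_baseload_prices.items()
}
```

Under these placeholder values the high-price state receives no subsidy, consistent with the expectation that a subsidy likely would not apply to California, while the low-price state receives $15/MWh.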

The United States is well-positioned to lead the next-generation geothermal industry, with its abundance of geothermal resources and opportunities to leverage the knowledge and workforce of the domestic oil and gas industry. The responsibility is on Congress to ensure that DOE has the necessary funding to support the full range of innovative technologies being pursued by this young industry. With more funding, DOE can take advantage of the flexibility offered by OTA to create agreements tailored to the unique challenges that the geothermal industry faces as it begins to scale. Successful commercialization would pave the way to unlocking access to 24/7 clean energy almost anywhere in the country and help future-proof the transition to a fully decarbonized power grid.