Scaling Effective Methods across Federal Agencies: Looking Back at the Expanded Use of Incentive Prizes between 2010 and 2020

Policy entrepreneurs inside and outside of government, as well as other stakeholders and advocates, often seek to expand the use of effective methods across many or all federal agencies, because how the government accomplishes its mission is integral to the outcomes it can deliver for the public it serves. Adoption and use of promising new methods by federal agencies can be slowed by a number of factors that discourage risk-taking and experimentation and instead encourage compliance and standardization, too often as a false proxy for accountability. As a result, many agency-specific and government-wide authorities for promising methods go under-considered and under-utilized.

Policy entrepreneurs within center-of-government agencies (e.g., Executive Office of the President) are well-positioned to use a variety of policy levers and actions to encourage and accelerate federal agency adoption of promising and effective methods. Some interventions by center-of-government agencies are better suited to driving initial adoption, others to accelerating or maintaining momentum, and yet others to codifying and making adoption durable once widespread. Therefore, a policy entrepreneur interested in expanding adoption of a given method should first seek to understand the “adoption maturity” of that method and then undertake interventions appropriate for that stage of adoption. The arc of agency adoption of new methods can be long—measured in years and decades, not weeks and months. Policy entrepreneurs should be prepared to support adoption over similar timescales. In considering adoption maturity of a method of interest, policy entrepreneurs can also reference the ideas of Tom Kalil in a July 2024 Federation of American Scientists blog post, “Increasing the ‘Policy Readiness’ of Ideas,” which offers sample questions to ask about “the policy landscape surrounding a particular idea.”

As a case study for driving federal adoption of a new method, this paper looks back at actions that supported the widespread adoption of incentive prizes by most federal agencies over the course of fiscal years 2010 through 2020. Federal agency use of prizes grew from several incentive prize competitions offered by a handful of agencies in the early 2000s to more than 2,000 prize competitions offered by over 100 federal agencies by the end of fiscal year 2022. These incentive prize competitions have helped federal agencies identify novel solutions and technologies, establish new industry benchmarks, pay only for results, and engage new talent and organizations.

A summary framework below includes types of actions that can be taken by policy entrepreneurs within center-of-government agencies to support awareness, piloting, and ongoing use of new methods by federal agencies in the years ahead. (Federal agency program and project managers who seek to scale up innovative methods within their agencies are encouraged to reference related resources such as this article by Jenn Gustetic in the Winter 2018 Issues in Science and Technology: “Scaling Up Policy Innovations in the Federal Government: Lessons from the Trenches.”) 

Efforts to expand federal capacity through new and promising methods are worthwhile to ensure the federal government can use a full and robust toolbox of tactics to meet its varied goals and missions. 

OPPORTUNITIES AND CHALLENGES IN FEDERAL ADOPTION OF NEW METHODS

Opportunities for federal adoption and use of promising and effective methods

To address national priorities, solve tough challenges, or better meet federal missions to serve the public, a policy entrepreneur may aim to pilot, scale, and make lasting federal use of a specific method. 

A policy entrepreneur’s goals might include new ways for federal agencies to, for example:

To support these and other goals, an array of promising methods exists and has been demonstrated in other sectors (such as philanthropy, industry, and civil society), in state, local, Tribal, or territorial governments and communities, or in one or several federal agencies, with promise for beneficial impact if more federal agencies adopted these practices. Many methods are either specifically supported or generally allowable under existing government-wide or agency-specific authorities.

Center-of-government agencies include components of the Executive Office of the President (EOP) like the Office of Management and Budget (OMB) and the Office of Science and Technology Policy (OSTP), as well as the Office of Personnel Management (OPM) and the General Services Administration (GSA). These agencies direct, guide, convene, support, and influence the implementation of law, regulation, and the President’s policies across all Federal agencies, especially the executive departments. An August 2016 report by the Partnership for Public Service and the IBM Center for the Business of Government noted that “The Office of Management and Budget and other ‘center of government’ agencies are often viewed as adding processes that inhibit positive change—however, they can also drive innovation forward across the government.”

A policy entrepreneur interested in expanding adoption of a given method through actions driven or coordinated by one or more center-of-government agencies should first seek to understand the “adoption maturity” of that method by assessing: (1) the extent to which adoption of the method has already occurred across the federal interagency; (2) any real or perceived barriers to adoption and use; and (3) the robustness of existing policy frameworks and agency-specific and government-wide infrastructure and resources that support agency use of the method.

Challenges in federal adoption and use of new methods

Policy entrepreneurs are usually interested in expanding federal adoption of new methods for good reason: a focus on supporting and expanding beneficial outcomes. Effective leaders and managers across sectors understand the importance of matching appropriate and creative tactics with well-defined problems and opportunities. Ideally, leaders are picking which tactic or tool to use based on their expert understanding of the target problem or opportunity, not using a method solely because it is novel or because it is the way work has always been done in the past. Design of effective program strategies is supported by access to a robust and well-stocked toolbox of tactics. 

However, many currently authorized and allowable methods for achieving federal goals are generally underutilized in the implementation strategies and day-to-day tactics of federal agencies. Looking at the wide variety of existing authorities in law and the various flexibilities allowed for in regulation and guidance, one might expect agency tactics for common activities like acquisition or public comment to be varied, diverse, iterative, and even experimental in nature, where appropriate. In practice, however, agency methods are often remarkably homogeneous, repeated, and standardized.   

This underutilization of existing authorities and allowable flexibilities is due to factors such as:

Strategies for addressing challenges in federal adoption and use of new methods

Attention and action by center-of-government agencies are often needed to address the factors cited above that slow the adoption and use of new methods across federal agencies and to build momentum. The following strategies are further explored in the case study on federal use of incentive prizes that follows:

Additional strategies can be deployed within federal agencies to address agency-level barriers and scale promising methods—see, for example, this article by Jenn Gustetic in the Winter 2018 Issues in Science and Technology: “Scaling Up Policy Innovations in the Federal Government: Lessons from the Trenches.” 

LOOKING BACK: A DECADE OF POLICY ACTIONS SUPPORTING EXPANDED FEDERAL USE OF INCENTIVE PRIZES

The use of incentive prizes is one method for open innovation that has been adopted broadly by most federal agencies, with extensive bipartisan support in Congress and with White House engagement across multiple administrations. In contrast to recognition prizes, such as the Nobel Prize or various presidential medals, which reward past accomplishments, incentive prizes specify a target, establish a judging process (ideally as objective as possible), and use a monetary prize purse and/or non-monetary incentives (such as media and online recognition, access to development and commercialization facilities, resources, or experts, or even qualification for certain regulatory flexibility) to induce new efforts by solvers competing for the prize. 

The use of incentive prizes by governments (and by high net worth individuals) to catalyze novel solutions certainly is not new. In 1795, Napoleon offered 12,000 francs to improve upon the prevailing food preservation methods of the time, with a goal of better feeding his army. Fifteen years later, confectioner Nicolas François Appert claimed the prize for his method involving heating, boiling, and sealing food in airtight glass jars — the same basic technology still used to can foods. Dava Sobel’s book Longitude details how the rulers of Spain, the Netherlands, and Britain all offered separate prizes, starting in 1567, for methods of figuring out longitude at sea; John Harrison was finally awarded Britain’s top longitude prize in 1773. In 1919, Raymond Orteig, a French-American hotelier, aviation enthusiast, and philanthropist, offered a $25,000 prize for the first person who could perform a nonstop flight between New York and Paris. The prize offer initially expired in 1924 without anyone claiming it. Given technological advances and a number of engaged pilots involved in trying to win the prize, Orteig extended the deadline by five years. By 1926, nine teams had come forward to formally compete, and in 1927 the prize went to a little-known aviator named Charles Lindbergh, who completed the flight in a custom-built plane known as the “Spirit of St. Louis.”

The U.S. Government did not begin to adopt the use of incentive prizes until the early 21st century, following a 1999 National Academy of Engineering workshop about the use of prizes as an innovation tool. In the first decade of the 2000s, the Defense Advanced Research Projects Agency (DARPA), the National Aeronautics and Space Administration (NASA), and the Department of Energy conducted a small number of pilot prize competitions. These early agency-led prizes focused on autonomous vehicles, space exploration, and energy efficiency, demonstrating a range of benefits to federal agency missions. 

Federal use of incentive prizes did not accelerate until, in the America COMPETES Reauthorization Act of 2010, Congress granted all federal agencies the authority to conduct prize competitions (15 USC § 3719). With that new authority in place, and with the support of a variety of other policy actions, federal use of incentive prizes reached scale, with over 2,000 prize competitions offered on Challenge.gov by over 100 federal agencies between fiscal years 2010 and 2022.

There certainly remains extensive opportunity to improve the design, rigor, ambition, and effectiveness of federal prize competitions. That said, there are informative lessons to be drawn from how incentive prizes evolved in the United States from a method used primarily outside of government, with limited pilots among a handful of early-adopter federal agencies, to a method being tried by many civil servants across an active interagency community of practice and lauded by administration leaders, bipartisan members of Congress, and external stakeholders alike. 

A summary follows of the strategies and tactics used by policy entrepreneurs within the EOP—with support and engagement from Congress as well as program managers and legal staff across federal agencies—that led to increased adoption and use of incentive prizes in the federal government.


Summary of strategies and policy levers supporting expanded use of incentive prizes

In considering how best to expand awareness, adoption, and use among federal agencies of promising methods, policy entrepreneurs might consider utilizing some or all of the strategies and policy levers described below in the incentive prizes example. Those strategies and levers are summarized generally in the table that follows. Some of the listed levers can advance multiple strategies and goals. This framework is intended to be flexible and to spark brainstorming among policy entrepreneurs, as they build momentum in the use of particular innovation methods. 

Policy entrepreneurs are advised to consider and monitor the maturity level of federal awareness, adoption, and use, and to adjust their strategies and tactics accordingly. They are encouraged to return to earlier strategies and policy levers as needed, should adoption and momentum lag, should agency ambition in design and implementation of initiatives be insufficient, or should concerns regarding risk management be raised by agencies, Congress, or stakeholders. 

For each stage of federal adoption, the framework below lists the corresponding strategy and the types of center-of-government policy levers available.
Stage: Early – No or few Federal agencies using method
Strategy: Understand federal opportunities to use method, and identify barriers and challenges
Policy levers:

* Connect with early adopters across federal agencies to understand use of agency-specific authorities, identify pain points and lessons learned, and capture case studies (e.g., 2000-2009)

* Engage stakeholder community of contractors, experts, researchers, and philanthropy

* Look to and learn from use of method in other sectors (such as by philanthropy, industry, or academia) and document (or encourage third-party documentation of) that use and its known benefits and attributes (e.g., April 1999, July 2009)

* Encourage research, analysis, reports, and evidence-building by National Academies, academia, think tanks, and other stakeholders (e.g., April 1999, July 2009, June 2014)

* Discuss method with OMB Office of General Counsel and other relevant agency counsel

* Discuss method with relevant Congressional authorizing committee staff

* Host convenings that connect interested federal agency representatives with experts

* Support and connect nascent federal “community of interest”
Stage: Early – No or few Federal agencies using method
Strategy: Build interest among federal agencies
Policy levers:

* Designate primary policy point of contact/dedicated staff member in the EOP (e.g., 2009-2017, 2017-2021)

* Designate a primary implementation point of contact/dedicated staff at GSA and/or OPM

* Identify leads in all or certain federal agencies

* Connect topic to other administration policy agendas and strategies

* Highlight early adopters within agencies in communications from center-of-government agencies to other federal agencies (and to external audiences)

* Offer congressional briefings and foster bipartisan collaboration (e.g., 2015)
Stage: Early – No or few Federal agencies using method
Strategy: Establish legal authorities and general administration policy
Policy levers:

* Engage the OMB Office of General Counsel and the OMB Legislative Review Division, as well as other relevant OMB offices and EOP policy councils

* Identify existing general authorities and regulations that could support federal agency use of method (e.g., March 2010)

* Establish general policy guidelines, including by leveraging Presidential authorities through executive orders or memoranda (e.g., January 2009)

* Issue OMB directives on specific follow-on agency actions or guidance to support agency implementation (“M-Memos” or similar) (e.g., December 2009, March 2010, August 2011, March 2012)

* Provide technical assistance to Congress regarding government-wide or agency-specific authority (or authorities) (e.g., June-July 2010, January 2011)

* Delegate existing authorities within agencies (e.g., October 2011)

* Encourage issuance of agency-specific guidance (e.g., October 2011, February 2014)

* Include direction to agencies as part of broader Administration policy agendas (e.g., September 2009, 2011-2016)
Stage: Early – No or few Federal agencies using method
Strategy: Remove barriers and “make it easier”
Policy levers:

* Create a central government website with information for federal agency practitioners (such as toolkits, case studies, and trainings) and for the public (e.g., September 2010)

* Create dedicated GSA schedule of vendors (e.g., July 2011)

* Establish an interagency center of excellence (e.g., September 2011)

* Encourage use of interagency agreements on design or implementation of pilot initiatives (e.g., September 2011)

* Request agency budget submissions to OMB to support pilot use in President’s budget (e.g., December 2013)
Stage: Adoption well underway – Many federal agencies have begun to use method
Strategy: Connect practitioners
Policy levers:

* Launch a federal “community of practice” with support from GSA for meetings, listserv, and collaborative projects (e.g., April 2010, 2016, June 2019)

* Host regular events, workshops, and conferences with federal agencies and, where appropriate and allowable, seek philanthropic or nonprofit co-hosts (e.g., April 2010, June 2012, April 2015, March 2018, May 2022)
Stage: Adoption well underway – Many federal agencies have begun to use method
Strategy: Strengthen agency infrastructure
Policy levers:

* Foster leadership buy-in through briefings from White House/EOP to agency leadership, including members of the career Senior Executive Service

* Encourage agencies to dedicate agency staff and invest in prize design support within agencies

* Encourage agencies to create contract vehicles as needed to support collaboration with vendors/ experts

* Encourage agencies to develop intra-agency networks of practitioners and to provide external communications support and platforms for outreach

* Request agency budget submissions to OMB for investments in agency infrastructure and expansion of use, to include in the President's budget where needed (e.g., 2012-2013), and request agencies otherwise accommodate lower-dollar support (such as allocation of FTEs) where possible within their budget toplines
Stage: Adoption well underway – Many federal agencies have begun to use method
Strategy: Clarify existing policies and authorities
Policy levers:

* Issue updated OMB, OSTP, or agency-specific policy guidance and memoranda as needed based on engagement with agencies and stakeholders (e.g., August 2011, March 2012)

* Provide technical assistance to Congress on any needed updates to government-wide or agency-specific authorities (e.g., January 2017)
Stage: Adoption prevalent – Most if not all federal agencies have adopted, with a need to maintain use and momentum over time
Strategy: Highlight progress and capture lessons learned
Policy levers:

* Require regular reporting from agencies to EOP (OSTP, OMB, or similar) (e.g., April 2012, May 2022)

* Require and take full advantage of regular reports to Congress (e.g., April 2012, December 2013, May 2014, May 2015, August 2016, June 2019, May 2022, April 2024)

* Continue to capture and publish federal-use case studies in multiple formats online (e.g., June 2012)

* Undertake research, evaluation, and evidence-building

* Co-develop practitioner toolkit with federal agency experts (e.g., December 2016)

* Continue to feature promising examples on White House/EOP blogs and communication channels (e.g., October 2015, August 2020)

* Engage media and seek both general interest and targeted press coverage, including through external awards/honorifics (e.g., December 2013)
Stage: Adoption prevalent – Most if not all federal agencies have adopted, with a need to maintain use and momentum over time
Strategy: Prepare for presidential transitions and document opportunities for future administrations
Policy levers:

* Integrate go-forward proposals and lessons learned into presidential transition planning and transition briefings (e.g., June 2016-January 2017)

* Brief external stakeholders and Congressional supporters on progress and future opportunities

* Connect use of method to other, broader policy objectives and national priorities (e.g., August 2020, May 2022, April 2024)

Phases and timeline of policy actions advancing the adoption of incentive prizes by federal agencies

  1. Growing number of incentive prizes offered outside government (early 2000s)

At the close of the 20th century, federal use of incentive prizes to induce activity toward targeted solutions was limited, though the federal government regularly utilized recognition prizes to reward past accomplishment. In October 2004, the $10 million Ansari XPRIZE—which was first announced in May 1996—was awarded by the XPRIZE Foundation for the successful flights of SpaceShipOne by Scaled Composites. Following the awarding of the Ansari XPRIZE and the extensive resulting news coverage, philanthropists and high net worth individuals began to offer prize purses to incentivize action on a wide variety of technology and social challenges. A variety of new online challenge platforms sprang up, and new vendors began offering consulting services for designing and hosting challenges, trends that lowered the cost of prize competition administration and broadened participation in prize competitions among thousands of diverse solvers around the world. This growth in the use of prizes by philanthropists and the private sector increased the interest of the federal government in trying out incentive prizes to help meet agency missions and solve national challenges. Actions during this period to support federal use of incentive prizes include:

  2. Obama-Biden Administration Seeks to Expand Federal Prizes Through Administrative Action (2009-2010)

From the start of the Obama-Biden Administration, OSTP and OMB took a series of policy steps to expand the use of incentive prizes across federal agencies and build federal capacity to support those open-innovation efforts. Bipartisan support in Congress for these actions soon led to new legislation to further advance agency adoption of incentive prizes. Actions during this period to support federal use of incentive prizes include:

  3. Implementing New Government-Wide Prize Authority Provided by the America COMPETES Reauthorization Act (2011-2016)

During this period of expansion in the federal use of incentive prizes, supported by the new government-wide prize authority provided by Congress, the Obama-Biden Administration continued to emphasize its commitment to the model as a key method for accomplishing administration priorities, including those related to open government and evidence-based decision making. Actions during this period to support federal use of incentive prizes include:

  4. Maintaining Momentum in New Presidential Administrations

Support for federal use of incentive prizes continued beyond the Obama-Biden Administration’s foundational efforts. Leadership by federal agency prize leads was particularly important in sustaining this momentum from administration to administration. Actions during the Trump-Pence and Biden-Harris Administrations to support federal use of incentive prizes include:

Harnessed American ingenuity through increased use of incentive prizes. Since 2010, more than 80 Federal agencies have engaged 250,000 Americans through more than 700 challenges on Challenge.gov to address tough problems ranging from fighting Ebola, to decreasing the cost of solar energy, to blocking illegal robocalls. These competitions have made more than $220 million available to entrepreneurs and innovators and have led to the formation of over 275 startup companies with over $70 million in follow-on funding, creating over 1,000 new jobs.

In addition, in January 2017, the Obama-Biden Administration’s OSTP mentioned the use of incentive prizes in its public “exit memo” as a key “pay-for-performance” method in agency science and technology strategies that “can deliver better results at lower cost for the American people,” and also noted:

Harnessing the ingenuity of citizen solvers and citizen scientists. The Obama Administration has harnessed American ingenuity, driven local innovation, and engaged citizen solvers in communities across the Nation by increasing the use of open-innovation approaches including crowdsourcing, citizen science, and incentive prizes. Following guidance and legislation in 2010, over 700 incentive prize competitions have been featured on Challenge.gov from over 100 Federal agencies, with steady growth every year.

By the end of fiscal year 2022, federal agencies had hosted over 2,000 prize competitions on Challenge.gov since the platform’s launch in 2010. OSTP, GSA, and NASA’s Center of Excellence for Collaborative Innovation (CoECI) had provided training to well over 2,000 federal practitioners during that same period.

Number of Federal Prize Competitions by Authority FY14-FY22

Source: Office of Science and Technology Policy. Biennial Report on “IMPLEMENTATION OF FEDERAL PRIZE AND CITIZEN SCIENCE AUTHORITY: FISCAL YEARS 2021-22.” April 2024.

Federal Agency Practices to Support the Use of Prize Competitions

Source: Office of Science and Technology Policy. Biennial Report on “IMPLEMENTATION OF FEDERAL PRIZE AND CITIZEN SCIENCE AUTHORITY: FISCAL YEARS 2019-20.” March 2022. 

CONCLUSION

Over the span of a decade, incentive prizes moved from a tool used primarily outside of the federal government to one used commonly across federal agencies, due to a concerted, multi-pronged effort led by policy entrepreneurs and incentive prize practitioners in the EOP and across federal agencies, with bipartisan congressional support, spanning several presidential administrations. And yet, the work to support the use of prizes by federal agencies is not complete; there remains extensive opportunity to further improve the design, rigor, ambition, and effectiveness of federal prize competitions; to move beyond “ideas challenges” and increase the use of incentive prizes to demonstrate technologies and solutions in testbeds and real-world deployment scenarios; to train additional federal personnel on the use of incentive prizes; to learn from the results of federal incentive prize competitions; and to apply this method to address pressing and emerging challenges facing the nation.

In applying these lessons to efforts to expand the use of other promising methods in federal agencies, policy entrepreneurs in center-of-government federal agencies should be strategic in the policy actions they take to encourage and scale method adoption, by first seeking to understand the adoption maturity of that method (as well as the relevant policy readiness) and then by undertaking interventions appropriate for that stage of adoption. With attention and action by policy entrepreneurs to address factors that discourage risk-taking, experimentation, and piloting of new methods by federal agencies, it will be possible for federal agencies to utilize a further-expanded strategic portfolio of methods to catalyze the development, demonstration, and deployment of technology and innovative solutions to meet agency missions, solve long-standing problems, and address grand challenges facing our nation. 


Making the Most of OSHA’s Extreme Heat Rule

KEY TAKEAWAYS

This article is informed by extensive research and stakeholder engagement conducted by the Federation of American Scientists, including a comprehensive literature review and interviews with experts in the field. Much of this work informed our recent publication, which can be found here.


The Imperative for Infrastructure Investment

As climate change intensifies, the need for robust heat safety measures for outdoor workers has never been more pressing. The Occupational Safety and Health Administration (OSHA) has taken a significant step forward in protecting workers from extreme heat by proposing a new safety standard. The proposed rule aims to protect approximately 36 million workers in indoor and outdoor settings from heat-related illnesses and fatalities. As we move forward, the rule’s success hinges on substantial investments to bridge the gap between policy and practice. It is crucial to examine how the federal government can create the necessary infrastructure to support and maximize the effectiveness of this potentially groundbreaking standard.

The need for these investments is underscored by the significant economic and human costs of heat-related illnesses and fatalities. A study by the Atlantic Council estimates that extreme heat costs the U.S. economy $100 billion annually, with agricultural workers being among the most affected. Proper implementation of safety measures could potentially prevent many of these fatalities and reduce substantial economic losses.

Key Areas for Infrastructure Development to Meet OSHA’s Heat Safety Rules

The outdoor occupational sector, employing tens of millions of workers across diverse landscapes and industries, faces unique challenges in properly implementing heat safety measures. From vast open fields to enclosed processing facilities, the infrastructure needs are as varied as the sector itself. Without targeted investments, the OSHA standard risks becoming an unfunded mandate, unable to fulfill its life-saving potential.

The effective implementation of OSHA’s proposed standard requires a multifaceted approach to infrastructure development. By focusing on these key areas, we can create a robust framework that supports the standard’s goals and protects outdoor workers across diverse settings and conditions. To maximize the impact of the proposed rule, investments must be strategically directed across several key areas. It is important to note that these areas represent a broad overview and are not exhaustive; comprehensive stakeholder engagement is essential to tailor solutions to specific needs across different states, regions, industries, and employers.

Workforce

Developing a resilient and well-prepared workforce is a cornerstone of effective safety measures. Key investments in training, access to facilities, and health monitoring ensure that workers are equipped to handle extreme heat conditions, safeguarding their health and productivity.

  1. Training & Education. Developing multilingual, interactive training modules accessible to all workers is crucial. These programs must include ongoing education to ensure workers are continually updated on best practices for heat safety.
  2. Access to Infrastructure. Installing hydration stations and shaded rest areas is essential to provide necessary relief from extreme heat. These facilities enable workers to stay hydrated and take breaks, significantly reducing the risk of heat exhaustion and heat stroke.
  3. Personal Protective Equipment. Providing cooling vests, lightweight clothing, and sunscreen to protect workers from heat stress is another critical component. PPE must be tailored to the specific needs of workers, offering protection without hindering productivity.
  4. Health Insurance. Ensuring workers have access to adequate health insurance is crucial, particularly for those in rural and underserved areas. This includes addressing the unique challenges faced by workers with complex immigration statuses, who may be hesitant to seek medical care or face barriers in obtaining insurance coverage.
  5. Awareness. Implementing acclimatization programs and regular health screenings can help monitor workers’ health and identify early signs of heat stress. This includes educating workers about recognizing early signs of heat stress in themselves and colleagues, and understanding the importance of gradual adaptation to hot working conditions. 
  6. Migrant Worker Vulnerabilities. Undocumented workers face unique challenges in accessing heat safety protections, such as fear of retaliation for reporting unsafe conditions, which can lead to underreporting of incidents. This vulnerability highlights the need for stronger protections and outreach strategies specifically tailored to this population.

Employer & Industry

Employers and industries play a critical role in implementing heat safety standards. By investing in infrastructure, regulatory compliance, and technological innovations, they can create safer working environments and ensure the sustainability of their operations.

  1. Financial Assistance. Offering grants, subsidies, and tax incentives can support employers in implementing necessary safety measures. Financial support can alleviate the burden on small and medium-sized enterprises, ensuring that all employers can invest in heat safety infrastructure.
  2. Physical Infrastructure. Employers must invest in the necessary infrastructure, including hydration stations, shaded rest areas, and cooling systems. These investments are essential for creating a safe working environment and ensuring compliance with the proposed standards.
  3. Regulatory Compliance Support. Developing clear guidelines and compliance tools can help employers adhere to the new standards. Providing technical assistance and resources for compliance can simplify the process and encourage widespread adoption of safety measures.
  4. Technology & Innovation. Utilizing weather monitoring systems, wearable heat sensors, and mobile health applications can enhance worker safety. These technologies enable real-time tracking of heat exposure and facilitate timely interventions, reducing the risk of heat-related illnesses.
  5. Rural Infrastructure. Many agricultural operations are in rural areas with limited resources and infrastructure. This includes a lack of nearby healthcare facilities, making it difficult to quickly respond to heat-related illnesses in the workplace. Investments in rural infrastructure and targeted support can address these limitations.

Regulatory Agencies

Regulatory agencies are essential in enforcing heat safety standards. Increased resources, staffing, and technical expertise, along with robust data collection and public outreach, are necessary to support compliance and drive continuous improvement in safety measures.

  1. Resources & Staffing. Adequate staffing is essential to enforce the new standards effectively. Increased financial resources would support hiring additional staff, enhance the technological capabilities for monitoring compliance, and ensure that there are adequate resources to investigate and address non-compliance.
  2. Training & Expertise. Regulatory agencies must possess the necessary technical and operational expertise, supported by ongoing training for inspectors and regulatory staff to stay updated on the latest heat safety technologies, practices, and research.
  3. Data Collection & Analysis. Developing incident reporting systems, syndromic surveillance, and integration of data with a centralized health and safety database can inform policy decisions and improve safety measures.
  4. Public Outreach & Education. Implementing awareness campaigns, supporting community engagement initiatives, and distributing educational materials can increase public awareness of heat safety.
  5. Research & Development. Funding for research collaborations with academic institutions and pilot programs to test new heat safety technologies and strategies is vital.
  6. Whistleblower Protections. To ensure the effectiveness of heat safety measures, it’s crucial that all workers, including undocumented workers, can report dangerous conditions without fear of retaliation. Strengthening and enforcing whistleblower protections is essential to create a culture of safety and compliance.

Healthcare

A robust healthcare infrastructure is vital to support the prevention, early detection, and treatment of heat-related illnesses among outdoor workers. Investments in medical facilities, telemedicine, emergency response systems, and healthcare worker training are crucial to providing timely and effective care. 

  1. Access to Healthcare. Strengthening access to healthcare is crucial, especially in rural and underserved areas. This involves expanding medical facilities and ensuring workers have access to qualified healthcare professionals and affordable treatment options tailored to heat-related conditions.
  2. Telemedicine Infrastructure. Developing robust telemedicine platforms enables remote consultations for workers in remote areas. This provides timely healthcare interventions without the need for extensive travel.
  3. Emergency Response Systems. Bolstering emergency response capabilities ensures that medical aid is swiftly available during critical heat-related incidents. This reduces potential health complications and improves outcomes for affected workers.
  4. Healthcare Worker Training. Training healthcare professionals in the specifics of heat-related illnesses prepares them to offer effective treatment and preventative care. This enhances the overall response to heat stress conditions and improves patient outcomes.
  5. Data Sharing & Coordination. Creating data-sharing frameworks between healthcare providers, emergency services, and public health agencies ensures a coordinated response to heat-related health issues. This enhances overall healthcare efficacy and enables better tracking and management of heat-related incidents.

Community & Advocacy Groups

Community and advocacy groups play a pivotal role in bridging the gap between policy and practice. By supporting local networks, grassroots education programs, and worker advocacy efforts, these groups can significantly enhance the effectiveness of heat safety initiatives. Their involvement ensures that programs are culturally appropriate, widely understood, and effectively implemented on the ground.

  1. Worker Education. Implementing wide-reaching education and advocacy programs helps raise awareness about heat risks. These efforts promote community-wide preventive measures and empower workers to protect themselves.
  2. Advocacy. Ensuring direct worker representation in policy discussions and implementation planning is crucial. Their firsthand experiences are invaluable in creating effective, practical safety measures that address real-world challenges.
  3. Local Heat Safety Networks. Supporting the creation of community networks ensures the distribution of heat safety resources. These networks enhance preparedness and response to heat risks at the local level.
  4. Worker Advocacy Support. Providing resources to advocacy groups enables effective representation of workers’ safety interests. This ensures that policies are worker-centered and address the actual needs of those most affected by heat hazards.
  5. Community Resilience Planning. Collaborating with community groups to develop localized resilience strategies strengthens community preparedness against heat impacts. This approach integrates workplace safety measures with broader community resilience efforts.

The Federal Government’s Role in Facilitating Investments

Successful implementation of OSHA’s heat safety standard requires substantial federal support and coordination. The government must actively facilitate and incentivize necessary investments to create a robust heat safety infrastructure. By leveraging its resources, the federal government can catalyze nationwide improvements. Key actions include:

  1. Program Investment. The federal government must significantly invest in funding agencies like OSHA and HHS to enhance their capacity to implement and enforce the safety program. This includes financial resources for hiring additional staff, improving technological capabilities, and offering comprehensive training and support to employers.
  2. Providing Financial Incentives. It should provide targeted grants, subsidies, and tax incentives. These financial aids will alleviate the burden on small and medium-sized enterprises, fostering widespread adoption of advanced heat safety measures.
  3. Capacity Building. It must develop and support comprehensive educational programs and training workshops to enhance the capabilities of the workforce. This will ensure that workers are well-informed and equipped to effectively navigate and implement complex safety regulations.
  4. Public-Private Partnerships. It must encourage collaboration between the public sector and private enterprises, leveraging private innovation alongside public resources to ensure that safety solutions are comprehensive and widely accessible.
  5. Interagency Coordination. This involves pooling resources, expertise, and efforts from diverse federal agencies to support and enforce the heat safety regulations efficiently. Agencies should identify and allocate resources within their scope to contribute to a broad-based support network—ranging from funding and manpower to specific programmatic initiatives, as well as data-sharing and surveillance.
  6. Overcoming Bureaucratic Inertia. Delays and resistance within government agencies can impede the timely adoption and enforcement of new regulations. Streamlining processes and establishing clear mandates can help overcome this inertia.

The Benefits of Investing in Heat Safety Infrastructure

Investing in heat safety infrastructure yields numerous benefits, including:

  1. Lives Saved, Improved Worker Health & Safety. Investing in proper heat safety infrastructure significantly reduces the incidence of heat-related illnesses, such as heat exhaustion and heat stroke, which can be fatal. This reduction cascades into numerous health and safety benefits:
    • The most immediate and crucial benefit is the preservation of human life and health
    • Enhances workplace safety culture
    • Reduces long-term health complications from chronic heat exposure
    • Enables better management of pre-existing health conditions exacerbated by heat
    • Improves public health outcomes in heat-vulnerable communities
    • Reduces inequality by protecting vulnerable worker populations
    • Example: a study reported a 91% decrease in heat-related illnesses following the implementation of safety measures.
  2. Economic Benefits. Heat safety investments stimulate economic growth through multiple channels, creating a positive ripple effect across businesses and communities. Key economic advantages include:
    • Increased workforce productivity and efficiency
    • Reduced absenteeism and turnover rates
    • Stimulation of local economies through infrastructure investments
    • Reduced healthcare costs for both employers and the broader healthcare system
    • Improved job satisfaction and worker morale
    • Enhanced employer reputation and ability to attract/retain talent
    • Example: the same study saw heat-related illness claims drop from 30 per 1,000 workers to zero, eliminating workers’ compensation claims entirely.
  3. Climate Resilience.  As global temperatures rise, building infrastructure to withstand extreme heat conditions becomes crucial for overall climate resilience. This proactive approach offers several strategic advantages:
    • Increases adaptability to rising global temperatures
    • Enables integration with broader climate adaptation strategies
    • Reduces energy consumption through efficient cooling methods
    • Enhances business continuity during extreme weather events
    • Reduces risk of legal liabilities and regulatory penalties
    • Enhances organizational preparedness for climate change impacts

Moving Forward

As we face the escalating challenges of climate change, the urgency to protect our workforce cannot be overstated. The proposed OSHA heat safety standard marks a crucial advancement in safeguarding our agricultural workers from the rise of extreme heat conditions. While some may express concerns about the costs and regulatory burden of these investments, it’s crucial to consider the long-term benefits. The initial expenses are outweighed by reduced healthcare costs, increased productivity, and avoided workers’ compensation claims. These measures protect businesses from potential legal liabilities and reputational damage associated with worker heat-related illnesses or fatalities. Moreover, investing in federal infrastructure to support this standard is a strategic imperative that will yield significant returns in public health, economic productivity, and climate resilience.

By thoughtfully allocating resources, the federal government can create a powerful framework for implementing and maximizing the impact of the proposed standard. The health and safety of millions of workers, particularly in high-risk sectors like agriculture, depend on our ability to create a comprehensive, well-resourced system. Every stakeholder from policymakers to industry leaders must now rise to the occasion. It is imperative that we channel collective efforts and resources before another heatwave claims more lives. The consequences of inaction are too severe to ignore.

For specific actions you can take to protect our outdoor workers, please refer to the strategies outlined in Appendix A: Call to Action Guide.


Appendix A. Call to Action Guide

This guide offers strategies for various stakeholders to support and enhance the implementation of OSHA’s heat safety rule.

For Policymakers

For Employers

For Workers and Advocacy Groups

For Healthcare Providers

For Researchers and Academic Institutions

Understanding the U.S. Bioeconomy: Agency Perspectives

The U.S. bioeconomy—defined by the National Institute of Standards and Technology (NIST) as “economic activity derived from the life sciences, particularly in the areas of biotechnology and biomanufacturing, including industries, products, services, and the workforce” and valued by some at ~$1 trillion—has been a major focus of policy development over the past few years. These policy advances include the White House Executive Order on “Advancing Biotechnology and Biomanufacturing Innovation for a Sustainable, Safe, and Secure American Bioeconomy” (Bioeconomy EO), the CHIPS & Science Act, and the Inflation Reduction Act (IRA). In March 2024, the Office of Science and Technology Policy (OSTP) announced the launch of the National Bioeconomy Board (NBB). The board will “partner across the public and private sectors to advance societal well-being, national security, sustainability, economic productivity, and competitiveness through biotechnology and biomanufacturing,” highlighting the Biden Administration’s commitment to future-proofing an economically sustainable U.S. bioeconomy.

Despite these advances, the vast intersectionality inherent to the bioeconomy (e.g., with health, clean energy, national security, climate change, economic development) poses unique challenges for the U.S. government. This complexity makes it difficult for the various agencies to coordinate and even more difficult for the general public to understand the government’s approach to the bioeconomy. Nonetheless, to maintain the continued growth within the bioeconomy that has resulted from these policy advances, it will be imperative to clarify a strategic vision that coordinates and publicizes governmental efforts that support the burgeoning U.S. bioeconomy.

The NBB can play an important role in promoting this strategic vision. As directed by the Bioeconomy EO, the Executive Office of the President established the NBB to promote interagency coordination and collaboration on the bioeconomy. The NBB is co-chaired by OSTP, the Department of Commerce (DOC), and the Department of Defense (DOD), and nine other agencies make up the rest of the board. Agencies not represented on the NBB itself, including the Environmental Protection Agency (EPA), work with the NBB through various working groups and play an integral role.

To understand the range of governmental priorities for the bioeconomy, the overarching strategy, the work underway, the various programs within the agencies, and the role of environmental sustainability, our team at the Federation of American Scientists (FAS) spoke with key agencies represented on the NBB to collect their perspectives.

The perspectives summarized below demonstrate that the agencies align bioeconomy-related initiatives to their varied mission areas and, through the NBB and other interagency activities, are working together to develop a shared vision. However, the summaries also show the diversity in focus that informs how agencies approach the bioeconomy. The agency views encompass the broader bioeconomy landscape, including biotechnologies from commodity fuels and agriculture to individualized therapeutics, and biomanufacturing solutions from biomass production to final product. This range highlights both the important role that each agency plays in supporting the U.S. bioeconomy as well as the challenge in coordinating their activities and programs across the federal government.

Approach

To collect perspectives on the U.S. bioeconomy from the agencies represented on the NBB, FAS conducted semi-structured interviews with key NBB officials from OSTP, DOC, DOD, the Department of Energy (DOE), the Department of Health and Human Services (HHS), and the U.S. Department of Agriculture (USDA) from May 2024 through June 2024. With the exception of USDA, all agency interviews were conducted over Zoom, and answers were documented by note-taking. All summaries have been reviewed by agency representatives to confirm consent and validity. The USDA perspective was summarized using publicly available reports and has also been confirmed for validity by an agency representative.

Perspectives from these agencies on the Bioeconomy EO deliverables, bioeconomy-related programs, coordination, goals, hurdles, and the role of environmental sustainability are summarized below. The full list of questions used in the semi-structured interviews can be found in Appendix A.

Agency Perspectives

Office of Science & Technology Policy

For OSTP’s perspective, FAS conducted a semi-structured interview with Dr. Sarah Glaven, principal assistant director for biotechnology and biomanufacturing.

The Office of Science & Technology Policy plays an important role in interagency coordination for topics, like the bioeconomy, that cut across many different agencies, and is one of the co-chairs for the NBB. In adherence with the Bioeconomy EO, OSTP has coordinated interagency efforts and published several reports on the bioeconomy: Bold Goals for U.S. Biotechnology and Biomanufacturing, Building the Bioworkforce of the Future, and Visions, Needs & Proposed Actions for Data for the Bioeconomy Initiative. OSTP is currently working with interagency groups on several activities, including one that recently published a report, in conjunction with USDA and other agencies, that recommended revisions to the North American Industry Classification System (NAICS) and the North American Product Classification System (NAPCS) to better capture economic activity related to the bioeconomy. The creation of the NBB itself fulfills directives from both the Bioeconomy EO and the CHIPS & Science Act, which called on OSTP to establish a coordination office on these topics. Currently, due to a lack of funding, OSTP has not established a formal coordination office but will coordinate activities through the NBB.

According to OSTP, the Bioeconomy EO reflects the whole-of-government approach that will be needed to support the bioeconomy. For the near term, OSTP plans to show the value and utility of the NBB, execute policy from the Bioeconomy EO, prioritize specific actions from the resulting Bioeconomy EO reports, highlight significant investments, and produce a report on the NAICS and NAPCS codes. In the long term, OSTP hopes that the NBB will become a sustainable government entity that drives a clear national strategy to move the bioeconomy forward and enables the United States to work collaboratively with global partners. 

A key challenge is measuring the bioeconomy. It is difficult to prioritize, strategize, or advocate for additional resources in the absence of baseline economic metrics to track impact or estimate the potential return on investment. Ultimately, OSTP believes it is important to clarify the definition of the bioeconomy in order to create measurements and classifications.

A challenge for OSTP is continuity as it experiences staff turnover and administration changes. However, the NBB and coordination of the bioeconomy portfolio will be well positioned to persist, in part by relying on the NBB’s co-chairs. Also, the Bioeconomy EO allowed OSTP to create principal and assistant director positions for the bioeconomy portfolio, which can help ensure that it remains a high priority. At OSTP, this portfolio sits within the Industrial Innovation Group, which also houses coordination efforts for semiconductors and clean energy. OSTP leadership understands the importance of the bioeconomy and is keen to see the intersections of biomanufacturing with other initiatives, like the DOE’s Energy Earthshot programs and other clean energy initiatives.

On the issue of environmental sustainability and the bioeconomy, OSTP highlights efforts by DOE to push for sustainable aviation fuels and USDA’s sustainable biomass supply chain framework as initiatives that are setting the pace for sustainability. There is also an opportunity to consider how biomanufacturing and biosynthesis fit into the broader sustainable chemistry landscape.

Department of Commerce

For DOC’s perspective, FAS conducted a semi-structured interview with Dr. Christopher Szakal, acting director, program coordination office at the National Institute of Standards and Technology.

The Department of Commerce is one of the NBB co-chairs. DOC is sector-agnostic and is interested in the bioeconomy as a way to support the broader economy, remain competitive, and solve broader challenges, such as those related to supply chain resilience. In response to the Bioeconomy EO, the DOC has released the bioeconomy lexicon from NIST and the Feasibility Study for measuring the bioeconomy from the Bureau of Economic Analysis. It has also participated in several interagency activities, including development of OSTP’s Bold Goals report and USDA’s Biomass Supply Chain report, as well as ongoing working groups focusing on updating systems for measuring economic activities (e.g., the NAICS and NAPCS codes) and on biological data and cybersecurity. Separate from the executive order, the Inflation Reduction Act provided significant investments for the Economic Development Administration in biotechnology-related regional technology hubs. Other ongoing activities at DOC in support of the bioeconomy include efforts to support biotechnology and biomanufacturing standards development at NIST, supply chain analyses at the International Trade Administration, work at the Bureau of Industry and Security and at the Patent and Trademark Office to ensure a safe and fair market, and the Workforce Development Strategy.

By nature, DOC keeps a broad perspective and tries to understand how the bioeconomy intersects with other parts of the economy and how technological developments may impact progress. There are important intersections of the bioeconomy with artificial intelligence (AI) and with data security, and policy development in these other areas will have implications for the bioeconomy. For example, the October 2023 Executive Order on AI called for significant new requirements for providers of synthetic nucleic acids to conduct biosecurity screening, which will have implications for biotechnology and biomanufacturing. NIST is tasked with developing standards for this new policy. The intersectional nature of the bioeconomy requires coordination both within the DOC and across the U.S. government. A key challenge is the need for sustained funding because coordination requires time and effort. 

On environmental sustainability, the DOC prioritizes the market and what U.S. companies will find profitable in both the near term and the long term. Elevating sustainability has been challenging because there is uncertainty in how sustainability is measured. Additionally, market drivers have been inconsistent relative to the level needed to address the uncertainty. DOC is looking to utilize the NBB to help provide clarity on how to achieve more consistent market forces in support of sustainability to drive growth of the bioeconomy. 

Department of Defense

For DOD’s perspective, FAS conducted a semi-structured interview with Dr. Peter Emanuel, senior research scientist, bioengineering at U.S. Army Combat Capabilities Development Command.

The Department of Defense is one of the NBB co-chairs. In September 2022 (before the Bioeconomy EO was announced), DOD announced a $1.2 billion investment in biomanufacturing. In March 2023, DOD released a Biomanufacturing Strategy, which was informed by both the National Defense Authorization Act for Fiscal Year 2023 and the Bioeconomy EO. In support of this strategy and the investments made by DOD, the Department’s Defense Production Act Investments (DPAI) Office published an open Request for Information that sought input from industry on biomanufactured products and process capabilities that could help address defense needs. Significant additional investments in biomanufacturing are likely to be forthcoming.

The bioeconomy portfolio is a small portion of DOD's overall programmatic budget. Previously, the DOD's interest in biology and biotechnology was limited to military medicine and chemical and biological defense, but the department is increasingly focused on nonmedical biomanufacturing applications and believes they will be key to ensuring national security. The department also acknowledges the importance of workforce development and the need for standardization and infrastructure for the bioeconomy and strongly supports these areas. This commitment can be seen in DOD's large 2020 investments in BioMADE, a Manufacturing Innovation Institute focused on creating a sustainable, domestic end-to-end bioindustrial manufacturing ecosystem. 

In the future, DOD hopes to take advantage of biomanufacturing's potential to support defense objectives beyond medical countermeasures and other human health-related advances, such as the production of bio-based materials, chemicals, and foods. However, DOD faces challenges, both internally and externally, in communicating the full potential of the bioeconomy and biomanufacturing for the department. 

On environmental sustainability, DOD believes that economic and environmental sustainability for the bioeconomy go hand-in-hand. For example, a company that could make chemicals without waste would have a significant economic advantage and would support environmental sustainability. Historically, DOD has seen significant costs due to polluted sites, and so understands the value of cleaner products and processes. In addition, DOD is investing in different technologies that would valorize waste streams.

Department of Energy

For DOE’s perspective, FAS conducted a semi-structured interview with Dr. Valerie Reed, director, Bioenergy Technologies Office.

The Department of Energy has many goals for advancing the bioeconomy, with the common denominator being to decarbonize America's transportation and fuel sectors and to build resilient clean energy for generations to come. In response to the Bioeconomy EO, the DOE contributed to the OSTP Bold Goals report and was tasked to work with other agencies to write reports on National Security Recommendations for Federal Procurement (forthcoming) and best practices for cybersecurity documentation. DOE also played a large role in an upcoming biotechnology and biomanufacturing report mandated by the Bioeconomy EO. Outside of the direct requirements of the EO, the DOE plays a crucial role in supporting industrial biotechnology through additional reports and its involvement in ongoing interagency activities. For example, the Billion Ton Report provides a comprehensive assessment of biomass availability today and of how to sustainably produce more than one billion tons of biomass per year to meet the demand for sustainable aviation fuel production. 

DOE’s bioeconomy efforts are concentrated within the Bioenergy Technologies Office (BETO) and the Office of Science. BETO aims to utilize biomass for sustainable and renewable fuel and chemical production, while the Office of Science supports fundamental research that enables the bioeconomy, including synthetic biology and thermochemical conversion. Under the Inflation Reduction Act and CHIPS & Science Act, significant support was given to bioenergy solutions and clean energy demonstrations, including DOE tax incentives aimed at carbon reduction in fuel production.

In the short term, DOE is focused on prioritizing the use of biomass for Sustainable Aviation Fuel (SAF) and marine fuel production, as well as supporting renewable diesel and ethanol for medium- and heavy-duty vehicles. Long-term goals include transitioning to electrification using biomass, achieving substantial SAF production by 2035 through the SAF Grand Challenge, and scaling up the production of specific chemicals by 2035 as part of the industrial decarbonization strategy. Additionally, in coordination with the USDA, there are focused efforts to increase cultivation of purpose-grown energy crops.

One of the major hurdles the DOE currently faces, and may continue to face in the future, is ensuring sustained funding levels that support ongoing development. Currently, biomass is seen as an expensive feedstock. While the IRA provided an initial policy bridge through incentives like the 40B (SAF production) and 45Z (clean fuel production) tax credits, longer-term incentives will be essential to meet market demand. 

On environmental sustainability, the DOE is very focused on goals for decarbonization of transportation and fuels, including replacing petroleum-based products with sustainable biomass solutions and conducting life cycle assessments (LCAs) to measure sustainability impacts throughout the supply chain. DOE created the GREET Model for LCAs, which was updated recently, to reduce ambiguity and to help standardize the process for measuring carbon emissions. Additionally, DOE’s Clean Fuels and Products Earthshot is an important cross-agency collaboration that supports accelerating bio-based fuels and chemicals production and decarbonizing both the fuel and chemical industry.

Department of Health & Human Services

For HHS’s perspective, FAS conducted a semi-structured interview with Dr. Lyric Jorgenson, associate director for science policy and the director of the Office of Science Policy at the National Institutes of Health (NIH), and Dr. Julia Limage, director, Office of Strategy, Policy, and Requirements in the Administration for Strategic Preparedness and Response (ASPR).

The Department of Health & Human Services has two representatives on the NBB, one from NIH and one from ASPR. HHS has a broad mission in support of human health, and many of its programs could be considered part of the bioeconomy. However, the Bioeconomy EO outlined a set of priorities that called for additional focus at HHS on advances specific to biotechnology and biomanufacturing, many of which were included in the OSTP Bold Goals report. The EO also tasked HHS with leading the establishment of a Biosafety and Biosecurity Innovation Initiative; a strategic plan for this initiative will be available soon. Another area of intersection of HHS and the Bioeconomy EO is on the regulatory side: the Food and Drug Administration worked with USDA and EPA to provide updates on the regulatory system as deliverables for the EO. Many of the activities related to the EO draw on interagency working groups and other ongoing activities—for example, the work toward pandemic preparedness and biodefense, as well as collaborations between NIH and NSF on health-relevant research.

In the near future, HHS will focus on advancing biotechnologies such as multi-omic medicine, gene editing, and other therapeutics tailored to individual patients. Biomanufacturing and scale-up are another key focus, aimed at increasing the speed and availability of key medicines. With regard to public health, the COVID pandemic highlighted the need for fast and secure biomanufacturing for vaccine production. The Biomedical Advanced Research and Development Authority (BARDA) in ASPR has made significant investments in biomanufacturing for this reason. ASPR also has an Office of Industrial Base Management and Supply Chain to support domestic biomanufacturing in case of public health emergencies.

For HHS, activities related to the bioeconomy directly and unambiguously support the department’s mission and will continue to be prioritized. A key challenge for HHS is the need for sustained funding, especially for coordination, which requires time and effort above and beyond programmatic work. To be effective, activities initiated by the Bioeconomy EO will need to be funded. Some HHS activities, including some related to biomanufacturing of medical countermeasures, were funded with COVID supplemental funding that will soon run out.

On environmental sustainability, HHS has not had any significant focus. However, there have been efforts to decrease the use of single-use plastics and equipment in research and public health activities.

United States Department of Agriculture

For USDA’s perspective, FAS gathered information from publicly available reports and documents, with guidance and direction from Herrick Fox, USDA’s bioeconomy coordinator in the Office of the Chief Economist, and Greg Jaffe, senior advisor in the Office of the Secretary.

The Bioeconomy EO tasked USDA with a wide range of deliverables, and USDA has released many related reports and products that reflect its bioeconomy-related priorities. One set of deliverables focuses on biomass and feedstocks, and supports the strategic vision outlined for agriculture in OSTP's Bold Goals report. This includes the report on Building a Resilient Biomass Supply—A Plan to Enable the Bioeconomy in America, along with an Implementation Framework. USDA also has a long-standing focus on bio-based products, including support of the BioPreferred Program, a program created by the 2002 Farm Bill to increase the purchase of bio-based products and reauthorized in the 2018 Farm Bill. Its recent Economic Impact Analysis of the U.S. Biobased Products Industry report summarizes the status of bio-based products, an important component of the bioeconomy.

USDA also plays a central role in regulating biotechnology products, and the Bioeconomy EO called for updates to the regulatory system. In response, USDA (along with the FDA and the EPA) conducted stakeholder outreach, which is summarized in a report on Ambiguities, Gaps, and Uncertainties in Regulation of Biotechnology Under the Coordinated Framework. USDA also released a Plan for Regulatory Reform under the Coordinated Framework for the Regulation of Biotechnology and produced an updated Coordinated Framework website. Activities to improve coordination across the three major regulatory agencies are ongoing.

Unlike at most federal departments and agencies, most programs and activities at USDA have a link to the life sciences, including those that support food and fiber, forests and grasslands, and other natural resources, as well as the manufacturing of numerous bio-based products and biofuels from these resources and the R&D and infrastructure that support it. From USDA's perspective, the department has served the bioeconomy since its founding in 1862. This broad focus provides many opportunities for strategic partnerships with other parts of the U.S. government working on the bioeconomy, and there are many different ways that USDA can contribute to the NBB.

On environmental sustainability, USDA has demonstrated its commitment to developing a circular bioeconomy, which is reflected in its Biomass Plan, and in its support for bio-based products and sustainable agriculture initiatives.

Conclusion

The agencies that make up the NBB highlight the complex nature of the U.S. bioeconomy and the various sectors that fall under it. Despite this complexity, the NBB is providing a whole-of-government approach to enable agencies to better support the burgeoning U.S. bioeconomy. The work underway is underpinned by the agencies' priorities and programmatic expertise, but comes together to build the foundational base needed to support and grow the U.S. bioeconomy. Each agency also has a focus on environmental sustainability, with some, like DOE, DOD, and USDA, having a stronger focus due to their direct connections with the environment. Finally, agencies also agree on the need for more data on the bioeconomy's impact as the different sectors evolve and on the need for sustained funding to promote coordination, which takes time and effort beyond programmatic work.


Appendix A. Interview Questions

  1. In response to the September 2022 Bioeconomy EO, your agency has produced some reports and other deliverables on the bioeconomy.
    • Are we missing any deliverables? Are there any other reports or activities that are already completed or still to come in response to the EO?
  2. Are there programs or other deliverables relevant to the bioeconomy that your agency has pursued under the Inflation Reduction Act or the CHIPS & Science Act?
  3. Are there other activities within your agency that you believe support the bioeconomy? Is the bioeconomy broader than what was captured by the EO and these other efforts?
  4. What does your agency hope to achieve in the foreseeable future and in the more distant future regarding the U.S. bioeconomy?
    • Are these goals related to the OSTP Bold Goals Report or other deliverables for the Bioeconomy EO?
    • To what extent will this progress be prioritized within your agency? How central to your agency is progress in the bioeconomy – now and into the future?
  5. What are the major hurdles your agency currently faces or may face in the future in reaching these goals?
  6. How does your agency tackle the issue of creating an environmentally sustainable bioeconomy and/or a circular bioeconomy?
    • Are there any initiatives in place currently or coming up in the near future that speak towards this?

Critical Thinking on Critical Minerals

Access to critical minerals supply chains will be crucial to the clean energy transition in the United States. Batteries for electric vehicles, in particular, will require the U.S. to consume an order of magnitude more lithium, nickel, cobalt, and graphite than it currently consumes. Currently, these materials are sourced from around the world. Mining of critical minerals is concentrated in just a few countries for each material, but is becoming increasingly geographically diverse as global demand incentivizes new exploration and development. Processing of critical minerals, however, is heavily concentrated in a single country—China—raising the risk of supply chain disruption. 

To address this, the U.S. government has signaled its intent to onshore and diversify critical minerals supply chains through key legislation, such as the Bipartisan Infrastructure Law and the Inflation Reduction Act, and through trade policy. The development of new mining and processing projects entails significant costs, however, and project financiers require developers to demonstrate certainty that projects will generate profit, typically by securing long-term offtake agreements with buyers. Two factors make this difficult: critical minerals markets are volatile, and, without subsidies or trade protections, domestically produced critical minerals struggle to compete against low-priced imports, making it difficult for producers and potential buyers to negotiate a mutually agreeable price (or price floor). As a result, the domestic critical minerals supply may not expand fast enough to keep pace with growing consumption.

To accelerate project financing and development, the Department of Energy (DOE) should help generate demand certainty through backstopping the offtake of processed, battery-grade critical minerals at a minimum price floor. Ideally, this would be accomplished by paying producers the difference between the market price and the price floor, allowing them to sign offtake agreements and sell their products at a competitive market price. Offtake agreements, in turn, allow developers to secure project financing and proceed at full speed with development.

While demand-side support can help address the challenges faced by individual developers, market-wide issues with price volatility and transparency require additional solutions. Currently, the pricing mechanisms available for battery-grade critical minerals are limited to either third-party price assessments with opaque sources or the exchange-traded price of imperfect proxies. Concerns have been raised about the reliability of these existing mechanisms, hindering market participation and complicating discussions on pricing. 

As the North American critical minerals industry and market develop, DOE should support the parallel development of more transparent, North America-based pricing mechanisms to improve price discovery and reduce uncertainty. In the short and medium term, this could be accomplished through government-backed auctions, which could be combined with offtake backstop agreements. Auctions are effective mechanisms for price discovery, and data from them can help improve market price assessments. In the long term, DOE could support the creation of new market exchanges for trading critical minerals in North America. Exchange trading enables greater price transparency and provides opportunities for hedging against price volatility. 

Through this two-pronged approach, DOE would simultaneously accelerate the development of the domestic critical minerals supply chain through addressing short-term market needs, while building a more transparent and reliable marketplace for the future.

Introduction

The global transportation system is currently undergoing a transition to electric vehicles (EVs) that will fundamentally transform not only our transportation system, but also domestic manufacturing and supply chains. Demand for lithium-ion batteries, the most important and expensive component of EVs, is expected to grow 600% by 2030 compared to 2023, and the U.S. currently imports a majority of its lithium batteries. To ensure a stable and successful transition to EVs, the U.S. needs to reduce its import dependence and build out its domestic supply chain for critical minerals and battery manufacturing. 

Crucial to that will be securing access to battery-grade critical minerals. Lithium, nickel, cobalt, and graphite are the primary critical minerals used in EV batteries. All four were included in the 2023 Department of Energy (DOE) Critical Minerals List. Cobalt and graphite are considered at risk of shortage in the short-term (2020-2025), while all four materials are at risk in the medium-term (2025-2030).

As shown in Figure 1, the domestic supply chain for batteries and critical minerals consists primarily of downstream buyers like automakers and battery assemblers, though there are a growing number of battery cell manufacturers thanks to domestic sourcing requirements in the Inflation Reduction Act (IRA) incentives. The U.S. has major gaps in upstream and midstream activities—mining of critical minerals, refining/processing, and the production of active materials and battery components. These industries are concentrated globally in a small number of countries, presenting supply chain risks. By developing new domestic industries within these gaps, the federal government can help build out new, resilient clean energy supply chains. 

This report is organized into three main sections. The first section provides an overview of current global supply chains and the process of converting different raw materials into battery-grade critical minerals. The second section delves into the pricing and offtake challenges that projects face and proposes demand-side support solutions to provide the price and volume certainty necessary to obtain project financing. The final section takes a look at existing pricing mechanisms and proposes two approaches that the government can take to facilitate price discovery and transparency, with an eye towards mitigating market volatility in the long term. Given DOE’s central role in supporting the development of domestic clean energy industries, the policies proposed in this report were designed with DOE in mind as the main implementer.

Figure 1. Lithium-ion battery supply chain

Adapted from Li-BRIDGE

Segments highlighted in light blue indicate gaps in U.S. supply chains. See original graphic from Li-BRIDGE for more information.

Section 1. Understanding Critical Minerals Supply Chains

Global Critical Minerals Sources

Globally, 65% or more of processed lithium, cobalt, and graphite originates from a single country: China (Figure 2). This concentration is particularly acute for graphite, 91% of which was processed by China in 2023. This market concentration has made downstream buyers in the U.S. overly dependent on sourcing from a single country. The concentration of supply chains in any one country makes them vulnerable to disruptions within that country—whether they be natural disasters, pandemics, geopolitical conflict, or macroeconomic changes. Moreover, lithium, nickel, cobalt, and graphite are all expected to experience shortages over the next decade. In the event of future shortages, this concentration of supply outside the U.S. puts U.S. access to critical minerals at risk. Rocky foreign relations and competition between the U.S. and China over the past few years have put further strain on this dependence. In October 2023, in response to the U.S.'s export restrictions on semiconductor chips to China and other "foreign entities of concern" (FEOC), China announced new export controls on graphite, though it has not yet restricted supply.

Expanding domestic processing of critical minerals and manufacturing of battery components can help reduce dependence on Chinese sources and ensure access to critical minerals in future shortages. However, these efforts will hurt Chinese businesses, so the U.S. will also need to anticipate additional protectionist measures from China.

On the other hand, mining of critical minerals—with the exception of graphite and rare earth elements—occurs primarily outside of China. These operations are also concentrated in a small handful of countries, shown in Figure 3. Consequently, geopolitical disruptions in any of those primary countries can significantly affect the price and supply of the material globally. For example, Russia is the third largest producer of nickel. In the aftermath of Russia's invasion of Ukraine at the beginning of 2022, expectations of shortages triggered a historic short squeeze of nickel on the London Metal Exchange (LME), the primary global trading platform, significantly disrupting the global market. 

To address global supply chain concentration, new incentives and grant programs were passed in the IRA and the Bipartisan Infrastructure Law. These include the 30D clean vehicle tax credit, the 45X advanced manufacturing production credit, and the Battery Materials Processing Grants Program (see Domestic Price Premium section for further discussion). Thanks to these policies, there are now on the order of a hundred North American projects in mining, processing, and active material manufacturing1 in development. The success of these and future projects will help create new domestic sources of critical minerals and batteries to feed the EV transition in the U.S. However, success is not guaranteed. A number of challenges to investment in the critical minerals supply chain will need to be addressed first.

Battery Materials Supply Chain

Critical minerals are used to make battery electrodes. These electrodes require specific forms of critical minerals for their production processes: typically lithium hydroxide or carbonate, nickel sulfate, cobalt sulfate, and a blend of coated spherical graphite and synthetic graphite.2

Lithium Hydroxide and Lithium Carbonate

Lithium hydroxide/carbonate typically comes from two sources: spodumene, a hard rock ore that is mined primarily in Australia, and lithium brine, which is primarily found in South America (Figure 3). Traditionally, lithium brine must be evaporated in large open-air pools before the lithium can be extracted, but new technologies are emerging for direct lithium extraction that significantly reduce the need for evaporation. Whereas spodumene mining and refining are typically conducted by separate entities, lithium brine operations are typically fully integrated. A third source of lithium that has yet to be put into commercial production is lithium clay. The U.S. is leading the development of projects to extract and refine lithium from clay deposits.

Nickel Sulfate

Nickel sulfate can be made from either nickel metal, which was historically the preferred feedstock, or directly from nickel intermediate products, such as mixed hydroxide precipitate and nickel matte, which are the feedstocks that most Chinese producers have switched to in the past few years (Figure 4). Though demand from batteries is driving much of the nickel project development in the U.S., nickel metal has a much larger market than nickel sulfate, so developers are designing their projects with the flexibility to produce either product.

Cobalt Sulfate

Cobalt is primarily produced in the Democratic Republic of the Congo from cobalt-copper ore. Cobalt can also be found in lesser amounts in nickel and other metallic ores. Cobalt concentrate is extracted from cobalt-bearing ore and then processed into cobalt hydroxide. At this point, the cobalt hydroxide can be further processed into either cobalt sulfate for batteries or cobalt metal and other chemicals for other purposes.

Cathode Active Materials

Battery cathodes come in a variety of chemistries: lithium nickel manganese cobalt (NMC) is the most common in lithium-ion batteries thanks to its higher energy density, while lithium iron phosphate is growing in popularity for its affordability and use of more abundantly available materials, though it is less energy dense. Cathode active material (CAM) manufacturers purchase lithium hydroxide/carbonate, nickel sulfate, and cobalt sulfate and convert them into CAM powders. These powders are then sold to battery cell manufacturers, who coat them onto aluminum foil current collectors to produce cathodes.

Natural and Synthetic Graphite

Graphite can be synthesized from petroleum needle coke, a fossil fuel waste material, or mined from natural deposits. Natural graphite typically comes in the form of flakes and is reshaped into spherical graphite to reduce its particle size and improve its material properties. Spherical graphite is then coated with a protective layer to prevent unwanted chemical reactions when charging and discharging the battery.

Anode Active Material

The majority of battery anodes on the market are made using just graphite, so there is no intermediate step between processors and battery cell manufacturers. Producers of battery-grade synthetic graphite and coated spherical graphite sell these materials directly to cell manufacturers, who coat them onto electrodes to make anodes. These battery-grade forms of graphite are also referred to as graphite anode powder or, more generally, as anode active materials. Thus, the terms graphite processor and graphite anode manufacturer are interchangeable.

Section 2. Building Out Domestic Production Capacity

Challenges Facing Project Developers

Offtake Agreements

Offtake agreements (also called supply agreements or contracts) are agreements between a producer and a buyer to purchase a future product. They are a key requirement for project financing because they provide lenders and investors with the certainty that, if a project is built, sales revenue will be generated to pay back the loan and justify the valuation of the business. The vast majority of feedstocks and battery-grade materials are sold under offtake agreements, though small amounts are also sold on the spot market in one-off transactions. Offtake agreements are made at every step of the supply chain: between miners and processors (if they are not vertically integrated), between processors and component manufacturers, and between component manufacturers and cell manufacturers. Due to domestic automakers' concerns about potential material shortages upstream and their desire to secure IRA incentives, many have also been entering into offtake agreements directly with North American miners and processors. Tesla has even started constructing its own domestic lithium processing plant.

Historically, these offtake agreements were structured as fixed-price deals. However, when spot prices climb well above the contract price, sellers often find a way to get out of the contract; conversely, when spot prices fall well below it, buyers often do the same. As a result, more and more offtake agreements for battery-grade lithium, nickel, and cobalt have become indexed to spot prices, with price floors and/or ceilings set as guardrails and with adjustments for premiums and discounts based on other factors (e.g., IRA compliance, risk from a greenfield producer). 

Graphite is the one exception where buyers and suppliers have mostly stuck to fixed-price agreements. There are two main reasons for this: graphite pricing is opaque and products exhibit much more variation, complicating attempts to index the price. As a result, cell manufacturers don’t consider the available price indexes to accurately reflect the value of the specific products they are buying.

Offtake agreements for battery cells are also typically partially indexed to the price of the critical minerals used to manufacture them. In other words, a certain amount of the price per unit of battery cell is fixed in the agreement, while the rest varies with the index price of critical minerals at the time of transaction.
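
To make these pricing structures concrete, the following minimal Python sketch shows how an index-linked offtake price with floor and ceiling guardrails, and a partially indexed battery cell price, might be computed. All figures, parameter names, and contract terms here are hypothetical illustrations rather than terms from any actual agreement.

```python
# Illustrative sketch (hypothetical numbers): index-linked offtake pricing with
# floor/ceiling guardrails, and a partially indexed battery cell price.

def indexed_offtake_price(spot_index: float, floor: float, ceiling: float,
                          premium: float = 0.0) -> float:
    """Contract price per unit: spot index plus a negotiated premium or discount,
    clamped to the agreed floor and ceiling."""
    return min(max(spot_index + premium, floor), ceiling)

def cell_price(fixed_component: float, mineral_content_kg: float,
               mineral_index_per_kg: float) -> float:
    """Battery cell price: a fixed component plus a variable component that
    passes through the index price of the contained minerals."""
    return fixed_component + mineral_content_kg * mineral_index_per_kg

if __name__ == "__main__":
    # Hypothetical lithium hydroxide agreement: $14/kg floor, $40/kg ceiling,
    # $1/kg premium for IRA-compliant material.
    print(indexed_offtake_price(spot_index=12.0, floor=14.0, ceiling=40.0, premium=1.0))  # 14.0 (floor binds)
    print(indexed_offtake_price(spot_index=25.0, floor=14.0, ceiling=40.0, premium=1.0))  # 26.0
    # Hypothetical cell: $55 fixed per unit plus 0.7 kg of lithium hydroxide at the index price.
    print(cell_price(fixed_component=55.0, mineral_content_kg=0.7, mineral_index_per_kg=26.0))
```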

Domestic critical minerals projects face two key challenges to securing investment and offtake agreements: market volatility and a lack of price competitiveness. The price difference between materials produced domestically and those produced internationally stems from two underlying causes: the current oversupply from Chinese-owned companies and the domestic price premium. 

Market Volatility

Lithium, cobalt, and graphite have relatively low-volume markets with a small customer base compared to traditional commodities. Low-volume products experience low liquidity, meaning it can be difficult to buy or sell quickly, so slight changes in supply and demand can result in sharp price swings, creating a volatile market. Because of the higher risk and smaller market, companies and investors tend to prefer mining and processing of base metals, such as copper, which have much larger markets, resulting in underinvestment in production capacity. 

In comparison, nickel is a base metal commodity, primarily used for stainless steel production. However, due to its rapidly growing use in battery production, its price has become increasingly linked to other battery materials, resulting in greater volatility than other base metals. Moreover, the short squeeze in 2022 forced LME to suspend trading and cancel transactions for the first time in three decades. As a result, trust in the price of nickel on LME faltered, many market participants dropped out, and volatility grew due to low trading volumes.

For all four of these materials, prices reached record highs in 2022 and subsequently crashed in 2023 (Figure 4). Nickel, cobalt, and graphite experienced price declines of 30-45%, while lithium prices dropped by an enormous 75%. As discussed above, market volatility discourages investment into critical minerals production capacity. The current low prices have caused some domestic projects to be paused or canceled. For example, Jervois halted operation of its Idaho cobalt mine in March 2023 due to cobalt prices dropping below its operating costs. In January 2024, lithium giant Albemarle announced that it was delaying plans to begin construction on a new South Carolina lithium hydroxide processing plant.

Retrospective analysis suggests that mining companies, battery investors, and automakers all made overly optimistic demand projections and ramped up production too quickly. These projections assumed that EV demand would keep growing as fast as it did immediately after the pandemic and that China's lifting of pandemic restrictions would unlock even faster growth in the largest EV market. Instead, China, which makes up over 60% of the EV market, emerged into an economic downturn, and global demand elsewhere did not grow as fast as projected as backlogs built up during the pandemic were cleared. (It is important to note that the EV market is still growing at significant rates—global EV sales increased by 35% from 2022 to 2023—just not as fast as companies had expected.) Consequently, supply has temporarily outpaced demand. Midstream and upstream companies stopped receiving new purchase orders while automakers worked through their stock build-up. Prices fell rapidly as a result and are now bottoming out. Some companies are waiting for prices to recover before they restart construction and operation of existing projects or invest in expanding production further. 

While companies are responding to short-term market signals, the U.S. government needs to act in anticipation of long-term demand growth outpacing current planned capacity. Price volatility in critical minerals markets will need to be addressed to ensure that companies and financiers continue investing in expanding production capacity. Otherwise, demand projections suggest that the supply chain will experience new shortages later this decade. 

Oversupply

The current oversupply of critical minerals has been exacerbated by below market-rate financing and subsidies from the Chinese government. Many of these policies began in 2009, incentivizing a wave of investment not just in China, but also in mineral-rich countries. These subsidies played a large role in the 2010s in building out nascent battery critical minerals supply chains. Now, however, they are causing overproduction from Chinese-owned companies, which threatens to push out competitors from other countries.

Overproduction begins with mining. Chinese companies are the primary financial backers for 80% of both the Democratic Republic of the Congo’s cobalt mines and Indonesia’s nickel mines. Chinese companies have also expanded their reach in lithium, buying half of all the lithium mines offered for sale since 2018, in addition to domestically mining 18% of global lithium.  For graphite, 82% of natural graphite was mined directly in China in 2023, and nearly all natural and synthetic graphite is processed in China.

After the price crash in 2023, while other companies pulled back their production volume significantly, Chinese-owned companies pulled back much less and in some cases continued to expand their production, generating an oversupply of lithium, cobalt, nickel, and natural and synthetic graphite. Government policies enabled these decisions by making it financially viable for Chinese companies to sell materials at low prices that would otherwise be unsustainable. 

Domestic Price Premium (and Current Policies Addressing It) 

Domestically produced critical minerals and battery electrode active materials come with a higher cost of production than imported materials due to higher wages and stricter environmental regulations in the U.S. The IRA's new 30D and 45X tax credits and upcoming Section 301 tariffs help address this problem by creating financial incentives for using domestically produced materials, allowing them to compete on a more even playing field with imported materials. 

The 30D New Clean Vehicle Tax Credit provides up to $7,500 per EV purchased, but it requires eligible EVs to be manufactured from critical minerals and battery components that are FEOC-compliant, meaning they cannot be sourced from companies with relationships to China, North Korea, Russia, and Iran. It also requires that an increasing percentage of critical minerals used to make the EV batteries be extracted or processed in the U.S. or a Free Trade Agreement country. These two requirements apply to lithium, nickel, cobalt, and graphite. For graphite, however, since nearly all processing occurs in China and there is currently no domestic supply, the US Treasury has chosen to exempt it from the 30D tax credit’s FEOC and domestic sourcing requirements until 2027 to give automakers time to develop alternate supply chains.

The 45X Advanced Manufacturing Production Tax Credit subsidizes 10% of the production cost for each unit of critical minerals processed. The Internal Revenue Service's proposed regulations for this tax credit interpret the legislation as applying only to the value-added production cost, meaning that the cost of purchasing raw materials and processing chemicals is not included in the covered production costs. This limits the amount of subsidy that will be provided to processors. The strength of 45X, though, is that, unlike the 30D tax credit, it has no sunset clause for critical minerals, providing a long-term guarantee of support. 
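
A short worked example illustrates why the value-added interpretation matters. The sketch below uses hypothetical cost figures; it is not drawn from any actual producer's economics or from IRS guidance beyond the 10% value-added framing described above.

```python
# Illustrative sketch (hypothetical numbers): under the proposed value-added
# interpretation, 45X credits 10% of production costs excluding purchased raw
# materials and processing chemicals, so the effective subsidy is well below
# 10% of total unit cost.

def credit_45x(total_unit_cost: float, raw_material_cost: float,
               chemicals_cost: float, rate: float = 0.10) -> float:
    """Per-unit 45X credit under a value-added interpretation."""
    value_added = total_unit_cost - raw_material_cost - chemicals_cost
    return rate * max(value_added, 0.0)

# Hypothetical nickel sulfate producer: $5,000/tonne total cost, of which
# $3,200 is purchased feedstock and $400 is processing chemicals.
print(credit_45x(5000.0, 3200.0, 400.0))  # 140.0/tonne, vs. 500.0 if 10% applied to total cost
```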

In terms of tariffs, the Biden administration announced in May 2024 a new set of Section 301 tariffs on Chinese products, including EVs, batteries, battery components, and critical minerals. The critical minerals tariffs include a 25% tariff on cobalt ores and concentrates that goes into effect in 2024 and a 25% tariff on natural flake graphite that goes into effect in 2026. In addition, there are preexisting 25% Section 301 tariffs on natural and synthetic graphite anode powder. These tariffs were previously waived to give automakers time to diversify their supply chains, but the U.S. Trade Representative (USTR) announced in May 2024 that the exemptions would expire for good on June 14, 2024, citing the lack of progress from automakers as a reason for not extending them.

Current State of Supply Chain Development

For lithium, despite market volatility, offtake demand for existing domestic projects has remained strong thanks to IRA incentives. Based on industry conversations, many of the projects that are developed enough to make offtake agreements have either signed away their full output capacity or are actively in the process of negotiating agreements. Strong demand combined with tax incentives has enabled producers to negotiate offtake agreements that guarantee a price floor at or above their capital and operating costs. Lithium is the only material for which the current planned mining and processing capacity for North America is expected to meet demand from planned U.S. gigafactories.

Graphite project developers report that the 25% tariff coming into force will be sufficient to close the price gap between domestically produced and imported materials, enabling them to secure offtake agreements at a sustainable price. Furthermore, the Internal Revenue Service will require 30D tax credit recipients to submit periodic reports on the progress they are making toward sourcing graphite outside of China. If automakers take these reports and the 2027 exemption deadline seriously, there will be even more motivation to work with domestic graphite producers. However, the current planned production capacity for North America still falls significantly short of demand from planned U.S. battery gigafactories. Processing capacity is the bottleneck for production output, so there is room for additional investment in processing capacity.

Pricing has been a challenge for cobalt, though. Jervois briefly opened the only primary cobalt mine in the U.S. before shutting it down a few months later due to the price crash. Jervois has said that as soon as prices for standard-grade cobalt rise above $20/pound, it will be able to reopen the mine, but that has yet to happen. Moreover, the real bottleneck is in cobalt processing, which has attracted less attention and investment in the U.S. than other critical minerals. There are currently no cobalt sulfate refineries in North America; only one or two are in development in the U.S., along with a few more in Canada.3

Nickel sulfate is also facing pricing challenges, and, similar to cobalt, there is an insufficient amount of nickel sulfate processing capacity being developed domestically. There is one processing plant being developed in the U.S. that will be able to produce either nickel metal or nickel sulfate and a few more nickel sulfate refineries being developed in Canada.

Policy Solutions to Support the Development of Processing Capacity

The U.S. government should prioritize the expansion of processing capacity for lithium, graphite, cobalt, and nickel. Demand from domestic battery manufacturing is expected to outpace the current planned capacity for all of these materials, and processing capacity is the key bottleneck in the supply chain. Tariffs and tax incentives have resulted in favorable pricing for lithium and graphite project developers, but cobalt and nickel processing has gotten less support and attention. 

DOE should provide demand-side support for processed, battery-grade critical minerals to accelerate the development of processing capacity and address cobalt and nickel pricing needs. The Office of Manufacturing and Energy Supply Chains (MESC) within DOE would be the ideal entity to administer such a program, given its mandate to address vulnerabilities in U.S. energy supply chains. In the immediate term, funding could come from MESC’s Battery Materials Processing Grants program, which has roughly $1.9B in remaining, uncommitted funds. Below we propose a few demand-support mechanisms that MESC could consider.

In the long term, the Bipartisan Policy Center proposes that Congress establish and appropriate funding for a new government corporation that would take on the responsibility of administering demand-support mechanisms as necessary to mitigate volume and price uncertainty and to ensure that domestic processing capacity grows sufficiently to meet critical minerals needs.

Offtake Backstops

Offtake backstops would commit MESC to guaranteeing the purchase of a specific amount of materials at a minimum negotiated price if producers are unable to find buyers at that price. This essentially creates a price floor for specific producers while also providing a volume guarantee. Offtake backstops help derisk project development and enable developers to access project financing. Backstop agreements should be made for at least the first five years of a plant’s operations, similar to a regular offtake agreement. Ideally, MESC should prioritize funding for critical minerals with the largest expected shortages based on current planned capacity—i.e., nickel, cobalt, and graphite.

There are two primary ways that DOE could implement offtake backstops:

First. The simplest approach would be for DOE to pay processors the difference between the spot price index (adjusted for premiums and discounts) and the pre-negotiated price floor for each unit of material, similar to how a pay-for-difference or one-sided contract-for-difference would work.4 This would enable processors to sign offtake agreements with no price floor, accelerating negotiations and thus the pace of project development. Processors could also choose to keep some of their output capacity uncommitted so that they can sell their products on the spot market without worrying about prices collapsing in the future.

A more limited form of this could look like DOE subsidizing the price floor for specific offtake agreements between a processor and a buyer. This type of intervention requires a bit more preliminary work from processors, since they would have to identify and bring a buyer to the table before applying for support.
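
The payment logic of the pay-for-difference approach described under this first option is straightforward to express. The sketch below uses hypothetical prices and volumes and assumes settlement against a premium/discount-adjusted spot index, as described above; an actual program would involve many additional terms.

```python
# Illustrative sketch (hypothetical numbers and simplified settlement): under a
# pay-for-difference style backstop, DOE tops up the producer's revenue whenever
# the adjusted spot index falls below the pre-negotiated floor.

def backstop_payment(spot_index: float, adjustment: float,
                     price_floor: float, volume_tonnes: float) -> float:
    """DOE outlay for one settlement period (zero when the market clears above the floor)."""
    adjusted_price = spot_index + adjustment
    shortfall = max(price_floor - adjusted_price, 0.0)
    return shortfall * volume_tonnes

# Hypothetical cobalt sulfate agreement: $16,000/tonne floor, 2,000 tonnes per quarter,
# $500/tonne premium adjustment for IRA-compliant material.
print(backstop_payment(spot_index=13500.0, adjustment=500.0,
                       price_floor=16000.0, volume_tonnes=2000.0))  # 4,000,000
```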

Second. Purchasing the actual materials would be a more complex route for DOE to take, since the agency would have to be ready to receive delivery of the materials. The agency could do this by either setting up a system of warehouses suitable for storing battery-grade critical minerals or using "virtual warehousing," as proposed by the Bipartisan Policy Center. An actual warehousing system could be set up by contracting with existing U.S. warehouses, such as those in LME and CME's networks, to expand or upgrade their facilities to store critical minerals. These warehouses could also be made available for companies to store their private stockpiles, increasing the utility of the warehousing system and justifying the cost of setting it up. Virtual warehousing would entail DOE paying producers to store materials on-site at their processing plants. 

The physical reserve provides an additional opportunity for DOE to address market volatility by choosing when it sells materials from the reserve. For example, DOE could pause sales of a material when there is an oversupply on the market and prices dip or ramp up sales when there is a shortage and prices spike. However, this can only be used to address short-term fluctuations in supply and demand (e.g. a few months to a few years at most), since these chemicals have limited shelf lives. 

A third way to implement offtake backstops that would also support price discovery and transparency is discussed in Section 3. 


Section 3. Creating Stable and Transparent Markets

Concerns about Pricing Mechanisms

Market volatility in critical minerals markets has raised concerns about just how reliable the current pricing mechanisms for these markets are. There are two main ways that prices in a market are determined: third-party price assessments and market exchanges. A third approach that has attracted renewed attention this year is auctions. Below, we walk through these three approaches and propose potential solutions for addressing challenges in price discovery and transparency. 

Index Pricing

Price reporting agencies like Fastmarkets and Benchmark Mineral Intelligence offer subscription services to help market participants assess the price of commodities in a region. These agencies develop rosters of companies for each commodity, which regularly contribute information on transaction prices. That information is then used to generate price indexes. Fastmarkets' and Benchmark's indexes are primarily based on prices provided by large, high-volume sellers and buyers. Smaller buyers may pay more than the index price. 
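
As a rough illustration of how roster submissions can be turned into an index, the sketch below computes a simple volume-weighted average price. The actual methodologies used by Fastmarkets and Benchmark are proprietary and considerably more involved (outlier screening, normalization to a standard specification, editorial judgment), so this is only an assumption-laden simplification with hypothetical numbers.

```python
# Simplified sketch: aggregate roster submissions into a volume-weighted average price.
from dataclasses import dataclass

@dataclass
class Transaction:
    price_per_tonne: float
    tonnes: float

def volume_weighted_index(submissions: list[Transaction]) -> float:
    """Volume-weighted average price across submitted transactions."""
    total_tonnes = sum(t.tonnes for t in submissions)
    if total_tonnes == 0:
        raise ValueError("no transactions submitted for this assessment period")
    return sum(t.price_per_tonne * t.tonnes for t in submissions) / total_tonnes

# Hypothetical lithium hydroxide assessment dominated by two large sellers:
# the small buyer's higher price barely moves the resulting index.
print(volume_weighted_index([Transaction(13200, 500), Transaction(13050, 450),
                             Transaction(14100, 40)]))
```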

It can be hard to establish reliable price indexes in immature markets if there is an insufficient volume of transactions or if the majority of transactions are made by a small set of companies. For example, lithium processing is concentrated among a small number of companies in China and spot transactions are a minority share of the market. New entrants and smaller producers have raised concern that these companies have significant control over Asian spot prices reported by Fastmarkets and Benchmark, which are used to set offtake agreement prices, and that the price indexes are not sufficiently transparent.

Exchange Trading

Market exchanges are a key feature of mature markets that helps reduce volatility. Exchanges allow for a wider range of participants, improving market liquidity, and enable price discovery and transparency. Companies up and down the supply chain can use physically delivered futures and options contracts to hedge against price volatility and to gain visibility into expectations for the market's general direction, helping inform decision-making. This can help derisk the effect of market volatility on investments in new production capacity.

Of the materials we’ve discussed, nickel and cobalt metal are the only two that are physically traded on a market exchange, specifically LME. Metals make good exchange commodities due to their fungibility. Other forms of nickel and cobalt are typically priced as a percentage of the payable price for nickel and cobalt metal. LME’s nickel price is used as the global benchmark for many nickel products, while the in-warehouse price of cobalt metal in Rotterdam, Europe’s largest seaport, is used as the global benchmark for many cobalt products. These pricing relationships enable companies to use nickel and cobalt metal as proxies for hedging related materials.

After nickel trading volumes plummeted on LME in the wake of the short squeeze, doubts were raised about LME’s ability to accurately benchmark its price, sparking interest in alternative exchanges. In April 2024, UK-based Global Commodities Holdings Ltd (GCHL) launched a new trading platform for nickel metal that is only available to producers, consumers, and merchants directly involved in the physical market, excluding speculative traders. The trading platform will deliver globally “from Baltimore to Yokohama.” GCHL is using the prices on the platform to publish its own price index and is also working with Intercontinental Exchange to create cash-settled derivatives contracts. This new platform could potentially expand to other metals and critical minerals. 

In addition to LME’s troubles though, changes in the battery supply chain have led to a growing divergence between the nickel and cobalt metal traded on exchanges and the actual chemicals used to make batteries. Chinese processors who produce most of the global supply of nickel sulfate have mostly switched from nickel metal to cheaper nickel intermediate products as their primary feedstock. Consequently, market participants say that the LME exchange price for nickel metal, which is mostly driven by stainless steel, no longer reflects market conditions for the battery sector, raising the need for new tradeable contracts and pricing mechanisms. For the cobalt industry, 75% of demand comes from batteries, which use cobalt sulfate. Cobalt metal makes up only 18% of the market, of which only 10-15% is traded on the spot market. As a result, cobalt chemicals producers have transitioned away from using the metal reference price towards fixed-prices or cobalt sulfate payables. 

These trends motivate the development of new exchange contracts for physically trading nickel and cobalt chemicals that can enable price discovery separate from the metals markets. There is also a need to develop exchange contracts for materials like lithium and graphite with immature markets that exhibit significant volatility. 

However, exchange trading of these materials is complicated by their nature as specialty chemicals: they have limited shelf lives and more complex storage requirements, unlike metal commodities. Lithium and graphite products also exhibit significant variations that affect how buyers can use them. For example, depending on the types and level of impurities in lithium hydroxide/carbonate, manufacturers of cathode active materials may need to conduct different chemical processes to remove them. Offtakers may also require that products meet additional specifications based on the characteristics they need for their CAM and battery chemistries.

For these reasons, major exchanges like LME, the Chicago Mercantile Exchange (CME), and the Singapore Exchange (SGX) have instead chosen to launch cash-settled contracts for lithium hydroxide/carbonate and cobalt hydroxide that allow for financial trading, but require buyers and sellers to arrange physical delivery separately from the exchange. Large firms have begun to participate increasingly in these derivatives markets to hedge against market volatility, but the lack of physical settlement limits their utility to producers who still need to physically deliver their products in order to make a profit. Nevertheless, CME’s contracts for lithium and cobalt have seen significant growth in transaction volume. LME, CME, and SGX all use Fastmarkets’ price indexes as the basis for their cash-settled contracts. 

As regional industries mature and products become more standardized, these exchanges may begin to add physically settled contracts for battery-grade critical minerals. For example, the Guangzhou Futures Exchange (GFEX) in China, where the vast majority of lithium refining currently occurs, began offering physically settled contracts for lithium carbonate in August 2023. Though the exchange exhibited significant volatility in its first few months, raising concerns, the first round of physical deliveries in January 2024 occurred successfully, and trading volumes have been substantial this year. Access to GFEX is currently limited to Chinese entities and their affiliates, but another trading platform could come to do the same for North America over the next few decades as lithium production volume grows and a spot market emerges. Abaxx Exchange, a Singapore-based startup, has also launched a physically settled futures contract for nickel sulfate with delivery points in Singapore and Rotterdam. A North American delivery point could be added as the North American supply chain matures. 

No market exchange for graphite currently exists, since graphite products vary even more widely than the other materials discussed here. Even the currently available price indexes are not seen as sufficiently robust for offtake pricing. 

Auctions

In the absence of a globally accessible market exchange for lithium, and amid concerns about the transparency of index pricing, Albemarle, the top producer of lithium worldwide, has turned to auctions of spodumene concentrate and lithium carbonate as a means to improve market transparency and as an "approach to price discovery that can lead to fair product valuation." Albemarle's first auction, of spodumene concentrate in China in March, closed at a price of $1,200/ton, which was in line with spot prices reported by Asian Metal but about 10% higher than prices provided by other price reporting agencies like Fastmarkets. Plans are in place to continue conducting regular auctions at a rate of about one per week in China and other locations such as Australia. Lithium hydroxide will be auctioned as well. Auction data will be provided to Fastmarkets and other price reporting agencies to be formulated into publicly available price indexes.

Auctions are not a new concept: in 2021 and 2022, Pilbara Minerals regularly conducted auctions of spodumene on its own platform, Battery Metals Exchange, helping to improve market sentiment. Now, though, the company says that most of its material is committed to offtakers, so auctions have mostly stopped, although it did hold an auction for spodumene concentrate in March. If other lithium producers join Albemarle in conducting auctions, the data could help improve the accuracy and transparency of price indexes. Auctions could also be used to inform the pricing of other battery-grade critical minerals. 

Policy Solutions to Support Price Discovery and Transparency Across the Market

Right now, the only pricing mechanisms available to domestic project developers are spot price indexes for battery-grade critical minerals in Asia or global benchmarks for proxies like nickel and cobalt metal. Long-term, the development of new pricing mechanisms for North America will be crucial to price discovery and transparency in this new market. There are two ways that DOE could help facilitate this: one that could be implemented immediately for some materials and one that will require domestic production volume to scale up first.

First. Government-Backed Auctions: Auctions require project developers to keep a portion of their expected output uncommitted to any offtakers. However, there is a risk that future auctions won’t generate a price sufficient to offset capital and operating expenses, so processors are unlikely to do this on their own, especially for their first domestic project. MESC could address this by providing a backstop guarantee for the portion of a producer’s output that they commit to regularly auctioning for a set timespan. If, in the future, auctions are unable to generate a price above a pre-negotiated price floor, then DOE would pay sellers the difference between the price floor and the highest auction price for each unit sold. Such an agreement could be made using DOE’s Other Transaction Authority. DOE could separately contract with a platform such as MetalsHub to conduct the auction. 
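
To make the arithmetic of such a backstop concrete, a minimal sketch is below; the price floor, clearing price, and volumes are hypothetical and would in practice be negotiated between DOE and the producer.

```python
# Hypothetical illustration of the backstop payment described above.
# All figures are invented; actual floors and volumes would be negotiated.

def backstop_payment(price_floor: float, highest_auction_price: float, units_sold: float) -> float:
    """Amount DOE would pay the seller: the per-unit shortfall between the
    pre-negotiated price floor and the highest auction price, times units sold.
    Zero if the auction clears at or above the floor."""
    shortfall = max(0.0, price_floor - highest_auction_price)
    return shortfall * units_sold

# Example: $1,100/ton floor, auction clears at $950/ton, 500 tons sold.
print(backstop_payment(1100.0, 950.0, 500.0))   # 75000.0 -> DOE pays $75,000
# Example: auction clears above the floor, so no payment is owed.
print(backstop_payment(1100.0, 1250.0, 500.0))  # 0.0
```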

Government-backed auctions would enable the discovery of a true North American price for different battery-grade critical minerals and the raw materials used to make them, generating a useful comparison point with Asian spot prices. Such a scheme would also help address developers’ price and demand needs for project financing. These backstop-auction agreements could be complementary to the other types of backstop agreements proposed earlier and potentially more appealing than physically offtaking materials since the government would not have to receive delivery of the materials and there would be a built-in mechanism to sell the materials to an appropriate buyer. If successful, companies could continue to conduct auctions independently after the agreements expire.

Second. New Benchmark Contracts: Employ America has proposed that the Loan Programs Office (LPO) could use Section 1703 to guarantee lending to a market exchange to develop new, physically settled benchmark contracts for battery-grade critical minerals. The development of new contracts should include producers in the entire North American region. Canada also has a significant number of mines and processing plants in development. Including those projects would increase the number of participants, market volume, and liquidity of new benchmark contracts.

In order for auctions or new benchmark contracts to operate successfully, three prerequisites must be met:

  1. There must be a sufficient volume of materials available for sale (i.e. production output that is not committed to an offtaker).
  2. There must be sufficient product standardization in the industry such that materials produced by different companies can be used interchangeably by a significant number of buyers.
  3. There must be a sufficient volume of demand from buyers, brokers, and traders.

Market exchanges typically conduct research into stakeholders to understand whether or not the market is mature enough to meet these requirements before they launch a new contract. Interest from buyers and sellers must indicate that there would be sufficient trading volume for the exchange to make a profit greater than the cost of setting up the new contract. A loan from LPO under Section 1703 can help offset some of those upfront costs and potentially make it worthwhile for an exchange to launch a new contract in a less mature market than they typically would. 

Government-backed auctions, on the other hand, solve the first prerequisite by offering guarantees to producers for keeping a portion of their production output uncommitted. Product standardization can also be less stringent, since each producer can hold separate auctions, with varying material specifications, unlike market exchanges where there must be a single set of product standards.

Given current market conditions, no battery-grade critical minerals can meet the above prerequisites for new benchmark contracts, primarily due to a lack of available volume, though there are also issues with product standardization for certain materials. However, nickel, cobalt, lithium, and graphite could be good candidates for government-backed auctions. DOE should start engaging with project developers that have yet to fully commit their output to offtakers and gauge their interest in backstop-auction agreements. 

Nickel and Cobalt

As discussed earlier, there are only a handful of nickel and cobalt sulfate refineries currently being developed in North America, making it difficult to establish a benchmark contract for the region. None of the project developers have yet signed offtake agreements covering their full production capacity, so backstop-auction agreements could be appealing to project developers and their investors. Given that more than half of the projects in development are located in Canada, MESC and DOE’s Office of International Affairs should collaborate with the Canadian government in designing and implementing government-backed auctions. 

Lithium

Domestic companies have expressed interest in establishing North American-based spot markets and price indexes for lithium hydroxide and carbonate, but say that it will take quite a few years before production volume is large enough to warrant that. Product variation has also been a concern raised by lithium processors when the idea of a market exchange or public auction has come up. Lessons could be learned from the GFEX battery-grade lithium carbonate contracts. GFEX set standards on purity, moisture, loss on ignition, and the maximum content of different impurities. Some Chinese companies were able to meet these standards, while others were not, preventing them from participating in the futures market or requiring them to trade their materials as lower-purity industrial-grade lithium carbonate, which sells at a discounted price. Other companies producing lithium of much higher quality than the GFEX standards opted to continue selling on the spot market because they could charge a premium over the standard price. Despite some companies choosing not to participate, trading volumes on GFEX have been substantial, and the exchange was able to weather initial concerns of a short squeeze, suggesting that challenges with product variation can be overcome through standardization.

Analysts have proposed that spodumene could be a better candidate for exchange trading, since it is fungible and does not have the limited shelf-life or storage requirements of lithium salts. 60% of global lithium comes from spodumene, and the U.S. has some of the largest spodumene deposits in the world, so spodumene would be a good proxy for lithium salts in North America. However, the two domestic developers of spodumene mines are planning to construct processing plants to convert the spodumene into battery-grade lithium on-site. Similarly, the two Canadian mines that currently produce spodumene are also planning to build their own processing plants. These vertical integration plans mean that there is unlikely to be large amounts of spodumene available for sale on a market exchange in the near future.

DOE could, however, work with miners and processors to sign backstop-auction agreements for smaller amounts of lithium hydroxide/carbonate and spodumene that they have yet to commit to offtakers. This may be especially appealing to companies that have announced delays to project development due to current low market prices, and could help de-risk bringing those timelines forward. Interest in these future auctions could also help gauge the potential for developing new benchmark contracts for lithium hydroxide/carbonate further down the line.

Graphite

Natural and synthetic graphite anode material products currently exhibit a wide range of variation and insufficient product standardization, so a market exchange would not be viable at the moment. As the domestic graphite industry develops, DOE should work with graphite anode material producers and battery manufacturers to understand the types and degree of variation that exist across products and discuss avenues towards product standardization. Government-backed auctions could be a smaller-scale way to test the viability of product standards developed from that process, perhaps using several tiers or categories to group products. Natural and synthetic graphite would have to be treated separately, of course. 

Conclusion

The current global critical minerals supply chain partially reflects the results of over a decade of focused industrial policies implemented by the Chinese government. If the U.S. wants to lead the clean energy transition, critical minerals will also need to become a cornerstone of U.S. industrial policy. Developing a robust North American critical minerals industry would bolster U.S. energy security and independence and ensure a smooth energy transition. 

Promising progress has already been made in lithium, with planned processing capacity expected to meet demand from future battery manufacturing. However, market and pricing challenges remain for battery-grade nickel, cobalt, and graphite, which will fall far short of future demand without additional intervention. This report proposes that DOE take a two-pronged approach to supporting the critical minerals industry through offtake backstops, which address project developers’ current pricing dilemmas, and the development of more reliable and transparent pricing mechanisms such as government-backed auctions, which will set up markets for the future.

While the solutions proposed in this report focus on DOE as the primary implementer, Congress also has a role to play in authorizing and appropriating the new funding necessary to execute a cohesive industrial strategy on critical minerals. The policies proposed in this report can also be applied to other critical minerals crucial for the energy transition and our national security. Similar analysis of other critical minerals markets and end uses should be conducted to understand how these solutions can be tailored to those industry needs. 

Building a Whole-of-Government Strategy to Address Extreme Heat

Comprehensive recommendations from +85 experts to enable a heat-resilient nation

From August 2023 to March 2024, the Federation of American Scientists (FAS) talked with +85 experts to source 20 high-demand opportunity areas for ready policy innovation and 65 policy ideas. In response, FAS recruited 33 authors to work on +18 policy memos through our Extreme Heat Policy Sprint from January 2024 to April 2024, generating an additional +100 policy recommendations to address extreme heat. Our experts’ full recommendations can be found here. In total, FAS has collected +165 recommendations for 34 offices and/or agencies. Key opportunity areas are described below and link out to a set of featured recommendations. Find the 165 policy ideas developed through expert engagement here.


America is barreling toward another record-hot summer. While we wait for a national strategy, states, counties, and cities around the country have taken up the charge of addressing extreme heat in their communities and are experimenting on the fly. California has announced $200 million to build resilience centers that protect communities from extreme heat and has created an all-of-government action plan to address extreme heat. Arizona, New Jersey, and Maryland are all actively developing extreme heat action plans of their own. Miami-Dade County considered passing some of the strictest workplace heat rules (although the measure ultimately failed). Additionally, New York City and Los Angeles have driven cool roof adoption through funding programs and local ordinances, which can reduce energy demands, improve indoor comfort, and potentially lower local outside air temperatures.

While state and local governments can make significant advances, national extreme heat resilience requires a “whole of government” federal approach, as it intersects health, energy, housing, homeland and national security, international relations, and many more policy domains. The federal government plays a critical role in scaling up heat resilience interventions through research and development, regulations, standards, guidance, funding sources, and other policy levers. But what are the transformational policy opportunities for action?

Sourcing Opportunities and Ideas for Policy Innovation

During Fall 2023, FAS engaged +85 experts in conversations around federal policies needed to address extreme heat. Our stakeholders included: 22 academic researchers, 33 non-profit organization leaders, 12 city and state government employees, 3 private company leaders, 2 current or former Congressional staffers, 3 National Labs leaders, and 10 current or former federal government employees. Our conversations were guided by four framing questions.

Our conversations with experts sourced 20 high-demand opportunity areas for policy innovation and 65 policy ideas. To go deeper, FAS recruited 33 authors to work on +18 policy memos through our Extreme Heat Policy Sprint, generating an additional +100 policy recommendations to address extreme heat’s impacts and build community resilience. Our policy memos from the Extreme Heat Policy Sprint, published in April 2024, provide a more comprehensive dive into many of the key policy opportunities articulated in this report. Overall, FAS’ work scoping the policy landscape, understanding the needs of key actors, identifying demand signals, and responding to these demands has generated +165 policy recommendations for 34 offices and/or agencies.

Opportunities for Extreme Heat Policy Innovation

The following 20 “opportunity areas” are not exhaustive, yet can serve as inspiration for the building blocks of a future strategic initiative.

Facilitate Government-Wide Coordination

The first opportunity is an overarching call to action: the need for a government-wide extreme heat strategic initiative. This can build upon the National Integrated Heat Health Information System’s (NIHHIS) National Heat Strategy, set to release this year. This strategy would define the problems to solve, create targets and galvanizing goals, set and assign priorities for federal agencies, review available resources for financial assistance, assess regulatory and rulemaking authority where applicable, highlight legislative action, and include evaluation metrics and a timeline for review, adjustment, and renewal of programs. In creating this strategy, one interviewee recommended a comprehensive review of “heat exposure settings” and the federal actors that can safeguard Americans in these settings: homes, workplaces, schools and childcare facilities, transit, senior living facilities, correctional facilities, and outdoor public spaces. Through scoping potential regulations, standards, guidelines, planning processes, research agendas, and financial assistance, the federal government will then be prepared to support its intergovernmental actors and communities.

Accelerate Resilient Cooling Technologies, Building Codes, and Urban Infrastructure

On average, Americans spend 90% of their time indoors, making the built environment a critical site for heat exposure mitigation. To keep cool, especially in parts of the U.S. not used to extreme heat, buildings are increasingly reliant on mechanical cooling. While a life-saving necessity, air conditioning (AC) consumes significant amounts of electricity, putting high demands on aging grid infrastructure during the hottest days. Excess heat from air conditioners can lead to higher outdoor temperatures and even more AC demand. Finally, ACs are useless if there’s no power, an increasing risk due to growing energy poverty and grid failure. In these scenarios, much of our current building stock is likely to “fail” in its ability to keep residents cool.

Resilient cooling strategies, like high-efficiency cooling systems, demand-response systems, and passive cooling interventions, need policy actions to rapidly scale for a warming world. For example, cool roofs, walls, and surfaces can keep buildings cool and less reliant on mechanical cooling, but are often not considered part of weatherization audits and upgrades. District cooling, such as through networked geothermal, can keep entire neighborhoods cool while relying on little electricity, but is still in the demonstration project phase in the United States. Heat pumps are also still out of reach for many Americans, making it essential to design technologies that work for different housing types (e.g., affordable housing construction). Initiatives like the Department of Energy’s (DOE) Affordable Home Energy Shot can bring these technologies into reach for millions of Americans, but only if given sufficient financial resources. The FY25 budget request from DOE’s Office of Clean Energy Demonstrations and State and Community Energy Programs to strengthen heat resilience in disadvantaged communities through energy solutions could be a step towards realizing innovative heat technologies. The Environmental Protection Agency’s Energy Star program can also incentivize low-power and resilient cooling technologies, if rebates are designed to take advantage of these technologies.

Thermal resilience of buildings must also be considered, for both day-to-day operations and emergency blackout scenarios. DOE can work with stakeholders to create “cool” building standards and metrics with human health and safety in mind, and integrate them into building codes like ASHRAE 189.1 and the 90 series. These codes are “win-wins” for building designers, creating buildings that consume far less electricity while keeping inhabitants safe from the heat. DOE can assist in conducting more demonstration projects for building strategies that ensure indoor survivability in everyday and extreme conditions. 

Evidence on the efficacy and applicability of extreme heat resilience interventions at the community scale, such as cool pavements, urban greening, shading, ventilation corridors, and development regulations (e.g., solar orientation), is still evolving. Individual interventions and their interactions need more evidence on their costs and benefits, potential tradeoffs, and maladaptations. The National Institute of Standards and Technology works on building and urban planning standards for other natural hazards, such as through its National Windstorm Impact Reduction Program (NWIRP) and its Community Resilience program, and could serve as a “technology test-bed” for heat resilience practices and advance our understanding of their effectiveness as well as how to measure and account for benefits and costs. This could be done in partnership with the National Science Foundation, which has been dedicating funding for use-inspired research and technology development for climate resilience.

Finally, the U.S. government is the largest landlord in the nation. As the General Services Administration is rapidly decarbonizing its buildings, it can also be a test site for new technologies, building designs, planning, and resilience metrics development and analysis.

Adapt Transportation to the Heat

Public transportation is a site of high exposure to extreme heat. While the Department of Transportation’s Promoting Resilient Operations for Transformative, Efficient, and Cost-saving Transportation (PROTECT) grants are for “surface transportation resilience,” several of our local and regional government interviewees expressed difficulty successfully applying to these grants for “cooling” infrastructure, like water fountains, shade, and air-conditioned bus shelters. DOT should make extreme heat resilience explicit in its eligibility requirements and review the benefit-cost analysis (BCA) formula and how it might disadvantage cool infrastructure. 

Asphalt and concrete roadways contribute to the urban heat island effect and hotter weather makes asphalt in particular more vulnerable to cracking. DOT should leverage its research and development (R&D) capabilities to develop and deploy reflective and cool materials as a part of transportation infrastructure improvements. Finally, DOT should also consider the levers available to incentivize cool surfaces and cool materials as a part of transportation construction.

Create More Heat-Resilient Schools for Sustained Learning

Higher temperatures combined with minimal to no air conditioning in older school buildings have led to an increase in the number of “heat days,” or school closures due to dangerous temperatures. Pulling children out of the classroom not only negatively impacts them, but also puts increasing strain on families that rely on schools for childcare. Even when school is in session, many students are attempting to learn in classrooms exceeding 80°F, a temperature threshold above which studies have repeatedly shown that students struggle to learn and fall short of their academic potential. This is because heat reduces cognitive function and the ability to concentrate, both essential to learning. Learning loss from rising heat will only compound the learning losses from the COVID-19 pandemic. The Environmental Protection Agency predicts that the total lost future income attributable to heat-related learning losses may reach $6.9 billion at 2°C of warming (a threshold we are well on the way to meeting) and $13.4 billion at 4°C. Schools need guidance on how to deal with the heat crisis currently at hand, while being supported as they plan the climate adaptations needed for a hotter world. 

At a minimum, schools can be encouraged to formalize heat preparedness plans to protect both the health of students and safeguard their learning. No federal heat safety recommendations for schools yet exist; they will need to be created by the Department of Education (Ed), EPA, FEMA, the National Oceanic and Atmospheric Administration (NOAA), and others. Title I Grants, in alignment with Justice40, could then assist schools in adapting to climate change, including researched guidance on ways to cool students indoors, outdoors, and through behavioral management. Further, school system leaders need a better system to track how schools are currently experiencing extreme heat and what strategies could be employed to respond to heat exposure (closing schools, informed behavioral interventions to manage heat exposure, green infrastructure to build resilience, etc.). Federal involvement is essential for creating this tool. Finally, to address the root causes of excessive classroom heat, schools will need to transform their infrastructure through HVAC investments and improvements, greening, playground material changes, and shading. HVAC costs alone are expected to be $40 billion for all U.S. schools that need infrastructure improvements. While Inflation Reduction Act (IRA) tax credits are available for updating HVAC systems, many low-wealth schools will not be able to finance the gap between the credit coverage and the true cost and will need additional financial assistance.

Make Housing and Eviction Policy More Climate-Aware and Resilient

Most of the U.S. lacks minimum cooling requirements for buildings or requirements that a cooling device exist within the property. Adoption of the latest building energy codes, despite their previously described limitations, can still be a cost-saving and life-saving advancement according to research by the DOE. For new properties, the Federal Housing Finance Agency could require adherence to the latest energy codes to receive a mortgage from Government Sponsored Enterprises, an approach already under consideration by Housing and Urban Development (HUD) and the U.S. Department of Agriculture (USDA) for their mortgage products. For older construction, there could be requirements for adequate cooling to exist in the property at the point of sale. 

For all property types, weatherization audits, through the Weatherization Assistance Program (WAP) and Low-Income Home Energy Assistance Program (LIHEAP), can be expanded to consider heat resilience and cooling efficiency of the property and then identify upgrades such as more efficient HVAC, building envelope improvements, cool roofs, cool walls, shade, and other infrastructure. If cooling the entire property is unfeasible or too costly, homeowners could benefit from creating “Climate Safe Rooms” that can be kept safe during a heat wave. DOE and HUD could collaborate to demonstrate climate safe rooms in affordable housing, where many residents lack access to consistent cooling.

Some housing types are riskier than others. People living in manufactured homes in Arizona were 6 to 8 times more likely to die indoors due to extreme heat than residents of other housing types, because of poorly functioning or completely defunct cooling systems and/or inability to pay electric bills. Manufactured home park landlords can also set a variety of rules for homeowners, including banning cooling devices like window ACs and shade systems. While states like Arizona have now passed laws making these bans illegal, there is a need for a nationwide policy for secure access to cooling. HUD does not regulate manufactured home parks, but it does finance the parks through Section 207 mortgages and could stipulate that park owners guarantee resident safety. Finally, HUD could also update the Manufactured Home Construction and Safety Standards to allow HVAC and other cooling regulations in local building codes to apply to manufactured homes, as they do for other forms of housing, as well as require that homes perform to a certain level of cooling under high heat conditions. 

Renters are another highly vulnerable population. Most states do not require landlords to provide cooling devices to tenants or keep housing below risky temperatures. HUD, for example, does not require cooling devices in public housing, although regulations exist for heating. HUD could implement similar guarantees of a “right to cool.” Evictions in the summer months are also on the rise, due to rising rents compounded with rising energy costs, putting people out in the deadly heat. Keeping people in housing should be of the utmost importance, yet implementation remains fractured across the nation. Eviction moratoriums at a national level have been challenged by the Supreme Court, which overturned the CDC’s COVID-19 moratorium.

Address Communities’ Needs for Long-Term Infrastructure Funding Support

Heat vulnerability mapping has advanced significantly in the past few years. Federal programs like NIHHIS’s Urban Heat Island Mapping Campaigns have mapped more than 60 communities in the United States, and the results have guided city policy. The Census Bureau’s new product, Community Resilience Estimates (CRE) for Heat, assesses vulnerability at the level of individuals and households. Finally, researchers and non-profit organizations have been developing tools that can assess risk and also aid in individual or local decision-making, such as the Climate Health and Risk Tool and Heat Factor.

Advancements in our understanding of heat’s impacts and potential interventions have not translated into sustained resources to support transformative infrastructure development. As one interviewee put it, “communities that have mapped their urban heat islands are still waiting on funding opportunities to build relevant infrastructure projects.” Federal grants for mitigation and resilience may or may not consider heat resilience projects “cost-effective” and aligned with grant-making objectives, leading to rejection. 

FEMA’s Hazard Mitigation Grant Program (HMGP), made available only after a federally declared disaster, can only be used for extreme heat in specific circumstances and recommends that cost-effective heat mitigation projects also “reduce risks of other hazards”. In another example, FEMA’s BRIC grant program has rejected cooling centers, HVAC upgrades, and weatherization activities, all strategies with some benefit to preventing morbidity and mortality. Green infrastructure projects, with co-benefits such as flood mitigation, have been more successful, often because the BCA is based on the property-damaging hazard, flooding. Only one FEMA BRIC project has been funded with heat as the main hazard, an urban greening project in Portland, Oregon. This uncertainty about grant success can lead communities not to apply with a heat-focused project when time could be better spent securing grants for other community priorities. FEMA’s announcement that it will fund net-zero projects, including passive heating and cooling, through its HMGP and BRIC programs and Public Assistance could shift the paradigm, yet communities will likely need more guidance and technical assistance to execute these projects.

To invest in resilience to the growing risk of heat, policymakers will need to create a dedicated and reliable funding resource. Federal stakeholders can look to the states for models. California’s Integrated Climate Adaptation and Resiliency Program’s Extreme Heat and Community Resilience grants are currently slated to allocate $118 million to 20-40 communities for planning and implementation grants over three rounds. To start, FEMA could replicate this program, similar to its specific programs for wildfires, providing $50,000 to $5 million to a wide range of heat resilience projects, and make it eligible for joint funding through BRIC. DOE’s $105 million FY25 budget request for a program for planning, development, and demonstration of community-scale solutions to mitigate extreme heat in low-income communities is a step in the right direction. If funded, the program would benefit from coordinating with FEMA’s BRIC program on high-impact solutions.

Set Indoor and Outdoor Temperature Standards and Workplace Protections to Protect Human Health

Our understanding of when heat becomes risky to human health and impacts daily governance is still developing. Our interviewees shared that there is not yet consensus on 1) the lower threshold at which outdoor and indoor temperature risks begin and 2) the level of continued exposure that should trigger action, such as implementing breaks for workers or deploying rapid emergency cooling to residents. For workplaces, guidelines will come soon: the Occupational Safety and Health Administration (OSHA) is set to release its heat standard for indoor and outdoor workers by the end of 2024, which will advance heat safety for workers across the country. For all other settings (such as residential settings and schools), the jury is still out on a valid threshold and a regulatory mechanism to establish it.

Enforcement of standards is necessary for realizing their full potential. In preparation for a workplace heat standard, interviewees recommended the Department of Labor create an advanced Hazard Alert System for Heat (using an evolved data standard discussed in a later section) in order to better pinpoint regulatory enforcement. Small businesses will also need help preparing to comply with the new standard. DOL and the Small Business Administration should consider setting up a navigator program for resourcing energy-efficient, worker-centric cooling strategies, leveraging IRA funds where applicable.

Build the Extreme Heat Resilience Workforce

Extreme heat is not just a challenge to worker health; it’s also a challenge to workforce ability and capacity. As heat becomes a threat to the entire nation, many fields need to rapidly adapt to entirely new knowledge bases. For example, much of the health workforce (doctors, nurses, public health workers) receives little to no education on climate change and climate’s health impacts. Programs are beginning to crop up, such as Harvard’s C-Change Program, yet they will need support to scale. With the federal government being the nation’s largest single funder of graduate medical education, there are many levers at its disposal to develop, incentivize, and even require climate and health education. The U.S. Public Health Commissioned Corps is another program that could mobilize a climate-aware health workforce, placing professionals with a deep awareness of climate change’s impact on health in local communities.

The weatherization and decarbonization workforce must also be made aware of and ready for heat’s growing impacts and emerging strategies for building- and community-scale resilience. While promising strategies exist for heat mitigation, such as cool walls and roofs, these interventions are largely not considered during weatherization and energy efficiency audits. Tax credits created by the IRA/BIL could be used for passive or low-energy cooling interventions, yet a lack of clarity prevents their uptake and implementation. For example, EPA’s Energy Star program used to certify roofing products before that certification was sunset in 2022. Stakeholders at DOE and EPA should consider their role in workforce readiness for extreme heat, collaborating with third-party entities to build awareness about these promising strategies.

Navigating all of the benefits of the IRA and BIL is challenging for resource-strapped communities and households. Program navigators for weatherization assistance and resilience could be an incredible asset to low-resource communities, and could leverage IRA technical assistance resources as well as the newly created American Climate Corps.

Finally, the federal government workforce is being stretched thin by the sheer number of new mandates in the IRA and BIL. To meet the moment, agencies have used flexible hiring mechanisms like the Intergovernmental Personnel Act (IPA) and, for some offices, the BIL- and IRA-connected Direct Hire Authority to make critical talent decisions and staff their agencies. DOE, for example, has exceeded its goals, hiring over 1,000 new employees to date. But not all agencies and offices have access to the Direct Hire Authority, and it is set to expire between 2025 (for IRA) and 2027 (for BIL). Congress should be encouraged to expand this authority, extend it beyond 2025 and 2027 respectively, and remove the limit on the number of staff allowed. Further, agencies should be encouraged to use other flexible hiring mechanisms like IPAs and other termed positions. The federal government should have the talent needed to meet its current mandates and be prepared to solve problems like extreme heat.

Build Healthcare System Preparedness

Years of underinvestment in preparedness have impacted U.S. health infrastructure’s surveillance, data collection, and workforce capacity to respond to emerging climate threats like extreme heat. The Administration for Strategic Preparedness and Response’s Hospital Preparedness Program, which prepares healthcare systems for emergencies, has had its budget reduced by 67% from FY2002 to FY2022, adjusting for inflation. Further, the Centers for Disease Control and Prevention (CDC) has seen a 20% budget reduction over the same period. The CDC’s Climate Ready States and Cities Initiative can only support nine states, one city, and one county, despite 40 jurisdictions having applied. The Trust for America’s Health (TFAH) found that increasing funding from $10 million to $110 million is required to support all states and improve climate surveillance. The TFAH also found that an additional $75 million is needed to extend the CDC’s National Environmental Public Health Tracking Program, a program that tracks threats and plans interventions, to every state. Finally, the Office of Climate Change and Health Equity, the only office within Health and Human Services dedicated solely to the intersection of climate and health, has yet to receive direct appropriations to support its work. 

The Centers for Medicare & Medicaid Services (CMS) and the Health Resources and Services Administration (HRSA) provide critical investments in healthcare facilities, operations, care provision, and the medical workforce, yet have no publicly available programs dedicated to building climate resilience in the face of rising temperatures. The Veterans Health Administration (VHA), the largest integrated healthcare system in the U.S., includes responding to heat wave exposure in its agency Climate Action Plan and has made commitments to developing biosurveillance systems that incorporate external data on air quality, temperature, heat index, and weather, as well as upgrading medical center infrastructure. This is critical, as 62% of VHA medical centers are exposed to extreme heat and the VHA is seeing a rise in heat-related illness in the Veteran population. Given its sheer size, systems changes like these made by the VHA can drive real change in healthcare practice. 

To build resilience to extreme heat within healthcare systems, our interviews and literature review highlighted that these three actions are most critical: 1) increasing surveillance and tracking of heat-related illness through improvements to medical diagnosis and coding practices and technological systems (i.e. EHRs); 2) leveraging healthcare financing for preventative treatments (i.e. cooling devices), incentives for climate-change preparedness, accurate coding and treatment, and quality care delivery (CQIs), and requirements for accreditation and reimbursements; and 3) fostering capacity-building through grants, technical assistance, planning support and guidance, and emergency preparedness. 

Design Activation Thresholds for Public Health, Medical, and Emergency Responses

Despite the fact that extreme heat events have overwhelmed local capacity and triggered local disaster declarations, heat is not explicitly required in healthcare preparedness efforts authorized under the Pandemic and All-Hazards Preparedness Act (PAHPA), is insufficiently included or not included at all in the local and state hazard mitigation plans required by FEMA, and has never been the subject of a federal disaster declaration. This all inhibits the deployment of the federal resources for mitigation, planning, and response that states and local jurisdictions rely on for other hazards. Our interviewees recommended that there need to be better “activation thresholds” for heat, i.e., markers that the hazard has reached a level of impact that requires additional capacity and resources. Most thresholds set right now rely solely on high temperatures, not the risk factors that exacerbate the impacts of heat. Data inputs into these locally relevant thresholds can include wet-bulb globe temperature (which accounts for humidity), heat stress risk, level of acclimatization, nighttime temperatures, building conditions and cooling device uptake, work situations, other compounding health risks like wildfire smoke, and other factors. These activation thresholds should also be designed around the most heat-vulnerable populations, such as children, the elderly, pregnant people, and those with comorbidities. 
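
As a simplified illustration of what such a composite activation threshold could look like, the sketch below combines outdoor wet-bulb globe temperature (using the standard 0.7/0.2/0.1 outdoor weighting) with nighttime-temperature and duration checks; every cutoff value shown is hypothetical and would need to be set locally.

```python
# Hypothetical sketch of a composite heat "activation threshold."
# The WBGT weighting (0.7 natural wet-bulb + 0.2 black-globe + 0.1 dry-bulb) is the
# standard outdoor formula; every cutoff below is invented for illustration only.

def wbgt_outdoor(t_wet_bulb_c: float, t_globe_c: float, t_dry_bulb_c: float) -> float:
    """Outdoor wet-bulb globe temperature in degrees Celsius."""
    return 0.7 * t_wet_bulb_c + 0.2 * t_globe_c + 0.1 * t_dry_bulb_c

def activate_response(t_wet_bulb_c: float, t_globe_c: float, t_dry_bulb_c: float,
                      overnight_low_c: float, consecutive_hot_days: int) -> bool:
    """Trigger additional capacity when daytime WBGT, warm nights, and event
    duration all exceed (hypothetical) local thresholds."""
    hot_day = wbgt_outdoor(t_wet_bulb_c, t_globe_c, t_dry_bulb_c) >= 30.0
    warm_night = overnight_low_c >= 24.0   # little overnight relief
    sustained = consecutive_hot_days >= 3
    return hot_day and warm_night and sustained

# Example: 29C wet-bulb, 45C globe, 38C dry-bulb, 25C overnight low, 4th hot day in a row.
print(activate_response(29.0, 45.0, 38.0, 25.0, 4))  # True -> activate response
```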

Increased transmission and spread of pathogens is also a growing risk of hotter average temperatures that needs more attention. Increased pathogen surveillance, correlated with existing climate conditions, would greatly strengthen U.S. pandemic and endemic disease surveillance. Finally, no program to date at the Biomedical Advanced Research and Development Authority has focused on creating climate-aware medical countermeasures, and its 2022-2026 strategic plan includes no mention of climate change. 

Reduce Energy Burdens, Utility Insecurity, and Grid Insecurity

As temperatures rise, so do energy bills. Americans are facing an ever-growing burden of energy debt. 16% of U.S. households (20.9 million) find themselves behind on their energy bills, increasing the risk of utility shut-offs due to non-payment. The Low Income Home Energy Assistance Program (LIHEAP) exists to relieve energy burdens, yet was designed primarily for heating assistance. Thus, the LIHEAP formulas advantage states with historically frigid climates. Further, most states use their LIHEAP budgets for heating first, leaving what remains for cooling assistance (or simply don’t offer cooling assistance at all). As a result, nationally from 2001-2019, only 5% of energy assistance went to cooling. Finally, the LIHEAP program is massively oversubscribed and can only serve a portion of needy families. To adapt to a hotter world, LIHEAP’s budgets must increase, and allocation formulas will need to be made more “cooling”-aware and equitable for hot-weather states. The FY25 presidential budget keeps LIHEAP’s funding level at $4.1 billion, while also proposing expanded eligible activities that will draw on available resources. A recent analysis by the National Energy Assistance Directors Association found that this funding level could cut ~1.5 million families from the program and cut program benefits like cooling.

Another key issue is that 31 states have no policy preventing energy shut-offs during excessive heat events and even the states that have policies vary widely in their cut-off points. These cut-off policies are all set at the state level, and there is still an ongoing need to identify best practices that save lives. While the Public Utility Regulatory Policies Act of 1978 (PURPA) prohibits electric utilities from shutting off home electricity for overdue bills when doing so would be dangerous for someone’s health, it does not have explicit protections for extreme weather (hot/cold). Reforms to PURPA could be considered that require utilities to have moratoriums on energy shut-offs during extreme heat seasons.

Finally, grid resilience will become even more essential in a hotter climate. Power outages and blackouts during extreme heat events are deadly. If a blackout were to occur in Phoenix, Arizona during the summer, nearly 900,000 people would need immediate medical attention. Rising use of AC is itself a risk factor for blackouts due to increases in energy demand. The North American Electric Reliability Corporation (NERC), a regulatory organization that works to reduce risks to power grid infrastructure, issued a dire warning that two-thirds of the U.S. faces reliability challenges because of heatwaves. Ensuring grids are ready for the climate to come should be a top priority for DOE, the Federal Emergency Management Agency (FEMA), and the Federal Energy Regulatory Commission (FERC). Given the risks to human health, the Centers for Disease Control and Prevention (CDC) should work with public health organizations to prepare for blackouts and grid failure events.

Address Critical Needs of Confined Populations Facing Heat

Confined populations, whether because of their medical status or legal status, are vulnerable to extreme heat indoors. Long-term care facilities are required by law to keep properties within 71-81℉. Yet long-term care facilities report challenges actually meeting residents’ needs in a disaster, such as a power outage, pointing to a need for more coordination with CMS. 

Incarcerated populations, on the other hand, are not guaranteed any cooling, even as summers become more brutal. This directly leads to an increase in deaths: 45% of U.S. detention facilities saw spikes in deaths on hazardous heat days from 1982 to 2020. Despite this lack of sufficient cooling amounting to “cruel and unusual” punishment, there has been no public activity to date from the Department of Justice to secure cooling infrastructure for federal prisons or to work with state prisons to expand cooling infrastructure. The National Institute of Corrections does recommend ASHRAE Standard 55, Thermal Environmental Conditions for Human Occupancy, to corrections institutions, though this metric needs to be updated for our evolving understanding of extreme heat’s risks to human health.

Anticipate and Prevent Supply Chain Disruptions 

Hotter temperatures are changing the landscape of American and global food production. 70% of global agriculture is expected to be affected by heat stress by 2045. Recent heat waves have already killed crops and livestock en masse, leading to lower yields and even shortages for certain products, like olive oil, potatoes, coffee, rice, and fruits. Rising heat is also poised to reshape local and state economies that rely on their changing climatic capabilities to produce certain crops. Oranges, a $5 billion industry for Florida, are struggling in the heat, which stresses the trees and provides fertile ground for pathogens. As a result, Florida is facing its worst citrus yield since the Great Depression. A decrease in winter chill is another growing risk, as many perennial crops have adapted to certain amounts of accumulated winter chill to develop and bloom. Winter-time heat is shaking up plants’ biological clocks, decreasing quality and yield. Overall, extreme heat is impacting American household bottom lines in the short term and long term through heat-exacerbated earning losses and spiking food prices. 

Ensuring ongoing access to critical commodity and specialty agricultural products in a future of higher temperatures is a national security priority. Resilience of products to extreme heat could be included as a future requirement in the Federal Supplier Climate Risks and Resilience Rule that governs Federal Acquisition Regulations. Further, FAS’ work scoping the federal landscape has shown there are few federal research and development programs, financial assistance opportunities, and incentives for heat resilience, and our interviewees concurred with that assessment. The U.S. Department of Agriculture (USDA) can prepare farmers for future climate risks and hotter temperatures, ensuring consistent food production and reducing losses and the economic pay-outs needed from the USDA through crop insurance and disaster assistance. The USDA can accelerate advances in biotechnology and genetic engineering to improve the heat resilience of agricultural products while also encouraging practices like shade, effective water management, and soil regeneration that build system-wide resilience. As Congress continues to consider reauthorizations and appropriations for the Farm Bill, it should consider fully funding the Agriculture Advanced Research and Development Authority to advance resilient agriculture R&D while also increasing funding for the USDA Climate Hubs to support the roll-out of heat-resilient practices.

Connect Drought Resilience and Heat Resilience Strategies

Hotter winters have literal downstream consequences. Warming is shrinking the snowpack that feeds rivers, leading to further groundwater reliance and straining aquifers to the brink of collapse. Warmer temperatures also lead to more surface water evaporating, leaving less to seep through the ground to replenish overstressed aquifers. Rising temperatures also mean that plants need more water, as they evapotranspire at greater rates to keep their internal temperatures in check. All of these factors compound the growing risk of drought facing American communities. Drought, now made worse by high heat conditions, accounts for a significant portion of annual agricultural losses. 80% of the 2023 emergency disaster designations declared by the United States Department of Agriculture (USDA) were for drought and/or excessive heat. Water insecurity is an escalating catastrophe, and addressing it requires a national strategy that accounts for hotter future temperatures and the strain they will put on the water supplies necessary to sustain agricultural production and human habitation.

Heat and dry weather/drought also combine to make prime conditions for megawildfires. The smoke then generated by these fires compounds the health impacts of extreme heat, with research showing that concurrent effects of heat and smoke drive up the number of hospitalizations and deaths. More funding from Congress is needed to improve wildfire forecasting and threat intelligence in the era of compounding hazards.

Reform Benefit-Cost Analysis

Benefit-cost analysis (BCA) is a critical tool for guiding infrastructure investments, and yet it is not set up to account for the benefits of heat mitigation investments. When the focus of the BCA is mitigating property damage and loss of life, it will discount impacts that go beyond those damages, such as economic losses, learning losses, wage losses, and healthcare costs. Research will likely be needed to generate the pre-calculated benefits of heat mitigation infrastructure, such as avoiding heat illness, death, and wage losses and preventing widespread power failures (a growing risk). Further, strategies that enhance an equitable response, articulated in the recent update to the Office of Management and Budget’s Circular A-4, need to be quantified. This could include response efforts that protect the populations most vulnerable to extreme heat, such as checking in on heat-sensitive households identified by the CRE for Heat. Developing these metrics will take time and should be done in partnership with agencies like DOE, EPA, and the CDC. Finally, FEMA’s BCA is often based on a single hazard, the one with the highest BCA ratio, making it more challenging to work on multi-hazard resilience. FEMA should develop BCA methods that allow an infrastructure investment (like a resilience hub) to be credited for community resilience to many hazards.
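
A toy comparison below illustrates how crediting heat-specific benefits can flip a project’s BCA outcome; all dollar values are hypothetical and this is not FEMA’s actual methodology.

```python
# Hypothetical BCA comparison for a heat-mitigation project (e.g., a resilience hub).
# Values are illustrative only; an actual BCA uses standardized, discounted benefit streams.

project_cost = 2_000_000

# Benefits countable under a property-damage-focused BCA
property_damage_avoided = 900_000

# Additional heat-specific benefits this section argues should be quantified
heat_benefits = {
    "avoided heat illness and deaths": 1_200_000,
    "avoided wage and learning losses": 600_000,
    "avoided outage-related costs": 300_000,
}

narrow_bcr = property_damage_avoided / project_cost
broad_bcr = (property_damage_avoided + sum(heat_benefits.values())) / project_cost

print(f"Narrow BCA ratio: {narrow_bcr:.2f}")   # 0.45 -> project looks non-cost-effective
print(f"Broader BCA ratio: {broad_bcr:.2f}")   # 1.50 -> project clears the 1.0 threshold
```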

Create the “Plan” for How the Federal Emergency Management Agency and Others Should Respond to an Extreme Heat Disaster

Extreme heat’s extended duration, from a few days to several months, poses a significant challenge to existing disaster policy’s focus on acute events that damage property. FEMA’s acute focus on infrastructure damage has been an insurmountable barrier to all past attempts to declare extreme heat a disaster and receive federal disaster assistance, even though, in theory, FEMA can reimburse state and local governments for any disaster response effort that exceeds local resources, including heat waves. Our interviewees acknowledged that federal recognition that heat waves are disasters will only come with extending the definition of what a disaster is.

New governance models will need to be created for climate and health hazards like extreme heat, focusing on an adaptation-forward, people-centered disaster response approach given the outsized impact of heat hazards on human health and economic productivity. Such a shift will challenge the federal government’s existing authorities under national disaster law, the Stafford Act, which currently does not consider “human damages” beyond loss of life. As a result, we fail to capture how existing infrastructure stops providing critical functions during heat hazard events, such as secure learning, secure workplaces, secure municipal operations, and secure healthcare delivery, and how that strains or exceeds local resources to respond. Quantifying more of these damages would create an incentive to design responses that address current impacts and to plan for and mitigate future impacts. 

Finally, there are high-risk heat disasters that we need to be executing planning scenarios for, specifically an extended power outage in a city under high-heat conditions. A power outage during the summer in Phoenix would send 800,000 people to the emergency room, which would very likely overwhelm local resources and those of all surrounding jurisdictions. A power outage during an extended heat wave should be an included planning scenario for emergency management exercises led by state and local governments. FEMA should produce a comprehensive list of everything a city needs to be prepared for a catastrophic power outage.

Spur Insurance and Financing Innovation

While insurance is the country’s largest industry, few insurance products and services exist in the U.S. to cover the losses from extreme heat. The U.S. Department of the Treasury recently acknowledged this lack of comprehensive insurance for extreme heat’s impacts in its report on how climate change worsens household finances. Heat insurance for individuals could manifest in a variety of ways: security from utility cost spikes during extreme weather events, real-estate assessment and scoring for future heat risk, “worker safety” coverage to protect wages during extremely hot days when it might be unsafe to work, protections for household items and resources lost due to an extended blackout or power outage, and full coverage for healthcare expenses caused or exacerbated by heat waves. California is currently leading the country in thinking through the role of the insurance industry in mitigating extreme heat’s impacts, and federal stakeholders should watch it as a model for what can be scaled and replicated across the nation.

Further, it is important that investments made today are resilient to the climate conditions of tomorrow. The Office of Management and Budget’s November 2023 memo on climate-smart infrastructure, currently being implemented, provides technical guidance on how federal financial assistance programs can and should be invested in climate resilience. An as-yet unexplored financial lever for climate resilience identified in our interviews is federally backed municipal bonds. Climate change is undermining this once stable investment, as cities and local governments struggle to pay back interest due to the rising costs of addressing hazards. The municipal bond market could price climate risk when deciding on interest payments, and give beneficial rates to jurisdictions that have done a full analysis of their risks and taken steps towards resilience.

Finally, there is a need to update the assessments of heat risk that are used to make insurance and financial decisions. Recent research by the DOE has found that the FEMA National Risk Index (NRI) property damage data appear to be deficient and underestimate damages when compared to published values for recent U.S. extreme temperature events. To start, FEMA should consider including metrics in its NRI that characterize the building stock (e.g., by adherence to certain building codes) and its thermal comfort levels (even with cooling devices) as well as thermal resilience.

Incorporate Future Climate Projections into Planning at All Levels

Recent research has shown that cities and counties are barreling toward temperature thresholds at which it would be dangerous to operate municipal services, affecting the operations of daily life. Yet little of this future risk is accounted for in the various planning activities (for public health, emergency preparedness, grid security, transportation, urban design, etc) done by local and state governments. Our interviewees expressed that because many plans are based on historical and current risk data, there is little anticipation of the future impacts of hotter temperatures when making current planning choices. 

One example stood out around nature-based solutions (NBS): while NBS has received over a billion dollars in federal funding and is promoted as an approach to mitigate extreme heat’s impacts, planners are not always considering whether the trees planted today will still thrive after 20-30 years of warming. Reporting has shown that Southern Nevada is at risk of losing many of its shade trees due to inadequate species selection, as temperatures exceed the heat tolerance of trees that once thrived in this climate. 

Changes are being made to some federally-required planning processes to require assessment of future risk. FEMA’s National Mitigation Planning Program now requires state and local governments to plan for future risks caused by climate change, land use, and population change to receive emergency disaster funds and mitigation funding. While extreme heat is a noteworthy future risk, it is not explicitly required in the new guidelines. As of April 2023, only half of U.S. states had a section dedicated to extreme heat in their Hazard Mitigation Plans.

Climate.gov, operated by NOAA, was a recommended starting place for a library of future climate files that can be brought into planning processes and resilience analysis. Technical assistance and decision-making tools that support planners in making predictive analyses based on future extreme temperature conditions can help inform the effective design of resilient transportation systems, infrastructure investments, public health activities, and grids, and ensure accurate estimations of investment cost effectiveness over the measure lifetime.

Set Standards for Data Collection and Analysis

While official CDC-reported deaths from heat, approximately 1,670 in 2022, exceed those from any other natural hazard, experts widely agree this number is an undercount. True mortality is likely closer to 10,000 deaths a year from extreme heat under current climate conditions. Many factors compound this systematic undercount: hospitals often do not consider extreme heat in their hazard preparedness plans, awareness of ICD-10 coding for heat illness is limited, and deaths exacerbated or caused by heat are often attributed to other causes. Retraining the healthcare workforce and modernizing death counting for climate change will take time, our interviewees acknowledged. Thus, decision makers need better data and surveillance systems now to address this growing public health crisis. Excess deaths analysis could provide a proxy for the true number of heat deaths, and has already been employed by California to assess the impact of past heat waves. The CDC has used excess death methods in tracking the COVID-19 pandemic, and could apply the same analysis to “climate killers” like extreme heat to inform healthcare system planning ahead of Summer 2024 (for example, alongside forecasting tools like HeatRisk). It will be critical to set a standard methodology so that heat’s impacts can be compared across communities in the United States. Accurate mortality counts are also essential to strengthening the benefit-cost analysis for heat mitigation and resilience.
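
As a simplified illustration of the excess-deaths approach described above, the sketch below compares observed weekly deaths during a heat event with a baseline built from prior years. The file, column names, reference years, and event weeks are all hypothetical, and a real analysis would also adjust for population change, seasonality, and uncertainty.

```python
import pandas as pd

# Hypothetical weekly all-cause mortality counts for one jurisdiction.
# Columns assumed: 'year', 'week', 'deaths'.
deaths = pd.read_csv("weekly_deaths.csv")

# Baseline: average deaths for each calendar week over a reference period
# (hypothetically 2015-2019, i.e., before the event being studied).
baseline = (
    deaths[deaths["year"].between(2015, 2019)]
    .groupby("week")["deaths"]
    .mean()
)

# Observed deaths during the event year (hypothetically a 2022 heat wave).
observed = deaths[deaths["year"] == 2022].set_index("week")["deaths"]

# Excess deaths = observed minus expected, summed over the heat-wave weeks.
heat_wave_weeks = [27, 28, 29]  # hypothetical weeks covering the event
excess = (observed - baseline).loc[heat_wave_weeks].sum()
print(f"Estimated excess deaths during the heat wave: {excess:.0f}")
```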

Our conversations also highlighted the data gaps that exist around counting worker injuries and deaths due to extreme heat. For work-related heat-health impacts, injuries or deaths are often counted only when a hospital admission triggers a required report; heat-exacerbated injuries (e.g., falls) are often not counted as heat-related; and harms off the job (e.g., long-term kidney damage) go unnoticed. Studies estimate that California alone sees roughly 20,000 heat-related injuries a year, while the U.S. Department of Labor (DOL) reports only about 3,400 injuries a year nationally. DOL could track how overall workplace injuries correlate with temperature to develop a methodology that would yield far more accurate numbers on true heat impacts.
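
The correlation-based approach suggested above could start as simply as joining daily injury counts to daily temperatures and comparing hot and non-hot days. The sketch below is illustrative only; the input files, column names, and the 95°F cutoff are assumptions, not an actual DOL dataset or method.

```python
import pandas as pd

# Hypothetical inputs for one jurisdiction: daily workplace injury counts and
# daily maximum temperatures. Column names are assumptions for illustration.
injuries = pd.read_csv("daily_injuries.csv", parse_dates=["date"])  # date, injury_count
temps = pd.read_csv("daily_max_temp.csv", parse_dates=["date"])     # date, tmax_f

merged = injuries.merge(temps, on="date")

# How do daily injury counts move with daily maximum temperature?
corr = merged["injury_count"].corr(merged["tmax_f"])
print(f"Correlation between injuries and daily max temperature: {corr:.2f}")

# Compare average injuries on extreme-heat days (assumed >= 95°F) versus other days.
merged["extreme_heat_day"] = merged["tmax_f"] >= 95
print(merged.groupby("extreme_heat_day")["injury_count"].mean())
```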

Finally, anticipating the full risks of heat due to factors like existing infrastructure, social vulnerability, and levels of community resilience remains a work in progress. For example, FEMA’s National Risk Index (which informs environmental justice tools like the Climate and Economic Justice Screening Tool and the Community Disaster Resilience Zones program) has notable limitations due to its reliance on historical weather data and narrow focus on mortality reduction, leading to underestimates of damages when compared to published values for recent U.S. extreme temperature events. There is a significant opportunity to develop a standard data set for extreme heat risks and vulnerabilities under current and anticipated future climate conditions. This data set could then produce high-quality, relevant tools for community decision making (like FEMA’s Flood Maps) and inform federal screening tools and funding decisions.

Create Regulatory Oversight Infrastructure for Extreme Heat

There are only a few regulatory levers currently in place or in the regulatory pipeline to protect Americans from growing heat risks and build more heat-resilient communities. These include the temperature standards for senior living facilities set by CMS and OSHA’s upcoming heat standard. Regulations are needed in many more common settings: homes, schools and childcare facilities, transit, correctional facilities, and outdoor public spaces. Enforcement of these regulations will also need to expand, including better monitoring of outdoor and indoor temperatures. HUD, EPA, and NOAA should work to identify opportunities to expand indoor and outdoor air temperature monitoring, seeking additional funding from Congress where needed.

Future regulations for mitigating extreme heat exposure can be conceptualized in three ways: technology standards (the required presence of a cooling and/or thermal-regulating technology), behavioral guidelines and expectations (required actions to avert overexposure), and performance standards (requirements that heat exposure cannot cross a certain threshold). These potential regulations will need to be conceptualized, reviewed, and implemented by several federal agencies, as authority for different aspects of heat exposure is fragmented across the federal government. Some examples of regulatory levers identified through our interviews (and introduced in previous sections) include:

Conclusion

Extreme heat, both acute and chronic, is a growing threat to American livelihoods, affecting household incomes, students’ learning, worker safety, food security, and health and wellbeing. While the policy landscape for addressing heat is nascent, this report offers recommendations for near- and long-term solutions that policymakers can consider. Complementary to FAS’s Extreme Heat Policy Sprint, we hope this report can serve as a toolkit of realistic potential actions.

Heat Hazards and Migrant Rights: Protecting Agricultural Workers in a Changing Climate

In 2008, Maria Isabel Vasquez Jimenez, a 17-year-old pregnant farmworker, tragically died from heatstroke while working in the vineyards of California. Despite laboring for more than nine hours in the sweltering heat, Maria was denied access to shade and adequate water breaks. Management never called 911 and instructed her fiancé to lie about the events. To this day, her death underscores the dire need for robust protections for those who endure extreme conditions to feed our nation.

This heartbreaking incident is not isolated. With the United States shattering over a thousand temperature records last year, the crisis of heat-related illnesses in the agricultural sector is intensifying. Rising global temperatures are making heat waves more frequent and severe, posing a significant threat to farmworkers who are essential to our food supply. While progress is being made towards comprehensive heat safety regulations, we must now focus on ensuring these protections are equitably implemented to safeguard all farmworkers from the intensifying threats of climate change, especially vulnerable groups like migrants. As individual stories shed light on the real-life tragedies of neglecting climate resilience, broader climate trends reveal a significant rise in these risks, affecting agricultural workers nationwide.

Climate change & agriculture

Rising Temperatures

Climate change poses significant challenges to global agricultural systems, threatening food security, livelihoods, and the overall sustainability of farming practices. Among the various climate-related hazards, rising temperatures stand out as a primary concern for agricultural productivity and worker health and safety. The Environmental Protection Agency (EPA) reports that the average temperature in the United States has increased by 1.8°F over the past century, with the most significant increases occurring in the last few decades. According to the Intergovernmental Panel on Climate Change, global average temperatures have been steadily increasing due to the accumulation of greenhouse gases in the atmosphere, primarily from human activities such as burning fossil fuels and deforestation. This warming trend is expected to continue, critically impacting agricultural operations worldwide. The Union of Concerned Scientists predicts that by mid-century, the average number of days with a heat index above 100°F in the United States will more than double, severely impacting agricultural productivity and worker health. As the climate continues to change, the direct threats to those who supply our food become increasingly severe, particularly for farmworkers exposed to the elements.

Threats to Farmworkers

In agriculture, rising temperatures worsen challenges like water scarcity, soil degradation, and pest infestations, and introduce new risks like heat stress for farmworkers. As temperatures rise, heatwaves become more frequent, intense, and prolonged, posing serious threats to the health and well-being of agricultural workers who perform physically demanding tasks outdoors. Heat stress can lead to heat-related illnesses such as heat exhaustion and heatstroke, which can be life-threatening if not properly managed. Prolonged exposure to high temperatures can impair cognitive function, reduce productivity, and increase the risk of accidents and injuries in the workplace. According to Public Citizen, from 2000 to 2010 as many as 2,000 workers died each year from heat-related causes in the United States, and farmworkers are 20 times more likely to die from heat-related illnesses than other workers.

Given the critical role of agricultural workers in food production and supply chains, protecting their health and safety in the face of escalating heat risks is paramount. Comprehensive heat safety standards and regulations are essential to mitigate the adverse impacts of climate change on farmworkers and ensure the sustainability and resilience of agricultural operations. By implementing heat safety measures such as heat acclimatization guidelines, shade access, and regular rest breaks, agricultural employers can minimize the risk of heat-related illnesses and injuries. Effective heat standard implementation requires collaboration among policymakers, industry stakeholders, and worker advocacy groups to address climate change challenges and protect agricultural workers. Beyond the direct effects of heat, farmworkers also face compounded environmental hazards that further jeopardize their health and safety.

Compounded Hazards

While the focus of this discussion is on heat safety regulations, it is important to recognize that these regulations intersect with broader environmental and health challenges faced by agricultural workers. High temperatures often coincide with wildfire seasons, leading to increased exposure to wildfire smoke. This overlap amplifies health risks like respiratory and cardiovascular diseases, disproportionately affecting workers with underlying health conditions. Effective protection against these compounded hazards requires coordination among policymakers and industry leaders. Comprehensive standards and holistic safety measures are crucial to mitigating the risks associated with heat and addressing the broader spectrum of environmental pollutants. While environmental hazards are a significant concern, the specific vulnerabilities of migrant workers introduce additional layers of risk and complexity.

Challenges faced by migrant workers

Understanding the Vulnerabilities

Migrant agricultural workers face socioeconomic, legal, and environmental challenges that increase their vulnerability to heat hazards. Economically, many migrant workers endure low wages and lack access to adequate healthcare, which complicates their ability to cope with and recover from heat-related illnesses. A study by the National Center for Farmworker Health found that 85% of migrant workers earn incomes below the federal poverty level, making it difficult for them to access necessary medical care. Legally, the fragile status of many migrant workers, including those on temporary visas or without documentation, exacerbates their vulnerability. These workers often hesitate to report violations or seek help due to fear of retaliation, job loss, or deportation.

Harsh Working Conditions

Additionally, migrant workers frequently labor in conditions that provide minimal protection against the elements. Excessive heat exposure is compounded by inadequate access to water, shade, and breaks, making outdoor work particularly dangerous during heatwaves. Furthermore, many migrant workers return after work to substandard housing that lacks essential cooling or ventilation, preventing effective recovery from daily heat exposure and exacerbating dehydration and heat-related health risks. According to the National Center for Farmworker Health, about 40% of migrant farmworkers in the United States live in homes without air conditioning.

Barriers to Protection

The barriers to effective heat protection for migrant workers are extensive and complex, and they can prevent workers from accessing crucial protections and resources. They include:

Language Diversity. The migrant worker community is incredibly diverse, encompassing individuals from various cultural and linguistic backgrounds. In the U.S. agricultural sector, over 50% of workers report limited English proficiency. This diversity can make it significantly harder for workers to understand their rights and the safety measures available to them. Even when regulations and protections are in place, the communication of these policies often fails to reach non-English-speaking workers effectively, leading to misunderstandings that can prevent them from advocating for their safety and well-being. The National Agricultural Workers Survey reports that 77% of farmworkers in the United States are foreign-born, with 68% primarily speaking Spanish, highlighting the language barriers that complicate effective communication of safety regulations.

Vulnerable Visas & Immigration Status. Visa statuses and undocumented immigration also play a critical role in the vulnerability of migrant workers. Workers holding temporary visas, such as H-2A visas, often face precarious employment conditions because these visas tie them to specific employers, limiting their ability to assert their rights without fear of retaliation. Undocumented workers are particularly susceptible to exploitation and abuse by employers who may use their immigration status as leverage. Fear of deportation and legal repercussions further discourages reporting workplace incidents, perpetuating a cycle of exploitation and vulnerability.

Farmworker Housing. Farmworker housing often lacks proper cooling or ventilation, increasing heat exposure risks during off-work hours. Many agricultural workers live in substandard housing characterized by overcrowding, poor insulation, and inadequate access to air conditioning or ventilation systems. Poor living conditions worsen heat-related illnesses, particularly during extreme weather. Limited access to cooling amenities after long hours of outdoor labor exacerbates heat stress and heightens the health risks associated with heat exposure.

Recognizing these challenges is only the first step; next, we must assess how current protections measure up and where they fall short in safeguarding these vulnerable populations.

Review of existing protections

Federal Efforts

Currently, there is no overarching federal mandate specifically addressing heat exposure, leaving significant gaps in worker protection, especially for vulnerable populations like migrant workers. However, the federal government has taken several critical steps to address heat safety in the interim. OSHA has moved beyond relying solely on the General Duty Clause, launching a National Emphasis Program that prioritizes inspections on high-heat days and increases outreach in vulnerable industries. The Biden administration’s Heat Hazard Alert in July 2023 further emphasized employers’ responsibilities, while the initiation of a federal heat standard through OSHA’s rulemaking process signals a commitment to sweeping, nationwide protections.

These efforts reflect progress, but it is crucial that they evolve to address the unique challenges faced by workers, ensuring that no one is left behind in the implementation of heat safety measures. The true test of these regulations will be their ability to safeguard those most at risk, bridging gaps in protection and creating a more resilient workforce in the face of rising temperatures.

State-Level Protections

At the state level, the picture is mixed. States like California, Washington, and Oregon have implemented their own heat safety regulations, which provide a model for other states and potentially for federal standards. Oregon’s regulations, for instance, require employers to provide drinking water, access to shade, and adequate rest periods during high-heat conditions. These measures are designed not just to respond to the immediate needs of workers but also to educate them on the risks of heat exposure and the importance of self-care in high temperatures. When Oregon implemented stricter heat safety standards, reported heat-related illnesses among agricultural workers fell significantly. By requiring more frequent breaks, adequate hydration, and access to shade, Oregon’s regulations demonstrate how well-designed policies can decrease the incidence of heat stress and related medical emergencies. California has also taken a comprehensive approach with its Heat Illness Prevention Program, which extends protections to both outdoor and indoor workers, reflecting the broad scope of heat hazards. The program is noted for its requirements, including training that educates workers on preventing heat illness, emergency response strategies, and the necessity of acclimatization.

Legislative Challenges & Need for Unified Approach

Conversely, legislative actions in states like Florida and Texas represent a significant challenge to advancements in occupational heat safety. For example, Florida’s HB 433, recently signed into law, expressly prohibits local governments from enacting regulations that would mandate workplace protections against heat exposure. This legislation stalls progress and endangers workers by blocking local standards tailored to the state’s specific needs.

The contradiction between states pushing for more stringent protections and those opposing regulatory measures illustrates a fragmented approach that could undermine worker safety nationwide. Without a federal standard, the protection a worker receives is largely dependent on state policies, which may not adequately address the specific risks associated with heat exposure in increasingly hot climates. This patchwork of regulations underscores the importance of a unified federal standard that could provide consistent and enforceable protections across all states, ensuring that no worker, regardless of geographical location, is left vulnerable to the dangers of heat exposure.

With an understanding of the gaps in current heat safety regulations, the next crucial step is fostering effective stakeholder engagement to drive meaningful changes.

Engaging Stakeholders: Beyond Public Comment

While progress has been made in recognizing the need for heat safety regulations, we must now focus on ensuring equitable representation in the policy-making process. Traditional engagement methods have often fallen short in capturing the voices of those most impacted by these policies, particularly vulnerable groups like migrant agricultural workers. Regulatory agencies must rethink their strategies to include more direct and inclusive approaches, empowering workers to contribute meaningfully to policies that directly affect their safety and well-being.

Challenges in Traditional Engagement

The traditional approaches to stakeholder engagement, particularly in regulatory settings, often rely heavily on formal mechanisms like public comment periods. While these methods are structured to gather feedback, they frequently fall short of engaging those most impacted by the policies—namely, the workers themselves. Many workers, especially in labor-intensive sectors like agriculture, may not have the time, resources, or knowledge to participate in these processes. Relying on online submissions or weekday meetings during work hours can exclude many workers whose insights are crucial for shaping effective regulations. A survey conducted by the Migrant Clinicians Network found that fewer than 10% of migrant workers had participated in any form of public comment or feedback process related to workplace safety.

The complexities of these workers’ lives—ranging from language barriers to fear of retaliation—mean that conventional engagement strategies may not effectively reach or address their concerns. This gap highlights a critical need for regulatory bodies to rethink and expand their engagement strategies to include more direct and inclusive methods.

As we push for broader and more inclusive engagement, we must also consider systemic improvements that can solidify these efforts into lasting safety standards.

Looking Forward: Systemic Improvements & Community Collaboration

Protecting migrant workers from extreme heat requires systemic improvements and a coordinated approach to address gaps in current regulations and foster collaborative efforts among stakeholders. By combining the strengths of government agencies, employers, and community advocates, we can develop robust heat safety solutions that protect the well-being of vulnerable workers while supporting the productivity and resilience of the agricultural industry.

Systemic Changes Needed

To effectively protect migrant workers from the dangers of extreme heat, systemic changes are required. On the regulatory side, this includes boosting the human resources and funding available to agencies like OSHA to ensure they can effectively implement and enforce new heat safety standards. Building robust infrastructure for enforcement and consultation is crucial, as is ensuring these bodies can handle the demands of new regulatory programs. From the employer and industry perspective, federal support is essential. Incentives such as tax breaks or reimbursement programs, similar to those provided under the Families First Coronavirus Response Act during the COVID-19 pandemic, could motivate employers to adhere more strictly to safety standards, knowing they can recoup some costs associated with implementing safety measures like paid sick leave.

Fostering a Safe Reporting Culture

Creating a workplace that encourages safe and open communication is vital. Employers must be encouraged to establish non-retaliatory policies and to offer regular training sessions that educate workers about their rights and the importance of reporting safety violations. Reporting mechanisms should protect employee anonymity to reduce fear of retaliation. These practices can improve safety, while also enhancing worker retention and morale, contributing to a healthier workplace culture.

Role of Community & Grassroots Advocacy

Grassroots organizations and community advocates play a pivotal role in shaping and enforcing heat safety regulations. These groups often have direct insights into the needs and challenges of workers on the ground and can help tailor educational and enforcement strategies to the community context. Collaborations with these organizations can facilitate the delivery of multilingual training and legal assistance, ensuring that workers are well-informed about their rights and the safety measures in place to protect them. Additionally, these partnerships can help to monitor compliance and gather grassroots feedback on the efficacy of the regulatory measures. A notable example is the partnership between California Rural Legal Assistance and local farming communities to develop heat stress prevention training tailored to the languages and cultures of the workers. This program has improved knowledge and awareness of heat stress risks among workers, and has also empowered them to take proactive steps in managing their health during extreme conditions. Evaluations of this initiative show a marked improvement in both the adoption of safety practices and worker satisfaction, highlighting the importance of community-driven approaches in policy implementation.

To support these systemic changes, strategic investments are essential, not only to enhance regulatory capacity but to ensure the long-term health and productivity of the agricultural workforce.

The Power of Investment

Investing in heat safety offers strategic, far-reaching benefits for both workers and employers alike. By funding regulatory frameworks and workplace safety programs, organizations can effectively mitigate the impact of heat-related illnesses and injuries. Such investments can enhance regulatory agencies’ capacity to enforce standards while creating safer, more productive work environments that benefit businesses and employees. An investment approach to heat safety strengthens economic sustainability, worker well-being, and industry compliance.

Envisioning Enhanced Regulatory Capacity

In the pursuit of more effective heat safety regulations, one critical but often overlooked factor is increased investment in regulatory agencies like OSHA. An infusion of resources into these bodies is not merely a bureaucratic expansion but a potential lifesaver. Research consistently demonstrates that increased funding for regulatory enforcement can significantly enhance compliance and improve safety outcomes. This investment empowers agencies to provide greater education and outreach, conduct more inspections, and enforce compliance more effectively, all of which are essential for protecting workers from heat-related hazards. Enhancing the capacity of organizations like OSHA to enforce heat safety standards saves lives while supporting economic efficiency and sustainability in labor-intensive industries. These investments ensure that safety regulations move from paper to practice, tangibly improving the lives of those they are designed to protect.

Economic Benefit

Economic analyses further support the notion that investing in worker safety is not just a cost but a strategic benefit. Studies show that every dollar spent on improving workplace safety yields substantial returns in reducing the costs of workplace injuries and deaths. For instance, implementing stringent heat safety measures not only reduces the incidence of heat-related illnesses but also cuts down on associated costs such as medical expenses, workers’ compensation, and lost workdays. This is particularly relevant in sectors like agriculture, where the physical nature of the work increases vulnerability to heat stress. The economic benefit for employers extends beyond direct cost savings. Maintaining a safe work environment enhances a company’s reputation, aids in employee retention, and increases productivity. Workers are more likely to stay with an employer they trust to prioritize their health and safety, which is crucial in industries facing labor shortages. A culture that encourages reporting and promptly addresses safety concerns can significantly reduce the risk of severe injuries and fatalities, further lowering potential liabilities and insurance costs.

Employer Benefit

A compelling example of the benefits of proactive safety measures is the Gold Star Grower Program in North Carolina. This program recognizes agricultural employers who provide housing that meets or exceeds the requirements of the Migrant Housing Act of North Carolina. This recognition serves as a badge of honor, indicating to potential employees that these employers value worker well-being. Reports suggest that workers actively seek out employers with this certification, preferring to work in environments where their health and safety are a priority. Such a preference can drive more growers to participate in safety programs, fostering a broader culture of safety and compliance within the industry.

Call for Collaborative Action

As the climate crisis continues, so does the threat of heat exposure to agricultural workers, posing grave risks to their health and to the core of our food supply systems. The necessity for comprehensive heat safety measures is now both urgent and undeniable. 

Governments at every level, employers across industries, community groups, and the workers themselves must unite to create resilient, practical strategies that prioritize safety and health. The cost of inaction is stark, exceeding $100 billion annually, and it extends beyond the economy to the irreplaceable loss of life and well-being.

We are at a critical juncture that demands a unified, strong response to heat hazards. By adopting systemic improvements and fostering a culture of collaboration and proactive communication, we have the opportunity to safeguard those most vulnerable to the impacts of climate threats.

As we progress towards implementing rigorous heat safety regulations, our focus must now shift to ensuring these protections reach all workers equitably. Let’s mobilize, from grassroots movements to national policy reforms, to create inclusive implementation strategies that protect our most vulnerable workers, particularly migrants, and secure our collective future.

For resources on how you can support these critical efforts, please refer to the guides provided in Appendix A and B, which offer strategies for advocacy, community engagement, and policy development. Together, our collective efforts can protect our most vulnerable and build a resilient path forward in the face of climate change.


APPENDIX A: RESOURCE GUIDE

Further information and support on heat-related safety and worker rights

Resources for Migrant Workers

Resources for Employers

Resources for Policymakers


APPENDIX B: ACTION GUIDE

Support Legislative Changes

Participate in Advocacy Efforts

Engage in Policy Development

A Guide to Public Deliberation

Science is advancing at an unprecedented speed, and scientists are facing major ethical dilemmas daily. Unfortunately, the general public rarely gets opportunities to share their opinions and thoughts on these ethical challenges, moving us, as a society, towards a future that is not inclusive of most people’s ideas and beliefs. Scientists regularly call for public engagement opportunities to discuss cutting-edge research. In fact, “71% of scientists [associated with the American Association for the Advancement of Science (AAAS)] believe the public has either some or a lot of interest in their specialty area.” Sadly, scientists’ calls often go unnoticed and unanswered, as there continue to be inadequate mechanisms for these engagement opportunities to come to fruition.

To Deliberate or Not to Deliberate

Public deliberation, when performed well, can lead to more transparency, accountability to the public, and the emergence of ideas that would otherwise go unnoticed. Due to the direct involvement of participants from the public, decisions made through such initiatives can also be seen as more legitimate. On a societal level, public deliberation has been shown to encourage pluralism among participants.

Despite the importance of deliberation, it’s important to note that it is not always the best way to engage the public. Planning a public deliberation event — a citizens’ panel, for instance — takes a large amount of time and resources. Plus, incentivizing a random sample of citizens to participate (which is considered the gold standard of deliberation) is difficult. It’s therefore paramount to first assess whether the topic of focus is suitable for public deliberation. 

To assess the appropriateness of a deliberation topic, consider the following criteria (inspired by criteria set forth by Stephanie Solomon and Julia Abelson and the Kettering Foundation):

  1. Does the issue involve conflicting public opinions? Issues that involve setting priorities in healthcare, for example, may benefit from public deliberation as there is no singular correct answer; deliberation may offer a more clear and holistic view of what is best for a community, according to the community.
  2. Is the issue controversial? If so, deliberation can be a good tool as it brings many opinions into view and can foster pluralism as mentioned previously.
  3. Does the issue have no clear-cut solution and is “intractable, ongoing, or systemic”?
  4. Do all available solutions have significant drawbacks?
  5. Does the community at large have an interest in the problem?
  6. Would the discussion of the issue benefit from a combination of expert and real-world experience and knowledge (what Solomon and Abelson call “hybrid” topics)? Certain issues may solely require technical knowledge but many issues would benefit from the views of the public as well.
  7. Are citizens and the government on the same page about the issue? If not, public deliberation can foster trust, but only if the initiative is done with the intention of taking the public’s conclusions into account.

Setting Goals

If it’s deemed that the topic is suitable for public deliberation, the next step is to set goals for the public deliberation initiative. Julia Abelson, Lead of the Public Engagement in Health Policy Project and Professor at McMaster University, has explained that one of the significant differentiating factors between successful and unsuccessful initiatives is thoughtful planning and organization — including setting clear goals and objectives organizers would like to meet by the end of deliberation. Having an end goal not only helps with planning but also allows for a realistic goal to be shared with deliberation participants. Setting unrealistic expectations as to what the deliberation process is meant to achieve — and subsequently not achieving those goals — will lead participants and citizens, in general, to lose trust in the deliberation process (and organizational body).

Is the goal of deliberation to bring new ideas into view and share those with relevant agencies (governmental or otherwise)? Is the goal instead to enact change in current policies? Is the goal to help shape new policies? The Citizens’ Reference Panel on Health Technologies in Ontario, Canada, for example, did not directly impact the government’s decisions, but it made experts aware of a viewpoint they had not previously explored. This contrasts with typical “sit and listen” initiatives, which have less capacity to encourage new ideas to emerge. In another instance, a citizens’ jury in Buckinghamshire, England was formed to discuss how to tackle back pain in the county. The Buckinghamshire Health Authority promised to implement the citizens’ recommendations (as was mandated by a charity supporting this public deliberation effort), and it did.

Expanding on the idea of making promises and accountability, it’s important for the organizing body — which may or may not include a federal agency — to consider its role in implementing the conclusions of the deliberation. Promising to implement the conclusion of the deliberations can serve to invigorate discussion and make participants more engaged, knowing that their discussions can have a direct impact on future decisions. For instance, the British Columbia Biobank Deliberation involved a “commitment at the outset of the deliberation from the leaders of a proposed BC BioLibrary (now funded by the Michael Smith Foundation for Health Research) that the Bio-Library’s policy discussions would consider suggestions from this deliberation.” Researchers have suggested this may have contributed to participants’ interest in the deliberation event. Despite some examples of implementation following deliberation (such as the Buckinghamshire and Ontario examples), there continues to be a lack of adequate change based on the public’s recommendations. One other instance comes from NASA’s 2014 efforts to involve the public in the discussion around planetary defense (in the context of asteroids) through a participatory technology assessment (PTA). It seems that the PTA helped to spur the creation of NASA’s Planetary Defense Coordination Office. 

Furthermore, providing updates on implementation to participants, and the public at large, would provide another crucial aspect of accountability: “explanations and justifications.” However, these updates on their own would not fulfill an organization or agency’s duty to accountability as that requires an active dialogue with the public (which is precisely why implementing the conclusions of public deliberation initiatives is important).  

When to Deliberate: Agenda Setting for Citizens

As mentioned above, deliberation can happen at various points during the policymaking pipeline. It has become increasingly popular to include the public early on in the process, such as in an agenda-setting role. This allows the public not only to engage in discussions about a topic but to also set the priorities and frame how the discussions will move forward. As Naomi Scheinerman writes, “with proper agenda setting and precedent creation, the resulting […] questions would be more reflective of what the public is interested in discussing rather than of the companies, industries, and other stakeholder groups.”

A trailblazing model in citizen agenda-setting has been the Ostbelgien Model. The model involves both a permanent Citizens’ Council and ad hoc Citizens’ Panels. Though the members of the Citizens’ Council rotate (and are chosen randomly), one of the permanent roles of the Council is to select topics for the ad hoc Citizens’ Panels, with citizens having a direct hand in what issues their fellow citizens and government should tackle. Since its inception in 2019, the Citizens’ Council has asked Citizens’ Panels to tackle issues such as “how to improve the working conditions of healthcare workers” and “inclusive education.” 

Framing

One of the pillars of the success of public deliberation is a well-scoped question that is framed appropriately. Issues that are framed unfairly, meaning they place emphasis on a specific part of the issue while ignoring others, can lead to inaccurate results and a loss of trust between the public and the organizers. Though this depends on the goals of the deliberation, it’s often best for questions to be specific in their scope to allow for concrete results at the end of the deliberation initiative. For example, an online deliberation session in New York City aimed to assess the public’s views on who should be given priority access to COVID-19 vaccines. One of the questions asked participants to rank the order in which they think a pre-specified list of essential workers should get access to the vaccine. This allows for discussion while retaining a clear focus.

Another example comes from climate change. Climate change can be framed in many ways — through an economic frame, a public health frame, a justice frame, and others. These various framings impact how the public reacts to the issue; in the case of the economic frame, it has led to “political divisiveness.” Focusing instead on the public health frame, for instance, led to greater agreement on policy decisions. Similarly, according to a 2023 policy paper from the Organisation for Economic Co-operation and Development (OECD), an issue like COVID-19 can be less polarizing if the framing used is about solutions to the pandemic rather than solely vaccines. Importantly, the organizers of the public deliberation initiative do not have sole control over the framing of the issue. Citizens often have a pre-existing “frame of thought.” This makes frames tricky yet essential in making it possible to appropriately and productively deliberate a topic.

Framing is implicit in that participants in deliberation are not aware of it, making it all the more crucial to be wary of the framing. Thus, it becomes clear how seemingly unimportant factors, such as setting, also affect deliberation. According to Mauro Barisione, the framing of the setting includes:

Selecting a Type of Public Deliberation

Another factor that merits attention at this point is the type of public deliberation being undertaken. Though public deliberation has been referred to as one entity thus far, there are many different types, including, but not limited to, citizens’ juries, planning cells, consensus conferences, citizens’ assemblies, and deliberative polls. Below are some further details about various types of public deliberation (where a source is not included below, it was adapted from Smith & Setälä).

Citizens’ juries


Planning cells


Consensus conferences/citizens’ conferences


Citizens’ assemblies


Deliberative polls


A note on online deliberation

The COVID-19 pandemic forced many initiatives to shift to a fully online modality. This highlighted many of the opportunities as well as challenges that online deliberation presents. One consideration is accessibility, a double-edged sword when it comes to deliberation. Virtual deliberation alleviates the need for a venue or hotel accommodations — decreasing costs for organizers — and may allow participants to continue to go to work at the same time. However, difficulties with using technology and a lack of access to a device or an internet connection are drawbacks. Another opportunity presented by virtual deliberation is to provide more balanced viewpoints on the topic of deliberation. For instance, there are no geographical barriers as to the experts organizers can invite to speak at an event. 

A concern somewhat unique to online deliberation is data privacy and security. While this can also be an issue with in-person initiatives, many tools that participants are familiar with and may prefer to use do not have robust security.


A note on cost

While the cost of many deliberation initiatives is not publicly available, the available estimates range from $20,000 (citizens’ jury) to $95,000 (consensus conference) to $2.6 million (Europe-wide deliberative poll of 4,300 people) to $5.5 million (citizens’ assembly). Note that these costs come from a range of time points and locations (though they have been adjusted for inflation) and only serve as rough estimates. A major contributor to these costs, particularly for longer deliberative initiatives, is hotel or venue costs as well as the reimbursement of participants. This reimbursement is costly but a part of the founding philosophy of many types of deliberation, including that of planning cells.


Selecting Participants

Many different approaches can be taken to selecting participants for deliberative forums. Unfortunately, there are inherent trade-offs in selecting a sampling method or approach. For instance, random sampling is more in line with the principle of “equal opportunity” and may promote “cognitive diversity”— the diversity of ideas, experiences, and approaches participants bring to the event — but is prone to creating deliberation groups that are not representative of the population at large. This is particularly true when the deliberative forum has few participants. This is why, depending on the type of deliberation event (and therefore number of participants chosen), a different type of sampling may be appropriate. 

Another approach is random-stratified sampling, where participants are randomly chosen and invited to participate in the deliberative event. There is often an unequal distribution among those who accept the invitation — for instance, individuals with higher socio-economic statuses may respond disproportionately more. In this case, a more representative sample may be chosen from those who responded. Quotas may also be set, such as ensuring that a certain number of female-identifying participants are included in a deliberative event. For this method, the organizers must decide on groups of individuals who are primarily affected by the topic being discussed, as well as groups often excluded from such deliberations. A deliberative forum on immigration, for instance, may call for the presence of a participant who is an immigrant to ensure polarization does not take place. In certain instances, purposive sampling — where individuals from groups whose views are specifically being sought are purposefully chosen — may also be appropriate. Furthermore, some researchers suggest including a “critical mass” of individuals from typically underserved groups. This can serve to make participants more comfortable in speaking up, ensure that the diversity of discussions is retained when participants are broken up into smaller groups (in certain forms of public deliberation), and provide a step in avoiding tokenism.

Furthermore, there are newer methods of selecting participants that combine both random and stratified sampling — namely, algorithms that try to maximize both representation and equal opportunity of participation. One instance is the LEXIMIN algorithm, which “choose[s] representative panels while selecting individuals with probabilities as close to equal as mathematically possible.” The algorithm is open-access and can be used at panelot.org.
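
For readers unfamiliar with how quota-based selection works in practice, the sketch below shows a deliberately simplified greedy selector that fills demographic quotas from a pool of people who accepted the invitation. It is not the LEXIMIN algorithm, which additionally equalizes each person’s selection probability; the pool, attributes, and quotas here are invented purely for illustration.

```python
import random

# Hypothetical respondent pool; each record carries the attributes used for quotas.
pool = [
    {"id": 1, "gender": "female", "age_group": "18-34"},
    {"id": 2, "gender": "male", "age_group": "35-64"},
    {"id": 3, "gender": "female", "age_group": "65+"},
    # ... more respondents ...
]

# Illustrative quota targets for the panel, per attribute value.
quotas = {
    ("gender", "female"): 2,
    ("gender", "male"): 1,
    ("age_group", "18-34"): 1,
    ("age_group", "35-64"): 1,
    ("age_group", "65+"): 1,
}
panel_size = 3

def greedy_quota_panel(pool, quotas, panel_size, seed=0):
    """Pick respondents who fill the most unmet quota slots, breaking ties randomly."""
    rng = random.Random(seed)
    remaining = dict(quotas)
    candidates = pool[:]
    rng.shuffle(candidates)  # random order treats equivalent respondents equally
    panel = []

    def unmet_slots_filled(person):
        # Number of this person's attributes that still have an unmet quota.
        return sum(
            remaining.get((attr, val), 0) > 0
            for attr, val in person.items()
            if attr != "id"
        )

    for _ in range(min(panel_size, len(candidates))):
        best = max(candidates, key=unmet_slots_filled)
        candidates.remove(best)
        panel.append(best)
        for attr, val in best.items():
            if attr != "id" and remaining.get((attr, val), 0) > 0:
                remaining[(attr, val)] -= 1
    return panel

print(greedy_quota_panel(pool, quotas, panel_size))
```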

Aside from considerations for selecting participants, it’s important to consider the selected individuals’ ability and willingness to participate. Several factors can dissuade selected individuals from taking part, including but not limited to, the cost of missing work, the cost of childcare, transportation costs, and lack of trust in the organizing body or agency. Prohibitive costs are addressed by several of the deliberation models discussed in the “Selecting a Type of Public Deliberation” section. These models strongly suggest stipends which, at minimum, cover incidental expenses. A lack of trust is a particularly important issue to address as it can hinder the organizer’s ability to reach individuals typically left out of policymaking discussions. One approach to addressing this once again brings us to making — and critically, keeping — promises regarding the implementation of the conclusions of participants. Framing (as discussed in an earlier section) can also contribute to building trust, though, importantly, this is not a gap that can be bridged overnight. A more extensive discussion on inclusion in public deliberation forums can be found here.

Bringing On Experts & Creating Materials

Prior to selecting the group who will participate in the public deliberation activity, steps need to be taken to organize which experts will be part of the event and create the informational material that will be provided to participants before deliberations begin. 

Here, efforts must be made to ensure sufficient and balanced information is presented without creating a framing event where participants enter discussions with a biased perspective. It has been found that participants readily integrate the facts and opinions presented by experts/witnesses prior to deliberation and critically engage with their points. A deliberative engagement initiative in British Columbia, Canada about biobanking brought on a variety of experts and stakeholders to present to participants. To ensure fairness, presenters were “given specific topics, limited presentation times, and asked to use terms as defined in the information booklet” that was previously provided. A unique component included in this initiative was the ability for participants to ask presenters questions in between the two deliberative session weekends, which were two weeks apart, through a website. 

In addition, participants were provided with booklets and readings. In the case of the British Columbia initiative, a literature review was performed to create the booklets and background materials. Once more, the materials should provide a balance of opinions. They should include the most important facts relevant to the question at hand, some of the most common/salient approaches and points with regards to the question, and the weaknesses of each approach/point (Mauro Barisione). It is also best to keep materials succinct, with some deliberative initiatives keeping their materials to one page.

Though the traditional approach is to have experts present prior to deliberation, other methods have also been used. For instance, a Colorado deliberation initiative focused on future water supply used an “on tap but not on top” expert approach. Rather than call experts to present information, they instead provided one-page information sheets, followed directly by deliberation. Experts were present during the deliberation session. When prompted by a participant, a facilitator would ask an expert to briefly join the group to answer the participant’s question. The approach was largely successful, though one “rogue expert” frequently interjected in a group’s discussion, providing his own opinions. One limiting factor to this approach is time; the deliberative sessions mentioned above were two hours long. But many other forms of deliberation are significantly longer, making coordinating with experts for long durations of time difficult. Despite these challenges, this approach provides an interesting way of integrating experts into the deliberation process so their expertise is best used and the participants’ questions are best answered as they arise.

Facilitation

A good facilitator or moderator is critical to the deliberation process. As explained by Kara N. Dillard, moderators set the ground rules for the discussion and prevent any one participant from dominating the session; this is called presentation. It has been found that clearly setting expectations for the discussion can lead to greater deliberative functioning — which, for our purposes, includes the exchange of ideas/reasons, equality, and freedom to speak and be heard — according to participants. Moderators also guide the discussion in two main ways: asking questions that challenge what participants have already discussed (elicitation); and connecting ideas that were previously brought up to new topics and “play[ing] devil’s advocate” to bring forth new ideas (interpretation). At the end of the session, moderators also help participants produce conclusions by asking what areas of consensus and contention were present throughout the discussion.

Moderators can take multiple approaches to facilitating, with one framework proposed by Kara N. Dillard separating moderators into three groups: passive, moderate, and involved. Passive moderators take a “backseat” approach to moderating. They often describe their role to participants as only being there to prevent a participant from dominating the conversation, rather than actively leading it. This has led to unfocused discussions and unclear conclusions. Participants often jumped around and went off-topic. Though this passive approach may work in some instances, a moderate or involved approach often leads to better deliberation.

Involved facilitators actively lead the discussion by asking questions that challenge participants to think in new ways, sometimes acting as a “quasi-participant.” In line with this, these moderators often play devil’s advocate to move the discussion in new, albeit related, directions. These moderators ask follow-up questions and “editorialize” to help participants flesh out their ideas together and aim to pinpoint points of contention so participants can further discuss them. If participants begin to veer off-topic, involved moderators will move the group back into a more focused direction while also connecting this new topic to the main question, allowing for new thoughts to emerge. These moderators take the time to sum up the main points brought up by participants after each point so conclusions become clear. Once more, this approach may not work in all instances but often leads to deeper conversations and more focused conclusions.

As implied by the name, moderate facilitators are somewhere in between passive and involved facilitators. These moderators ask questions to guide the discussion, but don’t often challenge the participants and let them take the wheel. These moderators use the elicitation strategy frequently, an important difference between moderate and passive moderators.

Due to the skills needed to facilitate a deliberation event well, organizers or government agencies looking to organize these events may require would-be facilitators to undergo brief training.

What Comes Next

After deliberation has taken place, the next step is to write a report summarizing the conclusions of the deliberative forum. As we have seen several times with other topics, there are multiple approaches to this. One approach is to leave the report writing to the facilitators, organizers, or researchers who use their own takeaways from the deliberation (in the case of facilitators) or summarize based on recordings or transcripts (in the case of organizers or researchers). However, this method introduces bias into the process and doesn’t allow participants to be directly involved in creating conclusions or next steps.

An alternative is to allocate time towards coming up with conclusions together with participants both throughout and at the end of the deliberative session. Recall that involved facilitators frequently summarize the conclusions of the group throughout the deliberation, making this final task both more efficient and more participant-led. Participants can directly and immediately add on to or push back against the facilitator’s summary. As a guideline, Public Agenda, an organization conducting public engagement research, divides the summary into the following sections: areas of agreement, areas of disagreement, questions requiring further research, and high-priority action steps.

ALI Task Force Findings to Improve Education R&D

The Alliance for Learning Innovation (ALI) coalition, which includes the Federation of American Scientists, EdCounsel, and InnovateEdu, today celebrates the release of three task force briefs aimed at enhancing education research and development (“ed R&D”). With pressing issues such as declining literacy and math scores, chronic absenteeism, and the rise of technologies like AI, a strong ed R&D infrastructure is vital. In 2023, ALI convened three task forces to recommend ways to bolster ed R&D. The task forces focused on state and local ed R&D infrastructure, inclusive ed R&D, and the critical role of Historically Black Colleges and Universities (HBCUs), Minority-Serving Institutions (MSIs), and Tribal Colleges and Universities (TCUs) in this ecosystem.

State and Local Education R&D Infrastructure


Supporting R&D at the local level encourages an environment of continuous learning, accelerating improvements to educational methods based on new evidence and pioneering research. Therefore, given that over 90% of K-12 education funding comes from state and local sources, the ALI task force recommends that capacity-building, vision alignment, and investment in state and local education agencies (SEAs and LEAs) be prioritized. Preparing these entities to leverage R&D resources within their specific locales, in both rural and urban contexts, will enable the infrastructure to meet the unique needs of communities and students across the country. Additionally, supporting human capacity and development, modernizing data systems, and strengthening collaborative partnerships and fellowships across research institutions and key stakeholders in the ecosystem will set the stage for more context-specific and effective ed R&D infrastructure at the state and local levels.

Inclusive Education R&D


Traditional education R&D is often dominated by privileged institutions and individuals with outsized access to capital and opportunities, sidelining the needs and perspectives of historically marginalized communities. To address this imbalance, intentional efforts are needed to create a more inclusive R&D ecosystem. The task force recommends that government actors implement multidimensional measures of progress and simplify application processes for R&D funding. Continuing dialogue on equity and inclusion will create space for identifying possible biases in approaches and processes. In sum, inclusion is imperative to achieving greater equity in education and supporting all learners of diverse backgrounds and communities.

The Role of HBCUs, MSIs, & TCUs in Education R&D


Achieving collaborative infrastructure and inclusion in ed R&D requires the strong participation of Historically Black Colleges and Universities (HBCUs), Minority-Serving Institutions (MSIs), and Tribal Colleges and Universities (TCUs). An equitable education R&D ecosystem must focus on the representation of these institutions and diverse student populations in research topics, grants, and funding to support learners from all backgrounds, particularly those of disadvantaged circumstances. Actionable steps include establishing diverse peer review panels, incentivizing grant proposals from minority-serving institutions, and creating specialized scholar programs. Additionally, programs should explicitly outline resource accessibility, leadership dynamics, funder relationships, grant processes, and inclusive language to dismantle structural inequalities and make the invisible visible.

Conclusion

Recommendations from the ALI task forces propose that sufficient funding, inclusivity, and diverse representation of higher education institutions are strong first steps in a path toward a more equitable and effective education system. The education R&D ecosystem must be a learning-oriented network committed to the principles of innovation that the system itself strives to promote across best practices in education and learning.

K-12 STEM Education For the Future Workforce: A Wish List for the Next Five Year Plan

This report was prepared in partnership with the Alliance for Learning Innovation (ALI) to advocate for building a better research and development (R&D) infrastructure in education. The Federation of American Scientists believes that the evolution of STEM education is necessary to prepare today’s students for tomorrow’s in-demand scientific and technological careers, and that it is a national security pursuit as well.

American STEM Education in Context

“This country is in the midst of a STEM and data literacy crisis,” opined Elena Gerstmann and Laura Albert in a recent piece for The Hill. Their sentiment represents a widely held concern that America’s global leadership in scientific and technological innovation, anchored in educational excellence, is being relinquished, thereby jeopardizing our economy and national security. Their message echoes a 65-year-old warning to U.S. policymakers, educators, and employers, issued when the USSR seemingly eclipsed our innovation pace with the launch of Sputnik.

Life magazine devoted its March 1958 edition to a scathing comparison of the playful approach to STEM education in U.S. schools versus the no-nonsense rigor of Russian classrooms. The issue’s theme, “Crisis in Education,” was summed up soberly: “The outcomes of the arms race will depend eventually on our schools and those of the Russians.” America answered the bell and came out swinging. Under President Eisenhower, the National Aeronautics and Space Administration (NASA) and the Defense Advanced Research Projects Agency (DARPA) were both established in 1958, as was the National Defense Education Act, which channeled billions of dollars into K-12 and collegiate STEM education. By innumerable metrics (the Apollo program, the internet, GPS, and manufacturing dominance, all fueled by an internationally envied higher education system), the United States reclaimed preeminence in STEM innovation.

LIFE March 24, 1958

Over the next four decades, tectonic shifts in demographics, economics, and politics rearranged global competition such that complacent U.S. education systems were once again called on the carpet. In 2001, shortly before terrorists struck the World Trade Center and Pentagon, a federal commission report on homeland vulnerability echoed Life magazine’s warning of decades prior: “The inadequacies of our systems of research and education pose a greater threat to U.S. national security over the next quarter century than any potential conventional war that we might imagine.” The painfully prescient study, the product of the Hart-Rudman Commission on National Security/21st Century, identified the advancement of information technology, bioscience, energy production, and space science, all overlain by economic and geopolitical destabilization, as the nation’s greatest challenge and our new Sputnik. The Commission called on reformed education systems to quadruple the number of scientists and engineers and to dramatically increase the number and skills of science and mathematics teachers. As in 1958, leaders responded boldly, creating the Department of Homeland Security in 2002 and planting the seeds for the 2007 America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science (COMPETES) Act.

Funding for research and development across federal agencies significantly increased over the decade, including a budget boost for the National Science Foundation’s grant programs supporting emergent scholars (Faculty Early Career Development Program, or CAREER), the research capacities of targeted jurisdictions (Established Program to Stimulate Competitive Research, or EPSCoR), Graduate Research Fellowships (GRF), the Robert Noyce Teacher Scholarships, the Advanced Technological Education (ATE) program, and others designed to bolster diverse talent pipelines to STEM careers. Despite increases in the number of students studying science and engineering in the U.S., there is still a significant gap in diverse representation and equitable access to opportunities in STEM fields; ensuring greater inclusion and diversity in the American science and engineering landscape is essential to engaging the “missing millions,” or persistently underrepresented minority groups and women, in the nation’s STEM workforce and education programs.

Nearly a quarter century later, America is once again in a STEM talent crisis. The solutions of Hart-Rudman and of the Eisenhower era need an update. This latest Sputnik moment, unlike the space race that motivated the National Defense Education Act and the terrorism that spawned the Department of Homeland Security, is more pervasive and profound, permeating every aspect of our lives: artificial intelligence and machine learning, CRISPR (clustered regularly interspaced short palindromic repeats), quantum computing, 6G and 7G communications, semiconductors, hydrogen and other energy sources, lithium and other ionic energy storage, robotics, big data, blockchain, biopharmaceuticals, and other emergent technologies.

To relinquish the lead in these arenas would put the U.S. economy, national security, and social fabric in the hands of other nations. Our new USSR is a roulette wheel of friends and foes vying for STEM supremacy, including Singapore, Japan, China, Germany, the UK, Taiwan, Saudi Arabia, India, South Korea, and many more. Not unlike the education crises that came to a head in 1958 and in 2001, our educational Achilles heel is that the majority of diverse young Americans lack exposure to, and preparation for, STEM career pursuits. Further, the U.S. Bureau of Labor Statistics projects that STEM career opportunities will grow 10.8% by 2032, more than four times faster than non-STEM occupations.

What the United States has going for it in 2024 (and was comparatively lacking in the 1950s and the early 2000s) are STEM-rich local schools, communities, and states. Powered by investments from federal agencies (e.g., Smithsonian, NSF, NASA, DOL, ED, and others), state governments (governors in Massachusetts, Iowa, and Alabama, for example), nonprofits (Project Lead The Way and the Teaching Institute for Excellence in STEM, for example), and industry (Regeneron, Collins Aerospace, John Deere, Google, etc.), this landscape has made STEM an imperative in the eyes of most Americans.

Today’s STEM education landscape presents significant opportunities and challenges. Existing models of excellence demonstrate readiness to scale. To focus on what works and to channel resources in the direction of broader impacts for maximal benefit is to answer the call of our omnipresent 2024 Sputnik.

The Current State: Future STEM Workforce Cultivation

At its root, STEM education is about workforce cultivation for high-demand and high-skill occupations of fundamental importance to American economic vitality and national security. In the ideal state, STEM education also prepares all learners to be critical thinkers who make evidence-based decisions by equipping them with analytical, computational, and scientific ways of knowing. STEM students should learn effective collaboration and problem-solving skills with an interdisciplinary approach, and feel prepared to apply STEM skills and knowledge to everyday life as voters, consumers, parents, and citizens.1

Target Audiences and Service Providers 

The early childhood education community (pre-K-grade 3), both in school and out-of-school (at informal learning centers), has emerged over the last decade as a prime target for boosting STEM education as research findings accumulate around the importance of early exposure to and comfort with STEM concepts and processes. Popular providers of kits and activities, curricula, software platforms, and professional development for educators include Hand2Mind (Numberblocks), Robo Wunderkind, StoryTimeSTEM (Dragonland), NewBoCo (Tiny Techies), BirdBrain Tech (Finch robot), FIRST Lego League (Discover), Museum of Science Boston (Wee Engineer), Iowa Regents’ Center for Early Developmental Education (Light & Shadow), and Mind Research (Spatial-Temporal Math).  

The elementary-to-middle-school level enjoys the richest menu of STEM programming on the market, both in and out of school, reflecting greater curricular freedom to integrate content than high schools have. Popular STEM programs include Blackbird Code, Derivita Math, FUSE Studio, Positive Physics, Micro:bit, Nepris (now Pathful), Project Lead The Way (Launch and Gateway), FIRST Tech Challenge, Code.org (CS Discoveries), Bootstrap Data Science, and many more.

The secondary education STEM landscape differs from pre-K-8 in a significant way: although discrete STEM activities and programs are plentiful for integration into secondary science, mathematics, and other classes, the adoption of packaged courses or after-school enrichment opportunities is more common. Project Lead The Way and Code.org offer an array of stand-alone elective STEM courses2, as do local community colleges and universities. Nonprofits and industry sources offer STEM enrichment programs such as the Society of Women Engineers’ SWEnext Leadership Academy, Google’s CodeNext, the Society of Hispanic Professional Engineers’ Virtual STEM Labs, and Girls Who Code’s Summer Immersion. Finally, a number of federal, state, nonprofit and business organizations conduct future workforce programs for targeted students including the federal TRIO program, Advancement Via Individual Determination (AVID), Jobs for America’s Graduates (JAG), and Jobs For the Future (JFF). 

Investment in STEM Education

A modestly conservative estimate of the total American investment in STEM education annually is $12 billion, nearly the equivalent of the entire budget of the National Science Foundation or the Environmental Protection Agency. 

For fiscal year 2023, the White House budgeted $4.0424 billion for STEM education across the 16 agencies that make up the Subcommittee on Federal Coordination in STEM Education (FC-STEM). Total nonprofit and philanthropic investments are more elusive: there are many funders, the origins of their dollars often overlap with state or local government (grants, for example), and definitions of STEM investment vary wildly. That said, U.S. charitable giving to the education sector totaled $64 billion in 2019; a reasonable assumption that two percent made its way to STEM education equates to over $1 billion contributed to the overall funding pie. Business and industry in the United States contribute well over $5 billion annually, a conservative estimate of the U.S. share of the total annual STEM education market across ten nations, according to a recent study. K-12 schools spend well over $1 billion on STEM, a modest fraction of the $870 billion total spent on K-12 across the U.S.; the same is likely true of America’s annual $700 billion higher education expenditure, adding at least another $1 billion to STEM. Elusive as definitive figures can be in this space, a glaring reality is that funds are streaming into STEM education at a level where measurable results should be expected. Are resources being distributed for maximal impact? Are measures capturing that impact? Is it enough money?

There are approximately 55.4 million K-12 students across the nation. At $12 billion per year on STEM, that comes to about $217 worth of STEM education annually per young American. Is that enough to move the needle? The answer is a qualified “yes,” based on Iowa’s experience. The state launched a legislatively funded STEM education program in 2012, investing on average about $4.2 million annually to provide enrichment opportunities for about one-fifth of all K-12 students, or 100,000 per year. To date, about 1.2 million youth have been served through a total investment of about $50 million. That calculates to roughly $42 per student served. The result? Among participants: increased standardized test scores in math and science; increased interest in STEM study and careers; and a near doubling of post-secondary STEM majors at community colleges and universities. Thus, from Iowa’s experience, the amount of funding flowing toward American STEM education is adequate to expect systemic gains. The qualifier is that Iowa funds flow toward increased equity (the most needy are the top priority), school-work alignment (career-linked curriculum, professional development), and proof of effectiveness (rigorously vetted and carefully monitored programs). Variance in these three factors can separate ambitions from realities.
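
For readers who want to check the arithmetic, the short Python sketch below reproduces the back-of-envelope math in this section using only the rounded figures cited above; the two percent philanthropic share and the $1 billion K-12 and higher-education allocations are the assumptions already stated in the text, not independent data.

    # Back-of-envelope check of the figures cited above (all inputs are this
    # report's rounded estimates and stated assumptions, not official totals).
    federal_fcstem = 4.04e9        # FY2023 federal STEM education budget (16 FC-STEM agencies)
    philanthropy   = 0.02 * 64e9   # assumed 2% of 2019 U.S. charitable giving to education
    industry       = 5e9           # conservative estimate of annual business/industry spending
    k12_schools    = 1e9           # modest fraction of ~$870B total K-12 spending
    higher_ed      = 1e9           # modest fraction of ~$700B higher education spending

    total = federal_fcstem + philanthropy + industry + k12_schools + higher_ed
    print(f"Estimated annual U.S. STEM education investment: ${total / 1e9:.1f} billion")  # ~$12 billion

    # Per-student comparison: national estimate vs. Iowa's state program
    k12_students = 55.4e6
    print(f"National: about ${12e9 / k12_students:.0f} per K-12 student per year")         # ~$217

    iowa_invested = 50e6           # cumulative Iowa investment since 2012
    iowa_served   = 1.2e6          # cumulative youth served
    print(f"Iowa: about ${iowa_invested / iowa_served:.0f} per participant served")        # ~$42

The total lands near the $12 billion estimate, and the per-student figures match those used in the comparison above.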

Ambitions vs. Realities

The federal STEM education strategic plan, Charting a Course for Success: America’s Strategy for STEM Education, identified three consensus goals for U.S. STEM education: a strong STEM literacy foundation for all Americans; increased diversity, equity, and inclusion in STEM study and work; and preparation of the STEM workforce of the future. Three challenges lie between those goals and reality.

Elusive equity. The provision of quality STEM education opportunities to Americans most in need is universally embraced yet difficult to achieve at the program level. Unequal funding of school STEM programs across urban, rural, and suburban public and private school districts equates to less experienced educators and diminished material resources (laboratories, computers, transportation to enrichment experiences) in socioeconomically disadvantaged communities. The challenge is then compounded by the lack of role models to inspire and support youth of underserved subpopulations by race, ability, ethnicity, gender, and geography. Bias, whether implicit or explicit, fuels stereotype threat and identity doubt for too many individuals in schools, colleges, and workplaces, countering diversity and equity efforts.

School-work misalignment. For most learners, the school experience can seem quite different from the higher education that follows, and from the work and life experiences beyond. Employer and learner polls unearth misalignment in priorities: employers value in new hires skills such as relationship building, dealing with complexity and ambiguity, balancing opposing views, collaboration, co-creativity, and cultural sensitivity, in addition to expecting work-related experiences. Schools typically proclaim missions like “Educating each student to be a lifelong learner and a caring, responsible citizen,” omitting the importance of employability. Learners feel that school taught them time management, academic knowledge, and analytical skills, while experiential learning remains limited.

Elusive proof. Evidence of effect can be vexingly elusive. The 2022 progress report of the federal STEM plan clarified the difficulty in verifying reach to those most in need: the identification of participants in STEM programs can be restricted for privacy and legal reasons. The gathering of racial, ethnic, and demographic data on STEM participants may often be unreliable given self-reported or observational identifications, as well as the fleeting, often anonymous encounters typical of “STEM Night” or informal experiences at science centers, zoos, and museums.

Participant profiles aside, variability in program assessments – design and objectives – makes meaningful meta-analysis challenging, which creates difficulties in scaling promising STEM programs. “We recommend that states and programs prioritize research and evaluation using a common framework, common language, and common tools,” advised a group of evaluators recently.

Exemplars 

Plentiful success stories exist at the local, regional, and national levels. The following six exemplars are each funded in whole or in part by federal and/or state grants. The first pair of examples are local education systems (one in-school, one out-of-school) masterfully aligning learning experiences to career preparation. The second pair profiles a regional out-of-school STEM program powerfully documenting effects on participants and an in-school enrichment course demonstrating success. The final pair comprises a nationwide equity program successfully preparing STEM educators to serve diverse students effectively and a consortium effort aimed at refocusing the entire educational enterprise on the skills that matter most.

1.a. School-work alignment at the local level

The Barrow Community School District (BCSD) in Georgia is strongly committed to work-based learning (WBL). All 15,000 students are required to take a sequence of exploratory STEM career classes beginning in ninth grade. Fifteen career pathways are available, ranging from computing to health, manufacturing to engineering. It all culminates in an optional senior-year internship serving 400 students annually. Interns earn dual-enrollment credits in partnership with local colleges and are paid by the employer host. Interns spend 7.5 to 15 hours per week at work experiences in a hospital, on a construction site, or in a production plant. The district employs a full-time WBL coordinator to oversee, administer, and evaluate the program, as well as to cultivate community employer partners. Teachers are expected to spend one week in an industry externship every three to five years. The BCSD commitment to a school experience aligned to future careers is something that every student in any district ought to be able to experience.

1.b. Diverse workforce of the future – local-to-global level

The World Smarts STEM Challenge is a community-based, after-school, real-world problem-solving experience for student workforce development. Funded by a 2021 National Science Foundation ITEST (Innovative Technology Experiences for Students and Teachers) grant in partnership with North Carolina State University, the program assigns students in the Washington, D.C. area to bi-national groups (arranged through a partnership with the International Research and Exchanges Board) to collaborate in solving local/global STEM issues via virtual communications. Groups are mentored by industry professionals. In the process, students develop skills in innovation, investigation, problem-solving, and global citizenship for careers in STEM. Participant diversity is a primary objective. Learners of underrepresented backgrounds, including Black, Hispanic, economically disadvantaged, and female students, are actively recruited from local schools. Educator-facilitators are offered professional development opportunities to build mentorship skills that support students. The end product is a World Smarts STEM Activation Kit for implementing the model elsewhere.

2.a. Proof of effect at the regional level out-of-school

NE STEM 4U is an after-school program serving elementary school youth in the Omaha, Nebraska region. Programs are hands-on, problem-based challenges relevant to children. The staff were interested in the effect of their activities on the excitement, curiosity, and STEM concept gains of participants. The instrument they chose is the Dimensions of Success (DoS) observational tool of the P.E.A.R. Institute (Program in Education, Afterschool & Resiliency). The DoS is conducted by a certified administrator who observes and rates four groups of criteria: the learning environment itself, level of engagement in the activity, STEM knowledge and skills gained, and relevancy. Through multiple cohorts over two years, the DoS findings validated the learning approach at NE STEM 4U across dimensions, though with natural variations in positive effect. The upshot is not only that this after-school model is readily replicable, but that the DoS observation tool is a thoroughly vetted, powerful, and readily available instrument that could become a “common tool” in the STEM education program evaluation community.

2.b. Proof of effect at the regional school level

From a modest New York origin in 1997, Project Lead The Way (PLTW) has blossomed into a nationwide tour de force in STEM education, funded by the Kern Foundation, Chevron, and other philanthropies. Adopted at the community school level, trained educators integrate units at the pre-K-5 and middle school levels (Launch and Gateway, respectively) or offer courses at the secondary level (Algebra, Computer Science, Engineering, Biomedical); all share a common focus on developing in-demand, transportable skills like problem solving, critical and creative thinking, collaboration, and communication. Career connections are a mainstay. To that end, PLTW is notable for expecting schools to form advisory boards of local employers for feedback and connections. Attitudinal surveys attest to increased student interest in STEM careers.

3.a. Equity at the national level – diversity and inclusion

The National Alliance for Partnerships in Equity (NAPE) offers a wide array of professional development programs related to STEM equity. One module is called Micromessaging to Reach and Teach Every Student. Educators in and out of school convey micro-messages to students at every encounter. Micro-messages are subtle and typically unconscious. Sometimes they are helpful – a smile or eye contact. Sometimes they can be harmful towards individuals or reveal bias towards a group to which a student may belong – a furrowed brow or a stereotypical comment. Exceedingly rare is micro-message expertise in the teacher preparatory pipeline or in standard professional development. Yet micro-messaging is tremendously influential in the self-perceptions of learners as welcome in STEM. 

3.b. Equity at the national level – leveling the playing field

Durable skills – e.g., teamwork, collaboration, negotiation, empathy, critical thinking, initiative, risk-taking, creativity, adaptability, leadership, and problem-solving – define jobs of the future. AI and automation cannot replace durable skills. The nonprofit America Succeeds has championed a list of 100 durable skills grouped into 10 competencies, based on industry input. It studied state standards for college and career readiness against those competencies and prescribes remedies to states whose standards fall short (most U.S. states). Durable Skills, packaged by America Succeeds, is an equity service par excellence – every learner can command these 100 enduring skills, setting them up for success.

Black and white photo of early 20th century science class

The Case for Increased Investment in STEM Education R&D at the Federal and State Level

Billions of dollars pour into American STEM education each year. Millions of learners and employers benefit from the investment. Outstanding programs produce undeniably successful results for individuals and organizations. And yet, “This country is in the midst of a STEM and data literacy crisis.” How can that be? Here are some of the factors in play.

Recent STEM Education/Workforce Investment Trends

The biennial Science and Engineering Indicators compiled by the National Science Board (NSB) were released in March 2024. Noteworthy findings (necessarily a couple of years old given the retrospective analysis) include:

The federal government funds 52% of all academic research and development taking place at colleges and universities (2021).

Contrasting the findings of the NSB against current federal budgets, the FY2024 appropriation for STEM education research and development is a work in progress. In comparison to FY23, the budget presented to Congress by the executive branch called for increases in STEM spending across many agencies, but not all, while the U.S. House and Senate generally propose reductions in spending. The Defense Department’s STEM education line, for example, the National Defense Education Program, is slated for significant reduction (-7.3 percent to -20 percent). The Department of Energy’s Office of Science, which funds STEM education, is slated for a slight increase (+1.7 percent). The same is true for the NSF’s STEM education programs (+1.6 percent). NASA’s Office of STEM Engagement is on track for a slight decrease (-0.3 percent). The Department of Agriculture’s Research and Education budget is down slightly (-1.7 percent). The U.S. Geological Survey’s Science Support budget, which includes human capital development, is down slightly (-1.2 percent). The Department of Education’s Institute of Education Sciences was slated for a significant increase by the executive branch, though slated for reduction in both the House and Senate budgets. The Department of Homeland Security’s Science and Technology budget, which includes funding for university-based centers and minority institution programs, is set for reduction (-1.3 percent to -19 percent).

Significant STEM education and workforce development support resides within the CHIPS and Science Act of 2022, which has yet to be fully funded by Congress. An overall trend of shifting R&D support, including education, from the federal government to the private sector means greater reliance on business and industry to invest in STEM program development. The NSB Indicators report highlights this shift in R&D investment: the federal government funded 19 percent of U.S. R&D in 2021 (down from 30 percent in 2011), while the business sector now funds roughly 75 percent.

A bottom-line interpretation is that federal investment in STEM education/workforce development, though significant, can hardly be described as a generational response to an economic and national security crisis.

Emergent Frontiers

Meanwhile, economic Sputniks are circling the globe, all driven by semiconducting silicon and germanium chips. Yet another testament to American STEM education is the home-grown invention of chips, but they are built mostly elsewhere – Taiwan, South Korea, and Japan. Semiconductors lie at the heart of our communications (e.g., cell phones, satellites), transportation (e.g., planes, trains, automobiles), defense (e.g., guidance systems and risk analytics), health (e.g., pacemakers, insulin pumps), lifestyle (e.g., dishwashers, Siri and Alexa), and virtually every other aspect of life and commerce. The federal government committed $53 billion through the 2022 CHIPS and Science Act to expand semiconductor talent development, research, and manufacturing in the U.S., amplified by $231 billion in commitments to semiconductor development by business and industry. Guidance through the National Strategy on Microelectronics Research was recently released by the White House Office of Science and Technology Policy. When fully realized, the CHIPS Act may come to be a generational response to an international adversarial threat far more profound than Sputnik.

Equally compelling and weighty in terms of life, liberty, and the pursuit of happiness is the need to lead in research, development, and governance around artificial intelligence. Extraordinary changes in workplaces and home life are underway as a result of this new technology. For example, AI can dramatically increase precision and thus reduce error in health care: machine learning can outperform human eyes at image analysis – MRI or x-ray – for detecting cancer early. On a lighter note, machine learning can boost the likely appeal of new movies by compressing millions of historical data points and a sea of YouTube videos into box office predictions. Conversely, there are present and potential misuses of AI. The displacement of radiologists, movie script writers, and countless others whose routine, analytical, or creative skills can be performed by robots and networked neural sensors is troublesome, yes, but a mild effect of AI compared to the vulnerability of our privacy, our democratic systems, business and financial integrity, and national defense structures, for starters.

The White House Blueprint for an AI Bill of Rights plants an important stake in the ground around AI safeguards. But it does not speak to the cultivation of future managers of AI. Similarly, the U.S. Department of Education report Artificial Intelligence and the Future of Teaching and Learning advises on risks of and uses for AI in diagnostics and descriptive statistics. However, guidance for preparing the upcoming generation to manage AI is not included. The National Science Foundation supports several AI-education studies that may prove worthy of scaling.

A potpourri of additional emergent trends fuels the current STEM crisis. Many are technological innovations, unearthing powers of manipulation and control that society is ill-prepared to manage. Quantum computing is one such innovation – using quantum states of subatomic particles, or qubits, to store information. Computers will become exponentially faster and more powerful, possibly helping solve climate change while also deciphering everyone’s passwords. Relatedly, revolutions in cybersecurity and data analytics may be out ahead of societal grasp. Many educational programs at the local and national levels have emerged in this space, including eCybermission from the Army Education Outreach Program (AEOP) and EverFi’s Data Science Foundations, which uses sports, finance, and other contexts for sense-making.

Not everyone needs to know how a microwave oven works in order to use it effectively. But U.S. citizens bear the responsibility for weighing the ethical, equitable, and legal dimensions of STEM advancements as voters, educators, parents, and consumers. Whether it be CRISPR alterations of individuals’ genetics, the socioeconomic dimensions of factory automation, the moral aspects of Directed Energy Weaponry (DEW), or the cost/benefit balance of climate mitigation technologies such as carbon sequestration, STEM education and workforce development need to be out front. That requires additional investment.

Supply-Demand Imbalance

Emergent technologies will drive job opportunities in the STEM arena that are expected to grow at four times the rate of jobs in other sectors in the coming decade. While it is encouraging that post-secondary STEM certificates and degrees have increased over the last decade (growing from 982,000 in 2012 to 1,310,000 in 2021), this growth is a ripple when the field needs a wave. Further, significant subpopulations of Americans are underrepresented in STEM majors and jobs. Women make up just about one-third of the science and engineering workforce. Racial and ethnic subgroups including Alaska Native, Black or African American, American Indian, and Hispanic or Latino workers comprise 30% of the total workforce but hold just 23% of STEM jobs. Rural residency exacerbates those disparities for all subpopulations in the STEM education pipeline: while 40% of urban adults have at least a bachelor’s degree, only 25% of rural residents do.

The commitment to diversify the STEM talent pipeline is a universal consensus across federal, state, local, corporate, nonprofit, and philanthropic investors in STEM education and workforce development. Numerous programs devoted to equity and inclusion are at work today with promising results, ripe for scaling.

Impact on Individuals and Society

Of all the arguments supporting increased investment in STEM education R&D to solve our current STEM crisis – tepid federal spending, ominously powerful inventions, and the dearth of talent for advancing and managing those inventions – a fourth argument eclipses each of them: STEM education improves the lives of individuals irrespective of their occupation. And in so doing, STEM education improves communities and the country at large.

Learners fortunate to enjoy quality STEM education develop creativity through imaginative design, interpretation, and representation of investigations. The tools they use strengthen technology literacy. The mode of discovery is highly social, honing communication and cooperation skills. With no sage-on-the-stage, they develop independence of thought. Failure happens, forging perseverance and resilience in its wake. Asking and answering questions nurtures curiosity. Defending and refuting ideas cultivates critical thinking. Truth and facts are evidence-based yet always tentative. Empathy is cultivated through alternative interpretations or points of view. And confidence to pursue STEM as a career comes from doing STEM.

The prospect of an entire population of Americans thus equipped is the most compelling case for strategically increased R&D investment in STEM education.

Photo of 2008 Ethics in the Science Classroom

Policy Recommendations for Increasing the Efficacy of Education R&D to Support STEM Education

Where do federal, state, local, corporate, nonprofit, and philanthropic STEM investors look for guidance in the alignment and leveraging of their dollars to nationwide priorities? The closest we have to a “master plan” is the federal STEM education strategic plan mandated by the America COMPETES Act. Updated every five years by the White House Office of Science and Technology Policy in close collaboration with federal agencies, the 2018-2023 plan is due for an update, and it is likely the next iteration will be released soon. 

While the STEM community waits, valuable input on the next iteration was recently provided to OSTP by the STEM Education Coalition. Coalition members (numbering over 600) represent the spectrum of STEM advocates – business and industry, higher education, formal and informal K-12 education, nonprofits, and national/state policy groups – and collectively hold great sway in matters of STEM education nationally. The expiring federal STEM plan closely reflects their input, as its successor likely will as well.

Six of the following ten recommendations build upon the STEM Education Coalition’s priorities, while the remaining four recommendations address gaps in the pipeline from STEM education to workforce pathways.

In order to maximize research and development to improve STEM education, we have distilled ten recommendations:

  1. Devote resources (human and financial) to both the scaling of, and continued research and development in, interventions that disrupt the status quo when it comes to rural under-reach and under-service in STEM education.
  2. Devote resources to both the scaling of, and continued research and development in transdisciplinary (a.k.a. Convergent) STEM teaching and learning, formally and informally.
  3. STEM teacher recruitment and training to support learning is a high-value target for investment in both the scaling of existent models as well as research and development on this essential frontier.
  4. Expand student authentic career-linked or work-based learning experiences to all, earning credits while acquiring job skills, by improving coordination capacity, and crediting – especially earning core (graduation) credits. 
  5. Devote resources to research and development on coordination across components of the STEM education system – in school and out of school, educator preparation – at the local, state and national levels.
  6. Devote resources to research and development toward improved awareness/communication systems of Federal STEM education agencies.
  7. Devote resources to research and development on supporting the training of STEM teachers and professionals for career coaching on a real-time, as-needed basis for all youth.
  8. Devote resources to research and development on the expansion of local/global challenge-solution learning opportunities and how they  influence student self-efficacy and STEM career trajectories.
  9. Devote resources to research and development of a digital platform readily accessible, easily navigable, and comprehensively thorough, for education-providers to harvest effective, vetted STEM programs from across the entire producer spectrum.
  10. Devote resources to the design and development of a catalog of STEM/workforce education “discoveries” funded by federal grant agencies (e.g., NSF’s I-Test, DR-K12, INCLUDES, CSforAll, etc.) to be used by STEM educators, developers and practitioners.

Recommendation 1. Devote resources (human and financial) to both the scaling of, and continued research and development in, interventions that disrupt the status quo when it comes to rural under-reach and under-service in STEM education.

Aligning to the STEM Ed Coalition’s priority of “Achieving Equity in STEM Education Must Be a National Priority,” this recommendation is central to the success of STEM education. The economic and moral imperative to broaden access to quality STEM education and to high-demand STEM careers is a national consensus. Lack of access and opportunity across rural America, where 20% of all youth attend half of all school districts  and where persistent inequality hits members of racial and ethnic minority groups hardest, creates a high-value target.

STEM Excellence and Leadership Project

Identifying and nurturing STEM talent in rural K-12 settings can be a challenge. The Belin-Blank Center for Gifted Education and Talent Development successfully designed and implemented the “STEM Excellence and Leadership Project” at the middle school level. Funded by the NSF’s Advancing Informal STEM Learning program, the project’s flexible professional development, wide-net-casting of students, networking within the community, and career counseling resulted in increased creativity, critical thinking, and positive perceptions of mathematics and science.

Recommendation 2. Devote resources to both the scaling of, and continued research and development in transdisciplinary (a.k.a. Convergent) STEM teaching and learning, formally and informally. 

Aligning to the STEM Ed Coalition’s priority “Science Education Must Be Elevated as a National Priority within a Transdisciplinary Well-Rounded STEM Education,” we need more investment in R&D to understand the transdisciplinary STEM teaching and learning models that improve student outcomes. America’s formal education model remains largely reflective of the 1894 recommendations of the Committee of Ten: annually teach all students History, English, Mathematics, Physics, Chemistry, etc. This prevailing “layer cake” approach serves transdisciplinary education poorly. Even the Next Generation Science Standards, upon which state and district science standards are largely based, focus on developing “an in-depth understanding of content and develop key skills.” All modern STEM-related challenges facing Generations Z, Alpha, and Beta require an entirely different brand of education – one of transdisciplinary inquiry.

USPTO Motivates Young Innovators and Entrepreneurs

The United States Patent and Trademark Office (USPTO)’s National Summer Teacher Institute (NSTI) on Innovation, STEM, and Intellectual Property (IP) trains teachers to incorporate concepts of making, inventing, and intellectual property creation and protection into classroom instruction, with the goal of inspiring and motivating young innovators and entrepreneurs. To date the program claims 22,000 hours of IP and invention education training for 444 teachers in 50 states – 110 of whom have inventions of their own – now equipped to spread the power of invention education and IP to hundreds of thousands of learners across the country and the world. We should better understand the program components that enable this kind of transdisciplinary learning.

Recommendation 3. STEM teacher recruitment and training to support learning is a high-value target for investment in both the scaling of existent models as well as research and development on this essential frontier. 

Aligning to the STEM Ed Coalition’s priority “Increase the Number of STEM Teachers in Our Nation’s Classrooms,” we need to deploy more education R&D to address America’s well-documented STEM teacher shortage. But the shortage is only half of the challenge we face. The other half is equipping teachers to authentically teach STEM, not merely a discipline underneath the STEM umbrella. Efforts such as the NSF’s Robert Noyce Teacher Scholarship program and the UTeach model support the production of excellent teachers of mathematics and science, but not STEM overall. To teach in a convergent (transdisciplinary) fashion, through collaborative community partnerships, on complex local/global issues is beyond the scope and capacity of traditional teacher preparatory models.

Example Programs

Two means for equipping educators to teach STEM are (1) in their pre-professional preparation, and (2) as in-service professional development for disciplinary instructors. Promising examples are flourishing.

  1. STEM Teaching Certificate. A few U.S. states and some national organizations have built STEM licenses and endorsements. Georgia State University’s STEM Certificate program trains teachers to bring a convergent STEM approach to whatever course they teach: “[candidates] figure out how to work across their schools, with the arts, with connections to other subjects.”
  2. In-service STEM Externships. Teachers in industry externships discover workplace connections and durable skills important to build in classrooms. Numerous businesses (e.g., 3M), organizations (e.g., Aerospace/NASA), and states (e.g., Iowa’s NSF ITEST-funded externships) conduct variations on the concept, with compelling results.

Recommendation 4. Expand student authentic career-linked or work-based learning experiences to all, earning credits while acquiring job skills, by improving coordination capacity, and crediting – especially earning core (graduation) credits.

Aligning to the STEM Ed Coalition’s priority to “Support Partnerships with Community Based STEM Organizations, Out of School Providers and Informal Learning Providers,” education R&D needs to better understand career-based learning models that work and deploy these evidence-based practices at scale.

Example Programs

With all 50 U.S. states aggressively pursuing work-based learning (WBL) policies and support, there is an opportunity to study and codify what states are learning in order to improve and iterate faster. According to the Education Commission of the States, 33 states have a definition for WBL, though the definitions vary. Nearly all states report WBL as a state strategy in their Workforce Innovation and Opportunity Act (WIOA) profile. Twenty-eight states legislate funding to support WBL. Fewer than half of all states permit WBL to count for graduation credits. Of all states, Tennessee presents a particularly aggressive WBL profile worthy of scaling and replication.

Recommendation 5. Devote resources to research and development on coordination across components of the STEM education system – in school and out of school, educator preparation – at the local, state and national levels.

Aligning to the STEM Ed Coalition’s priority to “Take a Systemic Approach to Future STEM Education Interventions,” more R&D should be deployed to study ecosystem models to understand the components that lead to student outcomes.

The STEM learning that takes place during the K-12 school day may or may not mesh well with the STEM learning that takes place at museum nights or at summer camp. In both instances, it may or may not align well with local, state, or national assessments. The preparation of educators is widely variable. Curricular content varies classroom to classroom and state to state. To drop novel grant-funded interventions into the mix is a random act of hope.

Example Programs

STEM Learning Ecosystems now number over 100 across the U.S., providing vertebral backbone to a national coordinative skeleton for STEM education. Formally designated by their membership in the STEM Learning Ecosystems Community of Practice supported by the Teaching Institute for Excellence in STEM (TIES), they each unite “…pre-K-16 schools; community-based organizations, such as after-school and summer programs; institutions of higher education; STEM-expert organizations, such as science centers, museums, corporations, intermediary and non-profit organizations and professional associations; businesses; funders; and informal experiences at home and in a variety of environments” to “…spark young people’s engagement, develop their knowledge, strengthen their persistence and nurture their sense of identity and belonging in STEM disciplines.” Every one of America’s 20,000 cities and towns ought to have a STEM Ecosystem. Just 19,900 to go.

Recommendation 6. Devote resources to research and development toward improved awareness/communication systems of Federal STEM education agencies.

Aligning to the STEM Ed Coalition’s priority to “Clarify and Define the Role of Federal Agencies and OSTP in Supporting STEM Education,” we should utilize R&D and inspiration from other fields to ensure we are propagating knowledge and systems in ways that foster increased transparency and evidence-use.

Awareness is the weak link in the chain of federal STEM education outreach to consumers at local levels. Seventeen federal agencies engage in STEM education via 156 programs spanning pre-K-12 formal and informal, higher education, and adult education.

In 2018-19 a strong push was put forth by OSTP and the Federal Coordination in STEM subcommittee (FC-STEM) to build STEM.gov or STEMeducation.gov in the spirit of AI.gov and Grants.gov: a one-stop clearinghouse through which Americans could explore and discover funding, programs, and expertise in STEM. To date, the closest analog is https://www.ed.gov/stem.

Example Programs

Discrete programs of various federal agencies have employed clever tactics for awareness and communication, as described in the 2022 Progress Report on the Implementation of the Federal STEM Education Strategic Plan. The AmeriCorps program, for example, partnered with Mathematica to build a web-based, interactive SCALER tool usable by education professionals, local education agencies, state education agencies, nonprofits, state and local government agencies, universities and colleges, tribal nations, and others to request participants to address local challenges they have identified, including in STEM. Similarly, the National Institute of Standards and Technology launched its NIST Educational STEM Resource registry (NEST-R) to provide wide access to NIST educational and workforce development content, including STEM resource records. Can the concept be broadened to a grand unifying collective?

Recommendation 7. Devote resources to research and development on supporting the training of STEM teachers and professionals for career coaching on a real-time, as-needed basis for all youth. 

Gen Z and Gen Alpha may end up in jobs like machine learning tech, molecular medical therapist, cryptocurrency auditor, big data distiller, climate change mitigator, or jetpack mechanic. From whom can they expect good career coaching? It is unrealistic to expect that their school counselors can keep up; with an average caseload of 385 students across all disciplines, their hands are full. STEM teachers, both the disciplinary and the integrated type, are best positioned to take on more responsibility for career coaching, with the help of counselors, administrators, and librarians – in fact, it is an all-hands-on-deck challenge.

Example Programs

Meaningful Career Conversations is a program begun in Colorado and now spreading to other states. It is a light, four-hour training experience that equips educators and others with whom youth come into contact to conduct conversations that steer students toward reflection, exploration, and consideration of career pathways of interest. Trainings are based upon starters and prompts that get students talking about and reflecting on their strengths and interests, such as “What activities or places make you feel safe and valued? Why?” It is not a silver bullet, but a model of distributed responsibility which, by engaging core teachers and other adults in career guidance, can help more students find their way toward a STEM career.

Recommendation 8. Devote resources to research and development on the expansion of local/global challenge-solution learning opportunities and how they  influence student self-efficacy and STEM career trajectories.

The standardization of a vision for STEM in classrooms across America will take time and resources. In the meantime, programs like MIT Solve can fast-track authentic learning experiences in school and after school. It is the ultimate in student-centeredness to invite groups of youth to think big – to identify challenges that excite them and to tap all imaginable resources in dreaming up solutions – and to command their own learning.

Example Programs

Common in higher education are capstone projects, applied coursework, even entire college missions (e.g., Olin College) that center the student learning experience around local/global challenges and solutions. 

For citizens of all ages there are opportunities like Changemakers Challenges and the Gates Foundation’s “Reinvent the Toilet” competition.

At the K-12 level, FIRST Lego League teams learn about robotics through humanitarian themes such as adaptive technologies for people with disabilities. The World Food Prize offers student group projects focused on global food security challenges. Similar in format are Future Cities and Invention Convention. These well-evaluated programs are prime for expansion or replication.

Recommendation 9.  Devote resources to research and development of a digital platform readily accessible, easily navigable, and comprehensively thorough, for education-providers to harvest effective, vetted STEM programs from across the entire producer spectrum.

More than 50 different programs are named in this paper, each an exemplar, a mere snapshot of the STEM programs available to the pre-K-12 community in and out of school. Therein lies a challenge/opportunity uniquely defining this moment in American educational history compared to the 1958 and 2001 crises: an embarrassment of riches.

Example Programs

The number of databases and resource catalogs on STEM education programs available to educators is almost as overwhelming as the number of programs themselves. A few standouts help dampen the decibels (though none are perfect):  

  1. What Works Clearinghouse (WWC). Established in 2002 under the Institute of Education Sciences at the U.S. Department of Education, the WWC does the hard work for educators of reviewing the research to make evidence-based recommendations about instruction – a priceless service. The trick is distillation: its goal of digesting and disseminating education research brings the material down to the level of curriculum developers, publishers, teacher-trainers, and the like, which can still be overwhelming for casually browsing educators.
  2. STEMworks Database. Born under Change The Equation in 2012 and acquired by WestEd in 2017, STEMworks is a tool to sift through the noise, using a rigorous rubric (Design Principles) to present proven STEM programs to educators and organizations. Programs (kits, courses, software, lessons) submit applications for expert review; the result is a searchable honor roll of high-quality STEM offerings. The hitch? Relatively few providers apply, especially not emergent or experimental programs that have yet to acquire robust impact evidence.

Recommendation 10. Devote resources to the design and development of a catalog of STEM/workforce education “discoveries” funded by federal grant agencies (e.g., NSF’s I-Test, DR-K12, INCLUDES, CSforAll, etc.) to be used by STEM educators, developers and practitioners.

This recommendation relates to recommendation #9 but is expressly about federal programs, and to recommendation #6 but calls for more than a mere roster of offerings: a vetted (and user-friendly) What Works Clearinghouse-style catalog of all prior grants that yielded empirical support for preK-12 STEM, across all agencies. It would be a treasure trove of proven interventions and innovations across NSF, ED, DOE, DoD, and beyond, mostly unknown to practitioners across the United States.

Each federal agency currently posts STEM opportunities on its own website (e.g., http://www.ed.gov/stem, http://dodstem.us/, http://www.nsf.gov/funding, http://www.nasa.gov/education, https://science.education.nih.gov/). These tools are valuable, but a desperate need remains for a singular, STEM.gov-style searchable landing page.

There must be a way to view what worked for the thousands of R&D projects funded by these agencies – an online shopping mall for successful preK-12 STEM curricula, teaching approaches, equity practices, virtual platforms, and more. CoSTEM could create a “STEM Ideas that Work” landing page to ensure that emerging research insights are captured in systematic and accessible ways.

Example Programs

The Ideas That Work resource is an analog. Curated by the Office of Special Education Programs at the U.S. Department of Education, it is a searchable database that includes the office’s grants, past and current. Special educators and families can search a term such as “behavioral challenge,” yielding resources and toolkits, training modules, tip sheets, and more.

Black and white photo of early 20th century science class

Recommended Actions of ALI and Other Stakeholders

While we hope to see many of these recommendations in the forthcoming Five Year STEM Plan, actualizing them will take multiple actors working together to advance the STEM education field.

The Alliance for Learning Innovation has perhaps the most potent tool among STEM/workforce stakeholders to effect change: communication.

ALI should host events, publish white papers, develop convenings, and deploy mass media and other awareness and advocacy channels to rally its august collective of member organizations toward amplifying America’s rural STEM equity opportunity, career coaching capacity, educator-employer partnership potential, and convergence approach to learning, along with the six other recommendations. Doing so would do more to prepare the future STEM workforce than any other action, including investment.

Investment is a close second-most impactful action ALI can take. If all STEM investors – federal, state, corporate, and philanthropic – aligned around a finite array of pressing priorities served by a proven set of interventions (the very function of this report), the collective impact would transform systems. What it would take is an aggregator. ALI or a designee organization, functioning as an agent for businesses, philanthropies, and other STEM investors, could make funding recommendations (or, more ambitiously, pool investor funds) based on consensus goals of the STEM cooperative, acting to focus investments accordingly.

Federal Agencies have made significant gains toward cooperative and complementary STEM education support by sustaining interagency working groups on Computational Literacy, Convergence, Strategic Partnerships, Transparency & Accountability, Inclusion in STEM, and Veterans and Military Spouses in STEM. As a result, improvements are being made in coordination and in transparency about federal education R&D investments, especially between the National Science Foundation and the Department of Education. And yet, more needs to be done.

Business, Industry, and Philanthropic Organizations have the ability to pilot or expand proven programs to national scale, as many examples herein attest. However, the impact of the investments of the private sector may fall short of systemic change due to a smorgasbord of pet programs chosen by each entity, leading to incremental rather than wholesale progress.

Business, industry, and philanthropic investors in STEM education should pool their resources around a finite array of proven programs for maximal, collective impact. A functional intermediary such as the Alliance for Learning Innovation could represent the interests of all non-government STEM funders by winnowing the horde of pre-K-12 STEM education programs to only those most effective at achieving consensus goals and priorities. The outcome might be a Consumer Reports-style top-rated performers menu that concentrates investments, amplifying impact. Like federal agencies, non-government funders should consider driving the advancement of transdisciplinary (convergent) STEM education, work-based or career-linked learning, the synchronization of in-school and out-of-school STEM education, educator career-coaching capacity, and the development of rural, diverse STEM workforce talent.     

States are best positioned to help local education/workforce organizations meet the human resource and material challenges inhibiting full production of future workers for high-demand careers. It is state government that sets the policies that determine practices.

K-12 formal and informal education at the daily practical level bears the greatest responsibility to act on behalf of the future STEM workforce. Insofar as government- and non-government-funded programs support them, state policies empower them, and preparatory training equips them, educators should seize this moment in history to strengthen American economic vitality and national security one student at a time.

Others at the table include post-secondary institutions, media outlets, faith communities, local trade and professional societies, social service providers, families, and citizens at-large. Each should contribute to the goal of producing a vibrant future workforce by advocating for education research and development policies at the state and federal levels and by partnering with formal and nonformal learning organizations to inspire tomorrow’s innovators in today’s classrooms. 

[Image: Students work in cell biology lab in Peckham Hall, 2012]

Conclusion

American competitiveness through innovation is driven by leading-edge education systems. Legitimate concern about whether those systems can maintain their lead surfaces during periods of vulnerability: when the nation is eclipsed in the space race, comparatively under-armored in military advancement, or surpassed in information technology. To relinquish leadership in innovation is a threat to the U.S. economy and national security. In response to periodic threats to American innovation preeminence, bold investments in STEM education have produced waves of talent for securing the helm.

This era is different. A myriad of fronts for innovation advancement – automation, machine learning, molecular medicine, energy transformation, cybersecurity – each harboring an existential challenge, heightens the imperative for action to an unprecedented level. And yet, the U.S. has never been more prepared to act. A wealth of pre-K-12 STEM programs and infrastructure stands in testament to legacy investments by the federal government and the private sector. This time, the challenge is to engage a broader swath of the population, especially those underserved and underrepresented in STEM programs of the past. And in tight budgetary times, broadened opportunities must utilize evidence-based solutions proven to work, whether in teacher preparation, equity and inclusion, early learning, informal education, community engagement, mathematics, coding, quantum physics, or all of the above and more.

The best time to invest is when the pathway to success is clear. The tools and the know-how for producing tomorrow’s STEM workforce reside within pre-K-12 systems today. For public and private investors alike, there is an opportunity for amplification through collective impact. By collectively identifying high-impact solutions transparent in design and indisputable in effect, aligning resources for surgical precision rather than shotgun spray, and scaling known winners to all young Americans, they can meet the current challenge to U.S. innovation leadership. Enough with moving the needle. It is time to pin the needle, shattering the gauge.

Predicting Progress: A Pilot of Expected Utility Forecasting in Science Funding


The current process that federal science agencies use for reviewing grant proposals is known to be biased against riskier proposals. As such, the metascience community has proposed many alternate approaches to evaluating grant proposals that could improve science funding outcomes. One such approach was proposed by Chiara Franzoni and Paula Stephan in a paper on how expected utility — a formal quantitative measure of predicted success and impact — could be a better metric for assessing the risk and reward profile of science proposals. Inspired by their paper, the Federation of American Scientists (FAS) collaborated with Metaculus to run a pilot study of this approach. In this working paper, we share the results of that pilot and its implications for future implementation of expected utility forecasting in science funding review. 

Brief Description of the Study

In fall 2023, we recruited a small cohort of subject matter experts to review five life science proposals by forecasting their expected utility. For each proposal, this consisted of defining two research milestones in consultation with the project leads and asking reviewers to make three forecasts for each milestone:

  1. The probability of success;
  2. The scientific impact of the milestone, if it were reached; and
  3. The social impact of the milestone, if it were reached.

These predictions can then be used to calculate the expected utility, or likely impact, of a proposal, and to design and compare potential portfolios.
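As a rough illustration (with made-up numbers and a function name of our own; the full equations, including how sequential and independent milestones are combined, are in Appendix C), a single milestone's contribution to a proposal's expected utility combines the three forecasts as follows:

```python
def milestone_expected_utility(p_success, scientific_score, social_score):
    # Impact scores sit on a base-2 exponential scale, so convert each score to a
    # utility, average the two, and weight by the forecast probability of success.
    utility = (2 ** scientific_score + 2 ** social_score) / 2
    return p_success * utility

# Hypothetical forecasts: 70% chance of success, scientific impact 7, social impact 5.
print(milestone_expected_utility(0.7, 7, 5))  # 56.0
```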

Key Takeaways for Grantmakers and Policymakers

The pilot surfaced three main strengths of using expected utility forecasting to conduct peer review, discussed further in the findings below: it disentangles impact from feasibility, reduces the administrative burden on reviewers, and grounds assessments in quantitative forecasts.

Despite the apparent complexity of this process, we found that first-time users were able to successfully complete their reviews according to the guidelines without any additional support. Most of the complexity occurs behind the scenes, and either aligns with the responsibilities of the program manager (e.g., defining milestones and their dependencies) or can be automated (e.g., calculating the total expected utility). Thus, grantmakers and policymakers can have confidence in the user-friendliness of expected utility forecasting.

How Can NSF or NIH Run an Experiment on Expected Utility Forecasting?

An initial pilot study could be conducted by NSF or NIH by adding a short, non-binding expected utility forecasting component to a selection of review panels. In addition to the evaluation of traditional criteria, reviewers would be asked to predict the success and impact of select milestones for the proposals assigned to them. The rest of the review process and the final funding decisions would be made using the traditional criteria. 

Afterwards, study facilitators could take the expected utility forecasting results and construct an alternate portfolio of proposals that would have been funded if that approach was used, and compare the two portfolios. Such a comparison would yield valuable insights into whether—and how—the types of proposals selected by each approach differ, and whether their use leads to different considerations arising during review. Additionally, a pilot assessment of reviewers’ prediction accuracy could be conducted by asking program officers to assess milestone achievement and study impact upon completion of funded projects.

Findings and Recommendations

Reviewers in our study were new to the expected utility forecasting process and gave generally positive reactions. In their feedback, reviewers said that they appreciated how the framing of the questions prompted them to think about the proposals in a different way and pushed them to ground their assessments with quantitative forecasts. The focus on just three review criteria–probability of success, scientific impact, and social impact–was seen as a strength because it simplified the process, disentangled feasibility from impact, and eliminated biased metrics. Overall, reviewers found this new approach interesting and worth investigating further. 

In designing this pilot and analyzing the results, we identified several important considerations for planning such a review process. While complex, engaging with these considerations tended to provide value by making implicit project details explicit and encouraging clear definition and communication of evaluation criteria to reviewers. Two key examples are defining the proposal milestones and creating impact scoring systems. In both cases, reducing ambiguities in terms of the goals that are to be achieved, developing an understanding of how outcomes depend on one another, and creating interpretable and resolvable criteria for assessment will help ensure that the desired information is solicited from reviewers. 

Questions for Further Study

Our pilot only simulated the individual review phase of grant proposals and did not simulate a full review committee. The typical review process at a funding agency consists of first, individual evaluations by assigned reviewers, then discussion of those evaluations by the whole review committee, and finally, the submission of final scores from all members of the committee. This is similar to the Delphi method, a structured process for eliciting forecasts from a panel of experts, so we believe that it would work well with expected utility forecasting. The primary change would therefore be in the definition and approach for eliciting criterion scores, rather than the structure of the review process. Nevertheless, future implementations may uncover additional considerations that need to be addressed or better ways to incorporate forecasting into a panel environment. 

Further investigation into how best to define proposal milestones is also needed. This includes questions such as, who should be responsible for determining the milestones? If reviewers are involved, at what part(s) of the review process should this occur? What is the right balance between precision and flexibility of milestone definitions, such that the best outcomes are achieved? How much flexibility should there be in the number of milestones per proposal? 

Lastly, more thought should be given to how to define social impact and how to calibrate reviewers’ interpretation of the impact score scale. In our report, we propose a couple of different options for calibrating impact, in addition to describing the one we took in our pilot. 

Grantmakers, both public and private, and policymakers who are interested in learning more or receiving assistance in implementing this approach are welcome to reach out to our team.


Introduction

The fundamental concern of grantmakers, whether governmental or philanthropic, is how to make the best funding decisions. All funding decisions come with inherent uncertainties that may pose risks to the investment. Thus, a certain level of risk-aversion is natural and even desirable in grantmaking institutions, especially federal science agencies which are responsible for managing taxpayer dollars. However, without risk, there is no reward, so the trade-off must be balanced. In mathematics and economics, expected utility is the common metric assumed to underlie all rational decision making. Expected utility has two components: the probability of an outcome occurring if an action is taken and the value of that outcome, which roughly corresponds with risk and reward. Thus, expected utility would seem to be a logical choice for evaluating science funding proposals. 
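In symbols (our notation), the expected utility of an action with possible outcomes i is EU = Σ_i p_i × u_i, where p_i is the probability that outcome i occurs if the action is taken and u_i is the value of that outcome.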

In the debates around funding innovation, though, expected utility has largely flown under the radar compared to other ideas. Nevertheless, Chiara Franzoni and Paula Stephan have proposed using expected utility in peer review. Building on their paper, the Federation of American Scientists (FAS) developed a detailed framework for implementing expected utility in a peer review process. We chose to frame the review criteria as forecasting questions, since determining the expected utility of a proposal inherently requires making some predictions about the future. Forecasting questions also have the added benefit of being resolvable–i.e., the true outcome can be determined after the fact and compared to the prediction–which provides a learning opportunity for reviewers to improve their abilities and identify biases. In addition to forecasting, we incorporated other unique features, like an exponential scale for scoring impact, that we believe help reduce biases against risky proposals.

With the theory laid out, we conducted a small pilot in fall of 2023. The pilot was run in collaboration with Metaculus, a crowd forecasting platform and aggregator, to leverage their expertise in designing resolvable forecasting questions and to use their platform to collect forecasts from reviewers. The purpose of the pilot was to test the mechanics of this approach in practice, identify any additional considerations that need to be thought through, and surface potential issues that need to be solved. We were also curious whether any interesting or unexpected results would arise from how we chose to calculate impact and total expected utility. It is important to note that this pilot was not an experiment, so we did not have a control group against which to compare the results of the review.

Since FAS is not a grantmaking institution, we did not have a ready supply of traditional grant proposals to use. Instead, we used a set of two-page research proposals for Focused Research Organizations (FROs) that we had sourced through separate advocacy work in that area.1 With the proposal authors’ permission, we recruited a cohort of twenty subject matter experts to each review one of five proposals. For each proposal, we defined two research milestones in consultation with the proposal authors. Reviewers were asked to make three forecasts for each milestone:

  1. The probability of success;
  2. The scientific impact, conditional on success; and
  3. The social impact, conditional on success.

Reviewers submitted their forecasts on Metaculus’ platform; in a separate form they provided explanations for their forecasts and responded to questions about their experience and impression of this new approach to proposal evaluation. (See Appendix A for details on the pilot study design.)

Insights from Reviewer Feedback

Overall, reviewers liked the framing and criteria provided by the expected utility approach, while their main critique was of the structure of the research proposals. Excluding critiques of the research proposal structure, which are unlikely to apply to an actual grant program, two thirds of the reviewers expressed positive opinions of the review process and/or thought it was worth pursuing further given drawbacks with existing review processes. Below, we delve into the details of the feedback we received from reviewers and their implications for future implementation.

Feedback on Review Criteria

Disentangling Impact from Feasibility

Many of the reviewers said that this model prompted them to think differently about how they assess the proposals and that they liked the new questions. Reviewers appreciated that the questions focused their attention on what they think funding agencies really want to know and nothing more: “can it occur?” and “will it matter?” This approach explicitly disentangles impact from feasibility: “Often, these two are taken together, and if one doesn’t think it is likely to succeed, the impact is also seen as lower.” Additionally, the emphasis on big picture scientific and social impact “is often missing in the typical review process.” Reviewers also liked that this approach eliminates what they consider biased metrics, such as the principal investigator’s reputation, track record, and “excellence.” 

Reducing Administrative Burden

The small set of questions was seen as more efficient and less burdensome on reviewers. One reviewer said, “I liked this approach to scoring a proposal. It reduces the effort to thinking about perceived impact and feasibility.” Another reviewer said, “On the whole it seems a worthwhile exercise as the current review processes for proposals are onerous.” 

Quantitative Forecasting

Reviewers saw benefits to being asked to quantify their assessments, but also found it challenging at times. A number of reviewers enjoyed taking a quantitative approach and thought that it helped them be more grounded and explicit in their evaluations of the proposals. However, some reviewers were concerned that it felt like guesswork and expressed low confidence in their quantitative assessments, primarily due to proposals lacking details on their planned research methods, which is an issue discussed in the section “Feedback on Proposals.” Nevertheless, some of these reviewers still saw benefits to taking a quantitative approach: “It is interesting to try to estimate probabilities, rather than making flat statements, but I don’t think I guess very well. It is better than simply classically reviewing the proposal [though].” Since not all academics have experience making quantitative predictions, we expect that there will be a learning curve for those new to the practice. Forecasting is a skill that can be learned though, and we think that with training and feedback, reviewers can become better, more confident forecasters.

Defining Social Impact

Of the three types of questions that reviewers were asked to answer, the question about social impact seemed the hardest for reviewers to interpret. Reviewers noted that they would have liked more guidance on what was meant by social impact and whether it included indirect impacts. Since questions like these are ultimately subjective, the “right” definition of social impact and what types of outcomes are considered most valuable will depend on the grantmaking institution, their domain area, and their theory of change, so we leave this open to future implementers to clarify in their instructions.

Calibrating Impact

While the impact score scale (see Appendix A) defines the relative difference in impact between scores, it does not define the absolute impact conveyed by a score. For this reason, a calibration mechanism is necessary to provide reviewers with a shared understanding of the use and interpretation of the scoring system. Note that this is a challenge that rubric-based peer review criteria used by science agencies also face. Discussion and aggregation of scores across a review committee helps align reviewers and average out some of this natural variation.2

To address this, we surveyed a small, separate set of academics in the life sciences about how they would score the social and scientific impact of the average NIH R01 grant, which many life science researchers apply to and review proposals for. We then provided the average scores from this survey to reviewers to orient them to the new scale and help them calibrate their scores. 

One reviewer suggested an alternative approach: “The other thing I might change is having a test/baseline question for every reviewer to respond to, so you can get a feel for how we skew in terms of assessing impact on both scientific and social aspects.” One option would be to ask reviewers to score the social and scientific impact of the average grant proposal for a grant program that all reviewers would be familiar with; another would be to ask reviewers to score the impact of the average funded grant for a specific grant program, which could be more accessible for new reviewers who have not previously reviewed grant proposals. A third option would be to provide all reviewers on a committee with one or more sample proposals to score and discuss, in a relevant and shared domain area.

When deciding on an approach for calibration, a key consideration is the specific resolution criteria that are being used — i.e., the downstream measures of impact that reviewers are being asked to predict. One option, which was used in our pilot, is to predict the scores that a comparable, but independent, panel of reviewers would give the project some number of years following its successful completion. For a resolution criterion like this one, collecting and sharing calibration scores can help reviewers get a sense for not just their own approach to scoring, but also those of their peers.

Making Funding Decisions

In scoring the social and scientific impact of each proposal, reviewers were asked to assess the value of the proposal to society or to the scientific field. That alone would be insufficient to determine whether a proposal should be funded though, since it would need to be compared with other proposals in conjunction with its feasibility. To do so, we calculated the total expected utility of each proposal (see Appendix C). In a real funding scenario, this final metric could then be used to compare proposals and determine which ones get funded. Additionally, unlike a traditional scoring system, the expected utility approach allows for the detailed comparison of portfolios — including considerations like the expected proportion of milestones reached and the range of likely impacts.

In our pilot, reviewers were not informed that we would be doing this additional calculation based on their submissions. As a result, one reviewer thought that the questions they were asked failed to include other important questions, like “should it occur?” and “is it worth the opportunity cost?” Though these questions were not asked of reviewers explicitly, we believe that they would be answered once the expected utility of all proposals is calculated and considered, since the opportunity cost of one proposal would be the expected utility of the other proposals. Since each reviewer only provided input on one proposal, they may have felt like the scores they gave would be used to make a binary yes/no decision on whether to fund that one proposal, rather than being considered as a part of a larger pool of proposals, as it would be in a real review process.

Feedback on Proposals

Missing Information Impedes Forecasting

The primary critique that reviewers expressed was that the research proposals lacked details about their research plans, what methods and experimental protocols would be used, and what preliminary research the author(s) had done so far. This hindered their ability to properly assess the technical feasibility of the proposals and their probability of success. A few reviewers expressed that they also would have liked to have had a better sense of who would be conducting the research and each team member’s responsibilities. These issues arose because the FRO proposals used in our pilot had not originally been submitted for funding purposes, and thus lacked the requirements of traditional grant proposals, as we noted above. We assume this would not be an issue with proposals submitted to actual grantmakers.3  

Improving Milestone Design

A few reviewers pointed out that some of the proposal milestones were too ambiguous or were not worded specifically enough, such that there were ways that researchers could technically say that they had achieved the milestone without accomplishing the spirit of its intent. This made it more challenging for reviewers to assess milestones, since they weren’t sure whether to focus on the ideal (i.e., more impactful) interpretation of the milestone or to account for these “loopholes.” Moreover, loopholes skew the forecasts, since they increase the probability of achieving a milestone while lowering the impact of achieving it in that way.

One reviewer suggested, “I feel like the design of milestones should be far more carefully worded – or broken up into sub-sentences/sub-aims, to evaluate the feasibility of each. As the questions are currently broken down, I feel they create a perverse incentive to create a vaguer milestone, or one that can be more easily considered ‘achieved’ for some ‘good enough’ value of achieved.” For example, they proposed that one of the proposal milestones, “screen a library of tens of thousands of phage genes for enterobacteria for interactions and publish promising new interactions for the field to study,” could be expanded to

  1. “Generate a library of tens of thousands of genes from enterobacteria, expressed in E. coli
  2. “Validate their expression under screenable conditions
  3. “Screen the library for their ability to impede phage infection with a panel of 20 type phages
  4. “Publish … 
  5. “Store and distribute the library, making it as accessible to the broader community”

We agree with the need for careful consideration and design of milestones, given that “loopholes” in milestones can detract from their intended impact and make it harder for reviewers to accurately assess their likelihood. In our theoretical framework for this approach, we identified three potential parties that could be responsible for defining milestones: (1) the proposal author(s), (2) the program manager, with or without input from proposal authors, or (3) the reviewers, with or without input from proposal authors. This critique suggests that the first approach of allowing proposal authors to be the sole party responsible for defining proposal milestones is vulnerable to being gamed, and the second or third approach would be preferable. Program managers who take on the task of defining milestones should have enough expertise to think through the different potential ways of fulfilling a milestone and make sure that they are sufficiently precise for reviewers to assess.

Benefits of Flexibility in Milestones

Some flexibility in milestones may still be desirable, especially with respect to the actual methodology, since experimentation may be necessary to determine the best technique to use. For example, speaking about the feasibility of a different proposal milestone – “demonstrate that Pro-AG technology can be adapted to a single pathogenic bacterial strain in a 300 gallon aquarium of fish and successfully reduce antibiotic resistance by 90%” – a reviewer noted that 

“The main complexity and uncertainty around successful completion of this milestone arises from the native fish microbiome and whether a CRISPR delivery tool can reach the target strain in question. Due to the framing of this milestone, should a single strain be very difficult to reach, the authors could simply switch to a different target strain if necessary. Additionally, the mode of CRISPR delivery is not prescribed in reaching this milestone, so the authors have a host of different techniques open to them, including conjugative delivery by a probiotic donor or delivery by engineered bacteriophage.”

Peer Review Results

Sequential Milestones vs. Independent Outcomes

In our expected utility forecasting framework, we defined two different ways that a proposal could structure its outcomes: as sequential milestones where each additional milestone builds off of the success of the previous one, or as independent outcomes where the success of one is not dependent on the success of the other(s). For proposals with sequential milestones in our pilot, we would expect the probability of success of milestone 2 to be less than the probability of success of milestone 1 and for the opposite to be true of their impact scores. For proposals with independent outcomes, we do not expect there to be a relationship between the probability of success and the impact scores of milestones 1 and 2. There are different equations for calculating the total expected utility, depending on the relationship between outcomes (see Appendix C).

We categorized each proposal in our study based on whether it had sequential milestones or independent outcomes. This information was not shared with reviewers. Table 1 presents the average reviewer forecasts for each proposal. In general, milestones received higher scientific impact scores than social impact scores, which makes sense given the primarily academic focus of research proposals. For proposals 1 to 3, the probability of success of milestone 2 was roughly half of the probability of success of milestone 1; reviewers also gave milestone 2 higher scientific and social impact scores than milestone 1. This is consistent with our categorization of proposals 1 to 3 as sequential milestones.

Table 1. Mean forecasts for each proposal.
See next section for discussion about the categorization of proposal 4’s milestones.
Proposal | Milestone Category | M1 Probability of Success | M1 Scientific Impact Score | M1 Social Impact Score | M2 Probability of Success | M2 Scientific Impact Score | M2 Social Impact Score
1 | sequential | 0.80 | 7.83 | 7.35 | 0.41 | 8.22 | 8.25
2 | sequential | 0.88 | 6.41 | 3.72 | 0.36 | 8.21 | 7.62
3 | sequential | 0.68 | 7.07 | 6.45 | 0.34 | 8.20 | 7.50
4 | ? | 0.72 | 6.58 | 3.92 | 0.47 | 7.06 | 4.19
5 | independent | 0.55 | 7.14 | 2.37 | 0.40 | 6.66 | 2.25
(M1 = milestone 1; M2 = milestone 2.)

Further Discussion on Designing and Categorizing Milestones

We originally categorized proposal 4’s milestones as sequential, but one reviewer gave milestone 2 a lower scientific impact score than milestone 1 and two reviewers gave it a lower social impact score. One reviewer also gave milestone 2 roughly the same probability of success as milestone 1. This suggests that proposal 4’s milestones can’t be considered strictly sequential. 

The two milestones for proposal 4 were, in brief, the development of a general-purpose tool (milestone 1) and, using that tool, a model of the C. elegans nervous system (milestone 2).

The reviewer who gave milestone 2 a lower scientific impact score explained: “Given the wording of the milestone, I do not believe that if the scientific milestone was achieved, it would greatly improve our understanding of the brain.” Unlike proposals 1-3, in which milestone 2 was a scaled-up or improved-upon version of milestone 1, these milestones represent fundamentally different categories of output (general-purpose tool vs specific model). Thus, despite the necessity of milestone 1’s tool for achieving milestone 2, the reviewer’s response suggests that the impact of milestone 2 was being considered separately rather than cumulatively.

Milestone Design Recommendations
Recommendation 1: Explicitly define sequential milestones

To properly address this case of sequential milestones with different types of outputs, we recommend that for all sequential milestones, latter milestones should be explicitly defined as inclusive of prior milestones. In the above example, this would imply redefining milestone 2 as “Complete milestone 1 and develop a model of the C. elegans nervous system…” This way, reviewers know to include the impact of milestone 1 in their assessment of the impact of milestone 2.

Recommendation 2: Clarify milestone category with reviewers

To help ensure that reviewers are aligned with program managers in how they interpret the proposal milestones (if reviewers aren’t directly involved in defining milestones), we suggest either informing reviewers of how program managers have categorized the proposal outputs, so they can conduct their review accordingly, or allowing reviewers to decide the category (and thus how the total expected utility is calculated), whether individually, collectively, or both.

Recommendation 3: Allow for a flexible number of milestones

We chose to use only two of the goals that proposal authors provided because we wanted to standardize the number of milestones across proposals. However, this may have provided an incomplete picture of the proposals’ goals, and thus an incomplete assessment of the proposals. We recommend that future implementations be flexible and allow the number of milestones to be determined based on each proposal’s needs. This would also help accommodate one reviewer’s suggestion that some milestones be broken down into intermediary steps.

Importance of Reviewer Explanations

As one can tell from the above discussion, reviewers’ explanations of their forecasts were crucial to understanding how they interpreted the milestones. Reviewers’ explanations varied in length and detail, but the most insightful responses broke down their reasoning into detailed steps and addressed (1) ambiguities in the milestone and how they chose to interpret them, (2) the state of the scientific field and the maturity of different techniques that the authors propose to use, and (3) factors that improve the likelihood of success versus potential barriers or challenges that would need to be overcome.

Exponential Impact Scales Better Reflect the Real Distribution of Impact 

The distribution of NIH and NSF proposal peer review scores tends to be skewed such that most proposals are rated above the center of the scale and few proposals are rated poorly. However, other markers of scientific impact, such as citations (for all their imperfections), suggest a long tail of studies with very high impact. This discrepancy suggests that traditional peer review scoring systems are not well-structured to capture the nonlinearity of scientific impact, resulting in score inflation. The bunching of scores at the top end of the scale also means that very negative scores have a greater effect than very positive scores when averaged together, since there is more room between the average score and the bottom end of the scale. This can generate systemic bias against more controversial or risky proposals.

In our pilot, we chose to use an exponential scale with a base of 2 for impact to better reflect the real distribution of scientific impact. Using this exponential impact scale, we surveyed a small pool of academics in the life sciences about how they would rate the impact of the average funded NIH R01 grant. They responded with an average scientific impact score of 5 and an average social impact score of 3, which are much lower on our scale compared to traditional peer review scores4, suggesting that the exponential scale may be beneficial for avoiding score inflation and bunching at the top. In our pilot, the distribution of scientific impact scores was centered higher than 5, but still less skewed than NIH peer review scores for significance and innovation typically are. This partially reflects the fact that proposals were expected to be funded at one to two orders of magnitude higher levels than NIH R01 grants, so their impact should also be greater. The distribution of social impact scores exhibits a much wider spread and a lower center.

Figure 1. Distribution of Impact scores for milestone 1 (top) and 2 (bottom)

Conclusion

In summary, expected utility forecasting presents a promising approach to improving the rigor of peer review and quantitatively defining the risk-reward profile of science proposals. Our pilot study suggests that this approach can be quite user-friendly for reviewers, despite its apparent complexity. Further study into how best to integrate forecasting into panel environments, define proposal milestones, and calibrate impact scales will help refine future implementations of this approach. 

More broadly, we hope that this pilot will encourage more grantmaking institutions to experiment with innovative funding mechanisms. Reviewers in our pilot were more open-minded and quick-to-learn than one might expect and saw significant value in this unconventional approach. Perhaps this should not be so much of a surprise given that experimentation is at the heart of scientific research. 

Grantmakers, both public and private, and policymakers who are interested in learning more or receiving assistance in implementing this approach are welcome to reach out to our team.

Acknowledgements

Many thanks to Jordan Dworkin for being an incredible thought partner in designing the pilot and providing meticulous feedback on this report. Your efforts made this project possible!


Appendix A: Pilot Study Design

Our pilot study consisted of five proposals for life science-related Focused Research Organizations (FROs). These proposals were solicited from academic researchers by FAS as part of our advocacy for the concept of FROs. As such, they were not originally intended as proposals for direct funding and did not have content requirements as strict as those of traditional grant proposals. Researchers were asked to submit one- to two-page proposals discussing (1) their research concept, (2) the motivation and its expected social and scientific impact, and (3) the rationale for why this research cannot be accomplished through traditional funding channels and thus requires a FRO to be funded.

Permission was obtained from proposal authors to use their proposals in this study. We worked with proposal authors to define two milestones for each proposal that reviewers would assess: one that the authors felt confident they could achieve and one that was more ambitious but that they still thought was feasible. In addition, due to the brevity of the proposals, we included an additional one to two pages of supplementary information and scientific context. Final drafts of the milestones and supplementary information were provided to authors to edit and approve. Because this pilot study could not provide any actual funding to proposal authors, it was not possible to solicit full-length research proposals from them.

We recruited four to six reviewers for each proposal based on their subject matter expertise. Potential participants were recruited over email with a request to help review a FRO proposal related to their area of research. They were informed that the review process would be unconventional but were not informed of the study’s purpose. Participants were offered a small monetary compensation for their time.

Confirmed participants were all sent instructions and materials for the review process on the same day and were asked to complete their reviews by the same deadline, a month and a half later. Reviewers were told to assume that, if funded, each proposal would receive $50 million in funding over five years to conduct the research, consistent with the proposed model for FROs. Each proposal had two technical milestones, and reviewers were asked to answer the following questions for each milestone:

  1. Assuming that the proposal is funded by 2025, will the milestone be achieved before 2031?
  2. What will be the average scientific impact score, as judged in 2032, of accomplishing the milestone?
  3. What will be the average social impact score, as judged in 2032, of accomplishing the milestone?

The impact scoring system was explained to reviewers as follows:

Please consider the following in determining the impact score: the current and expected long-term social or scientific impact of a funded FRO’s outputs if a funded FRO accomplishes this milestone before 2030.

The impact score we are using ranges from 1 (low) to 10 (high). It is base 2 exponential, meaning that a proposal that receives a score of 5 has double the impact of a proposal that receives a score of 4, and quadruple the impact of a proposal that receives a score of 3. In a small survey we conducted of SMEs in the life sciences, they rated the scientific and social impact of the average NIH R01 grant — a federally funded research grant that provides $1-2 million for a 3-5 year endeavor — on this scale to be 5.2 ± 1.5 and 3.1 ± 1.3, respectively. The median scores were 4.75 and 3.00, respectively.

Below is an example of how a predicted impact score distribution (left) would translate into an actual impact distribution (right). You can try it out yourself with this interactive version (in the menu bar, click Runtime > Run all) to get some further intuition on how the impact score works. Please note that this is meant solely for instructive purposes, and the interface is not designed to match Metaculus’ interface.

The choice of an exponential impact scale reflects the tendency in science for a small number of research projects to have an outsized impact. For example, studies have shown that the relationship between the number of citations for a journal article and its percentile rank scales exponentially.

Scientific impact aims to capture the extent to which a project advances the frontiers of knowledge, enables new discoveries or innovations, or enhances scientific capabilities or methods. Though each is imperfect, one could consider citations of papers, patents on tools or methods, or users of software or datasets as proxies of scientific impact. 

Social impact aims to capture the extent to which a project contributes to solving important societal problems, improving well-being, or advancing social goals. Some proxy metrics that one might use to assess a project’s social impact are the value of lives saved, the cost of illness prevented, the number of job-years of employment generated, economic output in terms of GDP, or the social return on investment. 

You may consider any or none of these proxy metrics as a part of your assessment of the impact of a FRO accomplishing this milestone.

Reviewers were asked to submit their forecasts on Metaculus’ website and to provide their reasoning in a separate Google form. For question 1, reviewers were asked to respond with a single probability. For questions 2 and 3, reviewers were asked to provide their median, 25th percentile, and 75th percentile predictions, in order to generate a probability distribution. Metaculus’ website also included information on the resolution criteria of each question, which provided guidance to reviewers on how to answer the question. Individual reviewers were blind to other reviewers’ responses until after the submission deadline, at which point the aggregated results of all of the responses were made public on Metaculus’ website. 
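For readers unfamiliar with quantile elicitation, one simple way to turn a median and quartiles into a distribution is sketched below. This is an illustration on our part, not a description of how Metaculus fits distributions, and it assumes a normal shape, for which the interquartile range spans roughly 1.349 standard deviations.

```python
from scipy import stats

def normal_from_quartiles(q25, median, q75):
    # For a normal distribution the interquartile range is about 1.349 standard
    # deviations, so the spread can be recovered from the two quartile forecasts.
    sigma = (q75 - q25) / 1.349
    return stats.norm(loc=median, scale=sigma)

# Hypothetical reviewer forecast for an impact score: median 6, quartiles 5 and 7.
dist = normal_from_quartiles(5, 6, 7)
print(dist.mean(), dist.std())
```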

Additionally, in the Google form, reviewers were asked to answer a survey question about their experience: “What did you think about this review process? Did it prompt you to think about the proposal in a different way than when you normally review proposals? If so, how? What did you like about it? What did you not like? What would you change about it if you could?” 

Some participants did not complete their review. We received 19 complete reviews in the end, with each proposal receiving three to six reviews. 

Study Limitations

Our pilot study had certain limitations that should be noted. Since FAS is not a grantmaking institution, we could not completely reproduce either the types of research proposals that a grantmaking institution would receive or the entire review process. We highlight these differences below in comparison to federal science agencies, which are our primary focus.

  1. Review Process: There are typically two phases to peer review at NIH and NSF. First, at least three individual reviewers with relevant subject matter expertise are assigned to read and evaluate a proposal independently. Then, a larger committee of experts is convened. There, the assigned reviewers present the proposal and their evaluation, and then the committee discusses and determines the final score for the proposal. Our pilot study only attempted to replicate the first phase of individual review.
  2. Sample Size: In our pilot, the sample size was quite small, since only five proposals were reviewed, and they were all in different subfields, so different reviewers were assigned to each proposal. NIH and NSF peer review committees typically focus on one subfield and review on the order of twenty or so proposals. The number of reviewers per proposal–three to six–in our pilot was consistent with the number of reviewers typically assigned to a proposal by NIH and NSF. Peer review committees are typically larger, ranging from six to twenty people, depending on the agency and the field.
  3. Proposals: The FRO proposals plus supplementary information were only two to four pages long, which is significantly shorter than the 12- to 15-page proposals that researchers submit for NIH and NSF grants. Proposal authors were asked to generally describe their research concept, but were not explicitly required to describe the details of the research methodology they would use or any preliminary research. Some proposal authors volunteered more information on this for the supplementary information, but not all authors did.
  4. Grant Size: For the FRO proposals, reviewers were asked to assume that funded proposals would receive $50 million over five years, which is one to two orders of magnitude more funding than typical NIH and NSF proposals receive.

Appendix B: Feedback on Study-Specific Implementation

In addition to feedback about the review framework, we received feedback on how we implemented our pilot study, specifically the instructions and materials for the review process and the submission platforms. This feedback isn’t central to this paper’s investigation of expected utility forecasting, but we wanted to include it in the appendix for transparency.

Reviewers were sent instructions over email that outlined the review process and linked to Metaculus’ webpage for this pilot. On Metaculus’ website, reviewers could find links to the proposals on FAS’ website and the supplementary information in Google docs. Reviewers were expected to read those first and then read through the resolution criteria for each forecasting question before submitting their answers on Metaculus’ platform. Reviewers were asked to submit the explanations behind their forecasts in a separate Google form.

Some reviewers had no problem navigating the review process and found Metaculus’ website easy to use. However, feedback from other reviewers suggested that the different components necessary for the review were spread out over too many different websites, making it difficult for reviewers to keep track of where to find everything they needed.

Some had trouble locating the different materials and pieces of information needed to conduct the review on Metaculus’ website. Others found it confusing to have to submit their forecasts and explanations in two separate places. One reviewer suggested that the explanation of the impact scoring system should have been included within the instructions sent over email rather than in the resolution criteria on Metaculus’ website so that they could have read it before reading the proposal. Another reviewer suggested that it would have been simpler to submit their forecasts through the same Google form that they used to submit their explanations rather than through Metaculus’ website. 

Based on this feedback, we would recommend that future implementations streamline their submission process to a single platform and provide a more extensive set of instructions up front rather than scattering information across different steps of the review process. Training sessions, which science funding agencies typically conduct, would be a good supplement to written instructions.

Appendix C: Total Expected Utility Calculations

To calculate the total expected utility, we first converted all of the impact scores into utility by raising two to the power of the impact score, since the impact scoring system is base-2 exponential:

Utility = 2^(Impact Score).

We then were able to average the utilities for each milestone and conduct additional calculations. 

To calculate the total utility of each milestone, u_i, we averaged the social utility and the scientific utility of the milestone:

u_i = (Social Utility + Scientific Utility) / 2.

The total expected utility (TEU) of a proposal with two milestones can be calculated according to the general equation:

TEU = u_1·P(m_1 ∩ not m_2) + u_2·P(m_2 ∩ not m_1) + (u_1 + u_2)·P(m_1 ∩ m_2),

where P(m_i) represents the probability of success of milestone i and

P(m_1 ∩ not m_2) = P(m_1) – P(m_1 ∩ m_2)
P(m_2 ∩ not m_1) = P(m_2) – P(m_1 ∩ m_2).

For sequential milestones, milestone 2 is defined as inclusive of milestone 1 and wholly dependent on the success of milestone 1, so this means that

u_2,seq = u_1 + u_2
P(m_2) = P_seq(m_1 ∩ m_2)
P(m_2 ∩ not m_1) = 0.

Thus, the total expected utility of sequential milestones can be simplified as

TEU = u_1·P(m_1) – u_1·P(m_2) + u_2,seq·P(m_2)
TEU = u_1·P(m_1) + (u_2,seq – u_1)·P(m_2).

This can be generalized to

TEU_seq = Σ_i (u_i,seq – u_(i-1),seq)·P(m_i), where u_0,seq = 0.

Otherwise, the total expected utility can be simplified to 

TEU = u_1·P(m_1) + u_2·P(m_2) – (u_1 + u_2)·P(m_1 ∩ m_2).

For independent outcomes, we assume 

P_ind(m_1 ∩ m_2) = P(m_1)·P(m_2),

so

TEU_ind = u_1·P(m_1) + u_2·P(m_2) – (u_1 + u_2)·P(m_1)·P(m_2).

To present the results in Tables 1 and 2, we converted all of the utility values back into the impact score scale by taking the log base 2 of the results.
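For illustration only, here is a minimal Python sketch of these calculations using made-up forecast values; the function and variable names are ours and are not part of the pilot’s tooling.

```python
import math

def utility(impact_score):
    # The impact scale is base-2 exponential, so a score s corresponds to 2**s utility.
    return 2 ** impact_score

def milestone_utility(scientific_score, social_score):
    # Total utility of a milestone: the average of its scientific and social utility.
    return (utility(scientific_score) + utility(social_score)) / 2

def total_expected_utility(p1, p2, p_both, u1, u2):
    # General two-milestone equation from this appendix:
    # TEU = u1*P(m1 and not m2) + u2*P(m2 and not m1) + (u1 + u2)*P(m1 and m2).
    return u1 * (p1 - p_both) + u2 * (p2 - p_both) + (u1 + u2) * p_both

# Hypothetical average forecasts for a proposal with two sequential milestones,
# where milestone 2 requires milestone 1, so P(m1 and m2) = P(m2).
u1 = milestone_utility(scientific_score=7.0, social_score=6.0)
u2 = milestone_utility(scientific_score=8.0, social_score=7.5)
teu = total_expected_utility(p1=0.8, p2=0.4, p_both=0.4, u1=u1, u2=u2)

# Convert back to the impact-score scale for reporting, as described above.
print(round(math.log2(teu), 2))
```

For independent outcomes, p_both would instead be set to p1 * p2, per the independence assumption above.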

Scaling AI Safely: Can Preparedness Frameworks Pull Their Weight?

A new class of risk mitigation policies has recently come into vogue for frontier AI developers. Known alternately as Responsible Scaling Policies or Preparedness Frameworks, these policies outline commitments to risk mitigations that developers of the most advanced AI models will implement as their models display increasingly risky capabilities. While the idea for these policies is less than a year old, already two of the most advanced AI developers, Anthropic and OpenAI, have published initial versions of these policies. The U.K. AI Safety Institute asked frontier AI developers about their “Responsible Capability Scaling” policies ahead of the November 2023 UK AI Safety Summit. It seems that these policies are here to stay.

The National Institute of Standards & Technology (NIST) recently sought public input on its assignments regarding generative AI risk management, AI evaluation, and red-teaming. The Federation of American Scientists was happy to provide input; this is the full text of our response. NIST’s request for information (RFI) highlighted several potential risks and impacts of potentially dual-use foundation models, including: “Negative effects of system interaction and tool use…chemical, biological, radiological, and nuclear (CBRN) risks…[e]nhancing or otherwise affecting malign cyber actors’ capabilities…[and i]mpacts to individuals and society.” This RFI presented a good opportunity for us to discuss the benefits and drawbacks of these new risk mitigation policies.

This report will provide some background on this class of risk mitigation policies (we use the term Preparedness Framework, for reasons to be described below). We outline suggested criteria for robust Preparedness Frameworks (PFs) and evaluate two key documents, Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework, against these criteria. We claim that these policies are net-positive and should be encouraged. At the same time, we identify shortcomings of current PFs, chiefly that they are underspecified, insufficiently conservative, and address structural risks poorly. Improvement in the state of the art of risk evaluation for frontier AI models is a prerequisite for a meaningfully binding PF. Most importantly, PFs, as unilateral commitments by private actors, cannot replace public policy.

Motivation for Preparedness Frameworks

As AI labs develop potentially dual-use foundation models (as defined by Executive Order No. 14110, the “AI EO”) with capability, compute, and efficiency improvements, novel risks may emerge, some of them potentially catastrophic. Today’s foundation models can already cause harm and pose some risks, especially as they are more broadly used, and advanced large language models at times display unpredictable behaviors.

To this point, these harms have not risen to the level of posing catastrophic risks, defined here broadly as “devastating consequences for vast numbers of people.” The capabilities of models at the current state of the art simply do not imply levels of catastrophic risk above current non-AI related margins.1 However, as these models continue to scale in training compute, some speculate they may develop novel capabilities that could potentially be misused. The specific capabilities that will emerge from further scaling remain difficult to predict with confidence. Some analysis indicates that as training compute for AI models has doubled approximately every six months since 2015, performance on capability benchmarks has also steadily improved. While it is possible that bigger models will continue to perform better, it would not be surprising if smaller models emerged with comparable or better capabilities: despite years of research by machine learning theorists, our understanding of how the number of model parameters relates to model capabilities remains uncertain.

Nonetheless, as capabilities increase, risks may also increase, and new risks may appear. Executive Order 14110 detailed some novel risks of potentially dual-use foundation models, including chemical, biological, radiological, or nuclear (CBRN) risks and advanced cybersecurity risks. Other risks are more speculative, such as risks of model autonomy, loss of control of AI systems, or negative impacts on users including risks of persuasion.2 Without robust risk mitigations, it is plausible that increasingly powerful AI systems will eventually pose greater societal risks.

Other technologies that pose catastrophic risks, such as nuclear technologies, are heavily regulated in order to prevent those risks from resulting in serious harms. There is a growing movement to regulate development of potentially dual-use biotechnologies, particularly gain-of-function research on the most pathogenic microbes. Given the rapid pace of progress at the AI frontier, comprehensive government regulation has yet to catch up; private companies that develop these models are starting to take it upon themselves to prevent or mitigate the risks of advanced AI development.

Prevention of such novel and consequential risks requires developers to implement policies that address potential risks iteratively. That is where preparedness frameworks come in. A preparedness framework is used to assess risk levels across key categories and outline associated risk mitigations. As the introduction to OpenAI’s PF states, “The processes laid out in each version of the Preparedness Framework will help us rapidly improve our understanding of the science and empirical texture of catastrophic risk, and establish the processes needed to protect against unsafe development.” Without such processes and commitments, the tendency to prioritize speed over safety concerns might prevail. While the exact consequences of failing to mitigate these risks are uncertain, they could potentially be significant.

Preparedness frameworks are limited in scope to catastrophic risks. These policies aim to prevent the worst conceivable outcomes of the development of future advanced AI systems; they are not intended to cover risks from existing systems. We acknowledge that this is an important limitation of preparedness frameworks. Developers can and should address both today’s risks and future risks at the same time; preparedness frameworks attempt to address the latter, while other “trustworthy AI” policies attempt to address a broader swathe of risks. For instance, OpenAI’s “Preparedness” team sits alongside its “Safety Systems” team, which “focuses on mitigating misuse of current models and products like ChatGPT.”

A note about terminology: The term “Responsible Scaling Policy” (RSP) is the term that took hold first, but it presupposes scaling of compute and capabilities by default. “Preparedness Framework” (PF) is a term coined by OpenAI, and it communicates the idea that the company needs to be prepared as its models approach the level of artificial general intelligence. Of the two options, “Preparedness Framework” communicates the essential idea more clearly: developers of potentially dual-use foundation models must be prepared for and mitigate potential catastrophic risks from development of these models.

The Industry Landscape

In September of 2023, ARC Evals (now METR, “Model Evaluation & Threat Research”) published a blog post titled “Responsible Scaling Policies (RSPs).” This post outlined the motivation and basic structure of an RSP, and revealed that ARC Evals had helped Anthropic write its RSP (version 1.0) which had been released publicly a few days prior. (ARC Evals had also run pre-deployment evaluations on Anthropic’s Claude model and OpenAI’s GPT-4.) And in December 2023, OpenAI published its Preparedness Framework in beta; while using new terminology, this document is structurally similar to ARC Evals’ outline of the structure of an RSP. Both OpenAI and Anthropic have indicated that they plan to update their PFs with new information as the frontier of AI development advances.

Not every AI company should develop or maintain a preparedness framework. Since these policies relate to catastrophic risk from models with advanced capabilities, only those developers whose models could plausibly attain those capabilities should use PFs. Because these advanced capabilities are associated with high levels of training compute, a good interim threshold for who should develop a PF could be the same as the AI EO threshold for potentially dual-use foundation models; that is, developers of models trained on over 10^26 floating-point operations (or an October 2023-equivalent level of compute, adjusted for compute efficiency gains).3 Currently, only a handful of developers have models that even approach this threshold. This threshold should be subject to change, like that of the AI EO, as developers continue to push the frontier (e.g., by developing more efficient algorithms or realizing other compute efficiency gains).

While several other companies published “Responsible Capability Scaling” documents ahead of the UK AI Safety Summit, including DeepMind, Meta, Microsoft, Amazon, and Inflection AI, the rest of this report focuses primarily on OpenAI’s PF and Anthropic’s RSP. 

Weaknesses of Preparedness Frameworks

Preparedness frameworks are not panaceas for AI-associated risks. Even with improvements in specificity, transparency, and strengthened risk mitigations, there are important weaknesses to the use of PFs. Here we outline two such weaknesses and possible responses to them.

1. Spirit vs. text: PFs are voluntary commitments whose success depends on developers’ faithfulness to their principles.

Current risk thresholds and mitigations are defined loosely. In Anthropic’s RSP, for instance, the jump from the current risk level posed by Claude 2 (its state of the art model) to the next risk level is defined in part by the following: “Access to the model would substantially increase the risk of catastrophic misuse, either by proliferating capabilities, lowering costs, or enabling new methods of attack….” A “substantial increase” is not well-defined. This ambiguity leaves room for interpretation; since implementing risk mitigations can be costly, developers could have an incentive to take advantage of such ambiguity if they do not follow the spirit of the policy.

This concern about the gap between following the spirit of the PF and following the text might be somewhat eased with more specificity about risk thresholds and associated mitigations, and especially with more transparency and public accountability to these commitments.

To their credit, OpenAI’s PF and Anthropic’s RSP show a serious approach to the risks of developing increasingly advanced AI systems. OpenAI’s PF includes a commitment to fine-tune its models to better elicit capabilities along particular risk categories, then evaluate “against these enhanced models to ensure we are testing against the ‘worst case’ scenario we know of.” They also commit to triggering risk mitigations “when any of the tracked risk categories increase in severity, rather than only when they all increase together.” And Anthropic “commit[s] to pause the scaling and/or delay the deployment of new models whenever our scaling ability outstrips our ability to comply with the safety procedures for the corresponding ASL [AI Safety Level].” These commitments are costly signals that these developers are serious about their PFs.

2. Private commitment vs. public policy: PFs are unilateral commitments that individual developers take on; we might prefer more universal policy (or regulatory) approaches.

Private companies developing AI systems may not fully account for broader societal risks. Consider an analogy to climate change—no single company’s emissions are solely responsible for risks like sea level rise or extreme weather. The risk comes from the aggregate emissions of all companies. Similarly, AI developers may not consider how their systems interact with others across society, potentially creating structural risks. Like climate change, the societal risks from AI will likely come from the cumulative impact of many different systems. Unilateral commitments are poor tools to address such risks.

Furthermore, PFs might reduce the urgency for government intervention. By appearing safety-conscious, developers could diminish the perceived need for regulatory measures. Policymakers might over-rely on self-regulation by AI developers, potentially compromising public interest for private gains.

Policy can and should step into the gap left by PFs. Policy is more aligned to the public good, and as such is less subject to competing incentives. And policy can be enforced, unlike voluntary commitments. In general, preparedness frameworks and similar policies help hold private actors accountable to their public commitments; this effect is stronger with more specificity in defining risk thresholds, better evaluation methods, and more transparency in reporting. However, these policies cannot and should not replace government action to reduce catastrophic risks (especially structural risks) of frontier AI systems.

Suggested Criteria for Robust Preparedness Frameworks

These criteria are adapted from the ARC Evals post, Anthropic’s RSP, and OpenAI’s PF. Broadly, they are aspirational; no existing preparedness framework meets all or most of these criteria.

For each criterion, we explain the key considerations for developers adopting PFs. We analyze OpenAI’s PF and Anthropic’s RSP to illustrate the strengths and shortcomings of their approaches. Again, these policies are net-positive and should be encouraged. They demonstrate costly unilateral commitments to measuring and addressing catastrophic risk from their models; they meaningfully improve on the status quo. However, these initial PFs are underspecified and insufficiently conservative. Improvement in the state of the art of risk evaluation and mitigation, and subsequent updates, would make them more robust.

Table 1: Summary of suggested criteria for robust preparedness frameworks.
Criterion | Description | Guiding question
Breadth | Preparedness frameworks should cover the breadth of potential catastrophic risks of developing frontier AI models. | “What risks are covered?”
Risk appetite | Preparedness frameworks should define the developer’s acceptable risk level (“risk appetite”) in terms of likelihood and severity of risk. | “What is an acceptable level of risk?”
Clarity | Preparedness frameworks should clearly define capability levels and risk thresholds. | “How will developers know they have hit capability levels associated with particular risks?”
Evaluation | Preparedness frameworks should include detailed evaluation procedures for AI models, ensuring comprehensive risk assessment. | “What tests will developers run on their models?”
Mitigation | For different risk thresholds, preparedness frameworks should identify and commit to pre-specified risk mitigations. | “What will developers do when their models reach particular levels of risk?”
Robustness | Preparedness frameworks’ pre-specified risk mitigations must effectively address potentially catastrophic risks. | “How do developers know their risk mitigations will work?”
Accountability | Preparedness frameworks should combine credible risk mitigation commitments with governance structures that ensure these commitments are fulfilled. | “How can developers hold themselves accountable to their commitment to safety?”
Amendments | Preparedness frameworks should include a mechanism for regular updates to the framework itself, in light of ongoing research and advances in AI. | “How will developers change their PFs over time?”
Transparency | For models with risk above the lowest level, both pre- and post-mitigation evaluation results and methods should be public, including any performed mitigations. | “How will developers communicate about their models’ capabilities and risks?”

1. Preparedness frameworks should cover the breadth of potential catastrophic risks of developing frontier AI models. 

These risks may include, for example, models that enable the creation of CBRN threats, misuse that proliferates dangerous capabilities or enables new methods of attack, and risks from model autonomy.

Preparedness frameworks should apply to catastrophic risks in particular because they govern the scaling of capabilities of the most advanced AI models, and because catastrophic risks are of the highest consequence to such development. PFs are one tool among many that developers of the most advanced AI models should use to prevent harm. Developers of advanced AI models tend to also have other “trustworthy AI” policies, which seek to prevent and address already-existing risks such as harmful outputs, disinformation, and synthetic sexual content. Despite PFs’ focus on potentially catastrophic risks, faithfully applying PFs may help developers catch many other kinds of risks as well, since they involve extensive evaluation for misuse potential and adverse human impacts.

2. Preparedness frameworks should define the developer’s acceptable risk level (“risk appetite”) in terms of likelihood and severity of risk, in accordance with the NIST AI Risk Management Framework, section Map 1.5.

Neither OpenAI nor Anthropic has publicly declared its risk appetite. This is a nascent field of research, as these risks are novel and perhaps less predictable than, e.g., nuclear accident risk.5 NIST and other standard-setting bodies will be crucial in developing AI risk metrology. For now, PFs should state developers’ risk appetites as clearly as possible, and update them regularly as research advances.6

AI developers’ risk appetites might be different than a regulatory risk appetite. Developers should elucidate their risk appetite in quantitative terms so their PFs can be evaluated accordingly. As in the case of nuclear technology, regulators may eventually impose risk thresholds on frontier AI developers. At this point, however, there is no standard, scientifically-grounded approach to measuring the potential for catastrophic AI risk; this has to start with the developers of the most capable AI models.
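
As a purely illustrative sketch of what stating a quantitative risk appetite could look like (all severity labels, probability caps, and names below are assumptions for exposition, not figures any developer or regulator has published), a risk appetite might be expressed as caps on the estimated likelihood of outcomes at or above a given severity:

```python
from dataclasses import dataclass

# Purely illustrative: a quantitative "risk appetite" expressed as caps on the
# estimated annual probability of outcomes at or above a given severity level.
# All labels and thresholds are made-up placeholders for exposition.


@dataclass
class RiskAppetite:
    severity: str                 # e.g., "catastrophic" or "severe"
    max_annual_probability: float  # cap on estimated annual probability


APPETITE = [
    RiskAppetite("catastrophic", 1e-6),
    RiskAppetite("severe", 1e-4),
]


def within_appetite(estimated_probs: dict[str, float]) -> bool:
    """Check estimated annual risk (from hypothetical evaluations) against each declared cap."""
    return all(estimated_probs.get(a.severity, 0.0) <= a.max_annual_probability
               for a in APPETITE)


# Example with hypothetical evaluation-derived estimates:
print(within_appetite({"catastrophic": 5e-7, "severe": 2e-4}))  # False: "severe" cap exceeded
```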

3. Preparedness frameworks should clearly define capability levels and risk thresholds. Risk thresholds should be quantified robustly enough to hold developers accountable to their commitments.

OpenAI and Anthropic both outline qualitative risk thresholds corresponding with different categories of risk. For instance, in OpenAI’s PF, the High risk threshold in the CBRN category reads: “Model enables an expert to develop a novel threat vector OR model provides meaningfully improved assistance that enables anyone with basic training in a relevant field (e.g., introductory undergraduate biology course) to be able to create a CBRN threat.” And Anthropic’s RSP defines the ASL-3 [AI Safety Level] threshold as: “Low-level autonomous capabilities, or access to the model would substantially increase the risk of catastrophic misuse, either by proliferating capabilities, lowering costs, or enabling new methods of attack, as compared to a non-LLM baseline of risk.”

These qualitative thresholds are under-specified; reasonable people are likely to differ on what “meaningfully improved assistance” looks like, or a “substantial increase [in] the risk of catastrophic misuse.” In PFs, these thresholds should be quantified to the extent possible.

To be sure, the AI research community currently lacks a good empirical understanding of the likelihood and magnitude of frontier AI-related risks. Again, this is a novel science that needs to be developed with input from both the private and public sectors. While this science is still developing, it is natural to want to avoid over-quantification: a conceivable failure mode is that developers merely “check the boxes” of quantified thresholds that quickly become obsolete, in lieu of using their judgment to determine when capabilities are dangerous enough to warrant stronger risk mitigations. Again, as research improves, we should expect to see improvements in PFs’ specification of risk thresholds.

4. Preparedness frameworks should include detailed evaluation procedures for AI models, ensuring comprehensive risk assessment within a developer’s tolerance. 

Anthropic and OpenAI both have room for improvement on detailing their evaluation procedures. Anthropic’s RSP includes evaluation procedures for model autonomy and misuse risks. Its evaluation procedures for model autonomy are impressively detailed, including clearly defined tasks on which it will evaluate its models. Its evaluation procedures for misuse risk are much less well-defined, though it does include the following note: “We stress that this will be hard and require iteration. There are fundamental uncertainties and disagreements about every layer…It will take time, consultation with experts, and continual updating.” And OpenAI’s PF includes a “Model Scorecard,” a mock evaluation of an advanced AI model. This model scorecard includes the hypothetical results of various evaluations in all four of their tracked risk categories; it does not appear to be a comprehensive list of evaluation procedures.

Again, the science of AI model evaluation is young. The AI EO directs NIST to develop red-teaming guidance for developers of potentially dual-use foundation models. NIST, along with private actors such as METR and other AI evaluators, will play a crucial role in creating and testing red-teaming practices and model evaluations that elicit all relevant capabilities.

5. For different risk thresholds, preparedness frameworks should identify and commit to pre-specified risk mitigations.

Classes of risk mitigations may include, for example, restrictions on deployment, security hardening to protect model weights, limits on access to training techniques and model details, and pausing further scaling.

Both OpenAI’s PF and Anthropic’s RSP commit to a number of pre-specified risk mitigations for different thresholds. For example, for what Anthropic calls “ASL-2” models (including its most advanced model, Claude 2), they commit to measures including publishing model cards, providing a vulnerability reporting mechanism, enforcing an acceptable use policy, and more. Models at higher risk thresholds (what Anthropic calls “ASL-3” and above) have different, more stringent risk mitigations, including “limit[ing] access to training techniques and model hyperparameters…” and “implement[ing] measures designed to harden our security…”

Risk mitigations can and should differ in approaches to development versus deployment. There are different levels of risk associated with possessing models internally and allowing external actors to interact with them. Both OpenAI’s PF and Anthropic’s RSP include different risk mitigation approaches for development and deployment. For example, OpenAI’s PF restricts deployment such that “Only models with a post-mitigation score of ‘medium’ or below can be deployed,” whereas it restricts development such that “Only models with a post-mitigation score of ‘high’ or below can be developed further.”
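
Read schematically, the quoted deployment and development rules amount to an ordered comparison of post-mitigation risk scores. The sketch below is one possible reading; the score ordering and helper names are our assumptions, not OpenAI’s implementation.

```python
# Schematic reading of the quoted rules: deploy only if post-mitigation risk is
# "medium" or below; continue development only if it is "high" or below.
# The ordering and function names are illustrative assumptions.

RISK_ORDER = ["low", "medium", "high", "critical"]


def _rank(score: str) -> int:
    return RISK_ORDER.index(score)


def may_deploy(post_mitigation_score: str) -> bool:
    return _rank(post_mitigation_score) <= _rank("medium")


def may_continue_development(post_mitigation_score: str) -> bool:
    return _rank(post_mitigation_score) <= _rank("high")


print(may_deploy("high"))                    # False: too risky to deploy
print(may_continue_development("high"))      # True: development may continue
print(may_continue_development("critical"))  # False: development must stop
```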

Mitigations should be defined as specifically as possible, with the understanding that as the state of the art changes, this too is an area that will require periodic updates. Developers should include some room for judgment here.

6. Preparedness frameworks’ pre-specified risk mitigations must effectively address potentially catastrophic risks.

Having confidence that the risk mitigations do in fact address potential catastrophic risks is perhaps the most important and difficult aspect of a PF to evaluate. Catastrophic risk from AI is a novel and speculative field; evaluating AI capabilities is a science in its infancy; and there are no empirical studies of the effectiveness of risk mitigations preventing such risks. Given this uncertainty, frontier AI developers should err on the side of caution.

Both OpenAI and Anthropic should be more conservative in their risk mitigations. Consider OpenAI’s commitment to restricting development: “[I]f we reach (or are forecasted to reach) ‘critical’ pre-mitigation risk along any risk category, we commit to ensuring there are sufficient mitigations in place…for the overall post-mitigation risk to be back at most to ‘high’ level.” To understand this commitment, we have to look at their threshold definitions. Under the Model Autonomy category, the “critical” threshold in part includes: “model can self-exfiltrate under current prevailing security.” Setting aside that this threshold is still quite vague and difficult to evaluate (and setting aside the novelty of this capability), a model that approaches or exceeds this threshold by definition can self-exfiltrate, rendering all other risk mitigations ineffective. A more robust approach to restricting development would not permit training or possessing a model that comes close to exceeding this threshold.

As for Anthropic, consider their threshold for “ASL-3,” which reads in part: “Access to the model would substantially increase the risk of catastrophic misuse…” The risk mitigations for ASL-3 models include the following: “Harden security such that non-state attackers are unlikely to be able to steal model weights and advanced threat actors (e.g. states) cannot steal them without significant expense.” While this is an admirable approach to the development of potentially dual-use foundation models, state actors can be expected to seek out tools whose misuse involves catastrophic risk; a more conservative mitigation would therefore entail hardening security such that it is unlikely that any actor, state or non-state, could steal the model weights of such a model.9

7. Preparedness frameworks should combine credible risk mitigation commitments with governance structures that ensure these commitments are fulfilled.

Preparedness Frameworks should detail governance structures that incentivize actually undertaking pre-committed risk mitigations when thresholds are met. Other incentives, including profit and shareholder value, sometimes conflict with risk management.

Anthropic’s RSP includes a number of procedural commitments meant to enhance the credibility of its risk mitigation commitments. For example, Anthropic commits to proactively planning to pause scaling of its models,10 publicly sharing evaluation results, and appointing a “Responsible Scaling Officer.” However, Anthropic’s RSP also includes the following clause: “[I]n a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped…we could envisage a substantial loosening of these restrictions as an emergency response…” This clause potentially undermines the credibility of Anthropic’s other commitments in the RSP, if at any time it can point to another actor who in its view is scaling recklessly.

OpenAI’s PF also outlines commendable governance measures, including procedural commitments, meant to enhance its risk mitigation credibility. It summarizes its operational structure: “(1) [T]here is a dedicated team “on the ground” focused on preparedness research and monitoring (Preparedness team), (2) there is an advisory group (Safety Advisory Group) that has a sufficient diversity of perspectives and technical expertise to provide nuanced input and recommendations, and (3) there is a final decision-maker (OpenAI Leadership, with the option for the OpenAI Board of Directors to overrule).”

8. Preparedness frameworks should include a mechanism for regular updates to the framework itself, in light of ongoing research and advances in AI.

Both OpenAI’s PF and Anthropic’s RSP acknowledge the importance of regular updates. This is reflected in both of these documents’ names: Anthropic labels its RSP as “Version 1.0,” while OpenAI’s PF is labeled as “(Beta).”

Anthropic’s RSP includes an “Update Process” that reads in part: “We expect most updates to this process to be incremental…as we learn more about model safety features or unexpected capabilities…” This language directly commits Anthropic to changing its RSP as the state of the art changes. OpenAI references updates throughout its PF, notably committing to updating its evaluation methods and rubrics (“The Scorecard will be regularly updated by the Preparedness team to help ensure it reflects the latest research and findings”).

9. For models with risk above the lowest level, most evaluation results and methods should be public, including any performed mitigations.

Publishing model evaluations and mitigations is an important tool for holding developers accountable to their PF commitments. Striking the right level of transparency is key, however: full information about evaluation methodology and risk mitigations could, for example, be exploited by malicious actors. Anthropic’s RSP takes a balanced approach in committing to “[p]ublicly share evaluation results after model deployment where possible, in some cases in the initial model card, in other cases with a delay if it serves a broad safety interest.” OpenAI’s PF does not commit to publishing its Model Scorecards, but OpenAI has since published related research on whether its models aid the creation of biological threats.

Conclusion

Preparedness frameworks represent a promising approach for AI developers to voluntarily commit to robust risk management practices. However, current versions have weaknesses—particularly their lack of specificity in risk thresholds, insufficiently conservative risk mitigation approaches, and inadequacy in addressing structural risks. Frontier AI developers without PFs should consider adopting them, and OpenAI and Anthropic should update their policies to strengthen risk mitigations and include more specificity.

Strengthening preparedness frameworks will require advancing AI safety science to enable precise risk quantification and develop new mitigations. NIST, academics, and companies plan to collaborate to measure and model frontier AI risks. Policymakers have a crucial opportunity to adapt regulatory approaches from other high-risk technologies like nuclear power to balance AI innovation and catastrophic risk prevention. Furthermore, standards bodies could develop more robust AI evaluations best practices, including guidance for third-party auditors.

Overall, the AI community as a whole must treat safety as an intrinsic priority; responsibility cannot rest solely with the private actors creating preparedness frameworks. All stakeholders, including private companies, academics, policymakers, and civil society organizations, have roles to play in steering AI development toward societally beneficial outcomes. Preparedness frameworks are one tool, but they are not sufficient absent more comprehensive, multi-stakeholder efforts to scale AI safely and for the public good.

Many thanks to Madeleine Chang, Di Cooke, Thomas Woodside, and Felipe Calero Forero for providing helpful feedback.

Working with academics: A primer for U.S. government agencies

Collaboration between federal agencies and academic researchers is an important tool for public policy. By facilitating the exchange of knowledge, ideas, and talent, these partnerships can help address pressing societal challenges. But because it is rarely in either party’s job description to conduct outreach and build relationships with the other, many important dynamics are often hidden from view. This primer provides an initial set of questions and topics for agencies to consider when exploring academic partnership.

Why should agencies consider working with academics?

What considerations may arise when working with academics?

Characteristics of discussed collaborative structures
Structure | Primary need | Potential mechanisms | Structural complexity | Level of effort
Informal advising | Knowledge >> Capacity | Ad-hoc engagement; formal consulting agreement | Low | Occasional work, over the short- to long-term
Study groups | Knowledge > Capacity | Informal working group; formal extramural award | Moderate | Occasional to part-time work, over the short- to medium-term
Collaborative research | Capacity ~= Knowledge | Informal research partnership, formal grant, or cooperative agreement / contract | Variable | Part-time work, over the medium- to long-term
Short-term placements | Capacity > Knowledge | IPA, OPM Schedule A(r), or expert contract; either ad-hoc or through a formal program | Moderate | Part- to full-time work, over a short- to medium-term
Long-term rotations | Capacity >> Knowledge | IPA, OPM Schedule A(r), or SGE designation; typically through a formal program | High | Full-time work, over a medium- to long-term
BOX 1. Key academic considerations
Academic career stages.

Academic faculty progress through different stages of professorship — typically assistant, associate, and full — that affect their research and teaching expectations and opportunities. Assistant professors are tenure-track faculty who need to secure funding, publish papers, and meet the standards for tenure. Associate professors have job security and academic freedom, but also more mentoring and leadership responsibilities; associate professors are typically tenured, though this is not always the case. Full professors are senior faculty who have a high reputation and recognition in their field, but also face more demands for service and supervision. The nature of agency-academic collaboration may depend on the seniority of the academic. For example, junior faculty may be more available to work with agencies, but primarily in contexts that will lead to traditional academic outputs, while senior faculty may be more selective but have the academic freedom to take on less formal, more impact-oriented work.

Soft vs. hard money positions.

Soft money positions are those that depend largely or entirely on external funding sources, typically research grants, to support the salary and expenses of the faculty. Hard money positions are those that are supported by the academic institution’s central funds, typically tied to more explicit (and more expansive) expectations for teaching and service than soft-money positions. Faculty in soft money positions may face more pressure to secure funding for research, while faculty in hard money positions may have more autonomy in their research agenda but more competing academic activities. Federal agencies should be aware of the funding situation of the academic faculty they collaborate with, as it may affect their incentives and expectations for agency engagement.

Sabbatical credits.

A sabbatical is a period of leave from regular academic duties, usually for one or two semesters, that allows faculty to pursue an intensive and unstructured scope of work — this can include research in their own field or others, as well as external engagements or tours of service with non-academic institutions. Faculty accrue sabbatical credits based on their length and type of service at the university, and may apply for a sabbatical once they have enough credits. The amount of salary received during a sabbatical depends on the number of credits and the duration of the leave. Federal agencies may benefit from collaborating with academic faculty who are on sabbatical, as they may have more time and interest to devote to impact-focused work.

Consulting/outside activity limits.

Consulting limits & outside activity limits are policies that regulate the amount of time that academic faculty can spend on professional activities outside their university employment. These policies are intended to prevent conflicts of commitment or interest that may interfere with the faculty’s primary obligations to the university, such as teaching, research, and service, and the specific limits vary by university. Federal agencies may need to consider these limits when engaging academic faculty in ongoing or high-commitment collaborations.

9 vs. 12 month salaries.

Some academic faculty are paid on a 9-month basis, meaning that they receive their annual salary over nine months and have the option to supplement their income with external funding or other activities during the summer months. Other faculty are paid on a 12-month basis, meaning that they receive their annual salary over twelve months and have less flexibility to pursue outside opportunities. Federal agencies may need to consider the salary structure of the academic faculty they work with, as it may affect their availability to engage on projects and the optimal timing with which they can do so.

Advisory relationships consist of an academic providing occasional or periodic guidance to a federal agency on a specific topic or issue, without being formally contracted or compensated. This type of collaboration can be useful for agencies that need access to cutting-edge expertise or perspectives, but do not have a formal deliverable in mind.

Academic considerations

Regulatory & structural considerations

Box 2. Key structural considerations
Regulatory guidance.

Federal agencies and academic institutions are subject to various laws and regulations that affect their research collaboration, and the ownership and use of the research outputs. Key legislation includes the Federal Advisory Committee Act (FACA), which governs advisory committees and ensures transparency and accountability; the Federal Acquisition Regulation (FAR), which controls the acquisition of supplies and services with appropriated funds; and the Federal Grant and Cooperative Agreement Act (FGCAA), which provides criteria for distinguishing between grants, cooperative agreements, and contracts. Agencies should ensure that collaborations are structured in accordance with these and other laws.

Contracting mechanisms.

Federal agencies may use various contracting mechanisms to engage researchers from non-federal entities in collaborative roles. These mechanisms include the IPA Mobility Program, which allows the temporary assignment of personnel between federal and non-federal organizations; the Experts & Consultants authority, which allows the appointment of qualified experts and consultants to positions that require only intermittent and/or temporary employment; and Cooperative Research and Development Agreements (CRADAs), which allow agencies to enter into collaborative agreements with non-federal partners to conduct research and development projects of mutual interest.

University Office of Sponsored Programs.

Offices of Sponsored Programs are units within universities that provide administrative support and oversight for externally funded research projects. OSPs are responsible for reviewing and approving proposals, negotiating and accepting awards, ensuring compliance with sponsor and university policies and regulations, and managing post-award activities such as reporting, invoicing, and auditing. Federal agencies typically interact with OSPs as the authorized representative of the university in matters related to sponsored research.

Non-disclosure agreements.

When engaging with academics, federal agencies may use NDAs to safeguard sensitive information. Agencies each have their own rules and procedures for using and enforcing NDAs involving their grantees and contractors. These rules and procedures vary, but generally require researchers to sign an NDA outlining rights and obligations relating to classified information, data, and research findings shared during collaborations.

A study group is a type of collaboration where an academic participates in a group of experts convened by a federal agency to conduct analysis or education on a specific topic or issue. The study group may produce a report or hold meetings to present their findings to the agency or other stakeholders. This type of collaboration can be useful for agencies that need to gather evidence or insights from multiple sources and disciplines with expertise relevant to their work.

Academic considerations

Regulatory & structural considerations

Case study

In 2022, the National Science Foundation (NSF) awarded the National Bureau of Economic Research (NBER) a grant to create the EAGER: Place-Based Innovation Policy Study Group. This group, led by two economists with expertise in entrepreneurship, innovation, and regional development — Jorge Guzman from Columbia University and Scott Stern from MIT — aimed to provide “timely insight for the NSF Regional Innovation Engines program.” During Fall 2022, the group met regularly with NSF staff to i) provide an assessment of the “state of knowledge” of place-based innovation ecosystems, ii) identify insights from this research to inform NSF staff on the design of their policies, and iii) surface potential means by which to measure and evaluate place-based innovation ecosystems on a rigorous and ongoing basis. Several of the academic leads then completed a paper synthesizing the opportunities and design considerations of the regional innovation engine model, based on the collaborative exploration and insights developed throughout the year. In this case, the study group was structured as a grant, with funding provided to the organizing institution (NBER) for personnel and convening costs. Yet other approaches are possible; for example, NSF recently launched a broader study group with the Institute for Progress, which is structured as a no-cost Other Transaction Authority contract.

Active collaboration covers scenarios in which an academic engages in joint research with a federal agency, either as a co-investigator, a subrecipient, a contractor, or a consultant. This type of collaboration can be useful for agencies that need to leverage the expertise, facilities, data, or networks of academics to conduct research that advances their mission, goals, or priorities.

Academic considerations

Regulatory & structural considerations

Case studies

External collaboration between academic researchers and government agencies has repeatedly proven fruitful for both parties. For example, in May 2020, the Rhode Island Department of Health partnered with researchers at Brown University’s Policy Lab to conduct a randomized controlled trial evaluating the effectiveness of different letter designs in encouraging COVID-19 testing. This study identified design principles that improved uptake of testing by 25–60% without increasing cost, and led to follow-on collaborations between the institutions. The North Carolina Office of Strategic Partnerships provides a prime example of how government agencies can take steps to facilitate these collaborations. The office recently launched the North Carolina Project Portal, which serves as a platform for the agency to share their research needs, and for external partners — including academics — to express interest in collaborating. Researchers are encouraged to contact the relevant project leads, who then assess interested parties on their expertise and capacity, extend an offer for a formal research partnership, and initiate the project.

Short-term placements allow for an academic researcher to work at a federal agency for a limited period of time (typically one year or less), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to fill temporary gaps in expertise, capacity, or leadership, or to foster cross-sector exchange and learning.

Academic considerations

Regulatory & structural considerations

Case studies

Various programs exist throughout government to facilitate short-term rotations of outside experts into federal agencies and offices. One of the most well-known examples is the American Association for the Advancement of Science (AAAS) Science & Technology Policy Fellowship (STPF) program, which places scientists and engineers from various disciplines and career stages in federal agencies for one year to apply their scientific knowledge and skills to inform policy making and implementation. The Schedule A(r) hiring authority tends to be well-suited for these kinds of fellowships; it is used, for example, by the Bureau of Economic Analysis to bring on early career fellows through the American Economic Association’s Summer Economics Fellows Program. In some circumstances, outside experts are brought into government “on loan” from their home institution to do a tour of service in a federal office or agency; in these cases, the IPA program can be a useful mechanism. IPAs are used by the National Science Foundation (NSF) in its Rotator Program, which brings outside scientists into the agency to serve as temporary Program Directors and bring cutting-edge knowledge to the agency’s grantmaking and priority-setting. IPA is also used for more ad-hoc talent needs; for example, the Office of Evaluation Sciences (OES) at GSA often uses it to bring in fellows and academic affiliates.

Long-term rotations allow an academic to work at a federal agency for an extended period of time (more than one year), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to recruit and retain expertise, capacity, or leadership in areas that are critical to their mission, goals, or priorities.

Academic considerations

Regulatory & structural considerations

Case study

One example of a long-term rotation that draws experts from academia into federal agency work is the Advanced Research Projects Agency (ARPA) Program Manager (PM) role. ARPA PMs — across DARPA, IARPA, ARPA-E, and now ARPA-H — are responsible for leading high-risk, high-reward research programs, and have considerable autonomy and authority in defining their research vision, selecting research performers, managing their research budget, and overseeing their research outcomes. PMs are typically recruited from academia, industry, or government for a term of three to five years, and are expected to return to their academic institutions or pursue other career opportunities after their term at the agency. PMs coming from academia or nonprofit organizations are often brought on through the IPA mobility program, and some entities also have unique, term-limited hiring authorities for this purpose. PMs can also be hired as full government employees; this mechanism is primarily used for candidates coming from the private sector.