Unlocking AI’s Grid Modernization Potential
Surging energy demand and increasingly frequent extreme weather events are bringing new challenges to the forefront of electric grid planning, permitting, operations, and resilience. These hurdles are pushing our already fragile grid to the limit, highlighting decades of underinvestment, stagnant growth, and the pressing need to modernize our system.
While these challenges aren’t new, they are newly urgent. The society-wide emergence of artificial intelligence (AI) is bringing many of these challenges into sharper focus, pushing the already increasing electricity demand to new heights and cementing the need for deployable, scalable, and impactful solutions. Fortunately, many transformational and mature AI tools provide near-term pathways for significant grid modernization.
This policy memo builds on foundational research from the US Department of Energy’s (DOE) AI for Energy (2024) report to present a new matrix that maps these unique AI applications onto a “readiness-impact” scale. Nearly half of the applications identified by DOE are high impact and ready to deploy today. An additional ~40% have high-impact potential but require further investment and research to move up the readiness scale. Only 2 of the 14 use cases analyzed here fall into the “low-impact / low-readiness” quadrant.
Unlike other emerging technologies, AI’s potential in grid modernization is not simply an R&D story, but a deployment one. However, with limited resources, the federal government should invest in use cases that show high-impact potential and demonstrate feasible levels of deployment readiness. The recommendations in this memo target regulatory actions across the Federal Energy Regulatory Commission (FERC) and DOE, data modernization programs at the Federal Permitting Improvement Steering Council (FPISC), and funding opportunities and pilot projects at DOE and the Federal Emergency Management Agency (FEMA).
Thoughtful policy coordination, targeted investments, and continued federal support will be needed to realize the potential of these applications and pave the way for further innovation.
Challenge and Opportunity
Surging Load Growth, Extreme Events, and a Fragmented Federal Response
Surging energy demand and more frequent extreme weather events are bringing new challenges to the forefront of grid planning and operations. Not only is electric load growing at rates not seen in decades, but extreme weather events and cybersecurity threats are becoming more common and costly. All the while, our grid is becoming more complex to operate as new sources of generation and grid management tools evolve. Underlying these complexities is the fragmented nature of our energy system: a patchwork of regional grids, localized standards, and often conflicting regulations.
The emergence of artificial intelligence (AI) has brought many of these challenges into sharper focus. However, the potential of AI to mitigate, sidestep, or solve these challenges is also vast. From more efficient permitting processes to more reliable grid operations, many unique AI use cases for grid modernization are ready to deploy today and have high-impact potential.
The federal government has a unique role to play in both meeting these challenges and catalyzing these opportunities by implementing AI solutions. However, the current federal landscape is fragmented, unaligned, and missing critical opportunities for impact. Nearly a dozen federal agencies and offices are engaged across the AI grid modernization ecosystem (see FAQ #2), with few coordinating in the absence of a defined federal strategy.
To prioritize effective and efficient deployment of resources, recommendations for increased investments (both in time and capital) should be based on a solid understanding of where the gaps and opportunities lie. Historically, program offices across DOE and other agencies have focused efforts on early-stage R&D and foundational science activities for emerging technology. For AI, however, the federal government is well-positioned to support further deployment of the technology into grid modernization efforts, rather than just traditional R&D activities.
AI Applications for Grid Modernization
AI’s potential in grid modernization is significant, expansive, and deployable. Across four distinct categories—grid planning, siting and permitting, operations and reliability, and resilience—AI can improve existing processes or enable entirely new ones. Indeed, the use of AI in the power sector is not a new phenomenon. Industry and government alike have long utilized machine learning (ML) models across a range of power sector applications, and the recent introduction of “foundation” models (such as large language models, or LLMs) has opened up a new suite of transformational use cases. While LLMs and other foundation models can be used in various use cases, AI’s potential to accelerate grid modernization will span both traditional and novel approaches, with many applications requiring custom-built models tailored to specific operational, regulatory, and data environments.
The following 14 use cases are drawn from DOE’s AI for Energy (2024) report and form the foundation of this memo’s analytical framework.
Grid Planning
- Capital Allocations and Planned Upgrades. Use AI to optimize utility investment decisions by forecasting asset risk, load growth, and grid needs to guide substation upgrades, reconductoring, or distributed energy resource (DER)-related capacity expansions.
- Improved Information on Grid Capacity. Use AI to generate more granular and dynamic hosting capacity, load forecast, and congestion data to guide DER siting, interconnection acceleration, and non-wires alternatives.
- Improved Transportation and Energy Planning Alignment. Use AI-enabled joint forecasting tools to align EV infrastructure rollout with utility grid planning by integrating traffic, land use, and load growth data.
- Interconnection Issues and Power Systems Models. Use AI-accelerated power flow models and queue screening tools to reduce delays and improve transparency in interconnection studies.
Siting and Permitting
- Zoning and Local Permitting Analysis. Use AI to analyze zoning ordinances, land use restrictions, and local permitting codes to identify siting barriers or opportunities earlier in the project development process.
- Federal Environmental Review Accelerations. Use AI tools to extract, organize, and summarize unstructured and disparate datasets to support more efficient and consistent reviews.
- AI Models to Assist Subject Matter Experts in Reviews. Use AI and document analysis tools to support expert reviewers by checking for completeness, inconsistencies, or precedent in technical applications and environmental documents.
Grid Operations and Reliability
- Load and Supply Matching. Use AI to improve short-term load forecasting and optimize generation dispatch, reducing imbalance costs and improving integration of variable resources.
- Predictive and Risk-Informed Maintenance. Use AI to predict asset degradation or failure and inform maintenance schedules based on equipment health, environmental stressors, and historical failure data.
- Operational Safety and Issues Reporting and Analysis. Apply AI to analyze safety incident logs, compliance records, and operator reports to identify patterns of human error, procedural risks, or training needs.
Grid Resilience
- Self-healing Infrastructure for Reliability and Resilience. Use AI to autonomously isolate faults, reconfigure power flows, and restore service in real time through intelligent switching and local control systems.
- Detection and Diagnosis of Anomalous Events. Use AI to identify and localize grid disturbances such as faults, voltage anomalies, or cyber intrusions using high-frequency telemetry and system behavior data (a minimal detection sketch follows this list).
- AI-enabled Situational Awareness and Actions for Resilience. Leverage AI to synthesize grid, weather, and asset data to support operator awareness and guide event response during extreme weather or grid stress events.
- Resilience with Distributed Energy Resources. Coordinate DERs during grid disruptions using AI for forecasting, dispatch, and microgrid formation, enabling system flexibility and backup power during emergencies.
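For concreteness, here is a minimal sketch of the anomalous-event detection use case referenced above, using an off-the-shelf unsupervised model on synthetic telemetry. The features, data, and contamination rate are illustrative assumptions, not a production detection pipeline.

```python
# A minimal anomaly-detection sketch: flag unusual telemetry windows with an
# unsupervised model. The synthetic PMU-style data is an assumption.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Columns: [voltage p.u., frequency Hz, current p.u.] under normal operation.
normal = rng.normal([1.00, 60.0, 0.80], [0.01, 0.02, 0.05], size=(5000, 3))
model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

window = np.array([[0.93, 59.7, 1.15]])  # a fault-like excursion
print("anomaly" if model.predict(window)[0] == -1 else "normal")
```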
However, not all applications are created equal. With limited resources, the federal government should prioritize use cases that show high-impact potential and demonstrate feasible levels of deployment readiness. Additional investments should also be allocated to high-impact / low-readiness use cases to help unlock and scale these applications.
Unlocking the potential of these use cases requires a better understanding of which ones hit specific benchmarks. The matrix below provides a framework for thinking through these questions.
Drawing on the use cases identified above, we’ve mapped AI’s applications in grid modernization onto a “readiness-impact” chart based on six scoring scales (see the appendix for the full methodology and scoring breakdown).
Readiness Scale Questions
- Technical Readiness. Is the AI solution mature, validated, and performant?
- Financial Readiness. Is it cost-effective and fundable (via CapEx, OpEx, or rate recovery)?
- Regulatory Readiness. Can it be deployed under existing rules, with institutional buy-in?
Impact Scale Questions
- Value. Does this AI solution reduce costs, outages, emissions, or delays in a measurable way?
- Leverage. Does it enable or unlock broader grid modernization (e.g., DERs, grid enhancing technologies (GETs), and/or virtual power plant (VPP) integration)?
- Fit. Is AI the right or necessary tool to solve this compared to conventional tools (i.e., traditional transmission planning, interconnection study, and/or compliance software)?
Each AI application receives a score of 0-5 in each category; these scores are then averaged to determine its overall readiness and impact scores. To score each application, a detailed rubric was designed with scoring scales for each of the six categories above. Industry examples and experience, existing literature, and outside expert consultation were used to assign scores to each application.
When plotted on a coordinate plane, each application falls into one of four quadrants, helping us easily identify key insights about each use case.
- High-Impact / High-Readiness use cases → Deploy now
- High-Impact / Low-Readiness → Invest, unlock, and scale
- Low-Impact / High-Readiness → Optional pilots, but deprioritize federal effort
- Low-Impact / Low-Readiness → Monitor private sector action
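For readers who want the mechanics, below is a minimal sketch of the scoring and quadrant assignment. The category scores are placeholders, and the 2.5 midpoint split is an assumption; the plotted chart defines the actual quadrant boundaries.

```python
from statistics import mean

# Illustrative 0-5 category scores for one application (placeholder values,
# not the memo's actual rubric scores).
scores = {
    "readiness": {"technical": 4, "financial": 2, "regulatory": 2},
    "impact": {"value": 5, "leverage": 4, "fit": 4},
}

readiness = mean(scores["readiness"].values())  # average of the 3 readiness scales
impact = mean(scores["impact"].values())        # average of the 3 impact scales

# Quadrant assignment, splitting each axis at the scale midpoint (2.5).
quadrant = (
    "Deploy now" if readiness >= 2.5 and impact >= 2.5
    else "Invest, unlock, and scale" if impact >= 2.5
    else "Optional pilots" if readiness >= 2.5
    else "Monitor private sector action"
)
print(f"readiness={readiness:.1f}, impact={impact:.1f} -> {quadrant}")
```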
Once plotted, we can then identify additional insights, such as where the clustering happens, what barriers are holding back the highest impact applications, and if there are recurring challenges (or opportunities) across the four categories of grid modernization efforts.

Plan of Action
Grid Planning
Average Readiness Score: 2.3 | Average Impact Score: 3.8
- AI use cases in grid planning face the highest financial and regulatory hurdles of any category. Reducing these barriers can unlock high-impact potential.
- These tools are high-leverage use cases. Getting them deployed unlocks deeper grid modernization activities system-wide, such as integration of grid-enhancing technologies (GETs).
- While many of these AI tools are technically mature, adoption is not yet mainstream.
Recommendation 1. The Federal Energy Regulatory Commission (FERC) should clarify the regulatory pathway for AI use cases in grid planning.
Regional Transmission Organizations (RTOs), utilities, and Public Utility Commissions (PUCs) require confidence that AI tools are approved and supported before they deploy them at scale. They also need financial clarity on viable pathways to rate-basing significant up-front costs. Building on Commissioner Rosner’s Letters Regarding Interconnection Automation, FERC should establish a FERC-DOE-RTO technical working group on “Next-Gen Planning Tools” that informs FERC-compliant AI-enabled planning, modeling, and reporting standards. Current regulations (and traditional planning approaches) leave uncertainty around the explainability, validation, and auditability of AI-driven tools.
Thus, the working group should identify where AI tools can be incorporated into planning processes without undermining existing reliability, transparency, or stakeholder-participation standards. The group should develop voluntary technical guidance on model validation standards, transparency requirements, and procedural integration to provide a clear pathway for compliant adoption across FERC-regulated jurisdictions.
Siting and Permitting
Average Readiness Score: 2.7 | Average Impact Score: 3.8
- Zoning and local permitting tools are promising, but adoption is fragmented across state, local, and regional jurisdictions.
- Federal permitting acceleration tools score high on technical readiness but face institutional distrust and a complicated regulatory environment.
- In general, tools in this category have high value but limited transferability beyond highly specific scenarios (low leverage). Even if unlocked at scale, they have narrower application potential than other tools analyzed in this memo.
Recommendation 2. The Federal Permitting Improvement Steering Council (FPISC) should establish a federal siting and permitting data modernization initiative.
AI tools can increase speed and consistency in siting and permitting processes by automating the review of complex datasets, but without structured data, standardized workflows, and agency buy-in, their adoption will remain fragmented and niche. Furthermore, most grid infrastructure data (including siting and permitting documentation) is confidential and protected, leading to industry skepticism about the ability of AI to maintain important security measures alongside transparent workflows. To address these concerns, FPISC should launch a coordinated initiative that creates structured templates for federal permitting documents, pilots AI integration at select agencies, and develops a public validation database that allows AI developers to test their models (with anonymized data) against real agency workflows. Having launched a $30 million effort in 2024 to improve IT systems across multiple agencies, FPISC is well-positioned to apply those lessons learned and align deeper AI integration across the federal government’s permitting processes. Coordination with the Council on Environmental Quality (CEQ), which was recently called on to develop a Permitting Technology Action Plan, is also encouraged. Additional Congressional appropriations to FPISC can unlock further innovation.
Operations and Reliability
Average Readiness Score: 3.6 | Average Impact Score: 3.6
- Overall, this category has the highest average readiness across technical, financial, and regulatory scales. These use cases are clear “ready-now” wins.
- They also have the highest fit component of impact, representing unique opportunities for AI tools to improve on existing systems and processes in ways that traditional tools cannot.
Recommendation 3. Launch an AI Deployment Challenge at DOE to scale high-readiness tools across the sector.
From the SunShot Initiative (2011) through the Energy Storage Grand Challenge (2020) to the Energy Earthshots (2021), DOE has a long history of catalyzing the deployment of new technology in the power sector. A dedicated grand challenge – funded with new Congressional appropriations at the Grid Deployment Office – could deploy matching grants or performance-based incentives to utilities, co-ops, and municipal providers to accelerate adoption of proven AI tools.
Grid Resilience
Average Readiness Score: 3.4 | Average Impact Score: 4.2
- As a category, resilience applications have the highest overall impact score, including a perfect value score across all four use cases. There is significant potential in deploying AI tools to solve these challenges.
- Alongside operations and reliability use cases, these tools also exhibit the highest technical readiness, demonstrating technical maturity alongside high value potential.
- Anomalous event detection is the highest-scoring use case across all 14 applications on both the readiness and impact scales. It has already been deployed and is ready to scale.
Recommendation 4. DOE, the Federal Emergency Management Agency (FEMA), and FERC should create an AI for Resilience Program that funds and validates AI tools that support cross-jurisdictional grid resilience.
AI for resilience applications often require coordination across traditional system boundaries, from utilities to DERs, microgrids to emergency managers, as well as high levels of institutional trust. Federal coordination can catalyze system integration by funding demo projects, developing integration playbooks, and clarifying regulatory pathways for AI-automated resilience actions.
Congress should direct DOE and FEMA, in consultation with FERC, to establish a new program (or carve out existing grid resilience funds) to: (1) support demonstration projects where AI tools are already being deployed during real-world resilience events; (2) develop standardized playbooks for integrating AI into utility and emergency management operations; and (3) clarify regulatory pathways for actions like DER islanding, fault rerouting, and AI-assisted load restoration.
Conclusion
Managing surging electric load growth while improving the grid’s ability to weather more frequent and extreme events is a once-in-a-generation challenge. Fortunately, new technological innovations combined with a thoughtful approach from the federal government can actualize the potential of AI and unlock a new set of solutions, ready for this era.
Rather than technological limitations, many of the outstanding roadblocks identified here are institutional and operational, highlighting the need for better federal coordination and regulatory clarity. The readiness-impact framework detailed in this memo provides a new way to understand these challenges while laying the groundwork for a timely and topical plan of action.
By identifying which AI use cases are ready to scale today and which require targeted policy support, this framework can help federal agencies, regulators, and legislators prioritize high-impact actions. Strategic investments, regulatory clarity, and collaborative initiatives can accelerate the deployment of proven solutions while innovating and building trust in new ones. By pulling on the right policy levers, AI can improve grid planning, streamline permitting, enhance reliability, and make the grid more resilient, meeting this moment with both urgency and precision.
This memo is part of our AI & Energy Policy Sprint, a policy project to shape U.S. policy at the critical intersection of AI and energy. Read more about the Policy Sprint and check out the other memos here.
Scoring categories (readiness & impact) were selected based on the literature on challenges to AI deployment in the power sector. An LLM (OpenAI’s GPT-4o model) was used to refine the 0-5 scoring scale after careful consideration of the multi-dimensional challenges across each category, based on the author’s industry experience and additional consultation with outside technical experts. Where applicable, existing frameworks underpin the scales used in this memo: technology readiness levels for the ‘technical readiness’ category and adoption readiness levels for the ‘financial’ and ‘regulatory’ readiness categories. A rubric was then designed to guide scoring.
Each of the 14 AI applications was then scored against that rubric based on the author’s analysis of existing literature, industry examples, and professional experience. Outside experts were consulted and provided additional feedback and insights throughout the process.
Below is a comprehensive, though not exhaustive, list of the key Executive Branch actors involved in AI-driven grid modernization efforts. A detailed overview of the various roles, authorities, and ongoing efforts can be found here.
Executive Office of the President (Office of Science and Technology Policy (OSTP), Council on Environmental Quality (CEQ)); Department of Commerce (National Institute of Standards and Technology (NIST)); Department of Defense (Energy, Installations, and Environment (EI&E), Defense Advanced Research Projects Agency (DARPA)); Department of Energy (Advanced Research Projects Agency-Energy (ARPA-E), Energy Efficiency and Renewable Energy (EERE), Grid Deployment Office (GDO), Office of Critical and Emerging Technologies (CET), Office of Cybersecurity, Energy Security, and Emergency Response (CESER), Office of Electricity (OE), National Laboratories); Department of Homeland Security (Cybersecurity and Infrastructure Security Agency (CISA)); Federal Energy Regulatory Commission (FERC); Federal Permitting Improvement Steering Council (FPISC); Federal Emergency Management Agency (FEMA); National Science Foundation (NSF)
A full database of how the federal government is using AI across agencies can be found at the 2024 Federal Agency AI Use Case Inventory. A few additional examples of private sector applications, or public-private partnerships are provided below.
Grid Planning
- EPRI’s Open Power AI Consortium
- Google’s Tapestry
- Octopus Energy’s Kraken
Siting and Permitting
Operations and Reliability
- Schneider Electric’s One Digital Grid Platform
- Camus Energy
- Amperon
Grid Resilience
Enhancing the US Power Grid by Using AI to Accelerate Permitting
The increased demand for power in the United States is driven by new technologies such as artificial intelligence, data analytics, and other computationally intensive activities that rely on ever-faster, power-hungry processors. The federal government’s desire to reshore critical manufacturing industries and shift the economy from services to goods production will, if successful, drive energy demand even higher.
Many of the projects that would deliver the energy to meet rising demand are in the interconnection queue, waiting to be built. There is more power in the queue than on the grid today. The average wait time in the interconnection queue is five years and growing, primarily due to permitting timelines. In addition, many projects are cancelled due to the prohibitive cost of interconnection.
We have identified six opportunities where Artificial Intelligence (AI) has the potential to speed the permitting process.
- AI can be used to speed decision-making by regulators through rapidly analyzing environmental regulations and past decisions.
- AI can be used to identify generation sites that are more likely to receive permits.
- AI can be used to create a database of state and federal regulations to bring all requirements in one place.
- AI can be used in conjunction with the database of state regulations to automate the application process and create visibility of permit status for stakeholders.
- AI can be used to automate and accelerate interconnection studies.
- AI can be used to develop a set of model regulations for local jurisdictions to adapt and adopt.
Challenge and Opportunity
There are currently over 11,000 power generation and consumption projects in the interconnection queue, waiting to connect to the United States power grid. As a result, on average, projects must wait five years for approval, up from three years in 2010.
Historically, a large percentage of projects in the queue, averaging approximately 70%, have been withdrawn due to a variety of factors, including poor economic viability and permitting challenges. About one-third of wind and solar applications submitted from 2019 to 2024 were cancelled, and about half of these applications faced delays of six months or more. For example, the Calico Solar Project in the California Mojave Desert, with a capacity of 850 megawatts, was cancelled due to lengthy multi-year permitting and re-approvals for design changes. Increasing queue wait times are likely to increase the number of projects cancelled and delay those that are viable.
The U.S. grid added 20.2 gigawatts of utility-scale generating capacity in the first half of 2024, a 21% increase over the first half of 2023. However, this is still less power than is likely to be needed to meet increasing demand in the U.S., nor does it account for the retirement of generation capacity, which was 5.1 gigawatts in the first half of 2024. In addition to replacing aging energy infrastructure as it is taken offline, this new power is critically needed to address rising energy demands in the U.S. Data centers alone are increasing power usage dramatically, from 1.9% of U.S. electricity consumption in 2018 to 4.4% in 2023, with expected consumption of at least 6.7% in 2028.
If we want to achieve the Administration’s vision of restoring U.S. domestic manufacturing capacity, a great deal of generation capacity not currently forecast will also need to be added to the grid very rapidly, far faster than indicated by the current pace of interconnections. The primary challenge that slows most power from getting onto the grid is permitting. A secondary challenge that frequently causes projects to be delayed or cancelled is interconnection costs.
Projects frequently face significant permitting challenges. Projects not only need to obtain permits to operate the generation site but must also obtain permits to move power to the point where it connects to the existing grid. Geographically remote projects may require new transmission lines that cover many miles and cross multiple jurisdictions. Even projects relatively close to the existing grid may require multiple permits to connect to the grid.
In addition, poor site selection has resulted in the cancellation of several high-profile renewable installation projects. The Battle Born Solar Project, valued at $1 billion with an 850-megawatt capacity, was cancelled after community concern that the solar farm would impact tourism and archaeological sites on the Mormon Mesa in Nevada. Another project, a 150-megawatt solar facility proposed for Culpeper County, Virginia, was denied permits for interfering with the historic site of a Civil War battle. Similarly, a geothermal plant in Nevada had to be scaled back to less than a third of its original plan after it was found to be in the only known habitat of the endangered Dixie Valley toad. While community opposition to renewable energy installations is not always avoidable, often stemming from complaints about construction impacts and from misinformation, better site selection could save developers time and money by avoiding locations that encroach on historical sites, local attractions, or endangered species’ habitats.
Projects have also historically faced cost challenges as utilities and grid operators could charge the full cost of new operating capacity to each project, even when several pending projects could utilize the same new operating assets. On July 28, 2023, FERC issued a final rule with a compliance date of March 21, 2024, that requires transmission providers to consider all projects in the queue and determine how operating assets would be shared when calculating the cost of connecting a project to the grid. However, the process for calculating costs can be cumbersome when many projects are involved.
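To illustrate why shared-cost calculations become cumbersome, here is a toy pro-rata allocation of a single shared upgrade across queued projects. The method and figures are simplified assumptions, not FERC’s actual formula; real studies must handle many upgrades and shifting cluster membership simultaneously.

```python
# Toy pro-rata sharing of one network upgrade across a cluster of projects.
upgrade_cost = 120e6  # shared substation upgrade, $ (illustrative)
projects_mw = {"solar_a": 200, "wind_b": 300, "storage_c": 100}

total_mw = sum(projects_mw.values())
for name, mw in projects_mw.items():
    share = upgrade_cost * mw / total_mw  # allocate by capacity share
    print(f"{name}: ${share / 1e6:.1f}M")
```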
On April 15, 2025, the Trump Administration issued a Presidential Memorandum titled “Updating Permitting Technology for the 21st Century.” This memo directs executive departments and agencies to take full advantage of technology for environmental review and permitting processes and creates a Permitting Innovation Center (PIC). While it is unclear how much authority the PIC will have, it demonstrates the Administration’s focus in this area and may serve as a change agent in the future. There is an opportunity to use AI to improve both the speed and the cost of connecting new projects to the grid. Below are recommendations to capitalize on this opportunity.
Plan of Action
Recommendation 1. Funding for PNNL to expand the PolicyAI NEPA model to streamline environmental permitting processes beyond the federal level.
In 2023, Pacific Northwest National Laboratory (PNNL) was tasked by DOE with developing a PermitAI prototype to help regulators understand National Environmental Policy Act (NEPA) regulations and speed up project environmental reviews. PNNL data scientists created an AI-searchable database of federal environmental impact statements, composed primarily of information that was not readily available to regulators before. The database contains textual data extracted from documents across 2,917 different projects, stored as 3.6 million tokens from the GPT-2 tokenizer. (Tokens are the units into which text is broken down for natural language processing models.) The entire dataset is currently publicly available via HuggingFace. The database supports generative-AI search: a large language model (LLM) can quickly find documents and summarize relevant results. While development of this database is still preliminary and efficiency metrics have not yet been published, complaints from permitting practitioners about the complexity of the process and the lack of guidelines suggest this approach should be a model for tools developed and provided to state and local regulators to assist with permitting reviews.
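As an illustration of how such a corpus could be consumed, the sketch below loads a dataset from HuggingFace and runs a naive keyword search. The dataset identifier and field names are placeholders rather than the published PNNL release, and a real tool would layer embeddings and an LLM summarizer on top of retrieval.

```python
# Minimal retrieval sketch over a NEPA document corpus on HuggingFace.
# The dataset ID and the "text"/"project" fields are hypothetical.
from datasets import load_dataset

ds = load_dataset("pnnl/nepa-eis-corpus", split="train")  # placeholder ID

query = "desert tortoise mitigation"
# Naive keyword scan; production tools would use embedding search + an LLM.
hits = [row for row in ds if query.lower() in row["text"].lower()][:5]
for row in hits:
    print(row.get("project", "unknown project"))
```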
In 2021, PNNL created a similar process, without using AI, for NEPA permitting for small- to medium-sized nuclear reactors, which simplified the process and reduced environmental review time from three-to-six years to between six and twenty-four months. Using AI has the potential to shorten the process even more dramatically for renewables permitting. The National Renewable Energy Laboratory (NREL) has also studied using LLMs to expedite the processing of policy data from legal documents and found results that support the expansion of LLMs for policy database analysis, particularly when compared to the current manual effort.
State and local jurisdictions can use the “Updating Permitting Technology” Presidential Memorandum as guidance to support the intersection between state and local permitting efforts. PNNL would provide its database of federal NEPA materials, with models trained on past NEPA cases, to state jurisdictions as a service, through a process similar to that used by the EPA, so that states do not need to independently develop data collection solutions. Ideally, the initial data analysis model would be trained specifically for each participating state and continually updated with new material to create a seamless regulatory experience.
Since PNNL has already built a NEPA model, and this work is being expanded into a multi-lab effort that includes NREL, Argonne, and others, the House Energy and Water Development Committee could appropriate additional funding to the Office of Policy (OP) or the Office of Energy Efficiency and Renewable Energy (EERE) to enable the labs to expand the model and make it available to state and local regulatory agencies to integrate into their permitting processes. States could develop models specific to their ordinances on the backbone of PNNL’s PermitAI. This effort could be expedited through engagement with the Environmental Council of the States (ECOS).
A shared database of NEPA information would reduce time spent reviewing backlogs of data from environmental review documents. State and local jurisdictions would more efficiently identify relevant information and precedent, and speed decision-making while reducing costs. An LLM tool also has the benefit of answering specific questions asked by the user. An example would be answering a question about issues that have arisen for similar projects in the same area.
Recommendation 2. Appropriate funding to expand AI site selection tools and support state and local pilots to improve permitting outcomes and reduce project cancellations.
AI could be used to identify sites that are suitable for energy generation, with different models eventually trained for utility-scale solar siting, onshore and offshore wind siting, and geothermal power plant siting. Key concerns affecting the permitting process include the loss of arable land, impacts on wildlife, and community responses, like opposition based on land use disagreements. Better site selection identifies these issues before they appear during the permitting process.
AI can access data from a range of sources, including satellite imagery from Google Earth, commercially available lidar studies, and local media screening, to identify locations with the fewest potential barriers, or to identify and mitigate barriers at sites that have already been selected. Unlike Recommendation 1, which involves answering questions by pulling from large databases using LLMs, this would primarily utilize machine learning algorithms that process past and current data to identify patterns and predict outcomes, like energy generation potential. Examples of datasets these tools can use are the free, publicly available products created by the Innovative Data Energy Applications (IDEA) group in NREL’s Strategic Energy Analysis Center (SEAC), including the National Solar Radiation Database and the wind resource database. The National Solar Radiation Database visualizes solar energy potential at a given time and predicts the future availability of solar energy for any location in the dataset, which covers the entirety of the United States.
The wind resource database is a collection of modeled wind resource estimates for locations within the United States. In addition, Argonne National Laboratory has developed the GEM tool to support NEPA reviews for transmission projects. A few start-ups have synthesized a variety of datasets like these and created their own databases of information like terrain and slope to build site-selection decision-making tools. AI analysis of local news and landmarks important to local communities, used to identify locations likely to oppose renewable installations, is particularly important, since community opposition often kills renewable generation projects that have already entered the permitting process.
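Below is a minimal sketch of this pattern-recognition approach: a classifier trained on features of past projects, labeled by permitting outcome, is used to rank candidate sites. The feature names and data files are illustrative assumptions.

```python
# Rank candidate sites by predicted permitting success. Training data,
# feature names, and files are hypothetical illustrations.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

past = pd.read_csv("past_projects.csv")  # labeled historical projects
features = ["ghi_kwh_m2_day", "slope_pct", "km_to_transmission",
            "km_to_protected_area", "local_opposition_mentions"]
X, y = past[features], past["permit_approved"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print(f"holdout accuracy: {model.score(X_te, y_te):.2f}")

# Score new candidate sites by predicted probability of permitting success.
candidates = pd.read_csv("candidate_sites.csv")
candidates["permit_score"] = model.predict_proba(candidates[features])[:, 1]
print(candidates.sort_values("permit_score", ascending=False).head())
```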
The House Energy and Water Development Committee could appropriate funds to DOE’s Grid Deployment Office, which could collaborate with EERE, the Office of Fossil Energy and Carbon Management (FECM), the Office of Nuclear Energy (NE), and the Office of Electricity (OE) to further expand the technology-specific models, as well as Argonne’s GEM tool. GDO could also provide grant funding to state and local permitting authorities to pilot AI-powered site selection tools created by start-ups or other organizations. Local jurisdictions, in turn, could encourage use by developers.
Better site selection would speed permitting processes and reduce the number of cancelled projects, as well as wasted time and money by developers.
Recommendation 3. Funding for DOE labs to develop an AI-based permitting database, starting with a state-level pilot, to streamline permit site identification and application for large-scale energy projects.
Use AI to identify all of the non-environmental federal, state, and local permits required for generation projects. A pilot project, focused on one generation type, such as solar, should be launched in a state that is positioned for central coordination. New York may be the best candidate, as the Office of Renewable Energy Siting and Electric Transmission has exclusive jurisdiction over on-shore renewable energy projects of at least 25 megawatts.
A second option could be Illinois, which has statewide standards for utility-scale solar and wind facilities and where local governments cannot adopt more restrictive ordinances. Either pilot would require the development of a database of regulations and the ability to query it for a detailed list of required permits for each project by jurisdiction, along with the relevant application processes and forms. The House Energy and Water Development Committee could direct funds to EERE to support PNNL, NREL, Argonne, and other DOE labs in developing this database. Ideally, this tool would be integrated with tools developed by local jurisdictions to automate their individual permitting processes.
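A minimal sketch of the query pattern such a database would support is below. The 25-megawatt ORES threshold follows the New York example above; the other entries are illustrative assumptions.

```python
# Structured permit-requirements lookup keyed by jurisdiction and technology.
PERMIT_DB = {
    ("NY", "solar"): [
        {"permit": "ORES siting permit", "threshold_mw": 25,
         "agency": "Office of Renewable Energy Siting"},
        {"permit": "Stormwater (SWPPP)", "threshold_mw": 0, "agency": "NYSDEC"},
    ],
    # ... additional (state, technology) entries compiled by the AI pipeline
}

def required_permits(state: str, tech: str, capacity_mw: float) -> list[dict]:
    """Return permits whose capacity threshold the project meets."""
    return [p for p in PERMIT_DB.get((state, tech), [])
            if capacity_mw >= p["threshold_mw"]]

for p in required_permits("NY", "solar", capacity_mw=100):
    print(f'{p["permit"]} ({p["agency"]})')
```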
State-level regulatory coordination would speed the approval of projects contained within a single state, as well as improve coordination between states.
Recommendation 4. Appropriate funds for DOE to develop a state-level AI permitting application to streamline renewable energy permit approvals and improve transparency.
Use AI as a tool to complete the permitting process. While it would be nearly impossible to create a national permitting tool, it would be realistic to create a tool that could be used to manage developers’ permitting processes at the state level.
NREL developed a permitting tool with funding from the DOE Solar Energy Technologies Office (SETO) for residential rooftop solar permitting. The tool, SolarAPP+, automates plan review, permit approval, and project tracking. As of the end of 2023, it had saved more than 33,000 hours of permitting staff time across more than 32,800 projects. However, permitting for rooftop solar is less complex than permitting for utility-scale solar sites or wind farms because there is less need for environmental reviews, wildlife endangerment reviews, or community feedback. Using the AI frameworks developed by PNNL mentioned in Recommendation 1 and leveraging the development work completed by NREL could produce tools similar to SolarAPP+ for large-scale renewable installations, with similar results in projects approved and time saved. An application that may meet this need is currently under development at NREL.
The House Energy and Water Development Committee should appropriate funds for DOE to create an application, built through PNNL and NREL on the NREL SolarAPP+ framework, that states could implement to streamline the permitting application process. This would be especially helpful for complex projects that cross multiple jurisdictions. In addition, Congress, through appropriation by the House Energy and Water Development Committee to DOE’s Grid Deployment Office, could establish a grant program to support state- and local-level implementation of this permitting tool. The tool could include a dashboard to improve permitting transparency, one of the items required by the Presidential Memorandum on Updating Permitting Technology.
Developers are frequently unclear about what permits are required, especially for complex multi-jurisdiction projects. The AI tool would reduce the time a developer spends identifying permits and would support smaller developers who lack permitting consultants or prior experience. An integrated electronic permitting solution would reduce the complexity of applying for and approving permits. With a statewide system, state and local regulators would only need to add their jurisdiction- and location-specific requirements and forms to a state-maintained system. Finally, an integrated system with a dashboard could increase status visibility and help resolve issues more quickly. Together, these tools would allow developers to build realistic budgets and time frames, allocate resources, and prioritize projects with the greatest chance of approval.
Recommendation 5. Direct FERC to require RTOs to evaluate and possibly implement AI tools to automate interconnection analysis processes.
Use AI tools to reduce the complexity of publishing and analyzing the mandated maps and of assigning costs to projects. While FERC has mandated that grid operators consider all projects coming onto the grid when setting interconnection pricing, and that they weigh project readiness rather than time in queue, the requirements are complex to implement.
A number of private sector companies have begun developing tools to model interconnections. Pearl Street has used its model to reproduce a complex and lengthy interconnection cluster study in ten days, and PJM recently announced a collaboration with Google to develop an analysis capability. Given the private sector efforts in this space, the public interest would be best served by FERC requiring RTOs to evaluate and implement, if suitable, an automated tool to speed their analysis process.
Automating parts of interconnection studies would allow developers to quickly understand the real cost of a new generation project, allowing them to quickly evaluate feasibility. It would create more cost certainty for projects and would also help identify locations where planned projects have the potential to reduce interconnection costs, attracting still more projects to share new interconnections. Conversely, the capability would also quickly identify when new projects in an area would exceed expected grid capacity and increase the costs for all projects. Ultimately, the automation would lead to more capacity on the grid faster and at a lower cost as developers optimize their investments.
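As a toy illustration of what automated screening involves, the sketch below uses the open-source pandapower package to inject a proposed generator into a two-bus network and check for line overloads. Real interconnection studies model thousands of buses, many contingencies, and clustered projects.

```python
# Toy interconnection screen with pandapower: add a proposed generator and
# check whether the export path overloads.
import pandapower as pp

net = pp.create_empty_network()
b1 = pp.create_bus(net, vn_kv=110.0)
b2 = pp.create_bus(net, vn_kv=110.0)
pp.create_ext_grid(net, bus=b1)  # slack bus representing the bulk grid
pp.create_line(net, from_bus=b1, to_bus=b2, length_km=25.0,
               std_type="149-AL1/24-ST1A 110.0")
pp.create_load(net, bus=b2, p_mw=30.0)
pp.create_sgen(net, bus=b2, p_mw=80.0)  # proposed 80 MW project

pp.runpp(net)  # AC power flow
overloaded = net.res_line[net.res_line.loading_percent > 100]
print("line loading (%):", net.res_line.loading_percent.round(1).tolist())
print("upgrade needed" if len(overloaded) else "no overloads at this size")
```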
Recommendation 6. Provide funding to DOE to extend the use of NREL’s AI-compiled permitting data to develop and model local regulations. The results could be used to promote standardization through national stakeholder groups.
As noted earlier, one of the biggest challenges in permitting is the complexity of varying and sometimes conflicting local regulations that a project must comply with. Several years ago, NREL, in support of the DOE Office of Policy, spent 1,500 staff hours manually compiling what was believed to be a complete list of local energy permitting ordinances across the country. In 2024, NREL used an LLM to compile the same information with a 90% success rate in a fraction of the time.
The House Energy and Water Development Committee should direct DOE to fund the continued development of the NREL permitting database and evaluate that information with an LLM to develop a set of model regulations that could be promoted to encourage standardization. Adoption of those regulations could be encouraged by policymakers and external organizations through engagement with the National Governors Association, the National Association of Counties, the United States Conference of Mayors, and other relevant stakeholders.
Local jurisdictions often adopt regulations based on a limited understanding of best practices and appropriate standards. A set of model regulations would guide local jurisdictions and reduce complexity for developers.
Conclusion
As demand on the electrical grid grows, the need to speed up the availability of new generation capacity on the grid becomes increasingly urgent. The deployment of new generation capacity is slowed by challenges related to site selection, environmental reviews, permitting, and interconnection costs and wait times. While much of the increasing demand for energy in the United States can be attributed to AI, it can also be a powerful tool to help the nation meet that demand.
The six recommendations identified in this memo for using AI to speed the process of bringing new power to the grid address all of those concerns. AI can be used to assist with site selection, analyze environmental regulations, help both regulators and the regulated community understand requirements, develop better regulations, streamline permitting processes, and reduce the time required for interconnection studies.
This memo is part of our AI & Energy Policy Sprint, a policy project to shape U.S. policy at the critical intersection of AI and energy. Read more about the Policy Sprint and check out the other memos here.
The combined generating capacity of the projects awaiting approval is about 1,900 gigawatts, excluding ERCOT and NYISO, which do not report this data. In comparison, the generating capacity of the U.S. grid as of Q4 2023 was 1,189 gigawatts. Even if the current high cancellation rate of 70% is maintained, the roughly 570 gigawatts that survive (30% of 1,900) would increase the power available on the grid by approximately 50%, through a $600B investment in US energy infrastructure.
FERC’s five-year growth forecast through 2029 predicts increased demand of 128 gigawatts. In that context, the net addition of 15.1 gigawatts in the first half of 2024, annualized to roughly 30 gigawatts per year, suggests an increase of about 150 gigawatts over the five-year horizon and little excess capacity. This forecast is predicated on the assumptions that the power added to the grid does not decline, retirements do not increase, and the load forecast does not grow. All these estimates apply to a system where supply and demand are already so closely matched that FERC predicted supply shortages in several regions in the summer of 2024.
Construction delays and cost overruns can be an issue, but this is more frequently a factor in large projects such as nuclear and large oil and gas facilities, and is rarely a factor for wind and solar which are factory built and modular.
While the current administration has declared a National Energy Emergency to expedite approvals for energy projects, the order excludes wind, solar, and batteries, which make up 90% of the power presently in the interconnection queue and mirror the mix of capacity recently added to the grid. Therefore, the expedited permitting processes required by the administration apply only to 10% of the queue: 7% natural gas and 3% nuclear, oil, coal, hydrogen, and pumped hydro combined. Since solar, wind, and batteries are unlikely to be granted similar permitting relief, and relying on as-yet unplanned fossil fuel projects to bring more energy to the grid is not realistic, other methods must be undertaken to speed new power to the grid.
Transform Communities By Adaptive Reuse of Legacy Coal Infrastructure to Support AI Data Centers
The rise of artificial intelligence (AI) and the corresponding hyperscale data centers that support it present a challenge for the United States. Data centers intensify energy demand, strain power grids, and raise environmental concerns. These factors have led developers to search for new siting opportunities outside traditional corridors (i.e., regions with longstanding infrastructure and large clusters of data centers), such as Silicon Valley and Northern Virginia. American communities that have historically relied on coal to power their local economies have an enormous opportunity to repurpose abandoned coal mines and infrastructure to site data centers alongside clean power generation. The decline of the coal industry in the late 20th century led to the abandonment of coal mines, loss of tax revenues, destruction of good-paying jobs, and the dismantling of the economic engine of American coal communities, primarily in the Appalachian, interior, and Western coal regions. The AI boom of the 21st century can reinvigorate these areas if harnessed appropriately.
The opportunity to repurpose existing coal infrastructure includes Tribal Nations, such as the Navajo, Hopi, and Crow, in the Western Coal regions. These regions hold post-mining land with potential for economic development, but operate under distinct governance structures and regulatory frameworks administered by Tribal governments. A collaborative approach involving Federal, State, and Tribal governments can ensure that both non-tribal and Tribal coal regions share in the economic benefits of data center investments, while also promoting the transition to clean energy generation by collocating data centers with renewable, clean energy-powered microgrids.
This memo recommends four actions for coal communities to fully capitalize on the opportunities presented by the rise of artificial intelligence (AI).
- Establish a Federal-State-Tribal Partnership for Site Selection, Utilizing the Department of the Interior’s (DOI) Abandoned Mine Land (AML) Program.
- Develop a National Pilot Program to Facilitate a GIS-based Site Selection Tool.
- Promote collaboration between states and utility companies to enhance grid resilience from data centers by adopting plug-in and flexible load standards.
- Lay the groundwork for a knowledge economy centered around data centers.
By pursuing these policy actions, states like West Virginia, Pennsylvania, and Kentucky, as well as Tribal Nations, can lead America’s energy production and become tech innovation hubs, while ensuring that the U.S. continues to lead the AI race.
Challenge and Opportunity
Energy demands for AI data centers are expected to rise by between 325 and 580 TWh by 2028, roughly the amount of electricity consumed by 30 to 54 million American households annually. This demand is projected to increase data centers’ share of total U.S. electricity consumption to between 6.7% and 12.0% by 2028, according to the 2024 United States Data Center Energy Usage Report by the Lawrence Berkeley National Lab. According to the same report, AI data centers also consumed around 66 billion liters of water for cooling in 2023. By 2028, that number is expected to be between 60 and 124 billion liters for hyperscale data centers alone. (Hyperscale data centers are massive warehouses of computer servers, powered by at least 40 MW of electricity, and run by major cloud companies like Amazon, Google, or Microsoft. They serve a wide variety of purposes, including artificial intelligence, automation, and data analytics.)
Future emissions are also expected to grow with increasing energy usage. Location has also become important: tech companies with AI investments increasingly recognize the need for data centers in new places. Although most digital activity has traditionally centered on tech corridors like Silicon Valley and Northern Virginia, the scarcity of land and concerns about carbon footprints in those regions make the case for expansion to other sites.
Coal communities have experienced a severe economic decline over the past decade, as coal severance and tax revenues have plummeted. West Virginia, for example, reported an 83% decline in severance tax collections in fiscal year 2024. Competition from natural gas and renewable energy sources, slow growth in energy demand, and environmental concerns have led to coal often being viewed as a backup option. This has led to low demand for coal locally, and thus a decrease in severance, property, sales, and income taxes.
The percentage of coal severance tax revenue returned to coal-producing counties varies by state. In West Virginia, the State Tax Commissioner collects coal severance taxes from all producing counties and deposits them with the State Treasurer’s office. Seventy-five percent of the net proceeds is returned to the coal-producing counties, while the remaining 25% is distributed to the rest of the state. Historically, these tax revenues have funded a significant portion of county budgets. For counties like Boone County in West Virginia and Campbell County in Wyoming, once two of America’s highest coal-producing counties, these revenues helped maintain essential services and school districts. Property taxes and severance taxes on coal funded about 24% of Boone County’s school budget, while 59% of overall property valuations in Campbell County in 2017 were related to coal mining. With those tax bases eroding, these counties have struggled to maintain schools and public services.
Likewise, the closure of the Kayenta Mine and the Navajo Generating Station resulted in the elimination of hundreds of jobs and significant public revenue losses for the Navajo and Hopi Nations. The Crow Nation, like many other Native American tribes with coal, is reliant on coal leases with miners for revenue. They face urgent infrastructure gaps and declining fiscal capacity since their coal mines were shut down. These tribal communities, with a rich legacy of land and infrastructure, are well-positioned to lead equitable redevelopment efforts if they are supported appropriately by state and federal action.
These communities now have a unique opportunity to attract investments in AI data centers to generate new sources of revenue. Investments in hyperscale data centers will revive these towns through revenue from property taxes, land reclamation, and investments in energy, among other sources. For example, data centers in Northern Virginia, commonly referred to as the “Data Center Alley,” have contributed an estimated 46,000 jobs and up to $10 billion in economic impact to the state’s economy, according to an economic impact report on data centers commissioned by the Northern Virginia Technology Council.
Coal powered local economies and served as the thread holding together the social fabric of communities in parts of Appalachia for decades. Coal-reliant communities also took pride in how coal powered most of the U.S.’s industrialization in the nineteenth century. However, many coal communities have been hollowed out, with thousands of abandoned coal mines and tens of thousands of lost jobs. By inviting investments in data centers and new clean energy generation, these communities can be economically revived. This time, their economies will be centered on a knowledge base, representing a shift from an extraction-based economy to an information-based one. Data centers attract new AI- and big-data-focused businesses, which reinvigorates the local workforce, inspires research programs at nearby academic institutions, and reverses the brain drain that has long impacted these communities.
The federal government has made targeted efforts to repurpose abandoned coal mines. The Abandoned Mine Land (AML) Reclamation Program, created under the Surface Mining Control and Reclamation Act (SMCRA) of 1977, reclaims lands affected by coal mining and stabilizes them for safe reuse. Building on that, Congress established the Abandoned Mine Land Economic Revitalization (AMLER) Program in 2016 to support the economic redevelopment of reclaimed sites in partnership with state and tribal governments. AMLER sites are eligible for flexible reuse, including siting hyperscale AI data centers. Sites with flat terrain and legacy infrastructure are particularly desirable. The AMLER program is supported by a fee collected from active coal mining operations – a fee that has decreased as coal mining operations have ceased – and has also received appropriated Congressional funding since 2016. Siting data centers on AMLER sites can circumvent the eminent domain concerns that arise with project proposals on private lands.
In addition to the legal and logistical advantages of siting data centers on AMLER sites, many of these locations offer more than just reclaimed land; they retain legacy infrastructure that can be strategically repurposed for other uses. These sites often lie near existing transmission corridors, rail lines, and industrial-grade access roads, which were initially built to support coal operations. This makes them especially attractive for rapid redevelopment, reducing the time and cost associated with building entirely new facilities. By capitalizing on this existing infrastructure, communities and investors can accelerate project timelines and reduce permitting delays, making AMLER sites not only legally feasible but economically and operationally advantageous.
Moreover, since some coal mines are built near power infrastructure, there exist opportunities for federal and state governments to allow companies to collocate data centers with renewable, clean energy-powered microgrids, thereby preventing strain on the power grid. These sites present an opportunity for data centers to:
- Host local microgrids for energy load balancing and provide an opportunity for net metering;
- Develop a model that identifies places across the United States and standardizes data center site selection;
- Revitalize local economies and communities;
- Invest in clean energy production; and,
- Create a knowledge economy outside of tech corridors in the United States.
Precedents for collocating new data centers at existing power plants already exist. In February 2025, the Federal Energy Regulatory Commission (FERC) reviewed potential sites within the PJM Interconnection region to host these pairings. Furthermore, plans to repurpose decommissioned coal power stations as data centers exist in the United States and Europe. However, there remains an opportunity to utilize the reclaimed coal mines themselves. They provide a readily available location with proximity to existing transmission lines, substations, roadways, and water resources. Historically, they also have a power plant ecosystem and supporting infrastructure, meaning minimal additional infrastructure investment is needed to bring them up to par.
Plan of Action
The following recommendations will fast-track America's investment in data centers and usher in the next era of innovation. Collaboration among federal agencies, state governments, and tribal governments will enable the rapid construction of data centers in historically coal-reliant communities. Together, these actors can bring prosperity back to American communities left behind by the coal industry's decline by investing in their energy capacity, economies, and workforce.
Recommendation 1. Establish a Federal-State-Tribal Partnership for Site Selection, Utilizing the Department of the Interior’s (DOI) Abandoned Mine Land (AML) Program.
The first step in investing in data centers in coal communities should be a collaborative effort among federal, state, and tribal governments to identify and develop data center pilot sites on reclaimed mine lands, brownfields, and tribal lands. The Environmental Protection Agency (EPA) and the Department of the Interior (DOI) should jointly identify eligible sites with intact or near-intact infrastructure, nearby energy generation facilities, and broadband corridors, utilizing the Abandoned Mine Land (AML) Reclamation Program and the EPA Brownfields Program. Brownfields with legacy infrastructure should also be prioritized to reduce the need for greenfield development. Where tribal governments have jurisdiction, they should be engaged as co-developers and beneficiaries of data centers, with the right to lead or co-manage the process, including receiving tax benefits from the project. Pre-law AMLs (coal mines abandoned before August 3, 1977, when SMCRA became law) offer the most regulatory flexibility and should be prioritized.
State governments and lawmakers will nominate communities from the federally identified shortlist based on economic need, workforce readiness and mobility, and redevelopment plans.
Recommendation 2. Develop a National Pilot Program to Facilitate a GIS-based Site Selection Tool
In partnership with private sector stakeholders, the DOE National Labs should develop a pilot program for these sites to inform the development of a standardized GIS-based site selection tool. This pilot would identify and evaluate a small set of pre-law AMLs, brownfields, and tribal lands across the Appalachian, Interior, and Western coal regions for data center development.
The pilot program will assess infrastructure readiness, permitting pathways, environmental conditions, and community engagement needs across the reclaimed lands and brownfields under consideration, selecting those that meet the criteria above for the pilot. Insights from these pilots will inform the development of a scalable tool that integrates data on grid access, broadband, water, land use, tax incentives, and workforce capacity.
The GIS tool will equip governments, utilities, and developers with a reliable, replicable framework to identify high-potential data center locations nationwide. For example, the Geospatial Energy Mapper (GEM), developed by Argonne National Laboratory with support from the U.S. Department of Energy, offers a public-facing tool that integrates data on energy resources, infrastructure, land use, and environmental constraints to guide energy infrastructure siting.
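To make the tool's logic concrete, below is a minimal sketch of the kind of weighted multi-criteria scoring such a tool could standardize. The criteria mirror the data layers named above, but the weights and site values are illustrative assumptions, not outputs of GEM or any DOE model.

```python
# Hypothetical weighted multi-criteria scoring for candidate sites.
# Criteria mirror the data layers named above; all weights are illustrative.

WEIGHTS = {
    "grid_access": 0.30,     # proximity/capacity of nearby transmission
    "broadband": 0.15,
    "water": 0.15,
    "land_use": 0.15,        # flat, reclaimed, industrially zoned land
    "tax_incentives": 0.10,
    "workforce": 0.15,
}

def site_score(site: dict[str, float]) -> float:
    """Combine normalized (0-1) criteria into a single ranking score."""
    return sum(WEIGHTS[k] * site[k] for k in WEIGHTS)

# Example candidates (all values hypothetical, normalized to 0-1):
candidates = {
    "pre-law AML, Appalachia": {"grid_access": 0.9, "broadband": 0.6, "water": 0.8,
                                "land_use": 0.7, "tax_incentives": 0.8, "workforce": 0.5},
    "brownfield, Interior":    {"grid_access": 0.7, "broadband": 0.8, "water": 0.6,
                                "land_use": 0.9, "tax_incentives": 0.6, "workforce": 0.7},
}
for name, data in sorted(candidates.items(), key=lambda kv: -site_score(kv[1])):
    print(f"{name}: {site_score(data):.2f}")
```

In practice, the weights themselves would be a policy output of the pilot program, calibrated against the infrastructure and community-readiness assessments described above.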
The DOE, working in coordination with agencies such as the Department of the Treasury, the Department of the Interior, the Bureau of Indian Affairs, and state economic development offices, should establish targeted incentives to encourage data center companies to join the coalition. These incentives could include streamlined permitting, data confidentiality protections, and early access to pre-qualified sites. Data center developers, AI companies, and operators typically own the majority of the proprietary operational and siting data for data centers. Without incentives, this data will remain restricted to private industry, hindering public-sector planning and increasing geographic inequities in digital infrastructure investments.
By leveraging the insights gained from this pilot and expanding access to critical siting data, the federal government can ensure that the benefits of AI infrastructure investments are distributed equitably, reaching communities that have historically powered the nation’s industrial growth but have been left behind in the digital economy. A national site selection tool grounded in real-world conditions, cross-agency coordination, and private-public collaboration will empower coal-impacted communities, including those on Tribal lands and in remote Appalachian and Western regions, to attract transformative investment. In doing so, it will lay the foundation for a more inclusive, resilient, and spatially diverse knowledge economy built on reclaimed land.
Recommendation 3. Promote collaboration between states and utility companies to enhance grid resilience by adopting plug-in and flexible-load standards for data centers.
Given the urgency and scale of hyperscale data center investments, state governments, in coordination with Public Utility Commissions (PUCs), should adopt policies that allow temporary, curtailable, plug-in access to the grid pending completion of collocated, preferably renewable-powered, microgrids at proposed data centers. This could involve approving provisional interconnection service for large projects, such as data centers. Short-term access is critical for communities to realize immediate financial benefits from data center construction while long-term infrastructure is still being developed. Renewable-powered on-site microgrids for hyperscale data centers typically range from 100 to 400 MW per site and can take up to three years to deploy.
To protect consumers, utilities and data center developers must guarantee that any interim grid usage does not raise electricity rates for households or small businesses. The data center and/or utility should bear responsibility for short-term demand impacts through negotiated agreements.
In exchange for interim grid access, data centers must submit detailed grid resilience plans that include:
- A time-bound schedule (typically 18–36 months) for deploying an on-site microgrid, preferably powered by renewable energy.
- On-site battery storage systems and demand response capabilities to smooth load profiles and enhance reliability.
- Participation in net metering to enable excess microgrid energy to be sold back to the grid, benefiting local communities.
Additionally, these facilities should be treated as large, flexible loads capable of supporting grid stability by curtailing non-critical workloads or shifting demand during peak periods. Studies suggest that up to 126 GW of new data center load could be integrated into the U.S. power system with minimal strain if such facilities accept as little as a 1% curtailment rate (that is, reducing or pausing consumption by the equivalent of 1% of their annual electricity usage).
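For a sense of scale, the arithmetic below applies the 1% curtailment figure to a hypothetical 300 MW facility; the facility size is an assumption chosen for illustration, not a number from the cited studies.

```python
# Illustrative arithmetic for the ~1% curtailment figure cited above.
# The facility size is a hypothetical example, not study data.

capacity_mw = 300            # hypothetical hyperscale campus load
hours_per_year = 8760
annual_mwh = capacity_mw * hours_per_year          # 2,628,000 MWh

curtailment_share = 0.01                           # 1% of annual energy use
flexible_mwh = annual_mwh * curtailment_share      # 26,280 MWh

# If curtailment means a full pause, that energy equals ~88 hours/year:
equivalent_full_stop_hours = flexible_mwh / capacity_mw   # 87.6 hours

print(f"Annual energy: {annual_mwh:,.0f} MWh")
print(f"1% flexible energy: {flexible_mwh:,.0f} MWh "
      f"(~{equivalent_full_stop_hours:.0f} full-curtailment hours/year)")
```

In other words, a 1% curtailment rate amounts to fewer than four full days of paused operation per year, which batteries, workload shifting, and demand response can readily absorb.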
States can align near-term economic gains with long-term energy equity and infrastructure sustainability by requiring early commitment to microgrid deployment and positioning data centers as flexible grid assets (see FAQs for ideas on water cooling for the data centers).
Recommendation 4. Lay the groundwork for a knowledge economy centered around data centers.
The DOE Office of Critical and Emerging Technologies (CET), in coordination with the Economic Development Administration (EDA), should conduct an economic impact assessment of data center investments in coal-reliant communities. To ensure timely reporting and oversight, the Senate Committee on Energy and Natural Resources and the House Committee on Energy and Commerce should guide and shape the reports' outcomes, building on President Donald Trump's executive order on AI education. Investments in data centers offer these communities a knowledge economy as an alternative to the extractive, fossil-fuel-based economies that have failed them for generations.
A workforce trained in high-skilled employment areas such as AI data engineering, data processing, cloud computing, advanced digital infrastructure, and cybersecurity can participate in the knowledge economy. The data center itself, along with new business ecosystems built around it, will provide these jobs.
Counties will also generate sustainable revenue through increased property taxes, utility taxes, and income taxes from the new businesses. This new revenue will replace the lost revenue from the decline in coal over the past decade. This strategic transformation positions formerly coal-dependent regions to compete in a national economy increasingly shaped by artificial intelligence, big data, and digital services.
This knowledge economy will also benefit nearby universities, colleges, and research institutes by creating research partnership opportunities, developing workforce pipelines through new degree and certificate programs, and fostering stronger innovation ecosystems built around digital infrastructure.
Conclusion
AI is growing rapidly, and data centers are following suit, straining our grid and requiring new infrastructure. Coal-reliant communities possess land and energy assets, and they have a pressing need for economic renewal. With innovative federal-state coordination, we can repurpose abandoned mine lands, boost local tax bases, and build a knowledge economy where coal once dominated. These two pressing challenges—grid strain and post-coal economic decline—can be addressed through a unified strategy: investing in data centers on reclaimed coal lands.
This memo outlines a four-part action plan. First, federal and state governments must collaborate to prepare abandoned mine lands for data center development. Second, while working with private industry, DOE National Labs should develop a standardized, GIS-based site selection tool to guide smart, sustainable investments. Third, states should partner with utilities to allow temporary grid access to data centers, while requiring detailed microgrid-based resilience plans to reduce long-term strain. Fourth, policymakers must lay the foundation for a knowledge economy by assessing the economic impact of these investments, fostering partnerships with local universities, and training a workforce equipped for high-skilled roles in digital infrastructure.
This is not just an energy strategy but also a sustainable economic revitalization strategy. It will transform coal assets that once fueled America’s innovation in the 19th century into assets that will fuel America’s innovation in the 21st century. The energy demands of data centers will not wait; the economic revitalization of Appalachian communities, heartland coal communities, and the Mountain West coal regions cannot wait. The time to act is now.
This memo is part of our AI & Energy Policy Sprint, a policy project to shape U.S. policy at the critical intersection of AI and energy. Read more about the Policy Sprint and check out the other memos here.
There is no direct example yet of data center companies reclaiming former coal mines. However, some examples show the potential. For instance, plans are underway to transform an abandoned coal mine in Wise County, Virginia, into a solar power station that will supply a nearby data center.
Numerous examples from the U.S. and abroad exist of tech companies collocating data centers with energy-generating facilities to manage their energy supply and reduce their carbon footprint. Meta signed a long-term power-purchase agreement with Sage Geosystems for 150 MW of next-generation geothermal power in 2024, enough to run multiple hyperscale data centers. The project’s first phase is slated for 2027 and will be located east of the Rocky Mountains, near Meta’s U.S. data center fleet.
Internationally, Facebook integrated its Danish data center with a district heating system, using recovered server heat to supply more than 7,000 homes during the winter. Two wind energy projects power this data center with 294 MW of clean energy.
Yes! Virginia, especially Northern Virginia, is a leading hub for data centers, attracting significant investment and fostering a robust tech ecosystem. In 2023, new and expanding data centers accounted for 92% of all new investment announced by the Virginia Economic Development Partnership. This growth supports over 78,000 jobs and has generated $31.4 billion in economic output, a clear sign of the job creation potential of the tech industry. Data centers have attracted supporting industries, including manufacturing facilities for data center equipment and energy monitoring products, further bolstering the state’s knowledge economy.
AMLER funds are federally restricted to use on or adjacent to coal mines abandoned before August 3, 1977. However, some of these pre-1977 sites—especially in Appalachia and the West—are not ideal for economic redevelopment due to small size, steep slopes, or flood risk. In contrast, post-1977 mine sites that have completed reclamation (SMCRA Phase III release) are more suitable for data centers due to their flat terrain, proximity to transmission lines, and existing utilities. Yet, these sites are not currently eligible for AMLER funding. To fully unlock the economic potential of coal communities, federal policymakers should consider expanding AMLER eligibility or creating a complementary program that supports the reuse of reclaimed post-1977 mine lands, particularly those that are already prepared for industrial use.
Brownfields are previously used industrial or commercial properties, such as old factories, decommissioned coal-fired power plants, rail yards, and mines, whose reuse is complicated by real or suspected environmental contamination. By contrast, greenfields are undeveloped acreage that typically requires building new infrastructure and permitting land from scratch. Brownfields offer land developers and investors faster access to existing zoning, permitting, and transportation infrastructure.
Since 1995, the EPA Brownfields Program has offered competitive grants and revolving loan funds for site assessment, cleanup, and job training at brownfield sites, transforming liabilities into readily available assets. One study estimated that every federal dollar the EPA spent in 2018 leveraged approximately $16.86 in follow-on capital, and that every $100,000 of grant money created 8.6 jobs. In 2024, the agency added another $300 million to accelerate projects in disadvantaged communities.
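As a quick illustration of what those multipliers imply, consider a hypothetical $2 million Brownfields grant; the grant size is an assumption, and only the multipliers come from the cited study.

```python
# Applying the cited 2018 multipliers to a hypothetical grant.
grant = 2_000_000                    # hypothetical $2M EPA Brownfields grant
follow_on = grant * 16.86            # ~$33.7M in leveraged follow-on capital
jobs = (grant / 100_000) * 8.6       # ~172 jobs created

print(f"Leveraged capital: ${follow_on:,.0f}")
print(f"Estimated jobs: {jobs:.0f}")
```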
In early 2025, the U.S. Department of Energy (DOE) issued a Request for Information (RFI) seeking input on siting artificial intelligence and data infrastructure on DOE-managed federal lands, including National Labs and decommissioned sites. This effort reflects growing federal interest in repurposing publicly-owned sites to support AI infrastructure and grid modernization. Like the approach recommended in this memo, the RFI process recognizes the need for multi-level coordination involving federal, state, tribal, and local governments to assess land readiness, streamline permitting, and align infrastructure development with community needs. Lessons from that process can help guide broader efforts to repurpose pre-law AMLs, brownfields, and tribal lands for data center investment.
Yes, by turning a flooded mine into a giant underground cooler. Abandoned seams in West Virginia hold water at a steady ~50–55°F (10–13°C). A Marshall University study logged 54°F mine-pool temperatures and calculated that closed-loop heat exchangers could cut cooling power enough to achieve payback in under five years. The design pumps cool mine water up to heat exchangers that absorb heat from the servers' cooling loop, then returns the warmed water underground, so the computer hardware never contacts raw mine water. The approach is already being commercialized: Virginia's “Data Center Ridge” project secured $3 million in AMLER funds, plus $1.5 million from DOE, to cool 36 MW blocks with up to 10 billion gallons of mine water held below 55°F.
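A rough heat-balance sketch shows why mine pools are attractive at this scale. The 36 MW block size and ~54°F supply temperature come from the figures above; the return temperature is an illustrative assumption.

```python
# Rough closed-loop mine-water cooling estimate. Only the 36 MW block and
# ~54°F supply temperature come from the text above; the return temperature
# is an assumption for illustration.

SPECIFIC_HEAT_WATER = 4186      # J/(kg*K)

it_load_w = 36e6                # one 36 MW block (Data Center Ridge figure)
supply_temp_c = 12.2            # ~54°F mine water
return_temp_c = 20.6            # ~69°F assumed return temperature
delta_t_k = return_temp_c - supply_temp_c   # ~8.4 K temperature rise

# Required mass flow: Q = m_dot * c_p * dT  =>  m_dot = Q / (c_p * dT)
flow_kg_per_s = it_load_w / (SPECIFIC_HEAT_WATER * delta_t_k)
flow_gpm = flow_kg_per_s * 15.85   # 1 kg/s of water ≈ 15.85 US gpm

print(f"Mass flow: {flow_kg_per_s:,.0f} kg/s (~{flow_gpm:,.0f} gpm)")
# ≈ 1,024 kg/s, i.e. roughly 16,000 gpm circulating per 36 MW block,
# well within what a 10-billion-gallon mine pool can sustain.
```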
Moving Beyond Pilot Programs to Codify and Expand Continuous AI Benchmarking in Testing and Evaluation
The rapid integration and diffusion of advanced AI within the Department of Defense (DoD) and other government agencies has emerged as a critical national security priority. This convergence of rapid AI advancement and DoD prioritization creates an urgent need to ensure that AI models integrated into defense operations are reliable, safe, and mission-enhancing. To that end, the DoD must deploy and expand one of the most critical tools within its Testing and Evaluation (T&E) process: benchmarking—the structured practice of applying shared tasks and metrics to compare models, track progress, and expose performance gaps.
A standardized AI benchmarking framework is critical for delivering uniform, mission-aligned evaluations across the DoD. Despite their importance, the DoD currently lacks standardized, enforceable AI safety benchmarks, especially for open-ended or adaptive use cases. A shift from ad hoc to structured assessments will support more informed, trusted, and effective procurement decisions.
Rapid DoD acquisition platforms such as Tradewinds can serve as the policy vehicle for enabling more robust benchmarking efforts, particularly at the acquisition stage for AI models. This can be done by establishing a federally coordinated benchmarking hub, spearheaded jointly by the Chief Digital and Artificial Intelligence Office (CDAO) and the Defense Innovation Unit (DIU) in consultation with the newly established Chief AI Officer's Council (CAIOC) of the White House Office of Management and Budget (OMB).
Challenge and Opportunity
Experts at the intersection of AI and defense, such as retired Lieutenant General John (Jack) N.T. Shanahan, have emphasized the profound impact of AI on the way the United States will fight future wars, with the character of war continuously reshaped by AI's diffusion across all domains. The DoD is committed to remaining at the forefront of these changes: between 2022 and 2023, the value of federal AI contracts increased by over 1200%, with the surge driven by increases in DoD spending. Secretary of Defense Pete Hegseth has pledged increased investment in AI specifically for military modernization efforts, and has tasked the Army with implementing AI in command and control across theater, corps, and division headquarters by 2027, further underscoring AI's transformative impact on modern warfare.
Strategic competitors—especially the People’s Republic of China—are rapidly integrating AI into their military and technological systems. The Chinese Communist Party views AI-enabled science and technology as central to accelerating military modernization and achieving global leadership. At this pivotal moment, the DoD is pushing to adopt advanced AI across operations to preserve the U.S. edge in military and national security applications. Yet, accelerating too quickly without proper safeguards risks exposing vulnerabilities adversaries could exploit.
The DoD is at a unique inflection point: it must balance the rapid adoption and integration of AI into its operations with the need for oversight and safety. The DoD needs AI systems that consistently meet clearly defined performance standards set by acquisition authorities, operate strictly within the scope of their intended use, and do not exhibit unanticipated or erratic behaviors under operational conditions. Such systems can deliver measurable value to mission outcomes while fostering trust and confidence among human operators through predictability, transparency, and alignment with mission-specific requirements.
AI benchmarks are standardized tasks and metrics that systematically measure a model’s performance, reliability, and safety, and have increasingly been adopted as a key measurement tool by the AI industry. Currently, DoD lacks standardized, comprehensive AI safety benchmarks, especially for open-ended or adaptive use cases. Without these benchmarks, the DoD risks acquiring models that underperform, deviate from mission requirements, or introduce avoidable vulnerabilities, leading to increased operational risk, reduced mission effectiveness, and costly contract revisions.
A recent report from the Center for a New American Security (CNAS) on best practices for AI T&E outlined that the rapid and unpredictable pace of AI advancement presents distinctive challenges for both policymakers and end-users. The accelerating pace of adoption and innovation heightens both the urgency and complexity of establishing effective AI benchmarks to ensure acquired models meet the mission-specific performance standards required by the DoD and the services.
The DoD faces particularly outsized risk, as its unique operational demands can expose AI models to extreme conditions where performance may degrade. For example, under adversarial conditions, or when encountering data that differs from its training data, an AI model may behave unpredictably, posing heightened risk to the mission. Robust evaluations, such as those offered through benchmarking, help identify points of failure or harmful model capabilities before they surface during critical use. By measuring model performance in realistic scenarios and environments, evaluators can better understand attack-surface vulnerabilities to adversarial inputs, identify inaccurate or over-confident outputs, and recognize potential failures in edge cases and extreme scenarios (including those beyond training parameters). Benchmarking also improves human-AI performance and trust, helps avoid unintended capabilities, and surfaces these issues early.
Robust AI benchmarking frameworks can enhance U.S. leadership by shaping international norms for military AI safety, improving acquisition efficiency by screening out underperforming systems, and surfacing unintended or high-risk model behaviors before deployment. Furthermore, benchmarking enables AI performance to be quantified in alignment with mission needs, using guidance from the CDAO RAI Toolkit and clear acquisition parameters to support decision-making for both procurement officers and warfighters. Given the DoD’s high-risk use cases and unique mission requirements, robust benchmarking is even more essential than in the commercial sector.
The DoD now has an opportunity to formalize AI safety benchmark frameworks within its Testing and Evaluation (T&E) processes, tailored to both dual-use and defense-specific applications. T&E is already embedded in DoD culture, offering a strong foundation for expanding benchmarking. Public-private AI testing initiatives, such as the DoD's collaboration with Scale AI to create effective T&E (including through benchmarking) for AI models, show promise and demonstrate existing motivation for such initiatives. Yet critical policy gaps remain. With pilot programs underway, the DoD can move beyond vendor-led or ad hoc evaluations to introduce DoD-led testing, assess mission-specific capabilities, launch post-acquisition benchmarking, and develop human-AI team metrics. The widely used Tradewinds platform offers an existing vehicle to integrate these enhanced benchmarks without reinventing the wheel.
To implement robust benchmarking at the DoD, this memo proposes the following policy recommendations, to be coordinated by the DoD Chief Digital and Artificial Intelligence Office (CDAO):
- Expanding on existing benchmarking efforts
- Standardizing AI safety thresholds during the procurement cycle
- Implementing benchmarking during the lifecycle of the model
- Establishing a benchmarking repository
- Enabling adversarial stress testing, or “red-teaming”, prior to deployment to address current benchmarking gaps for DoD AI use cases
Plan of Action
The CDAO should launch a formalized AI Benchmarking Initiative, moving beyond current vendor-led pilot programs while continuing to refine its private industry initiatives. This effort should be comprehensive and collaborative, leveraging internal technical expertise. It should draw on newly established AI coordinating bodies such as the Chief AI Officer's Council, which can help ensure that DoD benchmarking practices are aligned with federal priorities, and the Defense Innovation Unit, which can serve as a bridge and coordinator between private industry and the national defense sector. Specifically, the CDAO should integrate benchmarking into the acquisition pipeline, establishing ongoing benchmarking practices that facilitate continuous model performance evaluation throughout the entire model lifecycle.
Policy Recommendations
Recommendation 1. Establish a Standardized Defense AI Benchmarking Initiative and create a Centralized Repository of Benchmarks
The DoD should build on lessons learned from its partnership with Scale AI (and others) in developing benchmarks specifically for defense use cases, expanding these efforts into a standardized, agency-wide framework.
This recommendation is in line with findings from RAND, which calls for a comprehensive framework for robust evaluation and emphasizes the need for collaborative practices and measurable metrics of model performance.
To achieve this goal, the DoD should act on the following recommendations, working with the government entities identified below:
Develop a Whole-of-Government Approach to AI Benchmarking
- Develop and expand on existing pilot benchmarking frameworks, similar to Massive Multitask Language Understanding (MMLU) but tailored to military-relevant tasks and DoD-specific use cases.
- Expand the $10 million T&E and research budget by $10 million, with allocations specifically for bolstering internal benchmarking capabilities. One crucial piece is identifying and recruiting technically capable talent to help develop internal benchmarking guidelines. As AI models advance, new “reasoning” models with advanced capabilities become far costlier to benchmark, and the DoD must plan for these future demands now. Part of this allocation can come from the $500 million allocated for combatant command AI budgets. This allocation is critical to successfully implementing this policy because benchmarking more advanced models – such as OpenAI's GPT-3 – can cost millions. This modest budgetary increase is a starting point for moving beyond piecemeal, ad hoc benchmarking to a comprehensive and standardized process. The funding increase would facilitate:
- Development and expansion of internal, customized benchmarking capabilities
- Recruitment and retention of technical talent
- Development of simulation environments for more mission-relevant benchmarks
If internal reallocation from the $500 million allocation proves insufficient or unviable, Congressional approval of additional funds is another funding source. Given the strategic importance of AI in defense, such requests can readily find bipartisan support, particularly when tied to operational success and risk mitigation.
- Create a centralized AI benchmarking repository under the CDAO (see the illustrative schema sketch after this list). This repository will standardize categories, performance metrics, mission alignment, and lessons learned across defense-specific use cases. It will enable consistent tracking of model performance over time, support analysis across model iterations, and allow benchmark transferability across similar operational scenarios. By compiling performance data at scale, the repository will also help identify interoperability risks and system-level vulnerabilities—particularly how different AI models may behave when integrated—thereby enhancing the DoD's ability to assess, document, and mitigate potential performance and safety failures.
- Convene a partnership, organized by OMB, between the CDAO, the DIU and the CAIOC, to jointly establish and maintain a centralized benchmarking repository. While many CAIOC members represent civilian agencies, their involvement is crucial: numerous departments (such as the Department of Homeland Security, the Department of Energy, and the National Institute of Standards and Technology) are already employing AI in high-stakes contexts and bring relevant technical expertise, safety frameworks, and risk management policies. Incorporating these perspectives ensures that DoD benchmarking practices are not developed in isolation but reflect best practices across the federal government. This partnership will leverage the DIU’s insights on emerging private-sector technologies, the CDAO’s acquisition and policy authorities, and CAIOC’s alignment with broader executive branch priorities, thereby ensuring that benchmarking practices are technically sound, risk-informed, and consistent with government-wide standards and priorities for trustworthy, safe, and reliable AI.
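To illustrate what a standardized repository entry might contain, here is a minimal, hypothetical record schema. The field names, categories, and example values are assumptions for discussion, not an established CDAO data standard.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class BenchmarkRecord:
    """One benchmark result in the proposed repository (illustrative schema)."""
    model_id: str                  # vendor model name and version
    benchmark_name: str            # e.g., a military-tailored MMLU variant
    use_case: str                  # mission category the test targets
    metric: str                    # e.g., "task accuracy", "latency p95 (ms)"
    score: float
    evaluated_on: date
    environment: str               # e.g., "simulated degraded comms"
    red_teamed: bool = False       # whether adversarial testing was included
    notes: list[str] = field(default_factory=list)  # lessons learned

# Example entry (all values hypothetical):
record = BenchmarkRecord(
    model_id="vendor-x-7b-v2.1",
    benchmark_name="mil-mmlu-logistics",
    use_case="logistics planning support",
    metric="task accuracy",
    score=0.87,
    evaluated_on=date(2025, 6, 1),
    environment="contested-network simulation",
    red_teamed=True,
)
```

A schema along these lines would let the repository support the cross-model, cross-iteration comparisons described above without constraining how individual evaluations are run.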
Recommendation 2. Formalize Pre-Deployment Benchmarking for AI Models at the Acquisition Stage
The key to meaningful benchmarking lies in integrating it at the pre-award stage of procurement. The DoD should establish a formal process that:
- Integrates benchmarking into existing AI acquisition platforms, such as Tradewinds, and embeds it within the T&E process.
- Requires participation from third-party vendors in benchmarking the products they propose for DoD acquisition and use.
- Embeds internal adversarial stress testing, or “red-teaming”, into AI benchmarking, ensuring more realistic, mission-aligned evaluations that account for adversarial threats and the unique, high-risk operating environments the military faces. By leveraging its internal expertise in mission context, classified threat models, and domain-specific edge cases that external vendors are unlikely to fully replicate, the DoD can produce a more comprehensive and defense-relevant assessment of AI system safety, efficacy, and suitability for deployment. Specifically, this policy memo recommends that the AI Rapid Capabilities Cell (AI RCC), as a technically qualified element of the CDAO, be tasked with carrying out the red-teaming.
- Ensures procurement officers understand the value of incorporating benchmarking performance metrics into their contract award decisions. This can be done by hosting benchmarking workshops for procurement officers that present benchmarking results for models in the acquisition pipeline and guide officers on applying these metrics to their own performance requirements and guidelines.
Recommendation 3. Contextualize Benchmarking into Operational Environments
Current efforts to scale and integrate AI reflect the distinct operational realities of the DoD and military services. Scale AI, in partnership with the DoD, Anduril, Microsoft, and the CDAO, is developing AI-powered solutions focused on the United States Indo-Pacific Command (INDOPACOM) and United States European Command (EUCOM). With AI solutions focused on regional commands, it makes sense to create equally focused benchmarking standards that test AI model performance in specific environments and under unique conditions. In fact, researchers have been identifying the limits of traditional AI benchmarking and making the case for bespoke, holistic, use-case-relevant benchmark development. This is vital because, as AI models advance, they introduce entirely new capabilities that require more robust testing and evaluation. For example, large language models, which have introduced new functionalities such as natural language querying and multimodal search interfaces, require entirely new benchmarks that measure natural language understanding, multimodal integration accuracy, context retention, and result usefulness. In the same vein, DoD-relevant benchmarks must be developed in an operationally relevant context. This can be achieved by:
- Developing simulation environments for benchmarking that are mission-specific across a broader set of domains, including technical and regional commands, to test AI models under specific conditions which are likely to be encountered by users in unique, contested, and/or adversarial environments. The Bipartisan House Task Force on Artificial Intelligence report provides useful guidance on AI model functionality, reliability, and safety in operating in contested, denied, and degraded environments.
- Prioritizing use-case-specific benchmarks over broad commercial metrics by incorporating user feedback and identifying tailored risk scenarios that more accurately measure model performance.
- Introducing context relevant benchmarks to measure performance in specific, DoD-relevant scenarios, such as:
- Task-specific accuracy (e.g., correct identification in satellite imagery)
- Alignment with context-specific rules of engagement
- Instances of degraded performance under high-stress conditions
- Susceptibility to adversarial manipulation (e.g., data poisoning)
- Latency in high-risk, fast-paced decision-making scenarios
- Creating post-deployment benchmarking to ensure ongoing performance and risk compliance, and to detect and address issues like model drift, a phenomenon where model performance degrades over time (see the illustrative sketch after this list). As there is no established consensus on how often continuous model benchmarking should be performed, the DoD should study practical, risk-informed timelines for re-evaluating deployed systems.
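The sketch below illustrates one simple form such post-deployment re-benchmarking could take: periodically re-running a fixed benchmark suite and flagging scores that degrade beyond a tolerance. The benchmark names, scores, and drift threshold are all hypothetical.

```python
# Hypothetical post-deployment drift check: re-run a fixed benchmark suite
# on a schedule and compare against scores recorded at acquisition.

BASELINE_SCORES = {"mil-mmlu-logistics": 0.87, "imagery-id": 0.91}  # acquisition-time
DRIFT_TOLERANCE = 0.05   # assumed threshold; DoD would set this per use case

def check_drift(current_scores: dict[str, float]) -> list[str]:
    """Return benchmarks whose scores degraded beyond tolerance."""
    flagged = []
    for name, baseline in BASELINE_SCORES.items():
        current = current_scores.get(name)
        if current is not None and (baseline - current) > DRIFT_TOLERANCE:
            flagged.append(f"{name}: {baseline:.2f} -> {current:.2f}")
    return flagged

# Example periodic re-evaluation (scores hypothetical):
alerts = check_drift({"mil-mmlu-logistics": 0.79, "imagery-id": 0.90})
for alert in alerts:
    print("DRIFT ALERT:", alert)   # triggers review or recertification
```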
Frameworks such as Holistic Evaluation of Language Models (HELM) and Focused LLM Ability Skills and Knowledge (FLASK) can offer valuable guidance for developing LLM-focused benchmarks within the DoD, by enabling more comprehensive evaluations based on specific model skill sets, use-case scenarios, and tailored performance metrics.
Recommendation 4. Integration of Human-in-the-Loop Benchmarking
An additional layer of AI benchmarking for safe and effective AI diffusion into the DoD ecosystem is evaluating AI-human team performance and measuring user trust, perceptions, and confidence in various AI models. “Human-in-the-loop” systems require a person to approve or adjust the AI's decision before action, while “human-on-the-loop” systems allow autonomous operation but keep a person supervising and ready to intervene. Both are critical components of the DoD and military approach to AI, and both require continued human oversight of ethical and safety considerations for AI-enabled capabilities with national security implications. A recent MIT study found surprising performance gaps between AI-only, human-only, and human-AI teams. For the DoD in particular, given its heavy reliance on user-AI teams, it is important to measure these performance gaps across the various AI models it plans to integrate into its operations.
A CNAS report on effective T&E for AI spotlighted the DARPA Air Combat Evolution (ACE) program, which sought autonomous air‑combat agents needing minimal human intervention. Expert test pilots could override the system, yet often did so prematurely, distrusting its unfamiliar tactics. This case underscores the need for early, extensive benchmarks that test user capacity, surface trust gaps that can cripple human‑AI teams, and assure operators that models meet legal and ethical standards. Accordingly, this memo urges expanding benchmarking beyond pure model performance to AI‑human team evaluations in high‑risk national‑security, lethal, or error‑sensitive environments.
Conclusion
The Department of Defense is racing to integrate AI across every domain of warfare, yet speed without safety will jeopardize mission success and national security. Standardized, acquisition‑integrated, continuous, and mission‑specific benchmarking is therefore not a luxury—it is the backbone of responsible AI deployment. Current pilot programs with private partners are encouraging starts, but they remain too ad hoc and narrow to match the scale and tempo of modern AI development.
Benchmarking must begin at the pre‑award acquisition stage and follow systems through their entire lifecycle, detecting risks, performance drift, and adversarial vulnerabilities before they threaten operations. As the DARPA ACE program showed, early testing of human‑AI teams and rigorous red‑teaming surface trust gaps and hidden failure modes that vendor‑led evaluations often miss. Because AI models—and enemy capabilities—evolve constantly, our evaluation methods must evolve just as quickly.
By institutionalizing robust benchmarks under CDAO leadership, in concert with the Defense Innovation Unit and the Chief AI Officer's Council, the DoD can set world-class standards for military AI safety while accelerating reliable procurement. Ultimately, AI benchmarking is not a hurdle to innovation and acquisition; it is the infrastructure that makes rapid acquisition more reliable and innovation more viable. The DoD cannot afford to deploy AI systems that are unreliable, ineffective, or misaligned with mission needs and standards in high-risk operational environments. At this inflection point, the choice is not between speed and safety but between ungoverned acceleration and a calculated momentum that allows our strategic AI advantage to be both sustained and secured.
This memo was written by an AI Safety Policy Entrepreneurship Fellow over the course of a six-month, part-time program that supports individuals in advancing their policy ideas into practice. You can read more policy memos and learn about Policy Entrepreneurship Fellows here.
The Scale AI benchmarking initiative, launched in February 2024 in partnership with the DoD, is a pilot framework designed to evaluate the performance of AI models intended for defense and national security applications. It is part of broader efforts to create a T&E framework for AI models for the CDAO.
This memo builds on that foundation by:
- Formalizing benchmarking as a standard requirement at the procurement stage across DoD acquisition processes.
- Inserting benchmarking protocols into rapid acquisition platforms like Tradewinds.
- Establishing a defense-specific benchmarking repository and enabling red-teaming led by the AI Rapid Capabilities Cell (AI RCC) within the CDAO.
- Shifting the lead on benchmarking from vendor-enabled to internally developed, led, and implemented, creating bespoke evaluation criteria tailored to specific mission needs.
The proposed benchmarking framework will apply to a diverse range of AI systems, including:
- Decision-making and command and control support tools (sensors, target recognition, process automation, and tools involved in natural language processing).
- Generative models for planning, logistics, intelligence, or data generation.
- Autonomous agents, such as drones and robotic systems.
Benchmarks will be theater- and context-specific, reflecting real-world environments (e.g., contested INDOPACOM scenarios), end-user roles (human-AI teaming in combat), and mission-specific risk factors such as adversarial interference and model drift.
Open-source models present distinct challenges due to questions of model ownership and origin, additional possible exposure to data poisoning, and downstream user manipulation. At the same time, the greater transparency of open-source models and potential access to their training data could make them less challenging to put through rigorous T&E.
This memo recommends:
- Applying standardized evaluation criteria across both open-source and proprietary models, developed using the AI benchmarking repository and tailored to each model's possible use cases.
- Incorporating benchmarking to test possible areas of vulnerability for downstream user manipulation.
- Measuring the transparency of training data.
- Performing adversarial testing to assess resilience against manipulated inputs via red-teaming.
- Logging open-source model performance in the proposed centralized repository, enabling ongoing monitoring for drift and other issues.
Red-teaming implements adversarial stress-testing (which can be more robust and operationally relevant if led by an internal team as this memo proposes), and can identify vulnerabilities and unintended capabilities before deployment. Internally led red-teaming, in particular, is critical for evaluating models intended for use in unpredictable or hostile environments.
To effectively employ the red-teaming efforts, this policy recommends that:
- The AI Rapid Capabilities Cell within the CDAO should lead red-teaming operations, leveraging the team’s technical capabilities with its experience and mission set to integrate and rapidly scale AI at the speed of relevance — delivering usable capability fast enough to affect current operations and decision cycles.
- Internal, technically skilled teams capable of incorporating classified threat models and edge-case scenarios should be created.
- Red-teaming should focus on simulating realistic mission conditions, and searching for specific model capabilities, going beyond generic or vendor-supplied test cases.
Integrating benchmarking at the acquisition stage enables procurement officers to:
- Compare models on mission-relevant, standardized performance metrics and ensure that there is evidence of measurable performance metrics which align with their own “vision of success” procurement requirements for the models.
- Identify and avoid models with unsafe, misaligned, unverified, or ineffective capabilities.
- Prevent cost-overruns or contract revisions.
Benchmarking workshops for acquisition officers can further equip them with the skills to interpret benchmark results and apply them to their operational requirements.
Develop a Risk Assessment Framework for AI Integration into Nuclear Weapons Command, Control, and Communications Systems
As the United States overhauls nearly every element of its strategic nuclear forces, artificial intelligence is set to play a larger role—initially in early‑warning sensors and decision‑support tools, and likely in other mission areas. Improved detection could strengthen deterrence, but only if accompanying hazards—automation bias, model hallucinations, exploitable software vulnerabilities, and the risk of eroding assured second‑strike capability—are well managed.
To ensure responsible AI integration, the Office of the Assistant Secretary of Defense for Nuclear Deterrence, Chemical, and Biological Defense Policy and Programs (OASD (ND-CBD)), the U.S. Strategic Command (STRATCOM), the Defense Advanced Research Projects Agency (DARPA), the Office of the Undersecretary of Defense for Policy (OUSD(P)), and the National Nuclear Security Administration (NNSA), should jointly develop a standardized AI risk-assessment framework guidance document, with implementation led by the Department of Defense’s Chief Digital and Artificial Intelligence Office (CDAO) and STRATCOM. Furthermore, DARPA and CDAO should join the Nuclear Weapons Council to ensure AI-related risks are systematically evaluated alongside traditional nuclear modernization decisions.
Challenge and Opportunity
The United States is replacing or modernizing nearly every component of its strategic nuclear forces, estimated to cost at least $1.7 trillion over the next 30 years. This includes its:
- Intercontinental ballistic missiles (ICBMs)
- Ballistic missile submarines and their submarine-launched ballistic missiles (SLBMs)
- Strategic bombers, cruise missiles, and gravity bombs
- Nuclear warhead production and plutonium pit fabrication facilities
Simultaneously, artificial intelligence (AI) capabilities are rapidly advancing and being applied across the national security enterprise, including nuclear weapons stockpile stewardship and some components of nuclear command, control, and communications (NC3) systems, which encompass early warning, decision-making, and force deployment components.
The NNSA, responsible for stockpile stewardship, is increasingly integrating AI into its work. This includes using AI for advanced modeling and simulation of nuclear warheads, for example by creating digital twins of existing weapons systems to analyze aging and performance issues, and using AI to accelerate the nuclear weapons development lifecycle. Furthermore, NNSA is leading some aspects of the safety testing and systematic evaluation of frontier AI models on behalf of the U.S. government, with a specific focus on assessing nuclear and radiological risk.
Within the NC3 architecture, a complex “system of systems” with over 200 components, simpler forms of AI are already being used in areas including early‑warning sensors, and may be applied to decision‑support tools and other subsystems as confidence and capability grow. General Anthony J. Cotton—who leads STRATCOM, the combatant command that directs America’s global nuclear forces and their command‑and‑control network—told a 2024 conference that STRATCOM is “exploring all possible technologies, techniques, and methods” to modernize NC3. Advanced AI and data‑analytics tools, he said, can sharpen decision‑making, fuse nuclear and conventional operations, speed data‑sharing with allies, and thus strengthen deterrence. General Cotton added that research must also map the cascading risks, emergent behaviors, and unintended pathways that AI could introduce into nuclear decision processes.
Thus, from stockpile stewardship to NC3 systems, AI is likely to be integrated across multiple nuclear capabilities, some potentially stabilizing, others potentially highly destabilizing. On the stabilizing side, AI could enhance early warning systems by processing large volumes of satellite, radar, and other signals intelligence, providing more time to decision-makers. On the destabilizing side, AI's ability to detect or track other countries' nuclear forces could trigger an expansionary arms race if countries doubt the credibility of their second-strike capability. Furthermore, countries may misinterpret each other's nuclear deterrence doctrines or lack means of verifying human control of one another's nuclear weapons.
While several public research reports have been conducted on how AI integration into NC3 could upset the balance of strategic stability, less research has focused on the fundamental challenges with AI systems themselves that must be accounted for in any risk framework. Per the National Institute of Standards and Technology’s (NIST) AI Risk Management Framework, several fundamental AI challenges at a technical level must be accounted for in the integration of AI into stockpile stewardship and NC3.
Not all AI applications within the nuclear enterprise carry the same level of risk. For example, using AI to model warhead aging in stockpile stewardship is largely internal to the Department of Energy (DOE) and involves less operational risk. Despite lower risk, there is still potential for an insufficiently secure model to lead to leaked technical data about nuclear weapons.
However, integrating AI into decision support systems or early warning functions within NC3 introduces significantly higher stakes. These systems require time-sensitive, high-consequence judgments, and AI integration in this context raises serious concerns about issues including confabulations, human-AI interactions, and information security:
- Confabulations: A phenomenon in which generative AI (GAI) systems generate and confidently present erroneous or false content in response to user inputs, or prompts. These phenomena are colloquially referred to as “hallucinations” or “fabrications”, and could have particularly dangerous consequences in high-stakes settings.
- Human-AI Interactions: Due to the complexity and human-like nature of GAI technology, humans may over-rely on GAI systems or may unjustifiably perceive GAI content to be of higher quality than that produced by other sources. This phenomenon is an example of automation bias or excessive deference to automated systems. This deference can lead to a shift from a human making the final decision (“human in the loop”), to a human merely observing AI generated decisions (“human on the loop”). Automation bias therefore risks exacerbating other risks of GAI systems as it can lead to humans maintaining insufficient oversight.
- Information Security: AI expands the cyberattack surface of NC3. Poisoned AI training data and tampered code can embed backdoors, and, once deployed, prompt‑injection or adversarial examples can hijack AI decision tools, distort early‑warning analytics, or leak secret data. The opacity of large AI models can let these exploits spread unnoticed, and as models become more complex, they will be harder to debug.
This is not an exhaustive list of issues with AI systems; however, it highlights several key areas that must be managed. A risk framework must account for these distinctions and apply stricter oversight where system failure could have direct consequences for escalation or deterrence credibility. Without such a framework, it will be challenging to harness the benefits AI has to offer.
Plan of Action
Recommendation 1. OASD (ND-CBD), STRATCOM, DARPA, OUSD(P), and NNSA should develop a standardized risk assessment framework guidance document to evaluate the integration of artificial intelligence into nuclear stockpile stewardship and NC3 systems.
This framework would enable systematic evaluation of risks, including confabulations, human-AI configuration, and information security, across modernization efforts. The framework could assess the extent to which an AI model is prone to confabulations through performance evaluations (or “benchmarking”) under a wide range of realistic conditions. While public measurements of confabulation exist, it is essential to evaluate AI systems on data relevant to the deployment circumstances, which could involve highly sensitive military information.
Additionally, the framework could assess human-AI configuration with specific focus on risks from automation bias and the degree of human oversight. For these tests, it is important to put the AI systems in contact with human operators in situations that are as close to real deployment as possible, for example when operators are tired, distracted, or under pressure.
Finally, the framework could include assessments of information security under extreme conditions. This should include simulating comprehensive adversarial attacks (or “red-teaming”) to understand how the AI system and its human operators behave when subject to a range of known attacks on AI systems.
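As one illustration of the kind of assessment the framework could standardize, the sketch below computes a confabulation rate over a held-out evaluation set with known ground truth. The function names and scoring rule are placeholders; a real harness would use deployment-relevant (likely classified) data and expert grading rather than string matching.

```python
# Minimal sketch of a confabulation-rate evaluation (all names hypothetical).

def query_model(prompt: str) -> str:
    """Placeholder for a call to the AI system under evaluation."""
    return "canned answer"   # stub so the sketch runs end-to-end

def is_supported(answer: str, ground_truth: str) -> bool:
    """Crude consistency check; a real harness would use expert graders
    or validated scoring rules rather than string matching."""
    return answer.strip().lower() == ground_truth.strip().lower()

def confabulation_rate(eval_set: list[tuple[str, str]]) -> float:
    """Fraction of prompts where the model asserts unsupported content."""
    errors = sum(
        not is_supported(query_model(prompt), truth)
        for prompt, truth in eval_set
    )
    return errors / len(eval_set)

# Hypothetical evaluation set; real data would be deployment-relevant.
eval_set = [("What is the boiling point of water at sea level?", "100 C")]
print(f"Confabulation rate: {confabulation_rate(eval_set):.0%}")
# The framework would set per-use-case thresholds, with stricter limits
# for NC3-adjacent decision support than for stockpile-stewardship tools.
```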
NNSA should be included in this development due to its mission ownership of stockpile stewardship and nuclear safety and its leadership in advanced modeling and simulation capabilities. DARPA should be included due to its role as a cutting-edge research and development agency, its extensive experience in AI red-teaming, and its understanding of the AI vulnerabilities landscape. STRATCOM must be included as the operational commander of NC3 systems, to ensure the framework accounts for real-world needs and escalation risks. OASD (ND-CBD) should be involved given the office's responsibility to oversee nuclear modernization and coordinate across the interagency. OUSD(P) should be included to provide strategic oversight and ensure the risk assessment aligns with broader defense policy objectives and international commitments.
Recommendation 2. CDAO should implement the Risk Assessment Framework with STRATCOM
While NNSA, DARPA, OASD (ND-CBD), and STRATCOM can jointly create the risk assessment framework, the CDAO and STRATCOM should serve as the implementation leads for utilizing it. Given that the CDAO is already responsible for AI assurance, testing and evaluation, and algorithmic oversight, it would be well positioned to work with relevant stakeholders to support implementation of the technical assessment. STRATCOM would have the strongest understanding of the operational contexts in which to apply the framework. NNSA and DARPA could therefore advise on the framework's AI-related technical underpinnings, while the CDAO would prioritize operational governance and compliance, ensuring that clear risk assessments are completed and understood when considering the integration of AI into nuclear-related defense systems.
Recommendation 3. DARPA and CDAO should join the Nuclear Weapons Council
Given their roles in creating and implementing the AI risk assessment framework, stakeholders from both DARPA and the CDAO should be incorporated into the Nuclear Weapons Council (NWC), either as full members or as attendees of a subcommittee. The NWC is the interagency body of the DOE and the DoD responsible for sustaining and modernizing the U.S. nuclear deterrent; it endorses military requirements, approves trade-offs, and ensures alignment between DoD delivery systems and NNSA weapons.
As AI capabilities become increasingly embedded in nuclear weapons stewardship, NC3 systems, and broader force modernization, the NWC must be equipped to evaluate associated risks and technological implications. Currently, the NWC is composed of senior officials from the Department of Defense, the Joint Chiefs of Staff, and the Department of Energy, including the NNSA. While these entities bring deep domain expertise in nuclear policy, military operations, and weapons production, the Council lacks additional representation focused on AI.
DARPA’s inclusion would ensure that early-stage technology developments and red-teaming insights are considered upstream in decision-making. Likewise, CDAO’s presence would provide continuity in AI assurance, testing, and digital system governance across operational defense components. Their participation would enhance the Council’s ability to address new categories of risk, such as model confabulation, automation bias, and adversarial manipulation of AI systems, that are not traditionally covered by existing nuclear stakeholders. By incorporating DARPA and CDAO, the NWC would be better positioned to make informed decisions that reflect both traditional nuclear considerations and the rapidly evolving technological landscape that increasingly shapes them.
Conclusion
While AI is likely to be integrated into components of the U.S. nuclear enterprise, without a standardized initial approach to assessing and managing AI-specific risk, including confabulations, automation bias, and novel cybersecurity threats, this integration could undermine an effective deterrent. A risk assessment framework coordinated by OASD (ND-CBD), with STRATCOM, NNSA and DARPA, and implemented with support of the CDAO, could provide a starting point for NWC decisions and assessments of the alignment between DoD delivery system needs, the NNSA stockpile, and NC3 systems.
This memo was written by an AI Safety Policy Entrepreneurship Fellow over the course of a six-month, part-time program that supports individuals in advancing their policy ideas into practice. You can read more policy memos and learn about Policy Entrepreneurship Fellows here.
Yes, NWC subordinate organizations or subcommittees are not codified in Title 10 USC §179, so the NWC has the flexibility to create, merge, or abolish organizations and subcommittees as needed.
Section 1638 of the FY2025 National Defense Authorization Act established a Statement of Policy emphasizing that any use of AI in support of strategic deterrence should not compromise, “the principle of requiring positive human actions in execution of decisions by the President with respect to the employment of nuclear weapons.” However, as this memo describes, AI presents further challenges outside of solely keeping a human in the loop in terms of decision-making.
A National Center for Advanced AI Reliability and Security
While AI’s transformative advances have enormous positive potential, leading scientists and industry executives are also sounding the alarm about catastrophic risks on a global scale. If left unmanaged, these risks could undermine our ability to reap the benefits of AI progress. While the U.S. government has made some progress, including by establishing the Center for AI Standards and Innovation (CAISI)—formerly the US AI Safety Institute—current government capacity is insufficient to respond to these extreme frontier AI threats. To address this problem, this memo proposes scaling up a significantly enhanced “CAISI+” within the Department of Commerce. CAISI+ would require dedicated high-security compute facilities, specialized talent, and an estimated annual operating budget of $67-155 million, with a setup cost of $155-275 million. CAISI+ would have expanded capacity for conducting advanced model evaluations for catastrophic risks, provide direct emergency assessments to the President and National Security Council (NSC), and drive critical AI reliability and security research, ensuring America is prepared to lead on AI and safeguard its national interests.
Challenge and Opportunity
Frontier AI is advancing rapidly toward powerful general-purpose capabilities. While this progress has produced widely useful products, it is also generating significant security risks. Recent evaluations of Anthropic’s Claude Opus 4 model were unable to rule out the risk that the model could help novice actors produce bioweapons, triggering additional safeguards. Meanwhile, the FBI warns that AI “increases cyber-attack speed, scale, and automation,” with a 442% increase in AI-enhanced voice phishing attacks in 2024, and recent evaluations show AI models rapidly gaining offensive cyber capabilities.
AI company CEOs and leading researchers have predicted that this progress will continue, with potentially transformative AI capabilities arriving in the next few years, and fast progress in AI capabilities will continue to generate novel threats greater than those from existing models. As AI systems become increasingly capable of performing complex tasks and taking extended autonomous actions, researchers warn of additional risks, such as loss of human control, AI-enabled WMD proliferation, and strategic surprise with severe national security implications. While timelines to AI systems surpassing dangerous capability thresholds are uncertain, this proposal lays out a U.S. government response that is robust to a range of possible timelines while taking the above trends seriously.
Current U.S. Government capabilities, including the existing Center for AI Standards and Innovation (CAISI), are not adequately resourced or empowered to independently evaluate, monitor, or respond to the most advanced AI threats. For example, current CAISI funding is precarious, the offices of its home institution, the National Institute of Standards and Technology (NIST), are reportedly “crumbling”, and its budget is roughly one-tenth that of its UK counterpart. Despite previous underinvestment, CAISI has consistently produced rigorous model evaluations and, in doing so, has earned strong credibility with industry and government stakeholders. This also includes support from legislators: bipartisan legislation has been introduced in both chambers of Congress to authorize CAISI in statute, while just last month, the House China Committee released a letter noting that CAISI has a role to play in “understanding, predicting, and preparing for” national security risks from AI development in the PRC.
A dedicated, properly resourced national entity is essential for supporting the development of safe, secure, and trustworthy AI and driving widespread adoption. It would provide sustained, independent technical assessments and emergency coordination—roles that ad hoc industry consultations or self-reporting cannot fulfill on paramount matters of national security and public safety.
Establishing CAISI+ now is a critical opportunity to proactively manage these profound risks, ensure American leadership in AI, and prevent strategic disadvantage as global AI capabilities advance. While full operational capacity may not be needed immediately, certain infrastructure, such as highly secure computing, has significant lead times, demanding foresight and preparatory action. This blueprint offers a scalable framework to build these essential national capabilities, safeguarding our future against AI-related catastrophic events and enabling the U.S. to shape the trajectory of this transformative technology.
Plan of Action
To effectively address extreme AI risks, develop more trustworthy AI systems, and secure U.S. interests, the Administration and Congress should collaborate to establish and resource a world-class national entity to inform the federal response to the trends described above.
Recommendation 1. Establish CAISI+ to Lead National AI Safety and Coordinate Crisis Response.
CAISI+, evolving from the current CAISI within NIST under the Department of Commerce, must have a clear mandate focused on large-scale AI risks. Core functions include:
- Advanced Model Evaluation: Developing and operating state-of-the-art platforms to test frontier AI models for dangerous capabilities, adversarial behavior or goals (such as deception or power-seeking), and potential weaponization. While the level of risk presented by current models is very uncertain, even those who are skeptical of particular risk models are often supportive of developing better evaluations.
- Emergency Assessment & Response: Providing rapid, expert risk assessments and warnings directly to the President and the National Security Council (NSC) in the event of severe AI-driven national security threats. The CAISI+ Director should be statutorily designated as the Principal Advisor on AI Risks to the President and NSC, with authority to:
- Submit AI threat assessments to the President’s Daily Brief (PDB) when intelligence indicates imminent or critical risks
- Convene emergency sessions of the NSC Deputies Committee or Principals Committee for time-sensitive AI security threats
- Maintain direct communication channels to the National Security Advisor for immediate threat notification
- Issue “Critical AI Threat Warnings” through established NSC emergency communication protocols, similar to those used for terrorism or WMD threats
- Foundational AI Reliability and Security Research: Driving and funding research into core AI alignment, control, and security challenges to maintain U.S. technological leadership while developing trustworthy AI systems. This research will yield dual benefits to the public and industry by enabling broader adoption of reliable AI tools and preventing catastrophic incidents that could devastate the AI sector, much as the Three Mile Island accident set back nuclear energy development. Following the model of NIST’s successful encryption standards, establishing rigorous AI safety benchmarks and protocols will create industry-wide confidence while ensuring American competitiveness.
Governance will feature clear interagency coordination (e.g., with the Department of Defense, Department of Energy, Department of Homeland Security, and other relevant bodies in the intelligence community) and an internal structure with distinct directorates for evaluations, emergency response, and research, coordinated by CAISI+ leadership.
Recommendation 2. Equip CAISI+ with Elite American Talent and Sustained Funding
CAISI+’s efficacy hinges on world-class personnel and reliable funding to execute its mission. This necessitates:
- Exceptional American Talent: Special hiring authorities (e.g., direct hire, excepted service) and competitive compensation are paramount to attract and retain leading U.S. AI researchers, evaluators, and security experts, ensuring our AI standards reflect American values.
- Significant, Sustained Funding: Initial mainline estimates (see “Funding estimates for CAISI+” below) suggest $155-$275 million for setup and an annual operating budget of $67-$155 million for the recommended implementation level, sourced via new appropriations, to ensure America develops strong domestic capacity for defending against AI-powered threats. If appropriations are not made or fall short, additional support could be sourced via a NIST Foundation.
Funding estimates for CAISI+
Implementation Considerations
- Phased approach: The facility could be developed in stages, prioritizing core evaluation capabilities before expanding to full emergency response capacity.
- Leverage existing assets: Initial operations could utilize existing DOE relationships rather than immediately building dedicated infrastructure.
- Partnership model: Some costs could be offset through public-private partnerships with technology companies and research institutions.
- Talent acquisition strategy: Use of special hiring authorities (direct hire, excepted service) and competitive compensation (SL/ST pay scales, retention bonuses) may help compete with private sector AI companies.
- Sustainable funding: For stability, a multi-year Congressional appropriation with dedicated line-item funding would be crucial.
Staffing Breakdown by Function
- Technical Research (40-60% of staff): AI evaluations, safety research, alignment, interpretability research
- Security Operations (25-35% of staff): Red-teaming, misuse assessment, weaponization evaluation, security management
- Policy & Strategy (10-15% of staff): Leadership, risk assessment, interagency coordination, international liaisons
- Support Functions (15-20% of staff): Legal, procurement, compute infrastructure management, administration
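For a rough sense of headcount, the sketch below applies these bands to a hypothetical 120-person organization. The total is an illustrative assumption, and the bands overlap, so they describe ranges rather than an exact partition:

```python
# Illustrative only: convert the staffing bands above into headcounts.
TOTAL_STAFF = 120  # assumed total; actual size depends on implementation level

bands = {
    "Technical Research": (0.40, 0.60),
    "Security Operations": (0.25, 0.35),
    "Policy & Strategy": (0.10, 0.15),
    "Support Functions": (0.15, 0.20),
}

for function, (low, high) in bands.items():
    print(f"{function}: {round(low * TOTAL_STAFF)}-{round(high * TOTAL_STAFF)} staff")
```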
For context, current funding levels include:
- Current CAISI funding (mid-2025): $10 million annually
- UK AISI (CAISI counterpart) initial funding: £100 million (~$125 million)
- Oak Ridge Leadership Computing Facility operations: ~$200-300 million annually
- Standard DOE supercomputing facility construction: $400-600 million
Even the minimal implementation would require substantially greater resources than the current CAISI, but it would remain well within the scale of other national-priority technology initiatives. The recommended implementation level would position CAISI+ to effectively fulfill its expanded mission of frontier AI evaluation, monitoring, and emergency response.
Funding Longevity
- Initial authorization: 5-year authorization with specific milestones and metrics
- Review mechanism: Independent assessment by the Government Accountability Office at the 3-year mark to evaluate effectiveness and adjust scope and resources, supplemented by a National Academies study specifically tasked with evaluating the scientific and technical rigor of CAISI+.
- Long-term vision: Transition to permanent authorization for core functions with periodic reauthorization of specific initiatives
- Accountability: Annual reporting to Congress on key performance metrics and risk assessments
Recommendation 3. Equip CAISI+ with Essential Secure Compute Infrastructure.
CAISI+ must be able to access secure compute in order to run certain evaluations involving proprietary models and national security data. This cluster can remain relatively modest in scale. Other researchers have hypothesized that a “Trusted AI Verification and Evaluation Cluster” for verifying and evaluating frontier AI development would need only 128 to 512 state-of-the-art graphics processing units (GPUs), orders of magnitude smaller than the scale of training compute, such as the 16,000 H100 GPUs used in the recent Llama 3.1 405B training run or xAI’s 200,000-GPU Colossus cluster.
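As a back-of-envelope comparison using the figures above, even the upper end of such an evaluation cluster is a small fraction of recent training clusters:

```python
# Scale comparison using the cluster sizes cited in this section.
eval_cluster_gpus = 512      # upper end of the hypothesized 128-512 GPU range
llama_405b_gpus = 16_000     # H100 GPUs used to train Llama 3.1 405B
colossus_gpus = 200_000      # GPUs in xAI's Colossus cluster

print(f"Llama 3.1 405B training: {llama_405b_gpus // eval_cluster_gpus}x the evaluation cluster")
print(f"xAI Colossus:            {colossus_gpus // eval_cluster_gpus}x the evaluation cluster")
```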
However, the cluster will need to be highly secure, that is, able to defend against attacks from nation-state adversaries. Certain evaluations will require full access to the internal “weights” of AI models, which requires hosting the model. Model hosting introduces the risk of model theft and proliferation of dangerous capabilities. Some evaluations will also involve the use of very sensitive data, such as nuclear weapons design evaluations, introducing additional incentive for cyberattacks. Researchers at Gladstone AI, a national security-focused AI policy consulting firm, write that in several years, powerful AI systems may confer significant strategic advantages to nation-states and will therefore be top-priority targets for theft or sabotage by adversary nation-states. They also note that neither existing datacenters nor AI labs are secure enough to prevent this theft, necessitating novel research and buildout to reach the necessary security level, outlined as “Security Level-5” (SL-5) in RAND’s Playbook for Securing AI Model Weights.
Therefore, we suggest a hybrid strategy for specialized secure compute, featuring a highly secure SL-5 air-gapped core facility for sensitive model analysis (a long-lead item requiring immediate planning), with access to a secondary pool of compute for additional capacity to run less sensitive evaluations via a formal partnership with DOE to access national lab resources. CAISI+ may also want to coordinate with the NITRD National Strategic Computing Reserve Pilot Program to explore needs for AI-crisis-related surge computing capability.
If a sufficiently secure compute cluster is infeasible or not developed in time, CAISI+ will ultimately be unable to host model internals without introducing unacceptable risks of model theft, severely limiting its ability to evaluate frontier AI systems.
Recommendation 4. Explore Granting Critical Authorities
While current legal authorities may suffice for CAISI+’s core missions, evolving AI threats could require additional tools. The White House (specifically the Office of Science and Technology Policy [OSTP], in collaboration with the Office of Management and Budget [OMB]) should analyze existing federal powers (such as the Defense Production Act or the International Emergency Economic Powers Act) to identify gaps in AI threat response capabilities, including potential needs for an incident reporting system and related subpoena authorities (similar to the function of the National Transportation Safety Board), model access for safety evaluations, or compute oversight authorities. Based on this analysis, the executive branch should report to Congress where new statutory authorities may be necessary, with defined risk criteria and appropriate safeguards.
Recommendation 5. Implement CAISI+ Enhancements Through Urgent, Phased Approach
Building on CAISI’s existing foundation within NIST/DoC, the Administration should enhance its capabilities to address AI risks that extend beyond current voluntary evaluation frameworks. Given expert warnings that transformative AI could emerge within the current Administration’s term, immediate action is essential to augment CAISI’s capacity to handle extreme scenarios. To achieve full operational capacity by early 2027, initial-phase activities must begin now due to long infrastructure lead times:
Immediate Enhancements (0-6 months):
- Leverage NIST’s existing relationships with DOE labs to secure interim access to classified computing facilities for sensitive evaluations
- Initiate the security research and procurement process for the SL-5 compute facility outlined in Recommendation 3
- Work with OMB and Department of Commerce leadership to secure initial funding through reprogramming or supplemental appropriations
- Build on CAISI’s current voluntary agreements to develop protocols for emergency model access and crisis response
- Begin the OSTP-led analysis of existing federal authorities (per Recommendation 4) to identify potential gaps in AI threat response capabilities
Subsequent phases will extend CAISI’s current work through:
- Foundation-building activities (6-12 months): Implementing the special hiring authorities described in Recommendation 2, formalizing enhanced interagency MOUs to support coordination described in Recommendation 1, and establishing the direct NSC reporting channels for the CAISI+ Director as Principal Advisor on AI Risks.
- Capability expansion (12-18 months): Beginning construction of the SL-5 facility, operationalizing the three core functions (Advanced Model Evaluation, Emergency Assessment & Response, and Foundational AI Reliability Research), and recruiting the 80-150 technical staff outlined in the funding breakdown.
- Full enhanced capacity (18+ months): Achieving the operational capabilities described in Recommendation 1, including mature evaluation platforms, direct Presidential/NSC threat warning protocols, and comprehensive research programs.
Conclusion
Enhancing and empowering CAISI+ is a strategic investment in U.S. national security whose cost is far outweighed by the potential costs of inaction. With an estimated annual operating budget of $67-155 million, CAISI+ will provide essential technical capabilities to evaluate and respond to the most serious AI risks, ensuring the U.S. leads in developing and governing AI safely and securely, irrespective of where advanced capabilities emerge. While timelines to AI systems surpassing dangerous capability thresholds are uncertain, by acting now to establish the necessary infrastructure, expertise, and authorities, the Administration can safeguard American interests and our technological future across a broad range of possible scenarios.
This memo was written by an AI Safety Policy Entrepreneurship Fellow over the course of a six-month, part-time program that supports individuals in advancing their policy ideas into practice. You can read more policy memos and learn about Policy Entrepreneurship Fellows here.
A Grant Program to Enhance State and Local Government AI Capacity and Address Emerging Threats
States and localities are eager to leverage artificial intelligence (AI) to optimize service delivery and infrastructure management, but they face significant resource gaps. Without sufficient personnel and capital, these jurisdictions cannot properly identify and mitigate the risks associated with AI adoption, including cyber threats, surging power demands, and data privacy issues. Congress should establish a new grant program, coordinated by the Cybersecurity and Infrastructure Security Agency (CISA), to assist state and local governments in addressing these challenges. Such funding will allow the federal government to instill best security and operating practices nationwide, while identifying effective strategies from the grassroots that can inform federal rulemaking. Ultimately, federal, state, and local capacity are interrelated; federal investments in state and local government will help the entire country harness AI’s potential and reduce the risk of catastrophic events such as a large, AI-powered cyberattack.
Challenge and Opportunity
In 2025, 45 state legislatures have introduced more than 550 bills focused on the regulation of artificial intelligence, covering everything from procurement guidelines to acceptable AI uses in K-12 education to liability standards for AI misuse and error. Major cities have followed suit with sweeping guidance of their own, identifying specific AI risks related to bias and hallucination and issuing directives to reduce their impact on government functions. The influx of regulatory action reflects burgeoning enthusiasm about AI’s ability to streamline public services and increase government efficiency.
Yet two key roadblocks stand in the way: inconsistent rules and uneven capacity. AI regulations vary widely across jurisdictions — sometimes offering contradictory guidance — and public agencies often lack the staff and skills needed to implement them. In a 2024 survey, six in ten public sector professionals cited the AI skills gap as their biggest obstacle in implementing AI tools. This reflects a broader IT staffing crisis, with over 450,000 unfilled cybersecurity roles nationwide, which is particularly acute in the public sector given lower salaries and smaller budgets.
These roadblocks at the state and local level pose a major risk to the entire country. In the cyber space, ransomware attacks on state and local targets have demonstrated that hackers can exploit small vulnerabilities in legacy systems to gain broad access and cause major disruption, extending far beyond their initial targets. The same threat trajectory is conceivable with AI. States and cities, lacking the necessary workforce and adhering to a patchwork of different regulations, will find themselves unable to safely adopt AI tools and mount a uniform response in an AI-related crisis.
In 2021, Congress established the State and Local Cybersecurity Grant Program (SLCGP) at CISA, which focused on resourcing states, localities, tribes, and territories to better respond to cyber threats. States have received almost $1 billion in funding to implement CISA’s security best practices, like multifactor authentication, and to establish cybersecurity planning committees, which effectively coordinate strategic planning and cyber governance among state, municipal, and private sector information technology leaders.
Federal investment in state and local AI capacity-building can help standardize the existing, disparate guidance and bridge resource gaps, just as it has in the cybersecurity space. AI coordination is less mature today than the cybersecurity space was when the SLCGP was established in 2021. The updated Federal Information Security Modernization Act, which enabled the Department of Homeland Security to set information security standards across government, had been in effect for seven years by 2021, and some of its best practices had already trickled down to states and localities.
Thus, the need for state AI capacity, clear guardrails, and information-sharing across all levels of government is even greater. A small federal investment now can unlock large returns by enabling safe, effective AI adoption and avoiding costly failures. Local governments are eager to deploy AI but lack the resources to do so securely. Modest funding can align fragmented rules, train high-impact personnel, and surface replicable models—lowering the cost of responsible AI use nationwide. Each successful pilot creates a multiplier effect, accelerating progress while reducing risk.
Plan of Action
Recommendation 1. Congress should authorize a three-year pilot grant program focused on state and local AI capacity-building.
SLCGP’s authorization expires on August 31, 2025, which opens two pathways for a pilot grant program. The Homeland Security Committees in the House and Senate could amend and renew the existing SLCGP provision to make room for an AI-focused pilot. Alternatively, Congress could pass a new authorization, which would likely set the stage for a sustained grant program upon successful completion of the pilot. A separate authorization would also allow Congress to consider other federal agencies as program facilitators or co-facilitators, in case lawmakers want to cover AI integrations that do not directly touch critical infrastructure, which is CISA’s primary focus.
Alternatively, the House Energy and Commerce and Senate Commerce, Science, and Transportation Committees could authorize a program coordinated by the National Institute of Standards and Technology, which produced the AI Risk Management Framework and has strong expertise in a range of vulnerabilities embedded within AI models. Congress might also consider mandating an interagency advisory committee to oversee the program, including, for example, experts from the Department of Energy to provide technical assistance and guidance on projects related to energy infrastructure.
In either case, the authorization should be coupled with a starting appropriation of $55 million over three years, which would fund ten statewide pilot projects of up to $5 million each, plus administrative costs. The structure of the program would broadly parallel SLCGP’s goals. First, it would align state and local AI approaches with existing federal guidance, such as the NIST AI Risk Management Framework and the Trump Administration’s OMB guidance on the regulation and procurement of artificial intelligence applications. Second, the program would establish better coordination between local and state authorities on AI rules. A new authorization for AI, however, allows Congress and the agency tasked with managing the program the opportunity to improve upon SLCGP’s existing provisions. This new program should permit states to coordinate their AI activities through existing leadership structures rather than setting up a new planning committee. The legislative language should also prioritize skills training and allocate a portion of grant funding to recruiting and retaining AI professionals within state and local government who can oversee projects.
Recommendation 2. Pilot projects should be implementation-focused and rooted in one of three significant risks: cybersecurity, energy usage, or data privacy.
Similar to SLCGP, this pilot grant program should be focused on implementation. The target product for a grant is a functional local or state AI application that has undergone risk mitigation, rather than a report that identifies issues in the abstract. For example, under this program, a state would receive federal funding to integrate AI into the maintenance of its cities’ wastewater treatment plants without compromising cybersecurity. Funding would support AI skills training for the relevant municipal employees and scaling of certain cybersecurity best practices like data encryption that minimize the project’s risk. States will submit reports to the federal government at each phase of their project: first documenting the risks they identified, then explaining their prioritization of risks to mitigate, then walking through their specific mitigation actions, and later, retrospectively reporting on the outcomes of those mitigations after the project has gone into operational use.
This approach would maximize the pilot’s return on investment. States will be able to complete high-impact AI projects without taking on the associated security costs. The frameworks generated from the project can be reused many times over for later projects, as can the staff who are hired or trained with federal support.
Given the inconsistency of priorities surfaced in state and local AI directives, the federal government should set the agenda of risks to focus on. The clearest set of risks for the pilot are cybersecurity, energy usage, and data privacy, all of which are highlighted in NIST’s Risk Management Framework.
- Cybersecurity. Cybersecurity projects should focus on detecting AI-assisted social engineering tactics, used to gain access into secure systems, and adversarial attacks like “poisoning” or “jailbreaking”, which manipulate AI models to produce undesirable outputs. Consider emergency response systems: the transition to IP-based, interconnected 911 systems increases the cyberattack surface, making it easier for an attack targeting one response center to spread across other jurisdictions. A municipality could seek funding to trial an AI dispatcher with necessary guardrails. As part of their project, they could ensure they have the appropriate cyber hygiene protocols in place to prevent cyberattacks from rendering the dispatcher useless or exploiting vulnerabilities in the dispatcher to gain access to underlying 911 systems that multiple localities rely on.
- Energy Usage. Energy usage projects should calculate power needs associated with AI development and implementation and the additional energy resources available to prevent outages. Much of the country faces a heightened risk of power outages due to antiquated grids, under-resourced providers, and a dearth of new electricity generation. AI integrations and supportive infrastructure that require significant power will place a heavy burden on states and potentially impact the operation of other critical infrastructure. A sample project might examine the energy demands of a new data center, powering an AI integration into traffic monitoring, and determine where that data center can best be constructed to accommodate available grid capacity.
- Data Privacy. Finally, data privacy projects should focus on bringing AI systems into compliance with existing data laws like the Health Insurance Portability and Accountability Act (HIPAA) and the Children’s Online Privacy Protection Act (COPPA) for AI interventions in healthcare and education, respectively. Because the U.S. lacks a comprehensive data privacy law, states might also experiment with additional best practices, such as training models to detect and reject prompts that contain personally identifiable information (PII). A sample project in this domain might integrate a chatbot into the state Medicaid system to more efficiently triage patients and identify the steps the state can take to prevent the chatbot from handling PII in ways that violate HIPAA (see the sketch below).
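To make the data-privacy idea concrete, the sketch below shows one simple guardrail a grantee might layer in front of a Medicaid chatbot: a regex prescreen that rejects prompts containing obvious PII. The patterns are illustrative assumptions; a production system would pair this with more robust named-entity detection and HIPAA-compliant logging.

```python
import re

# Illustrative patterns for obvious U.S. PII; a real deployment would use
# trained NER models and broader coverage, not just regexes.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def prescreen(prompt: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_pii_types) for a user prompt."""
    hits = [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(prompt)]
    return (not hits, hits)

allowed, hits = prescreen("My SSN is 123-45-6789, am I eligible for coverage?")
if not allowed:
    print(f"Prompt rejected before reaching the model: detected {', '.join(hits)}")
```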
If successful, the pilot could expand to address additional risks or support broader, multi-risk, multi-state interventions.
Recommendation 3. The pilot program must include opportunities for grantees to share their ideas with other states and localities.
Arguably the most important facet of this new AI program will be forums where grantees share their learnings. Administrative costs for this program should go toward funding a twice-yearly in-person forum where grantees can publicly share updates on their projects. An in-person forum would also give states the space to coordinate further projects on the margins. CISA is particularly well positioned to host a forum like this given its track record of convening critical infrastructure operators. Grantees should be required to publish guidance, tools, and templates in a public, digital repository. Ideally, states that did not secure grants can adopt successful strategies from their peers and save taxpayers the cost of duplicate planning work.
Conclusion
Congress should establish a new grant program to assist state and local governments in addressing AI risks, including cybersecurity, energy usage, and data privacy. Such federal investments will give structure to the dynamic yet disparate national AI regulatory conversation. The grant program, which will cost $55 million to pilot over three years, will yield a high return on investment for both the ten grantee states and the peers that learn from its findings. By making these investments now, Congress can keep states moving fast toward AI without opening the door to critical, costly vulnerabilities.
This memo was written by an AI Safety Policy Entrepreneurship Fellow over the course of a six-month, part-time program that supports individuals in advancing their policy ideas into practice. You can read more policy memos and learn about Policy Entrepreneurship Fellows here.
No. Congress could leverage SLCGP’s existing authorization to focus on projects that look at the intersection of AI and cybersecurity, offering an amendment to the next Homeland Security Appropriations package that directs modest SLCGP funding (e.g., $10-20 million) to AI projects. Alternatively, Congress could insert language on AI into SLCGP’s reauthorization, which is due on August 31, 2025.
Although leveraging the existing authorization would be easier, Congress would be better served by authorizing a new program, which can focus on multiple priorities including energy usage and data privacy. To stay agile, the statutory language could allow CISA to direct funds toward newly emerging risks as they are identified by NIST and other agencies. Finally, a specific authorization would pave the way for an expansion of this program, assuming the initial 10-state pilot goes well.
This pilot is right-sized for efficiency, impact, and cost savings. A program to bring all 50 states into compliance with certain AI risk mitigation guidelines would cost hundreds of millions of dollars, which is not feasible in the current budgetary environment. States are starting from very different baselines, especially with their energy infrastructure, which makes it difficult to bring them all to a single end-point. Moreover, because AI is evolving so rapidly, guidance is likely to age poorly. The energy needs of AI might change before states finish their plans to build data centers. Similarly, federal data privacy laws might take effect that undercut or contradict the best practices established by this program.
This pilot will allow 10 states and/or localities to quickly deploy AI implementations that produce real value: for example, quicker emergency response times and savings on infrastructure maintenance. CISA can learn from grantees’ experiences to iterate on federal guidance, identifying a stumbling block on one project and refining its guidance to prevent 49 other states from encountering the same obstacle. If grantees effectively share their learnings, they can cut massive amounts of time off other states’ planning processes and help the federal government build guidance that is more rooted in the realities of AI deployment.
No. If done correctly, this pilot will cut red tape and allow the entire country to harness AI’s positive potential. States and localities are developing AI regulations in a vacuum. Some of the laws proposed are contradictory or duplicative precisely because many state legislatures are not coordinating effectively with state and local government technical experts. When bills do pass, guidance is often poorly implemented because there is no overarching figure, beyond a state chief information officer, to bring departments and cities into compliance. In essence, 50 states are producing 50 sets of regulations because there is scant federal guidance and few mechanisms for them to learn from other states and coordinate within their state on best practices.
This program aims to cut down on bureaucratic redundancy by leveraging states’ existing cyber planning bodies to take a comprehensive approach to AI. By convening the appropriate stakeholders from the public sector, private sector, and academia to work on a funded AI project, states will develop more efficient coordination processes and identify regulations that stand in the way of effective technological implementation. States and localities across the country will build their guidelines based on successful grantee projects, absorbing best practices and casting aside inefficient rules. It is impossible to mount a coordinated response to significant challenges like AI-enabled cyberattacks without some centralized government planning, but this pilot is designed to foster efficient and effective coordination across federal, state, and local governments.
Accelerating AI Interpretability To Promote U.S. Technological Leadership
The most advanced AI systems remain ‘black boxes’ whose inner workings even their developers cannot fully understand, leading to issues with reliability and trustworthiness. However, as AI systems become more capable, there is a growing desire to deploy them in high-stakes scenarios. The bipartisan National Security Commission on AI cautioned that AI systems perceived as unreliable or unpredictable will ‘stall out’: leaders will not adopt them, operators will mistrust them, Congress will not fund them, and the public will not support them (NSCAI, Final Report, 2021). AI interpretability research—the science of opening these black boxes and attempting to comprehend why they do what they do—could turn opacity into understanding and enable wider AI adoption.
With AI capabilities racing ahead, the United States should accelerate interpretability research now to keep its technological edge and approach high-stakes AI deployment with justified confidence. This memorandum describes three policy recommendations that could help the United States seize the moment and maintain a lead on AI interpretability: (1) creatively investing in interpretability research, (2) entering into research and development agreements between interpretability experts and government agencies and laboratories, and (3) prioritizing interpretable AI in federal procurement.
Challenge and Opportunity
AI capabilities are progressing rapidly. According to many frontier AI companies’ CEOs and independent researchers, AI systems could reach general-purpose capabilities that equal or even surpass humans within the next decade. As capabilities progress, there is a growing desire to incorporate these systems into high-stakes use cases, from military and intelligence uses (DARPA, 2025; Ewbank, 2024) to key sectors of the economy (AI for American Industry, 2025).
However, the most advanced AI systems are still ‘black boxes’ (Sharkey et al., 2024) that we observe from the outside and that we ‘grow,’ more than we ‘build’ (Olah, 2024). We still do not really understand what happens within these black boxes, leaving uncertainty regarding their safety and reliability. This could have resounding consequences. As the 2021 final report of the National Security Commission on AI (NSCAI) highlighted, “[i]f AI systems routinely do not work as designed or are unpredictable in ways that can have significant negative consequences, then leaders will not adopt them, operators will not use them, Congress will not fund them, and the American people will not support them” (NSCAI, Final Report, 2021). In other words, if AI systems are not always reliable and secure, this could inhibit or limit their adoption, especially in high-stakes scenarios, potentially compromising the AI leadership and national security goals outlined in the Trump administration’s agenda (Executive Order, 2025).
AI interpretability is a subfield of AI safety that is specifically concerned with opening and peeking inside the black box to comprehend “why AI systems do what they do, and … put this into human-understandable terms” (Nanda, 2024; Sharkey et al., 2025). In other words, interpretability is the AI equivalent of an MRI (Amodei, 2025) because it attempts to provide observers with an understandable image of the hidden internal processes of AI systems.
The Challenge of Understanding AI Systems Before They Reach or Even Surpass Human-Level Capabilities
Recent years have brought breakthroughs across several research areas focused on making AI more trustworthy and reliable, including in AI interpretability. Among other efforts, the same companies developing the most advanced AI systems have designed systems that are easier to understand and have reached new research milestones (Marks et al., 2025; Lindsey et al., 2025; Lieberum et al. 2024; Kramar et al., 2024; Gao et al., 2024; Tillman & Mossing, 2025).
AI interpretability, however, is still trailing behind raw AI capabilities. AI companies project that it could take 5–10 years to reliably understand model internals (Amodei, 2025), while experts expect systems exhibiting human‑level general-purpose capabilities as early as 2027 (Kokotajlo et al., 2025). That gap will force policymakers into a difficult corner once AI systems reach such capabilities: deploy unprecedentedly powerful yet opaque systems, or slow deployment and fall behind. Unless interpretability accelerates, the United States risks losing both competitive and security advantages.
The Challenge of Trusting Today’s Systems for High-Stakes Applications
We must understand the inner workings of highly advanced AI systems before they reach human or above-human general-purpose capabilities, especially if we want to trust them in high-stakes scenarios. There are several reasons why current AI systems might not always be reliable and secure. First, AI systems inherit the blind spots of their training data. When the world changes—alliances shift, governments fall, regulations update—systems still reason from outdated facts, undermining reliability in high-stakes diplomatic or military settings (Jensen et al., 2025).
Second, AI systems are unusually easy to strip‑mine for memorized secrets, especially if these secrets come as uncommon word combinations (e.g., proprietary blueprints). Data‑extraction attacks are now “practical and highly realistic” and will grow even more effective as system size increases (Carlini et al., 2021; Nasr et al., 2023; Li et al., 2025). The result could be wholesale leakage of classified or proprietary information (DON, 2023).
Third, cleverly crafted prompts can still jailbreak cutting‑edge systems, bypassing safety rails and exposing embedded hazardous knowledge (Hughes et al., 2024; Ramesh et al., 2024). With attack success rates remaining uncomfortably high across even the leading systems, adversaries could manipulate AI systems with these vulnerabilities in real‑time national security scenarios (Caballero & Jenkins, 2024).
This is not a comprehensive list. Systems could exhibit vulnerabilities in high-stakes applications for many other reasons. For instance, AI systems could be misaligned and engage in scheming behavior (Meinke et al., 2024; Phuong et al., 2025) or have baked-in backdoors that an attacker could exploit (Hubinger et al., 2024; Davidson et al., 2025).
The Opportunity to Promote AI Leadership Through Interpretability
Interpretability offers an opportunity to address the challenges described above and reduce barriers to the safe adoption of the most advanced AI systems, thereby further promoting innovation and extending the advantages those systems hold over adversaries’ systems. In this sense, accelerating interpretability could help promote and secure U.S. AI leadership (Bau et al., 2025; IFP, 2025). For example, by helping ensure that highly advanced AI systems are deployed safely in high-stakes scenarios, interpretability could improve national security and help mitigate the risk of state and non-state adversaries using AI capabilities against the United States (NSCAI, Final Report, 2021). Interpretability could therefore serve as a front‑line defense against vulnerabilities in today’s most advanced AI systems.
Making future AI systems safe and trustworthy could become easier the more we understand how they work (Shah et al., 2025). Anthropic’s CEO recently endorsed the importance and urgency of interpretability, noting that “every advance in interpretability quantitatively increases our ability to look inside models and diagnose their problems” (Amodei, 2025). This means that interpretability not only enhances reliability in the deployment of today’s AI systems, but understanding AI systems could also lead to breakthroughs in designing more targeted systems or attaining more robust monitoring of deployed systems. This could then enable the United States to deploy tomorrow’s human-level or above-human general-purpose AI systems with increased confidence, thus securing strategic advantages when engaging geopolitically. The following uses the vulnerabilities discussed above to demonstrate three ways in which interpretability could improve the reliability of today’s AI systems when deployed in high-stakes scenarios.
First, interpretability could help systems selectively update outdated information through model editing, without risking a reduction in performance. Model editing allows us to selectively inject new facts or fix mistakes (Cohen et al., 2023; Hase et al., 2024) by editing activations without updating the entire model. However, this ‘surgical tool’ has shown ‘side effects’ causing performance degradation (Gu et al., 2024; Gupta et al., 2024). Interpretability could help us understand how stored knowledge alters parameters as well as develop stronger memorization measures (Yao et al., 2023; Carlini et al., 2019), enabling us to ‘incise and excise’ AI models with fewer side effects.
Second, interpretability could help systems selectively forget training data through machine unlearning, once again without losing performance. Machine unlearning allows systems to forget specific data classes (such as memorized secrets or hazardous knowledge) while remembering the rest (Tarun et al., 2023). Like model editing, this ‘surgical tool’ suffers from performance degradation. Interpretability could help develop new unlearning techniques that preserve performance (Guo et al., 2024; Belrose et al., 2023; Zou et al., 2024).
Third, interpretability could help effectively block jailbreak attempts, which currently can only be discovered empirically (Amodei, 2025). Interpretability could lead to a breakthrough in understanding models’ persistent vulnerability to jailbreaking by allowing us to characterize dangerous knowledge. Existing interpretability research has already analyzed how AI models process harmful prompts (He et al., 2024; Ball et al., 2024; Lin et al., 2024; Zhou et al., 2024), and additional research could build on these initial findings.
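To give a concrete flavor of this line of research, below is a minimal, self-contained sketch of a linear probe, one of the simpler interpretability tools, trained to separate harmful from benign prompts using a model’s internal activations. The activations here are synthetic stand-ins; actual research would extract them from a specific layer of a real model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for one layer's activations: assume harmful prompts
# shift the representation along a single 'harmfulness' direction.
d_model, n_prompts = 256, 2000
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

labels = rng.integers(0, 2, size=n_prompts)               # 1 = harmful prompt
activations = rng.normal(size=(n_prompts, d_model))
activations += 1.5 * labels[:, None] * direction          # inject the signal

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out probe accuracy: {probe.score(X_test, y_test):.2%}")

# The probe's weight vector estimates the 'harmfulness direction', which
# follow-on work could monitor or ablate to study jailbreak behavior.
w = probe.coef_[0]
print(f"Cosine with injected direction: {w @ direction / np.linalg.norm(w):.2f}")
```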
The conditions are ripe to promote technological leadership and national security through interpretability. Many of the problems highlighted in the 2019 National AI R&D Strategic Plan remained the same in its 2023 update, echoing those included in NSCAI’s 2021 final report. We have made relatively little progress addressing these challenges. AI systems are still vulnerable to attacks (NSCAI, Final Report, 2021) and can still “be made to do the wrong thing, reveal the wrong thing” and “be easily fooled, evaded, and misled in ways that can have profound security implications” (National AI R&D Strategic Plan, 2019). Meanwhile, the field of interpretability is gaining momentum among AI companies (Amodei, 2025; Shah et al., 2025; Goodfire, 2025) and AI researchers (IFP, 2025; Bau et al., 2025; FAS, 2025).
To be sure, despite recent progress, interpretability remains challenging and has attracted some skepticism (Hendrycks & Hiscott, 2025). Accordingly, a strong AI safety strategy must include many components beyond interpretability, including robust AI evaluations (Apollo Research, 2025) and control measures (Redwood Research, 2025).
Plan of Action
The United States has an opportunity to seize the moment and lead an acceleration of AI interpretability. The following three recommendations establish a strategy for how the United States could promptly incentivize AI interpretability research.
Recommendation 1. The federal government should prioritize and invest in foundational AI interpretability research, which would include identifying interpretability as a ‘strategic priority’ in the 2025 update of the National AI R&D Strategic Plan.
The National Science and Technology Council (NSTC) should identify AI interpretability as a ‘strategic priority’ in the upcoming National AI R&D Strategic Plan. Congress should then appropriate federal R&D funding for federal agencies (including DARPA and the NSF) to catalyze and support AI interpretability acceleration through various mechanisms, including grants and prizes, R&D credits, tax credits, advanced market commitments, and buyer-of-first-resort mechanisms.
This first recommendation echoes not only the 2019 update of the National AI R&D Strategic Plan and NSCAI’s 2021 final report––which recommended allocating more federal R&D investments to advance the interpretability of AI systems (NSCAI, Final Report, 2021; National AI R&D Strategic Plan, 2019)––but also the more recent remarks by the Director of the Office of Science and Technology Policy (OSTP), according to whom we need creative R&D funding approaches to enable scientists and engineers to create new theories and put them into practice (OSTP Director’s Remarks, 2025). This recommendation is also in line with calls from AI companies, asserting that “we still need significant investment in ‘basic science’” (Shah et al., 2025).
The United States could incentivize and support AI interpretability work through various approaches. In addition to prize competitions, advanced market commitments, fast and flexible grants (OSTP Director’s Remarks, 2025; Institute for Progress, 2025), and challenge-based acquisition programs (Institute for Progress, 2025), funding mechanisms could include R&D tax credits for AI companies undertaking or investing in interpretability research, and tax credits to adopters of interpretable AI, such as downstream deployers. If the federal government acts as “an early adopter and avid promoter of American technology” (OSTP Director’s Remarks, 2025), federal agencies could also rely on buyer-of-first-resort mechanisms for interpretability platforms.
These strategies may require developing a clearer understanding of which frontier AI companies undertake sufficient interpretability efforts when developing their most advanced systems, and which companies currently do not. Requiring AI companies to disclose how they use interpretability to test models before release (Amodei, 2025) could be helpful, but might not be enough to devise a ‘ranking’ of interpretability efforts. While potentially premature given the state of the art in interpretability, an option could be to start developing standardized metrics and benchmarks to evaluate interpretability (Mueller et al., 2025; Stephenson et al., 2025). This task could be carried out by the National Institute of Standards and Technology (NIST), within which some AI researchers have recommended creating an AI Interpretability and Control Standards Working Group (Bau et al., 2025).
A great way to operationalize this first recommendation would be for the National Science and Technology Council (NSTC) to include interpretability as a “strategic priority” in the 2025 update of the National AI R&D Strategic Plan (RFI, 2025). These “strategic priorities” seek to target and focus AI innovation for the next 3–5 years, paying particular attention to areas of “high-risk, high-reward AI research” that the industry is unlikely to address because it may not provide immediate commercial returns (RFI, 2025). If interpretability were included as a “strategic priority,” then the Office of Management and Budget (OMB) could instruct agencies to align their budgets with the 2025 National AI R&D Strategic Plan priorities in its memorandum addressed to executive department heads. Relevant agencies, including DARPA and the National Science Foundation (NSF), would then develop their budget requests for Congress, aligning them with the 2025 National AI R&D Strategic Plan and the OMB memorandum. After Congress reviews these proposals and appropriates funding, agencies could launch initiatives that incentivize interpretability work, including grants and prizes, R&D credits, tax credits, advanced market commitments, and buyer-of-first-resort mechanisms.
Recommendation 2. The federal government should enter into research and development agreements with AI companies and interpretability research organizations to red team AI systems applied in high-stakes scenarios and conduct targeted interpretability research.
AI companies, interpretability organizations, and federal agencies and laboratories (such as DARPA, the NSF, and the U.S. Center for AI Standards and Innovation) should enter into research and development agreements to pursue targeted AI interpretability research to solve national security vulnerabilities identified through security-focused red teaming.
This second recommendation takes into account the fact that the federal government possesses unique expertise and knowledge in national security issues to support national security testing and evaluation (FMF, 2025). Federal agencies and laboratories (such as DARPA, the NSF, and the U.S. Center for AI Standards and Innovation), frontier AI companies, and interpretability organizations could enter into research and development agreements to undertake red teaming of national security vulnerabilities (as, for instance, in SABER, which aims to assess AI-enabled battlefield systems for the DoD; SABER, 2025) and provide state-of-the-art interpretability platforms to patch the revealed vulnerabilities. In the future, AI companies could also apply the most advanced AI systems to support interpretability research.
Recommendation 3. The federal government should prioritize interpretable AI in federal procurement, especially for high-stakes applications.
If federal agencies are procuring highly advanced AI for high-stakes scenarios and national security missions, they should preferentially procure interpretable AI systems. This preference could be accounted for by weighing the lack of understanding of an AI system’s inner workings when calculating cost.
This third and final recommendation provides for the interim, in which interpretable AI systems will likely coexist along a ‘gradient of interpretability’ with other, less interpretable AI systems. In that scenario, agencies procuring AI systems should give preference to systems that are more interpretable. One way to implement this preference would be to weigh the potential vulnerabilities of uninterpretable AI systems when calculating costs during federal acquisition analyses (see the sketch below). This recommendation also requires establishing a defined ‘ranking’ of interpretability efforts. While defining this ranking is currently challenging, the research outlined in recommendations 1 and 2 could better position the government to measure and rank the interpretability of different AI systems.
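As a minimal illustration of what such a preference might look like inside an acquisition analysis, the sketch below adjusts a bid’s evaluated cost by an opacity penalty scaled by mission criticality. All weights, scores, and figures are hypothetical.

```python
def risk_adjusted_cost(bid_price: float,
                       interpretability_score: float,  # 0.0 (opaque) to 1.0 (fully interpretable)
                       mission_criticality: float,     # 0.0 (routine) to 1.0 (national security)
                       penalty_weight: float = 0.5) -> float:
    """Hypothetical scoring rule: penalize opacity in proportion to stakes."""
    opacity_penalty = penalty_weight * (1.0 - interpretability_score) * mission_criticality
    return bid_price * (1.0 + opacity_penalty)

# Two hypothetical bids for a high-stakes mission: the nominally cheaper
# but far more opaque system loses once opacity is priced in.
print(risk_adjusted_cost(10_000_000, interpretability_score=0.8, mission_criticality=0.9))  # ~10.9M
print(risk_adjusted_cost(9_000_000, interpretability_score=0.2, mission_criticality=0.9))   # ~12.2M
```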
Conclusion
Now is the time for the United States to take action and lead the charge on AI interpretability research. While research is never guaranteed to lead to desired outcomes or to solve persistent problems, the potential high reward—understanding and trusting future AI systems and making today’s systems more robust to adversarial attacks—justifies this investment. Not only could AI interpretability make AI safer and more secure, but it could also establish justified confidence in the prompt adoption of future systems that are as capable as or even more capable than humans, and enable the deployment of today’s most advanced AI systems to high-stakes scenarios, thus promoting AI leadership and national security. With this goal in mind, this policy memorandum recommends that the United States, through the relevant federal agencies and laboratories (including DARPA, the NSF, and the U.S. Center for AI Standards and Innovation), invest in interpretability research, form research and development agreements to red team high-stakes AI systems and undertake targeted interpretability research, and prioritize interpretable AI systems in federal acquisitions.
Acknowledgments
I wish to thank Oliver Stephenson, Dan Braun, Lee Sharkey, and Lucius Bushnaq for their ideas, comments, and feedback on this memorandum.
This memo was written by an AI Safety Policy Entrepreneurship Fellow over the course of a six-month, part-time program that supports individuals in advancing their policy ideas into practice. You can read more policy memos and learn about Policy Entrepreneurship Fellows here.
Accelerating R&D for Critical AI Assurance and Security Technologies
The opportunities presented by advanced artificial intelligence are immense, from accelerating cutting-edge scientific research to improving key government services. However, for these benefits to be realized, both the private and public sectors need confidence that AI tools are reliable and secure. This will require R&D effort to solve urgent technical challenges related to understanding and evaluating emergent AI behaviors and capabilities, securing AI hardware and infrastructure, and preparing for a world with many advanced AI agents.
To secure global adoption of U.S. AI technology and ensure America’s workforce can fully leverage advanced AI, the federal government should take a strategic and coordinated approach to support AI assurance and security R&D by: clearly defining AI assurance and security R&D priorities; establishing an AI R&D consortium and deploying agile funding mechanisms for critical R&D areas; and establishing an AI Frontier Science Fellowship to ensure a pipeline of technical AI talent.
Challenge and Opportunity
AI systems have progressed rapidly in the past few years, demonstrating human-level and even superhuman performance across diverse tasks. Yet, they remain plagued by flaws that produce unpredictable and potentially dangerous failures. Frontier systems are vulnerable to attacks that can manipulate them into executing unintended actions, hallucinate convincing but incorrect information, and exhibit other behaviors that researchers struggle to predict or control.
As AI capabilities rapidly advance toward more consequential applications—from medical diagnosis to financial decision-making to military systems—these reliability issues could pose increasingly severe risks to public safety and national security, while reducing beneficial uses. Recent polling shows that just 32% of Americans trust AI, and this limited trust will slow the uptake of impactful AI use-cases that could drive economic growth and enhance national competitiveness.
The federal government has an opportunity to secure America’s technological lead and promote global adoption of U.S. AI by catalyzing research to address urgent AI reliability and security challenges—challenges that align with broader policy consensus reflected in the National Security Commission on AI’s recommendations and bipartisan legislative efforts like the VET AI Act. Recent research has surfaced substantial expert consensus around priority research areas that address the following three challenges.
The first challenge involves understanding emergent AI capabilities and behaviors. As AI systems get larger (a process referred to as “scaling”), they develop unexpected capabilities and reasoning patterns that researchers cannot predict, making it difficult to anticipate risks or ensure reliable performance. Addressing this means advancing the science of AI scaling and evaluations.
This research aims to build a scientific understanding of how AI systems learn, reason, and exhibit diverse capabilities. This involves not only studying specific phenomena like emergence and scaling but, more broadly, employing and refining evaluations as the core empirical methodology for characterizing all facets of AI behavior. This includes evaluations in areas such as CBRN weapons, cybersecurity, and deception, as well as broader research on AI evaluations to ensure that AI systems can be accurately assessed and understood. Example work includes Wijk et al. (2024) and McKenzie et al. (2023).
The second challenge is securing AI hardware and infrastructure. AI systems require robust protection of model weights, secure deployment environments, and resilient supply chains to prevent theft, manipulation, or compromise by malicious actors seeking to exploit these powerful technologies. Addressing this means advancing hardware and infrastructure security for AI.
Ensuring the security of AI systems at the hardware and infrastructure level involves protecting model weights, securing deployment environments, maintaining supply chain integrity, and implementing robust monitoring and threat detection mechanisms. Methods include the use of confidential computing, rigorous access controls, specialized hardware protections, and continuous security oversight. Example work includes Nevo et al. (2024) and Hepworth et al. (2024).
The third challenge involves preparing for a world with many AI agents—AI models that can act autonomously. Alongside their potentially immense benefits, the increasing deployment of AI agents creates critical blind spots, as agents could coordinate covertly beyond human oversight, amplify failures into system-wide cascades, and combine capabilities in ways that circumvent existing safeguards. Addressing this means advancing agent metrology, infrastructure, and security.
This research aims to develop a deeper understanding of agentic behavior in LLM-based systems, including clarifying how LLM agents learn over time, respond to underspecified goals, and engage with their environments. It also includes work to ensure safe multi-agent interactions, such as detecting and preventing malicious collective behaviors, studying how transparency can affect agent interactions, and developing evaluations for agent behavior and interaction. Example work includes Lee and Tiwari (2024) and Chan et al. (2024).
While academic and industry researchers have made progress on these problems, that progress is not keeping pace with AI development and deployment. The market is likely to underinvest in research that is more experimental or lacks immediate commercial applications. The U.S. government, as the R&D lab of the world, has an opportunity to unlock AI’s transformative potential by accelerating assurance and security research.
Plan of Action
The rapid pace of AI advancement demands a new strategic, coordinated approach to federal R&D for AI assurance and security. Given financial constraints, it is more important than ever to make sure that the impact of every dollar invested in R&D is maximized.
Much of the critical technical expertise now resides in universities, startups, and leading AI companies rather than traditional government labs. To harness this distributed talent, we need R&D mechanisms that move at the pace of innovation, leverage academic research excellence, engage early-career scientists who drive breakthroughs, and partner with industry leaders who can share access to essential compute resources and frontier models. Traditional bureaucratic processes risk leaving federal efforts perpetually behind the curve.
The U.S. government should implement a three-pronged plan to advance the above R&D priorities.
Recommendation 1. Clearly define AI assurance and security R&D priorities
The Office of Science and Technology Policy (OSTP) and the National Science Foundation (NSF) should highlight critical areas of AI assurance and security as R&D priorities by including them in the 2025 update of the National AI R&D Strategic Plan and the forthcoming AI Action Plan. All federal agencies conducting AI R&D should engage in the development of these plans to explain how their expertise could best contribute to these goals. For example, the Defense Advanced Research Projects Agency (DARPA)’s Information Innovation Office could leverage its expertise in AI security to investigate ways to design secure interaction protocols and environments for AI agents that eliminate risks from rogue agents.
These priorities would help coordinate government R&D activities: they would give funding agencies a common set of targets, direct public research institutes such as the National Labs toward fundamental R&D, provide Congress with information to support relevant legislative decisions, and serve as a guide for industry R&D.
Additionally, given the dynamic nature of frontier AI research, OSTP and NSF should publish an annual survey of progress in critical AI assurance and security areas and identify which challenges are the highest priority.
Recommendation 2. Establish an AI R&D consortium and deploy agile funding mechanisms for critical R&D
As noted by OSTP Director Michael Kratsios, “prizes, challenges, public-private partnerships, and other novel funding mechanisms, can multiply the impact of targeted federal dollars. We must tie grants to clear strategic targets, while still allowing for the openness of scientific exploration.” Federal funding agencies should develop and implement agile funding mechanisms for AI assurance and security R&D in line with established priorities. Congress should include reporting language in its Commerce, Justice, Science (CJS) appropriations bill that supports accelerated R&D disbursements for investment into prioritized areas.
A central mechanism should be the creation of an AI Assurance and Security R&D Consortium, jointly led by DARPA and NSF, bringing together government, AI companies, and universities. In this model:
- Government provides funding for personnel and administrative support, and manages the consortium’s strategic direction
- AI companies contribute model access, compute credits, and engineering expertise
- Universities provide researchers and facilities for conducting fundamental research
This consortium structure would enable rapid resource sharing, collaborative research projects, and accelerated translation of research into practice. It would operate under flexible contracting mechanisms using Other Transaction Authority (OTA) to reduce administrative barriers.
Beyond the consortium, funding agencies should leverage OTA and Prize Competition Authority to flexibly contract and fund research projects in priority areas. New public-private grant vehicles focused on funding fundamental research in priority areas should be set up via existing foundations linked to funding agencies, such as the NSF Foundation, DOE’s Foundation for Energy Security and Innovation, or the proposed NIST Foundation.
Specific funding mechanisms should be chosen based on the target technology’s maturity level. For example, the NSF can support more fundamental research through fast grants via its EAGER and RAPID programs. Previous fast-grant programs, such as SGER, were found to be highly effective, with “transformative research results tied to more than 10% of projects.”
For research areas where clear, well-defined technical milestones are achievable, such as developing secure cluster-scale environments for large AI training workloads, the government can support the creation of focused research organizations (FROs) and implement advance market commitments (AMCs) to carry technologies across the ‘valley of death’. DARPA and IARPA can administer higher-risk, more ambitious R&D programs with national security applications.
Recommendation 3. Establish an AI Frontier Science Fellowship to ensure a pipeline of technical AI talent that can contribute directly to R&D and support fast-grant program management
It is critical to ensure that America has a growing pool of talented researchers entering the field of AI assurance and security, given its strategic importance to American competitiveness and national security.
The NSF should launch an AI Frontier Science Fellowship targeting early-career researchers in critical AI assurance and security R&D. Drawing from proven models like the CyberCorps Scholarship for Service, COVID-19 Fast Grants, and proposals for “micro-ARPAs”, this program would operate on two tracks:
- Frontier Scholars: This track would provide comprehensive research support for PhD students and post-docs conducting relevant research on priority AI security and reliability topics. This includes computational resources, research rotations at government labs and agencies, and financial support.
- Rapid Grant Program Managers (PMs): This track would recruit researchers to serve fixed terms as Rapid Grant PMs, responsible for administering EAGER/RAPID grants focused on AI assurance and security.
This fellowship solves multiple problems at once. It builds the researcher pipeline while creating a nimble, decentralized approach to science funding that is better matched to the dynamic nature of the field. This should improve administrative efficiency and increase the surface area for innovation by allowing more early-stage, high-risk projects to be funded. PMs who perform well in administering these small, fast grants can then become full-fledged program officers and PMs at agencies like the NSF and DARPA. This program (including grant budget) would cost around $40 million per year; a rough cost breakdown is sketched below.
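As a back-of-envelope illustration of how a ~$40 million annual budget might decompose, consider the sketch below. Every per-item figure is an assumed placeholder, not a number from this memo; only the $40 million total comes from the text above.

```python
# Illustrative decomposition of the fellowship's ~$40M/year cost.
# All per-item figures are assumptions for illustration, not sourced from the memo.

scholars = 50                  # assumed number of Frontier Scholars supported
scholar_cost = 200_000         # assumed stipend + compute + rotations, per scholar

pms = 10                       # assumed number of Rapid Grant PMs
pm_cost = 150_000              # assumed salary and support, per PM

grants_per_pm = 10             # assumed EAGER/RAPID-style grants each PM administers
grant_size = 250_000           # assumed average grant size

admin_overhead = 3_500_000     # assumed program administration cost

total = (scholars * scholar_cost
         + pms * pm_cost
         + pms * grants_per_pm * grant_size
         + admin_overhead)

print(f"Scholars: ${scholars * scholar_cost / 1e6:.1f}M")           # $10.0M
print(f"PMs:      ${pms * pm_cost / 1e6:.1f}M")                     # $1.5M
print(f"Grants:   ${pms * grants_per_pm * grant_size / 1e6:.1f}M")  # $25.0M
print(f"Overhead: ${admin_overhead / 1e6:.1f}M")                    # $3.5M
print(f"Total:    ${total / 1e6:.1f}M")                             # $40.0M per year
```

Under these assumptions, the bulk of the budget flows through the PM-administered fast grants rather than fellow stipends, consistent with the program’s emphasis on decentralized funding.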
Conclusion
To unlock AI’s immense potential, from research to defense, we must ensure these tools are reliable and secure. This demands R&D breakthroughs to better understand emergent AI capabilities and behaviors, secure AI hardware and infrastructure, and prepare for a multi-agent world. The federal government must lead by setting clear R&D priorities, building foundational research talent, and injecting targeted funding to fast-track innovation. This unified push is key to securing America’s AI leadership and ensuring that American AI is the global gold standard.
This memo was written by an AI Safety Policy Entrepreneurship Fellow over the course of a six-month, part-time program that supports individuals in advancing their policy ideas into practice. You can read more policy memos and learn about Policy Entrepreneurship Fellows here.
Frequently Asked Questions
Can these recommendations be implemented with existing budgets and authorities?
Yes, the recommendations are achievable by reallocating existing budgets and using existing authorities, but this would likely mean accepting a smaller initial scale.
In terms of authorities, OSTP and NSF can already update the National AI R&D Strategic Plan and establish AI assurance and security priorities through normal processes. To implement agile funding mechanisms, agencies can use OTA and Prize Competition Authority. Fast grants require no special statute and can be done under existing grant authorities.
In terms of budget, agencies can reallocate 5-10% of existing AI research funds towards security and assurance R&D. The Frontier Science Fellowship could start as a $5-10 million pilot under NSF’s existing education authorities, e.g. drawing from NSF’s Graduate Research Fellowship Program.
While agencies have flexibility to begin this work, achieving the memo’s core objective – ensuring AI systems are trustworthy and reliable for workforce and military adoption – requires dedicated funding. Congress could provide authorization and appropriation for a named fellowship, which would make the program more stable and allow it to survive personnel turnover.
Won’t the private sector solve these problems on its own?
Market incentives drive companies to fix AI failures that directly impact their bottom line, e.g., chatbots giving bad customer service or autonomous vehicles crashing. More visible, immediate problems are likely to be prioritized because customers demand it or because of liability concerns. This memo focuses on R&D areas that the private sector is less likely to tackle adequately.
The private sector will address some security and reliability issues, but significant gaps are likely to remain. Understanding emergent model capabilities demands costly fundamental research that generates little immediate commercial return. Likewise, securing AI infrastructure against nation-state attacks will likely require multi-year R&D efforts, and companies may fail to coordinate on developing these technologies without a clear demand signal. Finally, systemic dangers arising from multi-agent interactions might be left unmanaged because these failures emerge from complex dynamics with unclear liability attribution.
The government can step in to fund the foundational research that the market is likely to undersupply by default and help coordinate the key stakeholders in the process.
Why would AI companies participate?
Companies need security solutions to access regulated industries and enterprise customers. Collaborating on government-funded research provides these solutions while sharing costs and risks.
The proposed AI Assurance and Security R&D Consortium in Recommendation 2 creates a structured framework for cooperation. Companies contribute model access and compute credits while receiving:
- Government-funded researchers working on their deployment challenges
- Shared IP rights under consortium agreements
- Early access to security and reliability innovations
- Risk mitigation through collaborative cost-sharing
Under the consortium’s IP framework, companies retain full commercial exploitation rights while the government receives unlimited rights for government purposes. In the absence of a consortium agreement, an alternative arrangement could be a patent pool, in which companies access patented technologies through a single agreement. These structures, combined with the fellowship program providing government-funded researchers, create strong incentives for private sector participation while advancing critical public research objectives.
Agenda for an American Renewal
Imperative for a Renewed Economic Paradigm
So far, President Trump’s tariff policies have generated significant turbulence and appear to lack a coherent strategy. His original tariff schedule imposed punitive tariffs on friends and foes alike, on the mistaken premise that trade deficits are necessarily the result of an unhealthy relationship. Although many of these tariffs have been gradually paused or reduced since April 2, their uneven rollout (and subsequent rollback) continues to generate tremendous uncertainty for policymakers, consumers, and businesses alike. This process has weakened America’s geopolitical standing by encouraging other countries to seek alternative trade, financial, and defense arrangements.
However, notwithstanding the uncoordinated approach to date, President Trump’s protectionist instinct points to an underlying truth: that American manufacturing communities have not fared well in the last 25 years and that China’s dominance in manufacturing poses an ever-growing threat to national security. After China’s admission to the WTO in 2001, its share of global manufacturing grew from less than 10% to over 35% today. At the same time, America’s share of manufacturing shrank from almost 25% to less than 15%, with employment shrinking from more than 17 million at the turn of the century to under 13 million today. These trends also create a deep geopolitical vulnerability: in the event of a conflict with China, America would be severely outmatched in its ability to build critical physical goods. For example, China produces over 80% of the world’s batteries and over 90% of consumer drones, and it holds a 200:1 shipbuilding capacity advantage over the U.S. While not all manufacturing is geopolitically valuable, the erosion in strategic industries, which went hand-in-hand with the loss of key manufacturing skills in recent decades, poses potential long-term challenges for America.
In addition to its growing manufacturing dominance, China is now challenging America’s preeminence in technology leadership, having leveraged skills gained in science, engineering, and manufacturing for lower-value-add industries to compete in higher-end sectors. DeepSeek demonstrated that China can develop high-quality artificial intelligence models domestically, an area in which the U.S. took its lead for granted. Meanwhile, BYD rocketed past Tesla in EV sales, accounting for 22% of global sales in 2024 compared to Tesla’s 10%. China has also operated an extensive satellite-enabled secure quantum communications channel, designed to prevent eavesdropping, since 2016.
China’s growing leadership in advanced research may give it a sustained edge beyond its initial gains: according to one recent analysis of frontier research publications across 64 critical technologies, global leadership has shifted dramatically to China, which now leads in 57 research domains. These are not recent developments: they are the product of a series of five-year plans, the best known of which is Made in China 2025, giving China an edge in many critical technologies that will continue to grow if not met by an equally determined American response.
An Integrated Innovation, Economic Foreign Policy, and Community Development Approach
Despite China’s growing challenge and recent self-inflicted damage to America’s economic and geopolitical relationships, America still retains many ingrained advantages. The U.S. still has the largest economy, the deepest public and private capital pools for promising companies and technologies, and the world’s leading universities; it has the most advanced military, continues to count most of the world’s other leading armed forces as formal treaty allies, and issues the global reserve currency. Ordinary Americans have benefited greatly from these advantages in the form of access to cutting-edge products and cheaper goods that increase their effective purchasing power and quality of life, notwithstanding Secretary Bessent’s statements to the contrary.
The U.S. would be wise to leverage its privileged position in high-end innovation and in global financial markets to build “industries of the future.” However, the next economic and geopolitical paradigm must be genuinely equitable, especially to domestic communities that have been previously neglected or harmed by globalization. For these communities, policies such as the now-defunct Trade Adjustment Assistance program were too slow and too reactive to help workers displaced by the “China Shock,” which is estimated to have caused up to 2.4 million direct and indirect job losses.
Although jobs in trade-affected communities were eventually “replaced,” the jobs that came after were disproportionately lower-earning roles, accrued largely to individuals with college degrees, and were taken by new labor force entrants rather than providing new opportunities for those who had originally been displaced. Moreover, as a result of ineffective policy responses, this replacement took over a decade and contributed to devastating effects: look no further than the rate at which “deaths of despair” among white individuals without a college degree skyrocketed after 2000.
Nonetheless, surrendering America’s hard-won advantages in technology and international commerce, especially in the face of a growing challenge from China, would be an existential error. Rather, our goal is to address the shortcomings of previous policy approaches to the negative externalities caused by globalization. Previous approaches have focused on maximizing growth and redistributing the gains, but in practice, America failed to do either by underinvesting in the foundational policies that enable both. Thus, we are proposing a two-pronged approach that focuses on spurring cutting-edge technologies, growing novel industries, and enhancing production capabilities while investing in communities in a way that provides family-supporting, upwardly mobile jobs as well as critical childcare, education, housing, and healthcare services. By investing in broad-based prosperity and productivity, we can build a more equitable and dynamic economy.
Our agenda is intentionally broad (and correspondingly ambitious) rather than narrowly focused on manufacturing communities, even though current discourse centers on trade. This is not simply a “political bargain” that provides greater welfare or lip-service concessions to hollowed-out communities in exchange for a return to the prior geoeconomic paradigm. Rather, we genuinely believe that economic dynamism led by an empowered middle-class worker, whether in manufacturing or in a service industry, is essential to America’s future prosperity and national security: a future in which economic outcomes are not determined by parental income and in which black-white disparities are closed far faster than the current pace of 150+ years.
Thus, the ideas and agenda presented here are neither traditionally “liberal” nor “conservative,” “Democrat” nor “Republican.” Instead, we draw upon the intellectual traditions of both segments of the political spectrum. We agree with Ezra Klein’s and Derek Thompson’s vision in Abundance for a technology-enabled future in which America remembers how to build; at the same time, we take seriously Oren Cass’s view in The Once and Future Worker that the dignity of work is paramount and that public policy should empower the middle-class worker. What we offer in the sections below is our vision for a renewed America that crosses traditional policy boundaries to create an economic and political paradigm that works for all.
Policy Recommendations
Investing in American Innovation
Given recent trends, it is clear that there is no better time to re-invigorate America’s innovation edge by investing in R&D to create and capture “industries of the future,” re-shoring capital and expertise, and working closely with allies to expand our capabilities while safeguarding those technologies that are critical to our security. These investments will enable America to grow its economic potential, providing fertile ground for future shared prosperity. We emphasize five key components to renewing America’s technological edge and manufacturing base:
Invest in R&D. Increase federally funded R&D, which has declined from 1.8% of GDP in the 1960s to 0.6% of GDP today (a back-of-envelope sense of the gap is sketched below). Of the $200 billion federal R&D budget, just $16 billion is allocated to non-healthcare basic science, an area the government is better suited to fund than the private sector because of the positive spillover effects of public funding. A good start is fully funding the CHIPS and Science Act, which authorized over $200 billion over 10 years for competitiveness-enhancing R&D investments that Congress has yet to appropriate. Funding these efforts will be critical to developing and winning the race for future-defining technologies, such as next-gen battery chemistries, quantum computing, and robotics.
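To make the scale of that decline concrete, the sketch below converts the percentage shares above into dollar terms. The GDP figure is an assumption (roughly 2024 nominal U.S. GDP); the shares come from the paragraph above.

```python
# Back-of-envelope estimate of the federal R&D gap implied by the GDP shares above.
# The GDP figure is an assumption (~2024 nominal GDP), not from the memo.
GDP = 29_000_000_000_000       # assumed U.S. nominal GDP, in dollars
CURRENT_SHARE = 0.006          # federal R&D today, ~0.6% of GDP (from the memo)
HISTORICAL_SHARE = 0.018       # 1960s level, ~1.8% of GDP (from the memo)

current_rd = GDP * CURRENT_SHARE
historical_rd = GDP * HISTORICAL_SHARE
gap = historical_rd - current_rd

print(f"Federal R&D today:  ${current_rd / 1e9:,.0f}B")     # ~$174B
print(f"At the 1960s share: ${historical_rd / 1e9:,.0f}B")  # ~$522B
print(f"Implied annual gap: ${gap / 1e9:,.0f}B")            # ~$348B
# For scale: the CHIPS and Science Act's unappropriated ~$200B over 10 years
# (~$20B/yr) would close only a small fraction of this gap.
```

Under these assumptions, returning to the 1960s share would require roughly tripling today’s federal R&D spending, which underscores why prioritization matters.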
Capability-Building. Develop a coordinated mechanism for supporting translation and early commercialization of cutting-edge technologies. Otherwise, the U.S. will cede scale-up in “industries of the future” to competitors: for example, Exxon pioneered the lithium-ion battery, but commercialization moved abroad, first to Japan and later to Chinese manufacturing dominance, owing partly to the erosion of manufacturing skills in America that are only belatedly being rebuilt. However, these investments are not intended to be a top-down approach that selects winners and losers: rather, America should set a coordinated list of priorities (leveraging roadmaps such as the DoD’s Critical Technology Areas), foster competition among many players, and then provide targeted, lightweight financial support to the industry clusters and companies that bubble to the top.
Financial support could take the form of a federally funded strategic investment fund (SIF) that partners with private sector actors by providing catalytic funding (e.g., first-loss loans). This fund would focus on bridging the financing gap in the “valley of death” as companies transition from prototype to first-of-a-kind and “nth-of-a-kind” commercial products. In contrast to previous attempts at industrial policy, such as the Inflation Reduction Act (IRA) or the CHIPS Act, the fund should impose minimal compliance burdens and focus on rapidly deploying capital to communities and organizations that have demonstrated a durable competitive advantage.
Encourage Foreign Direct Investment (FDI). Provide tax incentives and matching funds (potentially from the SIF) for companies that build manufacturing plants in America. This will bring critical expertise that domestic manufacturers can adopt, especially in industries requiring deep technical expertise that America would need to redevelop (e.g., shipbuilding). By striking investment deals with foreign partners, America can “learn from the best” and subsequently improve upon their methods domestically. In some cases, it may be more efficient to “share” production, with certain components manufactured or assembled abroad while America ramps up its own capabilities.
For example, in shipbuilding, the U.S. could focus on developing propulsion, sensor, and weapon systems, while allies such as South Korea and Japan, who together build almost as much tonnage as China, convert some shipyards to defense production and send technical experts to accelerate development of American shipyards. In exchange, they would receive select additional access to cutting-edge systems and financially benefit from investing in American shipbuilding facilities and supply chains.
Immigration. America has long been described as a “nation of immigrants,” and immigrants’ role in innovation is impossible to deny: they founded 46% of Fortune 500 companies and account for 24% of all founders; they make up 19% of the overall STEM workforce but earn nearly 60% of doctorates in computer science, mathematics, and engineering. Rather than spurning them, the U.S. should attract more highly educated immigrants by removing barriers to working in STEM roles and offering accelerated paths to citizenship. At the same time, American policymakers should acknowledge the challenges caused by illegal immigration. One solution is to pass legislation such as the Border Act of 2024, which had bipartisan support and would have increased border security, supplemented by a “points-based” immigration system such as Canada’s, which emphasizes educational credentials and in-country work experience.
Create Targeted Fences. Employ tariffs and export controls to defend nascent, strategically important industries such as advanced chips, fusion energy, and quantum communications. Rather than being employed indiscriminately, tariffs and export controls should be focused on ensuring that only America and its allies have access to the cutting-edge technologies that shape the global economic and security landscape. They are not intended to keep foreign competition out wholesale; rather, they should ensure that burgeoning technology developers gain sufficient scale and traction by accelerating through the learning curve.
Building Strong Communities
Strong communities are the foundation of a strong workforce, without which new industries will not thrive beyond a small number of established tech hubs. Strengthening American communities, however, will require the country to address the core needs of a family-sustaining life. Childcare, education, housing, and healthcare are among the largest budget items for families and have been proven time and again to be critical to economic mobility. Yet they are precisely the areas in which costs have skyrocketed the most, as has been frequently chronicled by the American Enterprise Institute’s “Chart of the Century.” These essential services have been underinvested in for far too long, creating painful shortages for the communities that need them most. As such, addressing these issues forms the core pillars of our domestic reinvestment plan. Addressing them means grappling with the underlying drivers of their cost and scarcity, including limited state capacity, regulatory and licensing barriers, and low productivity growth in service-heavy care sectors. A new policy agenda that addresses these fundamental supply-side issues is needed to reshape the contours of this debate.
Expand Childcare. Inadequate childcare costs the U.S. economy $122 billion a year in lost wages and productivity as otherwise capable workers, especially women, are forced to reduce hours or leave the labor force. The affordability problem is compounded by supply shortages: more than half the population lives in a “childcare desert,” where there are more than three times as many children as licensed slots. Addressing these shortages will ease affordability, enable workers to stay in the workforce, and allow families to move up the income ladder.
Fund Early Education. Investments in early childhood education generate compelling returns: high-quality studies such as the Perry Preschool study demonstrate up to $7–$12 of social return for every $1 invested. While these gains apply broadly across the country, they would make an even greater difference in rebuilding manufacturing communities by making it easier to grow and sustain families. Given the return on investment and the impact on social mobility, American policymakers should consider investing in universal pre-K.
Invest in Workforce Training and Community Colleges. The cost of a four-year college education now exceeds $38K per year, indicating a clear need both for cheaper bachelor’s degrees and for credible alternatives. Community colleges can be reimagined and better funded to focus on preparing students for high-paying jobs in sectors with critical labor shortages, many of which are in or adjacent to “industries of the future.” Some of these roles, such as IT specialists and skilled tradespeople, are essential to manufacturing. Others, such as nursing and allied healthcare roles, will help build and sustain strong communities.
Build Housing Stock. America has a shortage of 3.2 million homes. Simply put, the country needs to build more houses to address the cost of living and enable Americans to work and raise families. While housing policy is generally decided at lower levels of government, the federal government should provide grants and other incentives to states and municipalities to defray the cost of developing affordable housing; in exchange, state and local jurisdictions should relax zoning regulations to enable more multi-family and high-density single-family housing.
Expand Healthcare Access. American healthcare is plagued with many problems, including uneven access and shortages in primary care. For example, the U.S. has 3.1 primary care physicians (PCPs) per 10,000 people, whereas Germany has 7.1 and France has 9.0. As such, the federal government should focus on expanding the number of healthcare practitioners (especially primary care physicians and nurses), building a physical presence for essential healthcare services in underserved regions, and incentivizing the development of digital care solutions that deliver affordable care.
Allocating Funds to Invest in Tomorrow’s Growth
Investment Requirements
While we view these policies as essential to America’s reinvigoration, they also represent enormous investments that must be paid for at a time when fiscal constraints are likely to tighten. To convey the size of the financial requirements and trade-offs involved, we cost out each of the key policy prescriptions above, drawing on bipartisan proposals wherever possible, many of which have been scored by the Congressional Budget Office (CBO) or another reputable institution or agency. Where this is not possible, we created estimates based on the key policy goals to be accomplished. Although trade deals and targeted tariffs are likely to have some budget impact, we did not evaluate them given multiple countervailing forces and political uncertainties (e.g., currency impacts).
Potential Pay-Fors
Given the budgetary requirements of these proposals, we looked for opportunities to prune the federal budget. The CBO has laid out a set of budgetary options that collectively could save several trillion dollars over the next decade. In selecting potential pay-fors, we used two approaches focused on streamlining mandatory spending and optimizing tax revenues in an economically efficient manner. Our first approach includes budgetary options that eliminate unnecessary spending that is distortionary in nature or unlikely to have a meaningful direct impact on the populations it is intended to serve (e.g., kickback payments to state health plans). Our second approach includes budgetary options in which the burden would fall upon higher-earning populations (e.g., raising the cap on payroll and Social Security taxes).
As the table below shows, there is a menu of options available to policymakers that would raise funding well in excess of the required investment amounts above, allowing them to pick and choose the options that are most economically efficient and politically viable. In addition, policymakers can modify many of these options to reduce their size or magnitude (e.g., adjust the point at which Social Security benefits for “high earners” are tapered, or raise capital gains taxes by 1 percentage point instead of 2). While some of these proposals are potentially controversial, there is a clear and pressing need to reexamine America’s foundational policy assumptions without expanding the deficit, which already exceeds 6% of GDP.
Conclusion
America needs a new economic paradigm that renews and refreshes, rather than dismantles, its hard-won geopolitical and technological advantages. Trump’s tariffs, should they be fully enacted, would be a self-defeating act that damages America’s economy while leaving it more vulnerable, not less, to rivals and adversaries. However, we also recognize that the previous free trade paradigm was not truly equitable and did not do enough to support manufacturing communities and their core strengths. We believe that our two-pronged approach, investing in American innovation alongside our allies while making critical community investments in childcare, higher education, housing, and healthcare, bridges the gap and provides a framework for re-orienting the economy toward a more prosperous, fair, and secure future.
De-Risking the U.S. Bioeconomy by Establishing Financial Mechanisms to Drive Growth and Innovation
The bioeconomy is a pivotal economic sector driving national growth, technological innovation, and global competitiveness. However, the biotechnology innovation and biomanufacturing sector faces significant challenges, particularly in scaling technologies and overcoming long development timelines that do not align with investors’ short-term return expectations. These extended timelines and the inherent risks involved lead to funding gaps that hinder the successful commercialization of technologies and bio-based products. If obstacles like the ‘Valleys of Death’ (gaps in capital at crucial development junctures that companies and technologies struggle to cross) are not addressed, the result could be economic stagnation and the loss of America’s competitive edge in the global bioeconomy.
Government programs like SBIR and STTR help narrow the financing gap inherent in the U.S. bioeconomy, but existing financial mechanisms have proven insufficient to fully de-risk the sector and attract the necessary private investment. In FY24, the National Defense Authorization Act established the Office of Strategic Capital within the Department of Defense to provide financial and technical support for its 31 ‘Covered Technology Categories’, which include biotechnology and biomanufacturing. To address the challenges of de-risking biotechnology and biomanufacturing within the U.S. bioeconomy, the Office of Strategic Capital should house a Bioeconomy Finance Program. This program would offer tailored financial incentives such as loans, tax credits, and volume guarantees, targeting both short-term and long-term scale-up needs in biomanufacturing and biotechnology.
By providing these essential funding mechanisms, the Bioeconomy Finance Program will reduce the risks inherent in biotechnology innovation, encouraging more private sector investment. In parallel, states and regions across the country should develop region-specific strategies, such as investing in necessary infrastructure and fostering public-private partnerships, to complement the federal government’s initiatives to de-risk the sector. Together, these coordinated efforts will create a sustainable, competitive bioeconomy that supports economic growth and strengthens U.S. national security.
Challenge & Opportunity
The U.S. bioeconomy encompasses economic activity derived from the life sciences, particularly biotechnology and biomanufacturing. The sector plays an important role in driving national growth and innovation. Given its broad reach across industries, impact on job creation, potential for technological advancement, and importance to global competitiveness, the U.S. bioeconomy is a critical sector for U.S. policymakers to support. With continued development and growth, the U.S. bioeconomy promises not only economic benefits but also stronger national security, better health outcomes, and greater environmental sustainability for the country.
Ongoing advancements in biotechnology, including artificial intelligence and automation, have accelerated the growth of the bioeconomy, making the sector both globally competitive and an important part of the domestic economy. In 2023, the U.S. bioeconomy supported nearly 644,000 domestic jobs, contributed $210 billion to GDP, and generated $49 billion in wages. Biomanufactured products within the bioeconomy span multiple categories (Figure 1). Growth here will drive future economic development and address societal challenges, making the bioeconomy a key priority for government investment and strategic focus.

Figure 1. Biomanufactured products span a wide range of categories, from pharmaceuticals and chemicals, which require small volumes of biomass but yield high-value products, to energy and heat, which require larger volumes of biomass but result in lower-value products. Additionally, there are common infrastructure synergies, bioprocesses, and complementary input-output relationships that facilitate a circular bioeconomy within bioproduct manufacturing. Source: https://edepot.wur.nl/407896
An important driving force for the U.S. bioeconomy is biotechnology and biomanufacturing innovation. However, bringing biotechnologies to market requires substantial investment, capital, and, most importantly, time. Unlike other technology sectors, which see returns on investment within a short period, biotechnology often suffers from a misalignment between scientific timelines and investors’ expectations. Many biotechnology-based companies rely on venture capital, a form of private equity investment, to finance their operations. However, venture capitalists (VCs) typically operate on short return-on-investment timelines, which may not align with the longer development cycles characteristic of the biotechnology sector (Figure 2); a simple discounting sketch below illustrates the mismatch. Additionally, the need for large-scale facilities, the high capital expenditures (CAPEX) required for commercially profitable production, and the low profit margins of high-volume commodity production create further barriers to obtaining investment. While this misalignment is not universal, it remains a challenge for many biotech startups.
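To see why timelines matter so much to investors, consider a present-value calculation. The 25% hurdle rate below is an assumed, illustrative venture return target, not a figure from this memo.

```python
# Present value of $1 of future exit proceeds under an assumed venture hurdle rate.
# The 25% annual required return is an illustrative assumption, not a sourced figure.

HURDLE_RATE = 0.25  # assumed annual return a venture fund targets

def present_value(payoff: float, years: int, rate: float = HURDLE_RATE) -> float:
    """Discount a future payoff back to today at a compound annual rate."""
    return payoff / (1 + rate) ** years

# A typical tech-sector exit (~5 years) vs. a biotech exit (~10 years, per Figure 2):
print(f"$1 in  5 years is worth ${present_value(1, 5):.2f} today")   # ~$0.33
print(f"$1 in 10 years is worth ${present_value(1, 10):.2f} today")  # ~$0.11
```

Under this assumption, doubling the development timeline cuts the present value of the same payoff by roughly two-thirds, so biotech ventures must promise far larger exits to attract the same capital; this is the gap the financial mechanisms below aim to bridge.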
The U.S. government has implemented several programs to address the financing void that often arises during the biotechnology innovation process. These include the Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) programs, which provide phased funding across all Technology Readiness Levels (TRLs); the DOE Loan Programs Office (LPO), which offers debt financing for energy-related innovations; the DOE Office of Clean Energy Demonstrations, which funds demonstration-scale proof-of-concept projects; and the newly established Office of Strategic Capital (OSC) within the DOD (as outlined in the FY24 National Defense Authorization Act), which is tasked with issuing loans and loan guarantees to stimulate private investment in critical technologies. An example is the office’s new Equipment Loan Financing through OSC’s Credit Program.

Figure 2. Biotechnology development timelines typically take 10 or more years to reach the market, owing to longer R&D and demonstration-and-scale-up phases, while non-biotechnology development timelines are generally much shorter, averaging around five years.
While these efforts are important, they are insufficient on their own to de-risk the sector to the degree needed to realize the full potential of the U.S. bioeconomy. To effectively support the biotechnology innovation pipeline at critical stages, the government must explore and implement additional financial mechanisms that attract more private investment and mitigate the inherent risks of biotechnology innovation. Building on existing resources like the Regional Technology and Innovation Hubs, NSF Regional Innovation Engines, and Manufacturing USA Institutes would help stimulate private sector investment and is crucial for strengthening the nation’s economic competitiveness.
The OSC is well-positioned to enhance resilience in sectors critical to national security, including biotechnology and biomanufacturing, through large-scale investments. Biotechnology and biomanufacturing inherently require significant CAPEX (spending on the purchase, upgrade, or maintenance of physical assets), which in turn requires substantial strategic and concessional capital to de-risk and accelerate the biomanufacturing process. By creating, implementing, and leveraging various financial incentives and resources, the OSC can help build the robust infrastructure necessary for private sector engagement.
To achieve this, the U.S. government should create the Bioeconomy Finance Program (BFP) within the OSC, specifically tasked with enabling and de-risking the biotechnology and biomanufacturing sectors through financial incentives and programs. The BFP should offer different levels of funding based on the time required to scale, addressing the potential ‘Valleys of Death’ that occur during the biomanufacturing and biotechnology innovation process. These funding levels would target short-term (1-2 years) scale-up hurdles to accelerate biotechnology and biomanufacturing, as well as long-term (3-5 years) scale-up challenges, providing transformative funding mechanisms that could make or break entire sectors.
In addition to the federal programs within the BFP, states and regions must also make substantial investments and collaborate with federal efforts to accelerate biomanufacturing and biotechnology ecosystems in their own areas. While the federal government can provide a top-down strategy, regional efforts are critical bottom-up complements that align with federal investments and programs, ultimately enabling a sustainable and competitive regional biotechnology and biomanufacturing industry. To facilitate this, states and regions should develop and implement investment initiatives such as resource analyses, infrastructure programs, and a cohesive, long-term strategy centered on public-private partnerships. The federal government can encourage these regional efforts by ensuring continued funding for biotechnology hubs and creating additional opportunities for federal investment in the future.
Plan of Action
To strengthen and increase the competitiveness of the U.S. bioeconomy, a coordinated approach is needed that combines federal leadership with state-level action. This includes establishing a dedicated Bioeconomy Finance Program within the Office of Strategic Capital to create targeted financial mechanisms, such as loan programs, tax incentives, and volume guarantees. Additionally, states must be empowered to support commercial-scale biomanufacturing and infrastructure development, leveraging tech hubs, cross-regional partnerships, and public-private partnerships to build capacity and foster innovation nationwide.
Recommendation 1. Establish and Fund a Bioeconomy Finance Program
Congress, in the next National Defense Authorization Act, should codify the OSC within the DOD and authorize the creation of a Bioeconomy Finance Program (BFP) within the OSC to provide a centralized federal structure for addressing financial gaps in the bioeconomy, thereby increasing productivity and global competitiveness. In 2024, Congress expanded the OSC’s mission to offer financial and technical support to entities within its 31 ‘Covered Technology Categories’, including biotechnology and biomanufacturing. Housing these substantial expenditures within the OSC would build resilience in the sector, maintain America’s competitive advantage globally, and strengthen national security. Establishing the BFP within the OSC at the DOD would also allow for a targeted focus on these critical sectors, ensuring long-term stability and resilience against political shifts.
The DOD and OSC should leverage their own funding as well as the OSC’s existing partnership with the Small Business Administration to direct $1 billion to the BFP to create and implement initiatives aimed at de-risking the U.S. bioeconomy. The Bioeconomy Finance Program should work closely with relevant federal agencies, such as the DOE, the Department of Agriculture (USDA), and the Department of Commerce (DOC), to ensure a long-term, cohesive strategy for financing bioeconomy innovation and biomanufacturing capacity.
Recommendation 2. Task the Bioeconomy Finance Program with Key Initiatives
A key element of the OSC’s mission and investment strategy is to provide financial incentives and support to entities within its 31 ‘Covered Technology Categories’. By having the BFP design and manage these financial initiatives for the biotechnology and biomanufacturing sectors, the OSC can leverage lessons from similar programs, such as the DOE’s loan programs, to address the unique needs of these critical industries, which are essential for national security and economic growth.
Currently, the OSC has launched a credit program for equipment financing. While this is a necessary first step in fulfilling the office’s mission, the program is open to all 31 ‘Covered Technology Categories’, resulting in broad, diluted funding. To accelerate the bioeconomy and reduce risks in biotechnology and biomanufacturing, it is crucial to allocate resources specifically to these sectors. The BFP should therefore take the lead on several key financial initiatives to support the growth of the bioeconomy, including:
Loan Programs
The BFP should develop biotechnology-enabling loan programs to complement the new equipment loan financing program run by the OSC. These loan programs should be modeled after those of the DOE LPO, focusing on biomanufacturing scale-up, technology transfer, and closing the financing gaps that hinder commercialization.
Example loan programs:
- DOE Title 17 Clean Energy Financing Program
- USDA Business & Industry Loan Guarantee
- Solar Foods EU Grant/Loan
Tax Incentives
The BFP should create tax incentives tailored to the bioeconomy, such as transferable investment and production tax credits. For example, the 45V tax credit for the production of clean hydrogen could serve as a model for similar incentives aimed at other bioproducts.
Example tax incentives:
- The Inflation Reduction Act’s transferable tax credits are the gold standard for this category.
Volume Guarantees & Procurement Support
To mitigate risks in biomanufacturing, the office should establish volume guarantees for various bioproducts, offering financial assurance to manufacturers and encouraging private sector investment. An initial assessment should be conducted to identify which bioproducts are best suited for such guarantees. Additionally, the office should explore procurement programs to increase government demand for bio-based products, further incentivizing industry growth and innovation. This effort should be undertaken in coordination with the USDA’s BioPreferred Program to minimize redundancy and create a cohesive procurement strategy. In addition, the BFP should look to the procurement innovations promoted by the Office of Federal Procurement Policy for forward-funding solutions that create a functioning market.
Example Volume Guarantees & Procurement Support:
- Heavy Forging Press Infrastructure Lease Agreement
- NASA and the USAF buying Fairchild semiconductors in advance of needing them, deliberately overbuying on performance
- Advance Market Commitments
- Joint Venture Partnerships
- Other Transaction Authorities
Recommendation 3. Develop Pipeline Programs to Address Financial and Time Horizon Needs
Utilizing the key initiatives highlighted above, the BFP should create a two-tiered pipeline of financial mechanisms to address both short-term and long-term financial needs. The two tiers could include:
- Level 1 – Short Term Scale-Up (1-2 years) Programs
- Subsidized costs for electricity and other utilities (waste disposal, wastewater treatment, natural gas, etc.)
- Funding for demonstration-scale projects and early-stage engineering development (similar to the DOE’s Office of Clean Energy Demonstrations or the DOD’s Defense Industrial Base Consortium round-one $1-2M engineering grants)
- Tax holidays for corporate taxes and property taxes
- Allowing accelerated depreciation to reduce tax liabilities
- Land grants or subsidies for manufacturing assets
- Fast-track permitting and site preparation to avoid long waits
- Labor and workforce subsidies
- Removal of export duties on products created in the U.S. and shipped overseas
- Level 2 – Long Term Scale-Up (3-5 years) Programs
- Large-scale transferable tax credits (either production or investment tax credits) for manufacturing, similar to the tax credits in the Inflation Reduction Act for clean energy
- Large-scale manufacturing grants
- Large-scale, low-interest manufacturing loans and loan guarantees
- Government procurement contracts or offtake commitments, such as partial or full volume guarantees
- Government direct or indirect equity investments in biomanufacturing and biotechnology innovations
Recommendation 4. State-Level Initiatives, Infrastructure Development, and Public-Private Partnerships
While federal efforts are crucial, a bottom-up approach is needed to support biomanufacturing and the bioeconomy at the state level. The federal government can support these regional activities by providing targeted funding, policy guidance, and financial incentives that align with regional priorities, ensuring a coordinated effort toward industry growth. States should be encouraged to complement federal initiatives by developing programs that support commercial-scale biomanufacturing. Key actions include:
- State-Level Bioeconomy Resource Analysis: Each state and region should conduct its own analysis to understand the bioeconomy resources at its disposal and determine what additional resources it would need to establish or strengthen a state or regional bioeconomy. Identifying these resources will help the nation understand its true bioeconomic potential by mapping where biomass is located, cataloguing which facilities exist and which are needed to develop an economically sustainable bioeconomy, and creating data to better estimate the economic return on investment.
- Once the analysis is completed, states should collaborate with federal agencies like the DOE, DOC, and Economic Development Administration (EDA) to create and apply for specialized grants for commercial-scale biomanufacturing facilities based on these analyses. Grants should prioritize non-pharmaceutical biomanufacturing to expand bioeconomy growth beyond traditional sectors.
- Utility Infrastructure Grants: Another critical area is the utility infrastructure needed to support biomanufacturing, such as wastewater treatment and electricity infrastructure. States should receive targeted funding for these infrastructure projects, which are essential for scaling up production, and should use those funds to establish their own granting mechanisms to build the regional infrastructure needed to support the U.S. bioeconomy over the long term.
- Tech Hub Partnerships: States should leverage existing tech hubs to serve as centers for innovation in bioeconomy technologies. These hubs, which are already positioned in regions with high technological readiness, can be incentivized to partner with other regions that may not yet have robust tech ecosystems. The goal is to create a collaborative, cross-regional network that fosters knowledge-sharing and builds capacity across the country.
- Foster Public-Private Partnerships (PPPs): To ensure the success and sustainability of these initiatives, states should actively foster PPPs that bring together government, industry leaders, and academic institutions. These partnerships can help align private sector investment with public goals, enhance resource sharing, and accelerate the commercialization of bioeconomy technologies. By engaging in collaborative R&D, sharing infrastructure costs, and co-developing new biotechnologies, PPPs will play a crucial role in driving innovation and economic growth in the bioeconomy sector. In addition to fostering PPPs, regions should proactively work on models that enable these partnerships to become self-sustaining, mitigating potential financial pitfalls if partners drop out. By not only creating PPPs but also ensuring they become fully independent over time, states can significantly reduce the risks associated with these partnerships.
By addressing these steps at both the federal and state levels, the U.S. can create a robust, scalable framework for financing biomanufacturing and the broader bioeconomy, supporting the transition from early-stage innovation to commercial success and ensuring long-term economic competitiveness. A good example of how this approach works is the DOE Loan Programs Office, which collaborates with state energy financing institutions. This partnership has successfully supported various projects by leveraging both federal and state resources to accelerate innovation and drive economic growth. The same model makes sense for biomanufacturing and biotechnology within the BFP at the OSC, as it ensures coordination between federal and state efforts, de-risks the sector, and facilitates the scaling of transformative technologies.
Conclusion
Biotechnology innovation and biomanufacturing are critical components of the U.S. bioeconomy, a sector that drives innovation, economic growth, and global competitiveness. Yet these sectors face significant challenges stemming from the misalignment of development timelines and investment cycles. The sector’s inherent risks and long development processes create funding gaps, hindering the commercialization of vital biotechnologies and products. Left unaddressed, these challenges, including the ‘Valleys of Death,’ could stifle innovation, slow progress, and cost the U.S. its global leadership in biotechnology.
To overcome these obstacles, a coordinated and comprehensive approach to de-risking the sector is necessary. Establishing the Bioeconomy Finance Program (BFP) within the DOD’s Office of Strategic Capital (OSC) offers a robust solution by providing targeted financial incentives, such as loans, tax credits, and volume guarantees, designed to de-risk the sector and attract private investment. These financial mechanisms would address both short-term and long-term scale-up needs, helping to bridge funding gaps and accelerate the transition from innovation to commercialization. Furthermore, building on existing government resources, alongside fostering state-level initiatives such as infrastructure development and public-private partnerships, will create a holistic ecosystem that supports biotechnology and biomanufacturing at every stage and substantially de-risks the sector. By empowering regions to develop their own bioeconomy strategies and leverage local federal programs, like the EDA Tech Hubs, the U.S. can create a sustainable, scalable framework for growth. By taking these steps, the U.S. can not only strengthen its economic position but also lead the world in developing transformative biotechnologies.
BioMADE, a Manufacturing Innovation Institute sponsored by the U.S. Department of Defense, plays an important role in advancing and developing the U.S. bioeconomy. Yet BioMADE currently funds pilot- to intermediate-scale projects rather than commercial-scale projects, leaving a significant funding gap and creating a distinct challenge for the bioeconomy. By contrast, the BFP within OSC would complement existing efforts by specifically targeting and mitigating risks in the biotechnology and biomanufacturing pipeline that current programs do not address. Furthermore, given that BioMADE is also funded by the DOD, enhanced coordination between these programs will enable a more robust and cohesive strategy to accelerate the growth of the U.S. bioeconomy.
While public-private partnerships are already embedded in some federal regional programs, such as the EDA Tech Hubs, not all states or regions have access to these initiatives or their funding. To ensure equitable growth and fully harness the economic potential of the bioeconomy across the nation, regions and states should actively seek additional partnerships beyond federally driven programs. This will empower them to build their own regional bioeconomies, or microbioeconomies, by tapping into regional strengths, resources, and expertise to drive localized innovation. Moreover, federal programs like the EDA Tech Hubs are often focused on advancing existing technologies rather than fostering the development of new ones. By expanding PPPs across the biotech sector, states and regions can spur broader economic growth and innovation by holistically developing all areas of biotechnology and biomanufacturing, enhancing the overall bioeconomy.
Creating a U.S. Innovation Accelerator Modeled on In-Q-Tel
The U.S. should create a new non-governmental Innovation Accelerator, modeled after the successful In-Q-Tel program, to invest in small and mid-cap companies developing technologies that address critical needs of the United States. Doing so would directly address the bottleneck in our innovation pipeline that prevents innovative companies from bringing their products to market.
Challenge and Opportunity
While the federal government funds basic, early-stage R&D, it leaves product development and commercialization to the private sector. This paradigm has created the so-called innovation Valley of Death: a lack of capital support for the transition to early commercialization that stalls economic growth across many innovation-driven sectors. The U.S. currently leads the world in the formation of companies, but limited capital sources artificially restrict their growth. For example, the U.S. currently leads the world in biotechnology and biomedical innovation: the U.S. market alone is worth $600 billion and is projected to exceed $1.5 trillion. However, international rivals are catching up; China is projected to close the biotechnology innovation gap by 2028. The U.S. must act quickly to protect its lead.
Typically, early- and mid-stage innovations are too immature for private capital investors because they present outsized risk. In addition, private capital tends to be more conservative in rough economic times, which further dries up the innovation pipeline, and investment "fads" can starve other fields of capital for years at a time. So, although the U.S. government provides significant early-stage discovery funding for innovation through its various agencies, the grant lifecycle is such that after the creation and initial development of new technologies, there are few mechanisms for continued support to drive products to market.
It is this period, after R&D but before commercial demonstration, that creates a substantial bottleneck for entrepreneurs: their work is too advanced for typical government research and development grants but not developed enough to draw private investment. The existing SBIR and STTR grant programs the government provides for this purpose are typically too small to significantly advance such innovations, while the application process is too cumbersome and slow to draw the interest of many companies. As a result, small businesses built around these technologies often fail because of funding challenges rather than any fault of the innovations they are developing.
The federal government therefore has an opportunity to smooth the path from lab to market by establishing a mechanism for supporting smaller companies developing innovative products that will substantially improve the lives of Americans. A new U.S. Innovation Accelerator would provide R&D funding to promising companies, accelerating innovations critical to the U.S. by de-risking them as they move toward private sector funding and commercialization.
Creating the U.S. Innovation Accelerator
We propose creating a new federally guided entity modeled on In-Q-Tel, the government-funded not-for-profit venture capital firm that invests in companies developing technologies of use to the intelligence agencies. Like In-Q-Tel, the U.S. Innovation Accelerator would operate independently of the government but leverage federal investments in research and development to ensure that promising new technologies make it to market.
Because the organization would sit outside of government, it could pay staff wages commensurate with their experience and draw top talent interested in driving innovation across the R&D spectrum. The organization would invest in the development of technology companies and would partner with private capital sources to help develop critical technologies that are too risky for private capital entities to fund on their own. Such capital would allow innovation to flourish. In exchange, the organization could require that such companies, and their manufacturing operations, remain in the U.S. for some period after receiving public funding (10 years, for example) to prevent the offshoring of technologies developed with public dollars. The organization would use a variety of funding vehicles to support companies, matching their needs and increasing their chances of success.
Scope
The new U.S. Innovation Accelerator could be established as a sector-specific entity (for example, a biotechnology- and healthcare-focused fund), or it could comprise a series of portfolios investing in companies across the innovation spectrum. Both approaches have merits worth exploring. A narrower biomedical fund would have the benefit of quickly deploying capital to accelerate key areas of strategic U.S. interest while proving the concept and setting the stage to expand to other sectors of the economy. Alternatively, if a larger pool of funding is available initially, a broader investment portfolio would allow for targeted investments across sectors ranging from biotechnology and agriculture to advanced materials and energy.
Sources of Capital
The U.S. Innovation Accelerator could be funded in several ways to create a robust investment vehicle for advancing biotechnology and healthcare innovation. The options below range from a publicly funded revolving fund, similar to In-Q-Tel, to vehicles that draw capital from retirement and pension funds and provide a return on investment to voluntary investors.
Appropriations-driven revolving fund. Like In-Q-Tel, Congress could kick-start the Innovation Accelerator through direct appropriations. This annual investment could be curtailed, and the initial capital repaid to the Treasury, once the fund starts to realize returns on its investments.
Thrift Savings Plan allocations. The federal employee retirement savings plan, the Thrift Savings Plan (TSP), holds approximately $700 billion in assets across various investment funds. By allowing voluntary allocations to the Innovation Accelerator by federal employees, even a small percentage of TSP assets could provide billions in initial capital; for illustration, a 1% allocation would amount to roughly $7 billion. This allocation would be structured as part of the TSP's broader investment strategy, through the creation of a new specialized SBF fund option for participants.
U.S. state and local public pension plans. State and local government pension plans hold assets totaling roughly $6.25 trillion. The Innovation Accelerator could work with state pension administrators to create investment vehicles that align with their risk-return profiles and support both financial and social impact goals. These would be made available to plan participants in a manner similar to the TSP option or through more traditional allocations.
Reforming the SBIR/STTR programs. The SBIR and STTR programs represent 3.2% of the total federal R&D budget across 11 agencies, yet they struggle to attract suitable applicants, and not for lack of need among early-stage innovators. Typically, these grants are judged and awarded by program managers with little or no private sector experience, take too long from application to award, and provide insufficient funds for many companies to consider them worthwhile. Those dollars could instead be allocated to the Innovation Accelerator and invested in more promising small businesses through a streamlined program, with returns on initial investments creating a revolving fund that can then be reinvested in additional promising companies. The program currently imposes funding ceilings for the different phases of SBIR grants; these phases are artificial, do not reflect the real needs of different types of companies, and should be eliminated and replaced with needs-based funding. U.S. government agencies could issue technology priority guidance to the U.S. Innovation Accelerator and entirely offload the burden of running multiple SBIR programs.
Part of the proposed U.S. sovereign wealth fund. In February 2025, President Trump issued an Executive Order directing the Secretaries of Commerce and Treasury to develop plans for the creation of a sovereign wealth fund, including recommendations for funding mechanisms, investment strategies, fund structure, and a governance model. Such funds exist in many countries as a mechanism for amplifying the financial return on a nation's assets and leveraging those returns for strategic benefit and economic growth. We propose that the U.S. Innovation Accelerator falls squarely within the remit of such a sovereign fund, which could serve as a sustainable source of capital for developing innovative companies and products that address critical national challenges and directly benefit Americans in tangible ways.
Structure and Operations
The Innovation Accelerator would be structured like a lean private venture capital firm, with oversight from the U.S. government to inform the strategic deployment of capital toward innovative companies that address unmet national needs. As an independent non-profit organization or public benefit corporation (PBC), it could keep overhead low and be guided by a small, entrepreneurial Board of Directors drawn from innovative industries and the investment profession to ensure that the organization stays on mission. The organization should also collaborate with federal agencies to identify areas of national need and ensure that promising companies emerging from other federal research and development programs have the capital necessary to bring their innovations to market, securing a stable innovation pipeline and addressing a longstanding bottleneck that has driven American companies to seek foreign capital or offshore their operations.
The professional investment team would bring expertise across a broad set of domains and a proven track record of commercial success. The organization would have a high degree of autonomy while maintaining alignment with national technology priorities and competitive strategy. Transparency and accountability would be paramount, including full, ongoing public accounting of all investments and strategy.
The primary objective of the Innovation Accelerator would be to deliver game-changing innovations that generate exceptional returns on investment while supporting the development of strategically important technologies. The U.S. Innovation Accelerator would also insulate domestic innovation from the delays and inefficiencies caused by the private sector funding cycle.
Conclusion
The U.S. Innovation Accelerator would address a critical gap in the current U.S. innovation pipeline, an artifact of the way we fund research: most public dollars are dedicated to early-stage research, while development and commercialization are left to the private sector, which is vulnerable to macroeconomic trends that can stall innovation for years. The U.S. Innovation Accelerator would open up that bottleneck, driving innovation and economic growth while addressing critical national needs. Because it would exist outside of the federal government, it could be created without an act of Congress; the President could direct his administration through executive action to develop plans and create the U.S. Innovation Accelerator, either as part of the sovereign fund he has proposed or independently of that action. However, to fund it initially with U.S. government backing (see the funding mechanisms above), Congress would have to appropriate dollars through an existing federal agency. Part of the charter for establishing the U.S. Innovation Accelerator could be repayment of the initial investments to the U.S. Treasury from fund returns.
In-Q-Tel's mission is to support a specific need of the U.S. government: investing in companies that build information technologies of use to the intelligence community. Without such a model, intelligence agencies would have to rely on in-house expertise to develop such technologies. The U.S. Innovation Accelerator, by contrast, would invest in companies addressing critical technology gaps facing the entire nation. This would both de-risk such investments for private capital and drive forward innovations that might be out of favor with private capital investors lacking long-term strategic vision. It would also create a continuum from the advanced research projects agencies through to the marketplace, a particularly vexing issue for those agencies, which generally invest in the research and development of new innovations but not their advanced development and commercialization.
While there is indeed a significant amount of private capital available, private investors often exhibit risk aversion, particularly when it comes to groundbreaking innovations. Even in times of economic prosperity, private capital tends to gravitate toward trending sectors, driven by groupthink and the desire for near-term exits. This lack of strategic patience leaves neglected certain technology areas that are critical to solving national challenges. For instance, while private funding is readily available for AI/ML healthcare startups, companies developing new antibiotics often struggle to secure investment, a prime example of the misalignment between private capital incentives and national health priorities. The proposed U.S. Innovation Accelerator would play a vital role in bridging this gap, acting as a catalyst for pioneering innovations that tackle critical challenges, are truly novel, and have strong potential for success, areas where private capital might hesitate to invest for lack of strategic vision.
While In-Q-Tel still receives annual funding from the U.S. government, we propose a model in which the accelerator draws dollars from a variety of sources and repays those sources over time as the businesses it funds succeed. The objective would be for the accelerator to repay those funds within its first 10 years and remain financially independent thereafter.
In-Q-Tel sets its investment theses based on the perceived strategic needs of its government agency partners. These are sometimes highly focused needs with small market potential, which can limit the potential for large exits: companies serving such narrow markets are often unable to raise the additional investment needed to build new products or adapt existing ones. The U.S. Innovation Accelerator would instead prioritize investments in innovative companies making products with a clearly defined public market and dual-use benefit.