What’s Progress and What’s Not in the Trump Administration’s AI Action Plan

Artificial intelligence is already shaping how Americans work, learn, and receive vital services—and its influence is only accelerating. To steer this technology toward the public good, the United States needs a coherent, government-wide AI agenda that encourages innovation and trustworthiness, grounded in the best scientific evidence.

In February 2025, the Trump Administration sought public comment on its development of an AI Action Plan. The Federation of American Scientists saw this as an opportunity to contribute expert, nonpartisan guidance, combining insights from our policy team with ideas from the broader science and technology community, developed as part of our Day One Project. In our comments to the White House Office of Science and Technology Policy, we recommended incorporating responsible policies to unleash AI innovation, accelerate AI adoption, ensure secure and trustworthy AI, and strengthen our existing world-class government institutions.

Last week, the Trump Administration released its AI Action Plan. The document contains many promising aspects related to AI research and development, interpretability and control, managing national security risks, and new models for accelerating scientific research. However, there are also concerning provisions, such as those inhibiting state regulations and removing mentions of diversity, equity, and inclusion and climate change from the NIST AI Risk Management Framework. These provisions weaken the United States’ ability to lead on some of the most pressing societal challenges associated with AI technologies.

Despite the AI Action Plan’s ambitious proposals, it will remain aspirational without funding, proper staffing, and clear timelines. The deep cuts to budgets and personnel across the government present an incongruous picture of the Administration’s priorities and policy agenda for emerging technologies, and place pressure on Congress to ensure this plan is properly supported.

Promising Advances & Opportunities

AI Interpretability

As an organization, we’ve developed and shared concrete ideas for advancing AI interpretability—the science of understanding how AI works under the hood. The Administration’s elevation of AI interpretability in the plan is a promising step. Improving interpretability is not only critical for technical progress but also essential to fostering public trust and confidence in AI systems.

We have provided a roadmap for the government to deliver on the promise of interpretable AI in both our AI Action Plan comments and a more detailed memo. In these documents we’ve advocated for advancing AI explainability through open-access resources, standardized benchmarks, common tasks, user-centered research, and a robust repository of techniques to ensure consistent, meaningful, and widely applicable progress across the field. We’ve also argued for the federal government to prioritize interpretable AI in procurement—especially for high-stakes applications—and to establish research and development agreements with AI companies and interpretability research organizations to red team critical systems and conduct targeted interpretability research.

AI Research and Development

Beyond interpretability, the AI Action Plan lays out an ambitious and far-reaching agenda for AI research and development, including robustness and control, advancing the science of AI, and building an AI evaluation ecosystem. We recognize that the Administration has incorporated forward-looking proposals that echo those from our Day One Project—such as building world-class scientific datasets and using AI to accelerate materials discovery. These policy proposals showcase our perspective that the federal government has a critical role to play in supporting groundbreaking scientific and technical research.

A Toolbox for AI Procurement

The Administration’s focus on strengthening the federal workforce’s capacity to use and manage AI is an essential step toward responsible deployment, cross-agency coordination, and reliability in government AI use. The proposed GSA-led AI procurement toolbox closely mirrors our recommendation for a resource to guide agencies through the AI acquisition process. Proper implementation of this policy could further support government efficiency and agility to respond to the needs of constituents.

Managing National Security Risks

The Administration also clearly recognizes the emerging national security risks posed by AI. While the exact nature of many of these risks remains uncertain, the plan contains prudent recommendations on key areas like biosecurity and cybersecurity, and highlights the important role that the Center for AI Standards and Innovation can play in responding to these risks. FAS has previously published policy ideas on how to prepare for emerging AI threats and create a system for reporting AI incidents, as well as outlining how CAISI can play a greater role in advancing AI reliability and security. These proposals can help the government implement the recommendations advanced in the Action Plan.

Focused Research Organizations

The Administration’s support of Focused Research Organizations (FROs) is a promising step. FROs are organizations that address well-defined challenges that require scale and coordination but that are not immediately profitable, and are an exciting model for accelerating scientific progress. FAS first published on FROs in 2020, and has since released a range of proposals from experts that are well-suited to the FRO model. Since 2020, various FROs have gained over $100 million in philanthropic funding, but we believe that this is the first time that the U.S. government has explicitly embraced the FRO model.

Where the AI Action Plan Falls Short

Restricting State-Level Guardrails

The Administration’s AI Action Plan proposes to restrict federal AI funding to states when state AI rules “hinder the effectiveness” of that funding. While avoiding unnecessary red tape is sensible, this unclear standard could offer the administration wide latitude to block state rules at its discretion. FAS has recently opposed preemption of state-level AI regulation by Congress in the absence of federal action. Without national standards for AI, state rules provide an opportunity to develop best practices for responsible AI adoption.

Failing to Address Bias in AI Systems

We are also concerned by the recommended revision to the NIST AI Risk Management Framework (RMF) that would eliminate references to diversity, equity, and inclusion. AI bias is a proven, measurable phenomenon, as documented by a broad scientific consensus from leading researchers and practitioners across sectors. Failing to address such biases leaves the public vulnerable to the harms of discriminatory or unfair systems that can affect people in areas like healthcare, housing, hiring, and access to public services. This includes deeply consequential biases, such as those affecting rural communities. A lack of action to address AI bias will only inhibit beneficial adoption and further erode trust in the accuracy of algorithmic systems.

The AI Action Plan contains a direction for the federal government to only procure AI models from developers who “ensure that their systems are objective and free from top-down ideological bias,” which is implemented via an associated executive order. Building modern AI systems involves a huge range of choices, including which data to use for training, how to “fine tune” the model for particular use cases, and the “system prompt” which guides model behavior. Each of these stages can affect model outputs in ways that are not well understood and can be difficult to control. There is no standard definition for what constitutes a model that is “free from top-down ideological bias,” and this vague standard could easily be misused or improperly implemented at the agency level with unintended consequences for the public. We encourage the administration to instead focus on increasing transparency and explainability of systems as a mechanism to prevent unintended bias in outputs.

Ignoring the Environmental Costs and Opportunities

The Administration’s direction to remove mention of climate change from the RMF overlooks the very real climate and environmental impacts associated with the growing resource demands of large-scale AI systems. Measuring and managing environmental impacts is an important component of AI infrastructure buildout, and removing this policy lever will also restrict AI adoption. This is also a missed opportunity to push forward the ways that AI can help tackle climate change and other environmental issues. In our recent AI and Energy Policy Sprint, we developed policy memos which highlighted the benefits AI could bring to our energy system and environment, while also highlighting ways of responding to AI’s environmental and health impacts.

The Importance of Public Trust

The current lack of public trust in AI risks inhibiting innovation and adoption of AI systems, meaning new methods will not be discovered and new benefits won’t be felt. A failure to uphold high standards in the technology we deploy will also place our nation at a strategic disadvantage compared to our competitors. Recognizing this issue, both the first and second Trump administrations have emphasized public trust as a key theme in their AI policy documents. Many of the research directions outlined in the administration’s AI Action Plan promise to steer AI technology in a more trustworthy direction and deliver widespread benefits to the public. However, several measures simultaneously threaten to undermine important guardrails, while cuts to important government programs also work against the goals the administration has set for itself.

The Federation of American Scientists will continue to collaborate with the scientific community to place rigorous evidence-based policy at the heart of delivering AI that works for all Americans.

Use Artificial Intelligence to Analyze Government Grant Data to Reveal Science Frontiers and Opportunities

President Trump challenged the Director of the Office of Science and Technology Policy (OSTP), Michael Kratsios, to “ensure that scientific progress and technological innovation fuel economic growth and better the lives of all Americans”. Much of this progress and innovation arises from federal research grants. Federal research grant applications include detailed plans for cutting-edge scientific research. They describe the hypothesis, data collection, experiments, and methods that will ultimately produce discoveries, inventions, knowledge, data, patents, and advances. They collectively represent a blueprint for future innovations.

AI now makes it possible to use these resources to create extraordinary tools for refining how we award research dollars. Further, AI can provide unprecedented insight into future discoveries and needs, shaping both public and private investment into new research and speeding the application of federal research results.

We recommend that the Office of Science and Technology Policy (OSTP) oversee a multiagency development effort to fully subject grant applications to AI analysis to predict the future of science, enhance peer review, and encourage better research investment decisions by both the public and the private sector. The federal agencies involved should include all the member agencies of the National Science and Technology Council (NSTC).

Challenge and Opportunity

The federal government funds approximately 100,000 research awards each year across all areas of science. The sheer human effort required to analyze this volume of records remains a barrier, and thus, agencies have not mined applications for deep future insight. If agencies spent just 10 minutes of employee time on each funded award, it would take 16,667 hours in total—or more than eight years of full-time work—to simply review the projects funded in one year. For each funded award, there are usually 4–12 additional applications that were reviewed and rejected. Analyzing all these applications for trends is untenable. Fortunately, emerging AI can analyze these documents at scale. Furthermore, AI systems can work with confidential data and provide summaries that conform to standards that protect confidentiality and trade secrets. In the course of developing these public-facing data summaries, the same AI tools could be used to support a research funder’s review process.
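The arithmetic behind this estimate can be checked with a short sketch. The award count, minutes per award, and the 4–12 rejected applications per funded award come from the text; the 2,080-hour work year (40 hours a week for 52 weeks) is an assumption introduced here, as are all function and variable names.

```python
# Back-of-the-envelope check of the review-time estimate.
AWARDS_PER_YEAR = 100_000      # approximate funded awards per year (from the text)
MINUTES_PER_AWARD = 10         # employee time per award (from the text)
REJECTED_PER_FUNDED = (4, 12)  # rejected applications per funded award (from the text)
HOURS_PER_WORK_YEAR = 2_080    # ASSUMPTION: 40 hours/week x 52 weeks


def review_hours(n_documents: int, minutes_each: float = MINUTES_PER_AWARD) -> float:
    """Total staff hours needed to spend `minutes_each` on every document."""
    return n_documents * minutes_each / 60


funded_hours = review_hours(AWARDS_PER_YEAR)
funded_years = funded_hours / HOURS_PER_WORK_YEAR

# Funded plus rejected applications, at the low and high ends of the range.
low_total, high_total = (AWARDS_PER_YEAR * (1 + r) for r in REJECTED_PER_FUNDED)
all_years = tuple(
    review_hours(n) / HOURS_PER_WORK_YEAR for n in (low_total, high_total)
)

print(f"Funded awards only: {funded_hours:,.0f} hours, {funded_years:.1f} work-years")
print(f"All applications: {all_years[0]:.0f} to {all_years[1]:.0f} work-years")
```

Under these assumptions, including the rejected applications raises the workload from about eight work-years to somewhere between roughly 40 and 100, which is why manual trend analysis across all applications is described as untenable.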

There is a long precedent for this approach. In 2009, the National Institutes of Health (NIH) debuted its Research, Condition, and Disease Categorization (RCDC) system, a program that automatically and reproducibly assigns NIH-funded projects to their appropriate spending categories. The automated RCDC system replaced a manual data call, which resulted in savings of approximately $30 million per year in staff time, and has been evolving ever since. To create the RCDC system, the NIH pioneered digital fingerprints of every scientific grant application using sophisticated text-mining software that assembled a list of terms and their frequencies found in the title, abstract, and specific aims of an application. Applications for which the fingerprints match the list of scientific terms used to describe a category are included in that category; once an application is funded, it is assigned to categorical spending reports.
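As a rough illustration of the fingerprinting idea, the sketch below builds a term-frequency fingerprint from an application's title, abstract, and aims, then checks it against a category's term list. It is a toy under stated assumptions, not the RCDC implementation: the stopword list, the `min_hits` threshold, the category term sets, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: a tiny illustrative stopword list; a real system would use a curated one.
STOPWORDS = {"the", "and", "for", "its", "with", "that", "this", "study"}


def fingerprint(*sections: str) -> Counter:
    """Term-frequency fingerprint built from an application's text sections."""
    words = re.findall(r"[a-z]+", " ".join(sections).lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)


def matches_category(fp: Counter, category_terms: set[str], min_hits: int = 2) -> bool:
    """True when enough category terms occur in the fingerprint.

    `min_hits` is a hypothetical threshold chosen for this example.
    """
    return sum(fp[term] for term in category_terms) >= min_hits


# Hypothetical application text and category definitions.
app_fp = fingerprint(
    "Tau protein aggregation in Alzheimer disease",              # title
    "We study tau aggregation in neurons and its role in "
    "neurodegeneration.",                                        # abstract
)
neuro_terms = {"tau", "alzheimer", "neurodegeneration"}
solar_terms = {"solar", "photovoltaic"}

print(matches_category(app_fp, neuro_terms))  # application falls in this category
print(matches_category(app_fp, solar_terms))  # no matching terms
```

Because the fingerprint is just a bag of counted terms, the same function can be pointed at a paper's title and abstract, or at the pooled text of one scientist's applications.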

NIH staff soon found it easy to construct new digital fingerprints for other things, such as research products or even scientists, by scanning the title and abstract of a public document (such as a research paper) or by all terms found in the existing grant application fingerprints associated with a person.

NIH review staff can now match the digital fingerprints of peer reviewers to the fingerprints of the applications to be reviewed and ensure there is sufficient reviewer expertise. For NIH applicants, the RePORTER webpage provides the Matchmaker tool to create digital fingerprints of title, abstract, and specific aims sections, and match them to funded grant applications and the study sections in which they were reviewed. We advocate that all agencies work together to take the next logical step and use all the data at their disposal for deeper and broader analyses.
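One common way to compare two term-frequency fingerprints is cosine similarity. The sketch below ranks reviewers against an application on that basis. It is an assumption about how such matching could work, not a description of the NIH method, and the reviewer names and term counts are invented for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency fingerprints (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_reviewers(application: Counter, reviewers: dict[str, Counter]) -> list[tuple[str, float]]:
    """Reviewers sorted from most to least similar to the application."""
    scores = [(name, cosine(application, fp)) for name, fp in reviewers.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints (term -> count).
application = Counter({"tau": 3, "protein": 2, "neuron": 1})
reviewers = {
    "reviewer_a": Counter({"tau": 2, "neuron": 2}),
    "reviewer_b": Counter({"solar": 3, "perovskite": 2}),
}

ranked = rank_reviewers(application, reviewers)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The same ranking step could serve a Matchmaker-style lookup by swapping reviewer fingerprints for those of funded projects.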

We offer five recommendations for specific use cases below:

Use Case 1: Funder support. Federal staff could use AI analytics to identify areas of opportunity and support administrative pushes for funding.

When making a funding decision, agencies need to consider not only the absolute merit of an application but also how it complements the existing funded awards and agency goals. There are some common challenges in managing portfolios. One is that an underlying scientific question can be common to multiple problems that are addressed in different portfolios. For example, one protein may have a role in multiple organ systems. Staff are rarely aware of all the studies and methods related to that protein if their research portfolio is restricted to a single organ system or disease. Another challenge is to ensure proper distribution of investments across a research pipeline, so that science progresses efficiently. Tools that can rapidly and consistently contextualize applications across a variety of measures, including topic, methodology, agency priorities, etc., can identify underserved areas and support agencies in making final funding decisions. They can also help funders deliberately replicate some studies while reducing the risk of unintentional duplication.

Use Case 2: Reviewer support. Application reviewers could use AI analytics to understand how an application is similar to or different from currently funded federal research projects, providing reviewers with contextualization for the applications they are rating.

Reviewers are selected in part for their knowledge of the field, but when they compare applications with existing projects, they do so based on their subjective memory. AI tools can provide more objective, accurate, and consistent contextualization to ensure that the most promising ideas receive funding.

Use Case 3: Grant applicant support. Research funding applicants could be offered contextualization of their ideas among funded projects and failed applications in ways that protect the confidentiality of federal data.

NIH has already made admirable progress in this direction with their Matchmaker tool—one can enter many lines of text describing a proposal (such as an abstract), and the tool will provide lists of similar funded projects, with links to their abstracts. New AI tools can build on this model in two important ways. First, they can help provide summary text and visualization to guide the user to the most useful information. Second, they can broaden the contextual data being viewed. Currently, the results are only based on funded applications, making it impossible to tell if an idea is excluded from a funded portfolio because it is novel or because the agency consistently rejects it. Private sector attempts to analyze award information (e.g., Dimensions) are similarly limited by their inability to access full applications, including those that are not funded. AI tools could provide high-level summaries of failed or ‘in process’ grant applications that protect confidentiality but provide context about the likelihood of funding for an applicant’s project.

Use Case 4: Trend mapping. AI analyses could help everyone, including scientists, biotech, pharma, and investors, understand emerging funding trends in their innovation space in ways that protect the confidentiality of federal data.

The federal science agencies have made remarkable progress in making their funding decisions transparent, even to the point of offering lay summaries of funded awards. However, the sheer volume of individual awards makes summarizing these funding decisions a daunting task that will always be out of date by the time it is completed. Thoughtful application of AI could make practical, easy-to-digest summaries of U.S. federal grants in close to real time, and could help to identify areas of overlap, redundancy, and opportunity. By including projects that were unfunded, the public would get a sense of the direction in which federal funders are moving and where the government might be underinvested. This could herald a new era of transparency and effectiveness in science investment.

Use Case 5: Results prediction tools. Analytical AI tools could help everyone—scientists, biotech, pharma, investors—predict the topics and timing of future research results and neglected areas of science in ways that protect the confidentiality of federal data.

It is standard practice in pharmaceutical development to predict the timing of clinical trial results based on public information. This approach can work in other research areas, but it is labor-intensive. AI analytics could be applied at scale to specific scientific areas, such as predictions about the timing of results for materials being tested for solar cells or of new technologies in disease diagnosis. AI approaches are especially well suited to technologies that cross disciplines, such as applications of one health technology to multiple organ systems, or one material applied to multiple engineering applications. These models would be even richer if the negative cases—the unfunded research applications—were included in analyses in ways that protect the confidentiality of the failed application. Failed applications may signal where the science is struggling and where definitive results are less likely to appear, or where there are underinvested opportunities.

Plan of Action

Leadership

We recommend that OSTP oversee a multiagency development effort to achieve the overarching goal of fully subjecting grant applications to AI analysis to predict the future of science, enhance peer review, and encourage better research investment decisions by both the public and the private sector. The federal agencies involved should include all the member agencies of the NSTC. A broad array of stakeholders should be engaged because much of the AI expertise exists in the private sector, the data are owned and protected by the government, and the beneficiaries of the tools would be both public and private. We anticipate four stages to this effort.

Recommendation 1. Agency Development

Pilot: Each agency should develop pilots of one or more use cases to test and optimize training sets and output tools for each user group. We recommend this initial approach because each funding agency has different baseline capabilities to make application data available to AI tools and may also have different scientific considerations. Despite these differences, all federal science funding agencies have large archives of applications in digital formats, along with records of the publications and research data attributed to those awards.

These use cases are relatively new applications for AI and should be empirically tested before broad implementation. Trend mapping and predictive models can be built with a subset of historical data and validated with the remaining data. Decision support tools for funders, applicants, and reviewers need to be tested not only for their accuracy but also for their impact on users. Therefore, these decision support tools should be considered as a part of larger empirical efforts to improve the peer review process.
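The build-then-validate approach described above amounts to a temporal holdout: fit on earlier years, then test on later years the model has not seen. The sketch below shows the shape of that test with a deliberately naive baseline. The records, topics, cutoff year, and prediction rule are all hypothetical.

```python
from collections import Counter

# Hypothetical (fiscal_year, topic) records; real inputs would come from agency archives.
RECORDS = [
    (2020, "genomics"), (2020, "genomics"), (2020, "batteries"),
    (2021, "genomics"), (2021, "batteries"), (2021, "batteries"),
    (2022, "batteries"), (2022, "batteries"), (2022, "genomics"),
    (2023, "batteries"), (2023, "batteries"), (2023, "quantum"),
]


def split_by_year(records, cutoff_year):
    """Train on years up to the cutoff; hold out everything after it."""
    train = [r for r in records if r[0] <= cutoff_year]
    held_out = [r for r in records if r[0] > cutoff_year]
    return train, held_out


def top_topic(records):
    """Most frequent topic in a set of records."""
    return Counter(topic for _, topic in records).most_common(1)[0][0]


def predict_next_top_topic(train):
    """Naive baseline: the leading topic of the latest training year stays on top."""
    last_year = max(year for year, _ in train)
    return top_topic([r for r in train if r[0] == last_year])


train, held_out = split_by_year(RECORDS, cutoff_year=2022)
prediction = predict_next_top_topic(train)
actual = top_topic(held_out)
print(f"predicted={prediction} actual={actual} correct={prediction == actual}")
```

Splitting by year rather than at random matters here: a random split would let the model see the future it is supposed to predict.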

Solidify source data: Agencies may need to enhance their data systems to support the new functions for full implementation. OSTP would need to coordinate the development of data standards to ensure all agencies can combine data sets for related fields of research. Agencies may need to make changes to the structure and processing of applications, such as ensuring that sections to be used by the AI are machine-readable.

Recommendation 2. Prizes and Public–Private Partnerships

OSTP should coordinate the convening of private sector organizations to develop a clear vision for the profound implications of opening funded and failed research award applications to AI, including predicting the topics and timing of future research outputs. How will this technology support innovation and more effective investments?

Research agencies should collaborate with private sector partners to sponsor prizes for developing the most useful and accurate tools and user interfaces for each use case refined through agency development work. Prize submissions could use test data drawn from existing full-text applications and the research outputs arising from those applications. Top candidates would be subject to standard selection criteria.

Conclusion

Research applications are an untapped and tremendously valuable resource. They describe work plans and are clearly linked to specific research products, many of which, like research articles, are already rigorously indexed and machine-readable. These applications are data that can be used for optimizing research funding decisions and for developing insight into future innovations. With these data and emerging AI technologies, we will be able to understand the trajectory of our science with unprecedented breadth and insight, perhaps even to the same level of accuracy with which human experts can foresee changes within a narrow area of study. However, maximizing the benefit of this information is not inevitable because the source data are currently closed to AI innovation. It will take vision and resources to build effectively from these closed systems. Our federal science agencies have both, and with some leadership, they can realize the full potential of these applications.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS.

Measuring and Standardizing AI’s Energy and Environmental Footprint to Accurately Assess Impacts

The rapid expansion of artificial intelligence (AI) is driving a surge in data center energy consumption, water use, carbon emissions, and electronic waste—yet these environmental impacts, and how they will change in the future, remain largely opaque. Without standardized metrics and reporting, policymakers and grid operators cannot accurately track or manage AI’s growing resource footprint. Currently, companies often use outdated or narrow measures (like Power Usage Effectiveness, PUE) and purchase renewable credits to obscure true emissions. Their true carbon footprint may be as much as 662% higher than the figures they report. A single hyperscale AI data center can guzzle hundreds of thousands of gallons of water per day and contribute to a “mountain” of e-waste, yet only about a quarter of data center operators even track what happens to retired hardware.

This policy memo proposes a set of congressional and federal executive actions to establish comprehensive, standardized metrics for AI energy and environmental impacts across model training, inference, and data center infrastructure. We recommend that Congress direct the Department of Energy (DOE) and the National Institute of Standards and Technology (NIST) to design, collect, monitor and disseminate uniform and timely data on AI’s energy footprint, while designating the White House Office of Science and Technology Policy (OSTP) to lead a multi-agency council that coordinates implementation. Our plan of action outlines steps for developing metrics (led by DOE, NIST, and the Environmental Protection Agency [EPA]), implementing data reporting (with the Energy Information Administration [EIA], National Telecommunications and Information Administration [NTIA], and industry), and integrating these metrics into energy and grid planning (performed by DOE’s grid offices and the Federal Energy Regulatory Commission [FERC]). By standardizing how we measure AI’s footprint, the U.S. can be better prepared for the growth in power consumption while maintaining its leadership in artificial intelligence.

Challenge and Opportunity

Inconsistent metrics and opaque reporting make future AI power‑demand estimates extremely uncertain, leaving grid planners in the dark and climate targets on the line.

AI’s Opaque Footprint

Generative AI and large-scale cloud computing are driving an unprecedented increase in energy demand. AI systems require tremendous amounts of computing power both during training (the AI development period) and inference (when AI is used in real-world applications). The rapid rise of this new technology is already straining energy and environmental systems. Data centers consumed an estimated 415 terawatt-hours (TWh) of electricity in 2024 (roughly 1.5% of global power demand), and with AI adoption accelerating, the International Energy Agency (IEA) forecasts that data center energy use could more than double to 945 TWh by 2030. This is an added load comparable to powering an entire country the size of Sweden or even Germany. There is a range of projections of AI’s energy consumption, with some estimates suggesting even more rapid growth than the IEA forecast. Estimates suggest that much of this growth will be concentrated in the United States.
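To put the IEA figures in growth-rate terms, the sketch below derives the added load, the growth multiple, and the compound annual growth rate implied by moving from 415 TWh in 2024 to 945 TWh in 2030. The two data points come from the text; the implied global total simply inverts the "roughly 1.5%" share and is only as precise as that rounding. All names are illustrative.

```python
BASE_YEAR, BASE_TWH = 2024, 415.0      # estimated data center use (from the text)
TARGET_YEAR, TARGET_TWH = 2030, 945.0  # IEA forecast (from the text)
SHARE_OF_GLOBAL = 0.015                # "roughly 1.5%" of global demand (from the text)


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


added_twh = TARGET_TWH - BASE_TWH
growth_multiple = TARGET_TWH / BASE_TWH
cagr = implied_cagr(BASE_TWH, TARGET_TWH, TARGET_YEAR - BASE_YEAR)
implied_global_twh = BASE_TWH / SHARE_OF_GLOBAL  # rough, given the rounded share

print(f"Added load by {TARGET_YEAR}: {added_twh:.0f} TWh ({growth_multiple:.2f}x)")
print(f"Implied annual growth: {cagr:.1%}")
print(f"Implied global demand in {BASE_YEAR}: about {implied_global_twh:,.0f} TWh")
```

Sustained growth in the mid-teens percent per year is far from the gradual increases that grid planning has traditionally assumed.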

The large divergence in estimates of AI-driven electricity demand stems from the different assumptions and methods used in each study. One study starts from a single parameter such as AI query volume (the number of requests made by users for AI answers), while another estimates energy demand from the projected supply of AI-related hardware. Some estimate the compound annual growth rate (CAGR) of data center capacity under different growth scenarios. Different authors make various assumptions about chip shipment growth, workload mix (training vs. inference), efficiency gains, and per‑query energy. Amid this measurement confusion, energy suppliers are caught off guard by surges in demand from new compute infrastructure on top of existing demands from sources like electric vehicles and manufacturing. Electricity grid operators in the United States typically plan for gradual increases in power demand that can be met with incremental generation and transmission upgrades. But if the rapid build-out of AI data centers, on top of other growing power demands, pushes global demand up by hundreds of additional terawatt-hours annually, this will shatter the steady-growth assumption embedded in today’s models. Planners need far more granular, forward-looking forecasting methods to avoid driving up costs for ratepayers, last-minute scrambles to find power, and potential electricity reliability crises.

This surge in power demand also threatens to undermine climate progress. Many new AI data centers require 100–1000 megawatts (MW), equivalent to the demands of a medium-sized city, while grid operators are faced with connection lead times of over 2 years to connect to clean energy supplies. In response to these power bottlenecks some regional utilities, unable to supply enough clean electricity, have even resorted to restarting retired coal plants to meet data center loads, undermining local climate goals and efficient operation. Google’s carbon emissions rose 48% over the past five years and Microsoft’s by 23.4% since 2020, largely due to cloud computing and AI.

In spite of the risks to the climate, carbon emissions data is often obscured: firms often claim “carbon neutrality” via purchased clean power credits, while their actual local emissions go unreported. One analysis found Big Tech (Amazon, Meta) data centers may emit up to 662% more CO₂ than they publicly report. For example, according to one analysis, Meta’s 2022 data center operations reported only 273 metric tons CO₂ (using market-based accounting with credits), but over 3.8 million metric tons CO₂ when calculated by actual grid mix, a gap of roughly four orders of magnitude. Similarly, AI’s water impacts are largely hidden. Each interactive AI query (e.g. a short session with a language model) can indirectly consume half a liter of fresh water through data center cooling, contributing to millions of gallons used by AI servers, but companies rarely disclose water usage per AI workload. This lack of transparency masks the true environmental cost of AI, hinders accountability, and impedes smart policymaking.
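The gap between reported and actual emissions comes from the accounting method. The sketch below is a simplified illustration: location-based accounting multiplies consumption by the local grid's emission factor, while the market-based figure here nets out electricity covered by purchased credits. Real market-based accounting relies on contract-specific emission factors, so this netting rule is an assumption, and the facility's consumption, grid factor, and credit volume are hypothetical. The conversion helper shows why "662% more" means about 7.6 times the reported figure.

```python
def percent_higher_to_multiple(percent_higher: float) -> float:
    """'X% higher than reported' expressed as a multiple of the reported value."""
    return 1 + percent_higher / 100


def location_based_tons(consumed_mwh: float, grid_tons_per_mwh: float) -> float:
    """Emissions from the electricity actually drawn from the local grid."""
    return consumed_mwh * grid_tons_per_mwh


def market_based_tons(consumed_mwh: float, grid_tons_per_mwh: float,
                      credited_mwh: float) -> float:
    """SIMPLIFIED: consumption matched by purchased credits is counted as zero-emission."""
    uncovered_mwh = max(consumed_mwh - credited_mwh, 0.0)
    return uncovered_mwh * grid_tons_per_mwh


multiple = percent_higher_to_multiple(662)  # figure cited in the text

# Hypothetical facility: 1,000,000 MWh a year on a 0.4 tCO2/MWh grid,
# with credits purchased for 999,000 MWh.
location = location_based_tons(1_000_000, 0.4)
market = market_based_tons(1_000_000, 0.4, credited_mwh=999_000)

print(f"662% higher means {multiple:.2f}x the reported figure")
print(f"Location-based: {location:,.0f} t, market-based: {market:,.0f} t")
```

In this hypothetical case the physical emissions are unchanged by the credits, but the reported figure shrinks by a factor of a thousand.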

Outdated and Fragmented Metrics

Legacy measures like Power Usage Effectiveness (PUE) miss what is important for AI compute efficiency, such as water consumption, hardware manufacturing, and e-waste.

The metrics currently used to gauge data center efficiency are insufficient for AI-era workloads. Power Usage Effectiveness (PUE), the two-decades-old standard, gives only a coarse snapshot of facility efficiency under ideal conditions. PUE compares the total power delivered to a data center with the power that actually reaches the IT equipment inside. The more power that goes to overhead (e.g. for cooling), the worse the PUE ratio will be. However, PUE does not measure how efficiently the IT equipment actually uses the power delivered to it. Think about a car that reports how much fuel reaches the engine but not the miles per gallon of that engine. You can ensure that the fuel doesn’t leak out of the line on its way to the engine, but that engine might not be running efficiently. A good PUE is the equivalent of saying that fuel isn’t leaking out on its way to the engine; it might tell you that a data center isn’t losing too much energy to cooling, but won’t flag inefficient IT equipment. An AI training cluster with a “good” PUE (around 1.1) could still be wasteful if the hardware or software is poorly optimized.
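PUE is conventionally computed as total facility energy divided by the energy delivered to IT equipment, so 1.0 is the theoretical floor. The sketch below applies that ratio to two hypothetical clusters with identical PUE but different useful output. The "work units" (for example, training steps completed) and the work-per-kWh figure are illustrative assumptions, not an established metric.

```python
from dataclasses import dataclass


def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


@dataclass
class Cluster:
    name: str
    total_facility_kwh: float
    it_equipment_kwh: float
    work_units: float  # HYPOTHETICAL measure of useful compute delivered

    @property
    def pue(self) -> float:
        return pue(self.total_facility_kwh, self.it_equipment_kwh)

    @property
    def work_per_facility_kwh(self) -> float:
        return self.work_units / self.total_facility_kwh


# Same energy profile, but cluster B's hardware or software is poorly optimized.
cluster_a = Cluster("A", total_facility_kwh=1_100, it_equipment_kwh=1_000, work_units=500)
cluster_b = Cluster("B", total_facility_kwh=1_100, it_equipment_kwh=1_000, work_units=250)

for c in (cluster_a, cluster_b):
    print(f"{c.name}: PUE={c.pue:.2f}, work per kWh={c.work_per_facility_kwh:.3f}")
```

Both clusters report a PUE of 1.1, yet one delivers half the useful work per kilowatt-hour, which is the blind spot the car analogy describes.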

In the absence of updated standards, companies “report whatever they choose, however they choose” regarding AI’s environmental impact. Few report water usage or lifecycle emissions. Only 28% of operators track hardware beyond its use, and just 25% measure e-waste, resulting in tons of servers and AI chips quietly ending up in landfills. This data gap leads to misaligned incentives—for instance, firms might build ever-larger models and data centers, chasing AI capabilities, without optimizing for energy or material efficiency because there is no requirement or benchmark to do so.

Opportunities for Action

Standardizing metrics for AI’s energy and environmental footprint presents a win-win opportunity. By measuring and disclosing AI’s true impacts, we can manage them. With better data, policymakers can incentivize efficiency innovations (from chip design to cooling to software optimization) and target grid investments where AI load is rising. Industry will benefit too: transparency can highlight inefficiencies (e.g. low server utilization or waste heat from water-cooled systems that could be reused) and spur cost-saving improvements. Importantly, several efforts are already pointing the way. In early 2024, lawmakers in both chambers of Congress introduced the Artificial Intelligence Environmental Impacts Act, aiming to have the EPA study AI’s environmental footprint and develop measurement standards and a voluntary reporting system via NIST. Internationally, the European Union’s upcoming AI Act will require large AI systems to report energy use, resource consumption, and other life cycle impacts, and the ISO is preparing “sustainable AI” standards for energy, water, and materials accounting. The U.S. can build on this momentum. A recent U.S. Executive Order (Jan 2025) already directed DOE to draft reporting requirements for AI data centers covering their entire lifecycle, from material extraction and component manufacturing to operation and retirement, including metrics for embodied carbon (greenhouse-gas emissions that are “baked into” the physical hardware and facilities before a single watt is consumed to run a model), water usage, and waste heat. It also launched a DOE–EPA “Grand Challenge” to push the PUE ratio below 1.1 and minimize water usage in AI facilities. These signals show that there is willingness to address the problem. Now is the time to implement a comprehensive framework that standardizes how we measure AI’s environmental impact. If we seize this opportunity, we can ensure innovation in AI is driven by clean energy, a smarter grid, and less environmental and economic burden on communities.

Plan of Action

To address this challenge, Congress should authorize DOE and NIST to lead an interagency working group and a consortium of public, private and academic communities to enact a phased plan to develop, implement, and operationalize standardized metrics, in close partnership with industry.

Recommendation 1. Identify and Assign Agency Mandates

Creating and implementing this measurement framework requires concerted action by multiple federal agencies, each leveraging its mandate. The Department of Energy (DOE) should serve as the co-lead federal agency driving this initiative. Within DOE, the Office of Critical and Emerging Technologies (CET) can coordinate AI-related efforts across DOE programs, given its focus on AI and advanced tech integration. The National Institute of Standards and Technology (NIST) will also act as a co-lead for this initiative, leading the metrics development and standardization effort and convening experts and industry. The White House Office of Science and Technology Policy (OSTP) will act as the coordinating body for this multi-agency effort. OSTP, alongside the Council on Environmental Quality (CEQ), can ensure alignment with broader energy, environment, and technology policy. The Environmental Protection Agency (EPA) should take charge of environmental data collection and oversight. The Federal Energy Regulatory Commission (FERC) should play a supporting role by addressing grid and electricity market barriers. FERC should streamline interconnection processes for new data center loads, perhaps creating fast-track procedures for projects that commit to high efficiency and demand flexibility.

Congressional leadership and oversight will be key. The Senate Committee on Energy and Natural Resources and House Energy & Commerce Committee (which oversee energy infrastructure and data center energy issues) should champion legislation and hold hearings on AI’s energy demands. The House Science, Space, and Technology Committee and Senate Commerce, Science, & Transportation Committee (which oversee NIST and OSTP) should support R&D funding and standards efforts. Environmental committees (like Senate Environment and Public Works, House Natural Resources) should address water use and emissions. Ongoing committee oversight can ensure agencies stay on schedule and that recommendations turn into action (for example, by requiring an EPA/DOE/NIST joint report to Congress within a set timeframe).

Congress should mandate a formal interagency task force or working group, co-led by the Department of Energy (DOE) and the National Institute of Standards and Technology (NIST), with the White House Office of Science and Technology Policy (OSTP) serving as the coordinating body and involving all relevant federal agencies. This body will meet regularly to track progress, resolve overlaps or gaps, and issue public updates. By clearly delineating responsibilities, the federal government can address the measurement problem holistically.

Recommendation 2. Develop a Comprehensive AI Energy Lifecycle Measurement Framework

A complete view of AI’s environmental footprint requires metrics that span the full lifecycle, including every layer from chip to datacenter, workload drivers, and knock‑on effects like water use and electricity prices.

Create new standardized metrics that capture AI’s energy and environmental footprint across its entire lifecycle: training, inference, data center operations (cooling/power), and hardware manufacturing/disposal. This framework should be developed through a multi-stakeholder process led by NIST in partnership with DOE and EPA, and in consultation with industry, academia, and state and local governments.

Key categories should include:

Data Center Efficiency Metrics: How effectively do data centers use power?
AI Hardware & Compute Metrics: e.g. Performance per Watt (PPW)—the throughput of AI computations per watt of power.
Cooling and Water Metrics: How much energy and water are being used to cool these systems?
Environmental Impact Metrics: What is the carbon intensity per AI task?
Composite or Lifecycle Metrics: Beyond a single point in time, what are the lifetime characteristics of impact for these systems?

Designing standardized metrics

NIST, with its measurement science expertise, should coordinate the development of these metrics in an open process, building on efforts like NIST’s AI Standards Coordination Working Group, a standing body chartered under the Interagency Committee on Standards Policy that brings together technical stakeholders to map the current AI-standards landscape, spot gaps, and coordinate U.S. positions and research priorities. The goal is to publish a standardized metrics framework and guidelines that industry can begin adopting voluntarily within 12 months. Where possible, leverage existing standards (for example, those from the Green Grid consortium on PUE and Water Usage Effectiveness (WUE), or IEEE/ISO standards for energy management) and tailor them to AI’s unique demands. Crucially, these metrics must be uniformly defined to enable apples-to-apples comparisons and periodically updated as technology evolves.

Review, governance, and improving metrics

We recommend establishing a Metrics Review Committee (led by NIST with DOE/EPA and external experts) to refine the metrics whenever needed, host stakeholder workshops, and publish updates. This continuous improvement process will keep the framework current with new AI model types, cooling tech, and hardware advances, ensuring relevance into the future. For example, as we move from the current model of chatbots responding to queries to agentic AI systems that plan, act, remember, and iterate autonomously, traditional “energy per query” metrics will no longer capture the full picture.
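
A small sketch shows why a per-query average can mislead once a single user task fans out into many model calls. The record type, task names, and energy figures below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelCall:
    task_id: str      # user-facing task that triggered this call
    energy_wh: float  # metered or estimated energy for the call (hypothetical values)


def energy_per_call(calls: List[ModelCall]) -> float:
    """Traditional 'energy per query' view: average over individual model calls."""
    return sum(c.energy_wh for c in calls) / len(calls)


def energy_per_task(calls: List[ModelCall]) -> Dict[str, float]:
    """Task-level view: total energy attributed to each user-facing task."""
    totals: Dict[str, float] = {}
    for c in calls:
        totals[c.task_id] = totals.get(c.task_id, 0.0) + c.energy_wh
    return totals


# One chatbot exchange (a single call) and one agentic task that plans,
# acts, and iterates through twelve calls of the same size.
calls = [ModelCall("chat-1", 0.3)] + [ModelCall("agent-1", 0.3) for _ in range(12)]

print(f"average per call: {energy_per_call(calls):.2f} Wh")
for task_id, wh in energy_per_task(calls).items():
    print(f"{task_id}: {wh:.2f} Wh")
```

The per-call average is the same for both workloads, while the agentic task consumes twelve times as much energy as the single chat exchange. A review committee could use comparisons like this to decide when the unit of measurement itself needs updating.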

Recommendation 3. Operationalize Data Collection, Reporting, Analysis and Integrate it into Policy

Start with a six‑month voluntary reporting program, and gradually move towards a mandatory reporting mechanism which feeds straight into EIA outlooks and FERC grid planning.

The task force should solicit inputs via a Request for Information (RFI), similar to DOE’s recent RFI on AI infrastructure development, asking data center operators, AI chip manufacturers, cloud providers, utilities, and environmental groups to weigh in on feasible reporting requirements and data sharing methods. Within 12 months of starting, the task force should complete (a) a draft AI energy lifecycle measurement framework (with standardized definitions for energy, water, carbon, and e-waste metrics across training and data center operations), and (b) an initial reporting template for technology companies, data centers, and utilities to pilot.

With standardized metrics in hand, we must shift the focus to implementation and data collection at scale. In the beginning, a voluntary AI energy reporting program can be launched by DOE and EPA (with NIST overseeing the standards). This program would provide guidance to AI developers (e.g. major model-training companies), cloud service providers, and data center operators to report their metrics on an annual or quarterly basis.
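
As one way to picture an initial reporting template, the sketch below defines a single facility-level record with basic consistency checks. Every field name, the period format, and the sample values are assumptions made for illustration; the actual template would come out of the RFI and pilot process.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FacilityReport:
    """Hypothetical reporting record; field names are illustrative only."""
    facility_id: str
    period: str                       # e.g. "2025-Q1" (assumed format)
    total_electricity_kwh: float      # all energy delivered to the facility
    it_electricity_kwh: float         # energy reaching IT equipment
    cooling_water_liters: float
    location_based_co2_kg: float
    hardware_decommissioned_kg: float
    hardware_recycled_kg: float

    def validate(self) -> List[str]:
        """Return a list of consistency problems (empty if none found)."""
        problems = []
        if self.it_electricity_kwh > self.total_electricity_kwh:
            problems.append("IT electricity exceeds total facility electricity")
        if self.hardware_recycled_kg > self.hardware_decommissioned_kg:
            problems.append("recycled hardware exceeds decommissioned hardware")
        for name, value in asdict(self).items():
            if isinstance(value, (int, float)) and value < 0:
                problems.append(f"{name} is negative")
        return problems


report = FacilityReport(
    facility_id="dc-example-01",
    period="2025-Q1",
    total_electricity_kwh=1_100_000.0,
    it_electricity_kwh=1_000_000.0,
    cooling_water_liters=1_800_000.0,
    location_based_co2_kg=440_000.0,
    hardware_decommissioned_kg=2_000.0,
    hardware_recycled_kg=1_500.0,
)

print(report.validate())
print(json.dumps(asdict(report), indent=2))
```

A flat, machine-readable record like this would let voluntary submissions be checked automatically and later carried into a mandatory regime without changing format.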

After a trial run of the voluntary program, Congress should enact legislation to create a mandatory reporting regime that borrows the best features of existing federal disclosure programs. One useful template is EPA’s Greenhouse Gas Reporting Program, which obliges any facility that emits more than 25,000 metric tons of CO₂ equivalent per year to file standardized, verifiable electronic reports. The same threshold logic could be adapted for data centers (e.g., those with more than 10 MW of IT load) and for AI developers that train models above a specified compute budget.

A second model is DOE/EIA’s Form EIA-923 “Power Plant Operations Report,” whose structured monthly data flow straight into public statistics and planning models. An analogous “Form EIA-AI-01” could feed the Annual Energy Outlook and FERC reliability assessments without creating a new bureaucracy. EIA could also consider adding specific questions or categories in the Commercial Buildings Energy Consumption Survey and Form EIA-861 to identify energy use by data centers and large computing loads. This may involve coordinating with the Census Bureau to leverage industrial classification data (e.g., NAICS codes for data hosting facilities) so that baseline energy/water consumption of the “AI sector” is measured in national statistics.

NTIA, which often convenes multi-stakeholder processes on technology policy, can host industry roundtables to refine reporting processes and address any concerns (e.g. data confidentiality, trade secrets). NTIA can help ensure that reporting requirements are not overly burdensome to smaller AI startups by working out streamlined methods (for instance, aggregated reporting via cloud providers).

With better data, DOE’s Grid Deployment Office (GDO) and Office of Electricity (OE) should start integrating AI load growth into grid planning models and funding decisions. For example, GDO could prioritize transmission projects that will deliver clean power to regions with clusters of AI data centers, based on EIA data showing rapid load increases. FERC, for its part, can use the reported data to update its reliability and resource adequacy guidelines and possibly issue guidance for regional grid operators (RTOs/ISOs) to explicitly account for projected large computing loads in their plans.
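
The threshold logic for a mandatory regime could be sketched as follows. The 10 MW figure is the example given above; the compute threshold is left as a required parameter because no value is specified here, and the number used in the usage lines is a placeholder only. The function name and signature are hypothetical.

```python
from typing import Optional

# Example IT-load threshold from the text above (illustrative, not enacted policy).
IT_LOAD_THRESHOLD_MW = 10.0


def must_report(
    it_load_mw: Optional[float] = None,
    training_compute_flop: Optional[float] = None,
    *,
    compute_threshold_flop: float,
) -> bool:
    """Return True if an entity exceeds either reporting threshold.

    it_load_mw applies to data center operators; training_compute_flop
    applies to AI developers. Either may be omitted if not applicable.
    """
    over_load = it_load_mw is not None and it_load_mw > IT_LOAD_THRESHOLD_MW
    over_compute = (
        training_compute_flop is not None
        and training_compute_flop > compute_threshold_flop
    )
    return over_load or over_compute


# Placeholder compute threshold chosen only so the example runs.
PLACEHOLDER_COMPUTE_THRESHOLD = 1e25

print(must_report(it_load_mw=25.0, compute_threshold_flop=PLACEHOLDER_COMPUTE_THRESHOLD))
print(must_report(it_load_mw=2.0, compute_threshold_flop=PLACEHOLDER_COMPUTE_THRESHOLD))
print(must_report(training_compute_flop=5e25, compute_threshold_flop=PLACEHOLDER_COMPUTE_THRESHOLD))
```

Keeping the thresholds as explicit parameters mirrors how the Greenhouse Gas Reporting Program works: small operators fall outside the mandate by construction, which limits the burden on startups.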

Table 1. Roles and Responsibilities to Measure AI’s Environmental Impact

Agency/Entity	Role	Key Responsibilities
Department of Energy (DOE)	Co-lead	Office of Critical and Emerging Technologies (CET): coordinate AI-related efforts across DOE programs; Office of Energy Efficiency and Renewable Energy (EERE): promote energy-efficient data center technologies and practices (e.g. through R&D programs and partnerships); Office of Electricity (OE) and Grid Deployment Office (GDO): address grid integration challenges (ensuring AI data centers have access to reliable clean power); collaborate with utilities and FERC to plan for AI-driven electricity demand growth and to encourage demand-response or off-peak operation strategies for energy-hungry AI clusters
National Institute of Standards and Technology (NIST)	Co-lead for metrics and standards	Lead metrics development and standardization efforts; convene experts and industry stakeholders; revive/expand AI Standards Coordination Working Group for sustainability metrics; publish technical standards for measuring AI energy use, water use, and emissions; host stakeholder consortium on AI environmental impacts (with EPA and DOE)
White House Office of Science and Technology Policy (OSTP)	Coordinating body	Coordinate multi-agency efforts; work with Council on Environmental Quality (CEQ) to align with climate and tech policy; integrate AI energy metrics into federal sustainability requirements via Federal Chief Sustainability Officer and OMB guidance; update OMB memos on data center optimization to include AI-specific measures
Environmental Protection Agency (EPA)	Environmental oversight	Lead environmental data collection and oversight; conduct comprehensive study of AI’s environmental impacts (with DOE); examine AI systems’ lifecycle emissions, water use, and e-waste; apply greenhouse gas (GHG) accounting expertise; quantify metrics like carbon intensity using location-based grid emissions factors
Federal Energy Regulatory Commission (FERC)	Grid and market support	Address grid and electricity market barriers; streamline interconnection processes for new data center loads; create fast-track procedures for high-efficiency, demand-flexible projects; ensure regional grid reliability assessments account for projected AI/data center load growth
Congressional Committees	Legislative oversight	Energy Committees (Senate Committee on Energy and Natural Resources, House Energy & Commerce Committee): champion legislation and hold hearings on AI energy demands; Science/Technology Committees (House Science, Space, and Technology Committee, Senate Commerce, Science, & Transportation Committee): support R&D funding and standards efforts; Environmental Committees (Senate Environment and Public Works, House Natural Resources): address water use and emissions; Oversight functions: ensure agencies stay on schedule, require EPA/DOE/NIST joint report to Congress, address further legislative needs

This transparency will let policymakers, researchers, and consumers track improvements (e.g., is the energy per AI training run decreasing over time?) and identify leaders and laggards. It will also inform mid-course adjustments: if certain metrics prove too hard to collect or not meaningful, NIST can update the standards. The Census Bureau can contribute by testing the inclusion of questions on technology infrastructure in its 2027 Economic Census and annual surveys, ensuring that the economic data of the tech sector includes environmental parameters (for example, collecting data center utility expenditures, which correlate with energy use). Overall, this would establish an operational reporting system and start feeding the data into both policy and market decisions.

Through these recommendations, responsible offices have clear roles: DOE spearheads efficiency measures in data center initiatives; OE (Office of Electricity) and GDO (Grid Deployment Office) use the data to guide grid improvements; NIST creates and maintains the measurement standards; EPA oversees environmental data and impact mitigation; EIA institutionalizes energy data collection and dissemination; FERC adapts regulatory frameworks for reliability and resource adequacy; OSTP coordinates the interagency strategy and keeps the effort a priority; NTIA works with industry to smooth data exchange and keep it involved; and the Census Bureau integrates these metrics into broader economic data. See Table 1. Meanwhile, non-governmental actors like utilities, AI companies, and data center operators must be partners, not only data providers. Utilities could use this data to plan investments and can share insights on demand response or energy sourcing; AI developers and data center firms will implement new metering and reporting practices internally, enabling them to compete on efficiency (similar to car companies competing on miles per gallon ratings). Together, these actions create a comprehensive approach: measuring AI’s footprint, managing its growth, and mitigating its environmental impacts through informed policy.

Table 2. Example Metrics to Illustrate the Types of Shared Information

Metric Category	Metric Name	Definition	Purpose/Benefit
Data Center Efficiency Metrics	Power Usage Effectiveness (PUE)	Refined for AI workloads – ratio of total facility energy to IT equipment energy	Measures overall data center energy efficiency for AI-specific operations
	Data Center Infrastructure Efficiency (DCIE)	IT power versus total facility power (inverse of PUE)	Alternative perspective on facility efficiency, focusing on IT equipment proportion
	Energy Reuse Factor (ERF)	Quantifies how much waste heat is reused on-site	Measures ability to capture and utilize waste heat, reducing overall energy needs
	Carbon Usage Effectiveness (CUE)	Links energy use with carbon emissions (kg CO₂ per kWh)	Provides holistic view of facility carbon intensity beyond just power usage
Environmental Metrics	Energy Intensity	Analyzes energy consumed per unit of data volume processed (kWh/GB)	Reveals energy cost per data unit. Useful for tuning AI models.
Environmental Metrics	Annual water consumption	Measures total liters of water used annually at a data center level	Tracks overall water consumption, essential for annual planning and sustainability reporting.
AI Hardware & Compute Metrics	Performance per Watt (PPW)	Throughput of AI computations (FLOPS or inferences) per watt of power	Encourages energy-efficient model training and inference hardware
	Compute Utilization	Average utilization rates of AI accelerators (GPUs/TPUs)	Ensures expensive hardware is well-utilized rather than idling
	Training Energy per Model	Total kWh or emissions per training run (normalized by model size/training-hours)	Quantifies energy cost of model development
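
As a rough illustration of how several Table 2 metrics could be derived from metered quantities, consider the sketch below. The formulas follow the definitions in the table; the function names, the exact inputs each one takes, and the sample values are assumptions, and a standardized framework would need to pin these down precisely.

```python
def dcie(pue_value: float) -> float:
    """Data Center Infrastructure Efficiency: inverse of PUE (IT / total)."""
    return 1.0 / pue_value


def energy_reuse_factor(reused_kwh: float, total_facility_kwh: float) -> float:
    """Share of facility energy recovered as reused waste heat (assumed definition)."""
    return reused_kwh / total_facility_kwh


def carbon_usage_effectiveness(co2_kg: float, it_kwh: float) -> float:
    """kg CO2 emitted per kWh of IT equipment energy."""
    return co2_kg / it_kwh


def performance_per_watt(ops_per_second: float, avg_power_watts: float) -> float:
    """Throughput (FLOPS or inferences per second) per watt of power."""
    return ops_per_second / avg_power_watts


def compute_utilization(busy_hours: float, available_hours: float) -> float:
    """Average accelerator utilization as a fraction of available hours."""
    return busy_hours / available_hours


# Hypothetical annual figures for a single facility.
total_kwh, it_kwh = 1_250_000.0, 1_000_000.0
print(f"DCIE: {dcie(total_kwh / it_kwh):.2f}")
print(f"ERF:  {energy_reuse_factor(125_000.0, total_kwh):.2f}")
print(f"CUE:  {carbon_usage_effectiveness(400_000.0, it_kwh):.2f} kg CO2/kWh")
print(f"PPW:  {performance_per_watt(2.0e15, 1.0e4):.2e} ops/s per W")
print(f"Utilization: {compute_utilization(6_000.0, 8_000.0):.0%}")
```

Writing the metrics as explicit formulas over named inputs is one way to get the apples-to-apples comparability the framework calls for, since two operators can only be compared if they feed the same quantities into the same ratios.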

Conclusion

AI’s extraordinary capabilities should not come at the expense of our energy security or environmental sustainability. This memo outlines how we can effectively operationalize measuring AI’s environmental footprint by establishing standardized metrics and leveraging the strengths of multiple agencies to implement them. By doing so, we can address a critical governance gap: what isn’t measured cannot be effectively managed. Standard metrics and transparent reporting will enable AI’s growth while ensuring that data center expansion is met with commensurate increases in clean energy, grid upgrades, and efficiency gains.

The benefits of these actions are far-reaching. Policymakers will gain tools to balance AI innovation with energy and environment goals, for example by requiring improvements if an AI service is energy-inefficient or by fast-tracking permits for a new data center that meets top sustainability standards. Communities will be better protected: with data in hand, we can avoid scenarios where a cluster of AI facilities suddenly strains a region’s power or water resources without local officials knowing in advance. Instead, requirements for reporting and coordination can channel resources (like new transmission lines or water recycling systems) to those communities ahead of time. The AI industry itself will benefit by building trust and reducing the risk of backlash or heavy-handed regulation; a clear, federal metrics framework provides predictability and a level playing field (everyone measures the same way), and it showcases responsible stewardship of technology. Moreover, emphasizing energy efficiency and resource reuse can reduce operating costs for AI companies in the long run, a crucial advantage as energy prices and supply chain concerns grow.

This memo is part of our AI & Energy Policy Sprint, a policy project to shape U.S. policy at the critical intersection of AI and energy.

Frequently Asked Questions

Why do we need AI-specific environmental metrics? Don’t data centers already have efficiency standards?

While there are existing metrics like PUE for data centers, they don’t capture the full picture of AI’s impacts. Traditional metrics focus mainly on facility efficiency (power and cooling) and not on the computational intensity of AI workloads or the lifecycle impacts. AI operations involve unique factors—for example, training a large AI model can consume significant energy in a short time, and using that AI model continuously can draw power 24/7 across distributed locations. Current standards are outdated and inconsistent: one data center might report a low PUE but could be using water recklessly or running hardware inefficiently. AI-specific metrics are needed to measure things like energy per training run, water per cooling unit, or carbon per compute task, which no standard reporting currently requires. In short, general data center standards weren’t designed for the scale and intensity of modern AI. By developing AI-specific metrics, we ensure that the unique resource demands of AI are monitored and optimized, rather than lost in aggregate averages. This helps pinpoint where AI can be made more efficient (e.g., via better algorithms or chips)—an opportunity not visible under generic metrics.

How will multiple agencies work together to implement these recommendations?

AI’s environmental footprint is a cross-cutting issue, touching on energy infrastructure, environmental impact, technological standards, and economic data. No single agency has the full expertise or jurisdiction to cover all aspects. Each agency will have clearly defined roles (as outlined in the Plan of Action). For instance, NIST develops the methodology, DOE/EPA collect and use the data, EIA disseminates it, and FERC/Congress use it to adjust policies. This collaborative approach prevents blind spots. A single-agency approach would likely miss critical elements (for instance, a purely DOE-led effort might not address e-waste or standardized methods, which NIST and EPA can). The good news is that frameworks for interagency cooperation already exist, and this initiative aligns with broader administration priorities (clean energy, reliable grid, responsible AI). Thus, while it involves multiple agencies, OSTP and the White House will ensure everyone stays synchronized. The result will be a comprehensive policy that each agency helps implement according to its strength, rather than a piecemeal solution. See below:

Roles and Responsibilities to Measure AI’s Environmental Impact

Department of Energy (DOE): DOE should serve as the co-lead federal agency driving this initiative. Within DOE, the Office of Critical and Emerging Technologies (CET) can coordinate AI-related efforts across DOE programs, given its focus on AI and advanced tech integration. DOE’s Office of Energy Efficiency and Renewable Energy (EERE) can lead on promoting energy-efficient data center technologies and practices (e.g. through R&D programs and partnerships), while the Office of Electricity (OE) and Grid Deployment Office address grid integration challenges (ensuring AI data centers have access to reliable clean power). DOE should also collaborate with utilities and FERC to plan for AI-driven electricity demand growth and to encourage demand-response or off-peak operation strategies for energy-hungry AI clusters.

National Institute of Standards and Technology (NIST): NIST will also act as a co-lead for this initiative, leading the metrics development and standardization effort as described and convening experts and industry. NIST should revive or expand its AI Standards Coordination Working Group to focus on sustainability metrics, and ultimately publish technical standards or reference materials for measuring AI energy use, water use, and emissions. NIST is also suited to host a stakeholder consortium on AI environmental impacts, working in tandem with EPA and DOE.

White House, including the Office of Science and Technology Policy (OSTP): OSTP will act as the coordinating body for this multi-agency effort. OSTP, alongside the Council on Environmental Quality (CEQ), can ensure alignment with broader climate and tech policy (such as the U.S. Climate Strategy and AI initiatives). The Administration can also use the Federal Chief Sustainability Officer and OMB guidance to integrate AI energy metrics into federal sustainability requirements (for instance, updating OMB’s memos on data center optimization to include AI-specific measures).

Environmental Protection Agency (EPA): EPA should take charge of environmental data collection and oversight. In the near term, EPA (with DOE) would conduct the comprehensive study of AI’s environmental impacts, examining AI systems’ lifecycle emissions, water use, and e-waste. EPA’s expertise in greenhouse gas (GHG) accounting will ensure metrics like carbon intensity are rigorously quantified (e.g. using location-based grid emissions factors rather than unreliable REC-based accounting).

Federal Energy Regulatory Commission (FERC): FERC plays a supporting role by addressing grid and electricity market barriers. FERC should streamline interconnection processes for new data center loads, perhaps creating fast-track procedures for projects that commit to high efficiency and demand flexibility. FERC can ensure that regional grid reliability assessments start accounting for projected AI/data center load growth using the reported data.

Congressional Committees: Congressional leadership and oversight will be key. The Senate Committee on Energy and Natural Resources and House Energy & Commerce Committee (which oversee energy infrastructure and data center energy issues) should champion legislation and hold hearings on AI’s energy demands. The House Science, Space, and Technology Committee and Senate Commerce, Science, & Transportation Committee (which oversee NIST and OSTP) should support R&D funding and standards efforts. Environmental committees (like Senate Environment and Public Works, House Natural Resources) should address water use and emissions. Ongoing committee oversight can ensure agencies stay on schedule and that recommendations turn into action (for example, requiring the EPA/DOE/NIST joint report to Congress in four years as the Act envisions, and then moving on any further legislative needs).

What exactly will companies and utilities have to report?

The plan requires high-level, standardized data that balances transparency with practicality. Companies running AI operations (like cloud providers or big AI model developers) would report metrics such as: total electricity consumed for AI computations (annually), average efficiency metrics (e.g. PUE, Carbon Usage Effectiveness (CUE), and WUE for their facilities), water usage for cooling, and e-waste generated (amount of hardware decommissioned and how it was handled). These data points are typically already collected internally for cost and sustainability tracking; the difference is that they would be reported in a consistent format and possibly to a central repository. Utilities, if involved, might report aggregated data center load in their service territory or significant new interconnections for AI projects (much of this is already in utility planning documents). See below for examples.

Metrics to Illustrate the Types of Shared Information

Data Center Efficiency Metrics: Power Usage Effectiveness (PUE) (refined for AI workloads), Data Center Infrastructure Efficiency (DCIE) which measures IT versus total facility power (the inverse of PUE), Energy Reuse Factor (ERF) to quantify how much waste heat is reused on-site, and Carbon Usage Effectiveness (CUE) to link energy use with carbon emissions (kg CO₂ per kWh). These give a holistic view of facility efficiency and carbon intensity, beyond just power usage.

AI Hardware & Compute Metrics: Performance per Watt (PPW)—the throughput of AI computations (like FLOPS or inferences) per watt of power, which encourages energy-efficient model training and inference. Compute Utilization—ensuring expensive AI accelerators (GPUs/TPUs) are well-utilized rather than idling (tracking average utilization rates). Training energy per model—total kWh or emissions per training run (possibly normalized by model size or training-hours). Inference efficiency—energy per 1000 queries or per inference for deployed models. Idle power draw—measure and minimize the energy hardware draws when not actively in use.

Cooling and Water Metrics: Cooling Energy Efficiency Ratio (EER)—the output cooling power per watt of energy input, to gauge cooling system efficiency. Water Usage Effectiveness (WUE)—liters of water used per kWh of IT compute, or simply total water used for cooling per year. These help quantify and benchmark the significant water and electricity overhead for thermal management in AI data centers.

Environmental Impact Metrics: Carbon Intensity per AI Task—CO₂ emitted per training or per 1000 inferences, which could be aggregated to an organizational carbon footprint for AI operations. Greenhouse Gas emissions per kWh—linking energy use to actual emissions based on grid mix or backup generation. Also, e-waste metrics—such as total hardware weight decommissioned annually, or a recycling ratio. For instance, tracking the tons of servers/chips retired and the fraction recycled versus landfilled can illuminate the life cycle impact.

Composite or Lifecycle Metrics: Develop ways to combine these factors to rate overall sustainability of AI systems. For example, an “AI Sustainability Score” could incorporate energy efficiency, renewables use, cooling efficiency, and end-of-life recycling. Another idea is an “AI Energy Star” rating for AI hardware or cloud services that meet certain efficiency and transparency criteria, modeled after Energy Star appliance ratings.
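
A few of the per-task and resource metrics listed above can be sketched as simple ratios. The function names, units, and sample values are hypothetical; the sketch assumes energy, water, and hardware weights are already metered and that a location-based grid emissions factor is available.

```python
def water_usage_effectiveness(water_liters: float, it_kwh: float) -> float:
    """WUE: liters of water used per kWh of IT compute."""
    return water_liters / it_kwh


def energy_per_1000_inferences(energy_kwh: float, inference_count: int) -> float:
    """Inference efficiency: kWh consumed per 1000 inferences."""
    return energy_kwh / inference_count * 1000


def carbon_per_1000_inferences(
    energy_kwh: float, inference_count: int, grid_factor_kg_per_kwh: float
) -> float:
    """Carbon intensity per AI task: kg CO2 per 1000 inferences (location-based)."""
    return energy_per_1000_inferences(energy_kwh, inference_count) * grid_factor_kg_per_kwh


def recycling_ratio(recycled_kg: float, decommissioned_kg: float) -> float:
    """Fraction of retired hardware (by weight) that was recycled, not landfilled."""
    return recycled_kg / decommissioned_kg


# Hypothetical figures for one deployed model over one reporting period.
print(f"WUE: {water_usage_effectiveness(1_800_000.0, 1_000_000.0):.2f} L/kWh")
print(f"Energy: {energy_per_1000_inferences(500.0, 2_000_000):.3f} kWh per 1000 inferences")
print(f"Carbon: {carbon_per_1000_inferences(500.0, 2_000_000, 0.4):.3f} kg CO2 per 1000 inferences")
print(f"Recycling ratio: {recycling_ratio(1_500.0, 2_000.0):.0%}")
```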

Won’t this be a burden or risk revealing trade secrets?

No. The intention is not to force disaggregation down to proprietary details (e.g., exactly how a specific algorithm uses energy) but rather to get macro-level indicators. As for trade secrets or sensitive information, the data collected (energy, water, emissions) is about resource use, not competitive algorithms or data. These figures are analogous to what many firms already publish in sustainability reports (power usage, carbon footprint), just reported more uniformly. There will be provisions to protect any sensitive facility-level data (e.g., EIA could aggregate or anonymize certain figures in public releases). The goal is transparency about environmental impact, not exposure of intellectual property.
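
One way such protection could work is regional aggregation with small-cell suppression: totals are published only where enough facilities contribute that no single operator can be singled out. The sketch below is a hypothetical illustration; the minimum count of three is an assumption, not a rule drawn from EIA practice or from this memo.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def aggregate_by_region(
    reports: Iterable[Tuple[str, float]], min_facilities: int = 3
) -> Dict[str, Optional[float]]:
    """Sum facility energy (kWh) by region, suppressing small groups.

    reports: (region, energy_kwh) pairs, one per facility.
    Regions with fewer than min_facilities contributors map to None,
    so their totals are withheld from public release.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for region, energy_kwh in reports:
        totals[region] += energy_kwh
        counts[region] += 1
    return {
        region: (totals[region] if counts[region] >= min_facilities else None)
        for region in totals
    }


# Hypothetical facility-level submissions.
submissions = [
    ("Region A", 100.0),
    ("Region A", 200.0),
    ("Region A", 300.0),
    ("Region B", 500.0),  # only one facility, so its total is suppressed
]
print(aggregate_by_region(submissions))
```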

How will these metrics and data actually be used by the government?

Once collected, the data will become a powerful tool for evidence-based policymaking and oversight. At the strategic level, DOE and the White House can track whether the AI sector is becoming more efficient or not—for instance, seeing trends in energy-per-AI-training decreasing (good) or total water use skyrocketing (a flag for action).

What are some examples?

Energy planning: EIA will incorporate the numbers into its models, which guide national energy policy and investment. If data shows that AI is driving, say, an extra 5% electricity demand growth in certain regions, DOE’s Grid Deployment Office and FERC can respond by facilitating grid expansions or reliability measures in those areas.

Climate policy: EPA can use reported emissions data to update greenhouse gas inventories and identify if AI/data centers are becoming a significant source—if so, that could shape future climate regulations or programs (ensuring this sector contributes to emissions reduction goals).

Water resource management: If we see large water usage by AI in drought-prone areas, federal and state agencies can work on water recycling or alternative cooling initiatives.

Research and incentives: DOE’s R&D programs (through ARPA-E or National Labs) can target the pain points revealed—e.g., if e-waste volumes are high, fund research into longer-lasting hardware or recycling tech; if certain metrics like Energy Reuse Factor are low, push demonstration projects for waste heat reuse.

The published data could also inform everything from ESG investment decisions to local permitting. For example, a company planning a new data center might be asked by local authorities, “What’s your expected PUE and water usage? The national average for AI data centers is X. Will you do better?” In essence, the data ensures the government and public can hold the AI industry accountable for progress (or regress) on sustainability. By integrating these data into models and policies, the government can anticipate and avert problems (like grid strain or high emissions) before they grow, and steer the sector toward solutions.

The tech industry is global, so how will U.S. metrics align internationally?

AI services and data centers are worldwide, so consistency in how we measure impacts is important. The U.S. effort will be informed by and contribute to international standards. Notably, the ISO (International Organization for Standardization) is already developing criteria for sustainable AI, including energy, raw materials, and water metrics across the AI lifecycle. NIST, which often represents the U.S. in global standards bodies, is involved and will ensure that our metrics framework aligns with ISO’s emerging standards. Similarly, the EU’s AI Act has requirements for reporting AI energy and resource use. By moving early on our own metrics, the U.S. can help shape what those international norms look like, rather than react to them. This initiative will encourage U.S. agencies to engage in forums like the Global Partnership on AI (GPAI) or bilateral tech dialogues to promote common sustainability reporting frameworks. In the end, aligning metrics internationally will create a more level playing field, ensuring that AI companies can’t simply shift operations to avoid transparency. If the U.S., EU, and others all require similar disclosures, it reinforces responsible practices everywhere.

What if these measures make AI development more expensive or slow down innovation?

Shining a light on energy and resource use can drive new innovation in efficiency. Initially, there may be modest costs—for example, installing better sub-meters in data centers or dedicating staff time to reporting. However, these costs are relatively small in context. Many leading companies already track these metrics internally for cost management and corporate sustainability goals. We are recommending formalizing and sharing that information. Over time, the data collected can reduce costs: companies will identify wasteful practices (maybe servers idling, or inefficient cooling during certain hours) and correct them, saving on electricity and water bills. There is also an economic opportunity in innovation: as efficiency becomes a competitive metric, we expect increased R&D into low-power AI algorithms, advanced cooling, and longer-life hardware. Those innovations can improve performance per dollar as well. Moreover, policy support can offset any burdens—for instance, the government can provide technical assistance or grants to smaller firms to help them improve energy monitoring. We should also note that unchecked resource usage carries its own risks to innovation: if AI’s growth starts causing blackouts or public backlash due to environmental damage, that would seriously hinder AI progress.

Table 3. Roles of Government and Non-Government Stakeholders

Type	Agency / Office	Metric Development	Data Collection & Reporting	Analysis & Planning Integration	Policy & Oversight
Government	DOE – EERE	Lead role in energy efficiency metrics	Supports voluntary reporting systems	Integrates energy data into planning tools	Leads clean energy transitions
	DOE – OE			Uses data for grid forecasting	Coordinates grid reliability planning
	DOE – GDO			Integrates data into infrastructure planning	Prioritizes transmission buildout
	EPA	Co-leads lifecycle, emissions, water, and e-waste metrics	Leads environmental impact data collection	Tracks emissions, water use, and e-waste	Oversees regulation and Congressional briefings
	NIST	Lead on standardized metrics (PUE, WUE, etc.)	Provides protocols for reporting	Ensures data accuracy	Aligns with international standards
	EIA	Advises on metric use in national stats	Collects energy/water usage data	Publishes AI-specific trends	Maintains transparency and reporting
	FERC		Collects grid data from ISOs/RTOs	Integrates demand into reliability planning	Issues grid and rate guidance
	OSTP	Coordinates interagency framework	Oversees implementation roadmap	Monitors alignment with national AI goals	Ensures cross-agency cohesion
	NTIA	Supports digital infrastructure metric design	Industry interface for data exchange	Highlights interconnection/data demand	Aligns broadband/data policy with AI metrics
	Census Bureau	Develops AI/data infrastructure codes	Adds metrics to Economic Census	Cross-validates with energy data	Incorporates AI sector into federal stats
Non-Government	AI Developers	Work with NIST to refine compute/task efficiency metrics	Report training/inference energy, water, and emissions	Share model-specific data for load estimation	Participate in voluntary federal programs, support transparency
	Data Center Operators	Support infrastructure-level metric development (PUE, WUE)	Report operational metrics (PUE, WUE, emissions)	Share utilization/design data for planning	Engage in certifications, ESG benchmarks
	Utility Companies & Grid Operators		Provide energy delivery data, load forecasts, interconnection data	Inform regional reliability and grid expansion models	Align rates, plans with AI load growth
	Infrastructure Developers		Report energy/cooling projections and needs	Support planning/zoning/water coordination	Comply with environmental regulations
	Industry Consortia & Auditors	Assist in standard-setting and benchmarks	Aggregate anonymized member data	Validate and synthesize trends for government use	Provide third-party verification, build trust

Speed Grid Connection Using ‘Smart AI Fast Lanes’ and Competitive Prizes

Innovation in artificial intelligence (AI) and computing capacity is essential for U.S. competitiveness and national security. However, AI data center electricity use is growing rapidly. Data centers already consume more than 4% of U.S. electricity annually, and their share could rise to between 6% and 12% by 2028. At the same time, electricity rates are rising for consumers across the country, with transmission and distribution infrastructure costs a major driver of these increases. For the first time in fifteen years, the U.S. is experiencing a meaningful increase in electricity demand. Data centers already account for more than 25% of electricity use in Virginia, which leads the world in data center installations. Data center electricity load growth results in real economic and environmental impacts for local communities. It also represents a national policy trial on how the U.S. responds to rising power demand from the electrification of homes, transportation, and manufacturing, all important technology transitions for cutting carbon emissions and air pollution.

Federal and state governments need to ensure that the development of new AI and data center infrastructure does not increase costs for consumers, impact the environment, or exacerbate existing inequalities. “Smart AI Fast Lanes” is a policy and infrastructure investment framework that ensures the U.S. leads the world in AI while building an electricity system that is clean, affordable, reliable, and equitable. Leveraging innovation prizes that pay for performance, coupled with public-private partnerships, data center providers can work with the Department of Energy, the Foundation for Energy Security and Innovation (FESI), the Department of Commerce, National Labs, state energy offices, utilities, and the Department of Defense to drive innovation to increase energy security while lowering costs.

Challenge and Opportunity

Targeted policies can ensure that the development of new AI and data center infrastructure does not increase costs for consumers, impact the environment, or exacerbate existing energy burdens. Allowing new clean power sources co-located or contracted with AI computing facilities to connect to the grid quickly, and then managing any infrastructure costs associated with that new interconnection, would accelerate the addition of new clean generation for AI while lowering electricity costs for homes and businesses.

One of the biggest bottlenecks to adding much-needed capacity to the electricity grid in many regions of the U.S. is the so-called “interconnection queue”. Regions have different requirements that power plants must complete (often, a number of studies on how a project affects grid infrastructure) before they are allowed to connect. Solar, wind, and battery projects represented 95% of the capacity waiting in interconnection queues in 2023. The operator of Texas’ power grid, the Electric Reliability Council of Texas (ERCOT), uses a “connect and manage” interconnection process that results in faster interconnections of new energy supplies than in the rest of the country. Instead of requiring each power plant to complete lengthy studies of needed system-wide infrastructure investments before connecting to the grid, the “connect and manage” approach in Texas gets power plants online quicker than a “studies first” approach. Texas manages any risks that arise using the power markets and system-wide planning efforts. The results are clear: the median time from an interconnection request to commercial operations in Texas was four years, compared to five years in New York and more than six and a half years in California.

“Smart AI Fast Lanes” expands the spirit of the Texas “connect and manage” approach nationwide for data centers and clean energy, and adds to it investment and innovation prizes to speed up the process, ensure grid reliability, and lower costs.

Data center providers would work with the Department of Energy, the Foundation for Energy Security and Innovation (FESI), the Department of Commerce, National Laboratories, state energy offices, utilities, and the Department of Defense to speed up interconnection queues, spur innovation in efficiency, and re-invest in infrastructure, to increase energy security and lower costs.

Why FESI Should Lead ‘Smart AI Fast Lanes’

With FESI managing this effort, the process can move faster than the government acting alone. FESI is an independent, non-profit, agency-related foundation that was created by Congress in the CHIPS and Science Act of 2022 to help the Department of Energy achieve its mission and accelerate “the development and commercialization of critical energy technologies, foster public-private partnerships, and provide additional resources to partners and communities across the country supporting solutions-driven research and innovation that strengthens America’s energy and national security goals”. Congress has created many other agency-related foundations, such as the Foundation for NIH, the National Fish and Wildlife Foundation, and the National Park Foundation, which was created in 1935. These agency-related foundations have a demonstrated record of raising external funding to leverage federal resources and enabling efficient public-private partnerships. As a foundation supporting the mission of the Department of Energy, FESI has a unique opportunity to quickly respond to emergent priorities and create partnerships to help solve energy challenges.

As an independent organization, FESI can leverage the capabilities of the private sector, academia, philanthropies, and other organizations to enable collaboration with federal and state governments. FESI can also serve as an access point for additional external investment, because shared risk structures and clear rules of engagement make emerging energy technologies more attractive to institutional capital. For example, the National Fish and Wildlife Foundation awards grants that are matched with non-federal private, philanthropic, or local funding sources that multiply the impact of any federal investments. In addition, the National Fish and Wildlife Foundation has partnered with the Department of Defense and external funding sources to enhance coastal resilience near military installations. Both AI compute capabilities and energy resilience are of strategic importance to the Department of Defense, Department of Energy, and other agencies, and leveraging public-private partnerships is a key pathway to enhance capabilities and security. FESI leading a Smart AI Fast Lanes initiative could be a force multiplier to enable rapid deployment of clean AI compute capabilities that are good for communities, companies, and national security.

Use Prizes to Lessen Cost and Maximize Return

The Department of Energy has long used prize competitions to spur innovation and accelerate access to funding and resources. Prize competitions with focused objectives but unstructured pathways for success enable the private sector to compete and advance innovation without requiring a lot of federal capacity and involvement. Federal prize programs pay for performance and results, while also providing a mechanism to crowd in additional philanthropic and private sector investment. In the Smart AI Fast Lane framework, FESI could use prizes to support energy innovation from AI data centers while working with the Department of Energy’s Office of Cybersecurity, Energy Security, and Emergency Response (CESER) to enable a repeatable and scalable public-private partnership program. These prizes would be structured so that FESI itself bears a low administrative and operational burden, with other groups such as American-Made, National Laboratories, or organizations like FAS helping to provide technical expertise to review and administer prize applications. This can ensure quality while enabling scalable growth.

Plan of Action

Here’s how “Smart AI Fast Lanes” would work. For any proposed data center investment of more than 250 MW, companies could apply to work with FESI. Successful applications would leverage public, private, and philanthropic funds and technical assistance. Projects would be required to increase clean energy supplies, achieve world-leading data center energy efficiency, invest in transmission and distribution infrastructure, and/or deploy virtual power plants for grid flexibility.

Recommendation 1. Use a “Smart AI Fast Lane” Connection Fee to Quickly Connect to the Grid, Further Incentivized by a “Bring Your Own Power” Prize

New large AI data center loads choosing the “Smart AI Fast Lane” would pay a fee to connect to the grid without first completing lengthy pre-connection cost studies. Those payments would go into a fund, managed and overseen by FESI, that would be used to cover any infrastructure costs incurred by regional grids for the first three years after project completion. The fee could be a flat fee based on data center size, or structured as an auction, enabling the data centers bidding the highest in a region to be at the front of the line. This enables the market to incentivize the highest priority additions. Alternatively, large load projects could choose to do the studies first and remain in the regular – and likely slower – interconnection queue to avoid the fee.
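The two fee structures described above can be sketched in a few lines. This is only an illustration: the per-megawatt rate, the applicant names, the bid amounts, and the function names are hypothetical, and the proposal leaves fee levels and auction rules open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastLaneApplicant:
    name: str
    size_mw: float
    bid_usd: float = 0.0  # only meaningful in the auction variant


def flat_fee(size_mw, rate_usd_per_mw):
    """Flat-fee variant: the connection fee scales with data center size."""
    return size_mw * rate_usd_per_mw


def auction_order(applicants):
    """Auction variant: the highest bid in a region goes to the front of the line."""
    return sorted(applicants, key=lambda a: a.bid_usd, reverse=True)


# Hypothetical applicants in one region.
applicants = [
    FastLaneApplicant("Site A", size_mw=300, bid_usd=4_000_000),
    FastLaneApplicant("Site B", size_mw=500, bid_usd=9_000_000),
    FastLaneApplicant("Site C", size_mw=250, bid_usd=6_000_000),
]
queue = [a.name for a in auction_order(applicants)]
fee_for_site_a = flat_fee(300, rate_usd_per_mw=10_000)  # assumed rate
```

A real design would also need tie-breaking rules and a reserve price, neither of which the proposal addresses.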

In addition, FESI could facilitate a “Bring Your Own Power” prize award that is a combination of public, private, and philanthropic funds that data center developers can match to contract for new, additional zero-emission electricity generated locally that covers twice as much as the data center uses annually. For data centers committing to this “Smart AI Fast Lane” process, both the data center and the clean energy supply would receive accelerated priority in the interconnection queue and technical assistance from National Laboratories. This leverages economies of scale for projects, lowers the cost of locally-generated clean electricity, and gets clean energy connected to the grid quicker. Prize resources would support a “connect and manage” interconnection approach to cover 75% of the costs of any required infrastructure for local clean power projects resulting from the project. FESI prize resources could further supplement these payments to upgrade electrical infrastructure in areas of national need for new electricity supplies to maintain electricity reliability. These include areas assessed by the North American Electric Reliability Corporation to have a high risk of an electricity shortfall in the coming years, such as the Upper Midwest or Gulf Coast, or areas with an elevated risk such as California, the Great Plains, Texas, the Mid-Atlantic, or the Northeast.
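To give a sense of scale for the “twice annual use” and 75% figures above, here is a minimal arithmetic sketch. The 250 MW average load and the $40 million infrastructure cost are made-up inputs, the function names are hypothetical, and the text does not say who covers the remaining 25%, so the code simply reports it as a remainder.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def annual_use_mwh(average_load_mw):
    """Annual electricity use implied by a constant average load."""
    return average_load_mw * HOURS_PER_YEAR


def required_clean_generation_mwh(annual_use, coverage_multiple=2.0):
    """New zero-emission generation needed to cover a multiple of annual use."""
    return annual_use * coverage_multiple


def split_infrastructure_cost(cost_usd, prize_share=0.75):
    """Return (amount covered by prize resources, remainder left uncovered)."""
    prize_usd = cost_usd * prize_share
    return prize_usd, cost_usd - prize_usd


use_mwh = annual_use_mwh(250)  # assumed 250 MW average load
clean_mwh = required_clean_generation_mwh(use_mwh)
prize_usd, remainder_usd = split_infrastructure_cost(40_000_000)  # assumed cost
```

Under these assumed inputs, a 250 MW average load implies about 2.19 million MWh of annual use and therefore about 4.38 million MWh of new clean generation.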

Recommendation 2. Create an Efficiency Prize To Establish World-Leading Energy and Water Efficiency at AI Data Centers

Data centers have different design configurations that affect how much energy and water are needed to operate. Data centers use electricity for computing, but also for the cooling systems needed for computing equipment, and there are innovation opportunities to increase the efficiency of both. One historical measure of AI data center energy efficiency is Power Use Effectiveness (PUE), which is the total facility annual energy use divided by the computing equipment annual energy use, with values closer to 1.0 being more efficient. Similarly, Water Use Effectiveness (WUE) is measured as total annual water use divided by the computing equipment annual energy use, with values closer to zero being more efficient. We should continue to push for improvement in PUE and WUE, but on their own these metrics are incomplete drivers of deep innovation because they do not reflect how much computing power is provided and do not assess impacts on the broader energy infrastructure system. While multiple metrics for data center energy efficiency have been proposed over the past several years, what is important for innovation is to improve how much AI computing work we get for the amount of energy and water used. Just as efficiency in a car is measured in miles per gallon (MPG), we need to measure the “MPG” of how AI data centers perform work and then create incentives and competition for continuous improvements. There could be different metrics for different types of AI training and inference workloads, but a starting point could be the tokens per kilowatt-hour of electricity used. A token is a word or portion of a word that AI foundation models use for analysis. Another way could be to measure the efficiency of computing performance, or FLOPS, per kilowatt-hour. The more analysis an AI model or data center can perform using the same amount of energy, the more energy efficient it is.
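The ratios defined above are simple to compute once the underlying annual totals are metered. The sketch below is illustrative only: the input figures are invented, liters are an assumed unit for water (the text does not name one), and using total facility electricity as the denominator for tokens per kilowatt-hour is likewise an assumption.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Use Effectiveness: total facility energy / computing equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("computing equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def wue(total_water_liters, it_equipment_kwh):
    """Water Use Effectiveness: total water use / computing equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("computing equipment energy must be positive")
    return total_water_liters / it_equipment_kwh


def tokens_per_kwh(tokens_processed, total_facility_kwh):
    """The 'MPG' style metric: AI work delivered per unit of electricity."""
    if total_facility_kwh <= 0:
        raise ValueError("facility energy must be positive")
    return tokens_processed / total_facility_kwh


# Invented annual totals for one hypothetical facility.
facility_kwh = 120_000_000
it_kwh = 100_000_000
water_liters = 30_000_000
tokens = 6_000_000_000_000

facility_pue = pue(facility_kwh, it_kwh)            # closer to 1.0 is better
facility_wue = wue(water_liters, it_kwh)            # closer to 0 is better
facility_mpg = tokens_per_kwh(tokens, facility_kwh)  # higher is better
```

Note that two facilities with identical PUE can differ widely in tokens per kilowatt-hour, which is the gap the “MPG” framing is meant to expose.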

FESI could deploy sliding scale innovation prizes based on data center size for new facilities that demonstrate leading edge AI data center MPG. These could be based on efficiency targets for tokens per kilowatt-hour, FLOPS per kilowatt-hour, top-performing PUE, or other metrics of energy efficiency. Similar prizes could be provided for water use efficiency, within different classes of cooling technologies, for facilities that exceed best-in-class performance. These prizes could be modeled after the Egg-Tech Prize run by FFAR, the USDA’s agency-related foundation, a program that was easy to administer and has had great success. A secondary benefit of an efficiency innovation prize is continuous competition for improvement, and open information about best-in-class data center facilities.

Fig. 1. Power Use Effectiveness (PUE) and Water Use Effectiveness (WUE) values for data centers. Source: LBNL 2024.

Recommendation 3. Create Prizes to Maximize Transmission Throughput and Upgrade Grid Infrastructure

FESI could award prizes for rapid deployment of reconductoring, new transmission, or grid enhancing technologies to increase the transmission capacity for any project in DOE’s Coordinated Interagency Authorizations and Permit Program. Similarly, FESI could award prizes for utilities to upgrade local distribution infrastructure beyond the direct needs of the project to reduce future electricity rate cases, which will keep electricity costs affordable for residential customers. The Department of Energy already has authority to finance up to $2.5 billion in the Transmission Facilitation Program, a revolving fund administered by the Grid Deployment Office (GDO) that helps support transmission infrastructure. These funds could be used for public-private partnerships on projects that are located in a national interest electric transmission corridor and are necessary to accommodate an increase in electricity demand across more than one state or transmission planning region.

Recommendation 4. Develop Prizes That Reward Flexibility and End-Use Efficiency Investments

Flexibility in how and when data centers use electricity can meaningfully reduce the stress on the grid. FESI should award prizes to data centers that demonstrate best-in-class flexibility through smart controls and operational improvements. Prizes could also be awarded to utilities hosting data centers that reduce summer and winter peak loads in the local service territory. Prizes for utilities that meet home weatherization targets and deploy virtual power plants could help reduce costs and grid stress in local communities hosting AI data centers.

Conclusion

The U.S. is facing the risk of electricity demand outstripping supplies in many parts of the country, which would be severely detrimental to people’s lives, to the economy, to the environment, and to national security. “Smart AI Fast Lanes” is a policy and investment framework that can rapidly increase clean energy supply, infrastructure, and demand management capabilities.

It is imperative that the U.S. address the growing demand from AI and data centers, so that the U.S. remains on the cutting edge of innovation in this important sector. How the U.S. approaches and solves the challenge of new demand from AI is a broader test of how the country prepares its infrastructure for increased electrification of vehicles, buildings, and manufacturing, as well as how the country addresses both carbon pollution and the impacts from climate change. The “Smart AI Fast Lanes” framework and FESI-run prizes will enable U.S. competitiveness in AI, keep energy costs affordable, reduce pollution, and prepare the country for new opportunities.

A Holistic Framework for Measuring and Reporting AI’s Impacts to Build Public Trust and Advance AI

As AI becomes more capable and integrated throughout the United States economy, its growing demand for energy, water, land, and raw materials is driving significant economic and environmental costs, from increased air pollution to higher costs for ratepayers. A recent report projects that data centers could consume up to 12% of U.S. electricity by 2028, underscoring the urgent need to assess the tradeoffs of continued expansion. To craft effective, sustainable resource policies, we need clear standards for estimating the data centers’ true energy needs and for measuring and reporting the specific AI applications driving their resource consumption. Local and state-level bills calling for more oversight of utility rates and impacts to ratepayers have received bipartisan support, and this proposal builds on that momentum.

In this memo, we draw on research proposing a holistic evaluation framework for characterizing AI’s environmental impacts, which establishes three categories of impacts arising from AI: (1) computing-related impacts; (2) immediate application impacts; and (3) system-level impacts. Concerns around AI’s computing-related impacts, e.g. energy and water use due to AI data centers and hardware manufacturing, have become widely known, and corresponding policy is starting to be put into place. However, AI’s immediate application and system-level impacts, which arise from the specific use cases to which AI is applied and the broader socio-economic shifts resulting from its use, remain poorly understood, despite their greater potential for societal benefit or harm.

To ensure that policymakers have visibility into the full range of AI’s environmental impacts, we recommend that the National Institute of Standards and Technology (NIST) oversee creation of frameworks to measure those impacts. Frameworks should rely on quantitative measurements of the computing and application related impacts of AI and qualitative data based on engagements with the stakeholders most affected by the construction of data centers. NIST should produce these frameworks based on convenings that include academic researchers, corporate governance personnel, developers, utility companies, vendors, and data center owners in addition to civil society organizations. Participatory workshops will yield new guidelines, tools, methods, protocols and best practices to facilitate the evolution of industry standards for the measurement of the social costs of AI’s energy infrastructures.

Challenge and Opportunity

Resource consumption associated with AI infrastructures is expanding quickly, and this has negative impacts, including asthma from air pollution associated with diesel backup generators, noise pollution, light pollution, excessive water and land use, and financial impacts to ratepayers. A lack of transparency regarding these outcomes, and of public participation in minimizing them, risks losing the public’s trust, which in turn will inhibit the beneficial uses of AI. While there is a huge amount of capital expenditure and a massive forecasted growth in power consumption, there remains a lack of transparency and scientific consensus around the measurement of AI’s environmental impacts with respect to data centers and their related negative externalities.

A holistic evaluation framework for assessing AI’s broader impacts requires empirical evidence, both qualitative and quantitative, to influence future policy decisions and establish more responsible, strategic technology development. Focusing narrowly on carbon emissions or energy consumption arising from AI’s computing related impacts is not sufficient. Measuring AI’s application and system-level impacts will help policymakers consider multiple data streams, including electricity transmission, water systems and land use in tandem with downstream economic and health impacts.

Regulatory and technical attempts so far to develop scientific consensus and international standards around the measurement of AI’s environmental impacts have focused on documenting AI’s computing-related impacts, such as the energy use, water consumption, and carbon emissions required to build and use AI. Measuring and mitigating AI’s computing-related impacts is necessary, and has received attention from policymakers (e.g. the introduction of the AI Environmental Impacts Act of 2024 in the U.S., provisions for environmental impacts of general-purpose AI in the EU AI Act, and data center sustainability targets in the German Energy Efficiency Act). However, research by Kaack et al. (2022) highlights that impacts extend beyond computing. AI’s application impacts, which arise from the specific use cases for which AI is deployed (e.g. AI-enabled emissions, such as those from applying AI to oil and gas drilling), have much greater potential scope for positive or negative impacts than AI’s computing impacts alone, depending on how AI is used in practice. Finally, AI’s system-level impacts, which include even broader, cascading social and economic impacts associated with AI energy infrastructures, such as increased pressure on local utility infrastructure leading to increased costs to ratepayers, or health impacts to local communities due to increased air pollution, have the greatest potential for positive or negative impacts, while being the most challenging to measure and predict. See Figure 1 for an overview.

Figure 1. Framework for assessing the impacts of AI, from Kaack et al. (2022). Effectively understanding and shaping AI’s impacts will require going beyond impacts arising from computing alone, to consider and measure impacts arising from AI’s uses (e.g. in optimizing power systems or agriculture) and how AI’s deployment throughout the economy leads to broader systemic shifts, such as changes in consumer behavior.

Effective policy recommendations require more standardized measurement practices, a point raised by the Government Accountability Office’s recent report on AI’s human and environmental effects, which explicitly calls for increasing corporate transparency and innovation around technical methods for improved data collection and reporting. But data collection should also include multi-stakeholder engagement to ensure that evaluation frameworks are more holistic and meet the needs of specific localities, including state and local government officials, businesses, utilities, and ratepayers. Furthermore, while states and municipalities are creating bills calling for more data transparency and responsibility, including in California, Indiana, Oregon, and Virginia, the lack of federal policy means that data center owners may move their operations to states that have fewer protections in place and similar levels of existing energy and data transmission infrastructure.

States are also grappling with the potential economic costs of data center expansion. Policy Matters Ohio found that tax breaks for data center owners are hurting tax revenue streams that should be used to fund public services. In Michigan, tax breaks for data centers are increasing the cost of water and power for the public while undermining the state’s climate goals. Some Georgia Republicans have stated that data center companies should “pay their way.” While there are arguments that data centers can provide useful infrastructure, connectivity, and even revenue for localities, a recent report shows that at least ten states each lost over $100 million a year in revenue to data centers because of tax breaks. The federal government can help create standards that allow stakeholders to balance the potential costs and benefits of data centers and related energy infrastructures. We now have an urgent need to increase transparency and accountability through multi-stakeholder engagement, maximizing economic benefits while reducing waste.

Despite the high economic and policy stakes, critical data needed to assess the full impacts (both costs and benefits) of AI and data center expansion remains fragmented, inconsistent, or entirely unavailable. For example, researchers have found that state-level subsidies for data center expansion may have negative impacts on state and local budgets, but this data has not been collected and analyzed across states because not all states publicly release data about data center subsidies. Other impacts, such as the use of agricultural land or public parks for transmission lines and data center siting, must be studied at a local and state level, and the various social repercussions require engagement with the communities who are likely to be affected. Similarly, estimates of the economic upsides of AI vary widely, e.g. the estimated increase in U.S. labor productivity due to AI adoption ranges from 0.9% to 15%, due in large part to a lack of relevant data on AI uses and their economic outcomes that could inform modeling assumptions.

Data centers are highly geographically clustered in the United States, more so than other industrial facilities such as steel plants, coal mines, factories, and power plants (Fig. 4.12, IEA World Energy Outlook 2024). This means that certain states and counties are experiencing disproportionate burdens associated with data center expansion. These burdens have led to calls for data center moratoriums or for the cessation of other energy development, including in states like Indiana. Improved measurement and transparency can help planners avoid overly burdensome concentrations of data center infrastructure, reducing local opposition.

With a rush to build new data center infrastructure, states and localities must also face another concern: overbuilding. For example, Microsoft recently put a hold on parts of its data center contract in Wisconsin and paused another in central Ohio, along with contracts in several other locations across the United States and internationally. These situations often stem from inaccurate demand forecasting, prompting utilities to undertake costly planning and infrastructure development that ultimately goes unused. With better measurement and transparency, policymakers will have more tools to prepare for future demands, avoiding the negative social and economic impacts of infrastructure projects that are started but never completed.

While there have been significant developments in measuring the direct, computing-related impacts of AI data centers, public participation is needed to fully capture many of their indirect impacts. Data centers can be constructed so they are more beneficial to communities while mitigating their negative impacts, e.g. by recycling data center heat, and they can also be constructed to be more flexible by not using grid power during peak times. However, this requires collaborative innovation and cross-sector translation, informed by relevant data.

Plan of Action

Recommendation 1. Develop a database of AI uses and framework for reporting AI’s immediate applications in order to understand the drivers of environmental impacts.

The first step towards informed decision-making around AI’s social and environmental impacts is understanding what AI applications are actually driving data center resource consumption. This will allow specific deployments of AI systems to be linked upstream to compute-related impacts arising from their resource intensity, and downstream to impacts arising from their application, enabling estimation of immediate application impacts.

The AI company Anthropic demonstrated a proof-of-concept categorizing queries to their Claude language model under the O*NET database of occupations. However, O*NET was developed in order to categorize job types and tasks with respect to human workers, which does not exactly align with current and potential uses of AI. To address this, we recommend that NIST works with relevant collaborators such as the U.S. Department of Labor (responsible for developing and maintaining the O*NET database) to develop a database of AI uses and applications, similar to and building off of O*NET, along with guidelines and infrastructure for reporting data center resource consumption corresponding to those uses. This data could then be used to understand particular AI tasks that are key drivers of resource consumption.

Any entity deploying a public-facing AI model (that is, one that can produce outputs and/or receive inputs from outside its local network) should be able to easily document and report its use case(s) within the NIST framework. A centralized database will allow for collation of relevant data across multiple stakeholders including government entities, private firms, and nonprofit organizations.

Gathering data of this nature may require the reporting entity to perform analyses of sensitive user data, such as categorizing individual user queries to an AI model. However, data is to be reported in aggregate percentages with respect to use categories without attribution to or listing of individual users or queries. This type of analysis and data reporting is well within the scope of existing, commonplace data analysis practices. As with existing AI products that rely on such analyses, reporting entities are responsible for performing that analysis in a way that appropriately safeguards user privacy and data protection in accordance with existing regulations and norms.
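The aggregate reporting described above can be illustrated with a minimal sketch. It assumes a reporting entity has already assigned each query a use-category label; the category names and the function name are hypothetical, not part of any existing NIST or O*NET scheme. Only category percentages leave the function, with no query text or user identifiers.

```python
from collections import Counter


def aggregate_use_shares(query_categories):
    """Return the percentage of queries falling in each use category.

    query_categories: iterable of category labels, one per query.
    The labels are hypothetical placeholders; a real taxonomy would
    come from the database proposed in this recommendation.
    """
    counts = Counter(query_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Report shares only, sorted by category name for stable output.
    return {cat: round(100.0 * n / total, 1) for cat, n in sorted(counts.items())}


# Illustrative labels for four queries (made-up data).
labels = ["coding", "coding", "writing", "search"]
print(aggregate_use_shares(labels))
# {'coding': 50.0, 'search': 25.0, 'writing': 25.0}
```

A real pipeline would likely also suppress very small categories before publication, so that rare uses cannot be traced back to individual users.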

Recommendation 2. NIST should create an independent consortium to develop a system-level evaluation framework for AI’s environmental impacts, while embedding robust public participation in every stage of the work.

Currently, the social costs of AI’s system-level impacts—the broader social and economic implications arising from AI’s development and deployment—are not being measured or reported in any systematic way. These impacts fall heaviest on the local communities that host the data centers powering AI: the financial burden on ratepayers who share utility infrastructure, the health effects of pollutants from backup generators, the water and land consumed by new facilities, and the wider economic costs or benefits of data-center siting. Without transparent metrics and genuine community input, policymakers cannot balance the benefits of AI innovation against its local and regional burdens. Building public trust through public participation is key when it comes to ensuring United States energy dominance and national security interests in AI innovation, themes emphasized in policy documents produced by the first and second Trump administrations.

To develop evaluation frameworks in a way that is both scientifically rigorous and broadly trusted, NIST should stand up an independent consortium via a Cooperative Research and Development Agreement (CRADA). A CRADA allows NIST to collaborate rapidly with non-federal partners while remaining outside the scope of the Federal Advisory Committee Act (FACA), and has been used, for example, to convene the NIST AI Safety Institute Consortium. Membership will include academic researchers, utility companies and grid operators, data-center owners and vendors, state, local, Tribal, and territorial officials, technologists, civil-society organizations, and frontline community groups.

To ensure robust public engagement, the consortium should consult closely with FERC’s Office of Public Participation (OPP)—drawing on OPP’s expertise in plain-language outreach and community listening sessions—and with other federal entities that have deep experience in community engagement on energy and environmental issues. Drawing on these partners’ methods, the consortium will convene participatory workshops and listening sessions in regions with high data-center concentration—Northern Virginia, Silicon Valley, Eastern Oregon, and the Dallas–Fort Worth metroplex—while also making use of online comment portals to gather nationwide feedback.

Guided by the insights from these engagements, the consortium will produce a comprehensive evaluation framework that captures metrics falling outside the scope of direct emissions alone. These system-level metrics could encompass (1) the number, type, and duration of jobs created; (2) the effects of tax subsidies on local economies and public services; (3) the placement of transmission lines and associated repercussions for housing, public parks, and agriculture; (4) the use of eminent domain for data-center construction; (5) water-use intensity and competing local demands; and (6) public-health impacts from air, light, and noise pollution. NIST will integrate these metrics into standardized benchmarks and guidance.

Consortium members will attend public meetings, engage directly with community organizations, deliver accessible presentations, and create plain-language explainers so that non-experts can meaningfully influence the framework’s design and application. The group will also develop new guidelines, tools, methods, protocols, and best practices to facilitate industry uptake and to evolve measurement standards as technology and infrastructure grow.

We estimate a cost of approximately $5 million over two years to complete the work outlined in Recommendations 1 and 2, covering staff time, travel to at least twelve data-center or energy-infrastructure sites across the United States, participant honoraria, and research materials.

Recommendation 3. Mandate regular measurement and reporting on relevant metrics by data center operators.

Voluntary reporting is the status quo, via e.g. corporate Environmental, Social, and Governance (ESG) reports, but voluntary reporting has so far been insufficient for gathering necessary data. For example, while the technology firm OpenAI, best known for their highly popular ChatGPT generative AI model, holds a significant share of the search market and likely corresponding share of environmental and social impacts arising from the data centers powering their products, OpenAI chooses not to publish ESG reports or data in any other format regarding their energy consumption or greenhouse gas (GHG) emissions. In order to collect sufficient data at the appropriate level of detail, reporting must be mandated at the local, state, or federal level. At the state level, California’s Climate Corporate Data Accountability Act (SB-253, SB-219) requires that large companies operating within the state report their GHG emissions in accordance with the GHG Protocol, administered by the California Air Resources Board (CARB).

At the federal level, the EU’s Corporate Sustainability Reporting Directive (CSRD), which requires firms operating within the EU to report a wide variety of data related to environmental sustainability and social governance, could serve as a model for regulating companies operating within the U.S. The Environmental Protection Agency’s (EPA) GHG Reporting Program already requires emissions reporting by operators and suppliers associated with large GHG emissions sources, and the Energy Information Administration (EIA) collects detailed data on electricity generation and fuel consumption through Forms 860 and 923. With respect to data centers specifically, the Department of Energy (DOE) could require that developers who are granted rights to build AI data center infrastructure on public lands perform the relevant measurement and reporting. More broadly, reporting could be a requirement to qualify for any local, state, or federal funding or assistance provided to support buildout of U.S. AI infrastructure.

Recommendation 4. Incorporate measurements of social cost into AI energy and infrastructure forecasting and planning.

There is a huge range in estimates of future data center energy use, largely driven by uncertainty around the nature of demands from AI. This uncertainty stems in part from a lack of historical and current data on which AI use cases are most energy intensive and how those workloads are evolving over time. The extent to which challenges in bringing new resources online, such as hardware production limits or bottlenecks in permitting, will influence growth rates also remains unclear. These uncertainties are even more significant when it comes to the holistic impacts (i.e. those beyond direct energy consumption) described above, making it challenging to balance costs and benefits when planning for future demands from AI.

To address these issues, accurate forecasting of demand for energy, water, and other limited resources must incorporate data gathered through the holistic measurement frameworks described above. Further, the forecasting of broader system-level impacts must be incorporated into decision-making around investment in AI infrastructure. Forecasting needs to go beyond just energy use. Models should also predict energy and related infrastructure needs for transmission, the social cost of carbon in terms of pollution, the effects on ratepayers, and the energy demands from chip production.

We recommend that agencies already responsible for energy-demand forecasting—such as the Energy Information Administration at the Department of Energy—integrate, in line with the NIST frameworks developed above, data on the AI workloads driving data-center electricity use into their forecasting models. Agencies specializing in social impacts, such as the Department of Health and Human Services in the case of health impacts, should model social impacts and communicate those to EIA and DOE for planning purposes. In parallel, the Federal Energy Regulatory Commission (FERC) should update its new rule on long-term regional transmission planning, to explicitly include consideration of the social costs corresponding to energy supply, demand and infrastructure retirement/buildout across different scenarios.

Recommendation 5. Transparently use federal, state, and local incentive programs to reward data-center projects that deliver concrete community benefits.

Incentive programs should be tied to holistic estimates of the costs and benefits collected under the frameworks above, not based purely on promises. When considering using incentive programs, policymakers should ask questions such as: How many jobs are created by data centers and for how long do those jobs exist, and do they create jobs for local residents? What tax revenue for municipalities or states is created by data centers versus what subsidies are data center owners receiving? What are the social impacts of using agricultural land or public parks for data center construction or transmission lines? What are the impacts on air quality and other public health issues? Do data centers deliver benefits like load flexibility and sharing of waste heat?

Grid operators (Regional Transmission Organizations [RTOs] and Independent System Operators [ISOs]) can leverage interconnection queues to incentivize data center operators to justify that they have sufficiently considered the impacts to local communities when proposing a new site. FERC recently approved reforms to processing the interconnect request queue, allowing RTOs to implement a “first-ready first-served” approach rather than a first-come first-served approach, wherein proposed projects can be fast-tracked based on their readiness. A similar approach could be used by RTOs to fast-track proposals that include a clear plan for how they will benefit local communities (e.g. through load flexibility, heat reuse, and clean energy commitments), grounded in careful impact assessment.
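The difference between the queue orderings discussed above can be sketched as a toy model. This is not how FERC’s reforms or any RTO’s process actually work; the `readiness` score, the `community_plan` flag, and all project data below are hypothetical, chosen only to show how a community-benefit criterion could change the order in which proposals are reviewed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    arrival_order: int    # position under first-come, first-served
    readiness: int        # hypothetical readiness score, higher is more ready
    community_plan: bool  # hypothetical flag: documented community-benefit plan


def first_come_order(projects):
    """Order projects strictly by when they entered the queue."""
    return [p.name for p in sorted(projects, key=lambda p: p.arrival_order)]


def first_ready_order(projects, reward_community_plan=False):
    """Order projects by readiness, optionally putting those with a plan first.

    Ties are broken by arrival order so the result is deterministic.
    """
    def key(p):
        plan_rank = 0 if (reward_community_plan and p.community_plan) else 1
        return (plan_rank, -p.readiness, p.arrival_order)

    return [p.name for p in sorted(projects, key=key)]


# Made-up queue of three proposals.
queue = [
    Project("A", arrival_order=1, readiness=40, community_plan=False),
    Project("B", arrival_order=2, readiness=90, community_plan=False),
    Project("C", arrival_order=3, readiness=70, community_plan=True),
]
print(first_come_order(queue))                               # ['A', 'B', 'C']
print(first_ready_order(queue))                              # ['B', 'C', 'A']
print(first_ready_order(queue, reward_community_plan=True))  # ['C', 'B', 'A']
```

In practice an operator would more likely weight several criteria than apply a strict priority tier; the tiered sort is used here only because it makes the effect easy to see.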

There is also the possibility of introducing state-level incentives in states with significant existing infrastructure. Such incentives could be determined in collaboration with the National Governors Association, which has been balancing AI-driven energy needs with state climate goals.

Conclusion

Data centers have an undeniable impact on energy infrastructures and the communities living close to them. This impact will continue to grow alongside AI infrastructure investment, which is expected to skyrocket. It is possible to shape a future where AI infrastructure can be developed sustainably, and in a way that responds to the needs of local communities. But more work is needed to collect the necessary data to inform government decision-making. We have described a framework for holistically evaluating the potential costs and benefits of AI data centers, and shaping AI infrastructure buildout based on those tradeoffs. This framework includes: establishing standards for measuring and reporting AI’s impacts, eliciting public participation from impacted communities, and putting gathered data into action to enable sustainable AI development.

Frequently Asked Questions

If regulations happen at the state level, will investment just move to less regulated states?

Data centers are highly spatially concentrated largely due to reliance on existing energy and data transmission infrastructure; it is more cost-effective to continue building where infrastructure already exists, rather than starting fresh in a new region. As long as the cost of performing the proposed impact assessment and reporting in established regions is less than that of the additional overhead of moving to a new region, data center operators are likely to comply with regulations in order to stay in regions where the sector is established.

Spatial concentration of data centers also arises due to the need for data center workloads with high data transmission requirements, such as media streaming and online gaming, to have close physical proximity to users in order to reduce data transmission latency. In order for AI to be integrated into these realtime services, data center operators will continue to need presence in existing geographic regions, barring significant advances in data transmission efficiency and infrastructure.

Are these policies bad for economic growth? What about national security?

bad for national security and economic growth. So is infrastructure growth that harms the local communities in which it occurs.

Researchers from Good Jobs First have found that many states are in fact losing tax revenue to data center expansion: “At least 10 states already lose more than $100 million per year in tax revenue to data centers…” More data is needed to determine if data center construction projects coupled with tax incentives are economically advantageous investments on the parts of local and state governments.

What about recent efforts to build AI data centers on public lands?

The DOE is opening up federal lands in 16 locations to data center construction projects in the name of strengthening America’s energy dominance and ensuring America’s role in AI innovation. But national security concerns around data center expansion should also consider the impacts to communities who live close to data centers and related infrastructures.

Data centers themselves do not automatically ensure greater national security, especially because the critical minerals and hardware components of data centers depend on international trade and manufacturing. At present, the United States is not equipped to contribute the critical minerals and other materials needed to produce data centers, including GPUs and other components.

If every state or locality has unique infrastructures and energy needs, then what is the point of a federal policy?

Federal policy ensures that states or counties do not become overburdened by data center growth and will help different regions benefit from the potential economic and social rewards of data center construction.

Developing federal standards around transparency helps individual states plan for data center construction, allowing for a high-level, comparative look at the energy demand associated with specific AI use cases. Federal intervention is also important because data centers in one state might have transmission lines running through a neighboring state, with resulting impacts that cross jurisdictions. There is a need for a national-level standard.

How will you weigh costs and benefits? What forms of data will you be collecting?

Producing reliable cost-benefit estimates is currently often extremely challenging. For example, while municipalities often expect there will be economic benefits attached to data centers and that data center construction will yield more jobs in the area, subsidies and short-term jobs in construction do not necessarily translate into economic gains.

To improve the ability of decision makers to do quality cost-benefit analysis, the independent consortium described in Recommendation 2 will examine both qualitative and quantitative data, including permitting histories, transmission plans, land use and eminent domain cases, subsidies, jobs numbers, and health or quality of life impacts in various sites over time. NIST will help develop standards in accordance with this data collection, which can then be used in future planning processes.

How could these changes benefit AI data centers?

Further, there is customer interest in knowing that their AI is being sourced from firms implementing sustainable and socially responsible practices. These efforts can be used in marketing communications and reported as socially and environmentally responsible practices in ESG reports. This serves as an additional incentive for some data center operators to participate in voluntary reporting and maintain operations in locations with increased regulation.

Advance AI with Cleaner Air and Healthier Outcomes

Artificial intelligence (AI) is transforming industries, driving innovation, and tackling some of the world’s most pressing challenges. Yet while AI has tremendous potential to advance public health, such as supporting epidemiological research and optimizing healthcare resource allocation, the public health burden of AI due to its contribution to air pollutant emissions has been under-examined. Energy-intensive data centers, often paired with diesel backup generators, are rapidly expanding and degrading air quality through emissions of air pollutants. These emissions exacerbate or cause various adverse health outcomes, from asthma to heart attacks and lung cancer, especially among young children and the elderly. Without sufficient clean and stable energy sources, the annual public health burden from data centers in the United States is projected to reach up to $20 billion by 2030, with households in some communities located near power plants supplying data centers, such as those in Mason County, WV, facing over 200 times greater burdens than others.

Federal, state, and local policymakers should act to accelerate the adoption of cleaner and more stable energy sources and to guide AI’s expansion in a way that aligns innovation with human well-being, advancing the United States’ leadership in AI while ensuring clean air and healthy communities.

Challenge and Opportunity

Forty-six percent of people in the United States breathe unhealthy levels of air pollution. Ambient air pollution, especially fine particulate matter (PM₂.₅), is linked to 200,000 deaths each year in the United States. Poor air quality remains the nation’s fifth highest mortality risk factor, resulting in a wide range of immediate and severe health issues that include respiratory diseases, cardiovascular conditions, and premature deaths.

Data centers consume vast amounts of electricity to power and cool the servers running AI models and other computing workloads. According to the Lawrence Berkeley National Laboratory, the growing demand for AI is projected to increase the data centers’ share of the nation’s total electricity consumption to as much as 12% by 2028, up from 4.4% in 2023. Without enough sustainable energy sources like nuclear power, the rapid growth of energy-intensive data centers is likely to exacerbate ambient air pollution and its associated public health impacts.

Data centers typically rely on diesel backup generators for uninterrupted operation during power outages. While the total operation time for routine maintenance of backup generators is limited, these generators can create short-term spikes in PM₂.₅, NOₓ, and SO₂ that go beyond the baseline environmental and health impacts associated with data center electricity consumption. For example, diesel generators emit 200–600 times more NOₓ than natural gas-fired power plants per unit of electricity produced. Even brief exposure to high levels of NOₓ can aggravate respiratory symptoms and lead to hospitalizations. A recent report to the Governor and General Assembly of Virginia found that backup generators at data centers emitted approximately 7% of the total permitted pollution levels for these generators in 2023. Based on the Environmental Protection Agency’s COBRA modeling tool, the public health cost of these emissions in Virginia is estimated at approximately $200 million, with health impacts extending to neighboring states and reaching as far as Florida. In Memphis, Tennessee, a set of temporary gas turbines powering a large AI data center, which has not undergone a complete permitting process, is estimated to emit up to 2,000 tons of NOₓ annually. This has raised significant health concerns among local residents and could result in a total public health burden of $160 million annually. These public health concerns coincide with a paradigm shift that favors dirty energy and potentially delays sustainability goals.

In 2023 alone, air pollution attributed to data centers in the United States resulted in an estimated $5 billion in health-related damages, a figure projected to rise up to $20 billion annually by 2030. This projected cost reflects an estimated 1,300 premature deaths in the United States per year by the end of the decade. While communities near data centers and power plants bear the greatest burden, with some households facing over 200 times greater impacts than others, the health impacts of these facilities extend to communities across the nation. The widespread health impacts of data centers further compound the already uneven distribution of environmental costs and water resource stresses imposed by AI data centers across the country.

While essential for mitigating air pollution and public health risks, transitioning AI data centers to cleaner backup fuels and stable energy sources such as nuclear power presents significant implementation hurdles, including lengthy permitting processes. Clean backup generators that match the reliability of diesel remain limited in real-world applications, and multiple key issues must be addressed to fully transition to cleaner and more stable energy.

While it is clear that data centers pose public health risks, comprehensive evaluations of data center air pollution and related public health impacts are essential to grasp the full extent of the harms these centers pose, yet often remain absent from current practices. Washington State conducted a health risk assessment of diesel particulate pollution from multiple data centers in the Quincy area in 2020. However, most states lack similar evaluations for either existing or newly proposed data centers. To safeguard public health, it is essential to establish transparency frameworks, reporting standards, and compliance requirements for data centers, enabling the assessment of PM₂.₅, NOₓ, SO₂, and other harmful air pollutants, as well as their short- and long-term health impacts. These mechanisms would also equip state and local governments to make informed decisions about where to site AI data center facilities, balancing technological progress with the protection of community health nationwide.

Finally, limited public awareness, insufficient educational outreach, and a lack of comprehensive decision-making processes further obscure the health risks that data centers pose. Without robust transparency and community engagement mechanisms, communities housing data center facilities are left with little influence or recourse over developments that may significantly affect their health and environment.

Plan of Action

The United States can build AI systems that not only drive innovation but also promote human well-being, delivering lasting health benefits for generations to come. Federal, state, and local policymakers should adopt a multi-pronged approach to address data center expansion with minimal air pollution and public health impacts, as outlined below.

Federal-level Action

Federal agencies play a crucial role in establishing national standards, coordinating cross-state efforts, and leveraging federal resources to model responsible public health stewardship.

Recommendation 1. Incorporate Public Health Benefits to Accelerate Clean and Stable Energy Adoption for AI Data Centers

Congress should direct relevant federal agencies, including the Department of Energy (DOE), the Nuclear Regulatory Commission (NRC), and the Environmental Protection Agency (EPA), to integrate air pollution reduction and the associated public health benefits into efforts to streamline the permitting process for more sustainable energy sources, such as nuclear power, for AI data centers. Simultaneously, federal resources should be expanded to support research, development, and pilot deployment of alternative low-emission fuels for backup generators while ensuring high reliability.

Public Health Benefit Quantification. Direct the EPA, in coordination with DOE and public health agencies, to develop standardized methods for estimating the public health benefits (e.g., avoided premature deaths, hospital visits, and economic burden) of using cleaner and more stable energy sources for AI data centers. Require lifecycle emissions modeling of energy sources and translate avoided emissions into quantitative health benefits using established tools such as the EPA’s BenMAP. This should:
- Include modeling of air pollution exposure and health outcomes (e.g., using tools like EPA’s COBRA)
- Incorporate cumulative risks from regional electricity generation and local backup generator emissions
- Account for spatial disparities and vulnerable populations (e.g., children, the elderly, and disadvantaged communities)
- Evaluate both short-term (e.g., generator spikes) and long-term (e.g., chronic exposure) health impacts
Preferential Permitting. Instruct the DOE to prioritize and streamline permitting for cleaner energy projects (e.g., small modular reactors, advanced geothermal) that demonstrate significant air pollution reduction and health benefits in supporting AI data center infrastructures. Develop a Clean AI Permitting Framework that allows project applicants to submit health benefit assessments as part of the permitting package to justify accelerated review timelines.
Support for Cleaner Backup Systems. Expand DOE and EPA R&D programs to support pilot projects and commercialization pathways for alternative backup generator technologies, including hydrogen combustion systems and long-duration battery storage. Provide tax credits or grants for early adopters of non-diesel backup technologies in AI-related data center facilities.
Federal Guidance & Training. Provide technical assistance to state and local agencies to implement the health benefit quantification methods described above, and fund capacity-building efforts in environmental health departments.

Recommendation 2. Establish a Standardized Emissions Reporting Framework for AI Data Centers

Congress should direct the EPA, in coordination with the National Institute of Standards and Technology (NIST), to develop and implement a standardized reporting framework requiring data centers to publicly disclose their emissions of air pollutants, including PM₂.₅, NOₓ, SO₂, and other hazardous air pollutants associated with backup generators and electricity use.

Multi-Stakeholder Working Group. Task EPA with convening a multi-stakeholder working group, including representatives from NIST, DOE, state regulators, industry, and public health experts, to define the scope, metrics, and methodologies for emissions reporting.
Standardization. Develop a federal technical standard that specifies:
- Types of air pollutants that should be reported
- Frequency of reporting (e.g., quarterly or annually)
- Facility-specific disclosures (including generator use and power source profiles)
- Geographic resolution of emissions data
- Public access and data transparency protocols
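To make the elements of such a standard concrete, the sketch below shows one possible shape for a single facility disclosure record, with a basic completeness check and a JSON export for public access. Every field name, the required-pollutant list, and the sample values are hypothetical illustrations, not an existing EPA or NIST schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical minimum set of pollutants a record must report.
REQUIRED_POLLUTANTS = ("PM2.5", "NOx", "SO2")


@dataclass
class EmissionsDisclosure:
    facility_id: str          # hypothetical facility identifier
    reporting_period: str     # e.g. "2025-Q1" for quarterly reporting
    county_code: str          # geographic resolution of the record
    pollutant_tons: dict      # pollutant name -> tons emitted in the period
    generator_hours: float    # backup generator run time in the period
    power_source_share: dict  # power source -> fraction of electricity used


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_POLLUTANTS:
        if name not in record.pollutant_tons:
            problems.append(f"missing pollutant: {name}")
    if any(v < 0 for v in record.pollutant_tons.values()):
        problems.append("negative emissions value")
    if record.generator_hours < 0:
        problems.append("negative generator hours")
    if abs(sum(record.power_source_share.values()) - 1.0) > 1e-6:
        problems.append("power source shares do not sum to 1")
    return problems


def to_public_json(record):
    """Serialize a record for publication in a machine-readable form."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values only; none of these numbers are real measurements.
example = EmissionsDisclosure(
    facility_id="FAC-0001",
    reporting_period="2025-Q1",
    county_code="00000",
    pollutant_tons={"PM2.5": 0.4, "NOx": 12.0, "SO2": 0.1},
    generator_hours=18.5,
    power_source_share={"grid": 0.9, "onsite_gas": 0.1},
)
print(validate(example))  # []
```

Keeping validation separate from the record definition would let a working group tighten the rules over time without changing the published data format.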

State-level Action

Recommendation 1. State environmental and public health departments should conduct a health impact assessment (HIA) before and after data center construction to evaluate discrepancies between anticipated and actual health impacts for existing and planned data center operations. To maintain and build trust, HIA findings, methodologies, and limitations should be publicly available and accessible to non-technical audiences (including policymakers, local health departments, and community leaders representing impacted residents), thereby enhancing community-informed action and participation. Reports should focus on the disparate impact between rural and urban communities, with particular attention to overburdened communities that have under-resourced health infrastructure. In addition, states should coordinate HIAs and share findings to address cross-boundary pollution risks. This includes accounting for nearby communities across state lines, since public health impacts do not stop at jurisdictional borders and the analysis should not either.

Recommendation 2. State public health departments should establish a state-funded program that offers community education forums for affected residents to express their concerns about how data centers impact them. These programs should emphasize leading outreach, engaging communities, and contributing to qualitative analysis for HIAs. Health impact assessments should be used as a basis for informed community engagement.

Recommendation 3. States should incorporate air pollutant emissions related to data centers into their implementation of the National Ambient Air Quality Standards (NAAQS) and the development of State Implementation Plans (SIPs). This ensures that affected areas can meet standards and maintain their attainment statuses. To support this, states should evaluate the adequacy of existing regulatory monitors in capturing emissions related to data centers and determine whether additional monitoring infrastructure is required.

Local-level Action

Recommendation 1. Local governments should revise zoning regulations to include stricter and more explicit health-based protections to prevent data center clustering in already overburdened communities. Additionally, zoning ordinances should address colocation factors and evaluate potential cumulative health impacts. A prominent example is Fairfax County, Virginia, which updated its zoning ordinance in September 2024 to regulate the proximity of data centers to residential areas, require noise pollution studies prior to construction, and establish size thresholds. These updates were shaped through community engagement and input.

Recommendation 2. Local governments should appoint public health experts to zoning boards so that data center placement decisions reflect community health priorities.

Conclusion

While AI can revolutionize industries and improve lives, its energy-intensive nature is also degrading air quality through emissions of air pollutants. Mitigating AI’s growing air pollution and public health risks requires a comprehensive assessment of AI’s health impact and a transition of AI data centers to cleaner backup fuels and stable energy sources, such as nuclear power. By adopting more informed and cleaner AI strategies at the federal and state levels, policymakers can mitigate these harms, promote healthier communities, and ensure AI’s expansion aligns with clean air priorities.

Federation of American Scientists Statement on the Preemption of State AI Regulation in the One Big Beautiful Bill Act

As the Senate prepares to vote on a provision in the One Big Beautiful Bill Act, which would condition Broadband Equity, Access, and Deployment (BEAD) Program funding on states ceasing enforcement of their AI laws (SEC.0012 Support for Artificial Intelligence Under the Broadband Equity, Access, and Deployment Program), the Federation of American Scientists urges Congress to oppose this measure. This approach threatens to compromise public trust and responsible innovation at a moment of rapid technological change.

The Trump Administration has repeatedly emphasized that public trust is essential to fostering American innovation and global leadership in AI. That trust depends on clear, reasonable guardrails, especially as AI systems are increasingly deployed in high-stakes areas like education, health, employment, and public services. Moreover, the pace of advancement in frontier AI systems is staggering. The capabilities, risks, and use cases of general-purpose models are predicted to evolve dramatically over the next decade. In such a landscape, we require governance structures that are adaptive, multi-layered, and capable of responding in real time.

While a well-crafted federal framework may ultimately be the right path forward, preempting all state regulation in the absence of federal action would leave a dangerous vacuum, further undermining public confidence in these technologies. According to Pew Research, American concerns about AI are growing, and a majority of US adults and AI experts worry that governments will not go far enough to regulate AI.

State governments have long served as laboratories of democracy, testing policies, implementation strategies, and ways to adapt to local needs. Tying essential broadband infrastructure funding to the repeal of sensible, forward-looking laws would cut off states’ ability to meet the demands of AI evolution in the absence of federal guidance.

We urge lawmakers to protect both innovation and accountability by rejecting this provision. Conditioning BEAD funding on halting AI regulation sends the wrong message. AI progress does not need to come at the cost of responsible oversight.

Unlocking AI’s Grid Modernization Potential

Surging energy demand and increasingly frequent extreme weather events are bringing new challenges to the forefront of electric grid planning, permitting, operations, and resilience. These hurdles are pushing our already fragile grid to the limit, highlighting decades of underinvestment, stagnant growth, and the pressing need to modernize our system.

While these challenges aren’t new, they are newly urgent. The society-wide emergence of artificial intelligence (AI) is bringing many of these challenges into sharper focus, pushing the already increasing electricity demand to new heights and cementing the need for deployable, scalable, and impactful solutions. Fortunately, many transformational and mature AI tools provide near-term pathways for significant grid modernization.

This policy memo builds on foundational research from the US Department of Energy’s (DOE) AI for Energy (2024) report to present a new matrix that maps these unique AI applications onto an “impact-readiness” scale. Nearly half of the applications identified by DOE are high impact and ready to deploy today. An additional ~40% have high impact potential but require further investment and research to move up the readiness scale. Only 2 of 14 use cases analyzed here fall into the “low-impact / low-readiness” quadrant.

Unlike other emerging technologies, AI’s potential in grid modernization is not simply an R&D story, but a deployment one. However, with limited resources, the federal government should invest in use cases that show high-impact potential and demonstrate feasible levels of deployment readiness. The recommendations in this memo target regulatory actions across the Federal Energy Regulatory Commission (FERC) and the Department of Energy (DOE), data modernization programs at the Federal Permitting Improvement Steering Council (FPISC), and funding opportunities and pilot projects at DOE and the Federal Emergency Management Agency (FEMA).

Thoughtful policy coordination, targeted investments, and continued federal support will be needed to realize the potential of these applications and pave the way for further innovation.

Challenge and Opportunity

Surging Load Growth, Extreme Events, and a Fragmented Federal Response

Surging energy demand and more frequent extreme weather events are bringing new challenges to the forefront of grid planning and operations. Not only is electric load growing at rates not seen in decades, but extreme weather events and cybersecurity threats are becoming more common and costly. All the while, our grid is becoming more complex to operate as new sources of generation and grid management tools evolve. Underlying these complexities is the fragmented nature of our energy system: a patchwork of regional grids, localized standards, and often conflicting regulations.

The emergence of artificial intelligence (AI) has brought many of these challenges into sharper focus. However, the potential of AI to mitigate, sidestep, or solve these challenges is also vast. From more efficient permitting processes to more reliable grid operations, many unique AI use cases for grid modernization are ready to deploy today and have high-impact potential.

The federal government has a unique role to play in both meeting these challenges and catalyzing these opportunities by implementing AI solutions. However, the current federal landscape is fragmented, unaligned, and missing critical opportunities for impact. Nearly a dozen federal agencies and offices are engaged across the AI grid modernization ecosystem (see FAQ #2), with few coordinating in the absence of a defined federal strategy.

To prioritize effective and efficient deployment of resources, recommendations for increased investments (both in time and capital) should be based on a solid understanding of where the gaps and opportunities lie. Historically, program offices across DOE and other agencies have focused efforts on early-stage R&D and foundational science activities for emerging technology. For AI, however, the federal government is well-positioned to support further deployment of the technology into grid modernization efforts, rather than just traditional R&D activities.

AI Applications for Grid Modernization

AI’s potential in grid modernization is significant, expansive, and deployable. Across four distinct categories—grid planning, siting and permitting, operations and reliability, and resilience—AI can improve existing processes or enable entirely new ones. Indeed, the use of AI in the power sector is not a new phenomenon. Industry and government alike have long utilized machine learning (ML) models across a range of power sector applications, and the recent introduction of “foundation” models (such as large language models, or LLMs) has opened up a new suite of transformational use cases. While LLMs and other foundation models can be used in various use cases, AI’s potential to accelerate grid modernization will span both traditional and novel approaches, with many applications requiring custom-built models tailored to specific operational, regulatory, and data environments.

The following 14 use cases are drawn from DOE’s AI for Energy (2024) report and form the foundation of this memo’s analytical framework.

Grid Planning

Capital Allocations and Planned Upgrades. Use AI to optimize utility investment decisions by forecasting asset risk, load growth, and grid needs to guide substation upgrades, reconductoring, or distributed energy resource (DER)-related capacity expansions.
Improved Information on Grid Capacity. Use AI to generate more granular and dynamic hosting capacity, load forecast, and congestion data to guide DER siting, interconnection acceleration, and non-wires alternatives.
Improved Transportation and Energy Planning Alignment. Use AI-enabled joint forecasting tools to align EV infrastructure rollout with utility grid planning by integrating traffic, land use, and load growth data.
Interconnection Issues and Power Systems Models. Use AI-accelerated power flow models and queue screening tools to reduce delays and improve transparency in interconnection studies.

Siting and Permitting

Zoning and Local Permitting Analysis. Use AI to analyze zoning ordinances, land use restrictions, and local permitting codes to identify siting barriers or opportunities earlier in the project development process.
Federal Environmental Review Accelerations. Use AI tools to extract, organize, and summarize unstructured and disparate datasets to support more efficient and consistent reviews.
AI Models to Assist Subject Matter Experts in Reviews. Use AI and document analysis tools to support expert reviewers by checking for completeness, inconsistencies, or precedent in technical applications and environmental documents.

Grid Operations and Reliability

Load and Supply Matching. Use AI to improve short-term load forecasting and optimize generation dispatch, reducing imbalance costs and improving integration of variable resources.
Predictive and Risk-Informed Maintenance. Use AI to predict asset degradation or failure and inform maintenance schedules based on equipment health, environmental stressors, and historical failure data.
Operational Safety and Issues Reporting and Analysis. Apply AI to analyze safety incident logs, compliance records, and operator reports to identify patterns of human error, procedural risks, or training needs.

Grid Resilience

Self-healing Infrastructure for Reliability and Resilience. Use AI to autonomously isolate faults, reconfigure power flows, and restore service in real time through intelligent switching and local control systems.
Detection and Diagnosis of Anomalous Events. Use AI to identify and localize grid disturbances such as faults, voltage anomalies, or cyber intrusions using high-frequency telemetry and system behavior data.
AI-enabled Situational Awareness and Actions for Resilience. Leverage AI to synthesize grid, weather, and asset data to support operator awareness and guide event response during extreme weather or grid stress events.
Resilience with Distributed Energy Resources. Coordinate DERs during grid disruptions using AI for forecasting, dispatch, and microgrid formation, enabling system flexibility and backup power during emergencies.

However, not all applications are created equal. With limited resources, the federal government should prioritize use cases that show high-impact potential and demonstrate feasible levels of deployment readiness. Additional investments should also be allocated to high-impact / low-readiness use cases to help unlock and scale these applications.

Unlocking the potential of these use cases requires a better understanding of which ones hit specific benchmarks. The matrix below provides a framework for thinking through these questions.

Using the use cases identified above, we’ve mapped AI’s applications in grid modernization onto a “readiness-impact” chart based on six unique scoring scales (see appendix for full methodological and scoring breakdown).

Readiness Scale Questions

Technical Readiness. Is the AI solution mature, validated, and performant?
Financial Readiness. Is it cost-effective and fundable (via CapEx, OpEx, or rate recovery)?
Regulatory Readiness. Can it be deployed under existing rules, with institutional buy-in?

Impact Scale Questions

Value. Does this AI solution reduce costs, outages, emissions, or delays in a measurable way?
Leverage. Does it enable or unlock broader grid modernization (e.g., DERs, grid enhancing technologies (GETs), and/or virtual power plant (VPP) integration)?
Fit. Is AI the right or necessary tool to solve this compared to conventional tools (i.e., traditional transmission planning, interconnection study, and/or compliance software)?

Each AI application receives a score of 0-5 in each category. The three readiness scores are averaged to produce its overall readiness score, and the three impact scores are averaged to produce its overall impact score. To score each application, a detailed rubric was designed with scoring scales for each of the six categories above. Industry examples and experience, existing literature, and outside expert consultation were then used to assign scores to each application.

When plotted on a coordinate plane, each application falls into one of four quadrants, helping us easily identify key insights about each use case.

High-Impact / High-Readiness use cases → Deploy now
High-Impact / Low-Readiness → Invest, unlock, and scale
Low-Impact / High-Readiness → Optional pilots, but deprioritize federal effort
Low-Impact / Low-Readiness → Monitor private sector action

Once plotted, we can then identify additional insights, such as where the clustering happens, what barriers are holding back the highest impact applications, and if there are recurring challenges (or opportunities) across the four categories of grid modernization efforts.
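The scoring and quadrant logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the memo's actual scoring tool: the 2.5 cut-off between "low" and "high" is an assumption (the memo does not state where the quadrant boundaries sit), and the two example use cases and their sub-scores are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Assumption: the memo does not publish the low/high boundary, so the
# midpoint of the 0-5 scale is used here purely for illustration.
THRESHOLD = 2.5

# Recommended posture for each quadrant, keyed by (impact, readiness).
ACTIONS = {
    ("High", "High"): "Deploy now",
    ("High", "Low"): "Invest, unlock, and scale",
    ("Low", "High"): "Optional pilots, but deprioritize federal effort",
    ("Low", "Low"): "Monitor private sector action",
}


@dataclass
class UseCase:
    name: str
    # Readiness sub-scores (0-5)
    technical: float
    financial: float
    regulatory: float
    # Impact sub-scores (0-5)
    value: float
    leverage: float
    fit: float

    def __post_init__(self):
        scores = (self.technical, self.financial, self.regulatory,
                  self.value, self.leverage, self.fit)
        if any(not 0 <= s <= 5 for s in scores):
            raise ValueError("each sub-score must be between 0 and 5")

    @property
    def readiness(self) -> float:
        """Average of the three readiness sub-scores."""
        return mean([self.technical, self.financial, self.regulatory])

    @property
    def impact(self) -> float:
        """Average of the three impact sub-scores."""
        return mean([self.value, self.leverage, self.fit])


def quadrant(use_case: UseCase) -> tuple[str, str]:
    """Return the quadrant label and the matching recommended posture."""
    impact = "High" if use_case.impact >= THRESHOLD else "Low"
    readiness = "High" if use_case.readiness >= THRESHOLD else "Low"
    label = f"{impact}-Impact / {readiness}-Readiness"
    return label, ACTIONS[(impact, readiness)]


# Hypothetical sub-scores, not taken from the memo's appendix.
example_a = UseCase("Hypothetical use case A", 4, 3, 3, 5, 4, 4)
example_b = UseCase("Hypothetical use case B", 2, 2, 1, 4, 4, 3)

for uc in (example_a, example_b):
    label, action = quadrant(uc)
    print(f"{uc.name}: readiness={uc.readiness:.1f}, "
          f"impact={uc.impact:.1f} -> {label}: {action}")
```

Keeping the six sub-scores separate, rather than storing only the two averages, makes it easier to see which barrier (technical, financial, or regulatory) is holding back a high-impact application.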

Plan of Action

Grid Planning

Average Readiness Score: 2.3 | Average Impact Score: 3.8

AI use cases in grid planning face the highest financial and regulatory hurdles of any category. Reducing these barriers can unlock high-impact potential.
These tools are high-leverage use cases. Getting these deployed unlocks deeper grid modernization activities system-wide, such as the integration of grid-enhancing technologies (GETs).
While many of these AI tools are technically mature, adoption is not yet mainstream.

Recommendation 1. The Federal Energy Regulatory Commission (FERC) should clarify the regulatory pathway for AI use cases in grid planning.

Regional Transmission Organizations (RTOs), utilities, and Public Utility Commissions (PUCs) require confidence that AI tools are approved and supported before they deploy them at scale. They also need financial clarity on viable pathways to rate-basing significant up-front costs. Building on Commissioner Rosner’s Letters Regarding Interconnection Automation, FERC should establish a FERC-DOE-RTO technical working group on “Next-Gen Planning Tools” that informs FERC-compliant AI-enabled planning, modeling, and reporting standards. Current regulations (and traditional planning approaches) leave uncertainty around the explainability, validation, and auditability of AI-driven tools.

Thus, the working group should identify where AI tools can be incorporated into planning processes without undermining existing reliability, transparency, or stakeholder-participation standards. The group should develop voluntary technical guidance on model validation standards, transparency requirements, and procedural integration to provide a clear pathway for compliant adoption across FERC-regulated jurisdictions.

Siting and Permitting

Average Readiness Score: 2.7 | Average Impact Score: 3.8

Zoning and local permitting tools are promising, but adoption is fragmented across state, local, and regional jurisdictions.
Federal permitting acceleration tools score high on technical readiness but face institutional distrust and a complicated regulatory environment.
In general, tools in this category have high value but limited transferability beyond highly specific scenarios (low leverage). Even if unlocked at scale, they have narrower application potential than other tools analyzed in this memo.

Recommendation 2. The Federal Permitting Improvement Steering Council (FPISC) should establish a federal siting and permitting data modernization initiative.

AI tools can increase speed and consistency in siting and permitting processes by automating the review of complex datasets, but without structured data, standardized workflows, and agency buy-in, their adoption will remain fragmented and niche. Furthermore, most grid infrastructure data (including siting and permitting documentation) is confidential and protected, leading to industry skepticism about the ability of AI to maintain important security measures alongside transparent workflows. To address these concerns, FPISC should launch a coordinated initiative that creates structured templates for federal permitting documents, pilots AI integration at select agencies, and develops a public validation database that allows AI developers to test their models (with anonymized data) against real agency workflows. Having launched a $30 million effort in 2024 to improve IT systems across multiple agencies, FPISC is well-positioned to take those lessons learned and align deeper AI integration across the federal government’s permitting processes. Coordination with the Council on Environmental Quality (CEQ), which was recently called on to develop a Permitting Technology Action Plan, is also encouraged. Additional Congressional appropriations to FPISC can unlock further innovation.

Operations and Reliability

Average Readiness Score: 3.6 | Average Impact Score: 3.6

Overall, this category has the highest average readiness across technical, financial, and regulatory scales. These use cases are clear “ready-now” wins.
They also have the highest fit component of impact, representing unique opportunities for AI tools to improve on existing systems and processes in ways that traditional tools cannot.

Recommendation 3. Launch an AI Deployment Challenge at DOE to scale high-readiness tools across the sector.

From the SunShot Initiative (2011) through the Energy Storage Grand Challenge (2020) to the Energy Earthshots (2021), DOE has a long history of catalyzing the deployment of new technology in the power sector. A dedicated grand challenge – funded with new Congressional appropriations at the Grid Deployment Office – could deploy matching grants or performance-based incentives to utilities, co-ops, and municipal providers to accelerate adoption of proven AI tools.

Grid Resilience

Average Readiness Score: 3.4 | Average Impact Score: 4.2

As a category, resilience applications have the highest overall impact score, including a perfect value score across all four use cases. There is significant potential in deploying AI tools to solve these challenges.
Alongside operations and reliability use cases, these tools also exhibit the highest technical readiness, demonstrating technical maturity alongside high value potential.
Anomalous events detection is the highest-scoring use case across all 14 applications, on both readiness and impact scales. It’s already been deployed and is ready to scale.

Recommendation 4. DOE, the Federal Emergency Management Agency (FEMA), and FERC should create an AI for Resilience Program that funds and validates AI tools that support cross-jurisdictional grid resilience.

AI for resilience applications often require coordination across traditional system boundaries, from utilities to DERs, microgrids to emergency managers, as well as high levels of institutional trust. Federal coordination can catalyze system integration by funding demo projects, developing integration playbooks, and clarifying regulatory pathways for AI-automated resilience actions.

Congress should direct DOE and FEMA, in consultation with FERC, to establish a new program (or carve out existing grid resilience funds) to: (1) support demonstration projects where AI tools are already being deployed during real-world resilience events; (2) develop standardized playbooks for integrating AI into utility and emergency management operations; and (3) clarify regulatory pathways for actions like DER islanding, fault rerouting, and AI-assisted load restoration.

Conclusion

Managing surging electric load growth while improving the grid’s ability to weather more frequent and extreme events is a once-in-a-generation challenge. Fortunately, new technological innovations combined with a thoughtful approach from the federal government can actualize the potential of AI and unlock a new set of solutions, ready for this era.

Rather than technological limitations, many of the outstanding roadblocks identified here are institutional and operational, highlighting the need for better federal coordination and regulatory clarity. The readiness-impact framework detailed in this memo provides a new way to understand these challenges while laying the groundwork for a timely and topical plan of action.

By identifying which AI use cases are ready to scale today and which require targeted policy support, this framework can help federal agencies, regulators, and legislators prioritize high-impact actions. Strategic investments, regulatory clarity, and collaborative initiatives can accelerate the deployment of proven solutions while innovating and building trust in new ones. By pulling on the right policy levers, AI can improve grid planning, streamline permitting, enhance reliability, and make the grid more resilient, meeting this moment with both urgency and precision.

Frequently Asked Questions

How are scores tabulated? What methods underpin this analysis?

Scoring categories (readiness & impact) were selected based on the literature on challenges to AI deployment in the power sector. An LLM (OpenAI’s GPT-4o model) was utilized to refine the 0-5 scoring scale after careful consideration of the multi-dimensional challenges across each category, based on the author’s personal industry experience and additional consultation with outside technical experts. Where applicable, existing frameworks underpin the scales used in this memo: technology readiness levels for the ‘technical’ readiness category and adoption readiness levels for the ‘financial’ and ‘regulatory’ readiness categories. A rubric was then designed to guide scoring.

Each of the 14 AI applications was then scored against that rubric based on the author’s analysis of existing literature, industry examples, and professional experience. Outside experts were consulted and provided additional feedback and insights throughout the process.

What federal agencies, offices, and programs are currently engaged in AI applications to support grid modernization efforts?

Below is a comprehensive, though not exhaustive, list of the key Executive Branch actors involved in AI-driven grid modernization efforts. A detailed overview of the various roles, authorities, and ongoing efforts can be found here.

Executive Office of the President (Office of Science and Technology Policy (OSTP), Council on Environmental Quality (CEQ)); Department of Commerce (National Institute of Standards and Technology (NIST)); Department of Defense (Energy, Installations, and Environment (EI&E), Defense Advanced Research Projects Agency (DARPA)); Department of Energy (Advanced Research Projects Agency-Energy (ARPA-E), Energy Efficiency and Renewable Energy (EERE), Grid Deployment Office (GDO), Office of Critical and Emerging Technologies (CET), Office of Cybersecurity, Energy Security, and Emergency Response (CESER), Office of Electricity (OE), National Laboratories); Department of Homeland Security (Cybersecurity and Infrastructure Security Agency (CISA)); Federal Energy Regulatory Commission (FERC); Federal Permitting Improvement Steering Council (FPISC); Federal Emergency Management Agency (FEMA); National Science Foundation (NSF)

What are some examples of AI tools that are already being developed or deployed today?

A full database of how the federal government is using AI across agencies can be found at the 2024 Federal Agency AI Use Case Inventory. A few additional examples of private sector applications and public-private partnerships are provided below.

Grid Planning

EPRI’s Open Power AI Consortium

Google’s Tapestry

Octopus Energy’s Kraken

Siting and Permitting

Pacific Northwest National Laboratory’s PermitAI

Paces

FlyPix AI

Operations and Reliability

Schneider Electric’s One Digital Grid Platform

Cammus

Amperon

Grid Resilience

Southwire’s Digital Grid Assessment

Think Power Solutions

DOE’s North American Energy Resilience Model (NAERM)

Enhancing US Power Grid by using AI to Accelerate Permitting

The increased demand for power in the United States is driven by new technologies such as artificial intelligence, data analytics, and other computationally intensive activities that utilize ever faster and power-hungry processors. The federal government’s desire to reshore critical manufacturing industries and shift the economy from service to goods production will, if successful, drive energy demands even higher.

Many of the projects that would deliver the energy to meet rising demand are in the interconnection queue, waiting to be built. There is more power in the queue than on the grid today. The average wait time in the interconnection queue is five years and growing, primarily due to permitting timelines. In addition, many projects are cancelled due to the prohibitive cost of interconnection.

We have identified six opportunities where Artificial Intelligence (AI) has the potential to speed the permitting process.

AI can be used to speed decision-making by regulators through rapidly analyzing environmental regulations and past decisions.
AI can be used to identify generation sites that are more likely to receive permits.
AI can be used to create a database of state and federal regulations to bring all requirements in one place.
AI can be used in conjunction with the database of state regulations to automate the application process and create visibility of permit status for stakeholders.
AI can be used to automate and accelerate interconnection studies.
AI can be used to develop a set of model regulations for local jurisdictions to adapt and adopt.

Challenge and Opportunity

There are currently over 11,000 power generation and consumption projects in the interconnection queue, waiting to connect to the United States power grid. As a result, on average, projects must wait five years for approval, up from three years in 2010.

Historically, a large percentage of projects in the queue, averaging approximately 70%, have been withdrawn due to a variety of factors, including economic viability and permitting challenges. About one-third of wind and solar applications submitted from 2019 to 2024 were cancelled, and about half of these applications faced delays of 6 months or more. For example, the Calico Solar Project in the California Mojave Desert, with a capacity of 850 megawatts, was cancelled due to lengthy multi-year permitting and re-approvals for design changes. Increasing queue wait time is likely to increase the number of projects cancelled and delay those that are viable.

The U.S. grid added 20.2 gigawatts of utility-scale generating capacity in the first half of 2024, a 21% increase over the first half of 2023. However, this is still less power than is likely to be needed to meet increasing power demands in the U.S. Nor does it account for the retirement of generation capacity, which was 5.1 gigawatts in the first half of 2024. In addition to replacing aging energy infrastructure as it is taken offline, this new power is critically needed to address rising energy demands in the U.S. Data centers alone are increasing power usage dramatically, from 1.9% of U.S. electricity consumption in 2018 to 4.4% in 2023, with expected consumption of at least 6.7% in 2028.
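The capacity figures above can be combined with simple arithmetic. The short Python sketch below uses only the numbers quoted in this section; the variable and function names are illustrative, and the first-half 2023 figure is back-calculated from the stated 21% increase rather than taken from a published source.

```python
# Figures quoted in the text for the first half of 2024.
ADDED_H1_2024_GW = 20.2    # utility-scale capacity added
RETIRED_H1_2024_GW = 5.1   # generation capacity retired
YOY_INCREASE = 0.21        # additions were 21% higher than in H1 2023


def net_additions_gw(added_gw: float, retired_gw: float) -> float:
    """Capacity added minus capacity retired over the same period."""
    return added_gw - retired_gw


def implied_prior_additions_gw(added_gw: float, increase: float) -> float:
    """Back out the prior-period additions implied by a percentage increase."""
    return added_gw / (1 + increase)


net_gw = net_additions_gw(ADDED_H1_2024_GW, RETIRED_H1_2024_GW)
prior_gw = implied_prior_additions_gw(ADDED_H1_2024_GW, YOY_INCREASE)

print(f"Net additions, H1 2024: {net_gw:.1f} GW")              # 15.1 GW
print(f"Implied additions, H1 2023: about {prior_gw:.1f} GW")  # about 16.7 GW
```

On these numbers, roughly a quarter of the capacity added in the first half of 2024 only offset retirements rather than serving new demand.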

If we want to achieve the Administration’s vision of restoring U.S. domestic manufacturing capacity, a great deal of generation capacity not currently forecast will also need to be added to the grid very rapidly, far faster than indicated by the current pace of interconnections. The primary challenge that slows most power from getting onto the grid is permitting. A secondary challenge that frequently causes projects to be delayed or cancelled is interconnection costs.

Projects frequently face significant permitting challenges. Projects not only need to obtain permits to operate the generation site but must also obtain permits to move power to the point where it connects to the existing grid. Geographically remote projects may require new transmission lines that cover many miles and cross multiple jurisdictions. Even projects relatively close to the existing grid may require multiple permits to connect to the grid.

In addition, poor site selection has resulted in the cancellation of several high-profile renewable installation projects. The Battle Born Solar Project, valued at $1 billion with an 850-megawatt capacity, was cancelled after community concern that the solar farm would impact tourism and archaeological sites on the Mormon Mesa in Nevada. Another project, a 150-megawatt solar facility proposed for Culpeper County, Virginia, was denied permits for interfering with the historic site of a Civil War battle. Similarly, a geothermal plant in Nevada had to be scaled back to less than a third of its original plan after it was found to be in the only known habitat of the endangered Dixie Valley toad. Community opposition to renewable energy installations, which often stems from complaints about construction impacts and from misinformation, is not always avoidable. Even so, better site selection could save developers time and money by avoiding locations that encroach on historical sites, local attractions, or endangered species’ habitats.

Projects have also historically faced cost challenges as utilities and grid operators could charge the full cost of new operating capacity to each project, even when several pending projects could utilize the same new operating assets. On July 28, 2023, FERC issued a final rule with a compliance date of March 21, 2024, that requires transmission providers to consider all projects in the queue and determine how operating assets would be shared when calculating the cost of connecting a project to the grid. However, the process for calculating costs can be cumbersome when many projects are involved.
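To make the cost-sharing idea concrete, the sketch below splits the cost of one shared upgrade across several queued projects in proportion to their capacity. This is an illustration only: the pro-rata-by-megawatt rule, the function name, the project names, and the dollar figures are all assumptions, and the allocation method actually required under the FERC rule is not described in this memo.

```python
def allocate_shared_cost(total_cost: float,
                         project_mw: dict[str, float]) -> dict[str, float]:
    """Split one shared upgrade cost across projects, pro rata by capacity.

    Assumption: capacity-weighted sharing is used here for illustration;
    it is not presented as the method mandated by FERC.
    """
    if any(mw < 0 for mw in project_mw.values()):
        raise ValueError("project capacities must be non-negative")
    total_mw = sum(project_mw.values())
    if total_mw <= 0:
        raise ValueError("total project capacity must be positive")
    return {name: total_cost * mw / total_mw
            for name, mw in project_mw.items()}


# Hypothetical cluster of three projects sharing a $30 million upgrade.
cluster = {"Project A": 100.0, "Project B": 200.0, "Project C": 300.0}
shares = allocate_shared_cost(30_000_000, cluster)

for name, share in shares.items():
    print(f"{name}: ${share:,.0f}")
```

Under the older approach described above, any one of these projects could have been charged the full $30 million. Repeating this kind of calculation across many overlapping upgrades and large clusters is the cumbersome step that automation could speed up.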

On April 15, 2025, the Trump Administration issued a Presidential Memorandum titled “Updating Permitting Technology for the 21st Century.” This memo directs executive departments and agencies to take full advantage of technology for environmental review and permitting processes and creates a Permitting Innovation Center (PIC). While it is unclear how much authority the PIC will have, its creation demonstrates the Administration’s focus in this area, and the center may serve as a change agent in the future. There is an opportunity to use AI to improve both the speed and the cost of connecting new projects to the grid. Below are recommendations to capitalize on this opportunity.

Plan of Action

Recommendation 1. Funding for PNNL to expand the PolicyAI NEPA model to streamline environmental permitting processes beyond the federal level.

In 2023, Pacific Northwest National Laboratory (PNNL) was tasked by DOE with developing a PermitAI prototype to help regulators understand the National Environmental Policy Act (NEPA) regulations and speed up project environmental reviews. PNNL data scientists created an AI-searchable database of federal environmental impact statements, composed primarily of information that was not readily available to regulators before. The database contains textual data extracted from documents across 2,917 different projects stored as 3.6 million tokens from the GPT-2 tokenizer. Tokens are the units into which text is broken down for natural language processing AI models. The entire dataset is currently publicly available via HuggingFace. The database is then used for generative-AI searching, in which a large language model (LLM) quickly finds documents and summarizes relevant results. The development of this database is still preliminary, and efficiency metrics have not yet been published. Even so, given complaints from those involved in permitting about the complexity of the process and the lack of guidelines, this approach should be a model for tools that could be developed and provided to state and local regulators to assist with permitting reviews.

In 2021, PNNL created a similar process, without using AI, for NEPA permitting for small- to medium-sized nuclear reactors, which simplified the process and reduced the environmental review time from three to six years to between six and twenty-four months. Using AI has the potential to shorten the process considerably for renewables permitting as well. The National Renewable Energy Laboratory (NREL) has also studied using LLMs to expedite the processing of policy data from legal documents and found that the results support expanding the use of LLMs for policy database analysis, particularly when compared to the current reliance on manual effort.

State and local jurisdictions can use the “Updating Permitting Technology” Presidential Memorandum as guidance to support the intersection between state and local permitting efforts. The PNNL database of federal NEPA materials, trained on past NEPA cases, would be provided by PNNL to state jurisdictions as a service, through a process similar to that used by EPA to ensure that state jurisdictions do not need to independently develop data collection solutions. Ideally, the initial data analysis model would be trained to be specific to each participating state and continually updated with new material to create a seamless regulatory experience.

Since PNNL has already built a NEPA model and this work is being expanded to a multi-lab effort that includes NREL, Argonne, and others, the House Energy and Water Development Committee could appropriate additional funding to the Office of Policy (OP) or the Office of Energy Efficiency and Renewable Energy (EERE) to enable the labs to expand the model and make it available to state and local regulatory agencies for integration into their permitting processes. States could develop models specific to their ordinances with PNNL’s PermitAI as the backbone. This effort could be expedited through engagement with the Environmental Council of the States (ECOS).

A shared database of NEPA information would reduce time spent reviewing backlogs of data from environmental review documents. State and local jurisdictions would more efficiently identify relevant information and precedent, and speed decision-making while reducing costs. An LLM tool also has the benefit of answering specific questions asked by the user. An example would be answering a question about issues that have arisen for similar projects in the same area.

Recommendation 2. Appropriate funding to expand AI site selection tools and support state and local pilots to improve permitting outcomes and reduce project cancellations.

AI could be used to identify sites that are suitable for energy generation, with different models eventually trained for utility-scale solar siting, onshore and offshore wind siting, and geothermal power plant siting. Key concerns affecting the permitting process include the loss of arable land, impacts on wildlife, and community responses, like opposition based on land use disagreements. Better site selection identifies these issues before they appear during the permitting process.

AI can access data from a range of sources, including satellite imagery from Google Earth, commercially available lidar studies, and local media screening, to identify locations with the fewest potential barriers or to identify and mitigate barriers for sites that have been selected. Unlike Recommendation 1, which involves answering questions by pulling from large databases using LLMs, this would primarily utilize machine learning algorithms that process past and current data to identify patterns and predict outcomes, like energy generation potential. Examples of datasets these tools can use are the free, publicly available products created by the Innovative Data Energy Applications (IDEA) group in NREL’s Strategic Energy Analysis Center (SEAC), including the national solar radiation database and the wind resource database. The national solar radiation database visualizes the amount of solar energy potential at a given time and predicts future availability of solar energy for a given location in the dataset, which covers the entirety of the United States.

The wind resource database is a collection of modeled wind resource estimates for locations within the United States. In addition, Argonne National Lab has developed the GEM tool to support the NEPA reviews for transmission projects. A few start-ups have synthesized a variety of datasets like these and created their own databases for information like terrain and slope to build site-selection decision-making tools. AI analysis of local news and of landmarks important to local communities can identify locations that are likely to oppose renewable installations. This is particularly important because community opposition is often what kills renewable generation projects that have made it into the permitting process.

The House Energy and Water Development Committee could appropriate funds to DOE’s Grid Deployment Office (GDO), which could collaborate with EERE, FECM (Fossil Energy and Carbon Management), NE (Nuclear Energy), and OE (Office of Electricity) to further expand the technology-specific models as well as Argonne’s GEM tool. GDO could also provide grant funding to state and local government permitting authorities to pilot AI-powered site selection tools created by start-ups or other organizations. Local jurisdictions, in turn, could encourage use by developers.

Better site selection would speed permitting processes and reduce the number of cancelled projects, as well as wasted time and money by developers.

Recommendation 3. Funding for DOE labs to develop an AI-based permitting database, starting with a state-level pilot, to streamline permit site identification and application for large-scale energy projects.

Use AI to identify all of the non-environmental federal, state, and local permits required for generation projects. A pilot project, focused on one generation type, such as solar, should be launched in a state that is positioned for central coordination. New York may be the best candidate, as the Office of Renewable Energy Siting and Electric Transmission has exclusive jurisdiction over onshore renewable energy projects of at least 25 megawatts.

A second option could be Illinois, which has statewide standards for utility-scale solar and wind facilities where local governments cannot adopt more restrictive ordinances. This would require the development of a database of regulations and the ability to query that database to provide a detailed list of required permits for each project by jurisdiction, the relevant application process, and forms. The House Energy and Water Development Committee could direct funds to EERE to support PNNL, NREL, Argonne, and other DOE labs to develop this database. Ideally, this tool would be integrated with tools developed by local jurisdictions to automate their individual permitting process.

State-level regulatory coordination would speed the approval of projects contained within a single state, as well as improve coordination between states.

Recommendation 4. Appropriate funds for DOE to develop a state-level AI permitting application to streamline renewable energy permit approvals and improve transparency.

Use AI as a tool to complete the permitting process. While it would be nearly impossible to create a national permitting tool, it would be realistic to create a tool that could be used to manage developers’ permitting processes at the state level.

NREL developed a permitting tool with funding from the DOE Solar Energy Technologies Office (SETO) for residential rooftop solar permitting. The tool, SolarAPP+, automates plan review, permit approval, and project tracking. As of the end of 2023, it had saved more than 33,000 hours of permitting staff time for more than 32,800 projects. However, permitting for rooftop solar is less complex than permitting for utility-scale solar sites or wind farms because there is less need for environmental reviews, wildlife endangerment reviews, or community feedback. Using the AI frameworks developed by PNNL mentioned in Recommendation 1 and leveraging the development work completed by NREL could create tools similar to SolarAPP+ for large-scale renewable installations and deliver similar results in projects approved and time saved. An application that may meet this need is currently under development at NREL.

The House Energy and Water Development Committee should appropriate funds for DOE to create an application through PNNL and NREL that would utilize the NREL SolarAPP+ framework that could be implemented by states to streamline the permitting application process. This would be especially helpful for complex projects that cross multiple jurisdictions. In addition, Congress, through appropriation by the House Energy and Water Development Committee to DOE’s Grid Deployment Office, could establish a grant program to support state and local level implementation of this permitting tool. This tool could include a dashboard to improve permitting transparency, one of the items required by the Presidential Memorandum on Updating Permitting Technology.

Developers are frequently unclear about what permits are required, especially for complex multi-jurisdiction projects. The AI tool would reduce the time a developer spends identifying permits and would support smaller developers who don’t have permitting consultants or prior experience. An integrated electronic permitting solution would reduce the complexity of applying for and approving permits. With a state-wide system, state and local regulators would only need to add their requirements and location-specific requirements and forms into a state-maintained system. Finally, an integrated system with a dashboard could increase status visibility and help resolve issues more quickly. These tools together would allow developers to make realistic budgets and time frames for projects to allocate resources and prioritize projects that have the greatest chance of being approved.

Recommendation 5. Direct FERC to require RTOs to evaluate and possibly implement AI tools to automate interconnection analysis processes.

Use AI tools to reduce the complexity of publishing and analyzing the mandated maps and assigning costs to projects. While FERC has mandated that grid operators consider all projects coming onto the grid when setting interconnection pricing, as well as considering project readiness rather than time in queue for project completion, the requirements are complex to implement.

A number of private sector companies have begun developing tools to model interconnections. Pearl Street has used its model to reproduce a complex and lengthy interconnection cluster study in ten days, and PJM recently announced a collaboration with Google to develop an analysis capability. Given the private sector efforts in this space, the public interest would be best served by FERC requiring RTOs to evaluate and implement, if suitable, an automated tool to speed their analysis process.

Automating parts of interconnection studies would allow developers to quickly understand the real cost of a new generation project, allowing them to quickly evaluate feasibility. It would create more cost certainty for projects and would also help identify locations where planned projects have the potential to reduce interconnection costs, attracting still more projects to share new interconnections. Conversely, the capability would also quickly identify when new projects in an area would exceed expected grid capacity and increase the costs for all projects. Ultimately, the automation would lead to more capacity on the grid faster and at a lower cost as developers optimize their investments.

Recommendation 6. Provide funding to DOE to extend the use of NREL’s AI-compiled permitting data to develop and model local regulations. The results could be used to promote standardization through national stakeholder groups.

As noted earlier, one of the biggest challenges in permitting is the complexity of varying and sometimes conflicting local regulations that a project must comply with. Several years ago, NREL, in support of the DOE Office of Policy, spent 1,500 staff hours to manually compile what was believed to be a complete list of local energy permitting ordinances across the country. In 2024, NREL used an LLM to compile the same information with a 90% success rate in a fraction of the time.

The House Energy and Water Development Committee should direct DOE to fund the continued development of the NREL permitting database and evaluate that information with an LLM to develop a set of model regulations that could be promoted to encourage standardization. Adoption of those regulations could be encouraged by policymakers and external organizations through engagement with the National Governors Association, the National Association of Counties, the United States Conference of Mayors, and other relevant stakeholders.

Local jurisdictions often adopt regulations based on a limited understanding of best practices and appropriate standards. A set of model regulations would guide local jurisdictions and reduce complexity for developers.

Conclusion

As demand on the electrical grid grows, the need to speed up the availability of new generation capacity on the grid becomes increasingly urgent. The deployment of new generation capacity is slowed by challenges related to site selection, environmental reviews, permitting, and interconnection costs and wait times. While much of the increasing demand for energy in the United States can be attributed to AI, AI can also be a powerful tool to help the nation meet that demand.

The six recommendations in this memo for using AI to speed up the process of bringing new power to the grid address all of those concerns. AI can be used to assist with site selection, analyze environmental regulations, help both regulators and the regulated community understand requirements, develop better regulations, streamline permitting processes, and reduce the time required for interconnection studies.

Frequently Asked Questions

How much power is currently in the interconnection queue, and how much capacity is expected to be added to the grid by these projects?

The combined generating capacity of the projects awaiting approval is about 1,900 gigawatts, excluding ERCOT and NYISO, which do not report this data. In comparison, the generating capacity of the U.S. grid as of Q4 2023 was 1,189 gigawatts. Even if the current high cancellation rate of 70% is maintained, the queue will yield an approximately 50% increase in the amount of power available on the grid through a $600B investment in US energy infrastructure.
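The arithmetic behind the "approximately 50%" figure can be checked with a short calculation. This is a minimal sketch that assumes the 70% cancellation rate applies to queued capacity rather than to the number of projects; the variable names are illustrative.

```python
# Figures quoted in the answer above.
queue_gw = 1900           # capacity waiting in interconnection queues
installed_gw = 1189       # U.S. generating capacity as of Q4 2023
cancellation_rate = 0.70  # share of queued capacity assumed never to be built

# Capacity that survives the queue, and its size relative to today's grid.
surviving_gw = queue_gw * (1 - cancellation_rate)
increase = surviving_gw / installed_gw

print(f"Capacity expected to reach the grid: {surviving_gw:.0f} GW")
print(f"Increase over installed capacity: {increase:.0%}")
```

Roughly 570 gigawatts would survive the queue, or about 48% of current capacity, which is consistent with the rounded figure in the text.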

How much new power needs to be added to the grid in the next five years, and what are the risks in achieving that goal?

FERC’s five-year growth forecast through 2029 predicts an increased demand for 128 gigawatts of power. In that context, the net addition of 15.1 gigawatts of power in the first half of 2024 suggests an increase of 150 gigawatts of power and little excess capacity over the five-year horizon. This forecast is predicated on the assumption that the power added to the grid does not decline, retirements do not increase, and the load forecast does not increase. All these estimates are being applied to a system where supply and demand are already so closely matched that FERC predicted supply shortages in several regions in the summer of 2024.
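The comparison between forecast demand and projected additions can be reproduced as follows. The sketch assumes, as the answer does, that the pace of net additions in the first half of 2024 holds steady for five years; the variable names are illustrative.

```python
# Figures quoted in the answer above.
half_year_net_addition_gw = 15.1  # net capacity added in the first half of 2024
forecast_demand_growth_gw = 128   # forecast demand growth through 2029
years = 5

# Annualize the half-year figure and extend it over the forecast horizon.
projected_additions_gw = half_year_net_addition_gw * 2 * years
margin_gw = projected_additions_gw - forecast_demand_growth_gw

print(f"Projected additions over {years} years: {projected_additions_gw:.0f} GW")
print(f"Margin over forecast demand growth: {margin_gw:.0f} GW")
```

The resulting margin of about 23 gigawatts is what the text describes as little excess capacity, and it disappears if additions slow, retirements rise, or the load forecast increases.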

Are there other causes of project delays in addition to permitting?

Construction delays and cost overruns can be an issue, but they are more frequently a factor in large projects such as nuclear plants and large oil and gas facilities, and are rarely a factor for wind and solar, which are factory-built and modular.

Will the Administration’s order to expedite approvals for energy projects solve this problem?

While the current administration has declared a National Energy Emergency to expedite approvals for energy projects, the order excludes wind, solar, and batteries, which make up 90% of the power presently in the interconnection queue, a mix that mirrors the capacity recently added to the grid. Therefore, the expedited permitting processes required by the administration apply to only 10% of the queue, composed of 7% natural gas and 3% that includes nuclear, oil, coal, hydrogen, and pumped hydro. Since solar, wind, and batteries are unlikely to be granted similar permitting relief, and relying on as-yet unplanned fossil fuel projects to bring more energy to the grid is not realistic, other methods must be undertaken to speed new power to the grid.

Transform Communities By Adaptive Reuse of Legacy Coal Infrastructure to Support AI Data Centers

The rise of artificial intelligence (AI) and the corresponding hyperscale data centers that support it present a challenge for the United States. Data centers intensify energy demand, strain power grids, and raise environmental concerns. These factors have led developers to search for new siting opportunities outside traditional corridors (i.e., regions with longstanding infrastructure and large clusters of data centers), such as Silicon Valley and Northern Virginia. American communities that have historically relied on coal to power their local economies have an enormous opportunity to repurpose abandoned coal mines and infrastructure to site data centers alongside clean power generation. The decline of the coal industry in the late 20th century led to the abandonment of coal mines, loss of tax revenues, destruction of good-paying jobs, and the dismantling of the economic engine of American coal communities, primarily in the Appalachian, interior, and Western coal regions. The AI boom of the 21st century can reinvigorate these areas if harnessed appropriately.

The opportunity to repurpose existing coal infrastructure includes Tribal Nations, such as the Navajo, Hopi, and Crow, in the Western Coal regions. These regions hold post-mining land with potential for economic development, but operate under distinct governance structures and regulatory frameworks administered by Tribal governments. A collaborative approach involving Federal, State, and Tribal governments can ensure that both non-tribal and Tribal coal regions share in the economic benefits of data center investments, while also promoting the transition to clean energy generation by collocating data centers with renewable, clean energy-powered microgrids.

This memo recommends four actions for coal communities to fully capitalize on the opportunities presented by the rise of artificial intelligence (AI).

Establish a Federal-State-Tribal Partnership for Site Selection, Utilizing the Department of the Interior’s (DOI) Abandoned Mine Land (AML) Program.
Develop a National Pilot Program to Facilitate a GIS-based Site Selection Tool.
Promote collaboration between states and utility companies to enhance grid resilience from data centers by adopting plug-in and flexible load standards.
Lay the groundwork for a knowledge economy centered around data centers.

By pursuing these policy actions, states like West Virginia, Pennsylvania, and Kentucky, as well as Tribal Nations, can lead America’s energy production and become tech innovation hubs, while ensuring that the U.S. continues to lead the AI race.

Challenge and Opportunity

Energy demands for AI data centers are expected to reach between 325 and 580 TWh by 2028, roughly the amount of electricity consumed by 30 to 54 million American households annually. This demand is projected to increase data centers’ share of total U.S. electricity consumption to between 6.7% and 12.0% by 2028, according to the 2024 United States Data Center Energy Usage Report by the Lawrence Berkeley National Lab. According to the same report, AI data centers also consumed around 66 billion liters of water for cooling in 2023. By 2028, that number is expected to be between 60 and 124 billion liters for hyperscale data centers alone. (Hyperscale data centers are massive warehouses of computer servers, powered by at least 40 MW of electricity, and run by major cloud companies like Amazon, Google, or Microsoft. They serve a wide variety of purposes, including artificial intelligence, automation, and data analytics.)
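The household comparison can be reproduced with a simple unit conversion. The sketch below assumes an average U.S. household uses about 10,800 kWh of electricity per year; that figure is an approximation introduced here for illustration and does not come from the report cited above.

```python
KWH_PER_TWH = 1_000_000_000      # 1 TWh = 1 billion kWh
HOUSEHOLD_KWH_PER_YEAR = 10_800  # assumed average annual household use (not from the report)


def household_equivalent(twh):
    """Number of households whose annual electricity use equals the given TWh."""
    return twh * KWH_PER_TWH / HOUSEHOLD_KWH_PER_YEAR


low_households = household_equivalent(325)
high_households = household_equivalent(580)

print(f"Low estimate: {low_households / 1e6:.0f} million households")
print(f"High estimate: {high_households / 1e6:.0f} million households")
```

Under that assumption, the 325 to 580 TWh range corresponds to roughly 30 to 54 million households, matching the comparison in the text.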

Future emissions are also expected to grow with increasing energy usage. Location has also become important; tech companies with AI investments have increasingly recognized the need for more data centers in different places. Although most digital activities are traditionally centered around tech corridors like Silicon Valley and Northern Virginia, the need for land and considerations of carbon emissions footprints in these places make the case for expansion to other sites.

Coal communities have experienced a severe economic decline over the past decade, as coal severance and tax revenues have plummeted. West Virginia, for example, reported an 83% decline in severance tax collections in fiscal year 2024. Competition from natural gas and renewable energy sources, slow growth in energy demand, and environmental concerns have led to coal often being viewed as a backup option. This has led to low demand for coal locally, and thus a decrease in severance, property, sales, and income taxes.

The percentage of the coal severance tax collected that is returned to the coal-producing counties varies by state. In West Virginia, the State Tax Commissioner collects coal severance taxes from all producing counties and deposits them in the State Treasurer’s office. Seventy-five percent of the net proceeds from the taxes are returned to the coal-producing counties, while the remaining 25% is distributed to the rest of the state. Historically, these tax revenues have funded a significant portion of county budgets. For counties like Boone County in West Virginia and Campbell County in Wyoming, once two of America’s highest coal-producing counties, these revenues helped maintain essential services and school districts. Property taxes and severance taxes on coal funded about 24% of Boone’s school budget, while 59% of overall property valuations in Campbell County in 2017 were coal mining related. With those tax bases eroding, these counties have struggled to maintain schools and public services.

Likewise, the closure of the Kayenta Mine and the Navajo Generating Station resulted in the elimination of hundreds of jobs and significant public revenue losses for the Navajo and Hopi Nations. The Crow Nation, like many other Native American tribes with coal, is reliant on coal leases with miners for revenue. They face urgent infrastructure gaps and declining fiscal capacity since their coal mines were shut down. These tribal communities, with a rich legacy of land and infrastructure, are well-positioned to lead equitable redevelopment efforts if they are supported appropriately by state and federal action.

These communities now have a unique opportunity to attract investments in AI data centers to generate new sources of revenue. Investments in hyperscale data centers will revive these towns through revenue from property taxes, land reclamation, and investments in energy, among other sources. For example, data centers in Northern Virginia, commonly referred to as “Data Center Alley,” have contributed an estimated 46,000 jobs and up to $10 billion in economic impact to the state’s economy, according to an economic impact report on data centers commissioned by the Northern Virginia Technology Council.

Coal powered local economies and served as the thread holding together the social fabric of communities in parts of Appalachia for decades. Coal-reliant communities also took pride in how coal powered most of the U.S.’s industrialization in the nineteenth century. However, many coal communities have been hollowed out, with thousands of abandoned coal mines and tens of thousands of lost jobs. By inviting investments in data centers and new clean energy generation, these communities can be economically revived. This time, their economies will be centered on a knowledge base, representing a shift from an extraction-based economy to an information-based one. Data centers attract new AI- and big-data-focused businesses, which reinvigorates the local workforce, inspires research programs at nearby academic institutions, and reverses the brain drain that has long impacted these communities.

The federal government has made targeted efforts to repurpose abandoned coal mines. The Abandoned Mine Land (AML) Reclamation Program, created under the Surface Mining Control and Reclamation Act (SMCRA) of 1977, reclaims lands affected by coal mining and stabilizes them for safe reuse. Building on that, Congress established the Abandoned Mine Land Economic Revitalization (AMLER) Program in 2016 to support the economic redevelopment of reclaimed sites in partnership with state and tribal governments. AMLER sites are eligible for flexible reuse for siting hyperscale AI data centers. Those with flat terrains and legacy infrastructure are particularly desirable for reuse. The AMLER program is supported by a fee collected from active coal mining operations – a fee that has decreased as coal mining operations have ceased – and has also received appropriated Congressional funding since 2016. Siting data centers on AMLER sites can circumvent any eminent domain concerns that arise with project proposals on private lands.

In addition to the legal and logistical advantages of siting data centers on AMLER sites, many of these locations offer more than just reclaimed land; they retain legacy infrastructure that can be strategically repurposed for other uses. These sites often lie near existing transmission corridors, rail lines, and industrial-grade access roads, which were initially built to support coal operations. This makes them especially attractive for rapid redevelopment, reducing the time and cost associated with building entirely new facilities. By capitalizing on this existing infrastructure, communities and investors can accelerate project timelines and reduce permitting delays, making AMLER sites not only legally feasible but economically and operationally advantageous.

Moreover, since some coal mines are built near power infrastructure, there exist opportunities for federal and state governments to allow companies to collocate data centers with renewable, clean energy-powered microgrids, thereby preventing strain on the power grid. These sites present an opportunity for data centers to:

Host local microgrids for energy load balancing and provide an opportunity for net metering;
Develop a model that identifies places across the United States and standardizes data center site selection;
Revitalize local economies and communities;
Invest in clean energy production; and,
Create a knowledge economy outside of tech corridors in the United States.

Precedents for collocating new data centers at existing power plants already exist. In February 2025, the Federal Energy Regulatory Commission (FERC) reviewed potential sites within the PJM Interconnection region to host these pairings. Furthermore, plans to repurpose decommissioned coal power stations as data centers exist in the United States and Europe. However, there remains an opportunity to utilize the reclaimed coal mines themselves. They provide a readily available location with proximity to existing transmission lines, substations, roadways, and water resources. Historically, they also have a power plant ecosystem and supporting infrastructure, meaning minimal additional infrastructure investment is needed to bring them up to par.

Plan of Action

The following recommendations will fast-track America’s investment in data centers and usher it into the next era of innovation. Collaboration among federal agencies, state governments, and tribal governments will enable the rapid construction of data centers in historically coal-reliant communities. Together, they will bring prosperity back to American communities left behind after the decline in the coal industry by investing in their energy capacities, economies, and workforce.

Recommendation 1. Establish a Federal-State-Tribal Partnership for Site Selection, Utilizing the Department of the Interior’s (DOI) Abandoned Mine Land (AML) Program.

The first step in investing in data centers in coal communities should be a collaborative effort among federal, state, and tribal governments to identify and develop data center pilot sites on reclaimed mine lands, brownfields, and tribal lands. The Environmental Protection Agency (EPA) and the Department of the Interior (DOI) should jointly identify eligible sites with intact or near-intact infrastructure, nearby energy generation facilities, and broadband corridors, utilizing the Abandoned Mine Land (AML) Reclamation Program and the EPA Brownfields Program. Brownfields with legacy infrastructure should also be prioritized to reduce the need for greenfield development. Where tribal governments have jurisdiction, they should be engaged as co-developers and beneficiaries of data centers, with the right to lead or co-manage the process, including receiving tax benefits from the project. Pre-law AMLs (coal mines that were abandoned before August 3, 1977, when the SMCRA became law) offer the most flexibility in regulations and should be prioritized.

State governments and lawmakers will nominate communities from the federally identified shortlist based on economic need, workforce readiness and mobility, and redevelopment plans.

Recommendation 2. Develop a National Pilot Program to Facilitate a GIS-based Site Selection Tool.

In partnership with private sector stakeholders, the DOE National Labs should develop a pilot program for these sites to inform the development of a standardized GIS-based site selection tool. This pilot would identify and evaluate a small set of pre-law AMLs, brownfields, and tribal lands across the Appalachian, Interior, and Western coal regions for data center development.

The pilot program will assess infrastructure readiness, permitting pathways, environmental conditions, and community engagement needs across all reclaimed lands and brownfields and choose those that meet the above standards for the pilot. Insights from these pilots will inform the development of a scalable tool that integrates data on grid access, broadband, water, land use, tax incentives, and workforce capacity.
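As a rough illustration of how such a tool might combine these data layers, the sketch below ranks candidate sites by a weighted sum of normalized scores. The criteria names mirror the layers listed above, but the weights, the 0-to-1 scoring scale, and the two example sites are hypothetical assumptions made for illustration; the memo does not prescribe a scoring method.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0). The memo names these data layers
# but does not assign weights; a real tool would set them with stakeholders.
WEIGHTS = {
    "grid_access": 0.25,
    "broadband": 0.20,
    "water": 0.15,
    "land_use": 0.15,
    "tax_incentives": 0.10,
    "workforce": 0.15,
}


@dataclass
class Site:
    name: str
    scores: dict  # criterion name -> normalized score in [0, 1]


def site_score(site, weights=WEIGHTS):
    """Weighted sum of normalized criterion scores; missing criteria count as 0."""
    return sum(w * site.scores.get(criterion, 0.0) for criterion, w in weights.items())


def rank_sites(sites, weights=WEIGHTS):
    """Return sites ordered from highest to lowest composite score."""
    return sorted(sites, key=lambda s: site_score(s, weights), reverse=True)


# Two made-up candidate sites with illustrative scores.
sites = [
    Site("Site A (pre-law AML)", {"grid_access": 0.9, "broadband": 0.6, "water": 0.8,
                                  "land_use": 0.7, "tax_incentives": 0.5, "workforce": 0.4}),
    Site("Site B (brownfield)", {"grid_access": 0.6, "broadband": 0.9, "water": 0.5,
                                 "land_use": 0.9, "tax_incentives": 0.8, "workforce": 0.8}),
]

for site in rank_sites(sites):
    print(f"{site.name}: {site_score(site):.2f}")
```

A weighted sum is only one possible design. A production tool would more likely apply hard exclusion screens first (for example, flood risk or permitting barriers) and score only the sites that pass.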

The GIS tool will equip governments, utilities, and developers with a reliable, replicable framework to identify high-potential data center locations nationwide. For example, the Geospatial Energy Mapper (GEM), developed by Argonne National Laboratory with support from the U.S. Department of Energy, offers a public-facing tool that integrates data on energy resources, infrastructure, land use, and environmental constraints to guide energy infrastructure siting.

The DOE, working in coordination with agencies such as the Department of the Treasury, the Department of the Interior, the Bureau of Indian Affairs, and state economic development offices, should establish targeted incentives to encourage data center companies to join the coalition. These include streamlined permitting, data confidentiality protections, and early access to pre-qualified sites. Data center developers, AI companies, and operators typically own the majority of the proprietary operational and siting data for data centers. Without incentives, this data will be restricted to private industry, hindering public-sector planning and increasing geographic inequities in digital infrastructure investments.

By leveraging the insights gained from this pilot and expanding access to critical siting data, the federal government can ensure that the benefits of AI infrastructure investments are distributed equitably, reaching communities that have historically powered the nation’s industrial growth but have been left behind in the digital economy. A national site selection tool grounded in real-world conditions, cross-agency coordination, and private-public collaboration will empower coal-impacted communities, including those on Tribal lands and in remote Appalachian and Western regions, to attract transformative investment. In doing so, it will lay the foundation for a more inclusive, resilient, and spatially diverse knowledge economy built on reclaimed land.

Recommendation 3. Promote collaboration between states and utility companies to enhance grid resilience from data centers by adopting plug-in and flexible load standards.

Given the urgency and scale of hyperscale data center investments, state governments, in coordination with Public Utility Commissions (PUCs), should adopt policies that allow temporary, curtailable, and plug-in access to the grid, pending the completion of colocated, preferably renewable, energy microgrids in proposed data centers. This plug-in could involve approving provisional interconnection services for large projects, such as data centers. This short-term access is critical for communities to realize immediate financial benefits from data center construction while long-term infrastructure is still being developed. Renewable-powered on-site microgrids for hyperscale data centers typically exceed 100–400 MW per site and require deployment times of up to three years.

To protect consumers, utilities and data center developers must guarantee that any interim grid usage does not raise electricity rates for households or small businesses. The data center and/or utility should bear responsibility for short-term demand impacts through negotiated agreements.

In exchange for interim grid access, data centers must submit detailed grid resilience plans that include:

A time-bound schedule (typically 18–36 months) for deploying an on-site microgrid, preferably powered by renewable energy.
On-site battery storage systems and demand response capabilities to smooth load profiles and enhance reliability.
Participation in net metering to enable excess microgrid energy to be sold back to the grid, benefiting local communities.

Additionally, these facilities should be treated as large, flexible loads capable of supporting grid stability by curtailing non-critical workloads or shifting demand during peak periods. Studies suggest that up to 126 GW of new data center load could be integrated into the U.S. power system with minimal strain if such facilities accept a curtailment rate as low as 1%, meaning that they reduce or pause electricity usage equivalent to 1% of their annual electricity consumption.
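To give a sense of scale for a 1% curtailment commitment, the short calculation below converts a curtailment rate into curtailed energy and equivalent full-load hours. The 100 MW facility size and the constant full-load operation are illustrative assumptions, not figures from the cited studies.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def annual_energy_mwh(load_mw, load_factor=1.0):
    """Annual consumption of a facility running at load_mw * load_factor on average."""
    return load_mw * load_factor * HOURS_PER_YEAR


def curtailed_energy_mwh(load_mw, curtailment_rate, load_factor=1.0):
    """Energy the facility forgoes per year at the given curtailment rate."""
    return annual_energy_mwh(load_mw, load_factor) * curtailment_rate


def equivalent_full_load_hours(load_mw, curtailment_rate, load_factor=1.0):
    """Curtailed energy expressed as hours of complete shutdown at full load."""
    return curtailed_energy_mwh(load_mw, curtailment_rate, load_factor) / load_mw


# Hypothetical 100 MW data center running at constant full load.
load_mw = 100
rate = 0.01
print(f"Curtailed energy: {curtailed_energy_mwh(load_mw, rate):,.0f} MWh per year")
print(f"Equivalent full-load hours: {equivalent_full_load_hours(load_mw, rate):.1f}")
```

Under these assumptions, 1% curtailment comes to about 8,760 MWh per year, the equivalent of roughly 87.6 hours at full load. In practice the reduction would likely be spread across many partial curtailments during peak periods rather than taken as full shutdowns.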

States can align near-term economic gains with long-term energy equity and infrastructure sustainability by requiring early commitment to microgrid deployment and positioning data centers as flexible grid assets (see FAQs for ideas on water cooling for the data centers).

Recommendation 4. Lay the groundwork for a knowledge economy centered around data centers.

The DOE Office of Critical and Emerging Technologies (CET), in coordination with the Economic Development Administration (EDA), should conduct an economic impact assessment of data center investments in coal-reliant communities. To ensure timely reporting and oversight, the Senate Committee on Energy and Natural Resources and the House Committee on Energy and Commerce should guide and shape the reports’ outcomes, building on President Donald Trump’s executive order to pass legislation on AI education. Investments in data centers offer knowledge economies as an alternative to extractive economies, which have relied on selling fossil fuels, such as coal, that have failed these communities for generations.

A workforce trained in high-skilled employment areas such as AI data engineering, data processing, cloud computing, advanced digital infrastructure, and cybersecurity can participate in the knowledge economy. The data center itself, along with new business ecosystems built around it, will provide these jobs.

Counties will also generate sustainable revenue through increased property taxes, utility taxes, and income taxes from the new businesses. This new revenue will replace the lost revenue from the decline in coal over the past decade. This strategic transformation positions formerly coal-dependent regions to compete in a national economy increasingly shaped by artificial intelligence, big data, and digital services.

This knowledge economy will also benefit nearby universities, colleges, and research institutes by creating research partnership opportunities, developing workforce pipelines through new degree and certificate programs, and fostering stronger innovation ecosystems built around digital infrastructure.

Conclusion

AI is growing rapidly, and data centers are following suit, straining our grid and requiring new infrastructure. Coal-reliant communities possess land and energy assets, and they have a pressing need for economic renewal. With innovative federal-state coordination, we can repurpose abandoned mine lands, boost local tax bases, and build a knowledge economy where coal once dominated. These two pressing challenges—grid strain and post-coal economic decline—can be addressed through a unified strategy: investing in data centers on reclaimed coal lands.

This memo outlines a four-part action plan. First, federal and state governments must collaborate to prepare abandoned mine lands for data center development. Second, while working with private industry, DOE National Labs should develop a standardized, GIS-based site selection tool to guide smart, sustainable investments. Third, states should partner with utilities to allow temporary grid access to data centers, while requiring detailed microgrid-based resilience plans to reduce long-term strain. Fourth, policymakers must lay the foundation for a knowledge economy by assessing the economic impact of these investments, fostering partnerships with local universities, and training a workforce equipped for high-skilled roles in digital infrastructure.

This is not just an energy strategy but also a sustainable economic revitalization strategy. It will transform coal assets that once fueled America’s innovation in the 19th century into assets that will fuel America’s innovation in the 21st century. The energy demands of data centers will not wait; the economic revitalization of Appalachian communities, heartland coal communities, and the Mountain West coal regions cannot wait. The time to act is now.

Frequently Asked Questions

What is an example of a coal mine reclaimed for data center use?

There is no direct example yet of data center companies reclaiming former coal mines. However, some examples show the potential. For instance, plans are underway to transform an abandoned coal mine in Wise County, Virginia, into a solar power station that will supply a nearby data center.

Why collocate energy generation with data centers?

Numerous examples from the U.S. and abroad exist of tech companies collocating data centers with energy-generating facilities to manage their energy supply and reduce their carbon footprint. Meta signed a long-term power-purchase agreement with Sage Geosystems for 150 MW of next-generation geothermal power in 2024, enough to run multiple hyperscale data centers. The project’s first phase is slated for 2027 and will be located east of the Rocky Mountains, near Meta’s U.S. data center fleet.

Internationally, Facebook built its Danish data center into a district heating system, utilizing the heat generated to supply more than 7,000 homes during the winter. Two wind energy projects power this data center with 294 MW of clean energy.

Are there examples of data centers as anchors for a knowledge economy?

Yes! Virginia, especially Northern Virginia, is a leading hub for data centers, attracting significant investment and fostering a robust tech ecosystem. In 2023, new and expanding data centers accounted for 92% of all new investment announced by the Virginia Economic Development Partnership. This growth supports over 78,000 jobs and has generated $31.4 billion in economic output, a clear sign of the job creation potential of the tech industry. Data centers have attracted supporting industries, including manufacturing facilities for data center equipment and energy monitoring products, further bolstering the state’s knowledge economy.

Why are some AMLER-eligible sites less valuable than post-1977 mine sites for reuse?

AMLER funds are federally restricted to use on or adjacent to coal mines abandoned before August 3, 1977. However, some of these pre-1977 sites—especially in Appalachia and the West—are not ideal for economic redevelopment due to small size, steep slopes, or flood risk. In contrast, post-1977 mine sites that have completed reclamation (SMCRA Phase III release) are more suitable for data centers due to their flat terrain, proximity to transmission lines, and existing utilities. Yet, these sites are not currently eligible for AMLER funding. To fully unlock the economic potential of coal communities, federal policymakers should consider expanding AMLER eligibility or creating a complementary program that supports the reuse of reclaimed post-1977 mine lands, particularly those that are already prepared for industrial use.

Why do Brownfields make sense for data centers?

Brownfields are previously used industrial or commercial properties, such as old factories, decommissioned coal-fired power plants, rail yards, and mines, whose reuse is complicated by real or suspected environmental contamination. By contrast, Greenfields are undeveloped acreage that typically requires the development of new infrastructure and land permitting from scratch. Brownfields offer land developers and investors faster access to existing zoning, permitting, transportation infrastructure, and more.

Since 1995, the EPA Brownfields Program has offered competitive grants and revolving loan funds for assessing, cleaning up, and training for jobs at Brownfield sites, transforming liabilities into readily available assets. A study estimated that every federal dollar spent by the EPA in 2018 leveraged approximately $16.86 in follow-on capital and created 8.6 jobs for every $100,000 of grant money. In 2024, the Agency added another $300 million to accelerate projects in disadvantaged communities.

What federal action is needed to situate data centers on public lands?

In early 2025, the U.S. Department of Energy (DOE) issued a Request for Information (RFI) seeking input on siting artificial intelligence and data infrastructure on DOE-managed federal lands, including National Labs and decommissioned sites. This effort reflects growing federal interest in repurposing publicly-owned sites to support AI infrastructure and grid modernization. Like the approach recommended in this memo, the RFI process recognizes the need for multi-level coordination involving federal, state, tribal, and local governments to assess land readiness, streamline permitting, and align infrastructure development with community needs. Lessons from that process can help guide broader efforts to repurpose pre-law AMLs, brownfields, and tribal lands for data center investment.

Can we use flooded mines for server cooling?

Yes, by turning a flooded mine into a giant underground cooler. Abandoned seams in West Virginia hold water that remains at a steady temperature of ~50–55°F (10–13°C). A Marshall University study logged 54°F mine-pool temperatures and calculated that closed-loop heat exchangers can reduce cooling power enough to achieve paybacks in under five years. The design lifts the cool mine water to the servers in the data centers, absorbs heat from the servers, and then returns the warmed water underground, so the computer hardware never comes into contact with raw mine water. The approach is already being commercialized: Virginia’s “Data Center Ridge” project secured $3 million in AMLER funds, plus $1.5 million from DOE, to cool 36 MW blocks with up to 10 billion gallons of mine water held below 55°F.

Moving Beyond Pilot Programs to Codify and Expand Continuous AI Benchmarking in Testing and Evaluation

Rapid and advanced AI integration and diffusion within the Department of Defense (DoD) and other government agencies has emerged as a critical national security priority. This convergence of rapid AI advancement and DoD prioritization creates an urgent need to ensure that AI models integrated into defense operations are reliable, safe, and mission-enhancing. For this purpose, the DoD must deploy and expand one of the most critical tools available within its Testing and Evaluation (T&E) process: benchmarking, the structured practice of applying shared tasks and metrics to compare models, track progress, and expose performance gaps.

A standardized AI benchmarking framework is critical for delivering uniform, mission-aligned evaluations across the DoD. Despite the importance of such benchmarks, the DoD currently lacks standardized, enforceable AI safety benchmarks, especially for open-ended or adaptive use cases. A shift from ad hoc to structured assessments will support more informed, trusted, and effective procurement decisions.

Particularly at the acquisition stage for AI models, rapid DoD acquisition platforms such as Tradewinds can serve as the policy vehicle for enabling more robust benchmarking efforts. This can be done by establishing a federally coordinated benchmarking hub, spearheaded jointly by the Chief Digital and Artificial Intelligence Office (CDAO) and the Defense Innovation Unit (DIU), in consultation with the newly established Chief AI Officers Council (CAIOC) of the White House Office of Management and Budget (OMB).

Challenge and Opportunity

Experts at the intersection of AI and defense, such as retired Lieutenant General John (Jack) N.T. Shanahan, have emphasized the profound impact of AI on the way the United States will fight future wars, with the character of war continuously reshaped by AI’s diffusion across all domains. The DoD is committed to remaining at the forefront of these changes: between 2022 and 2023, the value of federal AI contracts increased by over 1200%, with the surge driven by increases in DoD spending. Secretary of Defense Pete Hegseth has pledged increased investment in AI specifically for military modernization efforts, and has tasked the Army to implement AI in command and control across the theater, corps, and division headquarters by 2027, further underscoring AI’s transformative impact on modern warfare.

Strategic competitors—especially the People’s Republic of China—are rapidly integrating AI into their military and technological systems. The Chinese Communist Party views AI-enabled science and technology as central to accelerating military modernization and achieving global leadership. At this pivotal moment, the DoD is pushing to adopt advanced AI across operations to preserve the U.S. edge in military and national security applications. Yet, accelerating too quickly without proper safeguards risks exposing vulnerabilities adversaries could exploit.

With the DoD at a unique inflection point, it must balance the rapid adoption and integration of AI into its operations with the need for oversight and safety. DoD needs AI systems that consistently meet clearly defined performance standards set by acquisition authorities, operate strictly within the scope of their intended use, and do not exhibit unanticipated or erratic behaviors under operational conditions. These systems can deliver measurable value to mission outcomes while fostering trust and confidence among human operators through predictability, transparency, and alignment with mission-specific requirements.

AI benchmarks are standardized tasks and metrics that systematically measure a model’s performance, reliability, and safety, and have increasingly been adopted as a key measurement tool by the AI industry. Currently, DoD lacks standardized, comprehensive AI safety benchmarks, especially for open-ended or adaptive use cases. Without these benchmarks, the DoD risks acquiring models that underperform, deviate from mission requirements, or introduce avoidable vulnerabilities, leading to increased operational risk, reduced mission effectiveness, and costly contract revisions.

A recent report from the Center for a New American Security (CNAS) on best practices for AI T&E outlined that the rapid and unpredictable pace of AI advancement presents distinctive challenges for both policymakers and end-users. The accelerating pace of adoption and innovation heightens both the urgency and complexity of establishing effective AI benchmarks to ensure acquired models meet the mission-specific performance standards required by the DoD and the services.

The DoD faces particularly outsized risk, as its unique operational demands can expose AI models to extreme conditions where performance may degrade. For example, under adversarial conditions, or when encountering data that differs from its training data, an AI model may behave unpredictably, posing heightened risk to the mission. Robust evaluations, such as those offered through benchmarking, help to identify points of failure or harmful model capabilities before they become apparent during critical use cases. By measuring model performance in scenarios and environments that reflect real-world use, we increase understanding of attack surface vulnerabilities to adversarial inputs. We can identify inaccurate or over-confident outputs, and recognize potential failures in edge cases and extreme scenarios (including those beyond training parameters). Moreover, we improve human-AI performance and trust factors, and avoid unintended capabilities. Benchmarking helps to surface these issues early.

Robust AI benchmarking frameworks can enhance U.S. leadership by shaping international norms for military AI safety, improving acquisition efficiency by screening out underperforming systems, and surfacing unintended or high-risk model behaviors before deployment. Furthermore, benchmarking enables AI performance to be quantified in alignment with mission needs, using guidance from the CDAO RAI Toolkit and clear acquisition parameters to support decision-making for both procurement officers and warfighters. Given the DoD’s high-risk use cases and unique mission requirements, robust benchmarking is even more essential than in the commercial sector.

The DoD now has an opportunity to formalize AI safety benchmark frameworks within its Testing and Evaluation (T&E) processes, tailored to both dual-use and defense-specific applications. T&E is already embedded in DoD culture, offering a strong foundation for expanding benchmarking. Public-private AI testing initiatives, such as the DoD collaboration with Scale AI to create effective T&E (including through benchmarking) for AI models, show both promise and existing motivation for such initiatives. Yet, critical policy gaps still exist. With pilot programs underway, the DoD can move beyond vendor-led or ad hoc evaluations to introduce DoD-led testing, assess mission-specific capabilities, launch post-acquisition benchmarking, and develop human-AI team metrics. The widely used Tradewinds platform offers an existing vehicle to integrate these enhanced benchmarks without reinventing the wheel.

To implement robust benchmarking at DoD, this memo proposes the following policy recommendations, to be coordinated by the DoD Chief Digital and Artificial Intelligence Office (CDAO):

Expanding on existing benchmarking efforts
Standardizing AI safety thresholds during the procurement cycle
Implementing benchmarking during the lifecycle of the model
Establishing a benchmarking repository
Enabling adversarial stress testing, or “red-teaming”, prior to deployment to close current benchmarking gaps for DoD AI use cases

Plan of Action

The CDAO should launch a formalized AI Benchmarking Initiative, moving beyond current vendor-led pilot programs, while continuing to refine its private industry initiatives. This effort should be comprehensive and collaborative in nature, leveraging internal technical expertise. This includes the newly established coordinating bodies on AI such as the Chief AI Officers Council, which can help to ensure that DoD benchmarking practices are aligned with federal priorities, and the Defense Innovation Unit, which can serve as a bridge and coordinator between private industry and the national defense sector in these efforts. Specifically, the CDAO should integrate benchmarking into the acquisition pipeline. This will establish ongoing benchmarking practices that facilitate continuous model performance evaluation throughout the model lifecycle.

Policy Recommendations

Recommendation 1. Establish a Standardized Defense AI Benchmarking Initiative and create a Centralized Repository of Benchmarks

The DoD should build on lessons learned from its partnership with Scale AI (and others) developing benchmarks specifically for defense use cases. This should expand into a standardized, agency-wide framework.

This recommendation is in line with findings outlined by RAND, which calls for developing a comprehensive framework for robust evaluation and emphasizes the need for collaborative practices and measurable metrics for model performance.

The DoD should incorporate the following recommendations and government entities to achieve this goal:

Develop a Whole-of-Government Approach to AI Benchmarking

Develop and expand on existing pilot benchmarking frameworks, similar to Massive Multitask Language Understanding (MMLU) but tailored to military-relevant tasks and DoD-specific use cases.
Expand the $10 million T&E and research budget by $10 million, with allocations specifically for bolstering internal benchmarking capabilities. One crucial piece is identifying and recruiting technically capable talent to aid in developing internal benchmarking guidelines. As AI models advance, new “reasoning” models with advanced capabilities become far costlier to benchmark, and the DoD must plan for these future demands now. Part of this allocation can come from the $500 million allocated for the combatant command AI budgets. This monetary allocation is critical to successfully implementing this policy because benchmarking more advanced models, such as OpenAI’s o3, can cost millions. This modest budgetary increase is a starting point for moving beyond piecemeal and ad hoc benchmarking to a comprehensive and standardized process. This funding increase would facilitate:
- Development of and expansion of internal and customized benchmarking capabilities
- Recruitment and retention of technical talent
- Development of simulation environment for more mission-relevant benchmarks

If internal reallocations from the $500 million allocation prove insufficient or unviable, Congressional approval for additional funds can be another funding source. Given the strategic importance of AI in defense, such requests can readily find bipartisan support, particularly when tied to operational success and risk mitigation.

Create a centralized AI benchmarking repository under the CDAO. This will standardize categories, performance metrics, mission alignment, and lessons learned across defense-specific use cases. This repository will enable consistent tracking of model performance over time, support analysis across model iterations, and allow for benchmarking transferability across similar operational scenarios. By compiling performance data at scale, the repository will also help identify interoperability risks and system-level vulnerabilities—particularly how different AI models may behave when integrated—thereby enhancing the DoD’s ability to assess, document, and mitigate potential performance and safety failures.
Convene a partnership, organized by OMB, between the CDAO, the DIU and the CAIOC, to jointly establish and maintain a centralized benchmarking repository. While many CAIOC members represent civilian agencies, their involvement is crucial: numerous departments (such as the Department of Homeland Security, the Department of Energy, and the National Institute of Standards and Technology) are already employing AI in high-stakes contexts and bring relevant technical expertise, safety frameworks, and risk management policies. Incorporating these perspectives ensures that DoD benchmarking practices are not developed in isolation but reflect best practices across the federal government. This partnership will leverage the DIU’s insights on emerging private-sector technologies, the CDAO’s acquisition and policy authorities, and CAIOC’s alignment with broader executive branch priorities, thereby ensuring that benchmarking practices are technically sound, risk-informed, and consistent with government-wide standards and priorities for trustworthy, safe, and reliable AI.
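To make the repository concept more concrete, the sketch below shows one possible record structure and a query that tracks a model's performance across versions. Every field name, the in-memory storage, and the example entries are hypothetical assumptions made for illustration; this memo does not specify a schema, and a real repository would need access controls and classification handling.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class BenchmarkRecord:
    """One benchmark result. All field names are illustrative assumptions."""
    model_id: str
    model_version: str
    benchmark: str      # name of the benchmark task
    category: str       # standardized category, e.g. "imagery" or "logistics"
    metric: str         # e.g. "accuracy" or "p95_latency_ms"
    value: float
    run_date: date
    lessons_learned: str = ""


@dataclass
class BenchmarkRepository:
    """Minimal in-memory store used only to demonstrate the idea."""
    records: list = field(default_factory=list)

    def add(self, record):
        self.records.append(record)

    def history(self, model_id, benchmark, metric):
        """Chronological results for one model on one benchmark and metric."""
        matches = [
            r for r in self.records
            if r.model_id == model_id and r.benchmark == benchmark and r.metric == metric
        ]
        return sorted(matches, key=lambda r: r.run_date)


# Made-up entries for a hypothetical model and benchmark.
repo = BenchmarkRepository()
repo.add(BenchmarkRecord("model-x", "2.0", "imagery-id", "imagery", "accuracy", 0.91, date(2025, 6, 1)))
repo.add(BenchmarkRecord("model-x", "1.0", "imagery-id", "imagery", "accuracy", 0.84, date(2025, 1, 15)))

for record in repo.history("model-x", "imagery-id", "accuracy"):
    print(record.run_date, record.model_version, record.value)
```

Storing each result as a uniform record is what allows comparisons across model iterations and across similar operational scenarios, which is the tracking function described above.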

Recommendation 2. Formalize Pre-Deployment Benchmarking for AI Models at the Acquisition Stage

The key to meaningful benchmarking lies in integrating it at the pre-award stage of procurement. The DoD should establish a formal process that:

Integrates benchmarking into existing AI acquisition platforms, such as Tradewinds, and embeds it within the T&E process.
Requires participation from third-party vendors in benchmarking the products they propose for DoD acquisition and use.
Embeds internal adversarial stress testing, or “red-teaming”, into AI benchmarking to ensure more realistic, mission-aligned evaluations that account for adversarial threats and the unique, high-risk operating environments the military faces. By leveraging its internal expertise in mission context, classified threat models, and domain-specific edge cases that external vendors are unlikely to fully replicate, the DoD can produce a more comprehensive and defense-relevant assessment of AI system safety, efficacy, and suitability for deployment. Specifically, this policy memo recommends that the AI Rapid Capabilities Cell (AI RCC) be tasked with carrying out the red-teaming, as a technically qualified element of the CDAO.
Ensures procurement officers understand the value of incorporating benchmarking performance metrics into their contract award decision-making. This can be done by hosting benchmarking workshops for procurement officers that present benchmarking results for the models in the acquisition pipeline and guide officers on how to apply these metrics to their own performance requirements and guidelines.

Recommendation 3. Contextualize Benchmarking into Operational Environments

Current efforts to scale and integrate AI reflect the distinct operational realities of the DoD and military services. Scale AI, in partnership with the DoD, Anduril, Microsoft, and the CDAO, is developing AI-powered solutions focused on the United States Indo-Pacific Command (INDOPACOM) and United States European Command (EUCOM). With these AI solutions focused on regional commands, it makes sense to create equally focused benchmarking standards to test AI model performance in specific environments and under unique conditions. In fact, researchers have been identifying the limits of traditional AI benchmarking and making the case for bespoke, holistic, and use-case-relevant benchmark development. This is vital because as AI models advance, they introduce entirely new capabilities that require more robust testing and evaluation. For example, large language models, which have introduced new functionalities including natural language querying and multimodal search interfaces, require entirely new benchmarks that measure natural language understanding, modal integration accuracy, context retention, and result usefulness. In the same vein, DoD-relevant benchmarks must be developed in an operationally relevant context. This can be achieved by:

Developing simulation environments for benchmarking that are mission-specific across a broader set of domains, including technical and regional commands, to test AI models under specific conditions which are likely to be encountered by users in unique, contested, and/or adversarial environments. The Bipartisan House Task Force on Artificial Intelligence report provides useful guidance on AI model functionality, reliability, and safety in operating in contested, denied, and degraded environments.
Prioritizing use-case-specific benchmarks over broad commercial metrics by incorporating user feedback and identifying tailored risk scenarios that more accurately measure model performance.
Introducing context relevant benchmarks to measure performance in specific, DoD-relevant scenarios, such as:
- Task-specific accuracy (e.g., correct identification in satellite imagery cases)
- Alignment with context-specific rules of engagement
- Instances of degraded performance under high-stress conditions
- Susceptibility to adversarial manipulation (e.g., data poisoning)
- Latency in high-risk, fast-paced decision-making scenarios
Creating post-deployment benchmarking to ensure ongoing performance and risk compliance, and to detect and address issues like model drift, a phenomenon where model performance degrades over time. As there is no established consensus on how often continuous model benchmarking should be performed, the DoD should study the appropriate practical, risk-informed timelines for re-evaluating deployed systems.
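Several of the measures listed above can be expressed as simple computations. The sketch below illustrates three of them: task-specific accuracy, the accuracy lost under adversarial inputs, and tail latency, along with a basic drift check for post-deployment re-evaluation. The labels, latency values, and the 0.05 drift tolerance are hypothetical assumptions chosen for illustration, not DoD thresholds.

```python
import math


def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def drift_detected(baseline_acc, current_acc, tolerance=0.05):
    """Flag a deployed model whose accuracy fell more than `tolerance` below baseline.

    The default tolerance is an illustrative assumption.
    """
    return (baseline_acc - current_acc) > tolerance


# Made-up evaluation data for a hypothetical imagery identification task.
labels = ["tank", "truck", "tank", "none", "truck"]
clean_preds = ["tank", "truck", "tank", "none", "tank"]        # 4 of 5 correct
adversarial_preds = ["tank", "none", "truck", "none", "tank"]  # 2 of 5 correct
latencies_ms = [120, 95, 310, 150, 101]

clean_acc = accuracy(clean_preds, labels)
adv_acc = accuracy(adversarial_preds, labels)
print(f"Task accuracy: {clean_acc:.2f}")
print(f"Accuracy lost under adversarial inputs: {clean_acc - adv_acc:.2f}")
print(f"95th percentile latency: {percentile(latencies_ms, 95)} ms")
print(f"Drift flagged: {drift_detected(baseline_acc=clean_acc, current_acc=0.70)}")
```

Measures such as alignment with rules of engagement do not reduce to a single formula and would require scenario-based evaluation by subject-matter experts.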

Frameworks such as Holistic Evaluation of Language Models (HELM) and Focused LLM Ability Skills and Knowledge (FLASK) can offer valuable guidance for developing LLM-focused benchmarks within the DoD, by enabling more comprehensive evaluations based on specific model skill sets, use-case scenarios, and tailored performance metrics.

Recommendation 4. Integration of Human-in-the-Loop Benchmarking

An additional layer of AI benchmarking for safe and effective AI diffusion into the DoD ecosystem is evaluating AI-human team performance and measuring user trust, perceptions, and confidence in various AI models. “Human-in-the-loop” systems require a person to approve or adjust the AI’s decision before action, while “human-on-the-loop” systems allow autonomous operation but keep a person supervising and ready to intervene. Both are critical components of the DoD and military approach to AI, and both require continued human oversight of ethical and safety considerations over AI-enabled capabilities with national security implications. A recent MIT study found that there are surprising performance gaps between AI-only, human-only, and AI-human teams. For the DoD particularly, it is important to measure these performance gaps effectively across the various AI models it plans to integrate into its operations, given its heavy reliance on user-AI teams.

A CNAS report on effective T&E for AI spotlighted the DARPA Air Combat Evolution (ACE) program, which sought autonomous air‑combat agents needing minimal human intervention. Expert test pilots could override the system, yet often did so prematurely, distrusting its unfamiliar tactics. This case underscores the need for early, extensive benchmarks that test user capacity, surface trust gaps that can cripple human‑AI teams, and assure operators that models meet legal and ethical standards. Accordingly, this memo urges expanding benchmarking beyond pure model performance to AI‑human team evaluations in high‑risk national‑security, lethal, or error‑sensitive environments.
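
One way to make AI-human team evaluation concrete is to score the same trials three ways (AI alone, human alone, and the team) and to track how often operators override a system that was in fact correct. The sketch below is a hypothetical illustration. The `Trial` record, its field names, and the metric names are assumptions made for this example and are not drawn from the CNAS report or the ACE program.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    ai_correct: bool     # would the AI acting alone have been right?
    human_correct: bool  # would the operator acting alone have been right?
    team_correct: bool   # was the combined human-AI outcome right?
    overridden: bool     # did the operator override the AI?


def team_metrics(trials):
    """Summarize AI-only, human-only, and team accuracy for a set of trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    n = len(trials)
    ai = sum(t.ai_correct for t in trials) / n
    human = sum(t.human_correct for t in trials) / n
    team = sum(t.team_correct for t in trials) / n
    overrides = [t for t in trials if t.overridden]
    # Overrides of a correct AI are a rough proxy for a trust gap.
    unneeded = sum(t.ai_correct for t in overrides)
    return {
        "ai_only": ai,
        "human_only": human,
        "team": team,
        # Negative values mean the team did worse than its best member alone.
        "team_gap": team - max(ai, human),
        "unneeded_override_rate": unneeded / len(overrides) if overrides else 0.0,
    }


trials = [
    Trial(True, True, True, False),
    Trial(True, False, False, True),   # operator overrode a correct AI
    Trial(False, True, True, True),    # operator correctly overrode the AI
    Trial(True, True, True, False),
]
print(team_metrics(trials))
```

A high unneeded-override rate alongside a flat or negative team gap would suggest that distrust, rather than model quality, is limiting team performance.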

Conclusion

The Department of Defense is racing to integrate AI across every domain of warfare, yet speed without safety will jeopardize mission success and national security. Standardized, acquisition‑integrated, continuous, and mission‑specific benchmarking is therefore not a luxury—it is the backbone of responsible AI deployment. Current pilot programs with private partners are encouraging starts, but they remain too ad hoc and narrow to match the scale and tempo of modern AI development.

Benchmarking must begin at the pre‑award acquisition stage and follow systems through their entire lifecycle, detecting risks, performance drift, and adversarial vulnerabilities before they threaten operations. As the DARPA ACE program showed, early testing of human‑AI teams and rigorous red‑teaming surface trust gaps and hidden failure modes that vendor‑led evaluations often miss. Because AI models—and enemy capabilities—evolve constantly, our evaluation methods must evolve just as quickly.

By institutionalizing robust benchmarks under CDAO leadership, in concert with the Defense Innovation Unit and the Chief AI Officers Council, the DoD can set world‑class standards for military AI safety while accelerating reliable procurement. Ultimately, AI benchmarking is not a hurdle to innovation and acquisition; it is the infrastructure that can make rapid acquisition more reliable and innovation more viable. The DoD cannot afford to deploy AI systems that are risky, unreliable, ineffective, or misaligned with mission needs and standards in high-risk operational environments. At this inflection point, the choice is not between speed and safety but between ungoverned acceleration and a calculated momentum that allows our strategic AI advantage to be both sustained and secured.

This memo was written by an AI Safety Policy Entrepreneurship Fellow over the course of a six-month, part-time program that supports individuals in advancing their policy ideas into practice. You can read more policy memos and learn about Policy Entrepreneurship Fellows here.

What is the Scale AI benchmarking pilot program at DoD, and why and how does this policy proposal build on this initiative?

The Scale AI benchmarking initiative, launched in February 2024 in partnership with the DoD, is a pilot framework designed to evaluate the performance of AI models intended for defense and national security applications. It is part of the broader effort to create a framework for T&E of AI models for the CDAO.

This memo builds on that foundation by:

Formalizing benchmarking as a standard requirement at the procurement stage across DoD acquisition processes.

Inserting benchmarking protocols into rapid acquisition platforms like Tradewinds.

Establishing a defense-specific benchmarking repository and enabling red-teaming led by the AI Rapid Capabilities Cell (AI RCC) within the CDAO.

Shifting the lead on benchmarking from vendor-enabled to internally developed, led, and implemented, creating bespoke evaluation criteria tailored to specific mission needs.

What types of AI systems will these benchmarks apply to, and how will they be tailored for national security use cases?

The proposed benchmarking framework will apply to a diverse range of AI systems, including:

Decision-making and command and control support tools (sensors, target recognition, process automation, and tools involved in natural language processing).

Generative models for planning, logistics, intelligence, or data generation.

Autonomous agents, such as drones and robotic systems.

Benchmarks will be theater and context-specific, reflecting real-world environments (e.g. contested INDOPACOM scenarios), end-user roles (human-AI teaming in combat), and mission-specific risk factors such as adversarial interference and model drift.

How will this benchmarking framework approach open-source or non-proprietary AI models intended for DoD use?

Open-source models present distinct challenges due to model ownership and origin, additional possible exposure to data poisoning, and downstream user manipulation. However, their generally greater transparency and the potential access to training data could make open-source models less challenging to put through rigorous T&E.

This memo recommends:

Applying standardized evaluation criteria across both open-source and proprietary models which can be developed by utilizing the AI benchmarking repository and applying model evaluations based on possible use cases of the model.

Incorporating benchmarking to test possible areas of vulnerability for downstream user manipulation.

Measuring the transparency of training data.

Performing adversarial testing to assess resilience against manipulated inputs via red-teaming.

Logging the open-source model performance in the proposed centralized repository, enabling ongoing monitoring for drift and other issues.

Why is red-teaming a necessity in addition to AI benchmarking, and how will it be executed?

Red-teaming implements adversarial stress-testing (which can be more robust and operationally relevant if led by an internal team as this memo proposes), and can identify vulnerabilities and unintended capabilities before deployment. Internally led red-teaming, in particular, is critical for evaluating models intended for use in unpredictable or hostile environments.

How will red-teaming be executed?

To effectively employ the red-teaming efforts, this policy recommends that:

The AI Rapid Capabilities Cell within the CDAO should lead red-teaming operations, leveraging the team’s technical capabilities with its experience and mission set to integrate and rapidly scale AI at the speed of relevance — delivering usable capability fast enough to affect current operations and decision cycles.

Internal, technically skilled teams should be created who are capable of incorporating classified threat models and edge-case scenarios.

Red-teaming should focus on simulating realistic mission conditions, and searching for specific model capabilities, going beyond generic or vendor-supplied test cases.

How does this benchmarking framework improve acquisition decisions and reduce risks?

Integrating benchmarking at the acquisition stage enables procurement officers to:

Compare models on mission-relevant, standardized performance metrics and ensure that there is evidence of measurable performance metrics which align with their own “vision of success” procurement requirements for the models.

Identify and avoid models with unsafe, misaligned, unverified, or ineffective capabilities.

Prevent cost overruns or contract revisions.

Benchmarking workshops for acquisition officers can further equip them with the skills to interpret benchmark results and apply them to their operational requirements.
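
As an illustration of how standardized results could feed a procurement comparison, the sketch below screens candidate models against minimum thresholds and ranks the remainder by a weighted score. Everything in it is hypothetical: the function `screen_and_rank`, the metric names, the thresholds, the weights, and the vendor scores are assumptions for this example, not an existing DoD or Tradewinds process.

```python
def screen_and_rank(candidates, minimums, weights):
    """Drop candidates below any minimum, then rank the rest by weighted score.

    candidates: {model_name: {metric_name: score between 0 and 1}}
    minimums:   {metric_name: lowest acceptable score}
    weights:    {metric_name: relative importance for ranking}
    A metric missing from a candidate's results is treated as 0.0.
    """
    passing = {}
    for name, scores in candidates.items():
        meets_floor = all(
            scores.get(metric, 0.0) >= floor for metric, floor in minimums.items()
        )
        if meets_floor:
            passing[name] = sum(
                weight * scores.get(metric, 0.0) for metric, weight in weights.items()
            )
    return sorted(passing.items(), key=lambda item: item[1], reverse=True)


# Hypothetical benchmark results for three candidate models.
candidates = {
    "vendor_a": {"accuracy": 0.90, "adversarial_robustness": 0.50},
    "vendor_b": {"accuracy": 0.80, "adversarial_robustness": 0.80},
    "vendor_c": {"accuracy": 0.85, "adversarial_robustness": 0.90},
}
minimums = {"adversarial_robustness": 0.70}
weights = {"accuracy": 0.5, "adversarial_robustness": 0.5}

for name, score in screen_and_rank(candidates, minimums, weights):
    print(name, round(score, 3))
```

Screening before ranking reflects the memo's point that unsafe or unverified models should be excluded outright rather than traded off against strong performance elsewhere.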

Develop a Risk Assessment Framework for AI Integration into Nuclear Weapons Command, Control, and Communications Systems

As the United States overhauls nearly every element of its strategic nuclear forces, artificial intelligence is set to play a larger role—initially in early‑warning sensors and decision‑support tools, and likely in other mission areas. Improved detection could strengthen deterrence, but only if accompanying hazards—automation bias, model hallucinations, exploitable software vulnerabilities, and the risk of eroding assured second‑strike capability—are well managed.

To ensure responsible AI integration, the Office of the Assistant Secretary of Defense for Nuclear Deterrence, Chemical, and Biological Defense Policy and Programs (OASD (ND-CBD)), the U.S. Strategic Command (STRATCOM), the Defense Advanced Research Projects Agency (DARPA), the Office of the Undersecretary of Defense for Policy (OUSD(P)), and the National Nuclear Security Administration (NNSA), should jointly develop a standardized AI risk-assessment framework guidance document, with implementation led by the Department of Defense’s Chief Digital and Artificial Intelligence Office (CDAO) and STRATCOM. Furthermore, DARPA and CDAO should join the Nuclear Weapons Council to ensure AI-related risks are systematically evaluated alongside traditional nuclear modernization decisions.

Challenge and Opportunity

The United States is replacing or modernizing nearly every component of its strategic nuclear forces, estimated to cost at least $1.7 trillion over the next 30 years. This includes its:

Intercontinental ballistic missiles (ICBMs)
Ballistic missile submarines and their submarine-launched ballistic missiles (SLBMs)
Strategic bombers, cruise missiles, and gravity bombs
Nuclear warhead production and plutonium pit fabrication facilities

Simultaneously, artificial intelligence (AI) capabilities are rapidly advancing and being applied across the national security enterprise, including nuclear weapons stockpile stewardship and some components of command, control, and communications (NC3) systems, which encompass early warning, decision-making, and force deployment components.

The NNSA, responsible for stockpile stewardship, is increasingly integrating AI into its work. This includes using AI for advanced modeling and simulation of nuclear warheads (for example, creating a digital twin of existing weapons systems to analyze aging and performance issues) and to accelerate the lifecycle of nuclear weapons development. Furthermore, NNSA is leading some aspects of the safety testing and systematic evaluations of frontier AI models on behalf of the U.S. government, with a specific focus on assessing nuclear and radiological risk.

Within the NC3 architecture, a complex “system of systems” with over 200 components, simpler forms of AI are already being used in areas including early‑warning sensors, and may be applied to decision‑support tools and other subsystems as confidence and capability grow. General Anthony J. Cotton—who leads STRATCOM, the combatant command that directs America’s global nuclear forces and their command‑and‑control network—told a 2024 conference that STRATCOM is “exploring all possible technologies, techniques, and methods” to modernize NC3. Advanced AI and data‑analytics tools, he said, can sharpen decision‑making, fuse nuclear and conventional operations, speed data‑sharing with allies, and thus strengthen deterrence. General Cotton added that research must also map the cascading risks, emergent behaviors, and unintended pathways that AI could introduce into nuclear decision processes.

Thus, from stockpile stewardship to NC3 systems, AI is likely to be integrated across multiple nuclear capabilities, some potentially stabilizing, others potentially highly destabilizing. For example, on the stabilizing side, AI could enhance early warning systems by processing large volumes of satellite, radar, and other signals intelligence, thus providing more time to decision-makers. On the destabilizing side, AI-enabled detection or tracking of other countries’ nuclear forces could trigger an expansionary arms race if countries doubt the credibility of their second-strike capability. Furthermore, countries may misinterpret each other’s nuclear deterrence doctrines or have no means of verifying that humans remain in control of each other’s nuclear weapons.

While several public research reports have been conducted on how AI integration into NC3 could upset the balance of strategic stability, less research has focused on the fundamental challenges with AI systems themselves that must be accounted for in any risk framework. Per the National Institute of Standards and Technology’s (NIST) AI Risk Management Framework, several fundamental AI challenges at a technical level must be accounted for in the integration of AI into stockpile stewardship and NC3.

Not all AI applications within the nuclear enterprise carry the same level of risk. For example, using AI to model warhead aging in stockpile stewardship is largely internal to the Department of Energy (DOE) and involves less operational risk. Despite lower risk, there is still potential for an insufficiently secure model to lead to leaked technical data about nuclear weapons.

However, integrating AI into decision support systems or early warning functions within NC3 introduces significantly higher stakes. These systems require time-sensitive, high-consequence judgments, and AI integration in this context raises serious concerns about issues including confabulations, human-AI interactions, and information security:

Confabulations: A phenomenon in which generative AI (GAI) systems generate and confidently present erroneous or false content in response to user inputs, or prompts. These phenomena are colloquially also referred to as “hallucinations” or “fabrications”, and could have particularly dangerous consequences in high-stakes settings.

Human-AI Interactions: Due to the complexity and human-like nature of GAI technology, humans may over-rely on GAI systems or may unjustifiably perceive GAI content to be of higher quality than that produced by other sources. This phenomenon is an example of automation bias or excessive deference to automated systems. This deference can lead to a shift from a human making the final decision (“human in the loop”), to a human merely observing AI generated decisions (“human on the loop”). Automation bias therefore risks exacerbating other risks of GAI systems as it can lead to humans maintaining insufficient oversight.

Information Security: AI expands the cyberattack surface of NC3. Poisoned AI training data and tampered code can embed backdoors, and, once deployed, prompt‑injection or adversarial examples can hijack AI decision tools, distort early‑warning analytics, or leak secret data. The opacity of large AI models can let these exploits spread unnoticed, and as models become more complex, they will be harder to debug.

This is not an exhaustive list of issues with AI systems; however, it highlights several key areas that must be managed. A risk framework must account for these distinctions and apply stricter oversight where system failure could have direct consequences for escalation or deterrence credibility. Without such a framework, it will be challenging to harness the benefits AI has to offer.

Plan of Action

Recommendation 1. OASD (ND-CBD), STRATCOM, DARPA, OUSD(P), and NNSA should develop a standardized risk assessment framework guidance document to evaluate the integration of artificial intelligence into nuclear stockpile stewardship and NC3 systems.

This framework would enable systematic evaluation of risks, including confabulations, human-AI configuration, and information security, across modernization efforts. The framework could assess the extent to which an AI model is prone to confabulations, involving performance evaluations (or “benchmarking”) under a wide range of realistic conditions. While there are public measurements for confabulations, it is essential to evaluate AI systems on data relevant to the deployment circumstances, which could involve highly sensitive military information.
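
A confabulation benchmark of the kind described above could report results separately for each test condition, so that a model that is reliable in nominal settings but not in degraded ones is not masked by an overall average. The sketch below is a hypothetical illustration. It assumes each test item records whether the model abstained and whether its answer matched a reference, and it counts a confabulation as an answer that was given and was wrong. The record layout and condition names are assumptions for this example.

```python
from collections import defaultdict


def confabulation_rate_by_condition(records):
    """Return {condition: share of items answered incorrectly without abstaining}.

    records: iterable of (condition, abstained, correct) tuples.
    An abstention is not counted as a confabulation, even though it is
    not a correct answer either.
    """
    totals = defaultdict(int)
    confabulated = defaultdict(int)
    for condition, abstained, correct in records:
        totals[condition] += 1
        if not abstained and not correct:
            confabulated[condition] += 1
    return {condition: confabulated[condition] / totals[condition] for condition in totals}


# Hypothetical evaluation records under two test conditions.
records = [
    ("nominal", False, True),
    ("nominal", False, True),
    ("nominal", True, False),          # model declined to answer
    ("nominal", False, False),         # answered, and was wrong
    ("degraded_sensor", False, False),
    ("degraded_sensor", False, True),
]
print(confabulation_rate_by_condition(records))
```

Deciding what counts as "correct" against sensitive, deployment-relevant data is the hard part and is outside the scope of this sketch.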

Additionally, the framework could assess human-AI configuration with specific focus on risks from automation bias and the degree of human oversight. For these tests, it is important to put the AI systems in contact with human operators in situations that are as close to real deployment as possible, for example when operators are tired, distracted, or under pressure.

Finally, the framework could include assessments of information security under extreme conditions. This should include simulating comprehensive adversarial attacks (or “red-teaming”) to understand how the AI system and its human operators behave when subject to a range of known attacks on AI systems.

NNSA should be included in this development due to its mission ownership of stockpile stewardship and nuclear safety, and its leadership in advanced modeling and simulation capabilities. DARPA should be included due to its role as the cutting-edge research and development agency, extensive experience in AI red-teaming, and understanding of the AI vulnerabilities landscape. STRATCOM must be included as the operational commander of NC3 systems, to ensure the framework accounts for real-world needs and escalation risks. OASD (ND-CBD) should be involved given the office’s responsibilities to oversee nuclear modernization and coordinate across the interagency. OUSD(P) should be included to provide strategic oversight and ensure the risk assessment aligns with broader defense policy objectives and international commitments.

Recommendation 2. CDAO should implement the Risk Assessment Framework with STRATCOM

While NNSA, DARPA, OASD (ND-CBD) and STRATCOM can jointly create the risk assessment framework, CDAO and STRATCOM should serve as the implementation leads for utilizing the framework. Given that the CDAO is already responsible for AI assurance, testing and evaluation, and algorithmic oversight, it would be well positioned to work with relevant stakeholders to support implementation of the technical assessment. STRATCOM would have the strongest understanding of the operational contexts in which to apply the framework. NNSA and DARPA could therefore advise on the framework’s AI-related technical underpinnings, while the CDAO would prioritize operational governance and compliance, ensuring that clear risk assessments are completed and understood when considering integration of AI into nuclear-related defense systems.

Recommendation 3. DARPA and CDAO should join the Nuclear Weapons Council

Given their roles in the creation and implementation of the AI risk assessment framework, stakeholders from both DARPA and the CDAO should be incorporated into the Nuclear Weapons Council (NWC), either as full members or as attendees of a subcommittee. As the joint DOE and DoD interagency body responsible for sustaining and modernizing the U.S. nuclear deterrent, the NWC endorses military requirements, approves trade-offs, and ensures alignment between DoD delivery systems and NNSA weapons.

As AI capabilities become increasingly embedded in nuclear weapons stewardship, NC3 systems, and broader force modernization, the NWC must be equipped to evaluate associated risks and technological implications. Currently, the NWC is composed of senior officials from the Department of Defense, the Joint Chiefs of Staff, and the Department of Energy, including the NNSA. While these entities bring deep domain expertise in nuclear policy, military operations, and weapons production, the Council lacks additional representation focused on AI.

DARPA’s inclusion would ensure that early-stage technology developments and red-teaming insights are considered upstream in decision-making. Likewise, CDAO’s presence would provide continuity in AI assurance, testing, and digital system governance across operational defense components. Their participation would enhance the Council’s ability to address new categories of risk, such as model confabulation, automation bias, and adversarial manipulation of AI systems, that are not traditionally covered by existing nuclear stakeholders. By incorporating DARPA and CDAO, the NWC would be better positioned to make informed decisions that reflect both traditional nuclear considerations and the rapidly evolving technological landscape that increasingly shapes them.

Conclusion

While AI is likely to be integrated into components of the U.S. nuclear enterprise, without a standardized initial approach to assessing and managing AI-specific risk, including confabulations, automation bias, and novel cybersecurity threats, this integration could undermine an effective deterrent. A risk assessment framework coordinated by OASD (ND-CBD), with STRATCOM, NNSA and DARPA, and implemented with support of the CDAO, could provide a starting point for NWC decisions and assessments of the alignment between DoD delivery system needs, the NNSA stockpile, and NC3 systems.

Frequently Asked Questions

Does the NWC have the authority to create a new subcommittee including DARPA and the CDAO?

Yes, NWC subordinate organizations or subcommittees are not codified in Title 10 USC §179, so the NWC has the flexibility to create, merge, or abolish organizations and subcommittees as needed.

Are there existing regulations that the United States has declared with respect to AI integration into NC3?

Section 1638 of the FY2025 National Defense Authorization Act established a Statement of Policy emphasizing that any use of AI in support of strategic deterrence should not compromise “the principle of requiring positive human actions in execution of decisions by the President with respect to the employment of nuclear weapons.” However, as this memo describes, AI presents further challenges beyond keeping a human in the loop for decision-making.