Bringing Transparency to Federal R&D Infrastructure Costs

There is an urgent need to manage the escalating costs of federal R&D infrastructure and the increasing risk that failing facilities pose to the scientific missions of the federal research enterprise.  Many of the laboratories and research support facilities operating under the federal research umbrella are near or beyond their life expectancy, creating significant safety hazards for federal workers and local communities.  Unfortunately, the nature of the federal budget process forces agencies into a position where the actual cost of operations is not transparent in agency budget requests to OMB before becoming further obscured to appropriators, leading to potential appropriations disasters (including an approximately 60% cut to National Institute of Standards and Technology (NIST) facilities in 2024 after the agency’s challenges became newsworthy).  Providing both Congress and OMB with a complete accounting of the actual costs of agency facilities may break the gamification of budget requests and help the government prioritize infrastructure investments.

Challenge and Opportunity 

Recent reports by the National Research Council and the National Science and Technology Council, including the congressionally mandated Quadrennial Science and Technology Review, have highlighted the dire state of federal facilities.  Maintenance backlogs have ballooned in recent years, forcing some agencies to shut down research activities in strategic R&D domains, including Antarctic research and standards development.  At NIST, failing steam pipes, electrical systems, and black mold have caused facility outages that reduced research productivity by 10-40 percent.  NASA and NIST have both reported that their maintenance backlogs now exceed $3 billion. The Department of Defense forecasts that bringing its buildings up to modern standards would cost approximately $7 billion, “putting the military at risk of losing its technological superiority.”  The shutdown of many Antarctic science operations and the collapse of the Arecibo Observatory stand in stark contrast with the People’s Republic of China opening rival, more capable facilities in both research domains.  In the late 2010s, Senate staffers were often forced to call national laboratories directly to ask what it would actually cost for the country to fully fund a particular large science activity.

This memo does not suggest that the government should continue to fund old or outdated facilities; merely that there is a significant opportunity for appropriators to understand the actual cost of our legacy research and development ecosystem, initially ramped up during the Cold War.  Agencies should be able to provide a straight answer to Congress about what it would cost to operate their inventory of facilities.  Likewise, Congress should be able to decide which facilities should be kept open, where placing a facility on life support is acceptable, and which facilities should be shut down.  The cost of maintaining facilities should also be transparent to the Office of Management and Budget so examiners can help the President make prudent decisions about the direction of the federal budget.

The National Science and Technology Council’s mandated research and development infrastructure report to Congress is a poor delivery vehicle.  As coauthors of the 2024 research infrastructure report, we can attest to the pressure within the White House to provide a positive narrative about the current state of play, as well as OMB’s reluctance to suggest, outside the budget process, that additional funding is needed to maintain our inventory of facilities.  It would be much easier for agencies that already have a sense of what it costs to maintain their operations to provide that information directly to appropriators (as opposed to a sanitized White House report to an authorizing committee that may or may not have jurisdiction over all the agencies covered in the report), assuming there is even an Assistant Director for Research Infrastructure serving in OSTP to complete the America COMPETES mandate.  Current government employees suggest that the Trump Administration intends to discontinue the Research and Development Infrastructure Subcommittee.

Agencies may be concerned that providing such cost transparency to Congress could result in greater micromanagement over which facilities receive which investments.  Given the relevance of these facilities to their localities (including both economic benefits and environmental and safety concerns) and the role that legacy facilities can play in training new generations of scientists, this is a matter that deserves public debate.  In our experience, the wider range of factors considered by appropriations staff is relevant to investment decisions.  Further, accountability for macro-level budget decisions should ultimately fall on decisionmakers who choose whether or not to prioritize investments in both our scientific leadership and the health and safety of the federal workforce and nearby communities.  Today, that burden falls mostly on facilities managers forced to make agonizing choices in extremely resource-constrained environments.

Plan of Action 

Recommendation 1: Appropriations committees should require agencies to submit annual reports on the actual cost of completed facilities modernization, operations, and maintenance, including utility distribution systems.

Transparency is the only way that Congress and OMB can get a grip on the actual cost of running our legacy research infrastructure.  Agencies should report annually to the relevant appropriators on the actual cost of facilities operations and maintenance.  Other costs that should be accounted for include obligations to international facilities (such as ITER) and facilities and collections paid for by grants (such as the scientific collections that support the bioeconomy). Transparent accounting of facilities costs against what an administration chooses to prioritize in the annual President’s Budget Request may help foster meaningful dialogue between agencies, examiners, and appropriations staff.

The reports from agencies should describe the work done in each building and the impact of disruption.  Using NIST as an example, the Radiation Physics Building (still without the funding to complete its renovation) is crucial to national security and the medical community. If it were to go down (or away), every medical device in the United States that uses radiation would be decertified within six months, creating a significant single point of failure that cannot be quickly mitigated. Identifying such functions may also reveal duplicate efforts across agencies.

The costs of utility systems should be included because of the broad impacts that supporting infrastructure failures can have on facility operations. At NIST’s headquarters campus in Maryland, the entire underground utility distribution system is beyond its designed lifespan and suffering constant failures. The Central Utility Plant (CUP), which produces steam and chilled water for the campus, is in a similar state. Forensic testing of failed pipes and components indicates the CUP’s steam distribution system will reach complete end of life within a decade, potentially as soon as 2030. If work does not start within the next year (by early 2026), the system is likely to fail.  That would mean a complete loss of heat and temperature control on the campus, which is particularly concerning given the sensitivity of modern experiments and calibrations to changes in heat and humidity.  Less than a decade ago, NASA was forced to delay the launch of a satellite after NIST’s steam system was down for a few weeks and calibrations required for the satellite could not be completed.

Given the varying business models for infrastructure across the federal government, standardization of accounting and costs may be too great a lift, particularly for agencies that own and operate their own facilities (government owned, government operated, or GOGOs) compared with federally funded research and development centers (FFRDCs) operated by companies and universities (government owned, contractor operated, or GOCOs).

These reports should privilege modernization efforts, which, according to former federal facilities managers, should account for 80-90 percent of facility revitalization while also delivering new capabilities that help our national labs maintain (and often re-establish) their world-leading status.  The reports would also serve as a de facto facilities inventory, allowing appropriators to de-conflict investments as necessary.

It would be far easier for agencies to simply provide Congress and OMB an itemized list of each of their facilities, the current maintenance backlog, and projected costs for the next fiscal year at the time of the annual budget submission to OMB.  This should include the total cost of operating facilities, projected maintenance costs, and any costs needed to bring a federal facility up to relevant safety and environmental codes (many are not up to code).  To foster public trust, these reports should include an assessment of systems that are particularly at risk of failure, the risk to the agency’s operations, and the impact on surrounding communities, federal workers, and organizations that use those laboratories.  Fatalities and incidents that affect local communities, particularly at laboratories intended to improve public safety, are not an acceptable cost of doing business.  These reports should be made public (except for those details necessary to protect classified activities).
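To make the ask concrete, the sketch below shows one way an itemized facility line item could be structured for machine-readable submission to Congress and OMB. The field names and roll-up logic are illustrative assumptions, not an existing federal reporting format.

```python
from dataclasses import dataclass, field

@dataclass
class FacilityLineItem:
    """One facility's entry in an annual cost-transparency report (illustrative only)."""
    facility: str
    agency: str
    operating_cost: float            # annual operations cost, dollars
    projected_maintenance: float     # next fiscal year, dollars
    deferred_backlog: float          # accumulated deferred maintenance, dollars
    code_compliance_cost: float      # cost to meet safety/environmental codes, dollars
    at_risk_systems: list[str] = field(default_factory=list)
    mission_impact_if_lost: str = ""

def agency_totals(items: list[FacilityLineItem]) -> dict[str, float]:
    """Roll individual line items up to an agency-level total for appropriators."""
    totals: dict[str, float] = {}
    for item in items:
        totals[item.agency] = totals.get(item.agency, 0.0) + (
            item.operating_cost + item.projected_maintenance + item.code_compliance_cost
        )
    return totals
```

A standard structure of this kind is what would let OMB examiners and appropriations staff compare facility costs across agencies without bespoke data calls.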

Recommendation 2: Congress should revisit the idea of a special building fund from the General Services Administration (GSA) from which agencies can draw loans for revitalization.

During the first Trump Administration, Congress considered the establishment of a special building fund at GSA from which agencies could draw loans at very low interest (covering only the staff time of the GSA officials managing the program).  This would allow agencies to address urgent or emergency needs that arise outside the regular appropriations cycle.  The approach has already been validated by the Government Accountability Office for certain facilities, which found that “Access to full, upfront funding for large federal capital projects—whether acquisition, construction, or renovation—could save time and money.”  Major international scientific organizations that operate large facilities, including CERN (the European Organization for Nuclear Research), have a similar ability to take loans to pay for repairs, maintenance, or budget shortfalls, which helps them maintain financial stability and reduces the risk of escalating costs from deferred maintenance.

Up-front funding for major projects, enabled by access to GSA loans, can also reduce expenditures in the long run.  In the current budget environment, it is not uncommon for the cost of major investments to double due to inflation and piecemeal execution.  In 2010, NIST proposed a renovation of its facilities in Boulder with an expected cost of $76 million.  The project, which is still not complete today, is now estimated to cost more than $450 million due to a phased approach unsupported by appropriations.  Productivity losses from delayed construction (or from waiting for appropriations) can have compounding effects on industries that depend on access to certain capabilities and harm American competitiveness, as described in the previous recommendation.
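As a rough illustration of how deferral alone compounds costs (setting aside scope changes and phasing overhead), simple compound escalation at typical construction-inflation rates doubles a project’s price in about a decade. The snippet below is a back-of-the-envelope calculation with an assumed 7 percent escalation rate, not an account of the actual Boulder cost drivers.

```python
def escalated_cost(base_cost: float, annual_rate: float, years: int) -> float:
    """Project cost after deferral, assuming simple compound escalation."""
    return base_cost * (1 + annual_rate) ** years

# At an assumed 7% annual escalation, a deferred project roughly doubles in
# price in about a decade (rule of 72: 72 / 7 ≈ 10 years).
print(round(escalated_cost(76e6, 0.07, 10) / 1e6))  # ≈ 150, i.e. about double $76M
```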

Conclusion

As the 2024 RDI Report points out, “Being a science superpower carries the burden of supporting and maintaining the advanced underlying infrastructure that supports the research and development enterprise.” Without a transparent accounting of costs, it is impossible for Congress to make prudent decisions about the future of that enterprise. Requiring agencies to provide complete information to both Congress and OMB at the beginning of each year’s budget process offers the best chance of addressing this challenge.

A Certification System for Third Party Climate Models to Support Local Planning and Flood Resilience

As the impacts of climate change worsen and become salient to more communities across the country, state and local planners need access to robust and replicable predictive models in order to effectively plan for emergencies like extreme flooding. However, planning agencies often lack the resources to build these models themselves. And models developed by federal agencies are often built on outdated data and are limited in their interoperability. Many planners have therefore begun turning to private-sector providers of models they say offer higher quality and more up-to-date information. But access to these models can be prohibitively expensive, and many remain “black boxes” as these providers rarely open up their methods and underlying data.

The federal government can support more proactive, efficient, and cost-effective resiliency planning by certifying predictive models to validate and publicly indicate their quality. Additionally, Congress and the new Presidential Administration should protect funding at agencies like FEMA, NOAA, and the National Institute of Standards and Technology (NIST), which have faced budget shortfalls in recent years or are currently facing staffing reductions and proposed budget cuts, to support the collection and sharing of high-quality, up-to-date information. A certification system and clearinghouse would enable state and local governments to more easily discern the quality and robustness of a growing number of available climate models. Ultimately, such measures could increase cost-efficiencies and empower local communities by supporting more proactive planning and the mitigation of environmental disasters that are becoming more frequent and intense.

Challenge and Opportunity

The United States experienced an unprecedented hurricane season in 2024. Even as hurricanes continued to affect states like Texas, Louisiana, and Florida, the effects of hurricanes and other climate-fueled storms also expanded to new geographies—including inland and northern regions like Asheville, North Carolina and Burlington, Vermont. Our nation’s emergency response systems can no longer keep up—the Federal Emergency Management Agency (FEMA) spent nearly half the agency’s disaster relief fund within the first two weeks of the 2025 fiscal year. More must be done to support proactive planning and resilience measures at state and local levels. Robust climate and flooding models are critical to planners’ abilities to predict the possible impacts of storms, hurricanes, and flooding, and to inform infrastructure updates, funding prioritization, and communication strategies.

Developing useful climate models requires large volumes of data and considerable computational resources, as well as time and data science expertise, making it difficult for already-strapped state and local planning agencies to build their own. Many global climate models have proven to be highly accurate, but planners must often integrate more granular data for these to be useful at local levels. And while federal agencies like FEMA, the National Oceanic and Atmospheric Administration (NOAA), and the Army Corps of Engineers make their flooding and sea level rise models publicly available, these models have limited predictive capacity, and the datasets they are built on are often outdated or contain large gaps. For example, priority datasets, such as FEMA’s Flood Insurance Rate Maps (FIRMs) and floodplain maps, are notoriously out of date or do not integrate accurate information on local drainage systems, preventing meaningful and broad public use. A lack of coordination across government agencies at various levels, low data interoperability, and variations in data formats and standards also further prevent the productive integration of climate and flooding data into planning agencies’ models, even when data are available. Furthermore, recent White House directives to downsize agencies, freeze funding, and in some cases directly remove information from federal websites, have made some public climate datasets, including FEMA’s, inaccessible and put many more at risk.

A growing private-sector market has begun to produce highly granular flooding models, but these are often cost-prohibitive for state and local entities to access. In addition, these models tend to be black boxes: their underlying methods are rarely made public, making the models difficult or impossible to rigorously evaluate or reproduce. A 2023 article in the Arizona State Law Journal found widely varying levels of uncertainty involved in these models’ predictions and their application of different climate scenarios. And a report from the President’s Council of Advisors on Science and Technology also questioned the quality of these private industry models, and called on NOAA and FEMA to develop guidelines for measuring their accuracy.

To address these issues, public resources should be invested in enabling broader access to robust and replicable climate and flooding models through the establishment of a certification system and clearinghouse for models not developed by government agencies. Several realities make implementing this idea urgent. First, research predicts that even with aggressive and coordinated action, the impacts of hurricanes and other storms are likely to worsen (especially for already disadvantaged communities), as will the costs associated with their cleanup. A 2024 U.S. Chamber of Commerce report estimates that “every $1 spent on climate resilience and preparedness saves communities $13 in damages, cleanup costs, and economic impact,” potentially adding up to billions of dollars in savings across the country. Second, flooding data and models may need to be updated to accommodate not only new scientific information, but also updates to built infrastructure as states and municipalities continue to invest in infrastructure upgrades. Finally, government agencies at all levels, as well as private sector entities, are already responding to more frequent or intensified flooding events. These agencies, as well as researchers and community organizations, already hold a wealth of data and knowledge that, if effectively integrated into robust and accessible models, could help vulnerable communities plan for and mitigate the worst impacts of flooding.

Plan of Action

Congress should direct the National Institute of Standards and Technology (NIST) to establish a certification system or stamp of approval for predictive climate and weather models, starting with flood models. Additionally, Congress should support NIST’s capacity to build and maintain such a system, as well as that of the other agencies whose data are regularly integrated into climate models, including FEMA, NOAA, the Environmental Protection Agency (EPA), the National Aeronautics and Space Administration (NASA), the U.S. Geological Survey (USGS), and the Army Corps of Engineers. Congressional representatives can do this by imposing moratoria on Reductions in Force and opposing budget cuts imposed by the Department of Government Efficiency and the budget reconciliation process.

Following the publication of the Office of Science and Technology Policy’s Memorandum on “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research,” agencies that fund or conduct research are now required to update their open access policies by the end of 2025 to make all federally funded publications and data publicly accessible. While this may help open up agency models, it cannot compel private organizations to make their models open or less expensive to access. However, federal agencies can develop guidance, standards, and a certification system to make it easier for state and local agencies and organizations to navigate what’s been called the “Wild West of climate modeling.”

A robust certification system would require an understanding of both the technical capabilities of climate models and the modeling and data needs of resilience planners and floodplain managers. Within NIST, the Special Programs Office or the Information Technology Laboratory could work with non-governmental organizations that already convene these stakeholders to gather input on what a certification system should consider and communicate. For example, the Association of State Floodplain Managers, the American Flood Coalition, the American Society of Adaptation Professionals, and the American Geophysical Union are all well-positioned to reach researchers and planners across a range of geographies and capacities. Additionally, NIST could publish requests for information to source input more widely. Alternatively, NOAA’s National Weather Service or Office of Oceanic and Atmospheric Research could perform similar functions, though this would require concerted effort on Congress’s part to protect and finance the agency and its relevant offices. In the face of impending budget cuts, it would benefit NIST to consult with relevant NOAA offices and programs on the design, scope, and rollout of such a system.

Gathered input could be translated into a set of minimum requirements and nice-to-have features for models, indicating, for example, proven accuracy or robustness, the level of transparency of the underlying data or source code, how up to date the underlying data are, ease of use, or interoperability. The implementing agency could also look to other certification models, such as the Leadership in Energy and Environmental Design (LEED) rating system, which communicates a range of performance indicators for building design. Alternatively, because some of the aforementioned features would be challenging to assess in the short term, a simpler stamp-of-approval system could communicate that a model has met some minimum standard.
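The sketch below shows one way such criteria could be encoded as a machine-checkable rubric that separates minimum requirements from nice-to-have features. The specific criteria names and thresholds are assumptions for illustration, not NIST policy or a proposed standard.

```python
# Illustrative only: criteria names and thresholds are assumptions, not NIST policy.
MINIMUM_REQUIREMENTS = {
    "documented_methodology": lambda m: m.get("methods_published", False),
    "recent_underlying_data": lambda m: m.get("newest_data_year", 0) >= 2020,
    "validated_against_observations": lambda m: m.get("hindcast_skill_reported", False),
}

NICE_TO_HAVE = {
    "open_source_code": lambda m: m.get("source_available", False),
    "standard_formats": lambda m: m.get("exports_geotiff_or_netcdf", False),
    "uncertainty_quantified": lambda m: m.get("reports_uncertainty", False),
}

def certify(model_metadata: dict) -> dict:
    """Return a pass/fail stamp plus a score on optional criteria."""
    meets_minimum = all(check(model_metadata) for check in MINIMUM_REQUIREMENTS.values())
    optional_score = sum(check(model_metadata) for check in NICE_TO_HAVE.values())
    return {"certified": meets_minimum, "optional_criteria_met": optional_score}

# Example: a vendor model with published methods, 2023 data, and reported hindcast
# skill, but closed source, would be certified with 1 of 3 optional criteria met.
```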

Importantly, the design and maintenance of this system would be best led by a federal agency like NIST, rather than third-party actors, because NIST would be better positioned to coordinate efficiently with other agencies that collect and supply climate and other relevant data such as FEMA, USGS, EPA, and the Army Corps of Engineers. Moreover, there are likely to be cost efficiencies associated with integrating such a system into an existing agency program rather than establishing a new third-party organization whose long-term sustainability is not guaranteed. The fact that this system’s purpose would be to mediate trustworthy information and support the prevention of damage and harm to communities represented by the federal government also necessitates a higher level of accountability and oversight than a third-party organization could offer. 

NIST could additionally build and host a clearinghouse or database of replicable models and results, as well as relevant contact information to make it easy for users to find reliable models and communicate with their developers. Ideally information would be presented for technical experts and professionals, as well as non-specialists. Several federal agencies currently host clearinghouses for models, evidence, and interventions, including the Environmental Protection Agency, Department of Labor, and Department of Health and Human Services, among many others. NIST could look to these to inform the goals, design, and structure of a climate model clearinghouse.

Conclusion

Establishing an objective and widely recognized certification standard for climate and weather models would help actors both within and outside of government use a growing wealth of flooding and climate data for a variety of purposes. For example, state and local agencies could predict and plan for extreme flooding events more quickly, accurately, and efficiently, and prioritize infrastructure projects and spending. If successful, this idea could be adapted for other climate-related emergencies such as wildfire and extreme drought. Ultimately, public resources and data would be put to use to foster safer and more resilient communities across the country, potentially saving billions of dollars in damages, cleanup efforts, and other economic impacts.

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable, and safe future that we all hope for, whoever takes office in 2025 and beyond.

PLEASE NOTE (February 2025): Since publication several government websites have been taken offline. We apologize for any broken links to once accessible public data.

A National Institute for High-Reward Research

The policy discourse about high-risk, high-reward research has been too narrow. When that term is used, people usually mean DARPA-style moonshot initiatives with extremely ambitious goals. Given the overly conservative nature of most scientific funding, there is a fair appetite (and deservedly so) for creating new agencies like ARPA-H and other governmental and private analogues.

The “moonshot” definition, however, omits other types of high-risk, high-reward research that are just as important for the government to fund—perhaps even more so, because they are harder for anyone else to support or even to recognize in the first place.

Far too many scientific breakthroughs, including Nobel-winning discoveries, had trouble getting funded at the outset, most often because the researcher’s idea seemed irrelevant or fanciful at the time. For example, CRISPR was originally thought to be nothing more than a curiosity about bacterial defense mechanisms.

Perhaps ironically, the highest rewards in science often come from the unlikeliest places. Some of our “high reward” funding should therefore be focused on projects, fields, ideas, theories, etc. that are thought to be irrelevant, including ideas that have gotten turned down elsewhere because they are unlikely to “work.” The “risk” here isn’t necessarily technical risk, but the risk of being ignored.

Traditional funders are unlikely to create funding lines specifically for research that they themselves thought was irrelevant. Thus, we need a new agency that specializes in uncovering funding opportunities that were overlooked elsewhere. Judging from the history of scientific breakthroughs, the benefits could be quite substantial. 

Challenge and Opportunity

There are far too many cases where brilliant scientists had trouble getting their ideas funded or even faced significant opposition at the time. For just a few examples (there are many others): 

One could fill an entire book with nothing but these kinds of stories. 

Why do so many brilliant scientists struggle to get funding and support for their groundbreaking ideas? In many cases, it’s not because of any reason that a typical “high risk, high reward” research program would address. Instead, it’s because their research can be seen as irrelevant, too far removed from any practical application, or too contrary to whatever is currently trendy.

To make matters worse, the temptation for government funders is to opt for large-scale initiatives with a lofty goal like “curing cancer” or some goal that is equally ambitious but also equally unlikely to be accomplished by a top-down mandate. For example, the U.S. government announced a National Plan to Address Alzheimer’s Disease in 2012, and the original webpage promised to “prevent and effectively treat Alzheimer’s by 2025.” Billions have been spent over the past decade on this objective, but U.S. scientists are nowhere near preventing or treating Alzheimer’s yet. (Around October 2024, the webpage was updated and now aims to “address Alzheimer’s and related dementias through 2035.”)

The challenge is whether quirky, creative, seemingly irrelevant, contrarian science—which is where some of the most significant scientific breakthroughs originated—can survive in a world that is increasingly managed by large bureaucracies whose procedures don’t really have a place for that type of science, and by politicians eager to proclaim that they have launched an ambitious goal-driven initiative.

The answer that I propose: Create an agency whose sole raison d’être is to fund scientific research that other agencies won’t fund—not for reasons of basic competence, of course, but because the research wasn’t fashionable or relevant.

The benefits of such an approach wouldn’t be seen immediately. The whole point is to allocate money to a broad portfolio of scientific projects, some of which would fail miserably but some of which would have the potential to create the kind of breakthroughs that, by definition, are unpredictable in advance. This plan would therefore require a modicum of patience on the part of policymakers. But over the longer term, it would likely lead to a number of unforeseeable breakthroughs that would make the rest of the program worth it.

Plan of Action

The federal government needs to establish a new National Institute for High-Reward Research (NIHRR) as a stand-alone agency, not tied to the National Institutes of Health or the National Science Foundation. The NIHRR would be empowered to fund the potentially high-reward research that goes overlooked elsewhere. More specifically, the aim would be to cast a wide net for: 

NIHRR should be funded at, say, $100 million per year as a starting point ($1 billion would be better). This is an admittedly ambitious proposal: it would mean increasing federal scientific and R&D expenditure by that amount, or else reassigning existing funding (which would be politically unpopular).  But it is a worthy objective and, indeed, should be seen as a starting point.

Significant stakeholders with an interest in a new NIHRR would obviously include universities and scholars who currently struggle for scientific funding. In a way, that stacks the deck against the idea, because the most politically powerful institutions and individuals might oppose anything that tampers with the status quo of how research funding is allocated. Nonetheless, there may be a number of high-status individuals (e.g., current Nobel winners) who would be willing to support this idea as something that would have aided their earlier work. 

A new fund like this would also provide fertile ground for metascience experiments and other types of studies. Consider the striking fact that, as yet, there is virtually no rigorous empirical evidence on the relative strengths and weaknesses of top-down, strategically driven scientific funding versus funding that is more open to seemingly irrelevant, curiosity-driven research. With a new program for the latter, we could begin comparing the results of that funding with those of equally situated researchers funded through the regular pathways.

Moreover, a common metascience proposal in recent years is to use a limited lottery to distribute funding, on the grounds that some funding is fairly random anyway and we might as well make it official. One possibility would be for part of the new program to be disbursed by lottery among researchers who meet a minimum bar of quality and respectability and who score highly enough on “scientific novelty.” One could also imagine developing an algorithm to make an initial assessment. We could then compare the results of lottery-based funding versus decisions made by program officers versus algorithmic recommendations, as sketched below.
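A minimal sketch of how such a three-arm comparison could be set up is shown below. It assumes each proposal has already been screened for a minimum quality bar and scored for novelty; the field names and cutoff are hypothetical.

```python
import random

def assign_funding_arms(proposals: list[dict], novelty_cutoff: float = 0.7, seed: int = 0) -> dict:
    """Randomly assign qualifying proposals to one of three selection mechanisms
    so their downstream outcomes can later be compared (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the assignment is auditable
    qualifying = [
        p for p in proposals
        if p["meets_minimum_bar"] and p["novelty"] >= novelty_cutoff
    ]
    arms = {"lottery": [], "program_officer": [], "algorithm": []}
    for proposal in qualifying:
        arms[rng.choice(list(arms))].append(proposal)
    return arms
```

Tracking the funded projects in each arm over a decade or more would give the kind of head-to-head evidence about selection mechanisms that currently does not exist.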

Conclusion

A new line of funding like the National Institute for High-Reward Research (NIHRR) could drive innovation and exploration by funding the potentially high-reward research that goes overlooked elsewhere. This would elevate worthy projects with unknown outcomes so that unfashionable or unpopular ideas can be explored. Funding these projects would have the added benefit of offering many opportunities to build in metascience studies from the outset, which is easier than retrofitting projects later. 

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS.

Frequently Asked Questions (FAQs)
Won’t this type of program end up funding a lot of scientific projects that fizzle out and don’t work?

Absolutely, but that is also true for the current top-down approach of announcing lofty initiatives to “cure Alzheimer’s” and the like. Beyond that, the whole point of a true “high-risk, high-reward” research program should be to fund a large number of ideas that don’t pan out. If most research projects succeed, then it wasn’t a “high-risk” program after all.

What if the program funds research projects that are easily mocked by politicians as irrelevant or silly?

Again, that would be a sign of potential success. Many of history’s greatest breakthroughs were mocked for those exact reasons at the time. And yes, some of the research will indeed be irrelevant or silly. That’s part of the bargain here. You cannot minimize both Type I and Type II errors (that is, false positives and false negatives) at the same time. If we want to open the door to more research that would previously have been rejected on overly stringent grounds, then we also open the door to research that would have been correctly rejected on those grounds. That’s the price of being open to unpredictable breakthroughs.

How will we evaluate the success of such a research program?

How to evaluate success is a sticking point here, as it is for most of science. The traditional metrics (citations, patents, etc.) would likely be misleading, at least in the short-term. Indeed, as discussed above, there are cases where enormous breakthroughs took a few decades to be fully appreciated.


One simple metric in the shorter term would be something like this: “How often do researchers send in progress reports saying that they have been tackling a difficult question, and that they haven’t yet found the answer?” Instead of constantly promising and delivering success (which is often achieved by studying marginal questions and/or exaggerating results), scientists should be incentivized to honestly report on their failures and struggles.

Digital Product Passports: Transforming America’s Linear Economy to Combat Waste, Counterfeits, and Supply Chain Vulnerabilities

The U.S. economy is being held back by outdated, linear supply chains that waste valuable materials, expose businesses to counterfeits, and limit consumer choice. American companies lose billions each year to fraudulent goods—everything from fake pharmaceuticals to faulty electronics—while consumers are left in the dark about what they’re buying. At the same time, global disruptions like the COVID-19 pandemic revealed just how fragile and opaque our supply chains really are, especially in critical industries. Without greater transparency and accountability, the U.S. economy will remain vulnerable to these risks, stifling growth and innovation while perpetuating inequities and environmental harm. 

A shift toward more circular, transparent systems would not only reduce waste and increase efficiency, but also unlock new business models, strengthen supply chain resilience, and give consumers better, more reliable information about the products they choose. Digital Product Passports (DPPs) – standardized digital records that contain key information about a product’s origin, materials, lifecycle, and authenticity – are a key tool that will help the United States achieve these goals.

The administration should establish a comprehensive Digital Product Passport Initiative that creates the legal, technical, and organizational frameworks for businesses to implement decentralized digital passports for their products while ensuring consumer ownership rights, supply chain integrity, and international interoperability. The plan should consider which entities provide up-front investment until the benefits of a DPP system are realized.

Challenge and Opportunity 

The United States faces an urgent sustainability challenge driven by its linear economic model, which prioritizes resource extraction, production, and disposal over reuse and recycling. This approach has led to severe environmental degradation, excessive waste generation, and unsustainable resource consumption, with marginalized communities—often communities of color and low-income areas—bearing the brunt of the damage. From toxic pollution to hazardous waste dumps, these populations are disproportionately affected, exacerbating environmental injustice. If this trajectory continues, the U.S. will not only fall short of its climate commitments but also deepen existing economic inequities. To achieve a sustainable future, the nation must transition to a more circular economy, where resources are responsibly managed, reused, and kept in circulation, rather than being discarded after a single use. 

At the same time, the U.S. is contending with widespread counterfeiting and fragile supply chains that threaten both economic security and public health. Counterfeit goods, from unsafe pharmaceuticals to faulty electronics, flood the market, endangering lives and undermining consumer confidence, while costing the economy billions in lost revenue. Furthermore, the COVID-19 pandemic exposed deep weaknesses in global supply chains, particularly in critical sectors like healthcare and technology, leading to shortages that disproportionately affected vulnerable populations. These opaque and fragmented supply chains allow counterfeit goods to flourish and make it difficult to track and verify the authenticity of products, leaving businesses and consumers at risk. 

Achieving true sustainability in the United States requires a shift to item circularity, where products and materials are kept in use for as long as possible through repair, reuse, and recycling. This model not only minimizes waste but also reduces the demand for virgin resources, alleviating the environmental pressures created by the current linear economy. Item circularity helps to close the loop, ensuring that products at the end of their life cycles re-enter the economy rather than ending up in landfills. It also promotes responsible production and consumption by making it easier to track and manage the flow of materials, extending the lifespan of products, and minimizing environmental harm. By embracing circularity, industries can cut down on resource extraction, reduce greenhouse gas emissions, and mitigate the disproportionate impact of pollution on marginalized communities.

One of the most powerful tools to facilitate this transition is the digital product passport (DPP). A DPP is a digital record that provides detailed information about a product’s entire life cycle, including its origin, materials, production process, and end-of-life options like recycling or refurbishment. With this information easily accessible, consumers, businesses, and regulators can make informed decisions about the use, maintenance, and eventual disposal of products. DPPs enable seamless tracking of products through supply chains, making it easier to repair, refurbish, or recycle items. This ensures that valuable materials are recovered and reused, contributing to a circular economy. Additionally, DPPs empower consumers by offering transparency into the sustainability and authenticity of products, encouraging responsible purchasing, and fostering trust in both the products and the companies behind them.

In addition to promoting circularity, DPPs are a powerful tool for combating counterfeits and ensuring supply chain integrity. In 2016, counterfeit and pirated products represented $509 billion, or 3.3%, of world trade. By assigning each product a unique digital identifier, a DPP enables transparent and verifiable tracking of goods at every stage of the supply chain, from raw materials to final sale. This transparency makes it far harder for counterfeit products to infiltrate the market, as every legitimate product can be traced back to its original manufacturer with a clear, tamper-evident digital record. In industries where counterfeiting poses serious safety and financial risks—such as pharmaceuticals, electronics, and luxury goods—DPPs provide a critical layer of protection, ensuring consumers receive authentic products and helping companies safeguard their brands from fraud.
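One simple way to make a passport’s history tamper-evident is to chain each lifecycle event to the previous record with a hash, as in the sketch below. This is an illustrative assumption about how verification could work, not the architecture the memo’s recommendations would mandate; the passport identifiers and event fields are invented for the example.

```python
import hashlib
import json

def append_event(chain: list[dict], passport_id: str, event: dict) -> list[dict]:
    """Append a lifecycle event to a product's passport, linking it to the
    previous record by hash so later tampering with history is detectable."""
    prev_hash = chain[-1]["record_hash"] if chain else "genesis"
    record = {"passport_id": passport_id, "event": event, "prev_hash": prev_hash}
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return chain + [record]

def verify(chain: list[dict]) -> bool:
    """Recompute each link to confirm the recorded history has not been altered."""
    prev = "genesis"
    for rec in chain:
        body = {k: rec[k] for k in ("passport_id", "event", "prev_hash")}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev or rec["record_hash"] != expected:
            return False
        prev = rec["record_hash"]
    return True

# Example: register manufacture, then a change of custody at import.
chain = append_event([], "us-dpp-0001", {"type": "manufactured", "facility": "Plant A"})
chain = append_event(chain, "us-dpp-0001", {"type": "customs_entry", "port": "Los Angeles"})
assert verify(chain)
```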

Moreover, DPPs offer real-time insights into supply chain operations, identifying vulnerabilities or disruptions more quickly. This allows businesses to respond to issues such as production delays, supplier failures, or the introduction of fraudulent goods before they cause widespread damage. With greater visibility into where products are sourced, produced, and transported, companies can better manage their supply chains, ensuring that products meet regulatory standards and maintaining the integrity of goods as they move through the system. This level of traceability strengthens trust between businesses, consumers, and regulators, ultimately creating more resilient and secure supply chains.

Beyond sustainability and counterfeiting, digital product passports (DPPs) offer transformative potential in four additional key areas: 

Plan of Action

The administration should establish a comprehensive Digital Product Passport Initiative that creates the legal, technical, and organizational frameworks for businesses to implement decentralized digital passports for their products while ensuring consumer ownership rights, supply chain integrity, and international interoperability. This plan should consider which entities provide up-front investment until the benefits of DPPs are realized.

Recommendation 1. Legal Framework Development (Lead: White House Office of Science and Technology Policy)

The foundation of any successful federal initiative must be a clear legal framework that establishes authority, defines roles, and ensures enforceability. The Office of Science and Technology Policy is uniquely positioned to lead this effort given its cross-cutting mandate to coordinate science and technology policy across federal agencies and its direct line to the Executive Office of the President. 

Recommendation 2. Product Category Definition & Standards Development (Lead: DOC/NIST)

The success of the DPP initiative depends on clear, technically sound standards that define which products require passports and what information they must contain. This effort must consider which industries and products will benefit most from DPPs, as goods of varying value will see different returns on the investment DPPs require. NIST, as the nation’s lead standards body with deep expertise in digital systems and measurement science, is the natural choice to lead this critical definitional work.

Recommendation 3. Consumer Rights & Privacy Framework (Lead: FTC Bureau of Consumer Protection)

A decentralized DPP system must protect consumer privacy while ensuring consumers maintain control over the digital passports of products they own. The FTC’s Bureau of Consumer Protection, with its statutory authority to protect consumer interests and experience in digital privacy issues, is best equipped to develop and enforce these critical consumer protections.

Recommendation 4. DPP Architecture & Verification Framework (Lead: GSA Technology Transformation Services)

A decentralized DPP system requires robust technical architecture that enables secure data storage, seamless transfers, and reliable verification across multiple private databases. GSA’s Technology Transformation Services, with its proven capability in building and maintaining federal digital infrastructure and its experience in implementing emerging technologies across government, is well-equipped to design and oversee this complex technical ecosystem.

Recommendation 5. Industry Engagement & Compliance Program (Lead: DOC Office of Business Liaison)

Successful implementation of DPPs requires active participation and buy-in from the private sector, as businesses will be responsible for creating and maintaining their product clouds. The DOC Office of Business Liaison, with its established relationships across industries and experience in facilitating public-private partnerships, is ideally suited to lead this engagement and ensure that implementation guidelines meet both government requirements and business needs.

Recommendation 6. Supply Chain Verification System (Lead: Customs and Border Protection)

Digital Product Passports must integrate seamlessly with existing import/export processes to effectively combat counterfeiting and ensure supply chain integrity. Customs and Border Protection, with its existing authority over imports and expertise in supply chain security, is uniquely positioned to incorporate DPP verification into its existing systems and risk assessment frameworks.

Recommendation 7. Sustainability Metrics Integration (Lead: EPA Office of Pollution Prevention)

For DPPs to meaningfully advance sustainability goals, they must capture standardized, verifiable environmental impact data throughout product lifecycles. The EPA’s Office of Pollution Prevention brings decades of expertise in environmental assessment and verification protocols, making it the ideal leader for developing and overseeing these critical sustainability metrics.

Recommendation 8. International Coordination (Lead: State Department Bureau of Economic and Business Affairs)

The global nature of supply chains requires that U.S. DPPs be compatible with similar initiatives worldwide, particularly the EU’s DPP system. The State Department’s Bureau of Economic and Business Affairs, with its diplomatic expertise and experience in international trade negotiations, is best positioned to ensure U.S. DPP standards align with global frameworks while protecting U.S. interests.

Recommendation 9. Small Business Support Program (Lead: Small Business Administration)

The technical and financial demands of implementing DPPs could disproportionately burden small businesses, potentially creating market barriers. The Small Business Administration, with its mandate to support small business success and experience in providing technical assistance and grants, is the natural choice to lead efforts ensuring small businesses can effectively participate in the DPP system.

Conclusion

Digital Product Passports represent a transformative opportunity to address two critical challenges facing the United States: the unsustainable waste of our linear economy and the vulnerability of our supply chains to counterfeiting and disruption. Through a comprehensive nine-step implementation plan led by key federal agencies, the administration can establish the frameworks necessary for businesses to create and maintain digital passports for their products while ensuring consumer rights and international compatibility. This initiative will not only advance environmental justice and sustainability goals by enabling product circularity, but will also strengthen supply chain integrity and security, positioning the United States as a leader in the digital transformation of global commerce.

Improve Healthcare Data Capture at the Source to Build a Learning Health System

Studies estimate that only one in 10 recommendations made by major professional societies is supported by high-quality evidence. Medical care that is not evidence-based can result in unnecessary care that burdens public finances, harms patients, and damages trust in the medical profession. Clearly, we must do a better job of figuring out the right treatments, for the right patients, at the right time. To meet this challenge, it is essential to improve our ability to capture reusable data at the point of care that can be used to improve care, discover new treatments, and make healthcare more efficient. To achieve this vision, we will need to shift financial incentives to reward data generation, change how we deliver care using AI, and continue improving the technological standards powering healthcare.

The Challenge and Opportunity of Health Data

Many have hailed health data collected during everyday healthcare interactions as the solution to some of these challenges. Congress directed the U.S. Food and Drug Administration (FDA) to increase the use of real-world data (RWD) for making decisions about medical products. However, FDA’s own records show that in the most recent year for which data are available, only two out of over one hundred new drugs and biologics approved by FDA were approved based primarily on real-world data.

A major problem is that our current model in healthcare doesn’t allow us to generate reusable data at the point of care. This is even more frustrating because providers face a high burden of documentation, and patients report answering repetitive questions from providers and filling out duplicative questionnaires.

To expand a bit: while large amounts of data are generated at the point of care, these data lack the quality, standardization, and interoperability needed to enable downstream functions such as clinical trials, quality improvement, and other ways of generating more knowledge about how to improve outcomes.

By better harnessing the power of data, including the results of care, we could finally build a learning healthcare system where outcomes drive continuous improvement and where healthcare value leads the way.  There are, however, countless barriers to such a transition. To achieve this vision, we need to develop new strategies for capturing high-quality data in clinical environments while reducing the burden of data entry on patients and providers.

Efforts to achieve this vision follow a few basic principles:

1. Data should be entered only once – by the person or entity most qualified to do so – and be used many times.
2. Data capture should be efficient, so as to minimize the burden on those entering the data, allowing them to focus their time on doing what actually matters, like providing patient care.
3. Data generated at the point of care need to be accessible for appropriate secondary uses (quality improvement, trials, registries), while respecting patient autonomy and obtaining informed consent where required. Data should not be stuck in any one system but should flow freely between systems, enabling linkages across different data sources.
4. Data need to be used to provide real value to patients and physicians. This is achieved by developing data visualizations, automated data summaries, and decision support (e.g., care recommendations, trial matching) that allow data users to spend less time searching for data and more time on analysis, problem solving, and patient care – and help them see the value in entering data in the first place.

Barriers to capturing high-quality data at the point of care:

Plan of Action

Recommendation 1. Incentivize generation of reusable data at the point of care

Financial incentives are needed to drive the development of workflows and technology to capture high-quality data at the point of care. Several existing payment programs could provide a template for how these incentives might be structured.

For example, the Centers for Medicare and Medicaid Services (CMS) recently announced the Enhancing Oncology Model (EOM), a voluntary model for oncology providers caring for patients with common cancer types. As part of the EOM, providers are required to report certain data fields to CMS, including staging information and hormone receptor status for certain cancer types. These data fields are essential for clinical care, research, quality improvement, and ongoing care observation involving cancer patients. Yet, at present, these data are rarely recorded in a way that makes them easy to exchange and reuse. To reduce the burden of reporting these data, CMS has collaborated with the HHS Assistant Secretary for Technology Policy (ASTP) to develop and implement technological tools that can facilitate automated reporting of these data fields.
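To illustrate what “reusable data at the point of care” means in practice, the sketch below captures the kinds of fields the EOM asks for as a structured record rather than free text. The field names, codes, and completeness check are simplified assumptions for illustration, not the actual EOM submission schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OncologyEpisodeReport:
    """Simplified, illustrative structure for point-of-care oncology data a
    payment model might require; not the actual EOM format."""
    patient_id: str
    cancer_type: str                                # e.g., "breast"
    stage: str                                      # e.g., "IIB"
    hormone_receptor_status: Optional[str] = None   # e.g., "ER+/PR+", where applicable

    def is_complete(self) -> bool:
        # A record captured once in structured form can be validated automatically
        # and reused for registries, trial matching, and quality measures.
        required = [self.patient_id, self.cancer_type, self.stage]
        if self.cancer_type == "breast":
            required.append(self.hormone_receptor_status)
        return all(required)
```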

CMS also has a long-standing program that requires participation in evidence generation as a prerequisite for coverage, known as coverage with evidence development (CED). For example, hospitals that would like to provide Transcatheter Aortic Valve Replacement (TAVR) are required to participate in a registry that records data on these procedures.

To incentivize evidence generation as part of routine care, CMS should refine these programs and expand their use. This would involve strengthening collaborations across the federal government to develop technological tools for data capture, and increasing the number of payment models that require generation of data at the point of care. Ideally, these models should evolve to reward 1) high-quality chart preparation (assembly of structured data), 2) establishing diagnoses and developing a care plan, and 3) tracking outcomes.  These payment policies are powerful tools because they incentivize the generation of reusable infrastructure that can be deployed for many purposes.

Recommendation 2. Improve workflows to capture evidence at the point of care

With the right payment models, providers can be incentivized to capture reusable data at the point of care. However, providers already report being crushed by the burden of documentation, and patients frequently fill out multiple questionnaires with the same information. To usher in the era of the learning health system (a system that continuously collects data to improve service delivery) without increasing the burden on providers and patients, we need to redesign how care is provided. Specifically, we must focus on approaches that integrate the generation of reusable data into the provision of routine clinical care.

While the advent of AI is an opportunity to do just that, current uses of AI have mainly focused on drafting documentation in free-text formats, essentially replacing human scribes. Instead, we need to figure out how to use AI to improve the usability of the resulting data. While it is not feasible to capture all data in a structured format on all patients, a core set of data is needed to provide high-quality and safe care. At a minimum, those data should be structured and part of a basic core data set that spans disease types and health maintenance scenarios.
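One lightweight way to operationalize that idea is to check AI-drafted structured output against a minimal core data set before it enters the record, so a clinician only completes the gaps. The core fields below are assumptions for illustration; the actual minimum set would be defined by clinical standards bodies, not this sketch.

```python
# Illustrative core data set; actual minimum fields would come from clinical
# standards bodies, not from this sketch.
CORE_FIELDS = {"problem_list", "medications", "allergies", "primary_diagnosis_code"}

def validate_ai_draft(structured_note: dict) -> list[str]:
    """Return the core fields an AI-generated draft failed to populate, so a
    clinician can complete them before the note is signed."""
    return sorted(f for f in CORE_FIELDS if not structured_note.get(f))

missing = validate_ai_draft({"problem_list": ["type 2 diabetes"], "medications": ["metformin"]})
# missing == ["allergies", "primary_diagnosis_code"]
```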

In order to accomplish this, NIH and the Advanced Research Projects Agency for Health (ARPA-H) should fund learning laboratories that develop, pilot, and implement new approaches for data capture at the point of care. These centers would leverage advances in human-centered design and artificial intelligence (AI) to revolutionize care delivery models for different types of care settings, ranging from outpatient to acute care and intensive care settings. Ideally, these centers would be linked to existing federally funded research sites that could implement the new care and discovery processes in ongoing clinical investigations.

The federal government already spends billions of dollars on grants for clinical research. Why not use some of that funding to make clinical research more efficient, and improve the experience of patients and physicians in the process?

Recommendation 3. Enable technology systems to improve data standardization and interoperability

Capturing high-quality data at the point of care is of limited utility if the data remain stuck within individual electronic health record (EHR) installations. Closed systems hinder innovation and prevent us from making the most of the amazing trove of health data.

We must create a vibrant ecosystem where health data can travel seamlessly between different systems, while maintaining patient safety and privacy. This will enable an ecosystem of health data applications to flourish. HHS has recently made progress by agreeing to a unified approach to health data exchange, but several gaps remain. To address these gaps, we must:

      Conclusion

      The treasure trove of health data generated during routine care has given us a huge opportunity to generate knowledge and improve health outcomes. These data should serve as a shared resource for clinical trials, registries, decision support, and outcome tracking to improve the quality of care. This is necessary for society to advance towards personalized medicine, where treatments are tailored to biology and patient preference. However, to make the most of these data, we must improve how we capture and exchange these data at the point of care.

Essential to this goal is evolving our current payment systems from rewarding documentation of complexity or time spent to rewarding the generation of data that supports learning and improvement. HHS should use its payment authorities to encourage data generation at the point of care and promote the tools that enable health data to flow seamlessly between systems, building on the success stories of existing programs like coverage with evidence development. To allow capture of this data without making the lives of providers and patients even more difficult, federal funding bodies need to invest in developing technologies and workflows that leverage AI to create usable data at the point of care. Finally, HHS must continue improving the standards that allow health data to travel seamlessly between systems. This is essential for creating a vibrant ecosystem of applications that leverage the benefits of AI to improve care.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS.

      Reduce Administrative Research Burden with ORCID and DOI Persistent Digital Identifiers

There exists a low-effort, low-cost way to reduce administrative burden for our scientists and make it easier for everyone – scientists, funders, legislators, and the public – to document the incredible productivity of federal science agencies. If adopted throughout government research, these tools would maximize interoperability across reporting systems, reduce administrative burden and costs, and increase the accountability of our scientific community. The solution: persistent digital identifiers, specifically Digital Object Identifiers (DOIs) for research awards and Open Researcher and Contributor IDs (ORCIDs) for key personnel. ORCIDs are already used by most federal science agencies. We propose that federal science agencies also adopt digital object identifiers for research awards, an industry-wide standard. A practical and detailed implementation guide for this already exists.

      The Opportunity

      Tracking the impact and outputs of federal research awards is labor-intensive and expensive. Federally funded scientists spend over 900,000 hours a year writing interim progress reports alone. Despite that tremendous effort, our ability to analyze the productivity of federal research awards is limited. These reports only capture research products created while the award is active, but many exciting papers and data sets are not published until after the award is over, making it hard for the funder to associate them with a particular award or agency initiative. Further, these data are often not structured in ways that support easy analysis or collaboration. When it comes time for the funding agency to examine the impact of an award, a call for applications, or even an entire division, staff rely on a highly manual process that is time-intensive and expensive. Thus, such evaluations are often not done. Deep analysis of federal spending is next to impossible, and simple questions regarding which type of award is better suited for one scientific problem over another, or whether one administrative funding unit is more impactful than a peer organization with the same spending level, are rarely investigated by federal research agencies. These questions are difficult to answer without a simple way to tie award spending to specific research outputs such as papers, patents, and datasets.

To simplify tracking of research outputs, the Office of Science and Technology Policy (OSTP) directed federal research agencies to “assign unique digital persistent identifiers to all scientific research and development awards and intramural research protocols […] through their digital persistent identifiers.” This directive builds on 2018 work by the Trump White House to reduce the burden on researchers, as well as on National Security Strategy guidance. It is a great step forward, but it has yet to be fully implemented, and it allows implementation to take different paths. Agencies are now taking a fragmented, agency-specific approach, which will undermine the full potential of the directive by making it difficult to track impact using the same metrics across federal agencies.

Without a unified federal standard, science publishers, awards management systems, and other disseminators of federal research output will continue to treat award identifiers as unstructured text buried within a long document, or as URLs tucked into acknowledgement sections or other random fields of a research product. These ad hoc methods make it difficult to link research outputs to their federal funding. They leave scientists and universities that must meet requirements for multiple funding agencies either relying on complex software translations of different agency nomenclatures and award persistent identifiers or, more realistically, continuing to track and report productivity by hand. It remains too confusing and expensive to provide the level of oversight our federal research enterprise deserves.

      There is an existing industry standard for associating digital persistent identifiers with awards that has been adopted by the Department of Energy and other funders such as the ALS Association, the American Heart Association, and the Wellcome Trust. It is a low-effort, low-cost way to reduce administrative burden for our scientists and make it easier for everyone – scientists, federal agencies, legislators, and the public – to document the incredible productivity of federal science expenditures.

Adopting this standard means funders can automate the reporting of most award products (e.g., scientific papers, datasets), reducing administrative burden and allowing research products to be reliably tracked even after the award ends. Funders could maintain their taxonomy linking award DOIs to specific calls for proposals, study sections, divisions, and other internal structures, allowing them to analyze research products far more easily. Further, funders would be able to answer fundamental questions about their programs that are usually too labor-intensive to even ask, such as: did a particular call for applications result in papers that answered the underlying question laid out in that call? How long should awards for a specific type of research problem last to yield the greatest scientific productivity? In light of rapid advances in artificial intelligence (AI) and other analytic tools, making the linkages between research funding and products standardized and easy to analyze opens possibilities for an even more productive and accountable federal research enterprise going forward. In short, assigning DOIs to awards fulfills the requirements of the 2022 directive to maximize interoperability with other funder reporting systems, delivers on the promise of the 2018 NSTC report to reduce burden, and opens new possibilities for a more accountable and effective federal research enterprise.
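
As a rough illustration of how award identifiers enable this kind of automated tracking, the sketch below queries Crossref's public REST API for works whose funding metadata cites a given funder and award number. This is a minimal sketch, not an agency workflow: the funder ID and award number shown are placeholders, and the filter names, while taken from Crossref's documented API, should be verified against current documentation before use.

```python
import requests

# Placeholder identifiers for illustration only; substitute real values.
FUNDER_ID = "10.13039/100000001"   # a Crossref Funder Registry ID (this example is NSF's)
AWARD_NUMBER = "1234567"           # a hypothetical award number

# Crossref public REST API: list works whose funding metadata cites this funder and award.
# The "award.funder" and "award.number" filters are documented Crossref filters; confirm
# against the current API docs before relying on them in production.
response = requests.get(
    "https://api.crossref.org/works",
    params={"filter": f"award.funder:{FUNDER_ID},award.number:{AWARD_NUMBER}", "rows": 20},
    timeout=30,
)
response.raise_for_status()

# Each returned item is a research product (paper, dataset, etc.) already linked to the award.
for item in response.json()["message"]["items"]:
    title = (item.get("title") or ["(untitled)"])[0]
    print(f'{item.get("DOI", "")}\t{title}')
```

With award DOIs registered in this way, a progress report or portfolio analysis becomes a metadata query rather than a manual search of acknowledgment sections.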

      Plan of Action

The overall goal is to increase accountability and transparency for federal research funding agencies and dramatically reduce the administrative burden on scientists and staff. Adopting a uniform approach allows for rapid evaluation and improvements across the research enterprise. It also enables the creation of comparable data on agency performance. We propose that federal science agencies adopt the same industry-wide standard – the DOI – for awards. A practical and detailed implementation guide already exists.

These steps support the existing directive and National Security Strategy guidance issued by OSTP and build on 2018 work from the NSTC:

      Recommendation 1. An interagency committee led by OSTP should coordinate and harmonize implementation to:

      Recommendation 2. Agencies should fully adopt the industry standard persistent identifier infrastructure for research funding—DOIs—for awards. Specifically, funders should:

      Recommendation 3. Agencies should require the Principal Investigator (PI) to cite the award DOI in research products (e.g., scientific papers, datasets). This requirement could be included in the terms and conditions of each award. Using DOIs to automate much of progress reporting, as described below, provides a natural incentive for investigators to comply. 

Recommendation 4. To reduce PI burden, agencies should use the ORCID and award DOI systems to identify research products associated with an award. Awardees would still be required to certify that the product arose directly from their federal research award. After the award and its reporting obligations end, the agency can continue to use these systems to link products to awards based on information provided by the product creators to the product distributors (e.g., authors citing an award DOI when publishing a paper), but without the direct certification of the awardee. This compromise provides the public and the funder with better information about an award’s output, but does not automatically hold the awardee liable if the product conflicts with a federal policy.

Recommendation 5. Agencies should adopt or incorporate award DOIs into their efforts to describe agency productivity and to create more efficient and consistent practices for reporting research progress across all federal research funding agencies. Products attributable to an award should be searchable by individual award and by larger collections of awards, such as administrative Centers or calls for applications. As an example of this transparency, PubMed, with its publicly available indexing of the biomedical literature, supports the National Institutes of Health (NIH) RePORTER and could serve as a model for other fields as persistent identifiers for awards and research products become more available.

      Recommendation 6. Congress should issue appropriations reporting language to ensure that implementation costs are covered for each agency and that the agencies are adopting a universal standard. Given that the DOI for awards infrastructure works even for small non-profit funders, the greatest costs will be in adapting legacy federal systems, not in utilizing the industry standard itself.

      Challenges 

We expect the main opposition to come from the agencies themselves, as they have multiple demands on their time and might pursue implementation shortcuts that meet the letter of the requirement but do not offer the full benefits of an industry standard. This short-sighted position would deny both the public transparency needed on research award performance and the massive time and cost savings for agencies and researchers.

      A partial implementation of this burden-reducing workflow already exists. Data feeds from ORCID and PubMed populate federal tools such as My Bibliography, and in turn support the biosketch generator in SciENcv or an agency’s Research Performance Progress Report. These systems are feasible because they build on PubMed’s excellent metadata and curation. But PubMed does not index all scientific fields.

Adopting DOIs for awards means that persistent identifiers will provide a higher level of service across all federal research areas. DOIs work for scientific areas not supported by PubMed. And even for the sophisticated existing systems drawing from PubMed, user effort could be reduced and accuracy increased if awards were assigned DOIs. Systems such as NIH RePORTER and PubMed currently have to pull award numbers from citations in the acknowledgment sections of research papers, which is a more difficult process.

      Conclusion

OSTP and the science agencies have put forth a sound directive to make American science funding even more accountable and impactful, and they are on the cusp of implementation. It is part of a long-standing effort to reduce burden and make the federal research enterprise more accountable and effective. Federal research funding agencies risk falling into bureaucratic fragmentation and inertia by adopting competing approaches that meet the minimum requirements set forth by OSTP but offer minimal benefit. If these agencies instead adopt the industry standard already used by many other funders around the world, they will markedly reduce the burden on awardees and federal agencies and facilitate greater transparency, accountability, and innovation in science funding. Adopting the standard is the obvious choice and well within America’s grasp, but avoiding bureaucratic fragmentation is not simple. It takes leadership from each agency, the White House, and Congress.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS.

      Use Artificial Intelligence to Analyze Government Grant Data to Reveal Science Frontiers and Opportunities

      President Trump challenged the Director of the Office of Science and Technology Policy (OSTP), Michael Kratsios, to “ensure that scientific progress and technological innovation fuel economic growth and better the lives of all Americans”. Much of this progress and innovation arises from federal research grants. Federal research grant applications include detailed plans for cutting-edge scientific research. They describe the hypothesis, data collection, experiments, and methods that will ultimately produce discoveries, inventions, knowledge, data, patents, and advances. They collectively represent a blueprint for future innovations.

      AI now makes it possible to use these resources to create extraordinary tools for refining how we award research dollars. Further, AI can provide unprecedented insight into future discoveries and needs, shaping both public and private investment into new research and speeding the application of federal research results. 

We recommend that the Office of Science and Technology Policy (OSTP) oversee a multiagency development effort to fully subject grant applications to AI analysis to predict the future of science, enhance peer review, and encourage better research investment decisions by both the public and the private sector. The federal agencies involved should include all the member agencies of the National Science and Technology Council (NSTC).

      Challenge and Opportunity

      The federal government funds approximately 100,000 research awards each year across all areas of science. The sheer human effort required to analyze this volume of records remains a barrier, and thus, agencies have not mined applications for deep future insight. If agencies spent just 10 minutes of employee time on each funded award, it would take 16,667 hours in total—or more than eight years of full-time work—to simply review the projects funded in one year. For each funded award, there are usually 4–12 additional applications that were reviewed and rejected. Analyzing all these applications for trends is untenable. Fortunately, emerging AI can analyze these documents at scale. Furthermore, AI systems can work with confidential data and provide summaries that conform to standards that protect confidentiality and trade secrets. In the course of developing these public-facing data summaries, the same AI tools could be used to support a research funder’s review process.
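
The scale problem is easy to verify from the figures cited above. The short sketch below redoes the arithmetic; the 2,080-hour full-time work year is a standard assumption not stated in the text, and the 4–12 multiplier is the range of unfunded applications per funded award given above.

```python
# Back-of-the-envelope review burden, using the figures cited in this memo.
funded_awards_per_year = 100_000
minutes_per_award = 10
full_time_hours_per_year = 2_080  # assumption: 52 weeks x 40 hours

review_hours = funded_awards_per_year * minutes_per_award / 60
print(f"Hours to skim funded awards:  {review_hours:,.0f}")                            # ~16,667
print(f"Years of full-time work:      {review_hours / full_time_hours_per_year:.1f}")  # ~8

# Including the 4-12 unfunded applications submitted for each funded award:
low, high = funded_awards_per_year * (1 + 4), funded_awards_per_year * (1 + 12)
print(f"Total applications per year:  {low:,} to {high:,}")
```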

      There is a long precedent for this approach. In 2009, the National Institutes of Health (NIH) debuted its Research, Condition, and Disease Categorization (RCDC) system, a program that automatically and reproducibly assigns NIH-funded projects to their appropriate spending categories. The automated RCDC system replaced a manual data call, which resulted in savings of approximately $30 million per year in staff time, and has been evolving ever since. To create the RCDC system, the NIH pioneered digital fingerprints of every scientific grant application using sophisticated text-mining software that assembled a list of terms and their frequencies found in the title, abstract, and specific aims of an application. Applications for which the fingerprints match the list of scientific terms used to describe a category are included in that category; once an application is funded, it is assigned to categorical spending reports.

NIH staff soon found it easy to construct new digital fingerprints for other entities, such as research products or even scientists, by scanning the title and abstract of a public document (such as a research paper) or by aggregating all terms found in the existing grant application fingerprints associated with a person.

      NIH review staff can now match the digital fingerprints of peer reviewers to the fingerprints of the applications to be reviewed and ensure there is sufficient reviewer expertise. For NIH applicants, the RePORTER webpage provides the Matchmaker tool to create digital fingerprints of title, abstract, and specific aims sections, and match them to funded grant applications and the study sections in which they were reviewed. We advocate that all agencies work together to take the next logical step and use all the data at their disposal for deeper and broader analyses.
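
To make the fingerprinting idea concrete, here is a minimal sketch of the general technique the RCDC description implies: build a term-frequency vector from an application's title, abstract, and specific aims, then score it against a category's term list or a reviewer's profile using cosine similarity. This is an illustration of the approach, not NIH's actual implementation; the tokenizer, weighting, example texts, and any threshold are simplified assumptions.

```python
import re
from collections import Counter
from math import sqrt

def fingerprint(text: str) -> Counter:
    """Toy digital fingerprint: lowercase term frequencies extracted from free text."""
    return Counter(re.findall(r"[a-z]{3,}", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency fingerprints (0 = no overlap, 1 = identical)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical inputs: an application's title/abstract/aims, a spending-category term list,
# and a reviewer profile built from that reviewer's own publications or past applications.
application = fingerprint(
    "Structural studies of amyloid oligomer aggregation in Alzheimer's disease models"
)
category_terms = fingerprint("alzheimer amyloid oligomer dementia neurodegeneration")
reviewer_profile = fingerprint(
    "Protein misfolding and amyloid aggregation; cryo-EM of oligomer structures"
)

# Assign to a category if similarity clears a tuned threshold; rank candidate reviewers the same way.
print(f"Category match score: {cosine(application, category_terms):.2f}")
print(f"Reviewer match score: {cosine(application, reviewer_profile):.2f}")
```

Production systems would add richer text processing and curated vocabularies, but the core step, comparing vectors of scientific terms, is the same.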

We describe five specific use cases below:

      Use Case 1: Funder support. Federal staff could use AI analytics to identify areas of opportunity and support administrative pushes for funding.

      When making a funding decision, agencies need to consider not only the absolute merit of an application but also how it complements the existing funded awards and agency goals. There are some common challenges in managing portfolios. One is that an underlying scientific question can be common to multiple problems that are addressed in different portfolios. For example, one protein may have a role in multiple organ systems. Staff are rarely aware of all the studies and methods related to that protein if their research portfolio is restricted to a single organ system or disease. Another challenge is to ensure proper distribution of investments across a research pipeline, so that science progresses efficiently. Tools that can rapidly and consistently contextualize applications across a variety of measures, including topic, methodology, agency priorities, etc., can identify underserved areas and support agencies in making final funding decisions. They can also help funders deliberately replicate some studies while reducing the risk of unintentional duplication.

      Use Case 2: Reviewer support. Application reviewers could use AI analytics to understand how an application is similar to or different from currently funded federal research projects, providing reviewers with contextualization for the applications they are rating.

      Reviewers are selected in part for their knowledge of the field, but when they compare applications with existing projects, they do so based on their subjective memory. AI tools can provide more objective, accurate, and consistent contextualization to ensure that the most promising ideas receive funding.

Use Case 3: Grant applicant support. Research funding applicants could be offered contextualization of their ideas among funded projects and failed applications in ways that protect the confidentiality of federal data.

NIH has already made admirable progress in this direction with its Matchmaker tool—one can enter many lines of text describing a proposal (such as an abstract), and the tool will provide lists of similar funded projects, with links to their abstracts. New AI tools can build on this model in two important ways. First, they can provide summary text and visualization to guide the user to the most useful information. Second, they can broaden the contextual data being viewed. Currently, the results are based only on funded applications, making it impossible to tell whether an idea is excluded from a funded portfolio because it is novel or because the agency consistently rejects it. Private sector attempts to analyze award information (e.g., Dimensions) are similarly limited by their inability to access full applications, including those that are not funded. AI tools could provide high-level summaries of failed or ‘in process’ grant applications that protect confidentiality but provide context about the likelihood of funding for an applicant’s project.

Use Case 4: Trend mapping. AI analyses could help everyone—scientists, biotech, pharma, investors—understand emerging funding trends in their innovation space in ways that protect the confidentiality of federal data.

      The federal science agencies have made remarkable progress in making their funding decisions transparent, even to the point of offering lay summaries of funded awards. However, the sheer volume of individual awards makes summarizing these funding decisions a daunting task that will always be out of date by the time it is completed. Thoughtful application of AI could make practical, easy-to-digest summaries of U.S. federal grants in close to real time, and could help to identify areas of overlap, redundancy, and opportunity. By including projects that were unfunded, the public would get a sense of the direction in which federal funders are moving and where the government might be underinvested. This could herald a new era of transparency and effectiveness in science investment.

      Use Case 5: Results prediction tools. Analytical AI tools could help everyone—scientists, biotech, pharma, investors—predict the topics and timing of future research results and neglected areas of science in ways that protect the confidentiality of federal data.

      It is standard practice in pharmaceutical development to predict the timing of clinical trial results based on public information. This approach can work in other research areas, but it is labor-intensive. AI analytics could be applied at scale to specific scientific areas, such as predictions about the timing of results for materials being tested for solar cells or of new technologies in disease diagnosis. AI approaches are especially well suited to technologies that cross disciplines, such as applications of one health technology to multiple organ systems, or one material applied to multiple engineering applications. These models would be even richer if the negative cases—the unfunded research applications—were included in analyses in ways that protect the confidentiality of the failed application. Failed applications may signal where the science is struggling and where definitive results are less likely to appear, or where there are underinvested opportunities.

      Plan of Action

      Leadership

      We recommend that OSTP oversee a multiagency development effort to achieve the overarching goal of fully subjecting grant applications to AI analysis to predict the future of science, enhance peer review, and encourage better research investment decisions by both the public and the private sector. The federal agencies involved should include all the member agencies of the NSTC. A broad array of stakeholders should be engaged because much of the AI expertise exists in the private sector, the data are owned and protected by the government, and the beneficiaries of the tools would be both public and private. We anticipate four stages to this effort.

      Recommendation 1. Agency Development

      Pilot: Each agency should develop pilots of one or more use cases to test and optimize training sets and output tools for each user group. We recommend this initial approach because each funding agency has different baseline capabilities to make application data available to AI tools and may also have different scientific considerations. Despite these differences, all federal science funding agencies have large archives of applications in digital formats, along with records of the publications and research data attributed to those awards.

      These use cases are relatively new applications for AI and should be empirically tested before broad implementation. Trend mapping and predictive models can be built with a subset of historical data and validated with the remaining data. Decision support tools for funders, applicants, and reviewers need to be tested not only for their accuracy but also for their impact on users. Therefore, these decision support tools should be considered as a part of larger empirical efforts to improve the peer review process.

      Solidify source data: Agencies may need to enhance their data systems to support the new functions for full implementation. OSTP would need to coordinate the development of data standards to ensure all agencies can combine data sets for related fields of research. Agencies may need to make changes to the structure and processing of applications, such as ensuring that sections to be used by the AI are machine-readable.

      Recommendation 2. Prizes and Public–Private Partnerships

      OSTP should coordinate the convening of private sector organizations to develop a clear vision for the profound implications of opening funded and failed research award applications to AI, including predicting the topics and timing of future research outputs. How will this technology support innovation and more effective investments?

      Research agencies should collaborate with private sector partners to sponsor prizes for developing the most useful and accurate tools and user interfaces for each use case refined through agency development work. Prize submissions could use test data drawn from existing full-text applications and the research outputs arising from those applications. Top candidates would be subject to standard selection criteria.

      Conclusion

Research applications are an untapped and tremendously valuable resource. They describe work plans and are clearly linked to specific research products, many of which, like research articles, are already rigorously indexed and machine-readable. These applications are data that can be used to optimize research funding decisions and to develop insight into future innovations. With these data and emerging AI technologies, we will be able to understand the trajectory of our science with unprecedented breadth and insight, perhaps even approaching the accuracy with which human experts can foresee changes within a narrow area of study. However, realizing the full benefit of this information is not inevitable, because the source data is currently closed to AI innovation. It will take vision and resources to build effectively from these closed systems—our federal science agencies have both, and with some leadership, they can realize the full potential of these applications.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS.

      Supporting Data Center Development by Reducing Energy System Impact

      In the last decade, American data center energy use has tripled. By 2028, the Department of Energy predicts it will either double or triple again. To meet growing tech industry energy demands without imposing a staggering toll on individual energy consumers, and to best position the United States to benefit from the advancements of artificial intelligence (AI), Congress should invest in innovative approaches to powering data centers. Namely, Congress should create a pathway for data centers to be viably integrated into Thermal Energy Networks (TENs) in order to curb costs, increase efficiency, and support grid resilience and reliability for all customers. 

      Congress should invest in American energy security and maximize benefits from data center use by: 

1. Authorizing a new TEN pilot grant program that ties grants to performance metrics such as reducing the cost of installing underground infrastructure, 
      2. Including requirements for data centers related to Power Usage Effectiveness (PUE) in the National Defense Authorization Act for Fiscal Year 2026, and 
      3. Updating the 2018 Commercial Buildings Energy Consumption Survey (CBECS) Data Center Pilot to increase data center participation. 

      These actions will position the federal government to deploy innovative approaches to energy infrastructure while unlocking technological advancement and economic growth from AI.

      Challenge and Opportunity

By 2028, American data center energy demands are expected to account for up to 12% of the country’s electricity consumption, up from 4.4% in 2023. The development of artificial intelligence (AI) technologies is driving this increase because AI workloads consume more compute resources than other technologies. As a result of their significant energy demand, data centers face two hurdles to development: (1) interconnection delays due to infrastructure development requirements and (2) the resulting costs borne by consumers in those markets, which heighten resident resistance to siting data centers nearby.

Interconnection timelines across the country are lengthy. In 2023, the period from interconnection request to commercial operation was five years for typical power plant projects. In states like Virginia, widely known as the “Data Center Capital of the World,” waits can stretch to seven years for data centers specifically. These interconnection timelines have grown over time and are expected to continue growing based on queue lengths.

Interconnection is also costly. The primary cost drivers are various upgrade requirements for the broader transmission system. Unlike upgrades for energy generators, which are typically paid for by the generators themselves, the cost of interconnecting large new energy consumers such as data centers is spread across everyone around them as well. Experts believe that by socializing the costs of new data center infrastructure, utilities are passing these costs on to ratepayers.

      Efforts are underway to minimize data center energy costs while improving operational efficiency. One way to do that is to reclaim the energy that data centers consume by repurposing waste heat through thermal energy networks (TENs). TENs are shared networks of pipes that move heat between locations; they may incorporate any number of heat sources, including data centers. Data centers can not only generate heat for these systems, but also benefit from cooling—a major source of current data center energy consumption—provided by integrated systems.

      Like other energy infrastructure projects, TENs require significant upfront financial investment to reap long-term rewards. However, they can potentially offset some of those upfront costs by shortening interconnection timelines based on demonstrated lower energy demand and reduced grid load. Avoiding larger traditional grid infrastructure upgrades would also avert the skyrocketing consumer costs described above.

At a community or utility level, TENs also offer other benefits. They improve grid resiliency and reliability: the network loops that compose a TEN increase redundancy, reducing the likelihood that a single point of failure will yield systemic failure, especially in light of increasing energy demands brought about by weather events such as extreme heat. Further, TENs allow utilities to decrease and transfer electrical demand, offering a way to balance peak loads. TENs offer building tradespeople such as pipefitters “plentiful and high-paying jobs” as they become more prevalent, especially in rural areas. They also provide employment paths for employees of utilities and natural gas companies with expertise in underground infrastructure. By creating jobs, reducing water stress and grid strain, and decreasing the risk of quickly rising utility costs, investing in TENs to bolster data center development would reduce the current trend of community resistance to development. Many of these benefits extend to non-data center TEN participants, like nearby homes and businesses, as well.

Federal coordination is essential to accelerating the creation of TENs in data center-heavy areas. Some states, like New York and Colorado, have passed legislation to promote TEN development. However, the states with the densest data center markets, many of which also rank poorly on grid reliability, are not all putting forth efforts to develop TENs. Because the U.S. grid is divided into multiple regions whose interstate transmission is regulated by the Federal Energy Regulatory Commission, the federal government is uniquely well positioned to invest in improvements in grid resiliency through TENs and to make the U.S. a world leader in this technology.

      Plan of Action

      The Trump Administration and Congress can promote data center development while improving grid resiliency and reliability and reducing consumers’ financial burden through a three-part strategy:

      Recommendation 1. Create a new competitive grant program to help states launch TEN pilots.

      Congress should create a new TEN pilot competitive grant program administered by the Department of Energy. The federal TEN program should allow states to apply for funding to run their own TEN programs administered by states’ energy offices and organizations. This program could build on two strong precedents:

      1. The Department of Energy’s 2022 funding opportunity for Community Geothermal Heating and Cooling Design and Deployment. This opportunity supported geothermal heating and cooling networks, which are a type of TEN that relies on the earth’s constant temperature and heat pumps to heat or cool buildings. Though this program generated significant interest, an opportunity remains for the federal government to invest in non-geothermal TEN projects. These would be projects that rely on exchanging heat with other sources, such as bodies of water, waste systems, or even energy-intensive buildings like data centers. The economic advantages are promising: one funded project reported expecting “savings of as much as 70% on utility bills” for beneficiaries of the proposed design.
2. New York State’s Large-Scale Thermal program, run by the state’s Energy Research and Development Authority (NYSERDA), has offered multiple funding opportunities that specifically include the development of TENs. In 2021, it launched a Community Heat Pump Systems (PON 4614) program that has since awarded funding to multiple projects that include data centers. One project reported its design would save $2.4 million, or roughly 77%, annually in operations costs. 

      Congress should authorize a new pilot program with $30 million to be distributed to state TEN programs, which states could disperse via grants and performance contracts. Such a program would support the Trump administration’s goal of fast-tracking AI data center development.

To ensure that the funding benefits both grant recipients and their host communities, requirements should be attached to these grants that incentivize consumer benefits such as reduced electricity or heating bills, improved air quality, and decreased pollution. Grant awards should be prioritized according to performance metrics such as projected cost reductions for drilling or installing underground infrastructure and gains in operational efficiency. 

      Recommendation 2. Include power usage effectiveness in the amendments to the National Defense Authorization Act for Fiscal Year 2026 (2026 NDAA).

In the National Defense Authorization Act for Fiscal Year 2024, Sec. 5302 (“Federal Data Center Consolidation Initiative amendments”) amended Section 834 of the Carl Levin and Howard P. “Buck” McKeon National Defense Authorization Act for Fiscal Year 2015 by specifying minimum requirements for new data centers. Sec. 5302(b)(2)(b)(2)(A)(ii) currently reads:

       […The minimum requirements established under paragraph (1) shall include requirements relating to—…] “the use of new data centers, including costs related to the facility, energy consumption, and related infrastructure;.” 

      To couple data center development with improved grid resilience and stability, the 2026 NDAA should amend Sec. 5302(b)(2)(b)(2)(A)(ii) as follows:

       […The minimum requirements established under paragraph (1) shall include requirements relating to—…] “the use of new data centers, including power usage effectiveness, costs related to the facility, energy consumption, and related infrastructure.” 

Power usage effectiveness (PUE) is a common metric for the efficiency of data center power use. It is the ratio of total power used by the facility to the portion of that power dedicated to IT equipment. The PUE metric has limitations, such as its inability to provide an apples-to-apples comparison of data center energy efficiency given variability in underlying technology, and its lack of precision, especially given the growth of AI data centers. However, introducing the PUE metric as part of the regulatory framework for data centers would provide a specific target for new builds, making it easier for both developers and policymakers to gauge progress. Requirements related to PUE would also encourage developers to invest in technologies that increase energy efficiency without unduly hurting their bottom lines. In the future, legislators should continue to amend this section of the NDAA as newer, more accurate, and more useful efficiency metrics develop.
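
For reference, a minimal calculation of PUE as defined above; the kilowatt figures are invented purely to show the arithmetic, and real reporting would specify measurement boundaries and averaging periods.

```python
def power_usage_effectiveness(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / power delivered to IT equipment; 1.0 is the theoretical floor."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment load must be positive")
    return total_facility_kw / it_equipment_kw

# Hypothetical facility: 10 MW total draw, of which 7 MW reaches servers, storage, and network gear.
print(f"PUE: {power_usage_effectiveness(10_000, 7_000):.2f}")  # 1.43; lower is more efficient
```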

      Recommendation 3. The U.S. Energy Information Administration (EIA) should update the 2018 Commercial Buildings Energy Consumption Survey (CBECS) Data Center Pilot. 

To facilitate community acceptance and realize benefits like better financing terms based on lower default risk, data center developers should seek to benchmark their facilities’ energy consumption. Energy consumption benchmarking, the process of analyzing consumption data and comparing it to both past performance and the performance of similar facilities, results in operational cost savings. These savings amplify the economic benefits of vehicles like TENs for cost-sensitive developers and lower the potential increase in community utility costs.

Data center developers should create industry-standard benchmarking tools, much as other industries have. However, it is challenging for them to embark on this work without accurate and current information that supports the development of useful models and targets, especially in such a fast-changing field. Yet data sources such as those used to create benchmarks for other industries are unavailable. One popular source, the CBECS, does not include data centers as a separate building type. This issue is longstanding; in 2018, the EIA released a report detailing the results of the data center pilot it undertook to address this gap. The pilot cited three main hurdles to accurately accounting for data centers’ energy consumption: the lack of a comprehensive frame or list of data centers, low cooperation rates, and a high rate of nonresponse to important survey questions. 

With the proliferation of data centers since the pilot, it has become only more pressing to differentiate this building type so that data centers can be accurately represented and industry benchmarks can be developed. To address the frame problem, CBECS should use a commercial data source such as Data Center Map. At the time of the pilot, the EIA considered this source "unvalidated," but it has since been used as a data source by the U.S. Department of Commerce and the International Energy Agency. The EIA should also perform the "cognitive research and pretests" recommended in the pilot to find ways to encourage complete responses, then rerun the pilot and seek an improved outcome.

      Conclusion

Data center energy demand has exploded in recent years and continues to climb, due in part to the advent of widespread AI development. Data centers need access to reliable energy without creating grid instability or dramatically increasing utility costs for individual consumers. This creates a unique opportunity for the federal government to develop and implement innovative technology such as TENs in areas working to support changing energy demands. The government should also seize this moment to define and update standards for site developers to ensure they are building cost-effective and operationally efficient facilities. By advancing systems and tools that benefit other area energy consumers, down to the individual ratepayer, the federal government can transform data centers from infrastructure burdens into good neighbors.

      Frequently Asked Questions
      How was the $30 million budget to help states launch TEN pilots calculated?

This budget was calculated by taking the allocation for the NYSERDA Large-Scale Thermal pilot program ($10 million) and multiplying it by three (for a three-year pilot). Because NYSERDA’s program funded projects at over 50 sites, this initial pilot would aim to fund roughly 150 projects across the states.

      What are performance contracts?

      Performance-based contracts differ from other types of contracts in that they focus on what work is to be performed rather than how specifically it is accomplished. Solicitations include either a Performance Work Statement or Statement of Objectives and resulting contracts include measurable performance standards and potentially performance incentives.

      Rebuild Corporate Research for a Stronger American Future

      The American research enterprise, long the global leader, faces intensifying competition and mounting criticism regarding its productivity and relevance to societal challenges. At the same time, a vital component of a healthy research enterprise has been lost: corporate research labs, epitomized by the iconic Bell Labs of the 20th century. Such labs uniquely excelled at reverse translational research, where real-world utility and problem-rich environments served as powerful inspirations for fundamental learning and discovery. Rebuilding such labs in a 21st century “Bell Labs X” form would restore a powerful and uniquely American approach to technoscientific discovery—harnessing the private sector to discover and invent in ways that fundamentally improve U.S. national and economic competitiveness. Moreover, new metaresearch insights into “how to innovate how we innovate” provide principles that can guide their rebuilding. The White House Office of Science and Technology Policy (OSTP) can help turn these insights into reality by convening a working group of stakeholders (philanthropy, business, and science agency leaders), alongside policy and metascience scholars, to make practical recommendations for implementation.

      Challenge and Opportunity

The American research enterprise faces intensifying competition and mounting criticism regarding its productivity and relevance to societal challenges. A number of explanations have been proposed, but among the most important is that corporate research labs, a vital piece of a healthy research enterprise, are missing. Exemplified by Bell Labs, these labs dominated the research enterprise of the first half of the 20th century but became defunct in the second half. The reason: the formalization of profit as the prime goal of corporations, which is incompatible with research, particularly the basic research that produces public-goods science and technology. Instead, academic research is now dominant. The reason: the rise of federal agencies like the National Science Foundation (NSF) with a near-total focus on academia. This dynamic, however, is not fundamental: federal agencies could easily fund research at corporations and not just in academia.

      Moreover, there is a compelling reason to do so. Utility and learning are cyclical and build on each other. In one direction, learning serves as a starting point for utility. Academia excels at such translational research. In the other direction, utility serves as a starting point for learning. Corporations in principle excel at such reverse translational research. Corporations are where utility lives and breathes and where real-world problem-rich environments and inspiration for learning thrives. This reverse translational half of the utility-learning cycle, however, is currently nearly absent, and is a critical void that could be filled by corporate research.

      For example, at Bell Labs circa WWII, Claude Shannon’s exposure to real-world problems in cryptography and noisy communications inspired his surprising idea to treat information as a quantifiable and manipulable entity independent of its physical medium, revolutionizing information science and technology. Similarly, Mervyn Kelly’s exposure to the real-world benefit of compact and reliable solid-state amplifiers inspired him to create a research activity at Bell Labs that invented the transistor and discovered the transistor effect. These advances, inspired by real-world utility, laid the foundations for our modern information age.

Importantly, these advances were given freely to the nation because Bell Labs’ host corporation, the AT&T of the 20th century, was a monopoly and could be altruistic with respect to its research. Now, in the 21st century, corporations, even when they have dominant market power, are subject to intense competitive pressures on their bottom-line profit, which make it difficult for them to engage in research that is given freely to the nation. But to throw away corporate research along with the monopolies that could afford to do such research is to throw out the baby with the bathwater. Instead, the challenge is to rebuild corporate research in a 21st century “Bell Labs X” form without relying on monopolies, using public-private partnerships instead.

      Moreover, new insights into the nature and nurture of research provide principles that can guide the creation of such public-private partnerships for the purpose of public-goods research.

1. Inspire, but Don’t Constrain, Research by Particular Use. Reverse-translational research should start with real-world challenges but not be constrained by them as it seeks the greatest advances in learning—advances that surprise and contradict prevailing wisdom. This principle combines Donald Stokes’ “use-inspired research,” Ken Stanley and Joel Lehman’s “why greatness cannot be planned,” and Gold Standard Science’s emphasis on informed contrariness and dissent.
      2. Fund and Execute Research at the Institution, not Individual Researcher, Level. This would be very different from the dominant mode of research funding in the U.S.: matrix-funding to principal investigators (PIs) in academia. Here, instead, research funding would be to research institutes that employ researchers rather than contract with researchers employed by other institutions. Leadership would be empowered to nurture and orchestrate the people, culture, and organizational structure of the institute for the singular purpose of empowering researchers to achieve groundbreaking discoveries.
3. Evolve Research Institutions by Retrospective, Competitive Reselection. There should be many research institutes, and none should have guaranteed perpetual funding. Instead, they should be subject to periodic evaluation “with teeth,” in which research institutions continue to receive support only if they are significantly changing the way we think and/or the way we do things. This creates a dynamic, market-like ecosystem within which the population of research institutes evolves in response to a competitive re-selection pressure toward ever-increasing research productivity.

      Plan of Action

      The White House Office of Science and Technology Policy (OSTP) should convene a working group of stakeholders, alongside policy and metaresearch scholars, to make practical recommendations for public-private partnerships that enable corporate research akin to the Bell Labs of the 20th century, but in a 21st century “Bell Labs X” form.

      Among the stakeholders would be government agencies, corporations and philanthropies—perhaps along the lines of the Government-University-Industry-Philanthropy Research Roundtable (GUIPRR) of the National Academies of Sciences, Engineering and Medicine (NASEM).

Importantly, the working group does not need to start from scratch. A high-level funding and organizational model was recently articulated.

      Its starting point is the initial selection of ten or so Bell Labs Xs based on their potential for major advances in public-goods science and technology. Each Bell Labs X would be hosted and cost-shared by a corporation that brings with it its problem-rich use environment and state-of-the-art technological contexts, but majority block-funded by a research funder (federal agencies and/or philanthropies) with broad societal benefit in mind. To establish a sense of scale, we might imagine each Bell Labs X having a $120M/year operating budget and a 20% cost share—so $20M/year coming from the corporate host and $100M/year coming from the research funder. 
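
The aggregate scale implied by these illustrative figures is easy to total up. The short sketch below simply multiplies the memo's per-lab numbers across the ten proposed labs (the count of exactly ten is an assumption for arithmetic's sake) and notes that the stated $20M corporate contribution equals 20% of the funder's $100M share, or about 17% of the total operating budget.

```python
# Aggregate scale implied by the memo's illustrative per-lab numbers (assumption: exactly 10 labs).
num_labs = 10
per_lab_budget = 120e6      # $120M/year operating budget per Bell Labs X
corporate_share = 20e6      # $20M/year from the corporate host
funder_share = 100e6        # $100M/year from the federal and/or philanthropic funder

print(f"Total program cost:   ${num_labs * per_lab_budget / 1e9:.1f}B per year")    # $1.2B
print(f"Corporate cost share: ${num_labs * corporate_share / 1e6:,.0f}M per year")  # $200M
print(f"Funder commitment:    ${num_labs * funder_share / 1e9:.1f}B per year")      # $1.0B

# The memo's "20% cost share" reads most naturally as 20% of the funder contribution:
print(f"Host share of funder dollars: {corporate_share / funder_share:.0%}")    # 20%
print(f"Host share of total budget:   {corporate_share / per_lab_budget:.0%}")  # 17%
```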

This plan also envisions a market-like competitive renewal structure for these corporate research labs. At the end of a period of time (say, ten years) appropriate for long-term basic research, all ten or so Bell Labs Xs would be evaluated for their contributions to public-goods science and technology, independent of their contributions to commercial applications of the host corporation. Only the most productive seven or eight of the ten would be renewed. Between selection, re-selection, and subsequent re-selections, the leadership of each Bell Labs X would be free to nurture its people, culture, and organizational structure as it believes will maximize research productivity. Each Bell Labs X would thus be an experiment in research institution design. And each Bell Labs X would make its own bet on the knowledge domain it believes is ripe for the greatest disruptive advances. Government’s role would be largely confined to retrospectively rewarding or penalizing Bell Labs Xs according to whether they made better or worse bets, without itself making bets.

      Conclusion

      Imagine a private institution whose researchers routinely disrupted knowledge and changed the world. That’s the story of Bell Labs—a legendary research institute that gave us scientific and technological breakthroughs we now take for granted. In its heyday in the mid-20th century, Bell Labs was a crucible of innovation where brilliant minds were exposed to and inspired by real-world problems, then given the freedom to explore those problems in deep and fundamental ways, often pivoting to and solving unanticipated new problems of even greater importance.

Recreating that innovative environment is possible, and its impact on American research productivity would be profound. By innovating how we innovate, we would leapfrog other nations that are investing heavily in their own research productivity but are largely copying the structure of the current U.S. research enterprise. The resulting network of Bell Labs Xs would flip the relationship between corporations and the nation’s public-goods science and technology: ask not what public-goods science and technology can do for corporations, but what corporations can do for the nation’s public-goods science and technology. Disruptive and useful ideas are not getting harder to find; our current research enterprise is just not well optimized to find them.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS.

      Bounty Hunters for Science

Fraud in scientific research is more common than we’d like to think. Fraudulent research can mislead entire scientific fields for years, driving futile and wasteful follow-up studies and slowing down real scientific discoveries. To truly push the boundaries of knowledge, researchers should be able to base their theories and decisions on a more trustworthy scientific record.

Currently there are insufficient incentives to identify fraud and correct the record. Meanwhile, fraudsters can continue to operate with little chance of being caught. That should change: scientific funders should establish one or more bounty programs that reward people who identify significant problems with federally funded research, and should particularly reward fraud whistleblowers whose careers are on the line. 

      Challenge and Opportunity

      In 2023 it was revealed that 20 papers from Hoau-Yan Wang, an influential Alzheimer’s researcher, were marred by doctored images and other scientific misconduct. Shockingly, his research led to the development of a drug that was tested on 2,000 patients. A colleague described the situation as “embarrassing beyond words”.

      There is a common belief that science is self-correcting. But what’s interesting about this case is that the scientist who uncovered Wang’s  fraud was not driven by the usual academic incentives. He was being paid by Wall Street short sellers who were betting against the drug company!

This was not an isolated incident. The most notorious example of Alzheimer’s research misconduct – doctored images in Sylvain Lesné’s papers – was also discovered with the help of short sellers. And as reported in Science, Lesné’s “paper has been cited in about 2,300 scholarly articles—more than all but four other Alzheimer’s basic research reports published since 2006, according to the Web of Science database. Since then, annual NIH support for studies labeled ‘amyloid, oligomer, and Alzheimer’s’ has risen from near zero to $287 million in 2021.” While not all of that research was motivated by Lesné’s paper, it is hard to believe that a paper with that many citations had no effect on the direction of the field.

      These cases show how a critical part of the scientific ecosystem – the exposure of faked research – can be undersupplied by ordinary science. Unmasking fraud is a difficult and awkward task, and few people want to do it. But financial incentives can help close those gaps.

      Plan of Action

      People who witness scientific fraud often stay silent due to perceived pressure from their colleagues and institutions. Whistleblowing is an undersupplied part of the scientific ecosystem.

      We can correct these incentives by borrowing an idea from the Securities and Exchange Commission, whose bounty program around financial fraud pays whistleblowers 10-30% of the fines imposed by the government. The program has been a huge success, catching dozens of fraudsters and reducing the stigma around whistleblowing. The Department of Justice has recently copied the model for other types of fraud, such as healthcare fraud. The model should be extended to scientific fraud.

      The amount of the bounty should vary with the scientific field and the nature of the whistleblower in question. For example, compare the following two situations: 

      The stakes are higher in the latter case. Few graduate students or post-docs will ever be willing to make the intense personal sacrifice of whistleblowing on their own mentor and adviser, potentially forgoing approval of their dissertation or future recommendation letters for jobs. If we want such people to be empowered to come forward despite the personal stakes, we need to make it worth their while. 

      Suppose that one of Lesné’s students in 2006 had been rewarded with a significant bounty for direct testimony about the image manipulation and fraud that was occurring. That reward might have saved tens of millions in future NIH spending, and would have been more than worth it. In actuality, as we know, none of Lesné’s students or postdocs ever had the courage to come forward in the face of such immense personal risk. 

The Office of Research Integrity (ORI) at the Department of Health and Human Services should be funded to create a bounty program covering all HHS-funded research, whether at NIH, CDC, FDA, or elsewhere. ORI’s budget is currently around $15 million per year. That should be increased by at least $1 million to cover a significant number of bounties plus at least one full-time employee to administer the program. 

      Conclusion

      Some critics might say that science works best when it’s driven by people who are passionate about truth for truth’s sake, not for the money. But by this point it’s clear that like anyone else, scientists can be driven by incentives that are not always aligned with the truth. Where those incentives fall short, bounty programs can help.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS.

      Confirming Hope: Validating Surrogate Endpoints to Support FDA Drug Approval Using an Inter-Agency Approach

      To enable more timely access to new drugs and biologics, clinical trials are increasingly using surrogate markers in lieu of traditional clinical outcomes that directly measure how patients feel, function, or survive. Surrogate markers, such as imaging findings or laboratory measurements, are expected to predict clinical outcomes of interest. In comparison to clinical outcomes, surrogate markers offer an advantage in reducing the duration, size, and total cost of trials. Surrogate endpoints are considered to be “validated” if they have undergone extensive testing that confirms their ability to predict a clinical outcome. However, reviews of “validated” surrogate markers used as primary endpoints in trials supporting U.S. Food and Drug Administration (FDA) approvals suggest that many lack sufficient evidence of being associated with a clinical outcome. 

Since 2018, FDA has regularly updated the publicly available “Table of Surrogate Endpoints That Were the Basis of Drug Approval or Licensure”, which includes over 200 surrogate markers that have been or would be accepted by the agency to support approval of a drug or biologic. The table does not, however, include information about the strength of evidence for each surrogate marker and its association with a clinical outcome. As surrogate markers are increasingly accepted by FDA to support approval of new drugs and biologics, it is imperative that patients and clinicians understand whether such novel endpoints reflect meaningful clinical benefits. Thus, FDA, in collaboration with other agencies, should take steps to increase transparency regarding the strength of evidence for surrogate endpoints used to support product approvals, routinely reassess the evidence behind such endpoints to continue justifying their use in regulatory decision-making, and sunset those that fail to show an association with meaningful clinical outcomes. Such transparency would benefit not only the public, clinicians, and the payers responsible for coverage decisions, but also drug developers, helping them design clinical trials that assess endpoints truly reflective of clinical efficacy.

      Challenge and Opportunity

To receive regulatory approval from FDA, new therapeutics are generally required to be supported by “substantial evidence of effectiveness” from two or more “adequate and well-controlled” pivotal trials. However, FDA has maintained a flexible interpretation of this standard to enable timely access to new treatments. New drugs and biologics can be approved for specific disease indications based on pivotal trials measuring clinical outcomes (how patients feel, function, or survive). They can also be approved based on pivotal trials measuring surrogate markers, which are proxy measures expected to predict clinical outcomes. Examples of such endpoints include changes in tumor size on imaging or blood laboratory tests such as cholesterol levels.

Surrogate markers are considered “validated” when sufficient evidence demonstrates that the endpoint reliably predicts clinical benefit. Such validated surrogate markers are typically the basis of traditional FDA approval of therapeutics. However, FDA has also accepted “unvalidated” surrogate endpoints that are reasonably likely to predict clinical benefit as the basis of approval, particularly for therapeutics intended to treat or prevent a serious or life-threatening disease. Under expedited review pathways such as accelerated approval, which grant drug manufacturers faster market authorization based on unvalidated surrogate markers, manufacturers are required to complete an additional clinical trial after approval to confirm the predicted clinical benefit. Should the manufacturer fail to do so, FDA has the authority to withdraw approval of that indication.

For drug developers, the use of surrogate markers can reduce the duration, size, and total cost of a pivotal trial. Over time, FDA has increasingly allowed surrogate markers to be used as primary endpoints in pivotal trials, allowing shorter clinical testing periods and thus faster market access. Moreover, use of unvalidated surrogate markers has grown outside of expedited review pathways such as accelerated approval. One analysis of FDA-approved drugs and biologics that received “breakthrough therapy designation” found that, among those that received traditional approval, over half were based on pivotal trials using surrogate markers.

      While basing FDA approval on surrogate markers can enable more timely market access to novel therapeutics, such endpoints also involve certain trade-offs, including the risk of making erroneous inferences and diminishing certainty about the medical product’s long-term clinical effect. In oncology, evidence suggests that most validation studies of surrogate markers find low correlations with meaningful clinical outcomes such as overall survival or a patient’s quality of life. For instance, in a review of 15 surrogate validation studies conducted by the FDA for oncologic drugs, only one was found to demonstrate a strong correlation between surrogate markers and overall survival. Another study suggested that there are weak or missing correlations between surrogate markers for solid tumors and overall survival. A more recent evaluation found that most surrogate markers used as primary endpoints in clinical trials to support FDA approval of drugs treating non-oncologic chronic disease lack high-strength evidence of associations with clinical outcomes.
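
The “correlation” examined in these validation studies is typically a trial-level association: across many randomized trials, how well does the treatment effect on the surrogate predict the treatment effect on a clinical outcome such as overall survival? The sketch below, using entirely hypothetical numbers, illustrates one common way such an analysis can be computed; it is an illustration of the general technique, not a reconstruction of any specific FDA or published analysis cited above.

```python
# Illustrative sketch (hypothetical data): trial-level surrogate validation
# regresses each trial's treatment effect on the clinical outcome against its
# treatment effect on the surrogate, weighting by trial size. A high weighted
# R^2 supports surrogacy; a low one undercuts it.
import numpy as np

# Hypothetical per-trial summaries: effect on surrogate, effect on survival, N.
surrogate_effect = np.array([0.10, 0.35, 0.50, 0.20, 0.60, 0.15])
survival_effect  = np.array([0.02, 0.05, 0.20, 0.12, 0.10, 0.01])
n_patients       = np.array([250, 400, 320, 600, 180, 500])

w = n_patients / n_patients.sum()

# Weighted least squares fit: survival_effect ~ a + b * surrogate_effect
X = np.column_stack([np.ones_like(surrogate_effect), surrogate_effect])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ survival_effect)
pred = X @ beta

# Weighted R^2: share of (weighted) variance in survival effects explained
# by the surrogate effects across trials.
ybar = np.average(survival_effect, weights=w)
ss_res = np.sum(w * (survival_effect - pred) ** 2)
ss_tot = np.sum(w * (survival_effect - ybar) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"slope={beta[1]:.2f}, weighted R^2={r2:.2f}")
```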

Section 3011 of the 21st Century Cures Act of 2016 amended the Federal Food, Drug, and Cosmetic Act to require FDA to publish a list of “surrogate endpoints which were the basis of approval or licensure (as applicable) of a drug or biological product” under both accelerated and traditional approval pathways. While FDA has posted surrogate endpoint tables for adult and pediatric disease indications that fulfill this legislative requirement, the tables offer no justification for surrogate selection, including evidence supporting validation. Without this information, patients, prescribers, and payers are left uncertain about the actual clinical benefit of therapeutics approved by FDA based on surrogate markers. Meanwhile, drug developers have continued to use the tables as a guide in designing their clinical trials, viewing the included surrogate markers as “accepted” by FDA regardless of the evidence (or lack thereof) undergirding them.

      Plan of Action

      Recommendation 1. FDA should make more transparent the strength of evidence of surrogate markers included within the “Adult Surrogate Endpoint Table” as well as the “Pediatric Surrogate Endpoint Table.” 

Previously, agency officials stated that the use of surrogate markers to support traditional approvals was usually based, at a minimum, on evidence from meta-analyses of clinical trials demonstrating an association between surrogate markers and clinical outcomes. More recently, however, FDA officials have indicated that they consider a “range of sources, including mechanistic evidence that the [surrogate marker] is on the causal pathway of disease, nonclinical models, epidemiologic data, and clinical trial data, including data from the FDA’s own analyses of patient- and trial-level data to determine the quantitative association between the effect of treatment on the [surrogate marker] and the clinical outcomes.” Nevertheless, the specific evidence, and how the agency weighed it, is not included in the published tables of surrogate endpoints, leaving drug developers, patients, clinicians, and payers unclear about the strength of the evidence behind such endpoints. This is an opportunity for the agency to enhance its transparency and communication with the public.

FDA should issue a guidance document detailing its current thinking about how surrogate markers should be validated and evaluated on an ongoing basis. Within the guidance, the agency could detail the types of evidence that would be considered sufficient to establish surrogacy.

FDA should also include within the tables of surrogate endpoints a summary of evidence for each surrogate marker listed. This would provide justification (through citations to relevant articles or internal analyses) so that all stakeholders understand the evidence establishing surrogacy. Moreover, FDA should clearly indicate within the tables which clinical outcomes each listed surrogate marker is thought to predict.
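
To make the idea concrete, the sketch below shows what one machine-readable entry in an expanded table might contain. The field names and values are invented for illustration only; they are not an FDA specification or a proposal for a particular schema.

```python
# Hypothetical illustration of a single expanded surrogate endpoint table entry.
# All field names and values are invented for illustration.
example_entry = {
    "surrogate_endpoint": "Example biomarker level",
    "disease_indication": "Example chronic disease",
    "clinical_outcomes_predicted": ["overall survival", "hospitalization"],
    "approval_pathways_used": ["traditional", "accelerated"],
    "evidence_summary": {
        "strength_rating": "moderate",          # e.g., high / moderate / low
        "evidence_types": ["trial-level meta-analysis", "mechanistic"],
        "citations": ["<DOI or internal FDA analysis reference>"],
        "last_reviewed": "2025-01-01",
    },
}

print(example_entry["surrogate_endpoint"], "->",
      example_entry["clinical_outcomes_predicted"])
```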

FDA should also publicly report, on an annual basis, a list of therapeutics approved by the agency based on clinical trials using surrogate markers as primary endpoints. This, coupled with the additional information about the strength of evidence for each surrogate marker, would allow patients and clinicians to make more informed decisions about treatments whose clinical benefit may be uncertain at the time of FDA approval.

Recently, FDA’s Oncology Center of Excellence, through Project Confirm, has made additional efforts to communicate the status of required postmarketing studies meant to confirm the clinical benefit of drugs that received accelerated approval for oncologic disease indications. FDA could expand this effort across therapeutic areas and approval pathways by publishing a list of ongoing postmarketing studies intended to confirm clinical benefit for therapeutics whose approval was based on surrogate markers.

FDA should also regularly convene advisory committees to allow independent experts to review and vote on recommendations about the use of new surrogate markers for disease indications. Additionally, FDA should regularly convene these committees to re-evaluate the use of existing surrogate markers in light of current evidence, especially those not supported by high-strength evidence of association with clinical outcomes. At a minimum, FDA should convene such advisory committees at least annually to re-examine surrogate markers listed on its publicly available tables. In 2024, FDA convened the Oncologic Drugs Advisory Committee to discuss the use of the surrogate marker minimal residual disease as an endpoint for multiple myeloma. Further such meetings, including for “unvalidated” endpoints, would give FDA the opportunity to re-examine their use in regulatory decision-making.

      Recommendation 2. In collaboration with the FDA, other federal research agencies should contribute evidence generation to determine whether surrogate markers are appropriate for use in regulatory decision-making, including approval of new therapeutic products and indications for use.

Drug manufacturers that receive FDA approval for products based on unvalidated surrogate markers have little incentive to conduct studies that might demonstrate a lack of association between those markers and clinical outcomes. To address this, the Department of Health and Human Services (HHS) should establish an interagency working group including FDA, the National Institutes of Health (NIH), the Patient-Centered Outcomes Research Institute (PCORI), the Advanced Research Projects Agency for Health (ARPA-H), the Centers for Medicare and Medicaid Services (CMS), and other agencies engaged in biomedical and health services research. These agencies could collaboratively conduct or commission meta-analyses of existing clinical trials to determine whether there is sufficient evidence to establish surrogacy. Such publicly funded studies would then be brought to FDA advisory committees, whose members would consider them in making recommendations about the validity of various surrogate endpoints and whether any endpoints without sufficient evidence should be sunset. NIH in particular should prioritize funding large-scale trials aimed at validating important surrogate outcomes.

Through regular collaboration and convening, FDA can help direct resources toward investigating the surrogate markers of greatest interest to regulators, patients, clinicians, and payers, strengthening the science behind novel therapeutics. Such information would also be invaluable to drug developers in identifying evidence-based endpoints for their clinical trial designs, contributing to a more efficient research and development landscape.

      Recommendation 3. Congress should build upon the provisions related to surrogate markers that passed as part of the 21st Century Cures Act of 2016 in their “Cures 2.0” efforts.

The aforementioned interagency working group convened by HHS could be explicitly authorized through legislation, coupled with funding specifically for surrogate marker validation studies. Congress should also mandate that FDA and other federal health agencies re-evaluate listed surrogate endpoints on an annual basis, with additional reporting requirements. Through legislation, FDA could also be granted explicit authority to sunset endpoints for which there is no clear evidence of surrogacy, preventing future drug candidates from establishing efficacy based on flawed endpoints. Finally, Congress should require routine reporting from FDA on the status of the interagency working group, along with other items including a list of new therapeutic approvals based on surrogate markers, expansion of the existing surrogate marker tables on FDA’s website to include the evidence of surrogacy, and issuance of a guidance document detailing what scientific evidence the agency would consider in validating and re-evaluating surrogate markers.

      Conclusion

FDA has increasingly allowed new drugs and biologics to be approved based on surrogate markers that are meant to predict meaningful clinical outcomes demonstrating that patients feel better, function better, and survive longer. Although the agency has made clearer which surrogate endpoints could be or are being used to support approval, significant gaps exist in the evidence demonstrating that these novel endpoints are associated with meaningful clinical outcomes. Continued use of surrogate endpoints with little association with clinical benefit leaves patients, clinicians, and the payers responsible for coverage decisions without assurance that novel therapeutics approved by FDA are meaningfully effective. Transparency about the evidence supporting surrogate endpoints is urgently needed to mitigate this uncertainty around new drug approvals, including for drug developers as they design clinical trials for therapeutic candidates seeking FDA approval. FDA should, in collaboration with other federal biomedical research agencies, routinely re-evaluate surrogate endpoints to determine whether their continued use is justified. Such regular re-evaluation will strengthen FDA’s credibility and ensure the accountability of an agency tasked with ensuring the safety and efficacy of drugs and other medical products, as well as with shaping the innovation landscape.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS

      Frequently Asked Questions
      Are there examples of drug approvals that highlight the concerns that have been raised about surrogate markers?

Yes. In 2016, eteplirsen (Exondys 51) was granted accelerated approval for the treatment of Duchenne muscular dystrophy (DMD) against the recommendation of an advisory committee and FDA’s own scientific staff. Concerns were raised that the approval was based on a small clinical trial showing that eteplirsen led to a small increase in the protein dystrophin, a surrogate marker. Three additional approvals for similar DMD drugs have since been based on the same surrogate endpoint. However, no studies have been completed confirming clinical benefit.

In 2021, aducanumab (Aduhelm) was granted accelerated approval for the treatment of Alzheimer’s disease against the recommendation of an advisory committee and FDA’s scientific staff. Concerns were raised that the approval was based on a surrogate marker, beta-amyloid levels, which has not been found to correlate with cognitive or functional changes in Alzheimer’s disease patients. In particular, FDA’s internal statistical review team found no association between changes in the surrogate marker and the clinical outcomes reported in pivotal trials.

      What challenges might FDA or the administration encounter from industry when launching this initiative?

Industry may claim that such re-evaluation, and the potential removal of unvalidated surrogate endpoints, would slow the pace of innovation and thus patient access to novel therapeutics. It is more likely that this would instead enable more efficient drug development by providing manufacturers, particularly smaller companies, with surrogate endpoints that not only decrease the duration and cost of clinical trials but also have strong evidence of association with meaningful clinical outcomes. If adequate validation is conducted through FDA and other federal agencies, it may also reduce the need for postmarketing requirements meant to confirm clinical benefit.

      Will FDA and other agencies undertaking the validation of surrogate endpoints slow down the development and approval of novel drugs?

No. Having FDA, in collaboration with other federal health agencies, validate surrogate endpoints would not halt the use of unvalidated surrogate endpoints that are reasonably likely to predict clinical benefit. Expedited regulatory pathways such as accelerated approval, which are codified in law and allow manufacturers to use unvalidated surrogate markers as endpoints in pivotal clinical trials, would still be available. Instead, this creates a process of re-evaluation so that unvalidated surrogate endpoints are not left unvalidated forever, but are examined in a timely manner to inform their continued use in supporting FDA approval. Ultimately, patients and clinicians want drugs that meaningfully treat or prevent a disease or condition. Routine re-evaluation and validation of surrogate endpoints would provide assurance that therapeutics approved on the basis of these novel endpoints are clinically effective.

      Why does this memo propose an expansive multi-agency effort instead of just targeting the FDA?

FDA’s function as a regulator is to evaluate the evidence brought before it by industry sponsors. To do so effectively, the evidence must be available. This is often not the case for new surrogate markers, as sponsors may have little commercial incentive to generate such evidence, particularly if a surrogate endpoint might be found, after approval, not to be associated with a meaningful clinical outcome. The involvement of multiple federal biomedical research agencies, including NIH and ARPA-H, alongside FDA can therefore play an instrumental role in conducting or funding studies that test whether there is a clear association between a surrogate marker and a clinical outcome. Already, several institutes within NIH are engaged in biomarker development and in supporting validation. Collaboration between NIH institutes with relevant expertise, other agencies engaged in translational research, and FDA will enable validation of surrogate markers to inform regulatory decision-making on novel therapeutics.

      What similar activity is already underway?

Under the Prescription Drug User Fee Act VII, passed in 2022, FDA was authorized to establish the Rare Disease Endpoint Advancement (RDEA) pilot program. The program is intended to foster the development of novel endpoints for rare diseases through FDA collaboration with industry sponsors that have proposed novel endpoints for a drug candidate, opportunities for stakeholders, including the public, to inform such endpoint development, and greater FDA staff capacity to help develop novel endpoints for rare diseases. Such a pilot program could be expanded not only to develop novel endpoints, but also to develop approaches for validating novel endpoints such as surrogate markers and for communicating the strength of evidence to the public.

      Why are existing efforts insufficient?

Payers such as Medicare have also taken steps to enable postmarket evidence generation, including for drugs approved by FDA based on surrogate endpoints. Following the accelerated approval of aducanumab (Aduhelm), which FDA approved based on a surrogate endpoint, CMS issued a national coverage determination under the coverage with evidence development (CED) program, conditioning coverage of this class of drugs on participation in CMS-approved studies, with access available only through randomized controlled trials assessing meaningful clinical outcomes. Further evaluation of the surrogate endpoints informing FDA approval can help payers make coverage decisions. Additionally, coverage and reimbursement could be tied to the evidence for such surrogate endpoints, providing additional incentive to complete and communicate the findings from such studies.

      A Cross-Health and Human Services Initiative to Cut Wasteful Spending and Improve Patient Lives

      Challenge and Opportunity

Many common medical practices do not have strong evidence behind them. In 2019, a group of prominent medical researchers—including Robert Califf, the former Food and Drug Administration (FDA) Commissioner—undertook the tedious task of examining the level of evidence behind 2,930 recommendations in guidelines issued by the American Heart Association and the American College of Cardiology. They asked one simple question: how many recommendations were supported by multiple small randomized trials or at least one large trial? The answer: 8.5%. The rest were supported by only one small trial, by observational evidence, or by “expert opinion” alone.

      For infectious diseases, a team of researchers looked at 1,042 recommendations in guidelines issued by the Infectious Diseases Society of America. They found that only 9.3% were supported by strong evidence. For 57% of the recommendations, the quality of evidence was “low” or “very low.” And to make matters worse, more than half of the recommendations considered low in quality of evidence were still issued as “strong” recommendations.

      In oncology, a review of 1,023 recommendations from the National Comprehensive Cancer Network found that “…only 6% of the recommendations … are based on high-level evidence”, suggesting “a huge opportunity for research to fill the knowledge gap and further improve the scientific validity of the guidelines.”

      Even worse, there are many cases where not only is a common medical treatment lacking the evidence to support it, but also one or more randomized trials have shown that the treatment is useless or even harmful! One of the most notorious examples is that of the anti-arrhythmic drugs given to millions of cardiac patients in the 1980s. Cardiologists at the time had the perfectly logical belief that since arrhythmia (irregular heartbeat) leads to heart attacks and death, drugs that prevented arrhythmia would obviously prevent heart attacks and death. In 1987, the National Institutes of Health (NIH) funded the Cardiac Arrhythmia Suppression Trial (CAST) to test three such drugs. One of the drugs had to be pulled after just a few weeks, because 17 patients had already died compared with only three in the placebo group. The other two drugs similarly turned out to be harmful, although it took several months to see that patients given those drugs were more than two times as likely to die.  According to one JAMA article, “…there are estimates that 20,000 to 75,000 lives were lost each year in the 1980s in the United States alone…” due to these drugs. The CAST trial is a poignant reminder that doctors can be convinced they are doing the best for their patients, but they can be completely wrong if there is not strong evidence from randomized trials.

      In 2016, randomized trials of back fusion surgery found that it does not work. But a recent analysis by the Lown Institute found that the Centers for Medicare & Medicaid Services (CMS) spent approximately $2 billion in the past 3 years on more than 200,000 of these surgeries.

There are hundreds of additional examples where medical practice was ultimately proven wrong. Given how few medical practices, even now, are actually supported by strong evidence, there are likely many more treatments that either do not work or actively cause harm. This not only wastes money but also puts patients at risk.

      We can do better – both for patients and for the federal budget – if we reduce the use of medical practices that simply do not work.

      Plan of Action

      The Secretary of Health and Human Services should create a cross-division committee to develop an extensive and prioritized list of medical practices, products, and treatments that need evidence of effectiveness, and then roll out an ambitious agenda to run randomized clinical trials for the highest-impact medical issues.

Specifically, CMS needs to work with NIH, FDA, and the Centers for Disease Control and Prevention (CDC) to develop a prioritized list of medical treatments, procedures, drugs, and devices that have little evidence behind them, for which annual spending is large, and for which the potential health impacts are most harmful. Simultaneously, FDA needs to work with its partner agencies to identify drugs, vaccines, and devices in widespread medical use that need rigorous post-market evaluation. This includes drugs with off-label uses, oncology regimens that have never been tested against each other, surrogate outcomes that have not been validated against long-term outcomes, accelerated approvals without the needed follow-up studies, and more.
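
One way to make such a prioritization concrete is to score each candidate practice by how much is spent on it, how weak its evidence base is, and how much harm it could cause. The sketch below is a minimal illustration of that idea with invented numbers and an invented scoring formula; it is not a scheme proposed by this memo or any agency.

```python
# Illustrative sketch of how a cross-HHS committee might rank candidate
# practices for randomized evaluation; the scoring scheme and all data are
# hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    annual_spend_usd: float      # e.g., estimated from CMS claims data
    evidence_strength: float     # 0 = expert opinion only, 1 = multiple large RCTs
    potential_harm: float        # 0 = negligible, 1 = serious safety signal

def priority_score(c: Candidate) -> float:
    # More spending, weaker evidence, and greater potential harm all raise priority.
    return c.annual_spend_usd * (1.0 - c.evidence_strength) * (0.5 + c.potential_harm)

candidates = [
    Candidate("Procedure A", 2.0e9, 0.2, 0.3),
    Candidate("Off-label drug B", 5.0e8, 0.1, 0.7),
    Candidate("Device C", 1.2e8, 0.6, 0.2),
]

for c in sorted(candidates, key=priority_score, reverse=True):
    print(f"{c.name}: score={priority_score(c):.3g}")
```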

With priority lists in hand, NIH could immediately launch trials to evaluate the effectiveness and safety of the identified treatments and practices. The Department should report to Congress annually on the number and nature of clinical trials in progress and, eventually, on the results of those trials (which should also be made available on a public dashboard, along with any resulting savings). The project should be ongoing for the indefinite future, and over time HHS should explore ways to have artificial intelligence tools identify the key unstudied medical questions that deserve a high-value clinical trial.

Expected opponents of any such effort include pharmaceutical, biotechnology, and device companies and their affiliated trade associations, whose products might come under further scrutiny, as well as professional medical associations that are firmly convinced their practices should not be questioned. Their lobbying power might be considerable, but the intellectual case for rigorous and unbiased studies is unquestionable, particularly when billions of federal dollars and millions of patients’ lives and health are at stake.

      Conclusion

      Far too many medical practices and treatments have not been subjected to rigorous randomized trials, and the divisions of Health and Human Services should come together to fix this problem. Doing so will likely lead to billions of dollars in savings and huge improvements to patient health.

This memo was produced as part of the Federation of American Scientists and Good Science Project sprint. Find more ideas at Good Science Project x FAS