Bold Goals Require Bold Funding Levels: The FY25 Requests for the U.S. Bioeconomy Fall Short

Over the past year, there has been tremendous momentum in policy for the U.S. bioeconomy – the collection of advanced industry sectors, like pharmaceuticals and biomanufacturing, with biology at their core. This momentum began in part with the Bioeconomy Executive Order (EO) and the programs authorized in CHIPS and Science, and continued with the Office of Science and Technology Policy’s (OSTP) release of the Bold Goals for U.S. Biotechnology and Biomanufacturing (Bold Goals) report. The report highlighted ambitious goals that the Department of Energy (DOE), Department of Commerce (DOC), Department of Health and Human Services (HHS), National Science Foundation (NSF), and Department of Agriculture (USDA) have committed to in order to further the U.S. bioeconomy.

However, the ambitious goals set by these agencies in the Bold Goals report will also require directed and appropriate funding, and this is where the U.S. has been falling short. Multiple bioeconomy-related programs were authorized through the bipartisan CHIPS and Science legislation but have yet to receive funding anywhere near their authorized targets. Underfunding and the resulting lack of capacity have also delayed the tasks under the Bioeconomy EO. For the bold goals outlined in the report to be realized, the U.S. must properly direct and fund the many different endeavors that make up its bioeconomy.

Despite this need for funding for the U.S. bioeconomy, the recently completed FY2024 (FY24) appropriations were modest for some science agencies but abysmal for others, with decreases across many scientific programs at multiple agencies. The DOC, and specifically the National Institute of Standards and Technology (NIST), saw massive cuts to base program funding, with earmarks swamping core activities in some accounts.

There remains some hope that the FY2025 (FY25) budget will alleviate some of the cuts to science programs, and in turn to programs related to the bioeconomy. But the strictures of the Fiscal Responsibility Act, which contributed to the difficult outcomes in FY24, remain in place for FY25 as well.

Bioeconomy in the FY25 Request

Against this difficult backdrop, the President’s FY25 Budget was released, along with the FY25 budget requests for DOE, DOC, HHS, NSF, and USDA.

The President’s Budget makes strides toward enabling a strong bioeconomy by prioritizing synthetic biology metrology and standards within NIST and by directing OSTP to establish the Initiative Coordination Office to support the National Engineering Biology Research and Development Initiative. Beyond these two instances, however, the President’s Budget offers only limited progress for the bioeconomy because of mediocre funding levels.

The U.S. bioeconomy has a lot going on, with different agencies prioritizing different areas and programs depending on their jurisdiction. This makes it difficult to properly grasp all the activity that is ongoing (but we’re working on it, stay tuned!). However, we do know that the agencies’ own FY25 budget requests have been a mixed bag for bioeconomy activities related to the Bold Goals Report. Some agencies are asking for large appropriations, while others are not requesting enough to support these goals:

Department of Energy supports Bold Goals Report efforts in biotech & biomanufacturing R&D to further climate change solutions

The increase in funding levels requested for FY25 for the Biological and Environmental Research (BER) program and the Office of Manufacturing and Energy Supply Chains (MESC) will enable increased biotech and biomanufacturing R&D, supporting DOE efforts to meet its proposed objectives in the Bold Goals Report.

Department of Commerce falls short in support of biotech & biomanufacturing R&D supply chain resilience

One budgetary increase request is offset by two flat funding levels.

Department of Agriculture falls short in support of biotech & biomanufacturing R&D to further food & agriculture innovation

Department of Health and Human Services falls short in support of biotech & biomanufacturing R&D to further human health

National Science Foundation supports Bold Goals Report efforts in biotech & biomanufacturing R&D to further cross-cutting advances

* FY23 amounts are listed because FY24 appropriations had not been finalized at the time this document was created.

Overall, DOE and NSF have requested FY25 budgets that could potentially achieve the goals stated in the Bold Goals Report, while DOC, USDA, and HHS have unfortunately limited their requests, and it remains questionable whether they will be able to achieve the goals listed at the funding levels requested. The DOC, and specifically NIST, faces one of the biggest challenges this upcoming year: NIST has to juggle tasks assigned to it by the AI EO, the Bioeconomy EO, and the President’s Budget. The 8% decrease in funding for NIST does not paint a promising picture for either EO and should be something that Congress rectifies when it enacts its appropriations bills. Furthermore, USDA faces cuts in funding for vital programs related to its goals, and AgARDA continues to be unfunded. For USDA to achieve the goals listed in the Bold Goals Report, it will be imperative that Congress prioritize these areas for the benefit of the U.S. bioeconomy.

CHIPS and Science Funding Gaps Continue to Stifle Scientific Competitiveness

The bipartisan CHIPS and Science Act sought to accelerate U.S. science and innovation, to let us compete globally and solve problems at home. The multifold CHIPS approach to science and tech reached well beyond semiconductors: it authorized long-term boosts for basic science and education, expanded the geography of place-based innovation, mandated a whole-of-government science strategy, and made other moves.

But appropriations in FY 2024, and the strictures of the Fiscal Responsibility Act in FY 2025, make clear that we’re falling well short of CHIPS aspirations. The ongoing failure of the U.S. to invest comes at a time when our competitors continue to up their investments in science, with China pledging 10% growth in investment, the EU setting forth new strategies for biotechnology and manufacturing, and Korea’s economy approaching 5% R&D investment intensity, far more than the U.S.

Research Agency Funding Shortfalls 

In the aggregate, CHIPS and Science authorized three research agencies – the National Science Foundation (NSF), the Department of Energy Office of Science (DOE SC), and the National Institute of Standards and Technology (NIST) – to receive $26.8 billion in FY 2024 and $28.8 billion in FY 2025, representing substantial growth in both years. But appropriations have increasingly underfunded the CHIPS agencies, with a gap now over $8 billion (see graph).


The table below shows agency funding data in greater detail, including FY 2023 and FY 2024 appropriations, the FY 2025 CHIPS authorization, and the FY 2025 request.

The National Science Foundation is experiencing the largest gap between CHIPS targets and actual appropriations following a massive year-over-year funding reduction in FY 2024. That cut is partly the result of appropriators rescuing NSF in FY 2023 with over $1 billion in supplemental spending to support both NSF base activities and implementation of the Technology, Innovation and Partnerships Directorate (TIP). While that spending provided NSF a welcome boost in FY 2023, it could not be replicated in FY 2024, and NSF only received a modest boost in base appropriations. As a result, the full year-over-year decline for NSF amounted to over $800 million, which will likely mean cutbacks in both core and TIP (the exact distribution is to be determined, though Congress called for an even-handed approach). It also means a CHIPS shortfall of $6.5 billion in both FY 2024 and FY 2025.

The National Institute of Standards and Technology also requires some additional explanation. Like NSF, NIST received some supplemental spending for both lab programs and industrial innovation in FY 2023, but NIST also has been subject to quite substantial earmarks in FY 2023 and FY 2024, as seen in the table above. The presence of earmarks in FY 2024 meant, in practice, a nearly $100 million reduction in funding for core NIST lab programs, which cover a range of activities in measurement science and emerging technology areas.

The Department of Energy’s Office of Science fared better than the other two in the omnibus with a modest increase, but still faces a $1.5 billion shortfall below CHIPS targets in the White House request. 

Select Account Shortfalls

National Science Foundation

Core research. Excluding the newly-created TIP Directorate, purchasing power of core NSF research activities in biology, computing, engineering, geoscience, math and physical sciences, and social science dropped by over $300 million between FY 2021 and FY 2023. If the FY 2024 funding cuts are distributed proportionally across directorates, their collective purchasing power would have dropped by over $1 billion all-in between FY 2021 and the present, representing a decline of more than 15%. This would also represent a shortfall of $2.9 billion below the CHIPS target for FY 2024, and will likely result in hundreds of fewer research awards.

STEM Education. While its losses are not quite as large as core research’s, NSF’s STEM Education directorate has still lost over 8% of its purchasing power since FY 2021, and remains $1.3 billion below its CHIPS target after a 15% year-over-year cut in the FY 2024 omnibus. This cut will likely mean hundreds of fewer graduate fellowships and other opportunities for STEM support, not to mention multimillion-dollar shortfalls against CHIPS funding targets for programs like CyberCorps and Noyce teacher scholarships. The minibus did allocate $40 million for the National STEM Teacher Corps pilot program established in CHIPS, but implementing this carveout will pose challenges in light of funding cuts elsewhere.

TIP Programs. FY 2023 funding fell over $800 million shy of the CHIPS target for the new technology directorate, which had been envisioned to grow rapidly but instead will now have to deal with fiscal retrenchment. Several items established in CHIPS remain un- or under-funded. For instance, NSF Entrepreneurial Fellowships have received only $10 million from appropriators to date out of $125 million total authorized, while Centers for Transformative Education Research and Translation – a new initiative intended to research and scale educational innovations – has gotten no funding to date. Also underfunded are the Regional Innovation Engines (see below).

Department of Energy

Microelectronics Centers. While the FY 2024 picture for the Office of Science (SC) is perhaps not quite as stark as it is for NSF – partly because SC didn’t enjoy the benefit of a big but transient boost in FY 2023 – there remain underfunded CHIPS priorities throughout. One of the more prominent initiatives is DOE’s Microelectronics Science Research Centers, intended to be a multidisciplinary R&D network for next-generation science funded across the SC portfolio. CHIPS authorized these at $25 million per center per year.

Fission and Fusion. Fusion energy was a major priority in CHIPS and Science, which sought among other things expansion of milestone-based development to achieve a fusion pilot plant. But following FY 2024 appropriations, the fusion science program continues to face a more than $200 million shortfall, and DOE’s proposal for a stepped-up research network – now dubbed the Fusion Innovation Research Engine (FIRE) centers – remains unfunded. CHIPS and Science also sought to expand nuclear research infrastructure at the nation’s universities, but the FY 2024 omnibus provided no funding for the additional research reactors authorized in CHIPS.

Clean Energy Innovation. CHIPS Title VI authorized a wide array of energy innovation initiatives – including clean energy business vouchers and incubators, entrepreneurial fellowships, a regional energy innovation program, and others. Not all received a specified funding authorization, but those that did have generally not yet received designated line-item appropriations. 

NIST

In addition to the funding challenges for NIST lab programs described above – which are critical for competitiveness in emerging technology – NIST manufacturing programs also continue to face shortfalls, of $192 million in the FY 2024 omnibus and over $500 million in the FY 2025 budget request.

Regional Innovation

As envisioned when CHIPS was signed, three major place-based innovation and economic development programs – EDA’s Regional Technology and Innovation Hubs (Tech Hubs), NSF’s Regional Innovation Engines (Engines), and EDA’s Distressed Area Recompete Pilot Program (Recompete) – would be moving from planning and selection into implementation in FY25. But with recent budget announcements, some implementation may need to be scaled back from what was originally planned, putting at risk our ability to rise to the confluence of economic and industrial challenges we face.

EDA Tech Hubs. In October 2023, the Biden-Harris administration announced the designation of 31 inaugural Tech Hubs and 29 recipients of Tech Hubs Strategy Development Grants from nearly 400 applicants. These 31 Tech Hubs designees were chosen for their potential to become global centers of innovation and job creators. Upon designation, the designees were able to apply for implementation grants of $40-$70 million each, to be awarded to approximately 5-10 of the designated Tech Hubs. Grants are expected to be announced in summer 2024.

The FY 2025 budget request for Tech Hubs includes $41 million in discretionary spending to fund additional grants to the existing designees, and another $4 billion in mandatory spending – spread over several years – to allow for additional Tech Hubs designees and strategy development grants. CHIPS and Science authorized the Hubs at $10 billion in total, but the program has only received 5% of this in actual appropriations to date. The FY25 request would bring total program funding up to 46% of the authorization. 

The ambitious goal of Tech Hubs is to restore the U.S. position as a leader in critical technology development, but this ambition is dependent on our ability to support the quantity and quality of the program as originally envisioned. Without meeting the funding expectations set in CHIPS, the Tech Hubs’ ability to restore American leadership will be vastly limited. 

NSF Engines. In January 2024, NSF announced the first NSF Engines awards to 10 teams across the United States. Each NSF Engine will receive an initial $15 million over the next two years with the potential to receive up to $160 million each over the next decade.

Beyond those 10 inaugural Engines awards, a selection of applicants were invited to apply for NSF Engines development awards, with each receiving up to $1 million to support team-building, partnership development, and other necessary steps toward future NSF Engines proposals. NSF’s initial investment in the 10 awardee regions is being matched almost two to one in commitments from local and state governments, other federal agencies, private industry, and philanthropy. NSF previously announced 44 Development Awardees in May 2023.

To bolster the efforts of NSF Engines, NSF also announced the Builder Platform in September 2023, which serves as a post-award model to provide resources, support, and engagement to awardees. 

The FY25 request level for NSF Engines is $205 million, which will support up to 13 NSF Regional Innovation Engines. While this $205 million would be a welcome addition – especially in light of the funding risks and uncertainty in FY24 mentioned above – total funding to date is considerably below CHIPS aspirations, accounting for just over 6% of authorized funding. 

EDA Recompete. The EDA Recompete Program, authorized for up to $1 billion in the CHIPS and Science Act, aims to allocate resources towards economically disadvantaged areas and create good jobs. By targeting regions where prime-age (25-54 years) employment lags behind the national average, the program seeks to revitalize communities long overlooked, bridging the gap through substantial and flexible investments.

Recompete received $200 million in appropriations in 2023 for the initial competition. This competition received 565 applications, with total requests exceeding $6 billion. Of those applicants, 22 Phase 1 Finalists were announced in December 2023. 

Recompete Finalists are able to apply for the Phase 2 Notice of Funding Opportunity and are provided access to technical assistance support for their plans. In Phase 2, EDA will make approximately 4-8 implementation investments, with awarded regions receiving between $20 million and $50 million on average.

Alongside the 22 Finalists, Recompete Strategy Development Grant recipients were announced. These grants support applicant communities in strategic planning and capacity building. 

Following a shutout in FY 2024 appropriations, Recompete funding in the FY25 request is $41 million, bringing total funding to date to $241 million or just over 24% of authorized funding.

Congress will soon have the chance to rectify these collective shortfalls, with FY 2025 appropriations legislation coming down the pike. But the November elections throw substantial uncertainty over what was already a difficult situation. If Congress can’t muster the votes necessary to properly fund CHIPS and Science programs, U.S. competitiveness will continue to suffer.

Predicting Progress: A Pilot of Expected Utility Forecasting in Science Funding

Read more about expected utility forecasting and science funding innovation here.

The current process that federal science agencies use for reviewing grant proposals is known to be biased against riskier proposals. As such, the metascience community has proposed many alternate approaches to evaluating grant proposals that could improve science funding outcomes. One such approach was proposed by Chiara Franzoni and Paula Stephan in a paper on how expected utility — a formal quantitative measure of predicted success and impact — could be a better metric for assessing the risk and reward profile of science proposals. Inspired by their paper, the Federation of American Scientists (FAS) collaborated with Metaculus to run a pilot study of this approach. In this working paper, we share the results of that pilot and its implications for future implementation of expected utility forecasting in science funding review. 

Brief Description of the Study

In fall 2023, we recruited a small cohort of subject matter experts to review five life science proposals by forecasting their expected utility. For each proposal, this consisted of defining two research milestones in consultation with the project leads and asking reviewers to make three forecasts for each milestone:

  1. The probability of success;
  2. The scientific impact of the milestone, if it were reached; and
  3. The social impact of the milestone, if it were reached.

These predictions can then be used to calculate the expected utility, or likely impact, of a proposal, and to design and compare potential funding portfolios.
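As a rough illustration of the arithmetic involved (a minimal sketch, not the pilot’s exact equations, which are given in Appendix C), a single milestone’s expected utility can be computed as its probability of success times its impact, with the base-2 impact scores described later in this paper converted to linear units; summing scientific and social impact here is our simplifying assumption:

```python
# Minimal sketch of a single milestone's expected utility. Assumptions (ours, not
# the pilot's Appendix C equations): impact scores sit on the base-2 exponential
# scale used in the pilot, and scientific and social impact are simply summed.

def impact_units(score: float) -> float:
    """Convert a 1-10 impact score on the base-2 exponential scale to linear units."""
    return 2 ** score

def milestone_expected_utility(p_success: float, sci_score: float, soc_score: float) -> float:
    """Probability of reaching the milestone times its (linear-scale) impact."""
    return p_success * (impact_units(sci_score) + impact_units(soc_score))

# Example using the pilot's mean forecasts for proposal 5, milestone 1 (Table 1):
print(milestone_expected_utility(p_success=0.55, sci_score=7.14, soc_score=2.37))
```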

Key Takeaways for Grantmakers and Policymakers

The three main strengths of using expected utility forecasting to conduct peer review are

Despite the apparent complexity of this process, we found that first-time users were able to successfully complete their review according to the guidelines without any additional support. Most of the complexity occurs behind the scenes, and either aligns with the responsibilities of the program manager (e.g., defining milestones and their dependencies) or can be automated (e.g., calculating the total expected utility). Thus, grantmakers and policymakers can have confidence in the user-friendliness of expected utility forecasting.

How Can NSF or NIH Run an Experiment on Expected Utility Forecasting?

An initial pilot study could be conducted by NSF or NIH by adding a short, non-binding expected utility forecasting component to a selection of review panels. In addition to the evaluation of traditional criteria, reviewers would be asked to predict the success and impact of select milestones for the proposals assigned to them. The rest of the review process and the final funding decisions would be made using the traditional criteria. 

Afterwards, study facilitators could take the expected utility forecasting results, construct the alternate portfolio of proposals that would have been funded had that approach been used, and compare the two portfolios. Such a comparison would yield valuable insights into whether—and how—the types of proposals selected by each approach differ, and whether their use leads to different considerations arising during review. Additionally, a pilot assessment of reviewers’ prediction accuracy could be conducted by asking program officers to assess milestone achievement and study impact upon completion of funded projects.

Findings and Recommendations

Reviewers in our study were new to the expected utility forecasting process and gave generally positive reactions. In their feedback, reviewers said that they appreciated how the framing of the questions prompted them to think about the proposals in a different way and pushed them to ground their assessments with quantitative forecasts. The focus on just three review criteria–probability of success, scientific impact, and social impact–was seen as a strength because it simplified the process, disentangled feasibility from impact, and eliminated biased metrics. Overall, reviewers found this new approach interesting and worth investigating further. 

In designing this pilot and analyzing the results, we identified several important considerations for planning such a review process. While complex, engaging with these considerations tended to provide value by making implicit project details explicit and encouraging clear definition and communication of evaluation criteria to reviewers. Two key examples are defining the proposal milestones and creating impact scoring systems. In both cases, reducing ambiguities in terms of the goals that are to be achieved, developing an understanding of how outcomes depend on one another, and creating interpretable and resolvable criteria for assessment will help ensure that the desired information is solicited from reviewers. 

Questions for Further Study

Our pilot only simulated the individual review phase of grant proposals and did not simulate a full review committee. The typical review process at a funding agency consists of first, individual evaluations by assigned reviewers, then discussion of those evaluations by the whole review committee, and finally, the submission of final scores from all members of the committee. This is similar to the Delphi method, a structured process for eliciting forecasts from a panel of experts, so we believe that it would work well with expected utility forecasting. The primary change would therefore be in the definition and approach for eliciting criterion scores, rather than the structure of the review process. Nevertheless, future implementations may uncover additional considerations that need to be addressed or better ways to incorporate forecasting into a panel environment. 

Further investigation into how best to define proposal milestones is also needed. This includes questions such as, who should be responsible for determining the milestones? If reviewers are involved, at what part(s) of the review process should this occur? What is the right balance between precision and flexibility of milestone definitions, such that the best outcomes are achieved? How much flexibility should there be in the number of milestones per proposal? 

Lastly, more thought should be given to how to define social impact and how to calibrate reviewers’ interpretation of the impact score scale. In our report, we propose a couple of different options for calibrating impact, in addition to describing the one we took in our pilot. 

Interested grantmakers, both public and private, and policymakers are welcome to reach out to our team if interested in learning more or receiving assistance in implementing this approach.


Introduction

The fundamental concern of grantmakers, whether governmental or philanthropic, is how to make the best funding decisions. All funding decisions come with inherent uncertainties that may pose risks to the investment. Thus, a certain level of risk-aversion is natural and even desirable in grantmaking institutions, especially federal science agencies, which are responsible for managing taxpayer dollars. However, without risk there is no reward, so the trade-off must be balanced. In mathematics and economics, expected utility is the metric commonly assumed to underlie rational decision-making. Expected utility has two components: the probability of an outcome occurring if an action is taken and the value of that outcome, which roughly correspond to risk and reward. Thus, expected utility would seem to be a logical choice for evaluating science funding proposals.
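In standard decision-theoretic notation, the expected utility of an action $a$ with possible outcomes $o_i$ is

\[
\mathrm{EU}(a) = \sum_i p(o_i \mid a)\, U(o_i),
\]

where $p(o_i \mid a)$ is the probability of outcome $o_i$ if the action is taken and $U(o_i)$ is the value of that outcome.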

In the debates around funding innovation, though, expected utility has largely flown under the radar compared to other ideas. Nevertheless, Chiara Franzoni and Paula Stephan have proposed using expected utility in peer review. Building on their paper, the Federation of American Scientists (FAS) developed a detailed framework for how to incorporate expected utility into a peer review process. We chose to frame the review criteria as forecasting questions, since determining the expected utility of a proposal inherently requires making predictions about the future. Forecasting questions also have the added benefit of being resolvable–i.e., the true outcome can be determined after the fact and compared to the prediction–which provides a learning opportunity for reviewers to improve their abilities and identify biases. In addition to forecasting, we incorporated other unique features, like an exponential scale for scoring impact, that we believe help reduce biases against risky proposals.

With the theory laid out, we conducted a small pilot in fall 2023. The pilot was run in collaboration with Metaculus, a crowd forecasting platform and aggregator, to leverage their expertise in designing resolvable forecasting questions and to use their platform to collect forecasts from reviewers. The purpose of the pilot was to test the mechanics of this approach in practice, identify any additional considerations that need to be thought through, and surface potential issues that need to be solved. We were also curious whether any interesting or unexpected results would arise from how we chose to calculate impact and total expected utility. It is important to note that this pilot was not an experiment, so we did not have a control group against which to compare the results of the review.

Since FAS is not a grantmaking institution, we did not have a ready supply of traditional grant proposals to use. Instead, we used a set of two-page research proposals for Focused Research Organizations (FROs) that we had sourced through separate advocacy work in that area.1 With the proposal authors’ permission, we recruited a cohort of twenty subject matter experts to each review one of five proposals. For each proposal, we defined two research milestones in consultation with the proposal authors. Reviewers were asked to make three forecasts for each milestone:

  1. The probability of success;
  2. The scientific impact, conditional on success; and
  3. The social impact, conditional on success.

Reviewers submitted their forecasts on Metaculus’ platform; in a separate form they provided explanations for their forecasts and responded to questions about their experience and impression of this new approach to proposal evaluation. (See Appendix A for details on the pilot study design.)

Insights from Reviewer Feedback

Overall, reviewers liked the framing and criteria provided by the expected utility approach, while their main critique was of the structure of the research proposals. Excluding critiques of the research proposal structure, which are unlikely to apply to an actual grant program, two thirds of the reviewers expressed positive opinions of the review process and/or thought it was worth pursuing further given drawbacks with existing review processes. Below, we delve into the details of the feedback we received from reviewers and their implications for future implementation.

Feedback on Review Criteria

Disentangling Impact from Feasibility

Many of the reviewers said that this model prompted them to think differently about how they assess the proposals and that they liked the new questions. Reviewers appreciated that the questions focused their attention on what they think funding agencies really want to know and nothing more: “can it occur?” and “will it matter?” This approach explicitly disentangles impact from feasibility: “Often, these two are taken together, and if one doesn’t think it is likely to succeed, the impact is also seen as lower.” Additionally, the emphasis on big picture scientific and social impact “is often missing in the typical review process.” Reviewers also liked that this approach eliminates what they consider biased metrics, such as the principal investigator’s reputation, track record, and “excellence.” 

Reducing Administrative Burden

The small set of questions was seen as more efficient and less burdensome on reviewers. One reviewer said, “I liked this approach to scoring a proposal. It reduces the effort to thinking about perceived impact and feasibility.” Another reviewer said, “On the whole it seems a worthwhile exercise as the current review processes for proposals are onerous.” 

Quantitative Forecasting

Reviewers saw benefits to being asked to quantify their assessments, but also found it challenging at times. A number of reviewers enjoyed taking a quantitative approach and thought that it helped them be more grounded and explicit in their evaluations of the proposals. However, some reviewers were concerned that it felt like guesswork and expressed low confidence in their quantitative assessments, primarily due to proposals lacking details on their planned research methods, which is an issue discussed in the section “Feedback on Proposals.” Nevertheless, some of these reviewers still saw benefits to taking a quantitative approach: “It is interesting to try to estimate probabilities, rather than making flat statements, but I don’t think I guess very well. It is better than simply classically reviewing the proposal [though].” Since not all academics have experience making quantitative predictions, we expect that there will be a learning curve for those new to the practice. Forecasting is a skill that can be learned though, and we think that with training and feedback, reviewers can become better, more confident forecasters.

Defining Social Impact

Of the three types of questions that reviewers were asked to answer, the question about social impact seemed to be the hardest for reviewers to interpret. Reviewers noted that they would have liked more guidance on what was meant by social impact and whether it included indirect impacts. Since questions like these are ultimately subjective, the “right” definition of social impact and what types of outcomes are considered most valuable will depend on the grantmaking institution, its domain area, and its theory of change, so we leave this open to future implementers to clarify in their instructions.

Calibrating Impact

While the impact score scale (see Appendix A) defines the relative difference in impact between scores, it does not define the absolute impact conveyed by a score. For this reason, a calibration mechanism is necessary to provide reviewers with a shared understanding of the use and interpretation of the scoring system. Note that this is a challenge that rubric-based peer review criteria used by science agencies also face. Discussion and aggregation of scores across a review committee helps align reviewers and average out some of this natural variation.2

To address this, we surveyed a small, separate set of academics in the life sciences about how they would score the social and scientific impact of the average NIH R01 grant, which many life science researchers apply to and review proposals for. We then provided the average scores from this survey to reviewers to orient them to the new scale and help them calibrate their scores. 

One reviewer suggested an alternative approach: “The other thing I might change is having a test/baseline question for every reviewer to respond to, so you can get a feel for how we skew in terms of assessing impact on both scientific and social aspects.” One option would be to ask reviewers to score the social and scientific impact of the average grant proposal for a grant program that all reviewers would be familiar with; another would be to ask reviewers to score the impact of the average funded grant for a specific grant program, which could be more accessible for new reviewers who have not previously reviewed grant proposals. A third option would be to provide all reviewers on a committee with one or more sample proposals to score and discuss, in a relevant and shared domain area.

When deciding on an approach for calibration, a key consideration is the specific resolution criteria that are being used — i.e., the downstream measures of impact that reviewers are being asked to predict. One option, which was used in our pilot, is to predict the scores that a comparable, but independent, panel of reviewers would give the project some number of years following its successful completion. For a resolution criterion like this one, collecting and sharing calibration scores can help reviewers get a sense for not just their own approach to scoring, but also those of their peers.

Making Funding Decisions

In scoring the social and scientific impact of each proposal, reviewers were asked to assess the value of the proposal to society or to the scientific field. That alone would be insufficient to determine whether a proposal should be funded though, since it would need to be compared with other proposals in conjunction with its feasibility. To do so, we calculated the total expected utility of each proposal (see Appendix C). In a real funding scenario, this final metric could then be used to compare proposals and determine which ones get funded. Additionally, unlike a traditional scoring system, the expected utility approach allows for the detailed comparison of portfolios — including considerations like the expected proportion of milestones reached and the range of likely impacts.
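As an illustrative sketch of what such portfolio construction could look like downstream of review (the data structure, budget rule, and names here are our own illustration, not part of the pilot):

```python
# Illustrative sketch: comparing candidate portfolios by total expected utility
# and expected number of milestones reached. Structure and names are ours.
from dataclasses import dataclass

@dataclass
class Proposal:
    name: str
    cost: float                # requested funding
    expected_utility: float    # total expected utility from reviewer forecasts
    p_milestones: list[float]  # probability of success for each milestone

def portfolio_summary(portfolio: list[Proposal]) -> dict:
    """Aggregate statistics that make two portfolios directly comparable."""
    return {
        "total_cost": sum(p.cost for p in portfolio),
        "total_expected_utility": sum(p.expected_utility for p in portfolio),
        "expected_milestones_reached": sum(sum(p.p_milestones) for p in portfolio),
    }

def greedy_portfolio(candidates: list[Proposal], budget: float) -> list[Proposal]:
    """Pick proposals by expected utility per dollar until the budget is exhausted."""
    chosen, spent = [], 0.0
    for p in sorted(candidates, key=lambda c: c.expected_utility / c.cost, reverse=True):
        if spent + p.cost <= budget:
            chosen.append(p)
            spent += p.cost
    return chosen
```

A real funder might use a different selection rule than expected utility per dollar; the point is simply that once each proposal carries an expected utility and milestone probabilities, alternative portfolios can be compared on the same quantitative footing.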

In our pilot, reviewers were not informed that we would be doing this additional calculation based on their submissions. As a result, one reviewer thought that the questions they were asked failed to include other important questions, like “should it occur?” and “is it worth the opportunity cost?” Though these questions were not asked of reviewers explicitly, we believe that they would be answered once the expected utility of all proposals is calculated and considered, since the opportunity cost of one proposal would be the expected utility of the other proposals. Since each reviewer only provided input on one proposal, they may have felt like the scores they gave would be used to make a binary yes/no decision on whether to fund that one proposal, rather than being considered as a part of a larger pool of proposals, as it would be in a real review process.

Feedback on Proposals

Missing Information Impedes Forecasting

The primary critique that reviewers expressed was that the research proposals lacked details about their research plans, what methods and experimental protocols would be used, and what preliminary research the author(s) had done so far. This hindered their ability to properly assess the technical feasibility of the proposals and their probability of success. A few reviewers expressed that they also would have liked to have had a better sense of who would be conducting the research and each team member’s responsibilities. These issues arose because the FRO proposals used in our pilot had not originally been submitted for funding purposes, and thus lacked the requirements of traditional grant proposals, as we noted above. We assume this would not be an issue with proposals submitted to actual grantmakers.3  

Improving Milestone Design

A few reviewers pointed out that some of the proposal milestones were too ambiguous or were not worded specifically enough, such that there were ways that researchers could technically say that they had achieved the milestone without accomplishing the spirit of its intent. This made it more challenging for reviewers to assess milestones, since they weren’t sure whether to focus on the ideal (i.e., more impactful) interpretation of the milestone or to account for these “loopholes.” Moreover, loopholes skew the forecasts, since they increase the probability of achieving a milestone, while lowering the impact of doing so if it is achieved through a loophole.

One reviewer suggested, “I feel like the design of milestones should be far more carefully worded – or broken up into sub-sentences/sub-aims, to evaluate the feasibility of each. As the questions are currently broken down, I feel they create a perverse incentive to create a vaguer milestone, or one that can be more easily considered ‘achieved’ for some ‘good enough’ value of achieved.” For example, they proposed that one of the proposal milestones, “screen a library of tens of thousands of phage genes for enterobacteria for interactions and publish promising new interactions for the field to study,” could be expanded to

  1. “Generate a library of tens of thousands of genes from enterobacteria, expressed in E. coli
  2. “Validate their expression under screenable conditions
  3. “Screen the library for their ability to impede phage infection with a panel of 20 type phages
  4. “Publish … 
  5. “Store and distribute the library, making it as accessible to the broader community”

We agree with the need for careful consideration and design of milestones, given that “loopholes” in milestones can detract from their intended impact and make it harder for reviewers to accurately assess their likelihood. In our theoretical framework for this approach, we identified three potential parties that could be responsible for defining milestones: (1) the proposal author(s), (2) the program manager, with or without input from proposal authors, or (3) the reviewers, with or without input from proposal authors. This critique suggests that the first approach of allowing proposal authors to be the sole party responsible for defining proposal milestones is vulnerable to being gamed, and the second or third approach would be preferable. Program managers who take on the task of defining milestones should have enough expertise to think through the different potential ways of fulfilling a milestone and make sure that they are sufficiently precise for reviewers to assess.

Benefits of Flexibility in Milestones

Some flexibility in milestones may still be desirable, especially with respect to the actual methodology, since experimentation may be necessary to determine the best technique to use. For example, speaking about the feasibility of a different proposal milestone – “demonstrate that Pro-AG technology can be adapted to a single pathogenic bacterial strain in a 300 gallon aquarium of fish and successfully reduce antibiotic resistance by 90%” – a reviewer noted that 

“The main complexity and uncertainty around successful completion of this milestone arises from the native fish microbiome and whether a CRISPR delivery tool can reach the target strain in question. Due to the framing of this milestone, should a single strain be very difficult to reach, the authors could simply switch to a different target strain if necessary. Additionally, the mode of CRISPR delivery is not prescribed in reaching this milestone, so the authors have a host of different techniques open to them, including conjugative delivery by a probiotic donor or delivery by engineered bacteriophage.”

Peer Review Results

Sequential Milestones vs. Independent Outcomes

In our expected utility forecasting framework, we defined two different ways that a proposal could structure its outcomes: as sequential milestones where each additional milestone builds off of the success of the previous one, or as independent outcomes where the success of one is not dependent on the success of the other(s). For proposals with sequential milestones in our pilot, we would expect the probability of success of milestone 2 to be less than the probability of success of milestone 1 and for the opposite to be true of their impact scores. For proposals with independent outcomes, we do not expect there to be a relationship between the probability of success and the impact scores of milestones 1 and 2. There are different equations for calculating the total expected utility, depending on the relationship between outcomes (see Appendix C).
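Appendix C defines the exact equations; one plausible formulation consistent with the description above (treating $p_i$ as the unconditional probability of reaching milestone $i$ and $U_i$ as its linear-scale impact, with $U_2$ defined cumulatively for sequential milestones, per Recommendation 1 below) is

\[
\mathrm{EU}_{\text{sequential}} = (p_1 - p_2)\,U_1 + p_2\,U_2,
\qquad
\mathrm{EU}_{\text{independent}} = p_1 U_1 + p_2 U_2 .
\]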

We categorized each proposal in our study based on whether it had sequential milestones or independent outcomes. This information was not shared with reviewers. Table 1 presents the average reviewer forecasts for each proposal. In general, milestones received higher scientific impact scores than social impact scores, which makes sense given the primarily academic focus of research proposals. For proposals 1 to 3, the probability of success of milestone 2 was roughly half that of milestone 1; reviewers also gave milestone 2 higher scientific and social impact scores than milestone 1. This is consistent with our categorization of proposals 1 to 3 as having sequential milestones.

Table 1. Mean forecasts for each proposal.

| Proposal | Milestone Category | Milestone 1: Probability of Success | Milestone 1: Scientific Impact Score | Milestone 1: Social Impact Score | Milestone 2: Probability of Success | Milestone 2: Scientific Impact Score | Milestone 2: Social Impact Score |
|---|---|---|---|---|---|---|---|
| 1 | sequential | 0.80 | 7.83 | 7.35 | 0.41 | 8.22 | 8.25 |
| 2 | sequential | 0.88 | 6.41 | 3.72 | 0.36 | 8.21 | 7.62 |
| 3 | sequential | 0.68 | 7.07 | 6.45 | 0.34 | 8.20 | 7.50 |
| 4 | ?* | 0.72 | 6.58 | 3.92 | 0.47 | 7.06 | 4.19 |
| 5 | independent | 0.55 | 7.14 | 2.37 | 0.40 | 6.66 | 2.25 |

*See the next section for discussion of the categorization of proposal 4’s milestones.

Further Discussion on Designing and Categorizing Milestones

We originally categorized proposal 4’s milestones as sequential, but one reviewer gave milestone 2 a lower scientific impact score than milestone 1 and two reviewers gave it a lower social impact score. One reviewer also gave milestone 2 roughly the same probability of success as milestone 1. This suggests that proposal 4’s milestones can’t be considered strictly sequential. 

The two milestones for proposal 4 were, in brief, the development of a general-purpose tool (milestone 1) and the development of a model of the C. elegans nervous system built using that tool (milestone 2).

The reviewer who gave milestone 2 a lower scientific impact score explained: “Given the wording of the milestone, I do not believe that if the scientific milestone was achieved, it would greatly improve our understanding of the brain.” Unlike proposals 1-3, in which milestone 2 was a scaled-up or improved-upon version of milestone 1, these milestones represent fundamentally different categories of output (general-purpose tool vs specific model). Thus, despite the necessity of milestone 1’s tool for achieving milestone 2, the reviewer’s response suggests that the impact of milestone 2 was being considered separately rather than cumulatively.

Milestone Design Recommendations
Recommendation 1: Explicitly define sequential milestones

To properly address this case of sequential milestones with different types of outputs, we recommend that for all sequential milestones, latter milestones should be explicitly defined as inclusive of prior milestones. In the above example, this would imply redefining milestone 2 as “Complete milestone 1 and develop a model of the C. elegans nervous system…” This way, reviewers know to include the impact of milestone 1 in their assessment of the impact of milestone 2.

Recommendation 2: Clarify milestone category with reviewers

To help ensure that reviewers are aligned with program managers in how they interpret the proposal milestones (if they aren’t directly involved in defining milestones), we suggest either informing reviewers of how program managers have categorized the proposal outputs so they can conduct their review accordingly, or allowing reviewers to decide the category themselves (and thus how the total expected utility is calculated), whether individually, collectively, or both.

Recommendation 3: Allow for a flexible number of milestones

We chose to use only two of the goals that proposal authors provided because we wanted to standardize the number of milestones across proposals. However, this may have provided an incomplete picture of the proposals’ goals, and thus an incomplete assessment of the proposals. We recommend that future implementations be flexible and allow the number of milestones to be determined based on each proposal’s needs. This would also help accommodate one reviewer’s suggestion that some milestones be broken down into intermediary steps.

Importance of Reviewer Explanations

As one can tell from the above discussion, reviewers’ explanations of their forecasts were crucial to understanding how they interpreted the milestones. Reviewers’ explanations varied in length and detail, but the most insightful responses broke down their reasoning into detailed steps and addressed (1) ambiguities in the milestone and how they chose to interpret those ambiguities, (2) the state of the scientific field and the maturity of different techniques that the authors propose to use, and (3) factors that improve the likelihood of success versus potential barriers or challenges that would need to be overcome.

Exponential Impact Scales Better Reflect the Real Distribution of Impact 

The distribution of NIH and NSF proposal peer review scores tends to be skewed such that most proposals are rated above the center of the scale and few proposals are rated poorly. However, other markers of scientific impact, such as citations (for all their imperfections), suggest a long tail of studies with high impact. This discrepancy suggests that traditional peer review scoring systems are not well-structured to capture the nonlinearity of scientific impact, resulting in score inflation. The aggregation of scores at the top end of the scale also means that very negative scores have a greater effect than very positive scores when averaged together, since there is more room between the average score and the bottom end of the scale. This can generate systemic bias against more controversial or risky proposals.

In our pilot, we chose to use an exponential scale with a base of 2 for impact to better reflect the real distribution of scientific impact. Using this exponential impact scale, we surveyed a small pool of academics in the life sciences about how they would rate the impact of the average funded NIH R01 grant. They responded with an average scientific impact score of 5 and an average social impact score of 3, which are much lower on our scale than traditional peer review scores4, suggesting that the exponential scale may help avoid score inflation and bunching at the top. In our pilot, the distribution of scientific impact scores was centered higher than 5, but was still less skewed than NIH peer review scores for significance and innovation typically are. This partially reflects the fact that proposals were expected to be funded at levels one to two orders of magnitude higher than NIH R01 grants, so their impact should also be greater. The distribution of social impact scores exhibits a much wider spread and a lower center.
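Concretely, on a base-2 exponential scale the ratio of impact implied by two scores $s_1$ and $s_2$ is

\[
\frac{\mathrm{impact}(s_1)}{\mathrm{impact}(s_2)} = 2^{\,s_1 - s_2},
\]

so, for example, the roughly two-point gap between the surveyed scientific and social impact scores for an average R01 corresponds to about a four-fold difference in implied impact.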

Figure 1. Distribution of impact scores for milestone 1 (top) and milestone 2 (bottom).

Conclusion

In summary, expected utility forecasting presents a promising approach to improving the rigor of peer review and quantitatively defining the risk-reward profile of science proposals. Our pilot study suggests that this approach can be quite user-friendly for reviewers, despite its apparent complexity. Further study into how best to integrate forecasting into panel environments, define proposal milestones, and calibrate impact scales will help refine future implementations of this approach. 

More broadly, we hope that this pilot will encourage more grantmaking institutions to experiment with innovative funding mechanisms. Reviewers in our pilot were more open-minded and quick-to-learn than one might expect and saw significant value in this unconventional approach. Perhaps this should not be so much of a surprise given that experimentation is at the heart of scientific research. 

Interested grantmakers, both public and private, and policymakers are welcome to reach out to our team if interested in learning more or receiving assistance in implementing this approach. 

Acknowledgements

Many thanks to Jordan Dworkin for being an incredible thought partner in designing the pilot and providing meticulous feedback on this report. Your efforts made this project possible!


Appendix A: Pilot Study Design

Our pilot study consisted of five proposals for life science-related Focused Research Organizations (FROs). These proposals were solicited from academic researchers by FAS as part of our advocacy for the concept of FROs. As such, these proposals were not originally intended as proposals for direct funding, and did not have as strict content requirements as traditional grant proposals typically do. Researchers were asked to submit one- to two-page proposals discussing (1) their research concept, (2) the motivation and its expected social and scientific impact, and (3) the rationale for why this research cannot be accomplished through traditional funding channels and thus requires a FRO to be funded.

Permission was obtained from proposal authors to use their proposals in this study. We worked with proposal authors to define two milestones for each proposal that reviewers would assess: one that they felt confident they could achieve and one that was more ambitious but that they still thought was feasible. In addition, due to the brevity of the proposals, we included an additional 1-2 pages of supplementary information and scientific context. Final drafts of the milestones and supplementary information were provided to authors to edit and approve. Because this pilot study could not provide any actual funding to proposal authors, it was not possible to solicit full-length research proposals from them.

We recruited four to six reviewers for each proposal based on their subject matter expertise. Potential participants were recruited over email with a request to help review a FRO proposal related to their area of research. They were informed that the review process would be unconventional but were not informed of the study’s purpose. Participants were offered a small monetary compensation for their time.

Confirmed participants were all sent instructions and materials for the review process on the same day and were asked to complete their review by a shared deadline a month and a half later. Reviewers were told to assume that, if funded, each proposal would receive $50 million in funding over five years to conduct the research, consistent with the proposed model for FROs. Each proposal had two technical milestones, and reviewers were asked to answer the following questions for each milestone:

  1. Assuming that the proposal is funded by 2025, will the milestone be achieved before 2031?
  2. What will be the average scientific impact score, as judged in 2032, of accomplishing the milestone?
  3. What will be the average social impact score, as judged in 2032, of accomplishing the milestone?

The impact scoring system was explained to reviewers as follows:

Please consider the following in determining the impact score: the current and expected long-term social or scientific impact of a funded FRO’s outputs if a funded FRO accomplishes this milestone before 2030.

The impact score we are using ranges from 1 (low) to 10 (high). It is base 2 exponential, meaning that a proposal that receives a score of 5 has double the impact of a proposal that receives a score of 4, and quadruple the impact of a proposal that receives a score of 3. In a small survey we conducted of SMEs in the life sciences, they rated the scientific and social impact of the average NIH R01 grant — a federally funded research grant that provides $1-2 million for a 3-5 year endeavor — on this scale to be 5.2 ± 1.5 and 3.1 ± 1.3, respectively. The median scores were 4.75 and 3.00, respectively.

Below is an example of how a predicted impact score distribution (left) would translate into an actual impact distribution (right). You can try it out yourself with this interactive version (in the menu bar, click Runtime > Run all) to get some further intuition on how the impact score works. Please note that this is meant solely for instructive purposes, and the interface is not designed to match Metaculus’ interface.

The choice of an exponential impact scale reflects the tendency in science for a small number of research projects to have an outsized impact. For example, studies have shown that the relationship between the number of citations for a journal article and its percentile rank scales exponentially.

Scientific impact aims to capture the extent to which a project advances the frontiers of knowledge, enables new discoveries or innovations, or enhances scientific capabilities or methods. Though each is imperfect, one could consider citations of papers, patents on tools or methods, or users of software or datasets as proxies of scientific impact. 

Social impact aims to capture the extent to which a project contributes to solving important societal problems, improving well-being, or advancing social goals. Some proxy metrics that one might use to assess a project’s social impact are the value of lives saved, the cost of illness prevented, the number of job-years of employment generated, economic output in terms of GDP, or the social return on investment. 

You may consider any or none of these proxy metrics as a part of your assessment of the impact of a FRO accomplishing this milestone.

Reviewers were asked to submit their forecasts on Metaculus’ website and to provide their reasoning in a separate Google form. For question 1, reviewers were asked to respond with a single probability. For questions 2 and 3, reviewers were asked to provide their median, 25th percentile, and 75th percentile predictions, in order to generate a probability distribution. Metaculus’ website also included information on the resolution criteria of each question, which provided guidance to reviewers on how to answer the question. Individual reviewers were blind to other reviewers’ responses until after the submission deadline, at which point the aggregated results of all of the responses were made public on Metaculus’ website. 
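For intuition on how those three quantile inputs can define a full distribution, here is a rough sketch that assumes a normal approximation; this is purely illustrative and is not how Metaculus’s platform actually constructs its distributions, and the example scores are hypothetical.

```python
# Rough sketch: turn a reviewer's 25th percentile, median, and 75th percentile
# answers into a probability distribution. A normal approximation is assumed
# here purely for illustration.
from scipy import stats

def fit_normal_from_quartiles(q25: float, median: float, q75: float):
    # For a normal distribution, the quartiles sit about 0.6745 standard
    # deviations from the mean, so the interquartile range pins down sigma.
    sigma = (q75 - q25) / (2 * stats.norm.ppf(0.75))
    return stats.norm(loc=median, scale=sigma)

# Hypothetical impact-score forecasts (not from the pilot):
dist = fit_normal_from_quartiles(q25=4.0, median=6.0, q75=7.5)
print(dist.cdf(8.0))  # probability that the realized impact score falls below 8
```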

Additionally, in the Google form, reviewers were asked to answer a survey question about their experience: “What did you think about this review process? Did it prompt you to think about the proposal in a different way than when you normally review proposals? If so, how? What did you like about it? What did you not like? What would you change about it if you could?” 

Some participants did not complete their review. We received 19 complete reviews in the end, with each proposal receiving three to six reviews. 

Study Limitations

Our pilot study had certain limitations that should be noted. Since FAS is not a grantmaking institution, we could not completely reproduce the same types of research proposals that a grantmaking institution would receive nor the entire review process. We will highlight these differences in comparison to federal science agencies, which are our primary focus.

  1. Review Process: There are typically two phases to peer review at NIH and NSF. First, at least three individual reviewers with relevant subject matter expertise are assigned to read and evaluate a proposal independently. Then, a larger committee of experts is convened. There, the assigned reviewers present the proposal and their evaluation, and then the committee discusses and determines the final score for the proposal. Our pilot study only attempted to replicate the first phase of individual review.
  2. Sample Size: In our pilot, the sample size was quite small, since only five proposals were reviewed, and they were all in different subfields, so different reviewers were assigned to each proposal. NIH and NSF peer review committees typically focus on one subfield and review on the order of twenty or so proposals. The number of reviewers per proposal–three to six–in our pilot was consistent with the number of reviewers typically assigned to a proposal by NIH and NSF. Peer review committees are typically larger, ranging from six to twenty people, depending on the agency and the field.
  3. Proposals: The FRO proposals plus supplementary information were only two to four pages long, which is significantly shorter than the 12 to 15 page proposals that researchers submit for NIH and NSF grants. Proposal authors were asked to generally describe their research concept, but were not explicitly required to describe the details of the research methodology they would use or any preliminary research. Some proposal authors volunteered more information on this for the supplementary information, but not all authors did.
  4. Grant Size: For the FRO proposals, reviewers were asked to assume that funded proposals would receive $50 million over five years, which is one to two orders of magnitude more funding than typical NIH and NSF proposals.

Appendix B: Feedback on Study-Specific Implementation

In addition to feedback about the review framework, we received feedback on how we implemented our pilot study, specifically the instructions and materials for the review process and the submission platforms. This feedback isn’t central to this paper’s investigation of expected value forecasting, but we wanted to include it in the appendix for transparency.

Reviewers were sent instructions over email that outlined the review process and linked to Metaculus’ webpage for this pilot. On Metaculus’ website, reviewers could find links to the proposals on FAS’ website and the supplementary information in Google docs. Reviewers were expected to read those first and then read through the resolution criteria for each forecasting question before submitting their answers on Metaculus’ platform. Reviewers were asked to submit the explanations behind their forecasts in a separate Google form.

Some reviewers had no problem navigating the review process and found Metaculus’ website easy to use. However, feedback from other reviewers suggested that the different components necessary for the review were spread out over too many different websites, making it difficult for reviewers to keep track of where to find everything they needed.

Some had trouble locating the different materials and pieces of information needed to conduct the review on Metaculus’ website. Others found it confusing to have to submit their forecasts and explanations in two separate places. One reviewer suggested that the explanation of the impact scoring system should have been included within the instructions sent over email rather than in the resolution criteria on Metaculus’ website so that they could have read it before reading the proposal. Another reviewer suggested that it would have been simpler to submit their forecasts through the same Google form that they used to submit their explanations rather than through Metaculus’ website. 

Based on this feedback, we would recommend that future implementations streamline their submission process to a single platform and provide a more extensive set of instructions rather than seeding information across different steps of the review process. Training sessions, which science funding agencies typically conduct, would be a good supplement to written instructions.

Appendix C: Total Expected Utility Calculations

To calculate the total expected utility, we first converted all of the impact scores into utility by raising two to the power of the impact score, since the impact scoring system is base-2 exponential:

Utility = 2^(Impact Score).

We then were able to average the utilities for each milestone and conduct additional calculations. 

To calculate the total utility of each milestone, ui, we averaged the social utility and the scientific utility of the milestone:

ui = (Social Utility + Scientific Utility)/2.

The total expected utility (TEU) of a proposal with two milestones can be calculated according to the general equation:

TEU = u1P(m1 ∩ not m2) + u2P(m2 ∩ not m1) + (u1+u2)P(m1 ∩ m2),

where P(mi) represents the probability of success of milestone i and

P(m1 ∩ not m2) = P(m1) – P(m1 ∩ m2)
P(m2 ∩ not m1) = P(m2) – P(m1 ∩ m2).

For sequential milestones, milestone 2 is defined as inclusive of milestone 1 and wholly dependent on the success of milestone 1, so this means that

u2, seq = u1+u2
P(m2) = Pseq(m1 ∩ m2)
P(m2 ∩ not m1) = 0.

Thus, the total expected utility of sequential milestones can be simplified as

TEU = u1P(m1)-u1P(m2) + (u2, seq)P(m2)
TEU = u1P(m1) + (u2, seq-u1)P(m2)

This can be generalized to

TEUseq = Σi (ui, seq - u(i-1), seq) P(mi).

Otherwise, the total expected utility can be simplified to 

TEU = u1P(m1) + u2P(m2) – (u1+u2)P(m1 ∩ m2).

For independent outcomes, we assume 

Pind(m1 ∩ m2) = P(m1)P(m2), 

so

TEUind = u1P(m1) + u2P(m2) – (u1+u2)P(m1)P(m2).

To present the results in Tables 1 and 2, we converted all of the utility values back into the impact score scale by taking the log base 2 of the results.
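
To make these formulas concrete, the short Python sketch below walks through the full calculation for a hypothetical two-milestone proposal: impact scores are converted to utilities, milestone utilities are averaged, and the total expected utility is computed for both the sequential and the independent case before being converted back to the impact-score scale. The scores and probabilities are placeholders chosen for illustration, not values from the pilot.

```python
# A minimal sketch of the Appendix C calculations for a two-milestone proposal,
# using hypothetical scores and probabilities; not the exact analysis pipeline.
import math

def to_utility(impact_score):
    # Impact scores are base-2 exponential, so utility = 2 ** impact score.
    return 2 ** impact_score

def milestone_utility(social_score, scientific_score):
    # u_i is the average of a milestone's social and scientific utility.
    return (to_utility(social_score) + to_utility(scientific_score)) / 2

# Hypothetical aggregated forecasts for a two-milestone proposal.
u1 = milestone_utility(social_score=3.0, scientific_score=4.0)
u2 = milestone_utility(social_score=5.0, scientific_score=5.5)
p1, p2 = 0.8, 0.4   # P(m1), P(m2)

# Sequential milestones: milestone 2 includes milestone 1, so u2_seq = u1 + u2
# and P(m2) already equals P(m1 and m2).
u2_seq = u1 + u2
teu_seq = u1 * p1 + (u2_seq - u1) * p2

# Independent milestones: P(m1 and m2) = P(m1) * P(m2).
teu_ind = u1 * p1 + u2 * p2 - (u1 + u2) * p1 * p2

# Convert back to the impact-score scale for presentation, as in Tables 1 and 2.
print(math.log2(teu_seq), math.log2(teu_ind))
```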

FY24 NDAA AI Tracker

As both the House and Senate gear up to vote on the National Defense Authorization Act (NDAA), FAS is launching this live blog post to track all proposals around artificial intelligence (AI) that have been included in the NDAA. In this rapidly evolving field, these provisions indicate how AI now plays a pivotal role in our defense strategies and national security framework. This tracker will be updated as major developments occur.

Senate NDAA. This table summarizes the provisions related to AI from the version of the Senate NDAA that advanced out of committee on July 11. Links to the section of the bill describing these provisions can be found in the “section” column. Provisions that have been added in the manager’s package are in red font. Updates from Senate Appropriations committee and the House NDAA are in blue.

Senate NDAA Provisions

| Provision | Summary | Section |
| --- | --- | --- |
| Generative AI Detection and Watermark Competition | Directs Under Secretary of Defense for Research and Engineering to create a competition for technology that detects and watermarks the use of generative artificial intelligence. | 218 |
| DoD Prize Competitions for Business Systems Modernization | Authorizes competitions to improve military business systems, emphasizing the integration of AI where possible. | 221 |
| Broad review and update of DoD AI Strategy | Directs the Secretary of Defense to perform a periodic review and update of its 2018 AI strategy, and to develop and issue new guidance on a broad range of AI issues, including adoption of AI within DoD, ethical principles for AI, mitigation of bias in AI, cybersecurity of generative AI, and more. | 222 |
| Strategy and assessment on use of automation and AI for shipyard optimization | Development of a strategy on the use of AI for Navy shipyard logistics. | 332 |
| Strategy for talent development and management of DoD Computer Programming Workforce | Establishes a policy for "appropriate" talent development and management policies, including for AI skills. | 1081 |
| Sense of the Senate Resolution in Support of NATO | Offers support for NATO and NATO's DIANA program as critical to AI and other strategic priorities. | 1238, 1239 |
| Enhancing defense partnership with India | Directs DoD to enhance defense partnership with India, including collaboration on AI as one potential priority area. | 1251 |
| Specification of Duties for Electronic Warfare Executive Committee | Amends US code to specify the duties of the Electronic Warfare Executive Committee, including an assessment of the need for automated, AI/ML-based electronic warfare capabilities. | 1541 |
| Next Generation Cyber Red Teams | Directs the DoD and NSA to submit a plan to modernize cyber red-teaming capabilities, ensuring the ability to emulate possible threats, including from AI. | 1604 |
| Management of Data Assets by Chief Digital Officer | Outlines responsibilities for CDAO to provide data analytics capabilities needed for the "global cyber-social domain." | 1605 |
| Developing Digital Content Provenance Course | Directs Director of Defense Media Activity to develop a course on digital content provenance, including digital forgeries developed with AI systems, e.g., AI-generated "deepfakes." | 1622 |
| Report on Artificial Intelligence Regulation in Financial Services Industry | Directs regulators of the financial services industry to produce reports analyzing how AI is and ought to be used by the industry and by regulators. | 6096 |
| AI Bug Bounty Programs | Directs CDAO to develop a bug bounty program for AI foundation models that are being integrated in DOD operations. | 6097 |
| Vulnerability analysis study for AI-enabled military applications | Directs CDAO to complete a study analyzing vulnerabilities to the privacy, security, and accuracy of AI-enabled military applications, as well as R&D needs for such applications, including foundation models. | 6098 |
| Report on Data Sharing and Coordination | Directs SecDef to submit a report on ways to improve data sharing across DoD. | 6099 |
| Establishment of Chief AI Officer of the Department of State | Establishes within the Department of State a Chief AI Officer, who may also serve as Chief Data Officer, to oversee adoption of AI in the Department and to advise the Secretary of State on the use of AI in conducting data-informed diplomacy. | 6303 |

House NDAA. This table summarizes the provisions related to AI from the version of the House NDAA that advanced out of committee. Links to the section of the bill describing these provisions can be found in the “section” column.

House NDAA Provisions

| Provision | Summary | Section |
| --- | --- | --- |
| Process to ensure the responsible development and use of artificial intelligence | Directs CDAO to develop a process for assessing whether AI technology used by DoD is functioning responsibly, including through the development of clear standards, and to amend AI technology as needed. | 220 |
| Intellectual property strategy | Directs DoD to develop an intellectual property strategy to enhance capabilities in procurement of emerging technologies and capabilities. | 263 |
| Study on establishment of centralized platform for development and testing of autonomy software | Directs SecDef and CDAO to conduct a study assessing the feasibility and advisability of developing a centralized platform to develop and test autonomous software. | 264 |
| Congressional notification of changes to Department of Defense policy on autonomy in weapon systems | Requires that Congress be notified of changes to DoD Directive 3000.09 (on autonomy in weapons systems) within 30 days of any changes. | 266 |
| Sense of Congress on dual use innovative technology for the robotic combat vehicle of the Army | Offers support for the Army's acquisition strategy for the Robot Combat Vehicle program, and recommends that the Army consider a similar framework for future similar programs. | 267 |
| Pilot program on optimization of aerial refueling and fuel management in contested logistics environments through use of artificial intelligence | Directs CDAO, USD(A&S), and Air Force to develop a pilot program to optimize the logistics of aerial refueling and to consider the use of AI technology to help with this mission. | 266 |
| Modification to acquisition authority of the senior official with principal responsibility for artificial intelligence and machine learning | Increases annual acquisition authority for CDAO from $75M to $125M, and extends this authority from 2025 to 2029. | 827 |
| Framework for classification of autonomous capabilities | Directs CDAO and others within DoD to establish a department-wide classification framework for autonomous capabilities to enable easier use of autonomous systems in the department. | 930 |

Funding Comparison. The following tables compare the funding requested in the President’s budget to funds that are authorized in current House and Senate versions of the NDAA. All amounts are in thousands of dollars.

Funding Comparison

| Program | Requested | Authorized in House | Authorized in Senate | NEW! Passed in Senate Approps 7/27 | NEW! Passed in full House 9/28 |
| --- | --- | --- | --- | --- | --- |
| Other Procurement, Army–Engineer (non-construction) equipment: Robotics and Applique Systems | 68,893 | 68,893 | 68,893 | 65,118 (-8,775 for "Effort previously funded," +5,000 for "Soldier borne sensor") | 73,893 (+5,000 for "Soldier borne sensor") |
| AI/ML Basic Research, Army | 10,708 | 10,708 | 10,708 | 10,708 | 10,708 |
| AI/ML Technologies, Army | 24,142 | 24,142 | 24,142 | 27,142 (+3,000 for "Automated battle damage assessment and adjust fire") | 24 |
| AI/ML Advanced Technologies, Army | 13,187 | 15,687 (+2,500 for "Autonomous Long Range Resupply") | 18,187 (+5,000 for "Tactical AI & ML") | 24,687 (+11,500 for "Cognitive computing architecture for military systems") | 13,187 |
| AI Decision Aids for Army Missile Defense Systems Integration | 0 | 6,000 | 0 | 0 | 0 |
| Robotics Development, Army | 3,024 | 3,024 | 3,024 | 3,024 | 3,024 |
| Ground Robotics, Army | 35,319 | 35,319 | 35,319 | 17,337 (-17,982 for "SMET Inc II early to need") | 45,319 (+10,000 for "common robotic controller") |
| Applied Research, Navy: Long endurance mobile autonomous passive acoustic sensing research | 0 | 2,500 | 0 | 0 | 0 |
| Advanced Components, Navy: Autonomous surface and underwater dual-modality vehicles | 0 | 5,000 | 0 | 3,000 | 0 |
| Air Force University Affiliated Research Center (UARC)—Tactical Autonomy | 8,018 | 8,018 | 8,018 | 8,018 | 8,018 |
| Air Force Applied Research: Secure Interference Avoiding Connectivity of Autonomous AI Machines | 0 | 3,000 | 5,000 | 0 | 0 |
| Air Force Advanced Technology Development: Semiautonomous adversary air platform | 0 | 0 | 10,000 | 0 | 0 |
| Advanced Technology Development, Air Force: High accuracy robotics | 0 | 2,500 | 0 | 0 | 0 |
| Air Force Autonomous Collaborative Platforms | 118,826 | 176,013 (+75,000 for Project 647123: Air-Air Refueling TMRR, -17,813 for technical realignment) | 101,013 (-17,813 for DAF requested realignment of funds) | 101,013 | 101,013 |
| Space Force: Machine Learning Techniques for Radio Frequency (RF) Signal Monitoring and Interference Detection | 0 | 10,000 | 0 | 0 | 0 |
| Defense-wide: Autonomous resupply for contested logistics | 0 | 2,500 | 0 | 0 | 0 |
| Military Construction–Pennsylvania Navy Naval Surface Warfare Center Philadelphia: AI Machinery Control Development Center | 0 | 88,200 | 88,200 | 0 | 0 |
| Intelligent Autonomous Systems for Seabed Warfare | 0 | 0 | 7,000 | 5,000 | 0 |

Funding for Office of Chief Digital and Artificial Intelligence Officer

| Program | Requested | Authorized in House | Authorized in Senate | NEW! Passed in Senate Approps | NEW! Passed in full House |
| --- | --- | --- | --- | --- | --- |
| Advanced Component Development and Prototypes | 34,350 | 34,350 | 34,350 | 34,350 | 34,350 |
| System Development and Demonstration | 615,245 | 570,246 (-40,000 for "insufficient justification," -5,000 for "program decrease") | 615,246 | 246,003 (-369,243, mostly for functional transfers to JADC2 and Alpha-1) | 704,527 (+89,281, mostly for "management innovation pilot" and transfers from other programs for "enterprise digital alignment") |
| Research, Development, Test, and Evaluation | 17,247 | 17,247 | 17,247 | 6,882 (-10,365, "Functional transfer to line 130B for ALPHA-1") | 13,447 (-3,800 for "excess growth") |
| Senior Leadership Training Courses | 0 | 2,750 | 0 | 0 | 0 |
| ALPHA-1 | 0 | 0 | 0 | 222,723 | 0 |


On Senate Approps Provisions

The Senate Appropriations Committee generally provided what was requested in the White House’s budget regarding artificial intelligence (AI) and machine learning (ML), or exceeded it. AI was one of the top-line takeaways from the Committee’s summary of the defense appropriations bill. Particular attention has been paid to initiatives that cut across the Department of Defense, especially the Chief Digital and Artificial Intelligence Office (CDAO) and a new initiative called Alpha-1. The Committee is supportive of Joint All-Domain Command and Control (JADC2) integration and the recommendations of the National Security Commission on Artificial Intelligence (NSCAI).

On House final bill provisions

Like the Senate Appropriations bill, the House of Representatives’ final bill generally provided or exceeded what was requested in the White House budget regarding AI and ML. However, in contrast to the Senate Appropriations bill, AI was not a particularly high-priority takeaway in the House’s summary. The only note about AI in the House Appropriations Committee’s summary of the bill was in the context of digital transformation of business practices. Program increases were spread throughout the branches’ Research, Development, Test, and Evaluation budgets, with a particular concentration of increased funding for the Defense Innovation Unit’s AI-related budget.

Systems Thinking In Entrepreneurship Or: How I Learned To Stop Worrying And Love “Entrepreneurial Ecosystems”

As someone who works remotely and travels quite a long way to be with my colleagues, I really value my “water cooler moments” in the FAS office, when I have them. The idea for this series came from one such moment, when Josh Schoop and I were sharing a sparkling water break. Systems thinking, we realized, is a through line in many parts of our work and part of the mental model we share, one that leads to effective change-making in complex, adaptive systems. In the geekiest possible terms:

A diagram of 'water cooler conversations' from a Systems Thinking perspective
Figure 1: Why Water Cooler Conversations Work

Systems analysis had been a feature of Josh’s dissertation, while I had had the opportunity to study a slightly more “quant” version of the same concepts under John Sterman at MIT Sloan, through my System Dynamics coursework. The more we thought about it, the more we saw that systems thinking and system dynamics were present across the team at FAS–from our brilliant colleague Alice Wu, who had recently given a presentation on Tipping Points, to folks who had studied the topic more formally as engineers or as students at Michigan and MIT. This led to the first meeting of our FAS “Systems Thinking Caucus” and inspired a series of blog posts intended to make this philosophical through-line clearer. This is the first of the series, and it describes how and why systems thinking is so important in the context of entrepreneurship policy, and how systems modeling can help us better understand which policies are effective. 


The first time I heard someone described as an “ecosystem builder,” I am pretty sure that my eyes rolled involuntarily. The entrepreneurial community, which I have spent my career supporting, building, and growing, has been my professional home for the last 15 years. I came to this work not out of academia, but out of experience as an entrepreneur and leader of entrepreneur support programs. As a result, I’ve always taken a pragmatic approach to my work, and avoided (even derided) buzzwords that make it harder to communicate about our priorities and goals. In the world of tech startups, in which so much of my work has roots, buzzwords from “MVP” to “traction” are almost a compulsion. Calling a community an “ecosystem” seemed no different to me, and totally unnecessary. 

And yet, over the years, I’ve come to tolerate, understand, and eventually embrace “ecosystems.” Not because it comes naturally, and not because it’s the easiest word to understand, but because it’s the most accurate descriptor of my experience and the dynamics I’ve witnessed first-hand. 

So what, exactly, are innovation ecosystems? 

My understanding of innovation ecosystems is grounded first in the experience of navigating one in my hometown of Kansas City–first, as a newly minted entrepreneur, desperately seeking help understanding how to do taxes, and later as a leader of an entrepreneur support organization (ESO), a philanthropic funder, and most recently, as an angel investor. It’s also informed by the academic work of Dr. Fiona Murray and Dr. Phil Budden. The first time that I saw their stakeholder model of innovation ecosystems, it crystallized what I had learned through 15 years of trial-and-error into a simple framework. It resonated fully with what I had seen firsthand as an entrepreneur desperate for help and advice–that innovation ecosystems are fundamentally made up of people and institutions that generally fall into the same categories:  entrepreneurs, risk capital, universities, government, or corporations. 

Over time–both as a student and as an ecosystem builder–I came to see the complexity embedded in this seemingly simple idea and evolved my view. Today, I amend that model of innovation ecosystems to, essentially, split universities into two stakeholder groups: research institutions and workforce development. I take this view because, though not every postsecondary institution is a world-leading research university like MIT, smaller and less research-focused colleges and universities play important roles in an innovation ecosystem. Where is the room for institutions like community colleges, workforce development boards, or even libraries in a discussion that is dominated by the need to commercialize federally-funded research? Two goals–the production of human capital and the production of intellectual property–can also sometimes be in tension in larger universities, and thus are usually represented by different people with different ambitions and incentives. The concerns of a tech transfer office leader are very different from those of a professor in an engineering or business school, though they work for the same institution and may share the same overarching aspirations for a community. Splitting the university stakeholder into two different stakeholder groups makes the most sense to me–but the rest of the stakeholder model comes directly from Dr. Murray and Dr. Budden. 

IMAGE: An innovation ecosystem stakeholder model – a network of labeled nodes, including entrepreneur, workforce, research, corporations, government, and capital nodes, each connected to the others.
Figure 2: Innovation Ecosystem Stakeholder Model

One important consideration in thinking about innovation ecosystems is that boundaries really do matter. Innovation ecosystems are characterized by the cooperation and coordination of these stakeholder groups–but not everything these stakeholders do is germane to their participation in the ecosystem, even when it’s relevant to the industry that the group is trying to build or support. 

As an example, imagine a community that is working to build a biotech innovation ecosystem. Does the relocation of a new biotech company to the area meaningfully improve the ecosystem? Well, that depends! It might, if that company actively engages in efforts to build the ecosystem, say, by directing an executive to serve on the board of an ecosystem building nonprofit, helping to inform workforce development programs relevant to their talent needs, instructing their internal VC to attend the local accelerator’s demo day, offering dormant lab space in their core facility to a cash-strapped startup at cost, or engaging in sponsored research with the local university. Relocation of the company may not improve the ecosystem if they simply happen to be working in the targeted industry and receive a relocation tax credit. In short, by itself, shared work between two stakeholders on an industry theme does not constitute ecosystem building. That shared work must advance a vision that is shared by all of the stakeholders that are core to the work.

Who are the stakeholders in innovation ecosystems? 

Innovation ecosystems are fundamentally made up of six different kinds of stakeholders, who, ideally, work together to advance a shared vision  grounded in a desire to make the entrepreneurial experience easier. One of the mistakes I often see in efforts to build innovation ecosystems is an imbalance or an absence of a critical stakeholder group. Building innovation ecosystems is not just about involving many people (though it helps), it’s about involving people that represent different institutions and can help influence those institutions to deploy resources in support of a common effort. Ensuring stakeholder engagement is not a passive box-checking activity, but an active resource-gathering one. 

An innovation ecosystem in which one or more stakeholders is absent will likely struggle to make an impact. Entrepreneurs with no access to capital don’t go very far, nor do economic development efforts without government buy-in, or a workforce training program without employers. 

In the context of today’s bevy of federal innovation grant opportunities with 60-day deadlines, it can be tempting to “go to war with the army you have” instead of prioritizing efforts to build relationships with new corporate partners or VCs. But how would you feel if you were “invited” to do a lot of work and deploy your limited resources to advance a plan that you had no hand in developing? Ecosystem efforts that invest time in building relationships and trust early will benefit from their coordination, regardless of federal funding.  

These six stakeholder groups are listed in Figure 2 and include entrepreneurs, risk capital, research institutions, workforce development institutions, corporations, and government. 

In the context of regional, place-based innovation clusters (including tech hubs), this stakeholder model is a tool that can help a burgeoning coalition both assess the quality and capacity of their ecosystem in relation to a specific technology area and provide a guide to prompt broad convening activities. From the standpoint of a government funder of innovation ecosystems, this model can be used as a foundation for conducting due diligence on the breadth and engagement of emerging coalitions. It can also be used to help articulate the shortcomings of a given community’s engagements, to highlight ecosystem strengths and weaknesses, and to design support and communities of practice that convene stakeholder groups across communities.

What about entrepreneur support organizations (ESOs)? What about philanthropy? Where do they fit into the model? 

When I introduce this model to other ecosystem builders, one of the most common questions I get is, “where do ESOs fit in?” Most ESOs like to think of themselves as aligned with entrepreneurs, but that merits a few cautionary notes. First, the critical question you should ask to figure out where an ESO, a Chamber or any other shape-shifting organization fits into this model is, “what is their incentive structure?” That is to say, the most important thing is to understand to whom an organization is accountable. When I worked for the Enterprise Center in Johnson County, despite the fact that I would have sworn up-and-down that I belonged in the “E” category with the entrepreneurs I served, our sustaining funding was provided by the county government. My core incentive was to protect the interests of a political subdivision of the metro area, and a perceived failure to do that would have likely resulted in our organization’s funding being cut (or at least, in my being fired from it). That means that I truly was a “G,” or a government stakeholder. So, intrepid ESO leader, unless the people that fund, hire, and fire you are majority entrepreneurs, you’re likely not an “E.”

The second danger of assuming that ESOs are, in fact, entrepreneurs, is that it often leads to a lack of actual entrepreneurs in the conversation. ESOs stand in for entrepreneurs who are too busy to make it to the meeting. But the reality is that even the most well-meaning ESOs have a different incentive structure than entrepreneurs–meaning that it is very difficult for them to naturally represent the same views. Take, for instance, a community survey of entrepreneurs that finds that entrepreneurs see “access to capital” as the primary barrier to their growth in a given community. In my experience, ESOs generally take that somewhat literally, and begin efforts to raise investment funds. Entrepreneurs, on the other hand, who simply meant “I need more money,” might see many pathways to getting it, including by landing a big customer. (After all, revenue is the cheapest form of cash.) This often leads ESOs to prioritize problems that match their closest capabilities, or the initiatives most likely to be funded by government or philanthropic grants. Having entrepreneurs at the table directly is critically important, because they see the hairiest and most difficult problems first–and those are precisely the problems it takes a big group of stakeholders to solve. 

Finally, I have seen folks ask a number of times where philanthropy fits into the model. The reality is that I’m not sure. My initial reaction is that most philanthropic organizations have a very clear strategic reason for funding work happening in ecosystems–their theory of change should make it clear which stakeholder views they represent. For example, a community foundation might act like a “government” stakeholder, while a funder of anti-poverty work who sees workforce development as part of their theory of change is quite clearly part of the “W” group. But not every philanthropy has such a clear view, and in some cases, I think philanthropic funders, especially those in small communities, can think of themselves as a “shadow stakeholder,” standing in for different viewpoints that are missing in a conversation. Philanthropy might also play a critical and underappreciated role as a “platform creator.” That is, they might seed the conversation about innovation ecosystems in a community, convene stakeholders for the first time, or fund activities that enable stakeholders to work and learn together, such as planning retreats, learning journeys, or simply buying the coffee or providing the conference room for a recurring meeting. And especially right now, philanthropy has an opportunity to act as an “accelerant,” supporting communities by offering the matching funds that are so critical to their success in leveraging federal funds.  

Why is “ecosystem” the right word? 

Innovation ecosystems, like natural systems, are both complex and adaptive. They are complex because they are systems of systems. Each stakeholder in an innovation ecosystem is not just one person, but a system of people and institutions with goals, histories, cultures, and personalities. Not surprisingly, these systems of systems are adaptive, because they are highly connected and thus produce unpredictable, ungovernable performance. It is very, very difficult to predict what will happen in a complex system, and most experts in fields like system dynamics will tell you that a model is never truly finished; it is just “bounded.” In fact, the way that the quality of a systems model is usually judged is based on how closely it maps to a reference mode of output in the past. This means that the best way to tell whether your systems model is any good is to give it “past” inputs, run it, and see how closely it compares to what actually happened. If I believe that job creation is dependent on inflation, the unemployment rate, availability of venture capital, and the number of computer science majors graduating from a local university, one way to test if that is truly the case is to input those numbers over the past 20 years, run a simulation of how many jobs would be created, according to the equations in my model, and see how closely that maps to the actual number of jobs created in my community over the same time period. If the line maps closely, you’ve got a good model. If it’s very different, try again, with more or different variables. It’s quite easy to see how this trial-and-error based process can end up with an infinitely expanding equation of increasing complexity, which is why the “bounds” of the model are important. 
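
For readers who want to see what such a backtest looks like in practice, here is a deliberately simplified sketch. The historical data, variable choices, and linear model form are all hypothetical placeholders; a real system dynamics model would include stocks, flows, and feedback loops rather than a single equation.

```python
# A toy backtest of the kind described above: feed a candidate model the past
# 20 years of inputs and compare its simulated output to what actually happened.
# All data and the model form here are hypothetical placeholders.
import numpy as np

def simulate_jobs(inflation, unemployment, venture_capital, cs_grads, params):
    """A deliberately simple candidate model of annual job creation."""
    a, b, c, d = params
    return a * venture_capital + b * cs_grads - c * unemployment - d * inflation

rng = np.random.default_rng(0)
years = 20
inputs = {
    "inflation": rng.uniform(1, 6, years),
    "unemployment": rng.uniform(3, 9, years),
    "venture_capital": rng.uniform(10, 100, years),   # $M invested locally (placeholder)
    "cs_grads": rng.uniform(200, 800, years),
}
actual_jobs = rng.uniform(500, 3000, years)            # the historical "reference mode"

simulated = simulate_jobs(**inputs, params=(10, 1.5, 50, 20))
rmse = np.sqrt(np.mean((simulated - actual_jobs) ** 2))
print(f"RMSE vs. reference mode: {rmse:,.0f} jobs")    # small error => model fits the past
```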

Finally, complex, adaptive systems are, as my friend and George Mason University Professor Dr. Phil Auerswald says, “self-organizing and robust to intervention”. That is to say, it is nearly impossible to predict a linear outcome (or whether there will be any outcome at all) based on just a couple of variables. This means that the simple equation (money in = jobs out) is wrong. To be better able to understand the impact of a complex, adaptive system requires mapping the whole system and understanding how many different variables change cyclically and in relation to each other over a long period of time. It also requires understanding the stochastic nature of each variable. That is a very math-y way of saying it requires understanding the precise way in which each variable is unpredictable, or the shape of its bell-curve.

All of this is to say that understanding and evaluating innovation ecosystems requires an entirely different approach than the linear “jobs created = companies started * Moretti multiplier” assumptions of the past. 

So how do you know if ecosystems are growing or succeeding if the number of jobs created doesn’t matter? 

The point of injecting complexity thinking into our view of ecosystems is not to create a sense of hopelessness. Complex things can be understood–they are not inherently chaotic. But trying to understand these ecosystems through traditional outputs and outcomes is not the right approach since those outputs and outcomes are so unpredictable in the context of a complex system. We need to think differently about what and how we measure to demonstrate success. The simplest and most reliable thing to measure in this situation then becomes the capacities of the stakeholders themselves, and the richness or quality of the connections between them. This is a topic we’ll dive into further in future posts.

Enabling Faster Funding Timelines in the National Institutes of Health

Summary

The National Institutes of Health (NIH) funds some of the world’s most innovative biomedical research, but rising administrative burden and extended wait times—even in crisis—have shown that its funding system is in desperate need of modernization. Examples of promising alternative models exist: in the last two years, private “fast science funding” initiatives such as Fast Grants and Impetus Grants have delivered breakthroughs in coronavirus pandemic response and aging research on timelines of days to one month, significantly faster than the yearly NIH funding cycles. In response to the COVID-19 pandemic, the NIH implemented a temporary fast funding program called RADx, indicating a willingness to adopt such practices during acute crises. Research on other critical health challenges like aging, the opioid epidemic, and pandemic preparedness deserves similar urgency. We therefore believe it is critical that the NIH formalize and expand its institutional capacity for rapid funding of high-potential research.

Using the learnings of these fast funding programs, this memo proposes actions that the NIH could take to accelerate research outcomes and reduce administrative burden. Specifically, the NIH director should consider pursuing one of the following approaches to integrate faster funding mechanisms into its extramural research programs: 

Future efforts by the NIH and other federal policymakers to respond to crises like the COVID-19 pandemic would also benefit from a clearer understanding of the impact of the decision-making process and actions taken by the NIH during the earliest weeks of the pandemic. To that end, we also recommend that Congress initiate a report from the Government Accountability Office to illuminate the outcomes and learnings of fast governmental programs during COVID-19, such as RADx.

Challenge and Opportunity

The urgency of the COVID-19 pandemic created adaptations not only in how we structure our daily lives but in how we develop therapeutics and fund science. Starting in 2020, the public saw a rapid emergence of nongovernmental programs like Fast Grants, Impetus Grants, and Reproductive Grants to fund both big clinical trials and proof-of-concept scientific studies within timelines that were previously thought to be impossible. Within the government, the NIH launched RADx, a program for the rapid development of coronavirus diagnostics with significantly accelerated approval timelines. Though the sudden onset of the pandemic was unique, we believe that an array of other biomedical crises deserve the same sense of urgency and innovation. It is therefore vital that the new NIH director permanently integrate fast funding programs like RADx into the NIH in order to better respond to these crises and accelerate research progress for the future. 

To demonstrate why, we must remember that the coronavirus is far from being an outlier—in the last 20 years, humanity has gone through several major pandemics, notably swine flu, SARS-CoV-1, and Ebola. Based on the long-observed history of infectious diseases, the risk of a pandemic with an impact similar to that of COVID-19 is about two percent in any given year. An extension of naturally occurring pandemics is the ongoing epidemic of opioid use and addiction. The rapidly changing landscape of opioid use—with overdose rates growing rapidly and synthetic opioid formulations becoming more common—makes slow, incremental grantmaking ill-suited for the task. The counterfactual impact of providing some awards via faster funding mechanisms in these cases is self-evident: having tests, trials, and interventions earlier saves lives and saves money, without requiring additional resources.

Beyond acute crises, there are strong longer-term public health motivations for achieving faster funding of science. In about 10 years, the United States will have more seniors (people aged 65+) than children. This will place substantial stress on the U.S. healthcare system, especially given that two-thirds of seniors suffer from more than one chronic disease. New disease treatments may help, but it often takes years to translate the results of basic research into approved drugs. The idiosyncrasies of drug discovery and clinical trials make them difficult to accelerate at scale, but we can reliably accelerate drug timelines on the front end by reducing the time researchers spend in writing and reviewing grants—potentially easing the long-term stress on U.S. healthcare.

The existing science funding system developed over time with the best intentions, but for a variety of reasons—partly because the supply of federal dollars has not kept up with demand—administrative requirements have become a major challenge for many researchers. According to surveys, working scientists now spend 44% of their research time on administrative activities and compliance, with roughly half of that time spent on pre-award activities. Over 60% of scientists say administrative burden compromises research productivity, and many fear it discourages students from pursuing science careers. In addition, the wait for funding can be extensive: one of the major NIH grants, R01, takes more than three months to write and around 8–20 months to receive (see FAQ). Even proof-of-concept ideas face onerous review processes and take at least a year to fund. This can bottleneck potentially transformative ideas, as with Katalin Kariko famously struggling to get funding for her breakthrough mRNA vaccine work when it was at its early stages. These issues have been of interest for science policymakers for more than two decades, but with little to show for it. 

Though several nongovernmental organizations have attempted to address this need, the model of private citizens continuously fundraising to enable fast science is neither sustainable nor substantial enough compared to the impact of the NIH. We believe that a coordinated governmental effort is needed to revitalize American research productivity and ensure a prompt response to national—and international—health challenges like naturally occurring pandemics and imminent demographic pressure from age-related diseases. The new NIH director has an opportunity to take bold action by making faster funding programs a priority under their leadership and a keystone of their legacy. 

The government’s own track record with such programs gives grounds for optimism. In addition to the aforementioned RADx program at NIH, the National Science Foundation (NSF) runs the Early-Concept Grants for Exploratory Research (EAGER) and Rapid Response Research (RAPID) programs, which can have response times in a matter of weeks. Going back further in history, during World War II, the National Defense Research Committee maintained a one-week review process.

Faster grant review processes can be either integrated into existing grant programs or rolled out by institutes in temporary grant initiatives responding to pressing needs, as the RADx program was. For example, when faced with data falsification around the beta amyloid hypothesis, the National Institute on Aging (NIA) could leverage fast grant review infrastructure to quickly fund replication studies for key papers, without waiting for the next funding cycle. In case of threats to human health due to toxins, the National Institute of Environmental Health Sciences (NIEHS) could rapidly fund studies on risk assessment and prevention, giving evidence-based public recommendations without delay. Finally, empowering the National Institute of Allergy and Infectious Diseases (NIAID) to quickly fund science would prepare us for many yet-to-come pandemics.

Plan of Action

The NIH is a decentralized organization, with institutes and centers (ICs) that each have their own mission and focus areas. While the NIH Office of the Director sets general policies and guidelines for research grants, individual ICs have the authority to create their own grant programs and define their goals and scope. The Center for Scientific Review (CSR) is responsible for the peer review process used to review grants across the NIH and recently published new guidelines to simplify the review criteria. Given this organizational structure, we propose that the NIH Office of the Director, particularly the Office of Extramural Research, assess opportunities for both NIH-wide and institute-specific fast funding mechanisms and direct the CSR, institutes, and centers to produce proposed plans for fast funding mechanisms within one year. The Director’s Office should consider the following approaches. 

Approach 1. Develop an expedited peer review process for the existing R21 grant mechanism to bring it more in line with the NIH’s own goals of funding high-reward, rapid-turnaround research. 

The R21 program is designed to support high-risk, high-reward, rapid-turnaround, proof-of-concept research. However, it has been historically less popular among applicants compared to the NIH’s traditional research mechanism, the R01. This is in part due to the fact that its application and review process is known to be only slightly less burdensome than the R01, despite providing less than half of the financial and temporal support. Therefore, reforming the application and peer review process for the R21 program to make it a fast grant–style award would both bring it more in line with its own goals and potentially make it more attractive to applicants. 

All ICs follow identical yearly cycles for major grant programs like the R21, and the CSR centrally manages the peer review process for these grant applications. Thus, changes to the R21 grant review process must be spearheaded by the NIH director and coordinated in a centralized manner with all parties involved in the review process: the CSR, program directors and managers at the ICs, and the advisory councils at the ICs. 

The track record of federal and private fast funding initiatives demonstrates that faster funding timelines can be feasible and successful (see FAQ). Among the key learnings and observations of public efforts that the NIH could implement are:

Pending the success of these changes, the NIH should consider applying similar changes to other major research grant programs.

Approach 2. Direct NIH institutes and centers to independently develop and deploy programs with faster funding timelines using Other Transaction Authority (OTA).

Compared to reforming an existing mechanism, the creation of institute-specific fast funding programs would allow for context-specific implementation and cross-institute comparison. This could be accomplished using OTA—the same authority used by the NIH to implement COVID-19 response programs. Since 2020, all ICs at the NIH have had this authority and may implement programs using OTA with approval from the director of NIH, though many have yet to make use of it.

As discussed previously, the NIA, NIDA, and NIAID would be prime candidates for the roll-out of faster funding. In particular, these new programs could focus on responding to time-sensitive research needs within each institute or center’s area of focus—such as health crises or replication of linchpin findings—that would provide large public benefits. To maintain this focus, these programs could restrict investigator-initiated applications and only issue funding opportunity announcements for areas of pressing need. 

To enable faster peer review of applications, ICs should establish one or more new study sections within their Scientific Review Branch dedicated to rapid review, similar to how the RADx program had its own dedicated review committees. Reviewers who join these study sections would commit to short meetings on a monthly or bimonthly basis rather than meeting three times a year for one to two days, as traditional study sections do. Additionally, as recommended above, these new programs should have a three-page limit on applications to reduce the administrative burden on both applicants and reviewers. 

In this framework, we propose that the ICs be encouraged to direct at least one percent of their budgets to establishing new research programs with faster funding processes. We believe that even one percent of an institute’s annual budget is sufficient to launch initial fast grant programs. For example, the National Institute on Aging had an operating budget of $4 billion in the 2022 fiscal year. One percent of this budget would constitute $40 million for faster funding initiatives, which is on the order of the initial budgets of Impetus Grants and Fast Grants ($25 million and $50 million, respectively). 
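
As a quick back-of-the-envelope check of that arithmetic (using the illustrative figures cited above, not current appropriations):

```python
# Back-of-the-envelope check of the one-percent proposal; the budget figure is
# the illustrative FY2022 NIA figure cited above, not a current appropriation.
nia_budget = 4_000_000_000          # NIA operating budget, FY2022 (~$4B)
fast_grant_share = 0.01             # proposed minimum share for fast funding

fast_funding_pool = nia_budget * fast_grant_share
print(f"${fast_funding_pool/1e6:.0f}M")   # ~$40M, comparable to Impetus ($25M) and Fast Grants ($50M)
```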

NIH ICs should develop success criteria in advance of launching new fast funding programs. If the success criteria are met, they should gradually increase the budget and expand the scope of the program by allowing for investigator-initiated applications, making it a real alternative to R01 grants. A precedent for this type of grant program growth is the Maximizing Investigators’ Research Award (MIRA) (R35) grant program within the National Institute of General Medical Sciences (NIGMS), which set the goal of funding 60% of all R01 equivalent grants through MIRA by 2025. In the spirit of fast grants, we recommend setting a deadline on how long each institute can take to establish a fast grants program to ensure that the process does not extend for too many years.

Additional recommendation. Congress should initiate a Government Accountability Office report to illuminate the outcomes and learnings of governmental fast funding programs during COVID-19, such as RADx.

While a number of published papers cite RADx funding, the program’s overall impact and efficiency haven’t yet been assessed. We believe that the agency’s response during the pandemic isn’t yet well-understood but likely played an important role. Illuminating the learnings of these interventions would greatly benefit future emergency fast funding programs.

Conclusion

The NIH should become a reliable agent for quickly mobilizing funding to address emergencies and accelerating solutions for longer-term pressing issues. At present, no funding mechanisms within the NIH or its institutes enable them to react to such matters rapidly. However, both private and governmental initiatives show that fast funding programs are not only possible but can also be extremely successful. Given this, we propose the creation of permanent fast grants programs within the NIH and its institutes based on learnings from past initiatives.

The changes proposed here are part of a larger effort from the scientific community to modernize and accelerate research funding across the U.S. government. In the current climate of rapidly advancing technology and increasing global challenges, it is more important than ever for U.S. agencies to stay at the forefront of science and innovation. A fast funding mechanism would enable the NIH to be more agile and responsive to the needs of the scientific community and would greatly benefit the public through the advancement of human health and safety.

Frequently Asked Questions
What actions, besides RADx, did the NIH take in response to the COVID-19 pandemic?

The NIH released a number of Notices of Special Interest to allow emergency revision to existing grants (e.g., PA-20-135 and PA-18-591) and a quicker path for commercialization of life-saving COVID technologies (NOT-EB-20-008). Unfortunately, repurposing existing grants reportedly took several months, significantly delaying impactful research.

What does the current review process look like?

The current scientific review process at NIH involves multiple stakeholders. There are two stages of review, with the first stage conducted by a Scientific Review Group that consists primarily of nonfederal scientists. Typically, Center for Scientific Review committees meet three times a year for one or two days, so the initial review does not begin until roughly four months after proposal submission. Special Emphasis Panel meetings that are not recurring take even longer due to panel recruitment and scheduling. The Institute and Center National Advisory Councils or Boards are responsible for the second stage of review, which usually happens after revision and appeals, taking the total timeline to approximately a year.

Is there evidence for the NIH’s current approach to scientific review?

Because of the difficulty of empirically studying drivers of scientific impact, there has been little research evaluating peer review’s effects on scientific quality. A Cochrane systematic review from 2007 found no studies directly assessing review’s effects on scientific quality, and a recent Rand review of the literature in 2018 found a similar lack of empirical evidence. A few more recent studies have found modest associations between NIH peer review scores and research impact, suggesting that peer review may indeed successfully identify innovative projects. However, such a relationship still falls short of demonstrating that the current model of grant review reliably leads to better funding outcomes than alternative models. Additionally, some studies have demonstrated that the current model leads to variable and conservative assessments. Taken together, we think that experimentation with models of peer review that are less burdensome for applicants and reviewers is warranted.

One concern with faster reviews is lower scientific quality. How do you ensure high-quality science while keeping fast response times and short proposals?

Intuitively, it seems that having longer grant applications and longer review processes ensures that both researchers and reviewers expend great effort to address pitfalls and failure modes before research starts. However, systematic reviews of the literature have found that reducing the length and complexity of applications has minimal effects on funding decisions, suggesting that the quality of resulting science is unlikely to be affected. 


Historical examples have also suggested that the quality of an endeavor is largely uncorrelated with its planning time. It took Moderna 45 days from COVID-19 genome publication to submit the mRNA-1273 vaccine to the NIH for use in its Phase 1 clinical study. Such examples exist within government too: during World War II, the National Defense Research Committee set a record by reviewing and authorizing grants within one week, which led to the DUKW, Project Pigeon, the proximity fuze, and radar.


Recent fast grant initiatives have produced high-quality outcomes. With its short applications and next-day response times, Fast Grants enabled:



  • detection of new concerning COVID-19 variants before other sources of funding became available.

  • work that showed saliva-based COVID-19 tests can work just as well as those using nasopharyngeal swabs.

  • drug-repurposing clinical trials, one of which identified a generic drug reducing hospitalization from COVID-19 by ~40%. 

  • research into “Long COVID,” which is now being followed up with a clinical trial on the ability of COVID-19 vaccines to improve symptoms.


Impetus Grants focused on projects with longer timelines but led to a number of important preprints in less than a year from the moment a person applied:



With the heavy toll that resource-intensive approaches to peer review take on the speed and innovative potential of science—and the early signs that fast grants lead to important and high-quality work—we feel that the evidentiary burden should be placed on current onerous methods rather than the proposed streamlined approaches. Without strong reason to believe that the status quo produces vastly improved science, we feel there is no reason to add years of grant writing and wait times to the process.

Why focus on the NIH, as opposed to other science funding agencies?

The adoption of faster funding mechanisms would indeed be valuable across a range of federal funding agencies. Here, we focus on the NIH because its budget for extramural research (over $30 billion per year) represents the single largest source of science funding in the United States. Additionally, the NIH’s umbrella of health and medical science includes many domains that would be well-served by faster research timelines for proof-of-concept studies—including pandemics, aging, opioid addiction, mental health, cancer, etc.

Supercharging Biomedical Science at the National Institutes of Health

Summary

For decades, the National Institutes of Health (NIH) has been the patron of groundbreaking biomedical research in the United States. NIH has paved the way for life-saving gene therapies, cancer treatments, and most recently, mRNA vaccines. More than 80% of NIH’s $42 billion budget supports extramural research, including nearly 50,000 grants disbursed to more than 300,000 researchers.

But NIH has become incremental in its funding decisions. The result is a U.S. biomedical-research enterprise discouraged from engaging in the risk-taking and experimentation needed to foster scientific breakthroughs. To maximize returns on its massive R&D budget, NIH should consider the following actions:

Challenge and Opportunity

Each year, federal science agencies allocate billions of dollars to launch new research initiatives and to create novel grant mechanisms.  But an embarrassingly tiny amount is invested into discerning which funding policies are actually effective. Despite having the requisite data, methods, and technology, science agencies such as NIH do not subject science-funding policies to nearly the same rigor as the funded science itself.

Another problem plaguing science funding at NIH is that it is difficult for scientists to secure funding for risky but potentially transformative work. When NIH’s peer-review process was designed more than half a century ago, over half of grant applications to the agency were funded. NIH’s proposal-success rate has dropped to 15% today. Even credible researchers must submit an ever-growing number of proposals in order to have a reasonable chance of securing funding. The result is that scientists spend almost half of their working time on average writing grants—time that could otherwise be spent conducting research and training other scientists. Our nation has created a federally funded research ecosystem that makes scientists beg, fight, and rewrite to do the work they’ve spent years training to do.

Compounding the problem is the fact that fewer and fewer early-career researchers are getting adequate support to do their work. Indeed, it takes fewer years to become an experienced surgeon than it does to launch a biomedical research career and obtain a first R01 grant from NIH (the average age of R01 grantees in 2020 was 44 years). When we place hurdles in front of young scientists, we lose out on empowering them at a particularly innovative career stage.1 Limited access to funding early on hamstrings the ability of early-career scientists to set up labs, tackle interesting ideas, and train the next generation. And the early careers of young scientists are often judged by their publishing records, which has the pernicious effect of guiding young scientists to propose safe research that will easily pass peer review. 

A scientific ecosystem that incentivizes incrementalism instead of impact discourages scientists from bringing their best, most creative ideas to the table2 — an effect multiplied for women and underrepresented minorities. The risky research underpinning mRNA vaccines would struggle to be funded under today’s peer-review system. To catalyze groundbreaking biomedical research—and lead the way for other federal science-funding agencies to follow suit—NIH should reconsider how it funds research, what it funds, and who it funds. The Plan of Action presented below includes recommendations aligned with each of these policy questions.

Plan of Action

Recommendation 1. Diversify and assess NIH’s grant-funding mechanisms.

In 2020, privately funded COVID “Fast Grants” accelerated pandemic science by allocating over $50 million in grants awarded within 48 hours of proposal receipt. In a world where grant proposals typically take months to prepare and months more to receive a decision, Fast Grants offered a welcome departure from the norm. Their success signals that federal research funders like NIH can and must adopt faster, more flexible approaches to scientific grantmaking—approaches that improve productivity and impact by getting scientists the resources they need when they need them.

While Fast Grants received a great deal of attention for their novelty and usefulness during a crisis, it is unclear whether the wealth of experimental funding approaches that NIH has tried—such as its R21 grant for developmental research, or its K99 award for on-ramping postdoctoral researchers to traditional R01 funding—has positively impacted scientific productivity. Indeed, NIH has never rigorously assessed the efficacy of these approaches. NIH must institute mechanisms for evaluating the success of its funding experiments so it can understand how to optimize its resources and stretch R&D dollars as far as possible.

As such, the NIH Director should establish a “Science of Science Funding” Working Group within the NIH’s Advisory Committee to the Director. The Working Group should be tasked with (1) evaluating the efficacy of existing funding mechanisms at NIH and (2) piloting three to five experimental funding mechanisms. The Working Group should also propose a structure for evaluating existing and novel funding mechanisms through Randomized Controlled Trials (RCTs), and should recommend ways in which NIH can expand its capacity for policy evaluation (see FAQ for more on RCTs).

Novel funding mechanisms that the Working Group could consider include:

This Working Group should be chaired by the incoming Director of Extramural Research and should include other NIH leaders (such as the Director of the Office of Strategic Coordination and the Director of the Office of Research Reporting and Analysis) as participants. The Working Group should also include members from other federal science agencies such as NSF and NASA. The Working Group should include and/or consult with diverse faculty at all career stages as well. Buy-in from the NIH Director will be crucial for this group to enact transformative change.

Lastly, the Working Group should seek to open NIH up to outside evaluation by the public. Full access to grantmaking data and the corresponding outcomes could unlock transformative insights that holistically uplift the biomedical community. While NIH has a better track record of data sharing than some other science-funding agencies, there is still a long way to go. One key step is putting data on grant applicants in an open-access database (with privacy-preserving properties) so that it can be analyzed and merged with other relevant datasets to inform decision-making. Opening up data on grant applicants and their outcomes also supports external evaluation—paving the way for outside groups to augment NIH’s internal evaluations and helping hold NIH accountable for its programmatic outcomes.
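As a purely illustrative sketch of the kind of privacy-preserving linkage such a database would require, the Python snippet below pseudonymizes applicant identifiers with a salted hash before joining an applications table to an outcomes table. The file names, column names, and salt are hypothetical placeholders rather than an actual NIH schema, and a real data release would need stronger protections (e.g., formal disclosure review) than salted hashing alone.

```python
"""Illustrative sketch: link applicant records to outcomes without exposing raw IDs.
File names and columns (applications.csv, outcomes.csv, applicant_id, ...) are
hypothetical placeholders, not an actual NIH schema."""
import hashlib

import pandas as pd

SALT = "replace-with-a-secret-salt"  # held privately by the data steward


def pseudonymize(identifier: str) -> str:
    """Return a salted SHA-256 digest so records can be linked but not trivially re-identified."""
    return hashlib.sha256((SALT + identifier).encode("utf-8")).hexdigest()


applications = pd.read_csv("applications.csv")  # applicant_id, career_stage, prior_funding
outcomes = pd.read_csv("outcomes.csv")          # applicant_id, funded, publications_2yr

for df in (applications, outcomes):
    df["applicant_key"] = df["applicant_id"].astype(str).map(pseudonymize)
    df.drop(columns=["applicant_id"], inplace=True)

# Analysts and the public receive only the pseudonymized, merged table.
merged = applications.merge(outcomes, on="applicant_key", how="left")
merged.to_csv("open_access_release.csv", index=False)
```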

Recommendation 2. Foster a culture of scientific risk-taking by funding more high-risk, high-reward grants.

Uncertainty is a hallmark of breakthrough scientific discovery. The research that led to the rapid development of mRNA COVID vaccines, for instance, would have struggled to get funded through traditional channels. NIH has taken some admirable steps to encourage risk-taking. Since 2004, it has rolled out a set of High-Risk, High-Reward (HRHR) grant-funding mechanisms (Table 1), and the agency’s own evaluations have found that HRHR grants lead to increased scientific productivity relative to other grant types. Yet HRHR grants account for a vanishingly small percentage of NIH’s extramural R&D funding. Only 85 HRHR grants were awarded in all of 2020, compared with 7,767 standard R01 grants in the same year.3 Such disproportionate allocation of funds to safe, incremental research largely yields safe, incremental results. It should also be noted that labeling specific programs “high-risk, high-reward” does not guarantee that those programs actually fund high-risk, high-reward research.

| Award | Purpose | Funding Amount | # Awarded in 2020 |
|---|---|---|---|
| New Innovator Award | For exceptionally creative early-career scientists proposing innovative, high-impact projects | $1.5M / 5 yrs | 53 |
| Pioneer Award | For individuals of exceptional creativity proposing pioneering approaches, at all career stages | $3.5M / 5 yrs | 10 |
| Transformative Research Award | For individuals or teams proposing transformative research that may require very large budgets | No cap | 9 |
| Early Independence Award | For outstanding junior scientists wishing to “skip the postdoc” and immediately begin independent research | $250K / yr | 12 |
| R01 Investigator (NIH’s flagship grant) | For mature research projects that are hypothesis-driven with strong preliminary data | $250K / yr | 7,767 |

Table 1: NIH’s High-Risk, High-Reward grant mechanisms and its flagship R01 grant.

It is time for NIH to actively foster a culture of scientific risk-taking. The agency can do this by balancing funding for relatively predictable projects with funding for riskier projects that have the potential to deliver greater returns.

Specifically, NIH should:

Recommendation 3. Better support early-career scientists.

NIH can supercharge the biomedical R&D ecosystem by better embracing newer investigators who bring bold, fresh approaches to science. In recent years, NIH has allocated seven times more R01 funding to scientists over 65 than to scientists under 35. The average age of R01 grantees in 2020 was 44 years; in other words, it takes fewer years to become an experienced surgeon than it does to launch a biomedical research career and obtain a first R01. This paradigm leaves promising early-career researchers scrambling for alternative funding sources, or causes them to change careers entirely. Postdoctoral researchers in particular struggle to have their ideas funded.

NIH has attempted to alleviate funding disparities through some grants—R00, R03, K76, K99, etc.—targeted at younger scientists. However, these grants do not provide a clear onramp to NIH’s “bread and butter” R01 grants. 

NIH should better support early-career researchers by:

Conclusion

NIH funding forms the backbone of the American biomedical research enterprise. But if the NIH does not diversify its approach to research funding, progress in the field will stagnate. Any renewed commitment to biomedical innovation demands that NIH reconsider how it funds research, what it funds, and who it funds — and to rigorously evaluate its funding processes as well.

The federal government spent about $160 billion on scientific R&D in 2021. It is shocking that it does not routinely seek to optimize how those dollars are spent. While this memo focuses on NIH, the analysis and recommendations contained herein are broadly applicable to other federal agencies with large extramural R&D funding operations, including the National Science Foundation; the Departments of Defense, Agriculture, and Commerce; NASA; and others. Increasing funding for science is a necessary but not sufficient condition for catalyzing scientific progress. The other side of the coin is ensuring that research dollars are spent effectively and that return on investment is optimized.

Frequently Asked Questions
Are Randomized Controlled Trials (RCTs) the only way for the NIH to effectively evaluate funding mechanisms?

To really understand what works and what doesn’t, NIH must consider how to evaluate the success of existing and novel funding mechanisms. MIT economist Pierre Azoulay suggests that the NIH can systematically build out a knowledge base of what funding mechanisms are effective by “turning the scientific method on itself” using RCTs, the “gold standard” of evaluation methods. NIH could likely launch a suite of RCTs that would evaluate multiple funding mechanisms at scale with minimal disruption for around $250,000 per year for five years—a small investment relative to the value of knowing what types of funding work.


RCTs can be easier to implement than is often thought.[1] That said, NIH would be wise to couple RCTs with less ambitious evaluation approaches, such as a two-step review that first filters out clearly sub-par applicants and then applies narrower criteria to the remaining pool to identify the most competitive or highest-priority applicants. Even just collecting and comparing data on NIH grant applicants—data such as education level, career stage, and prior funding history—would provide insight into whether different funding interventions are affecting the composition of the applicant pool.


[1] For more on this topic, see Why Government Needs More Randomized Controlled Trials: Refuting the Myths from the Arnold Foundation.
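To make the RCT idea concrete, below is a minimal sketch, in Python, of how an evaluation team might compare outcomes between funded applicants randomly assigned to a conventional review track versus a streamlined track. Everything here—the sample size, the outcome measure, and the assumed effect—is a simulated, hypothetical assumption for illustration, not NIH data or a prescribed design.

```python
"""Minimal sketch of an RCT comparing two grant-review mechanisms.
All numbers are simulated, hypothetical assumptions for illustration only."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n_applicants = 400  # hypothetical pilot size

# Randomly assign each funded applicant to a review track:
# 0 = conventional multi-month peer review, 1 = streamlined "fast" review.
assignment = rng.integers(0, 2, size=n_applicants)

# Simulate a downstream outcome (e.g., publications within two years).
# The assumed bump for the streamlined track is purely illustrative.
outcome = rng.poisson(lam=4.0, size=n_applicants)
outcome = outcome + np.where(assignment == 1, rng.poisson(lam=0.5, size=n_applicants), 0)

fast = outcome[assignment == 1]
standard = outcome[assignment == 0]

# Estimate the effect as a difference in means, with a Welch t-test for uncertainty.
result = stats.ttest_ind(fast, standard, equal_var=False)
print(f"streamlined-track mean:  {fast.mean():.2f}")
print(f"conventional-track mean: {standard.mean():.2f}")
print(f"estimated effect: {fast.mean() - standard.mean():.2f} (p = {result.pvalue:.3f})")
```

A real evaluation would, of course, pre-register its outcome measures, stratify by institute and field, and follow cohorts for longer than a single funding cycle; the point of the sketch is simply that the statistical machinery involved is routine.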

How would the proposed “Science of Science Funding” Working Group differ from the ACD Working Group on High-Risk, High-Reward Programs?

The ACD Working Group on HRHR programs reviewed “the effectiveness of distinct NIH HRHR research programs that emphasize exceptional innovation.” This working group only focused on evaluating a couple of HRHR programs, which form a trivial portion of grantmaking compared to the rest of the extramural NIH funding apparatus. The Science of Science Funding Working Group would (i) build NIH’s capacity to evaluate the efficacy of different funding mechanisms, and (ii) oversee implementation of several (three to five) experimental funding mechanisms or substantial modifications to existing mechanisms.

How would the “Science of Science Funding” Working Group differ from the Science of Science Policy Approach to Analyzing and Innovating the Biomedical Research Enterprise (SCISIPBIO) Active Awards, jointly hosted by the NSF and the NIH?

SCISIPBIO isn’t focused on systematic change in the biomedical innovation ecosystem. Instead, it is a curiosity-driven grant program for individual PIs to conduct “science of science policy” research. NIH can build on SCISIPBIO to advance rigorous evaluation of science funding internally and agency-wide.

Isn’t the NIH one of the government’s premier research institutions? Is it really doing such a bad job funding research?

NIH funding certainly supports an extensive body of high-quality, high-impact work. But just because something is performing acceptably doesn’t mean that there are not still improvements to be made. As outlined in this memo, there is good reason to believe that static funding practices are preventing the NIH from maximizing returns on its investments in biomedical research. NIH is the nation’s crown jewel of biomedical research. We should seek to polish it to its fullest shine.

What are platform technologies?

Platform technologies are tools, techniques, and instruments that are applicable to many areas of research, enabling novel approaches for scientific investigation that were not previously possible. Platform technologies often generate orders-of-magnitude improvements over current abilities in fundamental aspects such as accuracy, precision, resolution, throughput, flexibility, breadth of application, costs of construction or operation, or user-friendliness. The following are examples of platform technologies:

  • Polymerase chain reaction (PCR)
  • CRISPR-Cas9
  • Cryo-electron microscopy
  • Phage display
  • Charge-coupled device (CCD) sensor
  • Fourier transforms
  • Atomic force microscopy (AFM) and scanning force microscopy (SFM)

There has been an appetite to fund more platform technologies. The recently announced ARPA-H seeks to achieve medical breakthroughs and directly impact clinical care by building new platform technologies. During the Obama Administration, the White House Office of Science and Technology Policy (OSTP) hosted a platform-technologies ideation contest. Yet although multiple NIH-funded researchers have won Nobel Prizes for platform technologies that fundamentally shifted the way scientists approach problem solving, not enough emphasis is placed on developing such technologies. Without investing deeply in platform technologies, our nation risks continuing its piecemeal approach to solving pressing challenges.

A Profile of Defense Science & Tech Spending

Annual spending on defense science and technology has “grown substantially” over the past four decades, from $2.3 billion in FY1978 to $13.4 billion in FY2018, or by nearly 90% in constant dollars, according to a new report from the Congressional Research Service.
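Since a jump from $2.3 billion to $13.4 billion sounds like far more than 90% growth, the short calculation below reconciles the two figures by backing out the price deflator they imply. The deflator is inferred from the numbers quoted above, not taken from the CRS report itself.

```python
# Reconcile nominal vs. constant-dollar growth in defense S&T spending,
# using only the figures quoted from the CRS report above.
nominal_1978 = 2.3    # $ billions, FY1978
nominal_2018 = 13.4   # $ billions, FY2018
real_growth = 0.90    # "nearly 90%" growth in constant dollars (approximate)

nominal_growth = nominal_2018 / nominal_1978 - 1                      # ~4.83, i.e. ~483%
implied_deflator = nominal_2018 / (nominal_1978 * (1 + real_growth))  # ~3.1

print(f"nominal growth: {nominal_growth:.0%}")
print(f"implied price-level ratio, FY2018/FY1978: {implied_deflator:.1f}x")
```

In other words, prices roughly tripled over the period, which is why a nearly six-fold nominal increase corresponds to only about 90% growth in real terms.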

Defense science and technology refers to the early stages of military research and development, including basic research (known by its budget code 6.1), applied research (6.2) and advanced technology development (6.3).

“While there is little direct opposition to Defense S&T spending in its own right,” the CRS report says, “there is intense competition for available dollars in the appropriations process,” such that sustained R&D spending is never guaranteed.

Still, “some have questioned the effectiveness of defense investments in R&D.”

CRS takes note of a 2012 article published by the Center for American Progress which argued that military spending was an inefficient way to spur innovation and that the growing sophistication of military technology was poorly suited to meet some low-tech threats such as improvised explosive devices (IEDs) in Iraq and Afghanistan (as discussed in an earlier article in the Bulletin of the Atomic Scientists).

The new CRS report presents an overview of the defense science and tech budget, its role in national defense, and questions about its proper size and proportion. See Defense Science and Technology Funding, February 21, 2018.

Other new and updated reports from the Congressional Research Service include the following.

Armed Conflict in Syria: Overview and U.S. Response, updated February 16, 2018

Jordan: Background and U.S. Relations, updated February 16, 2018

Bahrain: Reform, Security, and U.S. Policy, updated February 15, 2018

Potential Options for Electric Power Resiliency in the U.S. Virgin Islands, February 14, 2018

U.S. Manufacturing in International Perspective, updated February 21, 2018

Methane and Other Air Pollution Issues in Natural Gas Systems, updated February 15, 2018

Where Can Corporations Be Sued for Patent Infringement? Part I, CRS Legal Sidebar, February 20, 2018

How Broad A Shield? A Brief Overview of Section 230 of the Communications Decency Act, CRS Legal Sidebar, February 21, 2018

Russians Indicted for Online Election Trolling, CRS Legal Sidebar, February 21, 2018

Hunting and Fishing on Federal Lands and Waters: Overview and Issues for Congress, February 14, 2018