Leveraging Machine Learning To Reduce Cost & Burden of Reviewing Research Proposals at S&T Agencies
With about $130 billion in spending, the United States leads the world in federal research and development (R&D). Most of this spending is distributed by science and technology agencies that use internal reviews to identify the best proposals submitted in response to competitive funding opportunities. As stewards of quality scientific research, each funding agency's mission includes ensuring fairness, transparency, and integrity in the proposal-review process. A rigorous selection process is therefore crucial to ensuring that federal dollars are invested in quality research.
Manual proposal review is time-consuming and expensive, costing an estimated $2,000–$10,000 per proposal. This equates to an estimated $300 million spent annually on proposal review at the National Science Foundation alone. Yet at current proposal-success rates (between 5% and 20% for most funding opportunities), a substantial fraction of proposals reviewed are simply not competitive. We propose leveraging machine learning to accelerate the agency-review process without a loss in the quality of proposals selected and funded. By helping filter out noncompetitive proposals early in the review process, machine learning could allow substantial financial and personnel resources to be repurposed for more valuable applications. Importantly, machine learning would not be used to evaluate scientific merit—it would only eliminate the poor or incomplete proposals that are immediately and unanimously rejected by manual reviewers.
The next administration should initiate and execute a pilot program that uses machine learning to triage scientific proposals. To demonstrate the reliability of a machine-learning-based approach, the pilot should be carried out in parallel with (and compared to) the traditional method of proposal selection. Following successful pilot implementation, the next administration should convene experts in machine learning and proposal review from funding agencies, universities, foundations, and grant offices for a day-long workshop to discuss how to scale the pilot across agencies. Our vision is that machine learning will ultimately become a standard component of proposal review across science and technology agencies, improving the efficiency of the funding process without compromising the quality of funded research.
Challenge and Opportunity
Allocating research funding is expensive, time-consuming, and inefficient for all stakeholders (funding agencies, proposers, reviewers, and universities). The actual cost of reviewing proposals (including employee salaries and administrative expenses) has never been published by any federal funding agency. Based on our experience with the process, we estimate the cost to be between $2,000 and $10,000 per proposal, with the variation reflecting the wide range of proposals across programs and agencies. For the National Science Foundation (NSF), which reviews around 50,000 research proposals each year, this equates to roughly $100–$500 million, or an average of about $300 million, spent annually on proposal review.
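As a rough check on that figure, the arithmetic is straightforward. The sketch below uses our own estimates (per-proposal cost range and approximate proposal volume), not published agency data:

```python
# Back-of-the-envelope estimate of NSF's annual proposal-review cost.
# Inputs are the authors' estimates, not published agency figures.
proposals_per_year = 50_000          # approximate NSF proposal volume
cost_low, cost_high = 2_000, 10_000  # estimated review cost per proposal (USD)

annual_low = proposals_per_year * cost_low                     # $100 million
annual_high = proposals_per_year * cost_high                   # $500 million
annual_mid = proposals_per_year * (cost_low + cost_high) / 2   # ~$300 million

print(f"Estimated annual review cost: ${annual_low / 1e6:.0f}M–${annual_high / 1e6:.0f}M "
      f"(midpoint ~${annual_mid / 1e6:.0f}M)")
```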
Multiple issues beyond cost plague the proposal-review process. These include the following:
- Decreasing proposal-success rates. This decline is attributable to a combination of an increase in the number of science, technology, engineering, and math (STEM) graduates in the United States and an increase in the average size of federal STEM funding awards (from $110,000 to about $130,000 in less than 10 years). Current success rates are low enough that the costs of applying for federal funding opportunities (i.e., time spent on unsuccessful proposals) may outweigh the benefits (i.e., funding received for successful proposals).
- Difficulties recruiting qualified reviewers.
- Delayed decisions. For example, NSF takes more than six months to reach a funding decision for about 30% of proposals reviewed.
- Increasing numbers of identical re-submissions. With proposal-success rates as low as 5%, selection outcomes are often seen as reflecting “the luck of the draw” rather than fundamental proposal merit. Hence there is a growing tendency for principal investigators (PIs) to simply re-submit the same proposal year after year rather than invest the time to prepare new or updated proposals.
There is a consensus that the current state of proposal review is unsustainable. Most proposed solutions to the problems summarized above are “outside” solutions that involve either expanding available research funding or restricting the number of proposals that may be submitted (by a PI or an institution). Neither option is attractive. Partisanship combined with the financial implications of COVID-19 renders the possibility of an increased budget for S&T funding agencies vanishingly small, and restrictions on submissions are generally resented by scientists as a “penalty on excellence.” Incorporating machine learning could improve the efficiency and effectiveness of proposal review at little cost and without limiting submissions.
Incorporating machine learning would also align with multiple federal and agency objectives. On January 4, 2011, President Obama signed the GPRA Modernization Act of 2010, one purpose of which was to “lead to more effective management of government agencies at a reduced cost”. One of NSF’s Evaluation and Assessment Capability (EAC) goals established in response to that directive is to “create innovative approaches to assessing and improving program investment performance”. Indeed, two of the four key areas identified in NSF’s most recent Strategic Plan (2018) are to “make information technology work for us” and to pursue “streamlining, standardizing and simplifying programs and processes.” In addition, NSF recognized the importance of reviewing its processes for efficiency and effectiveness in light of OMB memo M-17-26. NSF’s Strategic Plan includes a strong commitment to “work internally and with the Office of Management and Budget and other science agencies to find opportunities to reduce administrative burden.” These principles are also reflected in NSF’s 2021 budget request to Congress as part of its Strategic Goals (e.g., “Enhance NSF’s performance of its mission”) and Strategic Objectives (e.g., “Continually improve agency operations”). Finally, a long-term goal outlined in the Strategic Plan is reducing the so-called “dwell time” for research proposals, i.e., the time between when a proposal is submitted and when a funding decision is issued.
Incorporating machine learning into proposal review would facilitate progress towards each of these goals. Using machine learning to limit the number of proposals subjected to manual review is a prime example of “making information technology work for us” and would certainly help streamline, standardize, and simplify proposal review. Limiting the number of proposals subjected to manual review would also reduce administrative burden and dwell time. In addition, money saved by using machine learning to weed out non-competitive proposals could be used to fund additional competitive proposals, thereby increasing return on investment (ROI) in research-funding programs. Additional benefits include a more manageable workload for expert reviewers, who could focus on assessing the scientific merit of competitive proposals instead of spending time on non-competitive ones, as well as a strong disincentive for PIs to resubmit identical proposals year after year. The latter outcome in particular is expected to improve proposal quality in the long run.
Plan of Action
We propose the following steps to implement and test a machine-learning approach to proposal review:
- Initiate and execute a pilot program that uses machine learning to triage scientific proposals. To demonstrate the reliability of a machine-learning-based approach, the pilot should be carried out in parallel with (and compared to) the traditional method of proposal selection. The pilot would be deemed successful if the machine-learning algorithm was able to reliably identify proposals ranked poorly by human reviewers, and/or proposals rejected unanimously by review panels. NSF—particularly the agency’s Science of Science and Innovation Policy (SciSIP) Program—would be a natural home for such a pilot.
- Showcase pilot results. Following a successful pilot, the next administration should convene experts in machine learning and proposal review from funding agencies, universities, foundations, and grant offices for a day-long workshop. The workshop would showcase pilot results and provide an opportunity for attendees to discuss how to scale the pilot across agencies.
- Scale pilot across federal government. We envision machine learning ultimately becoming a standard component of proposal review across science and technology agencies, improving the efficiency of the funding process without compromising the quality of funded research.
Reducing the number of scientific proposals handled by experts without jeopardizing the quality of the science funded benefits everyone: high-quality proposals receive support, expert reviewers don’t waste time on non-competitive proposals, and the money saved on manual proposal review can be reallocated to fund additional proposals. Using machine learning to “triage” large submission pools is a promising strategy for achieving these objectives. Preliminary compliance checks are already almost fully automated; machine learning would simply extend that automation one step further. We expect that the initial costs of developing and piloting appropriate machine-learning algorithms would ultimately be justified by greater long-run ROI in research-funding programs. We envision a pilot that could benefit not only the government but also foundations, which are increasingly shouldering research funding. Ideally, the pilot would be tested in two different settings: a government funding agency and a private foundation.
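To make the triage concept concrete, the sketch below illustrates one way such a filter could be prototyped. It is a minimal illustration only, assuming an agency has historical proposal texts paired with panel outcomes; the file name, column names, model choice (TF-IDF plus logistic regression), and confidence threshold are placeholders chosen for simplicity, not a recommendation of a specific method.

```python
# Minimal sketch of a proposal-triage classifier (illustrative only).
# Assumes a hypothetical CSV of historical proposals with a text column and a
# label indicating whether the review panel unanimously rejected the proposal.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

data = pd.read_csv("historical_proposals.csv")  # hypothetical file
X_train, X_test, y_train, y_test = train_test_split(
    data["proposal_text"], data["unanimously_rejected"],
    test_size=0.2, random_state=0, stratify=data["unanimously_rejected"])

# Simple text features plus a linear classifier; any comparable model would do.
model = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000, class_weight="balanced"))
model.fit(X_train, y_train)

# In a triage setting, only proposals flagged with very high confidence would
# be screened out; everything else would still go to manual review.
probs = model.predict_proba(X_test)[:, 1]          # probability of rejection
flagged = (probs > 0.95).astype(int)               # conservative threshold
print(classification_report(y_test, flagged))
```

In a pilot, predictions like these would be compared head-to-head with panel decisions before any proposal was actually screened out, as described in the Plan of Action above.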