Working with academics: A primer for U.S. government agencies

Collaboration between federal agencies and academic researchers is an important tool for public policy. By facilitating the exchange of knowledge, ideas, and talent, these partnerships can help address pressing societal challenges. But because it is rarely in either party’s job description to conduct outreach and build relationships with the other, many important dynamics are often hidden from view. This primer provides an initial set of questions and topics for agencies to consider when exploring academic partnership.

Why should agencies consider working with academics?

What considerations may arise when working with academics?

Characteristics of discussed collaborative structures

| Structure | Primary need | Potential mechanisms | Structural complexity | Level of effort |
| --- | --- | --- | --- | --- |
| Informal advising | Knowledge >> Capacity | Ad-hoc engagement; formal consulting agreement | Low | Occasional work, over the short- to long-term |
| Study groups | Knowledge > Capacity | Informal working group; formal extramural award | Moderate | Occasional to part-time work, over the short- to medium-term |
| Collaborative research | Capacity ~= Knowledge | Informal research partnership, formal grant, or cooperative agreement / contract | Variable | Part-time work, over the medium- to long-term |
| Short-term placements | Capacity > Knowledge | IPA, OPM Schedule A(r), or expert contract; either ad-hoc or through a formal program | Moderate | Part- to full-time work, over a short- to medium-term |
| Long-term rotations | Capacity >> Knowledge | IPA, OPM Schedule A(r), or SGE designation; typically through a formal program | High | Full-time work, over a medium- to long-term |
Box 1. Key academic considerations
Academic career stages.

Academic faculty progress through different stages of professorship — typically assistant, associate, and full — that shape their research and teaching expectations and opportunities. Assistant professors are tenure-track faculty who must secure funding, publish papers, and meet the standards for tenure. Associate professors are typically (though not always) tenured, giving them job security and academic freedom but also more mentoring and leadership responsibilities. Full professors are senior faculty with strong reputations and recognition in their field, but also greater demands for service and supervision. The nature of agency-academic collaboration may depend on the seniority of the academic. For example, junior faculty may be more available to work with agencies, but primarily in contexts that will lead to traditional academic outputs, while senior faculty may be more selective, but their academic freedom allows for less formal and more impact-oriented work.

Soft vs. hard money positions.

Soft money positions are those that depend largely or entirely on external funding sources, typically research grants, to support the salary and expenses of the faculty. Hard money positions are those that are supported by the academic institution’s central funds, typically tied to more explicit (and more expansive) expectations for teaching and service than soft-money positions. Faculty in soft money positions may face more pressure to secure funding for research, while faculty in hard money positions may have more autonomy in their research agenda but more competing academic activities. Federal agencies should be aware of the funding situation of the academic faculty they collaborate with, as it may affect their incentives and expectations for agency engagement.

Sabbatical credits.

A sabbatical is a period of leave from regular academic duties, usually for one or two semesters, that allows faculty to pursue an intensive and unstructured scope of work — this can include research in their own field or others, as well as external engagements or tours of service with non-academic institutions. Faculty accrue sabbatical credits based on their length and type of service at the university, and may apply for a sabbatical once they have enough credits. The amount of salary received during a sabbatical depends on the number of credits and the duration of the leave. Federal agencies may benefit from collaborating with academic faculty who are on sabbatical, as they may have more time and interest to devote to impact-focused work.

Consulting/outside activity limits.

Consulting limits & outside activity limits are policies that regulate the amount of time that academic faculty can spend on professional activities outside their university employment. These policies are intended to prevent conflicts of commitment or interest that may interfere with the faculty’s primary obligations to the university, such as teaching, research, and service, and the specific limits vary by university. Federal agencies may need to consider these limits when engaging academic faculty in ongoing or high-commitment collaborations.

9 vs. 12 month salaries.

Some academic faculty are paid on a 9-month basis, meaning that they receive their annual salary over nine months and have the option to supplement their income with external funding or other activities during the summer months. Other faculty are paid on a 12-month basis, meaning that they receive their annual salary over twelve months and have less flexibility to pursue outside opportunities. Federal agencies may need to consider the salary structure of the academic faculty they work with, as it may affect their availability to engage on projects and the optimal timing with which they can do so.

Advisory relationships consist of an academic providing occasional or periodic guidance to a federal agency on a specific topic or issue, without being formally contracted or compensated. This type of collaboration can be useful for agencies that need access to cutting-edge expertise or perspectives, but do not have a formal deliverable in mind.

Academic considerations

Regulatory & structural considerations

Box 2. Key structural considerations
Regulatory guidance.

Federal agencies and academic institutions are subject to various laws and regulations that affect their research collaboration, and the ownership and use of the research outputs. Key legislation includes the Federal Advisory Committee Act (FACA), which governs advisory committees and ensures transparency and accountability; the Federal Acquisition Regulation (FAR), which controls the acquisition of supplies and services with appropriated funds; and the Federal Grant and Cooperative Agreement Act (FGCAA), which provides criteria for distinguishing between grants, cooperative agreements, and contracts. Agencies should ensure that collaborations are structured in accordance with these and other laws.

Contracting mechanisms.

Federal agencies may use various contracting mechanisms to engage researchers from non-federal entities in collaborative roles. These mechanisms include the IPA Mobility Program, which allows the temporary assignment of personnel between federal and non-federal organizations; the Experts & Consultants authority, which allows the appointment of qualified experts and consultants to positions that require only intermittent and/or temporary employment; and Cooperative Research and Development Agreements (CRADAs), which allow agencies to enter into collaborative agreements with non-federal partners to conduct research and development projects of mutual interest.

University Office of Sponsored Programs.

Offices of Sponsored Programs are units within universities that provide administrative support and oversight for externally funded research projects. OSPs are responsible for reviewing and approving proposals, negotiating and accepting awards, ensuring compliance with sponsor and university policies and regulations, and managing post-award activities such as reporting, invoicing, and auditing. Federal agencies typically interact with OSPs as the authorized representative of the university in matters related to sponsored research.

Non-disclosure agreements.

When engaging with academics, federal agencies may use NDAs to safeguard sensitive information. Agencies each have their own rules and procedures for using and enforcing NDAs involving their grantees and contractors. These rules and procedures vary, but generally require researchers to sign an NDA outlining rights and obligations relating to classified information, data, and research findings shared during collaborations.

A study group is a type of collaboration where an academic participates in a group of experts convened by a federal agency to conduct analysis or education on a specific topic or issue. The study group may produce a report or hold meetings to present their findings to the agency or other stakeholders. This type of collaboration can be useful for agencies that need to gather evidence or insights from multiple sources and disciplines with expertise relevant to their work.


Case study

In 2022, the National Science Foundation (NSF) awarded the National Bureau of Economic Research (NBER) a grant to create the EAGER: Place-Based Innovation Policy Study Group. This group, led by two economists with expertise in entrepreneurship, innovation, and regional development — Jorge Guzman from Columbia University and Scott Stern from MIT — aimed to provide “timely insight for the NSF Regional Innovation Engines program.” During Fall 2022, the group met regularly with NSF staff to i) provide an assessment of the “state of knowledge” of place-based innovation ecosystems, ii) identify the insights of this research to inform NSF staff on design of their policies, and iii) surface potential means by which to measure and evaluate place-based innovation ecosystems on a rigorous and ongoing basis. Several of the academic leads then completed a paper synthesizing the opportunities and design considerations of the regional innovation engine model, based on the collaborative exploration and insights developed throughout the year. In this case, the study group was structured as a grant, with funding provided to the organizing institution (NBER) for personnel and convening costs. Yet other approaches are possible; for example, NSF recently launched a broader study group with the Institute for Progress, which is structured as a no-cost Other Transaction Authority contract.

Active collaboration covers scenarios in which an academic engages in joint research with a federal agency, either as a co-investigator, a subrecipient, a contractor, or a consultant. This type of collaboration can be useful for agencies that need to leverage the expertise, facilities, data, or networks of academics to conduct research that advances their mission, goals, or priorities.


Case studies

External collaboration between academic researchers and government agencies has repeatedly proven fruitful for both parties. For example, in May 2020, the Rhode Island Department of Health partnered with researchers at Brown University’s Policy Lab to conduct a randomized controlled trial evaluating the effectiveness of different letter designs in encouraging COVID-19 testing. This study identified design principles that improved uptake of testing by 25–60% without increasing cost, and led to follow-on collaborations between the institutions. The North Carolina Office of Strategic Partnerships provides a prime example of how government agencies can take steps to facilitate these collaborations. The office recently launched the North Carolina Project Portal, which serves as a platform for the agency to share their research needs, and for external partners — including academics — to express interest in collaborating. Researchers are encouraged to contact the relevant project leads, who then assess interested parties on their expertise and capacity, extend an offer for a formal research partnership, and initiate the project.

Short-term placements allow for an academic researcher to work at a federal agency for a limited period of time (typically one year or less), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to fill temporary gaps in expertise, capacity, or leadership, or to foster cross-sector exchange and learning.


Case studies

Various programs exist throughout government to facilitate short-term rotations of outside experts into federal agencies and offices. One of the most well-known examples is the American Association for the Advancement of Science (AAAS) Science & Technology Policy Fellowship (STPF) program, which places scientists and engineers from various disciplines and career stages in federal agencies for one year to apply their scientific knowledge and skills to inform policy making and implementation. The Schedule A(r) hiring authority tends to be well-suited for these kinds of fellowships; it is used, for example, by the Bureau of Economic Analysis to bring on early career fellows through the American Economic Association’s Summer Economics Fellows Program. In some circumstances, outside experts are brought into government “on loan” from their home institution to do a tour of service in a federal office or agency; in these cases, the IPA program can be a useful mechanism. IPAs are used by the National Science Foundation (NSF) in its Rotator Program, which brings outside scientists into the agency to serve as temporary Program Directors and bring cutting-edge knowledge to the agency’s grantmaking and priority-setting. IPA is also used for more ad-hoc talent needs; for example, the Office of Evaluation Sciences (OES) at GSA often uses it to bring in fellows and academic affiliates.

Long-term rotations allow an academic to work at a federal agency for an extended period of time (more than one year), either as a fellow, a scholar, a detailee, or a special government employee. This type of collaboration can be useful for agencies that need to recruit and retain expertise, capacity, or leadership in areas that are critical to their mission, goals, or priorities.


Case study

One example of a long-term rotation that draws experts from academia into federal agency work is the Advanced Research Projects Agency (ARPA) Program Manager (PM) role. ARPA PMs — across DARPA, IARPA, ARPA-E, and now ARPA-H — are responsible for leading high-risk, high-reward research programs, and have considerable autonomy and authority in defining their research vision, selecting research performers, managing their research budget, and overseeing their research outcomes. PMs are typically recruited from academia, industry, or government for a term of three to five years, and are expected to return to their academic institutions or pursue other career opportunities after their term at the agency. PMs coming from academia or nonprofit organizations are often brought on through the IPA mobility program, and some entities also have unique term-limited hiring authorities for this purpose. PMs can also be hired as full government employees; this mechanism is primarily used for candidates coming from the private sector.

Expected Utility Forecasting for Science Funding

The typical science grantmaker seeks to maximize their (positive) impact with a limited amount of money. The decision-making process for how to allocate that funding requires them to consider the different dimensions of risk and uncertainty involved in science proposals, as described in foundational work by economists Chiara Franzoni and Paula Stephan. The Von Neumann-Morgenstern utility theorem implies that there exists for the grantmaker — or the peer reviewer(s) assessing proposals on their behalf — a utility function whose expected value they will seek to maximize. 

Common frameworks for evaluating proposals leave this utility function implicit, often evaluating aspects of risk, uncertainty, and potential value independently and qualitatively. Empirical work suggests that such an approach can introduce biases, resulting in funding decisions that deviate from grantmakers’ ultimate goals. An expected utility approach to reviewing science proposals aims to make that implicit decision-making process explicit, and thus reduce biases, by asking reviewers to directly predict the probability and value of different potential outcomes. Implementing this approach through forecasting brings the added benefits of providing (1) a resolution and scoring process that could help incentivize reviewers to make better, more accurate predictions over time and (2) empirical estimates of reviewers’ accuracy and their tendency to over- or underestimate the value and probability of success of proposals.

At the Federation of American Scientists, we are currently piloting this approach on a series of proposals in the life sciences that we have collected for Focused Research Organizations (FROs), a new type of non-profit research organization designed to tackle challenges that neither academia nor industry is incentivized to work on. The pilot study was developed in collaboration with Metaculus, a forecasting platform and aggregator, and is hosted on their website. In this paper, we provide the detailed methodology for the approach that we have developed, which builds upon Franzoni and Stephan’s work, so that interested grantmakers may adapt it for their own purposes. The motivation for developing this approach, and how we believe it may help address biases against risk in traditional peer review processes, is discussed in our article “Risk and Reward in Peer Review”.

Defining Outcomes

To illustrate how an expected utility forecasting approach could be applied to scientific proposal evaluation, let us first imagine a research project with multiple possible outcomes or milestones. In the most straightforward case, the outcomes that could arise are mutually exclusive (i.e., only a single one will be observed). Indexing each outcome with the letter 𝑖, we can define the expected value of each as the product of its value (or utility, 𝓊𝑖) and the probability of it occurring, 𝑃(𝑚𝑖). Because the outcomes in this example are mutually exclusive, the total expected utility (TEU) of the proposed project is the sum of the expected values of the outcomes:

𝑇𝐸𝑈 = 𝛴𝑖𝓊𝑖𝑃(𝑚𝑖).
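As an illustration, this calculation can be sketched in a few lines of Python; the utilities and probabilities here are hypothetical:

```python
# Total expected utility for mutually exclusive outcomes:
# TEU = sum_i u_i * P(m_i). All inputs are hypothetical.
def total_expected_utility(utilities, probabilities):
    # Probabilities of mutually exclusive outcomes can sum to at most 1.
    assert sum(probabilities) <= 1 + 1e-9
    return sum(u * p for u, p in zip(utilities, probabilities))

print(total_expected_utility([10, 4], [0.2, 0.5]))  # 10*0.2 + 4*0.5 = 4.0
```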

However, in most cases, it is easier and more accurate to define the range of outcomes of a research project as a set of primary and secondary outcomes or research milestones that are not mutually exclusive, and can instead occur in various combinations.

For instance, science proposals usually highlight the primary outcome(s) that they aim to achieve, but may also involve important secondary outcome(s) that can be achieved in addition to or instead of the primary goals. Secondary outcomes can be a research method, tool, or dataset produced for the purpose of achieving the primary outcome; a discovery made in the process of pursuing the primary outcome; or an outcome that researchers pivot to pursuing as they obtain new information from the research process. As such, primary and secondary outcomes are not necessarily mutually exclusive. In the simplest scenario with just two outcomes (either two primary or one primary and one secondary), the total expected utility becomes

𝑇𝐸𝑈 = 𝓊1𝑃(𝑚1⋂ not 𝑚2) + 𝓊2𝑃(𝑚2⋂ not 𝑚1) + (𝓊1 + 𝓊2)𝑃(𝑚1⋂𝑚2),

𝑇𝐸𝑈 = 𝓊1(𝑃(𝑚1) – 𝑃(𝑚1⋂𝑚2)) + 𝓊2(𝑃(𝑚2) – 𝑃(𝑚1⋂𝑚2)) + (𝓊1 + 𝓊2)𝑃(𝑚1⋂𝑚2),

𝑇𝐸𝑈 = 𝓊1𝑃(𝑚1) + 𝓊2𝑃(𝑚2).

As the number of outcomes increases, the number of joint probability terms in the expansion grows as well. If the outcomes are independent, however, each joint probability reduces to the product of the individual outcome probabilities. For example,

𝑃(𝑚1⋂𝑚2) = 𝑃(𝑚1) * 𝑃(𝑚2)
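A quick numerical check, with hypothetical utilities and probabilities, confirms that summing over the three disjoint cases matches the simplified expression:

```python
# Two non-exclusive outcomes, assumed independent; all numbers hypothetical.
u1, u2 = 8.0, 3.0
p1, p2 = 0.4, 0.6
p_both = p1 * p2  # independence: P(m1 and m2) = P(m1) * P(m2)

# Sum over the three disjoint cases (only m1, only m2, both)...
teu_cases = u1 * (p1 - p_both) + u2 * (p2 - p_both) + (u1 + u2) * p_both
# ...which reduces to the sum of each outcome's expected value.
teu_simple = u1 * p1 + u2 * p2
assert abs(teu_cases - teu_simple) < 1e-12
```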

On the other hand, milestones are typically designed to build upon one another, such that achieving later milestones necessitates the achievement of prior milestones. In these cases, the value of later milestones typically includes the value of prior milestones: for example, the value of demonstrating a complete pilot of a technology is inclusive of the value of demonstrating individual components of that technology. The total expected utility can thus be defined as the sum of the product of the marginal utility of each additional milestone and its probability of success:

𝑇𝐸𝑈 = 𝛴𝑖(𝓊𝑖 – 𝓊𝑖-1)𝑃(𝑚𝑖),
where 𝓊0 = 0.
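For sequential milestones, the same idea can be sketched as follows; the cumulative utilities and probabilities are hypothetical:

```python
# Sequential milestones: u_i is cumulative, so each milestone contributes
# its marginal utility (u_i - u_{i-1}) weighted by P(m_i).
def milestone_teu(cumulative_utilities, probabilities):
    teu, u_prev = 0.0, 0.0
    for u, p in zip(cumulative_utilities, probabilities):
        teu += (u - u_prev) * p
        u_prev = u
    return teu

# Milestone 2 subsumes milestone 1's value; note P(m2) <= P(m1).
print(milestone_teu([5.0, 12.0], [0.8, 0.3]))  # 5*0.8 + 7*0.3 = 6.1
```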

Depending on the science proposal, either of these approaches — or a combination — may make the most sense for determining the set of outcomes to evaluate.

In our FRO Forecasting pilot, we worked with proposal authors to define two outcomes for each of their proposals. Depending on what made the most sense for each proposal, the two outcomes reflected either relatively independent primary and secondary goals, or sequential milestone outcomes that directly built upon one another (though for simplicity, we called all of the outcomes milestones).

Defining Probability of Success

Once the set of potential outcomes has been defined, the next step is to determine the probability of success, between 0% and 100%, for each outcome if the proposal is funded. A prediction of 50% indicates maximal uncertainty about the outcome, whereas the closer the predicted probability is to 0% or 100%, the greater the certainty that the outcome will not occur or will occur, respectively.
Furthermore, Franzoni and Stephan decompose probability of success into two components: the probability that the outcome can actually occur in nature or reality and the probability that the proposed methodology will succeed in obtaining the outcome (conditional on it being possible in nature). The total probability is then the product of these two components:

𝑃(𝑚𝑖) = 𝑃nature(𝑚𝑖) * 𝑃proposal(𝑚𝑖)

Depending on the nature of the proposal (e.g., more technology-driven, or more theoretical/discovery driven), each component may be more or less relevant. For example, our forecasting pilot includes a proposal to perform knockout validation of renewable antibodies for 10,000 to 15,000 human proteins; for this project, 𝑃nature(𝑚𝑖) approaches 1 and 𝑃proposal(𝑚𝑖) drives the overall probability of success.

Defining Utility

Similarly, the value of an outcome can be separated into its impact on the scientific field and its impact on society at large. Scientific impact aims to capture the extent to which a project advances the frontiers of knowledge, enables new discoveries or innovations, or enhances scientific capabilities or methods. Social impact aims to capture the extent to which a project contributes to solving important societal problems, improving well-being, or advancing social goals. 

In both of these cases, determining the value of an outcome entails some subjective preferences, so there is no “correct” choice, at least mathematically speaking. However, proxy metrics may be helpful in considering impact. Though each is imperfect, one could consider citations of papers, patents on tools or methods, or users of method, tools, and datasets as proxies of scientific impact. For social impact, some proxy metrics that one might consider are the value of lives saved, the cost of illness prevented, the number of job-years of employment generated, economic output in terms of GDP, or the social return on investment.

The approach outlined by Franzoni and Stephan asks reviewers to assess scientific and social impact on a linear scale (0-100), after which the values can be averaged to determine the overall impact of an outcome. However, we believe that an exponential scale better captures the tendency in science for a small number of research projects to have an outsized impact and provides more room at the top end of the scale for reviewers to increase the rating of the proposals that they believe will have an exceptional impact.

[Figure: Exponential relationship between the impact score and actual impact; citation distribution of journal articles]

As such, for our FRO Forecasting pilot, we chose to use a framework in which a simple 1–10 score corresponds to real-world impact via a base 2 exponential scale. In this case, the overall impact score of an outcome can be calculated according to

𝓊𝑖 = log₂[2^(science impact of 𝑖) + 2^(social impact of 𝑖)] – 1.

For an exponential scale with a different base, one would substitute that base for two in the above equation. Depending on each funder’s specific understanding of impact and the type(s) of proposals they are evaluating, different relationships between scores and utility could be more appropriate.
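For concreteness, the score combination can be computed directly; the scores below are hypothetical:

```python
import math

# Combine science and social impact scores (1-10, base-2 exponential scale)
# into an overall utility score, per the formula above.
def overall_impact(science_score, social_score, base=2.0):
    return math.log(base**science_score + base**social_score, base) - 1

print(round(overall_impact(7, 7), 6))  # equal scores: overall equals the shared score, 7.0
print(round(overall_impact(9, 3), 3))  # dominated by the larger score, ~8.022
```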

In order to capture reviewers’ assessment of uncertainty in their evaluations, we asked them to provide median, 25th percentile, and 75th percentile predictions for impact instead of a single point estimate. High uncertainty is indicated by a wide interval between the 25th and 75th percentile predictions, while low uncertainty is indicated by a narrow interval.

Determining the “But For” Effect of Funding

The above approach aims to identify the highest-impact proposals. However, a grantmaker may not want to simply fund the highest-impact proposals; rather, they may be most interested in understanding where their funding would make the greatest difference, i.e., their “but for” effect. In this case, the grantmaker would want to fund proposals with the maximum difference between the total expected utility of the research proposal if they choose to fund it versus if they do not:

“But For” Impact = 𝑇𝐸𝑈(funding) – 𝑇𝐸𝑈(no funding).

For TEU(funding), the probability of the outcome occurring with this specific grantmaker’s funding using the proposed approach would still be defined as above

𝑃(𝑚𝑖 | funding) = 𝑃nature(𝑚𝑖) * 𝑃proposal(𝑚𝑖),

but for 𝑇𝐸𝑈(no funding), reviewers would need to consider the likelihood of the outcome being achieved through other means: other sources of funding, other researchers, other approaches, and so on. Here, the probability of success without this specific grantmaker’s funding could be described as

𝑃(𝑚𝑖 | no funding) = 𝑃nature(𝑚𝑖) * 𝑃other mechanism(𝑚𝑖).

In our FRO Forecasting pilot, we assumed that 𝑃other mechanism(𝑚𝑖) ≈ 0. The theory of change for FROs is that there exists a set of research problems at the boundary of scientific research and engineering that are not adequately supported by traditional research and development models and are unlikely to be pursued by academia or industry. Thus, in these cases it is plausible to assume that,

𝑃(𝑚𝑖 | no funding) ≈ 0
𝑇𝐸𝑈(no funding) ≈ 0
“But For” Impact ≈ 𝑇𝐸𝑈(funding).
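Putting the pieces together, the “but for” calculation for a single outcome might be sketched as follows; all component probabilities and the utility are hypothetical:

```python
# Hypothetical component estimates for one outcome m_i.
p_nature = 0.9     # P_nature: the outcome is possible in reality
p_proposal = 0.5   # P_proposal: the proposed approach succeeds, given possibility
p_other = 0.1      # P_other_mechanism: achieved via other funders/approaches
u = 10.0           # utility of the outcome

teu_funding = u * (p_nature * p_proposal)
teu_no_funding = u * (p_nature * p_other)
but_for_impact = teu_funding - teu_no_funding
print(but_for_impact)  # ~3.6 = 10 * 0.9 * (0.5 - 0.1)
```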

This assumption, while not generalizable to all contexts, can help reduce the number of questions that reviewers have to consider — a dynamic which we explore further in the next section.

Designing Forecasting Questions

Once one has determined the total expected utility equation(s) relevant for the proposal(s) that they are trying to evaluate, the parameters of the equation(s) must be translated into forecasting questions for reviewers to respond to. In general, for each outcome, reviewers will need to answer the following four questions:

  1. If this proposal is funded, what is the probability that this outcome will occur?
  2. If this proposal is not funded, what is the probability that this outcome will still occur? 
  3. What will be the scientific impact of this outcome occurring?
  4. What will be the social impact of this outcome occurring?

For the probability questions, one could alternatively ask reviewers about the different probability components (𝑃nature(𝑚𝑖), 𝑃proposal(𝑚𝑖), 𝑃other mechanism(𝑚𝑖), etc.), but in most cases it will be sufficient — and simpler for the reviewer — to focus on the top-level probabilities that feed into the TEU calculation.

In order for the above questions to tap into the benefits of the forecasting framework, they must be resolvable. Resolving the forecasting questions means that at a set time in the future, reviewers’ predictions will be compared to a ground truth based on the actual events that have occurred (i.e., was the outcome actually achieved and, if so, what was its actual impact?). Consequently, reviewers will need to be provided with the resolution date and the resolution criteria for their forecasts. 

Resolution of the probability-based questions hinges mostly on a careful and objective definition of the potential outcomes, and is otherwise straightforward — though note that because the funded and unfunded scenarios are mutually exclusive, only one of the two probability questions will actually be resolved. The optimal resolution of the scientific and social impact questions may depend on the context of the project and the chosen approach to defining utility. A widely applicable approach is to resolve the utility forecasts by having either program managers or subject matter experts evaluate the results of the completed project and score its impact at the resolution date.

For our pilot, we asked forecasting questions only about the probability of success given funding (question 1 above) and the scientific and social impact of each outcome (questions 3 and 4); since we assumed that the probability of success without funding was zero, we did not ask question 2. Because outcomes for the FRO proposals were designed to be either independent or sequential, we did not have to ask additional questions on the joint probability of multiple outcomes being achieved. We chose to resolve our impact questions with a post-project panel of subject matter experts.

Additional Considerations

In general, there is a tradeoff in implementing this approach between simplicity and thoroughness, efficiency and accuracy. Here are some additional considerations on that tradeoff for those looking to use this approach:

  1. The responsibility of determining the range of potential outcomes for a proposal could be assigned to three different parties: the proposal author, the proposal reviewers, or the program manager. First, grantmakers could ask proposal authors to comprehensively define within their proposal the potential primary and secondary outcomes and/or project milestones. Alternatively, reviewers could be allowed to individually — or collectively — determine what they see as the full range of potential outcomes. The third option would be for program managers to define the potential outcomes based on each proposal, with or without input from proposal authors. In our pilot, we chose to use the third approach with input from proposal authors, since it simplified the process for reviewers and allowed us to limit the number of outcomes under consideration to a manageable number.
  2. In many cases, a “failed” or null outcome may still provide meaningful value by informing other scientists that the research method doesn’t work or that the hypothesis is unlikely to be true. Considering the replication crises in multiple fields, this could be an important and unaddressed aspect of peer review. Grantmakers could choose to ask reviewers to consider the value of these null outcomes alongside other outcomes to obtain a more complete picture of the project’s utility. We chose not to address this consideration in our pilot for the sake of limiting the evaluation burden on reviewers.
  3. If grant recipients are permitted greater flexibility in their research agendas, this expected value approach could become more difficult to implement, since reviewers would have to consider a wider and more uncertain range of potential outcomes. This was not the case for our FRO Forecasting pilot, since FROs are designed to have specific and well-defined research goals.

Other Similar Efforts

Currently, forecasting is an approach rarely used in grantmaking. Open Philanthropy is the only grantmaking organization we know of that has publicized its use of internal forecasts about grant-related outcomes, though those forecasts do not directly influence funding decisions and are not framed specifically in terms of expected value. Franzoni and Stephan are also currently piloting their Subjective Expected Utility approach with Novo Nordisk.


Our goal in publishing this methodology is for interested grantmakers to freely adapt it to their own needs and iterate upon our approach. We hope that this paper will help start a conversation in the science research and funding communities that leads to further experimentation. A follow-up report sharing the results and lessons from the project will be published at the end of the FRO Forecasting pilot.


We’d like to thank Peter Mühlbacher, former research scientist at Metaculus, for his meticulous feedback as we developed this approach and for his guidance in designing resolvable forecasting questions. We’d also like to thank the rest of the Metaculus team for being open to our ideas and working with us on piloting this approach, the process of which has helped refine our ideas to their current state. Any mistakes here are of course our own.

ICSSI 2023: Hacking the Science of Science

What are the best approaches for structuring, funding, and conducting innovative scientific research? The importance of this question — long pondered by philosophers, historians, sociologists, and scientists themselves — is motivating the rapid growth of a new, interdisciplinary and empirically minded Science of Science that spans academia, industry, and government. At the 2nd annual International Conference on the Science of Science and Innovation, held June 26-29 at Northwestern University, experts from across this diverse community gathered to build new connections and showcase the cutting edge of the field.

At this year’s conference, the Federation of American Scientists aimed to further these goals by partnering with Northwestern’s Kellogg School of Management to host the first Metascience Hackathon. This event brought together participants from eight different countries — representing 20 universities, two federal agencies, and two non-profits — to stimulate cross-disciplinary collaboration and develop new approaches to impact. Diverging from the traditional hackathon model, we encouraged teams to advance the field along one of three distinct dimensions: Policy, Knowledge, and Tools.

Participants rose to the occasion, producing eight creative and impactful projects. In the Policy track, teams proposed transformative strategies to enhance scientific reproducibility, support immigrant STEM researchers, and foster impactful interdisciplinary research. The Knowledge track saw teams leveraging data and AI to explore the dynamics of peer review, scientific collaboration, and the growth of the science of science community. The Tools track introduced novel platforms for fostering global research collaboration and visualizing academic mobility.

These projects, developed in less than a day (!), underscore the potential of the science of science community to drive impactful change. They represent not only the innovative spirit of the participants but also the broader value of interdisciplinary collaboration in addressing complex challenges and shaping the future of science. 

We are excited to showcase the teams’ work below.


A Funders’ Tithe for Reproducibility Centers

Project Team: Shambhobi Battacharya (Northwestern University), Elena Chechik (European University at St. Petersburg), Alexander Furnas (Northwestern University), & Greg Quigg (University of Massachusetts Amherst)

Background: The responsibility for ensuring scientific reproducibility is primarily on individual researchers and academic institutions. However, reproducibility efforts are often inadequate due to limited resources, publication bias, time constraints, and lack of incentives.

Solution: We propose a policy whereby large science funding bodies earmark a certain percentage of their allocated grants towards establishing and maintaining reproducibility centers. These specialized entities would employ dedicated teams of independent scientists to reproduce or replicate high-impact, high-leverage, or suspicious research. The existence of dedicated reproducibility centers with independent scientists conducting post-hoc, self-directed reproducibility and replication studies will alter the incentives for researchers throughout the scientific community, strengthening the body of scientific knowledge and increasing public trust in scientific findings.

View the proposal

Immigrant STEM Training: Crossing the Valley of Death

Project Team: Sujata Emani (Personal Capacity), Takahiro Miura (University of Tokyo), Mengyi Sun (Northwestern University), & Alice Wu (Federation of American Scientists)

Background: Immigrants significantly contribute to the U.S. economy, particularly in STEM entrepreneurship and innovation. However, they often encounter legal, financial, and interpersonal barriers that lead to high rates of mental health disorders and attrition from scientific research.

Solution: To mitigate these challenges, we propose that science funding agencies expand eligibility for major federal science fellowships (e.g., the NSF GRFP and NIH NRSA) to include international students, providing them with more stable funding sources. We also propose a broader shift in federal research funding towards research fellowships, reducing hierarchical power structures and improving the training environment. Implementing these recommendations can empower all graduate students, foster greater scientific progress, and benefit the American economy.

View the proposal

Increasing Interdisciplinary Research through a More Balanced Research Funding and Evaluation Process

Project Team: Jonathan Coopersmith (Texas A&M University), Jari Kuusisto (University of Vaasa), Ye Sun (University College London), & Hongyu Zhou (University of Antwerp)

Background: Solving local, national, and global challenges will increasingly require interdisciplinary research that spans diverse perspectives and insights. Despite this need, interdisciplinary research has not reached its full potential due to persistent obstacles at many levels of knowledge creation, and this lack of support makes it difficult to develop and apply interdisciplinary approaches.

Solution: We propose that national funding agencies should launch a Balanced Research Funding and Evaluation Initiative to create and implement national standards for interdisciplinary research development, management, promotion, funding, and evidence-based evaluation. We outline the specific mechanisms such a program could use to unlock more impactful research on global challenges.

View the proposal


Identifying Reviewer Disciplines and their Impact on Peer Review

Project Team: Chenyue Jiao (University of Illinois, Urbana Champaign), Erzhuo Shao (Northwestern University), Louis Shekhtman (Northeastern University), & Satyaki Sikdar (Indiana University, Bloomington)

Background: Given the rise in interdisciplinary and multidisciplinary research, there is an increasing need to obtain the perspectives of multiple peer reviewers with unique expertise. In this project, we explore whether reviewers from particular disciplines tend to be more critical of papers applying a different disciplinary approach.

Solution: Using a dataset of open reviews from Nature Communications, we assign concepts to papers and reviews using the OpenAlex concept tagger, and analyze review sentiment using OpenAI’s ChatGPT API. Our results identify a network of review–paper concept pairs; several pairs match expectations, such as engineers’ negativity towards physicists’ work and economists’ criticisms of biology studies. Further study and additional datasets could improve the utility of these results.
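The aggregation step of this pipeline can be sketched as follows. The records and sentiment scores below are fabricated for illustration; the actual study derived concepts from the OpenAlex tagger and sentiment from ChatGPT-scored review text.

```python
# Toy aggregation of the concept-pair analysis described above:
# average review sentiment per (reviewer concept, paper concept) pair.
from collections import defaultdict

# Each record: (reviewer's concept, paper's concept, sentiment in [-1, 1]).
# These values are invented for illustration.
reviews = [
    ("engineering", "physics", -0.4),
    ("engineering", "physics", -0.2),
    ("economics",   "biology", -0.5),
    ("biology",     "biology",  0.3),
]

totals, counts = defaultdict(float), defaultdict(int)
for reviewer_concept, paper_concept, sentiment in reviews:
    totals[(reviewer_concept, paper_concept)] += sentiment
    counts[(reviewer_concept, paper_concept)] += 1

# Mean sentiment for each concept pair in the dataset.
mean_sentiment = {pair: totals[pair] / counts[pair] for pair in totals}
print(round(mean_sentiment[("engineering", "physics")], 2))  # -0.3 on this toy data
```

A systematically negative mean for a pair (as in the toy engineering-reviews-physics case here) is the kind of signal the project surfaces for further scrutiny.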

View the abstract

Team Formation: Expected or Unexpected

Project Team: Noly Higashide (University of Tokyo), Oh-Hyun Kwon (Pohang University of Science and Technology), Zeijan Lyu (University of Chicago), & Seokkyun Woo (Northwestern University)

Background: This year’s conference highlighted the importance of studying the interaction of scientists to better understand the scientific ecosystem. Here, we explore the dynamics of scientific collaboration and its influence on the success of resulting research.

Solution: Using SciSciNet data, we investigate how the likelihood of team formation affects the impact, disruption, and novelty of papers in the fields of biology, chemistry, psychology, and sociology. Our results suggest that the relationship between team structure and research impact varies across disciplines. Specifically, in chemistry and biology the relationship between proximity and citations has an inverted U-shape, such that papers with moderate proximity have the highest impact. These findings underline the need for further exploration of how collaboration patterns affect scientific discovery.

View the abstract

SciSciPeople: Identifying New Members of the Science of Science Community

Project Team: Sirag Erkol (Northwestern University), Yifan Qian (Northwestern University), & Henry Xu (Carnegie Mellon University)

Background: The growth and diversification of the science of science community is crucial for fostering innovation and broadening perspectives.

Solution: Our project introduces SciSciPeople, a new pipeline designed to identify potential new members for this community. Using data from the ICSSI website, SciSciNet, and Google Scholar, our pipeline identifies individuals who have shown interest in the science of science — either through citing well-known review papers, or noting the field as a research interest on Google Scholar — but are not yet part of the ICSSI community. Applying this pipeline successfully identified hundreds of relevant individuals. This tool not only enriches the science of science community but also has potential applications for various fields aiming to discover new individuals to expand their communities.
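The core candidate-identification logic reduces to set operations. A minimal sketch (names and data here are hypothetical; the real pipeline draws on the ICSSI website, SciSciNet, and Google Scholar):

```python
# Sketch of the SciSciPeople candidate logic with invented names.

# Researchers who cited well-known science-of-science review papers.
cited_reviews = {"alice", "bob", "carol"}
# Researchers listing science of science as a Google Scholar interest.
declared_interest = {"carol", "dan"}
# Researchers already part of the ICSSI community.
icssi_members = {"alice"}

# Candidates: anyone showing interest who is not yet a community member.
candidates = (cited_reviews | declared_interest) - icssi_members
print(sorted(candidates))  # ['bob', 'carol', 'dan']
```

Either interest signal suffices (hence the union), and existing members are filtered out (the set difference); the real pipeline applies the same logic at the scale of full bibliometric and Scholar-profile data.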

View the abstract


ScholarConnect: A Platform for Fostering Knowledge Exchange

Project Team: Sai Koneru (Pennsylvania State University), Xuelai Li (Imperial College London), Casey Meyer (OurResearch), Mark Tschopp (Army Research Laboratory)

Background: The rapid growth of the scientific community has made it hard to stay aware of researchers working on projects similar to your own. As a result, new ways are needed to identify researchers doing relevant work in other institutions or fields.

Solution: We created “ScholarConnect”, an open-source tool designed to foster global collaboration among researchers. ScholarConnect recommends potential collaborators based on similarities in research expertise, determined by factors like publication records, concepts, institutions, and countries. The tool offers personalized recommendations and an interactive user interface, allowing users to explore and connect with like-minded researchers from diverse backgrounds and disciplines. We’ve ensured privacy and security by not storing user-entered information and basing recommendations on anonymized, aggregated profiles, and we invite contributions from the wider research community to improve ScholarConnect.
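One common way to implement this kind of recommendation is cosine similarity over sparse concept profiles. The sketch below illustrates that general technique; the profiles and weights are invented, and ScholarConnect's actual scoring may differ.

```python
# Hedged sketch of concept-overlap recommendation in the spirit of
# ScholarConnect; data here is fabricated for illustration.
from math import sqrt

# Each researcher's profile: concept -> weight (e.g., publication counts).
profiles = {
    "r1": {"networks": 3, "peer review": 1},
    "r2": {"networks": 2, "mobility": 2},
    "r3": {"mobility": 4},
}

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[c] * b[c] for c in a if c in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user, profiles, k=2):
    """Top-k researchers most similar to `user`, excluding the user."""
    scores = {r: cosine(profiles[user], p) for r, p in profiles.items() if r != user}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("r1", profiles))  # r2 ranks first: it shares the "networks" concept
```

Because similarity is computed only from concept overlap, the approach works on anonymized, aggregated profiles, consistent with the privacy design described above.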

View the project

Scientist Map: A Tool for Visualizing Academic Mobility across Institutions

Project Team: Tianji Jiang (University of California Los Angeles), Jesse Tabak (Northwestern University), & Shibo Zhou (Georgia Institute of Technology)

Background: The migration of academic researchers provides a unique window to observe the mobility of knowledge and innovations today, and has been a valuable area of investigation for scholars across various disciplines.

Solution: To study the migration of academic individuals, we introduce a tool designed to allow users to search an academic’s history of affiliation and visualize their historical path on a map. This tool aims to help scientific producers and consumers better understand the migration of experts across institutions, and to support relevant science of science research by providing easy access to researchers’ migration histories.

View the project