Kickstarting Collaborative, AI-Ready Datasets in the Life Sciences with Government-funded Projects

In the age of Artificial Intelligence (AI), large, high-quality datasets are needed to move the field of life science forward. However, the research community lacks strategies to incentivize collaboration on high-quality data acquisition and sharing. The government should fund collaborative roadmapping, certification, collection, and sharing of large, high-quality datasets in life science. In such a system, nonprofit research organizations engage scientific communities to identify key types of data that would be valuable for building predictive models, and define quality control (QC) and open science standards for collection of that data. Projects are designed to develop automated methods for data collection, certify data providers, and facilitate data collection in consultation with researchers throughout various scientific communities. Hosting of the resulting open data is both subsidized and protected by security measures. This system would provide crucial incentives for the life science community to identify and amass large, high-quality open datasets that will immensely benefit researchers.

Challenge and Opportunity 

Life science has left the era of “one scientist, one problem.” It is becoming a field in which collaboration on large-scale research initiatives is required to make meaningful scientific progress. A salient example is AlphaFold2, a machine learning (ML) model that was the first to predict how a protein will fold with an accuracy meeting or exceeding experimental methods. AlphaFold2 was trained on the Protein Data Bank (PDB), a public data repository containing standardized and highly curated results of more than 200,000 experiments collected over 50 years by thousands of researchers.

Though such a sustained effort is laudable, science need not wait another 50 years for the ‘next PDB’. If approached strategically and collaboratively, the data necessary to train ML models can be acquired more quickly, cheaply, and reproducibly than efforts like the PDB through careful problem specification and deliberate management. First, by leveraging organizations that are deeply connected with relevant experts, unified projects taking this approach can account for the needs of both the people producing the data and those consuming it. Second, by centralizing plans and accountability for data and metadata standards, these projects can enable rigorous and scalable multi-site data collection. Finally, by securely hosting the resulting open data, the projects can evaluate biosecurity risk and provide protected access to key scientific data and resources that might otherwise be siloed in industry. This approach is complementary to efforts that collate existing data, such as the Human Cell Atlas and the UCSC Genome Browser, and satisfies the need for new data collection that adheres to QC and metadata standards.

In the past, mid-sized grants have allowed multi-investigator scientific centers like the recently funded Science and Technology Center for Quantitative Cell Biology (QCB; $30M awarded in 2023) to explore many areas in a given field. Here, we outline how the government can expand upon such schemes to catalyze the creation of impactful open life science data. In the proposed system, supported projects would allow well-positioned nonprofit organizations to facilitate the distributed, multidisciplinary collaborations that are necessary for assembling large, AI-ready datasets. This model would align research incentives and enable life science to create the ‘next PDBs’ faster and more cheaply than before.

Plan of Action 

Existing initiatives have developed processes for creating open science data and successfully engaged the scientific community to identify targets for the ‘next PDB’ (e.g., Chan Zuckerberg Initiative’s Open Science program, Align’s Open Datasets Initiative). The process generally occurs in five steps:

  1. A multidisciplinary set of scientific leaders identifies target datasets, assessing the scale of data required and the potential for standardization, and defining standards for data collection methods and corresponding QC metrics.
  2. Methods for data acquisition are collaboratively developed and certified to de-risk the cost-per-datapoint and the utility of the data.
  3. Data collection methods are onboarded at automation partner organizations, such as NSF BioFoundries and existing National Labs, and these automation partners are certified to meet the defined data collection standards and QC metrics.
  4. Scientists throughout the community, including those at universities and for-profit companies, can request data acquisition, which is coordinated, subsidized, and analyzed for quality.
  5. Data becomes publicly available and is hosted in perpetuity in a stable, robustly maintained database with biosecurity, cybersecurity, and privacy measures for researchers to access.

The U.S. Government should adapt this process for collaborative, AI-ready data collection in the life sciences by implementing the following recommendations:  

Recommendation 1. An ARPA-like agency — or agency division — should launch a Collaborative, AI-Ready Datasets program to fund large-scale dataset identification and collection.

This program should be designed to award two types of grants:

  1. A medium-sized “phase 1” award of $1M-$5M to fund new dataset identification and certification. To date, roadmapping dataset concepts (Steps 1-2 above) has been accomplished by small-scale, community-driven projects at this funding level. Though selectively successful, these projects have not been as comprehensive or inclusive as they could otherwise be. Government funding could more sustainably and systematically support iterative roadmapping and certification in areas of strategic importance.
  2. A large “phase 2” award of $10M-$50M to fund the collection of previously identified datasets. Currently, there are no funding mechanisms designed to scale up acquisition (Steps 3-4 above) for dataset concepts that have been deemed valuable and de-risked. To fill this gap, the government should leverage existing expertise and collaboration across the nonprofit research ecosystem by awarding grants of $10M-$50M for the coordination, acquisition, and release of mature dataset concepts. The Human Genome Project is a good analogy: a dataset concept was identified and collection was distributed among several facilities.

Recommendation 2. The Office of Management and Budget should direct the NSF and NIH to develop plans for funding academics and for-profit entities through grants tranched on data deposition.

Once an open dataset is established, the government can advance the use and further development of that dataset by providing academics with grants that are tranched on data deposition. This approach would be in direct alignment with the government’s goals for supporting open, shared resources for AI innovation as laid out in section 5.2 of the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.

Agencies’ approaches to meeting this priority could vary. In one scenario, a policy or program could be established in which grantees would use a portion of the funds disbursed to them to pay for open data acquisition at a certified data provider. Analogous structures have enabled scientists to access other types of shared scientific infrastructure, such as the NSF’s ACCESS program. In the same way that ACCESS offers academics access to compute resources, it could be expanded to offer academics access to data acquisition resources at verified facilities. Offering grants in this way would incentivize the scientific community to interact with and expand upon open datasets, as well as encourage compliance through tranching.

Efforts to support use and development of open, certified datasets could also be incorporated into existing programs, including the National AI Research Resource, for which complementary programs could be developed to provide funding for standardized data acquisition and deposition. Similar ideas could also be incorporated into core programs within NSF and NIH, which already disburse funds after completion of annual progress reports. Such programs could mandate checks for data deposition in these reports.

Conclusion 

Collaborative, AI-ready datasets would catalyze progress in many areas of life science, but realizing them requires innovative government funding. By supporting coordinated projects that span dataset roadmapping, methods and standards development, partner certification, distributed collection, and secure release at scale, the government can coalesce stakeholders and enable the next generation of powerful predictive models. To do so, it should combine medium-sized, large, and tranched grants in unified initiatives orchestrated by nonprofit research organizations, which are uniquely positioned to execute these initiatives end-to-end. These initiatives should balance intellectual property protection with data availability, and thereby help deliver key datasets upon which new scientific insights depend.

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable, and safe future that we all hope for, whoever takes office in 2025 and beyond.

Frequently Asked Questions
What is involved in roadmapping dataset opportunities?

Roadmapping dataset opportunities, which can take up to a year, requires convening experts across multiple disciplines, including experimental biology, automation, machine learning, and others. In collaboration, these experts assess both the feasibility and impact of opportunities, as well as necessary QC standards. Roadmapping culminates in determination of dataset value — whether it can be used to train meaningful new machine learning models.

Why should data collection be centralized but redundant?

To mitigate single-facility risk and promote site-to-site interoperability, data should be collected across multiple sites. To ensure that standards and organization hold across sites, planning and documentation should be centralized.

How should automation partners be certified?

Automation partners will be evaluated according to the following criteria:



  • Commitment to open science

  • Rigor and consistency in methods and QC procedures

  • Standardization of data and metadata ontologies


More specifically, certification will depend upon the abilities of partners to accommodate standardized ontologies, capture sufficient metadata, and reliably pass data QC checks. It will also require that partners have demonstrated a commitment to data reusability and replicability and are willing to share methods and data in the open science ecosystem.

Should there be an embargo before data is made public?

Today, scientists have no obligation to publish every piece of data they collect. In an Open Data paradigm, all data must eventually be shared. For some types of data, a short, optional embargo period would enable scientists to participate in open data efforts without compromising their ability to file patents or publish papers. For example, in protein engineering, the patentable product is the sequence of a designed protein, making immediate release of data untenable. An embargo period of one to two years is sufficient to alleviate this concern and may even hasten data sharing by linking it to a fixed length of time after collection rather than to publication. Whether an embargo should be implemented, and for how long, should be determined for each data type and designed to encourage researchers to participate in the acquisition of open data.

How do we ensure biosecurity of the data?

Biological data is a strategic resource and requires stewardship and curation to ensure it has maximum impact. Thus, data generated through the proposed system should be hosted by high-quality providers that adhere to biosecurity standards and enforce embargo periods. Appropriate biosecurity standards will be specific to different types of data and should be formulated and periodically reevaluated by a multidisciplinary group of stakeholders. When access to certified, post-embargo data is requested, the same standards will apply, as will export controls. In some instances, for some users, restricting access may be reasonable. In exchange for offering this suite of valuable services, hosting providers should be subsidized through reimbursements.

How to Prompt New Cross-Agency and Cross-Sector Collaboration to Advance Learning Agendas

The 2018 Foundations for Evidence-Based Policymaking Act (Evidence Act) promotes a culture of evidence within federal agencies. A central part of that culture entails new collaboration between decision-makers and those with diverse forms of expertise inside and outside of the federal government. The challenge, however, is that new cross-agency and cross-sector collaborative relationships don’t always arise on their own. To overcome this challenge, federal evaluation staff can use “unmet desire surveys,” an outreach tool that prompts agency staff to reflect on how the success of their programs relates to what is happening in other agencies and outside government, and on how engaging with these other programs and organizations would make their work more effective. It also prompts them to consider the situation from the perspective of potential collaborators—why should they want to engage?

The unmet desire survey is an important data-gathering mechanism that provides actionable information to create new connections between agency staff and people—such as those in other federal agencies, along with researchers, community stakeholders, and others outside the federal government—who have the information they desire. Then, armed with that information, evaluation staff can use the new Evidence Project Portal on Evaluation.gov (to connect with outside researchers) and/or other mechanisms (to connect with other potential collaborators) to conduct matchmaking that will foster new collaborative relationships. Using existing authorities and resources, agencies can pilot unmet desire surveys as a concrete mechanism for advancing federal learning agendas in a way that builds buy-in by directly meeting the needs of agency staff.

Challenge and Opportunity

A core mission of the Evidence Act is to foster a culture of evidence-based decision-making within federal agencies. Since the problems agencies tackle are often complex and multidimensional, new collaborative relationships between decision-makers in the federal government and those in other agencies and in organizations outside the federal government are essential to realizing the Evidence Act’s vision. Along these lines, Office of Management and Budget (OMB) implementation guidance stresses that learning agendas are “an opportunity to align efforts and promote interagency collaboration in areas of joint focus or shared populations or goals” (OMB M-19-23), and that more generally a culture of evidence “cannot happen solely at the top or in isolated analytical offices, but rather must be embedded throughout each agency…and adopted by the hardworking civil servants who serve on behalf of the American people” (OMB M-21-27).

New cross-agency and cross-sector collaborative relationships rarely arise on their own. They are voluntary, and between people who often start off as strangers to one another. Limited resources, lack of explicit permission, poor prior experiences, differing incentives, and stereotypes are all challenges to persuading strangers to engage with each other. In addition, agency staff may not previously have spent much time thinking about how new collaborative relationships could help answer questions posed by their learning agenda, or even be aware that accessible mechanisms exist for forming new relationships. This presents an opportunity for new outreach by evaluation staff to expand a sense of what kinds of collaborative relationships would be both valuable and possible.

For instance, the Department of the Interior (DOI)’s 2024 Learning Agenda asks: What are the primary challenges to training a diverse, highly skilled workforce capable of delivering the department’s mission? The DOI itself has vital historical and other contextual information for answering this question. Yet officials from other departments likely have faced (or currently face) a similar challenge, and are in a position to share what they’ve tried so far, what has worked well, and what has fallen short. In addition, researchers who study human resource development could share insights from literature, as well as possibly partner on a new study to help answer this question in the DOI context. 

Each department and agency is different, with its own learning agenda, decision-making processes, capacity constraints, and personnel needs. And so what is needed are new forms of informal collaboration (knowledge exchange) and/or formal collaboration (projects with shared ownership, decision-making authority, and accountability) that foster back-and-forth interaction. The challenge, however, is that agency staff may not consider such possibilities without being prompted to do so or may be uncertain how to communicate the opportunity to potential collaborators in a way that resonates with their goals.

This memo proposes a flexible tool that evaluation staff (e.g., evaluation officers at federal agencies) can use to generate buy-in among agency staff and leadership while also promoting collaboration as emphasized in OMB guidance and in the Evidence Act. The tool, which has already proven valuable in the federal government (see FAQs), in local government, and in the nonprofit sector, is called an “unmet desire survey.” The survey measures unmet desires for collaboration by prompting staff to consider questions about the connections they want, the types of collaboration they seek, and the hesitations they hold (sample questions appear in the FAQs).

These questions elicit critical insights about why agency staff value new connections, and they are highly flexible. For instance, in the first sample question, evaluation staff can choose to ask about new information that would be helpful for any program, or only about information relevant to programs that are top priorities for their agency. In other words, unmet desire surveys need not add one more thing to staff's plates; rather, they can be used to accelerate collaboration directly tied to current learning priorities.

Unmet desire surveys also legitimize informal collaborative relationships. Too often, calls for new collaboration in the policy sphere immediately segue into overly structured meetings that fail to uncover promising areas for joint learning and problem-solving. Meetings across government agencies are often scripted presentations about each organization’s activities, providing little insight on ways they could collaborate to achieve better results. Policy discussions with outside research experts tend to focus on formal evaluations and long-term research projects that don’t surface opportunities to accelerate learning in the near term. In contrast, unmet desire surveys explicitly legitimize the idea that diverse thinkers may want to connect only for informal knowledge exchange rather than formal events or partnerships. Indeed, even single conversations can greatly impact decision-makers, and, of course, so can more intensive relationships.

Whether the goal is informal or formal collaboration, the problem that needs to be solved is both factual and relational. In other words, the issue isn’t simply that strangers do not know each other—it’s also that strangers do not always know how to talk to one another. People care about how others relate to them and whether they can successfully relate to others. Uncertainty about relationality prevents people from interacting with others they do not know. This is why unmet desire surveys also include questions that directly measure hesitations about interacting with people from other agencies and organizations, and encourage agency staff to think about interactions from others’ perspectives.

The fact that the barriers to new collaborative relationships are both factual as well as relational underscores why people may not initiate them on their own. That’s why measuring unmet desire is only half the battle—it’s also important to ensure that evaluation staff have a plan in place to conduct matchmaking using the data gathered from the survey. One way is to create a new posting on the Evidence Project Portal (especially if the goal is to engage with outside researchers). A second way is to field the survey as part of a convening, which already has as one of its goals the development of new collaborative relationships. A third option is to directly broker connections. Regardless of which option is pursued, note that large amounts of extra capacity are likely unnecessary, at least at first. The key point is simply to ensure that matchmaking is a valued part of the process.

In sum, by deliberately inquiring about connections with others who have diverse forms of relevant expertise—and then making those connections anew—evaluation staff can generate greater enthusiasm and ownership among people who may not consider evaluation and evidence-building as part of their core responsibilities.

Plan of Action

Using existing authorities and resources, evaluation staff (such as evaluation officers at federal agencies) can take three steps to position unmet desire surveys as a standard component of the government’s evidence toolbox. 

Step 1. Design and implement pilot unmet desire surveys. 

Evaluation staff are well positioned to conduct outreach to assess unmet desire for new collaborative relationships within their agencies. While individual staff can work independently to design unmet desire surveys, it may be more fruitful to work together, via the Evaluation Officer Council, to design a baseline survey template. Individuals could then work with their teams to adapt the baseline template as needed for each agency, including identifying which agency staff to prioritize as well as the best way to phrase particular questions (e.g., regarding the types of connections that employees want in order to improve the effectiveness of their work or the types of hesitancies to ask about). Given that the question content is highly flexible, unmet desire surveys can directly accelerate learning agendas and build buy-in at the same time. Thus, they can yield tangible, concrete benefits with very little upfront cost.

Step 2. Meet unmet desires by matchmaking. 

After the pilot surveys are administered, evaluation staff should act on their results. There are several ways to do this without new appropriations. One way is to create a posting for the Evidence Project Portal, which is explicitly designed to advertise opportunities for new collaborative relationships, especially with researchers outside the federal government. Another way is to field unmet desire surveys in advance of already-planned convenings, which themselves are natural places for matchmaking (e.g., the Agency for Healthcare Research and Quality experience described in the FAQs). Lastly, for new cross-agency collaborative relationships and other situations, evaluation staff may wish to engage in other low-lift matchmaking on their own. Depending upon the number of people they choose to survey, and the prevalence of unmet desire they uncover, they may also wish to bring on short-term matchmakers through flexible hiring mechanisms (e.g., through the Intergovernmental Personnel Act). Regardless of which option is pursued, the key point is that matchmaking itself must be a valued part of this process. Documenting successes and lessons learned then sets the stage for using agency-specific discretionary funds to hire one or more in-house matchmakers in longer-term or permanent appointments.

Step 3. Collect information on successes and lessons learned from the pilot.

Unmet desire surveys can be tricky to field because they entail asking employees about topics they may not be used to thinking about. It often takes some trial and error to figure out the best ways to ask about employees’ substantive goals and their hesitations about interacting with people they do not know. Piloting unmet desire surveys and follow-on matchmaking can not only demonstrate value (e.g., the impact of new collaborative relationships fostered through these combined efforts) to justify further investment but also suggest how evaluation leads might best structure future unmet desire surveys and subsequent matchmaking.

Conclusion

An unmet desire survey is an adaptable tool that can reveal fruitful pathways for connection and collaboration. Indeed, unmet desire surveys leverage the science of collaboration by ensuring that efforts to broker connections among strangers consider both substantive goals and uncertainty about relationality. Evaluation staff can pilot unmet desire surveys using existing authorities and resources, and then use the information gathered to identify opportunities for productive matchmaking via the Evidence Project Portal or other methods. Ultimately, positioning the survey as a standard component of the government’s evidence toolbox has great potential to support agency staff in advancing federal learning agendas and building a robust culture of evidence across the U.S. government.


Frequently Asked Questions
Have any agencies tried using unmet desire surveys? What impact did they have?

Yes, the Agency for Healthcare Research and Quality (AHRQ) has used unmet desire surveys several times in 2023 and 2024. Part of AHRQ’s mission is to improve the quality and safety of healthcare delivery. It has prioritized scaling and spreading evidence-based approaches to implementing person-centered care planning for people living with or at risk for multiple chronic conditions. This requires fostering new cross-sector collaborative relationships between clinicians, patients, caregivers, researchers, payers, agency staff and other policymakers, and many others. That’s why, in advance of several recent convenings with these diverse stakeholders, AHRQ fielded unmet desire surveys among the participants. The surveys uncovered several avenues for informal and formal collaboration that stakeholders believed were necessary and, importantly, informed the agenda for their meetings. Relative to many convenings, which are often composed of scripted presentations about individuals’ diverse activities, conducting the surveys in advance and presenting the results during the meeting shaped the agenda in more action-oriented ways.


AHRQ’s experience demonstrates a way to seamlessly incorporate unmet desire surveys into already-planned convenings, which themselves are natural opportunities for matchmaking. While some evaluation staff may wish to hire separate matchmakers or engage in matchmaking using outside mechanisms like the Evidence Project Portal, the AHRQ experience also demonstrates another low-lift, yet powerful, avenue. Lastly, while the majority of this memo and the FAQs focus on measuring unmet desire among agency staff, the AHRQ experience also demonstrates the applicability of this idea to other stakeholders as well.

Who should unmet desire surveys be administered to?

The best place to start—especially when resources are limited—is with potential evidence champions. These are people who are already committed to answering questions on their agency’s learning agenda and are likely to have an idea of the kinds of cross-agency or cross-sector collaborative relationships that would be helpful. These potential evidence champions may not self-identify as such; rather, they may see themselves as program managers, customer-experience experts, bureaucracy hackers, process innovators, or policy entrepreneurs. Regardless of terminology, the unmet desire survey provides people who are already motivated to collaborate and connect with a clear opportunity to articulate their needs. Evaluation staff can then respond by posting on the Evidence Project Portal or engaging in other matchmaking on their own to stimulate new and productive relationships for those people.

Who should conduct an unmet desire survey?

The administrator should be someone with whom agency staff feel comfortable discussing their needs (e.g., a member of an agency evaluation team) and who is able to effectively facilitate matchmaking—perhaps because of their network, their reputation within the agency, their role in convenings, or their connection to the Evidence Project Portal. The latter criterion helps ensure that staff expect useful follow-up, which in turn motivates survey completion and participation in follow-on activities; it also generates enthusiasm for engaging in new collaborative relationships (as well as creating broader buy-in for the learning agenda). In some cases, it may make the most sense to have multiple people from an evaluation team surveying different agency staff or co-sponsoring the survey with agency innovation offices. Explicit support from agency leadership for the survey and follow-on activities is also crucial for achieving staff buy-in.

What questions should be asked in an unmet desire survey?

Survey content is meant to be tailored and agency-specific, so the sample questions below can be adapted as needed:



  • Which learning agenda question(s) are you focused on? Is there information about other programs within the government and/or information that outside researchers and other stakeholders have that would help answer it? What kinds of people would be helpful to connect with?
    This question can be left entirely open-ended or be focused on particular priorities and/or particular potential collaborators (e.g., only researchers, or only other agency staff, etc.).

  • Are you looking for informal collaboration (oriented toward knowledge exchange) or formal collaboration (oriented toward projects with shared ownership, decision-making authority, and accountability)?
    This question may invite responses related to either informal or formal collaboration, or instead may only ask about knowledge exchange (a relatively lower commitment that may be more palatable to agency leadership).

  • What hesitations (perhaps due to prior experiences, lack of explicit permission, stereotypes, and so on) do you have about interacting with other stakeholders? What hesitations do you think they might have about interacting with you?
    This question should refer to specific types of hesitancy that survey administrators believe are most likely (e.g., ask about a few hesitancies that seem most likely to arise, such as lack of explicit permission, concerns about saying something inappropriate, or concerns about lack of trustworthy information).

  • Why should they want to connect with you?

  • Why do you think these connections don’t already exist?
    These last two questions can similarly be left broad or include a few examples to help spark ideas.


Evaluation staff may also choose to ask only a subset of the questions.

Who should conduct matchmaking in response to an unmet desire survey?

Again, the answer is agency-specific. In cases that will use the Evidence Project Portal, agency evaluation staff will take the first stab at crafting postings. In other cases, meeting the unmet desire may occur via already-planned convenings or matchmaking on one’s own. Formalizing this duty as a part of one or more people’s official responsibilities sends a signal about how much this work is valued. Exactly who those people are will depend on the agency’s structure, as well as on whether there are already people in a given agency who see matchmaking as part of their job. The key point is that matchmaking itself should be a valued part of the process.

When is the right time to field an unmet desire survey?

While unmet desire surveys can be done any time and on a continuous basis, it is best to field them when there is either an upcoming convening (which itself is a natural opportunity for matchmaking) or there is identified staff capacity for follow-on matchmaking and employee willingness to build collaborative relationships.

How is this tool different from other collaboration tools?

Many evaluation officers and their staff are already forming collaborative relationships as part of developing and advancing learning agendas. Unmet desire surveys place explicit focus on what kinds of new collaborative relationships agency staff want to have with staff in other programs, either within their agency/department or outside it. These surveys are designed to prompt staff to reflect on how the success of their program relates to what is happening elsewhere and to consider who might have information that is relevant and helpful, as well as any hesitations they have about interacting with those people. Unmet desire surveys measure both substantive goals and staff uncertainty about interacting with others.

Not Accessible: Federal Policies Unnecessarily Complicate Funding to Support Differently Abled Researchers. We Can Change That.

Persons with disabilities (PWDs) are considered the largest minority in the nation and in the world. There are existing policies and procedures from agencies, directorates, or funding programs that provide support for Accessibility and Accommodations (A&A) in federally funded research efforts. Unfortunately, these policies and procedures all have different requirements, processes, deadlines, and restrictions. This lack of standardization can make it difficult to acquire the necessary support for PWDs by placing the onus on them or their Principal Investigators (PIs) to navigate complex and unique application processes for the same types of support. 

This memo proposes the development of a standardized, streamlined, rolling, post-award support mechanism to provide access and accommodations for PWDs as they conduct research and disseminate their work through conferences and convenings. The best-case scenario is one wherein a PI or their institution can simply submit the identifying information for the award that has been made and then make a direct request for the support needed for a given PWD to work on the project. In a multi-year award, such a request should be possible at any time within the award period.

This could be implemented through a single, streamlined policy adopted by all agencies, with the process handled internally, or through a new cross-agency process under the Office of Science and Technology Policy (OSTP) or the Office of Management and Budget (OMB) that handles requests for accessibility and accommodations at federally funded research sites and federally funded convenings. An alternative to a single streamlined policy across these agencies might be a new section in the uniform guidance for federal funding agencies, also known as 2 CFR 200.

This memo focuses on federal open science funding programs to illustrate the challenges in getting A&A funding requests supported. The authors have also taken an informal look at agencies outside of science and technology funding and found similar challenges across federal grantmaking in the Arts and Humanities, Social Services, and Foreign Relations and Aid. Similar issues likely exist in private philanthropy as well.

Challenge and Opportunity

Deaf/hard-of-hearing (DHH), Blind/low-vision (BLV), and other differently abled academicians, senior personnel, students, and post-doctoral fellows engaged in federally funded research face challenges in acquiring accommodations for accessibility. These include, but are not limited to: 

Having these services available is crucial for promoting an inclusive research environment on a larger scale. 

Moving to a common, post-award process:

Such a process might follow the steps below. The example uses the National Science Foundation (NSF), but the same or a similar process could be used within any agency:

  1. PI receives notification of grant award from NSF. PI identifies the need for A&A services at the start of, or at any time during, the grant period.
  2. PI (or SRS staff) submits a request for A&A funding support to NSF. The request includes the NSF program name and award number, the specifics of the requested A&A support, a budget justification, and three vendor quotes (if needed).
  3. Use of funds is authorized, and funding is released to the PI’s institution; acquisition would follow its standard purchasing or contracting procedures.
  4. PI submits receipts/paid vendor invoices to the funding body.
  5. PI cites and documents the use of funds in the annual report, or equivalent, to NSF.

Current Policies and Practices

Pre-Award Funding

PIs who request A&A support for themselves or for other members of the research team are sometimes required to apply for it in their initial grant proposals. This approach has several flaws.

First and foremost, this funding process reduces the direct application of research dollars for these PIs and their teams compared to other researchers in the same program. Simply put, if two applicants each apply for a $100,000 grant, and one needs to fund $10,000 worth of accommodations, services, and equipment out of the award, that applicant has $10,000 less to pursue the proposed research activities. This essentially imposes a “10% A&A tax” on the overall research funding request.

Lived Experience Example

In a real-world example, the author and his colleague, the late Dr. Mel Chua, were awarded a $60,000, one-year grant to conduct a qualitative research case study as part of the Ford Foundation Critical Digital Infrastructure Research cohort. As Dr. Chua was Deaf, the PIs pointed out to Ford that $10,000 worth of support services would be needed to cover costs for 

We communicated the fact that spending general research award money on those services would reduce the research work the funds were awarded to support.  The Ford Foundation understood and provided an additional $10,000 as post-award funding to cover those services. Ford did not inform the PIs as to whether that support came from another directed set of funds for A&A support or from discretionary dollars within the foundation.

Second, the pre-award approach can limit a funded project’s ability to work with or hire PWDs as co-PIs or students if they weren’t already part of the original grant proposal. For example, suppose a research project is initially awarded funding for four years without A&A support, and a promising team member who is a PWD, and who would require such support, joins in year three. In this case, the PIs must: 

Post-Award Funding

Some agencies have programs for post-award supplemental funding that address the challenges described above. While these are well-intentioned, many are complicated, and they often have different timelines, requirements, and restrictions. In some cases, a single supplemental funding source may address all aspects of diversity, equity, and inclusion as well as A&A; the needs and costs in the first three categories are significantly different from those in the last. Some post-award pools come from the same agency’s program-wide annual allocation; if those funds have been largely expended on the initial awards for the solicitation, there may be little or no money left to support post-award funding for needed accommodations. The table below briefly illustrates the range of variability across a subset of representative supplemental funding programs; there are links in the top row of the table to access the complete program information. Beyond the programs in this table, more extensive lists of NSF and NIH offerings are provided by those agencies. One example is the NSF Dear Colleague Letter Persons with Disabilities – STEM Engagement and Access.

| Program | Streamlined process | Focused on accessibility/accommodation | Application and award timeline | Funding caps | Conference support only | Submitted by PI | Special procedures or approvals |
|---|---|---|---|---|---|---|---|
| NSF STEM Access for Persons with Disabilities (STEM-APWD) | No | Yes | 2 months before the funds are needed | Yes, $100,000 | No | Yes, or by eligible organizations on behalf of PIs | Yes |
| NIH Grants Guide | No | No | 3–4 months from application to award; 10-month application window (October–May) | Variable | No | Yes | Yes |
| NSF PAPPG FASED (under Section E #7) | Yes | Yes | If part of the PAPPG, same as the proposal date; if supplemental, 2 months | Must not be a major component of the total budget | No | Yes | Yes |
| NIH Support for Scientific Conferences (R13 and U13) | No | No | 8–9 months from application to award | Variable | Yes | Yes | Yes |
| NSF BIO MCB Guide Proposals | No | No | Part of a full event proposal | Conferences: $5,000–$20,000; workshops: $50,000–$100,000 | Yes | Yes | No |

Ideally these policies and procedures, and others like them, would be replaced by a common, post-award process. PIs or their institutions would simply submit the identifying information on the grant that had been awarded and the needs for Accommodations and Accessibility to support team members with disabilities at any time during the grant period.

Plan of Action

The OSTP, possibly through a National Science and Technology Council interagency working group, should conduct an internal review of the A&A policies and procedures for grant programs at federal scientific research agencies. This review could be led by OSTP directly or conducted under its auspices and led by either NSF or the National Institutes of Health (NIH). Participants would include relevant personnel from DOE, DOD, NASA, USDA, EPA, NOAA, NIST, and HHS, at minimum. The goal should be to draft either a single, streamlined, post-award policy and process for all federal grant programs or a new section in the uniform guidance for federal funding agencies.

There should be an analysis of the percentage, size, and number of awards currently being made to support A&A in research funding grant programs. It is not clear how the various funding ranges and caps listed in the table above were determined or whether they meet actual needs. One goal of this analysis would be to determine how well current needs within and across agencies are being met and what future needs might be. 

A second goal would be to assess the level of duplicated effort and the scope of staffing savings that might be attained by moving to a single, streamlined policy. This might be a coordinated process between OMB and OSTP or a separate one conducted by OMB. However it is coordinated, an understanding of these issues should inform whatever new policies or additions to 2 CFR 200 emerge. 

A third goal of this evaluation could be to consider whether post-award A&A funding might best be served by a single entity across all federal grants, consolidating personnel expertise and policy and process recommendations in one place. This would be a significant change and could require an act of Congress, but in the authors’ view it might be the most efficient way to serve grantees who are PWDs. 

Once the initial reviews described above, or a similar process, are completed, the next step should be a convening of stakeholders outside the federal government to provide input on the streamlined draft policy. These stakeholder entities could include, but should not be limited to, the National Association of the Deaf, the American Foundation for the Blind, the American Association of People with Disabilities, and the American Diabetes Association. One goal of that convening should be a discussion and decision as to whether a period of public comment should also be held before the new policy is adopted. 

Conclusion

The above plan of action should be pursued so that more PWDs will be able to participate in federally funded research, or have their participation improved. A policy like the one described above lays the groundwork and provides a more level playing field for Open Science to become more accessible and accommodating. It also opens the door to streamlined processes, reduced duplication of effort, and greater efficiency within the engine of federal science support.

Acknowledgments 

The roots of this effort began when the author, Stephen Jacobs, and Dr. Mel Chua received funding for their research as part of the first Critical Digital Infrastructure research cohort and were able to negotiate for accessibility support services outside their award. Those who provided input on the position paper this memo was based on are: 

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable and safe future that we all hope for whoever takes office in 2025 and beyond.

Frequently Asked Questions
Why are conferences and convenings included in the table above?

Based on the percentage of PWDs in the general population, conference funders should assume that some of their presenters or attendees will need accommodations. Funding from federal agencies should be made available to provide an initial, minimum level of support for necessary A&A. Event organizers should be able to apply for additional support above the minimum level if needed, provided participant requests are made within a stated time before the event. For example, organizers might stipulate a deadline of six weeks before the event for requesting supplemental accommodations, so that they can acquire what’s needed in the thirty days leading up to it.

Are accommodations different for conferences and convenings?

Yes, in several ways. In general, most of the support needed for events involves service provision rather than hardware/software procurement. However, understanding the breadth and depth of issues surrounding human services support is complex and outside the experience of most PIs running a conference in their own scientific discipline.


Consider again the example of DHH researchers attending a conference. A conference might default to providing a team of two interpreters for the conference sessions, as two interpreters per hour is the standard. Should a group of DHH researchers attend the conference and wish to go to different sessions or meetings during the same convening, the organizers may not have provided enough interpreters to support those opportunities.


By providing interpretation for formal sessions only, DHH attendees are excluded from a key piece of these events: conversations outside of scheduled sessions. This applies to both formally planned conversations and spontaneous ones, which might occur before, during, or after official sessions, over a meal offsite, and so on. Ideally, interpreters would be provided for these as well.


These issues, and others related to other groups of PWDs, are beyond the experience of most PIs who have received event funding.

Are there existing guides or other publications to support PIs running convenings?

Some federal agency guides address interpreting and other concerns, such as the Centers for Medicare & Medicaid Services (CMS) “Guide to Developing a Language Access Plan.” However, these are often written to address the needs of full-time employees working on site in office settings; they generally cover cases a conference convener does not need and may not address their specific use case. What the average conference chair and their logistics committee may need is a simply stated set of guidelines addressing the short-term needs of their event. Additionally, a directory of providers with the appropriate skill sets and domain knowledge to meet the needs of PWDs attending events would be an incredible aid to all concerned.

How could these needs be addressed?

The policy review process outlined above should include research to determine a base level of A&A support for conferences. The reviewers might also recommend a preferred federal guide to these resources or identify an existing one.

Incorporate open science standards into the identification of evidence-based social programs

Evidence-based policy uses peer-reviewed research to identify programs that effectively address important societal issues. For example, several agencies in the federal government run clearinghouses that review and assess the quality of peer-reviewed research to identify programs with evidence of effectiveness. However, the replication crisis in the social and behavioral sciences raises concerns that research publications may contain an alarming rate of false positives (rather than true effects), in part due to selective reporting of positive results. The use of open and rigorous practices — like study registration and availability of replication code and data — can ensure that studies provide valid information to decision-makers, but these characteristics are not currently collected or incorporated into assessments of research evidence. 

To rectify this issue, federal clearinghouses should incorporate open science practices into their standards and procedures used to identify evidence-based social programs eligible for federal funding.

Details

The federal government is increasingly prioritizing the curation and use of research evidence in making policy and supporting social programs. In this effort, federal evidence clearinghouses—influential repositories of evidence on the effectiveness of programs—are widely relied upon to assess whether policies and programs across various policy sectors are truly “evidence-based.” As one example, the Every Student Succeeds Act (ESSA) directs states, districts, and schools to implement programs with research evidence of effectiveness when using federal funds for K-12 public education; the What Works Clearinghouse—an initiative of the U.S. Department of Education—identifies programs that meet the evidence-based funding requirements of the ESSA. Similar mechanisms exist in the Departments of Health and Human Services (the Prevention Services Clearinghouse and the Pathways to Work Evidence Clearinghouse), Justice (CrimeSolutions), and Labor (the Clearinghouse for Labor and Evaluation Research). Consequently, clearinghouse ratings have the potential to influence the allocation of billions of dollars appropriated by the federal government for social programs. 

Clearinghouses generally follow explicit standards and procedures to assess whether published studies used rigorous methods and reported positive results on outcomes of interest. Yet this approach rests on assumptions that peer-reviewed research is credible enough to inform important decisions about resource allocation and is reported accurately enough for clearinghouses to distinguish which reported results represent true effects likely to replicate at scale. Unfortunately, published research often contains results that are wrong, exaggerated, or not replicable. The social and behavioral sciences are experiencing a replication crisis, as revealed by numerous large-scale collaborative efforts that had difficulty replicating novel findings in published peer-reviewed research. This issue is partly attributed to closed scientific workflows, which hinder reviewers’ and evaluators’ attempts to detect issues that negatively impact the validity of reported research findings—such as undisclosed multiple hypothesis testing and the selective reporting of results.

Research transparency and openness can mitigate the risk of informing policy decisions on false positives. Open science practices like prospectively sharing protocols and analysis plans, or releasing code and data required to replicate key results, would allow independent third parties such as journals and clearinghouses to fully assess the credibility and replicability of research evidence. Such openness in the design, execution, and analysis of studies on program effectiveness is paramount to increasing public trust in the translation of peer-reviewed research into evidence-based policy.

Currently, standards and procedures to measure and encourage open workflows—and facilitate detection of detrimental practices in the research evidence—are not implemented by either clearinghouses or the peer-reviewed journals publishing the research on program effectiveness that clearinghouses review. When these practices are left unchecked, incomplete, misleading, or invalid research evidence may threaten the ability of evidence-based policy to live up to its promise of producing population-level impacts on important societal issues.

Recommendations

Policymakers should enable clearinghouses to incorporate open science into their standards and procedures used to identify evidence-based social programs eligible for federal funding, and increase the funds appropriated to clearinghouse budgets to allow them to take on this extra work. There are several barriers to clearinghouses incorporating open science into their standards and procedures. To address these barriers and facilitate implementation, we recommend that:

  1. Dedicated funding should be appropriated by Congress and allocated by federal agencies to clearinghouse budgets so they can better incorporate the assessment of open science practices into research evaluation.
    • Funding should facilitate the hiring of additional personnel dedicated to collecting data on whether open science practices were used—and if so, whether they were used well enough to assess the comprehensiveness of reporting (e.g., checking published results against prospective protocols) and the reproducibility of results (e.g., rerunning analyses using study data and code).
  2. The Office of Management and Budget should establish a formal mechanism for federal agencies that run clearinghouses to collaborate on shared standards and procedures for reviewing open science practices in program evaluations. For example, an interagency working group can develop and implement updated standards of evidence that include assessment of open science practices, in alignment with the Transparency and Openness Promotion (TOP) Guidelines for Clearinghouses.
  3. Once funding, standards, and procedures are in place, federal agencies sponsoring clearinghouses should create a roadmap for eventual requirements on open science practices in studies on program effectiveness.
    • Other open science initiatives targeting researchers, research funders, and journals are increasing the prevalence of open science practices in newly published research. As open science practices become more common, agencies can introduce requirements on open science practices for evidence-based social programs, similar to research transparency requirements implemented by the Department of Health and Human Services for the marketing and reimbursement of medical interventions. 
    • For example, evidence-based funding mechanisms often have several tiers of evidence to distinguish the level of certainty that a study produced true results. Agencies with tiered-evidence funding mechanisms can begin by requiring open science practices in the highest tier, with the long-term goal of requiring a program meeting any tier to be based on open evidence.

Conclusion

The momentum from the White House’s 2022 Year of Evidence for Action and 2023 Year of Open Science provides an unmatched opportunity for connecting federal efforts to bolster the infrastructure for evidence-based decision-making with federal efforts to advance open research. Evidence of program effectiveness would be even more trustworthy if favorable results were found in multiple studies that were registered prospectively, reported comprehensively, and computationally reproducible using open data and code. With policymaker support, incorporating these open science practices in clearinghouse standards for identifying evidence-based social programs is an impactful way to connect these federal initiatives that can increase the trustworthiness of evidence used for policymaking.

To learn more about the importance of opening science and to read the rest of the published memos, visit the Open Science Policy sprint landing page.

Develop a Digital Technology Fund to secure and sustain open source software

Open source software (OSS) is a key part of essential digital infrastructure. Recent estimates indicate that 95% of all software relies upon open source, with about 75% of the code being directly open source. Additionally, as our science and technology ecosystem becomes more networked, computational, and interdisciplinary, open source software will increasingly be the foundation on which our discoveries and innovations rest.

However, there remain important security and sustainability issues with open source software, as evidenced by recent incidents such as the Log4j vulnerability that affected millions of systems worldwide.

To better address security and sustainability of open source software, the United States should establish a Digital Technology Fund through multi-stakeholder participation.

Details

Open source software — software whose source code is publicly available and can be modified, distributed, and reused by anyone — has become ubiquitous. OSS offers myriad benefits, including fostering collaboration, reducing costs, increasing efficiency, and enhancing interoperability. It also plays a key role in U.S. government priorities: federal agencies increasingly create and procure open source software by default, an acknowledgement of its technical benefits as well as its value to the public interest, national security, and global competitiveness.

Open source software’s centrality in the technology produced and consumed by the federal government, the university sector, and the private sector highlights the pressing need for these actors to coordinate on ensuring its sustainability and security. In addition to fostering more robust software development practices, raising capacity, and developing educational programs, there is an urgent need to invest in individuals who create and maintain critical open source software components, often without financial support. 

The German Sovereign Tech Fund — launched in 2021 to support the development and maintenance of open digital infrastructure — recently announced such support for the maintainers of Log4j, thereby bolstering its prospects for timely, secure production and sustainability. Importantly, Log4j is just one of numerous projects that require similar support. Cybersecurity and Infrastructure Security Agency (CISA) Director Jen Easterly has affirmed the importance of OSS while flagging its security vulnerabilities as a national security concern. Easterly has rightly called for moving the responsibility and support for critical OSS components away from individuals and to the organizations that benefit from those individuals’ efforts.

Recommendations

To address these challenges, the United States should establish a Digital Technology Fund to provide direct and indirect support to OSS projects and communities that are essential for the public interest, national security, and global competitiveness. The Digital Technology Fund would be funded by a coalition of federal, private, academic, and philanthropic stakeholders and would be administered by an independent nonprofit organization.

To better understand the risks and opportunities:

To encourage multi-stakeholder participation and support:

To launch the Digital Tech Fund:

The realized and potential impact of open source software is transformative in terms of next-generation infrastructure, innovation, workforce development, and artificial intelligence safety. The Digital Tech Fund can play an essential and powerful role in raising our collective capacity to address important security and sustainability challenges by acknowledging and supporting the pioneering individuals who are advancing open source software.


Advance open science through robust data privacy measures

In an era of accelerating advancements in data collection and analysis, realizing the full potential of open science hinges on balancing data accessibility and privacy. As we move towards a more open scientific environment, the volume of sensitive data being shared is swiftly increasing. While open science presents an opportunity to fast-track scientific discovery, it also poses a risk to privacy if not managed correctly.

Building on existing data and privacy efforts, the White House and federal science agencies should collaborate to develop and implement clear standards for research data privacy across the data management and sharing life cycle.

Details

Federal agencies’ open data initiatives are a milestone in the move towards open science. They have the potential to foster greater collaboration, transparency, and innovation in the U.S. scientific ecosystem and lead to a new era of discovery. However, a shift towards open data also poses challenges for privacy, as sharing research data openly can expose personal or sensitive information when done without the appropriate care, methods, and tools. Addressing this challenge requires new policies and technologies that allow for open data sharing while also protecting individual privacy.

The U.S. government has shown a strong commitment to addressing data privacy challenges in various scientific and technological contexts. This commitment is underpinned by laws and regulations such as the Health Insurance Portability and Accountability Act and the regulations for human subjects research (e.g., Code of Federal Regulations Title 45, Part 46). These regulations provide a legal framework for protecting sensitive and identifiable information, which is crucial in the context of open science.

The White House Office of Science and Technology Policy (OSTP) has spearheaded the “National Strategy to Advance Privacy-Preserving Data Sharing and Analytics,” aiming to further the development of these technologies to maximize their benefits equitably, promote trust, and mitigate risks. The National Institutes of Health (NIH) operate an internal Privacy Program, responsible for protecting sensitive and identifiable information within NIH work. The National Science Foundation (NSF) complements these efforts with a multidisciplinary approach through programs like the Secure and Trustworthy Cyberspace program, aiming to develop new ways to design, build, and operate cyber systems, protect existing infrastructure, and motivate and educate individuals about cybersecurity.

Given the unique challenges within the open science context and the wide reach of open data initiatives across the scientific ecosystem, there remains a need for further development of clear policies and frameworks that protect privacy while also facilitating the efficient sharing of scientific data. Coordinated efforts across the federal government could ensure these policies are adaptable, comprehensive, and aligned with the rapidly evolving landscape of scientific research and data technologies.

Recommendations

To clarify standards and best practices for research data privacy:

To ensure best practices are used in federally funded research:

To catalyze continued improvements in data privacy technologies:

To facilitate inter-agency coordination:


Incorporate open source hardware into Patent and Trademark Office search locations for prior art

Increasingly, scientific innovations reside outside the realm of papers and patents. This is particularly true for open source hardware — hardware designs made freely and publicly available for study, modification, distribution, production, and sale. The shift toward open source aligns well with the White House’s 2023 Year of Open Science and can advance the accessibility and impact of federally funded hardware. Yet as the U.S. government expands its support for open science and open source, it will be increasingly vital that our intellectual property (IP) system is designed to properly identify and protect open innovations. Without consideration of open source hardware in prior art and attribution, these public goods are at risk of being patented over and having their accessibility lost.

Organizations like the Open Source Hardware Association (OSHWA) — a standards body for open hardware — provide verified databases of open source innovations. Over the past six years, for example, OSHWA's certification program has grown to more than 2,600 certifications, and the organization has offered educational seminars and training. Despite the availability of such resources, open source certifications have yet to be effectively incorporated into the IP system.

We recommend that the United States Patent and Trademark Office (USPTO) incorporate open source hardware certification databases into the library of resources to search for prior art, and create guidelines and training to build agency capacity for evaluating open source prior art.

Details

Innovative and important hardware products are increasingly being developed as open source, particularly in the sciences, as academic and government research moves toward greater transparency. This trend holds great promise for science and technology, as more people from more backgrounds are able to replicate, improve, and share hardware. A prime example is the 3D printing industry. Once foundational 3D printing patents expired, there was an explosion of invention in the field that led to desktop and consumer 3D printers, open source filaments, and even 3D printing in space.

For these benefits to be more broadly realized across science and technology, open source hardware must be acknowledged in a way that ensures scientists will have their contributions found and respected by the IP system's prior art process. Scientists building open source hardware are rightfully concerned that their inventions will be patented over by someone else. Recently, a legal battle ensued after open hardware was wrongly patented over. While the patent was eventually overturned, the process took time and money and revealed important holes in the United States' prior art system. In another example, the Electronic Frontier Foundation identified more than 30 pieces of prior art that invalidated claims of the ArrivalStar patent.

Erroneous patents can harm the validity of open source and limit the creation and use of new open source tools, especially in the case of hardware, which relies on prior art as its main protection. The USPTO — the administrator of intellectual property protection and a key actor in the U.S. science and technology enterprise — has an opportunity to ensure that open source tools are reliably identified and considered. Standardized and robust incorporation of open source innovations into the U.S. IP ecosystem would make science more reproducible and ensure that open science stays open, for the benefits of rapid improvement, testing, citizen science, and general education. 

Recommendations 

We recommend that the USPTO incorporate open source hardware into prior art searches and take steps to develop education and training to support the protection of open innovation in the patenting process.

Incorporation of open hardware into prior art searches will signify the importance and consideration of open source within the IP system. These actions have the potential to improve the efficiency of prior art identification, advance open source hardware by assuring institutional actors that open innovations will be reliably identified and protected, and ensure open science stays open.

Improve research through better data management and sharing plans

The United States government spends billions of dollars every year to support the best scientific research in the world. The novel and multidisciplinary data produced by these investments have historically remained unavailable to the broader scientific community and the public. This limits researchers’ ability to synthesize knowledge, make new discoveries, and ensure the credibility of research. But recent guidance from the Office of Science and Technology Policy (OSTP) represents a major step forward for making scientific data more available, transparent, and reusable. 

Federal agencies should take coordinated action to ensure that data sharing policies created in response to the 2022 Nelson memo incentivize high-quality data management and sharing plans (DMSPs), include robust enforcement mechanisms, and implement best practices in supporting a more innovative and credible research culture. 

Details

The 2022 OSTP memorandum “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research” (the Nelson memo) represents a significant step toward opening up not only the findings of science but its materials and processes as well. By including data and related research outputs as items that should be publicly accessible, defining “scientific data” to include “material… of sufficient quality to validate and replicate research findings” (emphasis added), and specifying that agency plans should cover “scientific data that are not associated with peer-reviewed scholarly publications,” this guidance has the potential to greatly improve the transparency, equity, rigor, and reusability of scientific research.

Yet while the 2022 Nelson memo provides a crucial foundation for open, transparent, and reusable scientific data, a preliminary review of agency responses reveals considerable variation in how access to data and research outputs will be handled. Agencies vary in the degree to which policies will be reviewed and enforced, and in how specifically they define data as the materials needed to "validate and replicate" research findings. Finally, agencies could and should go further by including plans to fully support a research ecosystem that enables cumulative scientific evidence through the accessibility, discoverability, and citation of researchers' data sharing plans themselves.

Recommendations 

To better incentivize quality and reusability in data sharing, agencies should: 

To better ensure compliance and comprehensive availability, agencies should:

Updates to the Center for Open Science’s efforts to track, curate, and recommend best practices in implementing the Nelson memo will be disseminated through publication and through posting on our website at https://www.cos.io/policy-reform.

Support scientific software infrastructure by requiring SBOMs for federally funded research

Federally funded research relies heavily on software. Despite considerable evidence demonstrating software's crucial role in research, there is no systematic process for researchers to acknowledge its use, and those building software lack recognition for their work. While researchers want to give appropriate acknowledgment for the software they use, many are unsure how to do so effectively. With greater knowledge of what software is used in the research underlying publications, federal research funding agencies and researchers themselves will be better able to make efficient funding decisions, enhance the sustainability of software infrastructure, identify vital yet often overlooked digital infrastructure, and inform workforce development.

All agencies that fund research should require that resulting publications include a Software Bill of Materials (SBOM) listing the software used in the research.

Details

Software is a cornerstone in research. Evidence from numerous surveys consistently shows that a majority of researchers rely heavily on software. Without it, their work would likely come to a standstill. However, there is a striking contrast between the crucial role that software plays in modern research and our knowledge of what software is used, as well as the level of recognition it receives. To bridge this gap, we propose policies to properly acknowledge and support the essential software that powers research across disciplines.

Software citation is one way to address these issues, but citation alone is insufficient as a mechanism to generate insights into software infrastructure. In recent years, there has been a push to recognize software as a crucial component of scholarly publications, leading to the creation of guidelines and specialized journals for software citation. However, software remains under-cited due to several challenges, including friction with journals' reference list standards, confusion regarding which software should be cited and when, and the opacity of roles and dependencies among cited software. A new approach is therefore needed.

A Software Bill of Materials (SBOM) is a list of the software components that were used in an effort, such as building application software. Executive Order 14028 requires that all federal agencies obtain SBOMs when they purchase software. For this reason, many high-quality open-source SBOM tools already exist and can be straightforwardly used to generate descriptions of software used in research.  

SBOM tools can identify and list the stack of software underlying each publication, even when the code itself is not openly shared. Combining software manifests across many publications would yield the insights needed to better advance research. SBOM data can help federal agencies find the right mechanism (funding, in-kind contributions of time) to sustain software critical to their missions. Better knowledge of patterns of software use in research can facilitate coordination among developers and reduce friction in their development roadmaps. Understanding the software used in research will also promote public trust in government-funded research through improved reproducibility.
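To make the concept concrete, the sketch below builds a minimal SBOM-style manifest of the Python packages installed in a research environment. This is an illustration only, not a substitute for standard SBOM tooling such as Syft or the CycloneDX generators, which also capture transitive and non-Python dependencies; the JSON fields shown are a simplified subset loosely modeled on the CycloneDX format.

```python
# Minimal sketch: emit a CycloneDX-style JSON manifest of the Python
# packages installed in the current environment. Illustration only;
# real SBOMs should be produced with standard tools.
import json
from importlib import metadata


def build_manifest() -> dict:
    components = [
        {
            "type": "library",
            "name": dist.metadata["Name"],
            "version": dist.version,
        }
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    ]
    components.sort(key=lambda c: c["name"].lower())
    return {
        "bomFormat": "CycloneDX",  # fields here are a simplified subset
        "specVersion": "1.5",
        "components": components,
    }


if __name__ == "__main__":
    print(json.dumps(build_manifest(), indent=2))
```

A manifest like this, attached to a publication, would let a reader (or an agency analysis pipeline) see exactly which packages and versions underpinned the reported results, even when the analysis code itself is not shared.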

Recommendation

We recommend the adoption of Software Bills of Materials (SBOMs) — which are already used by federal agencies for security reasons — to understand the software infrastructure underlying scientific research. Given their mandatory use by software suppliers to the federal government, SBOMs are well suited to highlighting software dependencies and potential security vulnerabilities, and the same tools and practices can be used to generate SBOMs for publications. We therefore recommend that all agencies that fund research require resulting publications to include an SBOM listing the software used in the research. Additionally, for research that has already been published with supplementary code materials, SBOMs should be generated retrospectively. This will not only address the sustainability of software infrastructure but also enhance the verification of research by clearly documenting the specific software versions used and by directing limited funds to the software maintenance that needs it most.

  1. The Office of Science and Technology Policy (OSTP) should coordinate with agencies to undertake feasibility studies of this policy, building confidence that it would work as intended.
    1. Coordination should include funding agencies, federal actors currently applying SBOMs in software procurement, organizations developing SBOM tools and standards, and scientific stakeholders.
  2. Based on the results of the study, OSTP should direct funding agencies to design and implement policies requiring that publications resulting from federal funding include an openly accessible, machine-readable SBOM for the software used in the research.
    1. OSTP and the Office of Management and Budget should additionally use the Multi-Agency Research and Development Budget Priorities to encourage agencies' collection, integration, and analysis of SBOM data to inform funding and workforce priorities and to catalyze additional agency resource allocations for software infrastructure assessment in follow-on budget processes.

Create an Office of Co-Production at the National Institutes of Health

The National Institutes of Health (NIH) spent $49 billion in fiscal year 2023 on research and development, a significant annual investment in discovering and developing medical treatments. Despite NIH's research investments producing paradigm-shifting therapies, such as CAR-T cancer treatments, CRISPR-enabled gene therapy for sickle cell disease, and the mRNA vaccines for COVID-19, the agency and medical scientists more broadly are grappling with declining trust. This compounds decades-long mistrust of medical research among marginalized populations, whom researchers struggle to recruit as participants in medical research. If things do not improve, this lack of representation may lead to a lack of access to effective medical interventions, worsen health disparities, and cost hundreds of billions of dollars.

A new paradigm for research is needed to ensure meaningful public engagement and rebuild trust. Co-production — in which researchers, patients, and practitioners work together as collaborators — offers a framework for embedding collaboration and trust into the biomedical enterprise.

The National Institutes of Health should form an Office of Co-Production in the Office of the Director, Division of Program Coordination, Planning, and Strategic Initiatives.

Details

In accordance with Executive Order 13985 and ongoing public access initiatives, science funding and R&D agencies have been seeking ways to embed equity, accessibility, and public participation into their processes. The NIH has been increasingly working to advance publicly engaged and led research, illustrated by trainings and workshops around patient-engaged research, funding resources for community partnerships like RADx Underserved Populations, community-led research programs like Community Partnerships to Advance Science for Society (ComPASS), and support from the new NIH director. 

To ensure that public engagement efforts are sustainable, it is critical to invest in lasting infrastructure capable of building and maintaining these ties. Indeed, in their Recommendation on Open Science, the United Nations Educational, Scientific, and Cultural Organization outlined infrastructure that must be built for scientific funding to include those beyond STEMM practitioners in research decision-making. One key approach involves explicitly supporting the co-production of research, a process by which “researchers, practitioners and the public work together, sharing power and responsibility from the start to the end of the project, including the generation of knowledge.”

Co-production provides a framework with which the NIH can advance patient involvement in research, health equity, uptake and promotion of new technologies, diverse participation in clinical trials, scientific literacy, and public health. Doing so effectively would require new models for including and empowering patient voices in the agency’s work. 

Recommendations

The NIH should create an Office of Co-Production within the Office of the Director, Division of Program Coordination, Planning, and Strategic Initiatives (DPCPSI). The Office of Co-Production would institutionalize best practices for co-producing research, train NIH and NIH-funded researchers in co-production principles, build patient-engaged research infrastructure, and fund pilot projects to build the research field.

The NIH Office of Co-Production, co-led by patient advocates (PA) and NIH personnel, should be established with the following key programs:

Creating an Office of Co-Production would achieve the following goals: 

Make government-funded hardware open source by default

While scientific publications and data are increasingly made publicly accessible, designs and documentation for scientific hardware — another key output of federal funding and driver of innovation — remain largely closed from view. This status quo can lead to redundancy, slowed innovation, and increased costs. Existing standards and certifications for open source hardware provide a framework for bringing the openness of scientific tools in line with that of other research outputs. Doing so would encourage the collective development of research hardware, reduce wasteful parallel creation of basic tools, and simplify the process of reproducing research. The resulting open hardware would be available to the public, researchers, and federal agencies, accelerating the pace of innovation and ensuring that each community receives the full benefit of federally funded research. 

Federal grantmakers should establish a default expectation that hardware developed as part of federally supported research be released as open hardware. To retain current incentives for translation and commercialization, grantmakers should design exceptions to this policy for researchers who intend to patent their hardware. 

Details

Federal funding plays an important role in setting norms around open access to research. The White House Office of Science and Technology Policy (OSTP)'s recent memorandum "Ensuring Free, Immediate, and Equitable Access to Federally Funded Research" makes it clear that open access is a cornerstone of a scientific culture that values collaboration and data sharing. OSTP's recent report on open access publishing further declares that "[b]road and expeditious sharing of federally funded research is fundamental for accelerating discovery on critical science and policy questions."

These efforts have been instrumental in providing the public with access to scientific papers and data — two of the foundational outputs of federally funded research. Yet hardware, another key input and output of science and innovation, remains largely hidden from view. To continue the move towards an accessible, collaborative, and efficient scientific enterprise, public access policies should be expanded to include hardware. Specifically, making federally funded hardware open source by default would have a number of specific and immediate benefits: 

Reduce Wasteful Reinvention. Researchers are often forced to develop testing and operational hardware that supports their research. In many cases, unbeknownst to those researchers, this hardware has already been developed as part of other projects by other researchers in other labs. However, since that original hardware was not openly documented and licensed, subsequent researchers are not able to learn from and build upon this previous work. The lack of open documentation and licensing is also a barrier to more intentional, collaborative development of standardized testing equipment for research. 

Increase Access to Information. As the OSTP memo makes clear, open access to federally funded research allows all Americans to benefit from our collective investment. This broad and expeditious sharing strengthens our ability to be a critical leader and partner on issues of open science around the world. Immediate sharing of research results and data is key to ensuring that benefit. Explicit guidance on sharing the hardware developed as part of that research is the next logical step towards those goals. 

Alternative Paths to Recognition. Evaluating a researcher's impact often includes an assessment of the number of patents they can claim, in large part because patents are easy to quantify. However, this focus on patents creates a perverse incentive for researchers to erect barriers to follow-on study even if they have no intention of using patents to commercialize their research. Encouraging researchers to open source the hardware developed as part of their research creates an alternative path to evaluating their impact, especially as those pieces of open source hardware are adopted and improved by others. Uptake of researchers' open hardware could be included in assessments on par with any patented work. This path recognizes contributions to a collective research enterprise.

Verifiability. Open access to data and research is an important step toward allowing third parties to verify research conclusions. However, these tools can be limited if the hardware used to generate the data and produce the research is not itself open. Open sourcing hardware simplifies the process of repeating studies under comparable conditions, allowing for third-party validation of important conclusions.

Recommendations 

Federal grantmaking agencies should establish a default presumption that recipients of research funds make hardware developed with those funds available on open terms. This policy would apply to hardware built as part of the research process, as well as hardware that is part of the final output. Grantees should be able to opt out of this requirement with regard to hardware that is expected to be patented; such an exception would provide an alternative path for researchers to share their work without undermining existing patent-based development pathways.

To establish this policy, OSTP should conduct a study and produce a report on the current state of federally funded scientific hardware and opportunities for open source hardware policy.

The Office of Management and Budget (OMB) should issue a memorandum establishing a policy on open source hardware in federal research funding. The memorandum should include:

Conclusion

The U.S. government and taxpayers are already paying to develop hardware created as part of research grants. In fact, because there is not currently an obligation to make that hardware openly available, the federal government and taxpayers are likely paying to develop identical hardware over and over again. 

Grantees have already proven that existing open publication and open data obligations promote research and innovation without unduly restricting important research activities. Expanding these obligations to include the hardware developed under these grants is the natural next step.

Promoting reproducible research to maximize the benefits of government investments in science

Scientific research is the foundation of progress, creating innovations like new treatments for melanoma and providing behavioral insights to guide policy in responding to events like the COVID-19 pandemic. This potential for real-world impact is best realized when research is rigorous, credible, and subject to external confirmation. However, evidence suggests that, too often, research findings are not reproducible or trustworthy, preventing policymakers, practitioners, researchers, and the public from fully capitalizing on the promise of science to improve social outcomes in domains like health and education.

To build on existing federal efforts supporting scientific rigor and integrity, funding agencies should study and pilot new programs to incentivize researchers’ engagement in credibility-enhancing practices that are presently undervalued in the scientific enterprise.

Details

Federal science agencies have a long-standing commitment to ensuring the rigor and reproducibility of scientific research for the purposes of accelerating discovery and innovation, informing evidence-based policymaking and decision-making, and fostering public trust in science. In the past 10 years alone, policymakers have commissioned three National Academies reports, a Government Accountability Office (GAO) study, and a National Science and Technology Council (NSTC) report exploring these and related issues. Unfortunately, flawed, untrustworthy, and potentially fraudulent studies continue to affect the scientific enterprise.

The U.S. government and the scientific community have increasingly recognized that open science practices — like sharing research code and data, preregistering study protocols, and supporting independent replication efforts — hold great promise for ensuring the rigor and replicability of scientific research. Many U.S. science agencies have accordingly launched efforts to encourage these practices in recent decades. Perhaps the most well-known example is the creation of clinicaltrials.gov and the requirements that publicly and privately funded trials be preregistered (in 2000 and 2007, respectively), leading, in some cases, to fewer trials reporting positive results. 

More recent federal actions have focused on facilitating sharing of research data and materials and supporting open science-related education. These efforts seek to build on areas of consensus given the diversity of the scientific ecosystem and the resulting difficulty of setting appropriate and generalizable standards for methodological rigor. However, further steps are warranted. Many key practices that could enhance the government’s efforts to increase the rigor and reproducibility of scientific practice — such as the preregistration of confirmatory studies and replication of influential or decision-relevant findings — remain far too rare. A key challenge is the weak incentive to engage in these practices. Researchers perceive them as costly or undervalued given the professional rewards created by the current funding and promotion system, which encourages exploratory searches for new “discoveries” that frequently fail to replicate. Absent structural change to these incentives, uptake is likely to remain limited.

Recommendations

To fully capitalize on the government’s investments in education and infrastructure for open science, we recommend that federal funding agencies launch pilot initiatives to incentivize and reward researchers’ pursuit of transparent, rigorous, and public good-oriented practices. Such efforts could enhance the quality and impact of federally funded research at relatively low cost, encourage alignment of priorities and incentive structures with other scientific actors, and help science and scientists better deliver on the promise of research to benefit society. Specifically, NIH and NSF should: 

Establish discipline-specific offices to launch initiatives around rigor and reproducibility

Incorporate assessments of transparent and credible research methods into their learning agendas

Expand support for third-party replications
