Kickstarting Collaborative, AI-Ready Datasets in the Life Sciences with Government-funded Projects

In the age of Artificial Intelligence (AI), large, high-quality datasets are needed to move the field of life science forward. However, the research community lacks strategies to incentivize collaboration on high-quality data acquisition and sharing. The government should fund collaborative roadmapping, certification, collection, and sharing of large, high-quality datasets in life science. In such a system, nonprofit research organizations engage scientific communities to identify key types of data that would be valuable for building predictive models, and define quality control (QC) and open science standards for collection of that data. Projects are designed to develop automated methods for data collection, certify data providers, and facilitate data collection in consultation with researchers throughout various scientific communities. Hosting of the resulting open data is subsidized and protected by security measures. This system would provide crucial incentives for the life science community to identify and amass large, high-quality open datasets that will immensely benefit researchers.

Challenge and Opportunity 

Life science has left the era of “one scientist, one problem.” It is becoming a field wherein collaboration on large-scale research initiatives is required to make meaningful scientific progress. A salient example is AlphaFold2, a machine learning (ML) model that was the first to predict how a protein will fold with an accuracy meeting or exceeding experimental methods. AlphaFold2 was trained on the Protein Data Bank (PDB), a public data repository containing standardized and highly curated results of >200,000 experiments collected over 50 years by thousands of researchers.

Though such a sustained effort is laudable, science need not wait another 50 years for the ‘next PDB’. If approached strategically and collaboratively, the data necessary to train ML models can be acquired more quickly, cheaply, and reproducibly than efforts like the PDB, through careful problem specification and deliberate management. First, by leveraging organizations that are deeply connected with relevant experts, unified projects taking this approach can account for the needs of both the people producing the data and those consuming it. Second, by centralizing plans and accountability for data and metadata standards, these projects can enable rigorous and scalable multi-site data collection. Finally, by securely hosting the resulting open data, the projects can evaluate biosecurity risk and provide protected access to key scientific data and resources that might otherwise be siloed in industry. This approach is complementary to efforts that collate existing data, such as the Human Cell Atlas and UCSC Genome Browser, and satisfies the need for new data collection that adheres to QC and metadata standards.

In the past, mid-sized grants have allowed multi-investigator scientific centers like the recently funded Science and Technology Center for Quantitative Cell Biology (QCB, funded with $30M in 2023) to explore many areas in a given field. Here, we outline how the government can expand upon such schemes to catalyze the creation of impactful open life science data. In the proposed system, supported projects would allow well-positioned nonprofit organizations to facilitate the distributed, multidisciplinary collaborations that are necessary for assembling large, AI-ready datasets. This model would align research incentives and enable life science to create the ‘next PDBs’ faster and more cheaply than before.

Plan of Action 

Existing initiatives have developed processes for creating open science data and successfully engaged the scientific community to identify targets for the ‘next PDB’ (e.g., Chan Zuckerberg Initiative’s Open Science program, Align’s Open Datasets Initiative). The process generally occurs in five steps:

  1. A multidisciplinary set of scientific leaders identifies target datasets, assessing the scale of data required and the potential for standardization, and defines standards for data collection methods and corresponding QC metrics.
  2. Methods for data acquisition are collaboratively developed and certified to de-risk the cost-per-datapoint and establish the utility of the data.
  3. Data collection methods are onboarded at automation partner organizations, such as NSF BioFoundries and existing National Labs, and these automation partners are certified to meet the defined data collection standards and QC metrics.
  4. Scientists throughout the community, including those at universities and for-profit companies, can request data acquisition, which is coordinated, subsidized, and analyzed for quality.
  5. Data is made publicly available and hosted in perpetuity in a stable, robustly maintained database with biosecurity, cybersecurity, and privacy measures for researchers to access.

The U.S. Government should adapt this process for collaborative, AI-ready data collection in the life sciences by implementing the following recommendations:  

Recommendation 1. An ARPA-like agency — or agency division — should launch a Collaborative, AI-Ready Datasets program to fund large-scale dataset identification and collection.

This program should be designed to award two types of grants:

  1. A medium-sized “phase 1” award of $1–$5M to fund new dataset identification and certification. To date, roadmapping dataset concepts (Steps 1–2 above) has been accomplished by small-scale, community-driven projects of $1–$5M. Though successful in select cases, these projects have not been as comprehensive or inclusive as they could be. Government funding could more sustainably and systematically support iterative roadmapping and certification in areas of strategic importance.
  2. A large “phase 2” award of $10–$50M to fund the collection of previously identified datasets. Currently, there are no funding mechanisms designed to scale up acquisition (Steps 3–4 above) for dataset concepts that have been deemed valuable and de-risked. To fill this gap, the government should leverage existing expertise and collaboration across the nonprofit research ecosystem by awarding grants of $10–$50M for the coordination, acquisition, and release of mature dataset concepts. The Human Genome Project is a good analogy: a dataset concept was identified and collection was distributed among several facilities.

Recommendation 2. The Office of Management and Budget should direct the NSF and NIH to develop plans for funding academics and for-profits, tranched on data deposition.

Once an open dataset is established, the government can advance the use and further development of that dataset by providing grants to academics that are tranched on data deposition. This approach would be in direct alignment with the government’s goals for supporting open, shared resources for AI innovation, as laid out in section 5.2 of the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.

Agencies’ approaches to meeting this priority could vary. In one scenario, a policy or program could be established in which grantees would use a portion of the funds disbursed to them to pay for open data acquisition at a certified data provider. Analogous structures have enabled scientists to access other types of shared scientific infrastructure, such as the NSF’s ACCESS program. In the same way that ACCESS offers academics access to compute resources, it could be expanded to offer academics access to data acquisition resources at verified facilities. Offering grants in this way would incentivize the scientific community to interact with and expand upon open datasets, as well as encourage compliance through tranching.

Efforts to support use and development of open, certified datasets could also be incorporated into existing programs, including the National AI Research Resource, for which complementary programs could be developed to provide funding for standardized data acquisition and deposition. Similar ideas could also be incorporated into core programs within NSF and NIH, which already disburse funds after completion of annual progress reports. Such programs could mandate checks for data deposition in these reports.

Conclusion 

Collaborative, AI-Ready datasets would catalyze progress in many areas of life science, but realizing them requires innovative government funding. By supporting coordinated projects that span dataset roadmapping, methods and standards development, partner certification, distributed collection, and secure release on a large scale, the government can coalesce stakeholders and deliver the next generation of powerful predictive models. To do so, it should combine mid-sized, large, and tranched grants in unified initiatives that are orchestrated by nonprofit research organizations, which are uniquely positioned to execute these initiatives end-to-end. These initiatives should balance intellectual property protection and data availability, and thereby help deliver key datasets upon which new scientific insights depend.

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable and safe future that we all hope for whoever takes office in 2025 and beyond.

Frequently Asked Questions
What is involved in roadmapping dataset opportunities?

Roadmapping dataset opportunities, which can take up to a year, requires convening experts across multiple disciplines, including experimental biology, automation, and machine learning, among others. Working together, these experts assess the feasibility and impact of candidate datasets, as well as the QC standards they would require. Roadmapping culminates in a determination of dataset value: whether the data can be used to train meaningful new machine learning models.

Why should data collection be centralized but redundant?

To mitigate single-facility risk and promote site-to-site interoperability, data should be collected across multiple sites. To ensure that standards and organization hold across sites, planning and documentation should be centralized.

How should automation partners be certified?

Automation partners will be evaluated according to the following criteria:

  • Commitment to open science
  • Rigor and consistency in methods and QC procedures
  • Standardization of data and metadata ontologies

More specifically, certification will depend upon the abilities of partners to accommodate standardized ontologies, capture sufficient metadata, and reliably pass data QC checks. It will also require partners to demonstrate a commitment to data reusability and replicability and a willingness to share methods and data in the open science ecosystem.
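
To make these criteria more concrete, the minimal sketch below shows one way an automated metadata-completeness and QC check could be implemented at a data provider. The field names, ontology placeholder, and thresholds are illustrative assumptions, not part of any defined certification standard.

```python
# Hypothetical sketch of an automated check a certified data provider might
# run before depositing a batch of measurements. All field names and
# thresholds are illustrative assumptions only.

REQUIRED_METADATA = {
    "instrument", "protocol_id", "operator", "collection_date", "ontology_term",
}
QC_THRESHOLDS = {
    "signal_to_noise": 10.0,        # minimum acceptable signal-to-noise ratio
    "replicate_correlation": 0.90,  # minimum correlation between replicates
}

def passes_certification_checks(record: dict) -> bool:
    """Return True if a record carries the required metadata and meets QC thresholds."""
    metadata_ok = REQUIRED_METADATA.issubset(record.get("metadata", {}))
    qc = record.get("qc", {})
    qc_ok = all(qc.get(metric, 0.0) >= threshold
                for metric, threshold in QC_THRESHOLDS.items())
    return metadata_ok and qc_ok

example_record = {
    "metadata": {
        "instrument": "plate-reader-07",
        "protocol_id": "P-012",
        "operator": "site-3",
        "collection_date": "2025-01-15",
        "ontology_term": "EXAMPLE:0000123",  # placeholder for a standardized ontology ID
    },
    "qc": {"signal_to_noise": 14.2, "replicate_correlation": 0.95},
}

print(passes_certification_checks(example_record))  # True
```

Checks of this kind can run identically at every automation partner, which is what makes multi-site collection against a shared standard auditable.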

Should there be an embargo before data is made public?

Today, scientists have no obligation to publish every piece of data they collect. In an Open Data paradigm, all data must eventually be shared. For some types of data, a short, optional embargo period would enable scientists to participate in open data efforts without compromising their ability to file patents or publish papers. For example, in protein engineering, the patentable product is the sequence of a designed protein, making immediate release of data untenable. An embargo period of one to two years is sufficient to alleviate this concern and may even hasten data sharing by linking it to a fixed length of time after collection, rather than to publication. Whether an embargo should be implemented, and for how long, should be determined for each data type and designed to encourage researchers to participate in the acquisition of open data.

How do we ensure biosecurity of the data?

Biological data is a strategic resource and requires stewardship and curation to ensure it has maximum impact. Thus, data that is generated through the proposed system should be hosted by high-quality providers that adhere to biosecurity standards and enforce embargo periods. Appropriate biosecurity standards will be specific to different types of data, and should be formulated and periodically reevaluated by a multidisciplinary group of stakeholders. When access to certified, post-embargo data is requested, the same standards will apply, as will export controls. In some instances, and for some users, restricting access may be reasonable. In exchange for offering this suite of valuable services, hosting providers should be subsidized through reimbursements.

From Strategy to Impact: Establishing an AI Corps to Accelerate HHS Transformation

To unlock the full potential of artificial intelligence (AI) within the Department of Health and Human Services (HHS), an AI Corps should be established, embedding specialized AI experts within each of the department’s 10 agencies. HHS is uniquely positioned for—and urgently requires—this investment in AI expertise, as it plays a pivotal role in delivering efficient healthcare to millions of Americans. HHS’s responsibilities intersect with areas where AI has already shown great promise, including managing vast healthcare datasets, accelerating drug development, and combating healthcare fraud. 

Modeled after the success of the Department of Homeland Security (DHS)’s existing AI Corps, this program would recruit top-tier professionals with advanced expertise in AI, machine learning, data science, and data engineering to drive innovation within HHS. While current HHS initiatives like the AI Council and AI Community of Practice provide valuable strategic guidance, they fall short in delivering the on-the-ground expertise necessary for meaningful AI adoption across HHS agencies. The AI Corps would fill this gap, providing the hands-on, agency-level support necessary to move beyond strategy and into the impactful implementation intended by recent federal actions related to AI. 

This memo uses the Food and Drug Administration (FDA) as a case study to demonstrate how an AI Corps member could spearhead advancements within HHS’s agencies. However, the potential benefits extend across the department. For instance, at the Centers for Disease Control and Prevention (CDC), AI Corps experts could leverage machine learning for more precise outbreak modeling, enabling faster, more targeted public health responses. At the National Institutes of Health (NIH), they could accelerate biomedical research through AI-driven analysis of large-scale genomic and proteomic data. Similarly, at the Centers for Medicare and Medicaid Services (CMS), they could improve healthcare delivery by employing advanced algorithms for patient data analytics, predicting patient outcomes, and enhancing fraud detection mechanisms.

Challenge and Opportunity

AI is poised to revolutionize not only healthcare but also the broad spectrum of services under HHS, offering unprecedented opportunities to enhance patient outcomes, streamline administrative processes, improve public health surveillance, and advance biomedical research. Realizing these benefits and defending against potential harms demands the effective implementation and support of AI tools across HHS. The federal workforce, though committed and capable, currently lacks the specialized expertise needed to fully harness AI’s potential, risking a lag in AI adoption that could impede progress.

The private sector is responding quickly to this opportunity, as it is well positioned to attract leading experts to help leverage new technologies. For federal agencies, however, attracting technical experts has been a perennial challenge, resulting in major setbacks in government tech projects: of government software projects that cost more than $6 million, only 13% succeed.

Without introducing a dedicated AI Corps, existing employees—many of whom lack specialized AI expertise—would be required to implement and manage complex AI tools alongside their regular duties. This could lead to the acquisition or development of AI solutions without proper evaluation of their suitability or effectiveness for specific use cases. Additionally, without the necessary expertise to oversee and monitor these systems, agencies may struggle to ensure they are functioning correctly and ethically. As a result, there could be significant inefficiencies, missed opportunities for impactful AI applications, and an increased reliance on external consultants who may not fully understand the unique challenges and needs of each agency. This scenario not only risks undermining the effectiveness of AI initiatives but also heightens the potential for errors, biases, and misuse of AI technologies, ultimately hindering HHS’s mission and objectives.

HHS’s AI Strategy recognizes the need for AI expertise in government; however, its focus has largely been on strategic oversight rather than the operational execution needed on the ground, with the planned establishment of an AI Council and AI Community of Practice prioritizing policy and coordination. While these entities are crucial, they do not address the immediate need for hands-on expertise within individual agencies, leaving a critical gap in the expertise required to safely implement AI solutions at the agency level. HHS covers a wide breadth of functions, from administering national health insurance programs like Medicare and Medicaid to conducting advanced biomedical research at the NIH, with each agency facing distinct challenges where AI could provide transformative benefits. However, without dedicated support, AI adoption risks becoming fragmented, underutilized, or ineffective.

For example, at the CDC, AI could significantly improve infectious disease surveillance systems, enabling more timely interventions and enhancing the CDC’s overall preparedness for public health crises, moving beyond traditional methods that often rely on slower, manual analysis. Furthermore, the Administration for Children and Families (ACF) could leverage AI to better allocate resources, improve program outcomes, and support vulnerable populations more effectively. There are great opportunities to use machine learning algorithms to accelerate data processing and discovery in fields such as cancer genomics and personalized medicine. This could help researchers identify new biomarkers, optimize clinical trial designs, and push forward breakthroughs in medical research faster and more efficiently. However, without the right expertise, these game-changing opportunities could remain unrealized, and attempts to pursue them could introduce significant risks. The potential for biased algorithms, privacy breaches, and misinterpretation of AI outputs poses serious concerns. Agency leaders may feel pressured to adopt technologies they don’t fully understand, leading to ineffective or even harmful implementations. Embedding AI experts within HHS agencies is essential to ensure that AI solutions are deployed responsibly, maximizing benefits while mitigating potential harms.

This gap presents an opportunity for the federal government to take decisive action. By recruiting and embedding top-tier AI professionals within each agency, HHS could ensure that AI is treated not as an ancillary task but as a core component of agency operations. These experts would bring the specialized knowledge necessary to integrate AI tools safely and effectively, optimize processes, and drive innovation within each agency.

DHS’s AI Corps, launched as part of the National AI Talent Surge, provides a strong precedent for recruiting AI specialists to advance departmental capabilities. For instance, AI Corps members have played a vital role in improving disaster response by using AI to quickly assess damage and allocate resources more effectively during crises. They have also enhanced cybersecurity efforts by using AI to detect vulnerabilities in critical U.S. government systems and networks. Building on these successes, a similar effort within HHS would ensure that AI adoption moves beyond a strategic objective to a practical implementation, with dedicated experts driving innovation across the department’s diverse functions.

Case Study: The Food and Drug Administration (FDA)

The FDA stands at the forefront of the biotechnology revolution, facing the dual challenges of rapid innovation and a massive influx of complex data. Advances in gene editing, personalized medicine, and AI-driven diagnostics promise to transform healthcare, but they also present significant regulatory hurdles. The current framework, though robust, struggles to keep pace with these innovations, risking delays in the approval and implementation of groundbreaking treatments.

This situation is reminiscent of the challenges faced in the 1980s and 1990s, when advances in pharmaceutical science outstripped the FDA’s capacity to review new drugs, leading to the so-called “drug lag.” The Prescription Drug User Fee Act of 1992 was a pivotal response, streamlining the drug review process by providing the FDA with additional resources. However, the continued reliance on scaling resources may not be sustainable as the complexity and volume of data increase.

The FDA has begun to address this new challenge. For example, the Center for Biologics Evaluation and Research has established committees like the Artificial Intelligence Coordinating Committee and the Regulatory Review AI Subcommittee. However, these efforts largely involve existing staff who must balance AI responsibilities with their regular duties, limiting the potential impact. Moreover, the focus has predominantly been on regulating AI rather than leveraging it to enhance regulatory processes.

Placing an AI expert from the HHS AI Corps within the FDA could fundamentally change this dynamic. By providing dedicated, expert support, the FDA could accelerate its regulatory review processes, ensuring timely and safe access to innovative treatments. The financial implications are significant: the value of accelerated drug approvals, as demonstrated by the worth of Priority Review Vouchers (acceleration of four months = ~$100 million), indicates that effective AI adoption could unlock billions of dollars in industry value while simultaneously improving public health outcomes.

Plan of Action

To address the challenges and seize the opportunities outlined earlier, the Office of the Chief Artificial Intelligence Officer (OCAIO) within HHS should establish an AI Corps composed of specialized experts in artificial intelligence, machine learning, data science, and data engineering. This initiative will be modeled after DHS’s successful AI Corps and tailored to the unique needs of HHS and its 10 agencies.

Recommendation 1. Establish an AI Corps within HHS.

Composition: The AI Corps would initially consist of 10 experts hired to temporary civil servant positions, with one member allocated to each of HHS’s 10 agencies, and each placement lasting one to two years. These experts will possess a range of technical skills—including AI, data science, data engineering, and cloud computing—tailored to each agency’s specific needs and technological maturity. This approach ensures that each agency has the appropriate expertise to effectively implement AI tools and methodologies, whether that involves building foundational data infrastructure or developing advanced AI applications.

Hiring authority: The DHS AI Corps utilized direct hiring authority, which was expanded by the Office of Personnel Management under the National AI Talent Surge. HHS’s AI Corps could adopt a similar approach. This authority would enable streamlined recruitment of individuals into specific AI roles, including positions in AI research, machine learning, and data science. This expedited process would allow HHS to quickly hire and onboard top-tier AI talent.

Oversight: The AI Corps would be overseen by the OCAIO, which would provide strategic direction and ensure alignment with HHS’s broader AI initiatives. The OCAIO would also be responsible for coordinating the activities of the AI Corps, setting performance goals, and evaluating outcomes.

Budget and Funding

Estimated cost: The AI Corps is projected to cost approximately $1.5 million per year in salaries, based on an average of $150,000 for each of the 10 corps members, plus operational costs such as training, travel for interagency collaboration, and participation in conferences.

Funding source: Funding would be sourced from the existing HHS budget, specifically from allocations set aside for digital transformation and innovation. Given the relatively modest budget required, reallocation within these existing funds should be sufficient. 

Recruitment and Training

Selection process: AI Corps members would be recruited through a competitive process, targeting individuals with proven expertise in AI, data science, and related fields. 

Training: Upon selection, AI Corps members would undergo an intensive orientation and training program to familiarize them with the specific needs and challenges of HHS’s various agencies. This also includes training on federal regulations, ethics, and data governance to ensure that AI applications comply with existing laws and policies.

Agency Integration

Deployment: Each AI Corps member would be embedded within a specific HHS agency, where they would work closely with agency leadership and staff to identify opportunities for AI implementation. Their primary responsibility would be to develop and deploy AI tools that enhance the agency’s mission-critical processes. For example, an AI Corps member embedded at the CDC could focus on improving disease surveillance systems through AI-driven predictive analytics, while a member at the NIH could drive advancements in biomedical research by using machine learning algorithms to analyze complex genomic data.

Collaboration: To ensure cross-agency learning and collaboration, AI Corps members would convene regularly to share insights, challenges, and successes. These convenings would be aligned with the existing AI Community of Practice meetings, fostering a broader exchange of knowledge and best practices across the department.

Case Study: The FDA

AI Corps Integration at the FDA

Location: The AI Corps member assigned to the FDA would be based in the Office of Digital Transformation, reporting directly to the chief information officer. This strategic placement would enable the expert to work closely with the FDA’s leadership team, ensuring that AI initiatives are aligned with the agency’s overall digital strategy.

Key responsibilities

Process improvement: The AI Corps member would collaborate with FDA reviewers to identify opportunities for AI to streamline regulatory review processes. This might include developing AI tools to assist with data analysis, automate routine tasks, or enhance decision-making capabilities.

Opportunity scoping: The expert would engage with FDA staff to understand their workflows, challenges, and data needs. Based on these insights, the AI Corps member would scope and propose AI solutions tailored to the FDA’s specific requirements.

Pilot projects: The AI Corps member would lead pilot projects to test AI tools in real-world scenarios, gathering data and feedback to refine and scale successful initiatives across the agency.

Conclusion

Establishing an AI Corps within HHS is a critical step toward harnessing AI’s full potential to enhance outcomes and operational efficiency across federal health agencies. By embedding dedicated AI experts within each agency, HHS can accelerate the adoption of innovative AI solutions, address current implementation gaps, and proactively respond to the evolving demands of the health landscape.

While HHS may currently have less technological infrastructure compared to departments like the Department of Homeland Security, targeted investment in in-house expertise is key to bridging that gap. The proposed AI Corps not only empowers agencies like the FDA, CDC, NIH, and CMS to enhance their missions but also sets a precedent for effective AI integration across the federal government. Prompt action to establish the AI Corps will position HHS at the forefront of technological innovation, delivering tangible benefits to the American public and transforming the way it delivers services and fulfills its mission.

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable and safe future that we all hope for whoever takes office in 2025 and beyond.

Frequently Asked Questions
How will the AI Corps avoid becoming just another bureaucratic layer?

The AI Corps is designed to be the opposite of bureaucracy—it’s about action, not administration. These experts will be embedded directly within agencies, working alongside existing teams to solve real-world problems, not adding paperwork. Their mission is to integrate AI into daily operations, making processes more efficient and outcomes more impactful. By focusing on tangible results and measurable improvements, the AI Corps will be judged by its ability to cut through red tape, not create it.

What if AI Corps members are too ahead of the curve for existing agency cultures?

Innovation can present challenges, but the AI Corps is designed to address them effectively. These experts will not only bring technical expertise but also serve as facilitators who can translate advanced AI capabilities into practical applications that align with existing agency cultures. A key part of their role will be to make AI more accessible and understandable, ensuring it is valuable to all levels of staff, from frontline workers to senior leadership. Their success will depend on their ability to seamlessly integrate advanced technology into the agency’s everyday operations.

Why focus on AI when there are so many other pressing health issues?

AI isn’t just another tool; it’s a force multiplier that can help solve those other pressing issues more effectively. Whether it’s accelerating drug approvals at the FDA or enhancing public health responses across HHS, AI has the potential to improve outcomes, save time, and reduce costs. By embedding AI experts within agencies, we’re not just addressing one problem—we’re empowering the entire department to tackle multiple challenges with greater efficiency and impact.

What’s in it for the AI experts? Why would top talent join the AI Corps?

For top AI talent, the AI Corps offers a unique opportunity to make a difference at a scale that few private-sector roles can match. It’s a chance to apply their skills to public service, tackling some of the nation’s most critical challenges in healthcare, regulation, and beyond. The AI Corps members will have the opportunity to shape the future of AI in government, leaving a legacy of innovation and impact. The allure of making a tangible difference in people’s lives can be a powerful motivator for the right kind of talent.

Why not outsource AI talent or rely on consultants instead of building in-house expertise?

While outsourcing AI talent or using consultants can offer short-term benefits, it often lacks the sustained engagement necessary for long-term success. Building in-house expertise through the AI Corps ensures that AI capabilities are deeply integrated into the agency’s operations and culture. A notable example illustrating the risks of overreliance on external contractors is the initial rollout of HealthCare.gov. The website faced significant technical issues at launch due to coordination challenges and insufficient in-house technical oversight, which hindered public access to essential healthcare services. In contrast, recent successful government initiatives—such as the efficient distribution of COVID-19 test kits and the timely processing of economic stimulus payments directly into bank accounts—demonstrate the positive impact of having the right technical experts within government agencies.

How will the AI Corps collaborate with existing IT and data teams within agencies?

Collaboration is crucial to the AI Corps’ success. Instead of working in isolation, AI Corps members will integrate with existing IT and data teams, bringing specialized AI knowledge that complements the teams’ expertise. This partnership approach ensures that AI initiatives are well-grounded in the agencies’ existing infrastructure and aligned with ongoing IT projects. The AI Corps will serve as a catalyst, amplifying the capabilities of existing teams rather than duplicating their efforts.

Could the AI Corps inadvertently lead to job displacement within agencies?

The AI Corps is focused on augmentation, not replacement. The primary goal is to empower existing staff with advanced tools and processes, enhancing their work rather than replacing them. AI Corps members will collaborate closely with agency employees to automate routine tasks and free up time for more meaningful activities. A 2021 study by the Boston Consulting Group found that 60% of employees view AI as a coworker rather than a replacement. This reflects the intent of the AI Corps—to build capacity within agencies and ensure that AI is a tool that amplifies human effort, fostering a more efficient and effective workforce.

What does success look like for the HHS AI Corps program after one or two years?

Success for the AI Corps program means that each HHS agency has made measurable progress toward integrating AI and related technologies, tailored to their specific needs and maturity levels. Within one to two years, agencies might have established robust data infrastructures, migrated platforms to the cloud, or developed pilot AI projects that address key challenges. Success also includes fostering a culture of innovation and experimentation, with AI Corps members identifying opportunities and creating proofs of concept in low-risk environments. By collaborating across agencies, these experts support each other and amplify the program’s impact. Ultimately, success is reflected in enhanced capabilities and efficiencies within agencies, setting a strong foundation for ongoing technological advancement aligned with each agency’s mission.

Public Comment on Executive Branch Agency Handling of CAI containing PII

Public comments serve the executive branch by informing more effective, efficient program design and regulation. As part of our commitment to evidence-based, science-backed policy, FAS staff leverage public comment opportunities to embed science, technology, and innovation into policy decision-making.

The Federation of American Scientists (FAS) is a non-partisan, nonprofit organization committed to using science and technology to benefit humanity by delivering on the promise of equitable and impactful policy. FAS believes that society benefits from a federal government that harnesses science, technology, and innovation to meet ambitious policy goals and deliver impactful results to the public. 

We are writing in response to your Request for Information on the Executive Branch Agency Handling of Commercially Available Information (CAI) Containing Personally Identifiable Information (PII). Specifically, we address questions 2 and 5 from the request:

2. What frameworks, models, or best practices should [the White House Office of Management and Budget] consider as it evaluates agency standards and procedures associated with the handling of CAI containing PII and considers potential guidance to agencies on ways to mitigate privacy risks from agencies’ handling of CAI containing PII?

5.  Agencies provide transparency into the handling of PII through various means (e.g., policies and directives, Privacy Act statements and other privacy notices at the point of collection, Privacy Act system of records notices, and privacy impact assessments). What, if any, improvements would enhance the public’s understanding of how agencies handle CAI containing PII?

Background

In the digital landscape, commercially available information (CAI) represents a vast ecosystem of personal data that can be easily obtained, sold, or licensed to various entities. The Executive Order on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (EO 14110) defines CAI comprehensively as information about individuals or groups that is publicly accessible, encompassing details like device information and location data.

A 2017 report by the Georgetown Law Review found that 63% of Americans can be uniquely identified using just three basic attributes—gender, birth date, and ZIP code—with an astonishing 99.98% of individuals potentially re-identifiable from a dataset containing only 15 fundamental characteristics. This vulnerability underscores the critical challenges of data privacy in an increasingly interconnected world. 
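
To illustrate the mechanics behind such statistics (rather than reproduce the cited figures), the toy sketch below counts how many records in a small, invented dataset are pinned down by just three quasi-identifiers. All values are made up for demonstration.

```python
from collections import Counter

# Invented records: (gender, birth date, ZIP code) quasi-identifiers.
records = [
    ("F", "1985-03-02", "20001"),
    ("M", "1990-07-14", "20001"),
    ("F", "1985-03-02", "20002"),
    ("M", "1972-11-30", "22201"),
    ("F", "1969-01-08", "20001"),
    ("M", "1990-07-14", "20001"),
]

# A record is re-identifiable from these attributes alone if its
# (gender, birth date, ZIP) combination appears exactly once.
combo_counts = Counter(records)
unique = sum(1 for r in records if combo_counts[r] == 1)

print(f"{unique} of {len(records)} records are uniquely identified "
      "by gender, birth date, and ZIP code")
```

In real population-scale datasets, the same counting exercise is what underlies the re-identification rates reported above.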

CAI takes on heightened significance in the context of artificial intelligence (AI) deployment, as these systems enable both data collection and the use of advanced inference models to analyze datasets and produce predictions, insights, and assumptions that reveal patterns or relationships not directly evident in the data. Some AI systems can allow the intentional or unintentional reidentification of supposedly anonymized private data. These capabilities raise questions about privacy, consent, and the potential for unprecedented levels of personal information aggregation and analysis, challenging existing data protection frameworks and individual rights.

The United States federal government is one of the largest customers of commercial data brokers. Government entities increasingly use CAI to empower public programs, enabling federal agencies to augment decision-making, policy development, and resource allocation and enrich research and innovation goals with large yet granular datasets. For example, the National Institutes of Health have discussed within their data strategies how to incorporate commercially available data into research projects. The use of commercially available electronic health records is essential for understanding social inequalities within the healthcare system but includes sensitive personal data that must be protected. 

However, government agencies face significant public scrutiny over their use of CAI in areas including law enforcement, homeland security, immigration, and tax administration. This scrutiny stems from concerns about privacy violations, algorithmic bias, and the risks of invasive surveillance, profiling, and discriminatory enforcement practices that could disproportionately harm vulnerable populations.  For example, federal agencies like Immigration and Customs Enforcement (ICE) and Customs and Border Protection (CBP) have used broker-purchased location data to track individuals without warrants, raising constitutional concerns. 

In 2020, the American Civil Liberties Union filed a Freedom of Information Act lawsuit against several Department of Homeland Security (DHS) agencies, arguing that the DHS’s use of cellphone data and data from smartphone apps constitutes unreasonable searches without a warrant and violates the Fourth Amendment. A report by the Electronic Frontier Foundation found that CAI was used for mass surveillance practices, including geofence warrants that query all phones in specific locations, further challenging constitutional protections. 

While the Privacy Act of 1974 covers the use of federally collected personal information by agencies, there is no explicit guidance governing federal use of third-party data. The bipartisan Fourth Amendment Is Not For Sale Act (H.R. 4639) would bar certain technology providers, such as remote computing service and electronic communication service providers, from sharing the contents of stored electronic communications with anyone (including government actors) and from sharing customer records with government agencies. The bill passed the House of Representatives in the 118th Congress but, as of December 2024, has yet to pass the Senate. Without protections in statute, it is imperative that the federal government craft clear guidance on the use of CAI containing PII in AI systems. In this response to the Office of Management and Budget’s (OMB) request for information, FAS outlines three policy ideas that can improve how federal agencies navigate the use of CAI containing PII, including in agency use of AI.

Summary of Recommendations

The federal government is responsible for ensuring the safety and privacy of the processing of personally identifiable information within commercially available information used for the development and deployment of artificial intelligence systems. For this RFI, FAS brings three proposals to increase government capacity in ensuring transparency and risk mitigation in how CAI containing PII is used, including in agency use of AI: 

  1. Enable FedRAMP to Create an Authorization System for Third-Party Data Sources: An authorization framework for CAI containing PII would ensure a standardized approach for data collection, management, and contracting, mitigating risks, and ensuring ethical data use.
  2. Expand Existing Privacy Impact Assessments (PIA) to Incorporate Additional Requirements and Periodic Evaluations: Regular public reports on CAI sources and usage will enable stakeholders to monitor federal data practices effectively.
  3. Build Government Capacity for the Use of Privacy Enhancing Technologies to Bolster Anonymization Techniques by harnessing existing resources such as the United States Digital Service (USDS). 

Recommendation 1. Enable FedRAMP to Create an Authorization System for Third-Party Data Sources

Government agencies utilizing CAI should implement a pre-evaluation process before acquiring large datasets to ensure privacy and security. OMB, along with other agencies that are a part of the governing board of the Federal Risk and Authorization Management Program (FedRAMP), should direct FedRAMP to create an authorization framework for third-party data sources that contract with government agencies, especially data brokers that provide CAI with PII, to ensure that these vendors comply with privacy and security requirements. FedRAMP is uniquely positioned for this task because of its previous mandate to ensure the safety of cloud service providers used by the federal government and its recent expansion of this mandate to standardize AI technologies. The program could additionally harmonize its new CAI requirements with its forthcoming AI authorization framework.

When designing the content of the CAI authorization, a useful benchmark in terms of evaluation criteria is the Ag Data Transparent (ADT) certification process. Companies applying for this certification must submit contracts and respond to 11 data collection, usage, and sharing questions. Like the FedRAMP authorization process, a third-party administrator reviews these materials for consistency, granting the ADT seal only if the company’s practices align with its contracts. Any discrepancies must be corrected, promoting transparency and protecting farmers’ data rights. The ADT is a voluntary certification, and therefore does not provide a good model for enforcement. However, it does provide a framework for the kind of documentation that should be required. The CAI authorization should thus include the following information required by the ADT certification process:

Unlike the ADT, a FedRAMP authorization process can be strictly enforced. FedRAMP is mandatory for all cloud service providers working with the executive branch and follows a detailed authorization process with evaluations and third-party auditors. It would be valuable to bring that assessment rigor to federal agency use of CAI, and would help provide clarity to commercial vendors. 

The authorization framework should also document the following specific protocols for the use of CAI within AI systems:

By setting these standards, this authorization could help agencies understand privacy risks and ensure the reliability of CAI data vendors before deploying purchased datasets within AI systems or other information systems, therefore setting them up to create appropriate mitigation strategies. 

By encouraging data brokers to follow best practices, this recommendation would allow agencies to focus on authorized datasets that meet privacy and security standards. Public availability of this information could drive market-wide improvements in data governance and elevate trust in responsible data usage. This approach would support ethical data governance in AI projects and create a more transparent, publicly accountable framework for CAI use in government.  

Recommendation 2. Expand Privacy Impact Assessments (PIA) to Incorporate Additional Requirements and Periodic Evaluations 

Public transparency regarding the origins and details of government-acquired CAI containing PII is critical, especially given the largely unregulated nature of the data broker industry at the federal level. Privacy Impact Assessments (PIAs) are mandated under Section 208 of the 2002 E-Government Act and OMB Memo M-03-22, and can serve as a vital policy tool for ensuring such transparency. Agencies must complete PIAs at the outset of any new electronic information collection process that includes “information in identifiable form for ten or more persons.” Under direction from Executive Order 14110 on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, OMB issued a request for information in April 2024 to explore updating PIA guidance for AI-era privacy concerns, although new guidance has not yet been issued. 

To ensure that PIAs can effectively provide transparency into government practices on CAI that contains PII, we recommend that OMB provide updated guidance requiring agencies to regularly review and update their PIAs at least every three years, and also require agencies to report more comprehensive information in PIAs. We provide more details on these recommendations below.

First, OMB should guide agencies to periodically update their PIAs to ensure evolutions in agency data practices are publicly captured, which is increasingly important as data-driven AI systems are adopted by government actors and create novel privacy concerns. Under OMB Memo M-03-22, agencies must initiate or update PIAs when new privacy risks or factors emerge that affect the collection and handling of PII, including when agencies incorporate PII obtained from commercial or public sources into existing information systems. However, a public comment submitted by the Electronic Privacy Information Center (EPIC) pointed out that many agencies fail to publish and update required PIAs in a timely manner, indicating that a stricter schedule is needed to maintain accountability for PIA reporting requirements. As data privacy risks evolve through the advancement of AI systems, increased cybersecurity risks, and new legislation, it is essential that a minimum standard schedule for updating PIAs be created to ensure agencies provide the public with an up-to-date understanding of the potential risks resulting from using CAI that includes PII. For example, the European Union’s General Data Protection Regulation (Art. 35) requires PIAs to be repeated every three years.

Second, agency PIAs should report more detailed information on the CAI’s source, vendor information, contract agreements, and licensing arrangements. A frequent critique of existing PIAs is that they contain too little information to inform the public of relevant privacy harms. Such a lack of transparency risks damaging public trust in government. One model for expanded reporting frameworks for CAI containing PII is the May 2024 Policy Framework for CAI, established for the Intelligence Community (IC) by the Office of the Director of National Intelligence (ODNI). This framework requires the IC to document and report “the source of the Sensitive CAI and from whom the Sensitive CAI was accessed or collected” and “any licensing agreements and/or contract restrictions applicable to the Sensitive CAI”. OMB should incorporate these reporting practices into agency PIA requirements and explicitly require agencies to identify the CAI data vendor in order to provide insight into the source and quality of purchased data.

Many of these elements are also present in Recommendation 1, for a new FedRAMP authorization framework. However, that recommendation does not include existing agency projects using CAI or agencies that could contract CAI datasets outside of the FedRAMP authorization. Including this information within the PIA framework also allows for an iterative understanding of privacy risks throughout the lifecycle of a project using CAI. 

By obligating agencies to provide more frequent PIA updates and include additional details on the source, vendor, contract and licensing arrangements for CAI containing PII, the public gains valuable insight into how government agencies acquire, use, and manage sensitive data. These updates to PIAs would allow civil society groups, journalists, and other external stakeholders to track government data management practices over time during this critical juncture where federal uptake of AI systems is rapidly increasing.

Recommendation 3. Build Government Capacity for the Use of Privacy Enhancing Technologies to Bolster Anonymization Techniques

Privacy Enhancing Technologies (PETs) are a diverse set of tools that can be used throughout the data lifecycle to ensure privacy by design. They can also be powerful tools for ensuring that PII within CAI is adequately anonymized and secured. OMB should collect information on current agency PET usage, gather best practices, and identify deployment gaps. To address these gaps, OMB should collaborate with agencies like the USDS to establish capacity-building programs, leveraging initiatives like the proposed “Responsible Data Sharing Core” to provide expert consultations and enhance responsible data-sharing practices.

Meta’s Open Loop project identified eight types of PETs that are ripe to be deployed in AI systems, categorizing them into maturity levels, context of deployment, and limitations. One type of PET is differential privacy, a mathematical framework designed to protect individuals’ privacy in datasets by introducing controlled noise to the data. This ensures that the output of data analysis or AI models does not reveal whether a specific individual’s information is included in the dataset. The noise is calibrated to balance privacy with data utility, allowing meaningful insights to be derived without compromising personal information. Differential privacy is particularly useful in AI models that rely on large-scale data for training, as it prevents the inadvertent exposure of PII during the learning process. Within the federal government, the U.S. Census Bureau is using differential privacy to anonymize data while preserving its aggregate utility, ensuring compliance with privacy regulations and reducing re-identification within datasets.
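
As a minimal illustration of the mechanism described above, the sketch below adds Laplace noise to a counting query; the dataset, privacy budget (epsilon), and query are arbitrary choices for demonstration, not a recommended configuration.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a differentially private estimate by adding Laplace noise
    scaled to sensitivity / epsilon (smaller epsilon = stronger privacy)."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Toy example: privately answer "how many people in the dataset are over 40?"
# Adding or removing one person changes a count by at most 1, so sensitivity = 1.
ages = np.array([34, 29, 41, 57, 62, 38, 45])
true_count = int(np.sum(ages > 40))                      # exact answer: 4
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)

print(f"Exact count: {true_count}, differentially private estimate: {noisy_count:.1f}")
```

The same principle, calibrating noise to a query's sensitivity and a chosen privacy budget, is what the Census Bureau applies at much larger scale in its disclosure avoidance work.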

Scaling the use of PETs in other agencies has been referenced in several U.S. government strategy documents, such as the National Strategy to Advance Privacy-Preserving Data Sharing and Analytics, which encourages federal agencies to adopt and invest in the development of PETs, and the Executive Order (EO) on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, which calls for federal agencies to identify where they could use PETs. As a continuation of this EO, the National Science Foundation and the Department of Energy established a Research Coordination Network on PETs that will “address the barriers to widespread adoption of PETs, including regulatory considerations.”  

Although the ongoing research and development of PETs is vital to this growing field, there is an increasing need to ensure these technologies are implemented across the federal government. To kick this off, OMB should collect detailed information on how agencies currently use PETs, especially in projects that use CAI containing PII. This effort should include gathering best practices from agencies with successful PET implementations, such as the U.S. Census Bureau’s use of differential privacy. Additionally, OMB should identify gaps in PET deployment, assessing barriers such as technical capacity, funding, and awareness of relevant PETs. To address these gaps, OMB should collaborate with other federal agencies to design and implement capacity-building programs, equipping personnel with the knowledge and tools needed to integrate PETs effectively. For example, a forthcoming FAS Day One Project publication, “Increasing Responsible Data Sharing Capacity throughout Government,” seeks to harness existing government capabilities to build government capacity in deploying PETs. This proposal aims to enhance responsible data sharing in government by creating a capacity-building initiative called the “Responsible Data Sharing Core” (RDSC). Managed by the USDS, the RDSC would deploy fellows and industry experts to agencies to consult on data use and sharing decisions and to advise on which PETs are appropriate for different contexts.

Conclusion

The federal government’s increasing reliance on CAI containing PII presents significant privacy challenges. The current landscape of data procurement and AI deployment by agencies like ICE, CBP, and others raises critical concerns about potential Fourth Amendment violations, discriminatory profiling, and lack of transparency.

The ideas proposed in this memo (implementing FedRAMP authorization for data brokers, expanding privacy impact assessment requirements, and developing capacity-building programs for privacy-enhancing technologies) represent crucial first steps in addressing these systemic risks. As AI systems become increasingly integrated into government processes, maintaining a delicate balance between technological advancement and fundamental constitutional protections will be paramount to preserving individual privacy, promoting responsible adoption, and maintaining public trust.

We appreciate the opportunity to contribute to this Request for Information on Executive Branch Agency Handling of Commercially Available Information Containing Personally Identifiable Information. Please contact clangevin@fas.org if you have any questions or need additional information.

Teacher Education Clearinghouse for AI and Data Science

The next presidential administration should develop a teacher education and resource center that includes vetted, free, self-guided professional learning modules, resources to support data-based classroom activities, and instructional guides pertaining to different learning disciplines. This would provide critical support, helping teachers better understand and implement data science education and the use of AI tools in their classrooms. Initial resource topics would be: 

In addition, this resource center would develop and host free, pre-recorded, virtual training sessions to support educators and district professionals to better understand these resources and practices so they can bring them back to their contexts. This work would improve teacher practice and cut administrative burdens. A teacher education resource would lessen the digital divide and ensure that our educators are prepared to support their students in understanding how to use AI tools so that each and every student can be college and career ready and competitive at the global level. This resource center would be developed using a process similar to the What Works Clearinghouse, such that it is not endorsing a particular system or curriculum, but is providing a quality rating, based on the evidence provided. 

Challenge and Opportunity

AI is an incredible technology that has the power to revolutionize many areas, especially how educators teach and prepare the next generation to be competitive in higher education and the workforce. A recent RAND study found that education leaders see promise in using AI to adapt instructional content to the level of their students and to generate instructional materials and lesson plans. While this technology holds a wealth of promise, the field has developed so rapidly that people across the workforce do not understand how best to take advantage of AI-based technologies. One of the most crucial areas for this is education. AI-enabled tools have the potential to improve instruction, curriculum development, and assessment, but most educators have not received adequate training to feel confident using them in their pedagogy. In a Spring 2024 pilot study (Beiting-Parrish & Melville, in preparation), initial results indicated that 64.3% of educators surveyed had not had any professional development or training in how to use AI tools. In addition, more than 70% of educators surveyed felt they did not know how to pick AI tools that are safe for classroom use or how to detect biased tools. The RAND study likewise indicated that only 18% of educators reported using AI tools for classroom purposes, and approximately half of those educators used AI because they had been specifically recommended or directly provided a tool for classroom use. This suggests that educators need substantial support in choosing and deploying tools for classroom use. Providing guidance and resources to support vetting tools for safe, ethical, appropriate, and effective instruction is one of the cornerstone missions of the Department of Education. This burden should not rest on the shoulders of individual educators, who have varying levels of technical and curricular knowledge, especially veteran teachers who have been teaching for more than a decade.

If teachers themselves do not have the professional development or expertise to select and teach new technology, they cannot be expected to thoroughly prepare their students to understand emerging technologies such as AI, nor the underpinning concepts needed to make sense of them, most notably data science and statistics. As a result, students' futures are being put at risk by a nationwide lack of emphasis on data literacy. Recent scores from the National Assessment of Educational Progress (NAEP) show a sharp decline in student performance in data literacy, probability, and statistics, outpacing declines in other content areas. In 2019, the NAEP High School Transcript Study (HSTS) revealed that only 17% of students completed a course in statistics and probability, and less than 10% of high school students completed AP Statistics. The HSTS also showed that less than 1% of students completed a dedicated course in modern data science or applied data analytics in high school. Students are graduating with record-low proficiency in data, statistics, and probability, and without exposure to modern data science techniques. At the same time, AI-generated content is proliferating online; students are not building the critical thinking skills and discerning eye needed to determine what is real versus what has been AI-generated, and they are not prepared to enter the workforce in sectors that are booming. The future the nation's students will inherit is one in which experience with AI tools and Big Data will be expected of anyone who wants to be competitive in the workforce.

Whether students are not getting this content because it is not given due priority or because teachers are not comfortable teaching it, AI and Big Data are here, and our educators lack the tools to help students get ready for a world in the midst of a data revolution. Veteran educators and preservice education programs alike may not have command of the essential concepts in statistics, data literacy, or data science that would allow them to feel comfortable teaching about and using AI tools in their classes. Additionally, many standard assessment and practice tools are no longer fit for use in a world where, with the right prompting, any student can generate an A-quality paper in seconds. The rise of AI-generated content has created a new frontier in information literacy: students need to know to question the output of publicly available LLM-based tools such as ChatGPT, and to be more critical of what they see online given the rise of AI-generated deepfakes, and educators need to understand how to either incorporate these tools into their classrooms or teach about them effectively. Whether educators are ready or not, the existing Digital Divide stands to widen depending on whether they know how to help students use AI safely and effectively and have access to the resources and training to do so.

The United States finds itself at a crossroads in the global data boom. Demand in the economic marketplace, along with threats to national security from artificial intelligence and mal-, mis-, and disinformation, leaves educators facing an urgent problem in need of an immediate solution. In August 1958, Congress passed the National Defense Education Act (NDEA), emphasizing teaching and learning in science and mathematics. Passed in direct response to the launch of Sputnik, the law supplied massive funding to "insure trained manpower of sufficient quality and quantity to meet the national defense needs of the United States." The U.S. Department of Education, in partnership with the White House Office of Science and Technology Policy, must make an equally bold move now, as Congress did once before.

Plan of Action

In the years since the Space Race, one problem with STEM education persists: K-12 classrooms still teach students largely the same content; for example, the progression of high school mathematics through algebra, geometry, and trigonometry is largely unchanged. We are no longer in a race to space; we are now in a race to harness data. Data security, artificial intelligence, machine learning, and other mechanisms of our new information economy are all connected to national security, yet we do not have educators with the capacity to properly equip today's students with the skills to meet these challenges on a global scale. Without a resource center to house the urgent professional development and classroom activities America's educators are calling for, U.S. progress and leadership in spaces where AI and Big Data are being used will continue to dwindle, and our national security will remain at risk. It is past time for a new take on the NDEA, one that emphasizes modern topics in the teaching and learning of mathematics and science by way of data science, data literacy, and artificial intelligence. 

Previously, the Department of Education has created resource repositories to support the dissemination of information to the larger educational praxis and research community. One such example is the What Works Clearinghouse (WWC), a federally vetted library of educational products and empirical research that supports the larger field. The WWC was created to help cut through the noise of competing educational product claims and ensure that only high-quality tools and research were being shared. A similar dynamic is playing out now with AI and data science resources: many are available online, but much of the material is of dubious quality or even spreads erroneous information. 

To combat this, we propose creating something similar to the WWC, focused on vetted materials for educator and student learning around AI and data science: a Teacher Education Clearinghouse (TEC) housed under the Institute of Education Sciences, in partnership with the Office of Educational Technology. The WWC currently costs approximately $2,500,000 per year to run, so we anticipate a similar budget for the TEC. The resource vetting process would begin with a Request for Information (RFI) inviting educators and administrators across the field to submit high-quality materials, which would then be vetted against an evaluation framework. 

For example, the RFI might request example materials or lesson goals for the following subjects:

A framework for evaluating how useful these contributions might be for the Teacher Education Clearinghouse would consider the following principles:

Additionally, the TEC would include a series of quick-start guidebooks, broken down by topic, with resources on foundational topics such as "Introduction to AI" and "Foundational Data Science Vocabulary". 

When complete, this process would result in a national resource library housing a free series of asynchronous professional learning opportunities, classroom materials, activities, and datasets. This work could be promoted through the Department of Education as well as through the Regional Educational Laboratory program and state-level stakeholders. The professional learning would consist of prerecorded virtual trainings and related materials (e.g., slide decks, videos, and interactive lesson components). The materials would include educator-facing resources to support professional development in Big Data and AI, alongside student-facing lessons on AI literacy that teachers could use with their students. All materials would be publicly available for download on an ED-owned website, allowing educators of any district and any level of experience to access materials that improve their understanding and pedagogy. This especially benefits educators in less-resourced environments, who could still access the training they need to adequately support their students regardless of local capacity for potentially expensive training and resource acquisition. Now is the time to create such a resource center: there is currently no set of vetted, reliable resources available and accessible to the larger educator community, and teachers urgently need them to support themselves and their students in using these tools thoughtfully and safely. Successful development of this resource center would increase educator understanding of AI and data science, lifting the standing of U.S. students on international measures such as the International Computer and Information Literacy Study (ICILS) and increasing participation in STEAM fields that rely on these skills.

Conclusion

The field of education is at a turning point. The rise of AI and Big Data necessitates increased focus on these areas in the K-12 classroom, yet most educators do not have the preparation needed to teach these topics well enough to fully prepare their students. For the United States to remain a competitive global power in technology and innovation, we need a workforce that understands how to use, apply, and develop new innovations with AI and data science. This proposal for a library of high-quality, open-source, vetted materials would democratize professional development for all educators and their students.

Modernizing AI Fairness Analysis in Education Contexts

The 2022 release of ChatGPT and subsequent foundation models sparked a generative AI (GenAI) explosion in American society, driving rapid adoption of AI-powered tools in schools, colleges, and universities nationwide. Education technology was one of the first applications used to develop and test ChatGPT in a real-world context. A recent national survey indicated that nearly 50% of teachers, students, and parents use GenAI chatbots in school, and over 66% of parents and teachers believe that GenAI chatbots can help students learn more and faster. While this innovation is exciting and holds tremendous promise to personalize education, educators, families, and researchers are concerned that AI-powered solutions may not be equally useful, accurate, and effective for all students, in particular students from minoritized populations. Bias may diminish as the technology matures; however, to ensure that students are not harmed as these tools become more widespread, it is critical for the Department of Education to provide guidance that helps education decision-makers evaluate AI solutions during procurement, supports EdTech developers in detecting and mitigating bias in their applications, and develops new fairness methods so that these solutions serve the students with the most to gain from our educational systems. Creating this guidance will require the Department of Education to declare the issue a priority and to resource an independent organization with the expertise needed to deliver these services.  

Challenge and Opportunity

Known Bias and Potential Harm

There are many examples of AI-based systems introducing more bias into an already-biased system. One example with widely varying results across student groups is the use of GenAI tools to detect AI-generated text as a form of plagiarism. Liang et al. found that several GPT-based plagiarism checkers frequently flagged the writing of students for whom English is not their first language as AI-generated, even though their work was written before ChatGPT was available; the same errors did not occur for text written by native English speakers. By contrast, Jiang (2024) found no bias against non-native English speakers when distinguishing human-authored essays from ChatGPT-generated essays written in response to analytical writing prompts from the GRE, an example of how thoughtful AI tool design and representative sampling in the training set can achieve fairer outcomes and mitigate bias. 

Beyond bias, researchers have raised concerns about the overall efficacy of these tools for all students; however, a better understanding of differing results for subpopulations and potential instances of bias is a critical part of deciding whether these tools should be used in classrooms. For AI-based tools to be usable in high-stakes educational contexts such as testing, detecting and mitigating bias is essential, particularly when the consequences of an error are severe, and especially for students from minoritized populations who may not have the resources to recover from one (e.g., failing a course or being prevented from graduating). 

Another example of algorithmic bias, predating the widespread emergence of GenAI, illustrates the potential harms: the Wisconsin Dropout Early Warning System. This AI-based tool was designed to flag students at risk of dropping out of school; however, an analysis of its predictions found that the system disproportionately flagged African American and Hispanic students as likely to drop out when most of these students were not in fact at risk. When teachers learn that one of their students has been flagged, it may change how they approach that student, which can lead to further negative treatment and consequences, creating a self-fulfilling prophecy and denying that student the educational opportunities and confidence they deserve. These are only two of many examples of the consequences of using systems with underlying bias, and they demonstrate how critical it is to conduct fairness analysis before these systems are used with actual students. 

Existing Guidance on Fair AI & Standards for Education Technology Applications

Guidance for Education Technology Applications

Given the harms that algorithmic bias can cause in educational settings, there is an opportunity to provide national guidelines and best practices that help educators avoid these harms. The Department of Education is already responsible for protecting student privacy and provides guidelines via the Every Student Succeeds Act (ESSA) Evidence Levels to evaluate the quality of EdTech solution evidence. The Office of Educational Technology, with support from a private nonprofit organization (Digital Promise), has developed guidance documents for teachers and administrators and another for education technology developers (U.S. Department of Education, 2023, 2024). In particular, "Designing for Education with Artificial Intelligence" includes guidance for EdTech developers, including an entire section called "Advancing Equity and Protecting Civil Rights" that describes algorithmic bias and advises that "Developers should proactively and continuously test AI products or services in education to mitigate the risk of algorithmic discrimination" (p. 28). While this is a good overall guideline, the document is critically insufficient to help developers actually conduct these tests.

Similarly, the National Institute of Standards and Technology has released a publication on identifying and managing bias in AI. While this publication highlights some areas of the development process and several fairness metrics, it does not provide specific guidelines for using those metrics, nor is it exhaustive. Finally, demonstrating the interest of industry partners, the EDSAFE AI Alliance, a philanthropically funded alliance representing a diverse group of educational technology companies, has created guidance in the form of the 2024 SAFE (Safety, Accountability, Fairness, and Efficacy) Framework. Within the Fairness section of the framework, the authors highlight the importance of using fair training data, monitoring for bias, and ensuring accessibility of any AI-based tool. But again, this framework does not provide specific actions that education administrators, teachers, or EdTech developers can take to ensure these tools are fair and not biased against specific populations. The risk to these populations, together with the limits of existing efforts, demonstrates the need for new approaches that can be used in the field. 

Fairness in Education Measurement

As AI becomes increasingly used in education, the field of educational measurement has begun creating a set of analytic approaches for detecting algorithmic bias, many of which are based on existing approaches to uncovering bias in educational testing. One common tool is Differential Item Functioning (DIF) analysis, which checks that test questions are fair for all students regardless of their background; for example, it verifies that native English speakers and students learning English have an equal chance of succeeding on a question if they have the same level of knowledge. When differences are found, it indicates that a student's performance on that question is not driven solely by their knowledge of the content. 
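
To make the logic of a DIF screen concrete, the sketch below runs a logistic-regression DIF check on simulated item responses: item correctness is modeled as a function of overall ability and group membership, and a significant group coefficient flags the item for expert review. This is an illustrative sketch only; the data, group labels, and model are hypothetical, and operational DIF analyses follow established psychometric procedures (e.g., Mantel-Haenszel or IRT-based methods).

```python
# Minimal, illustrative logistic-regression DIF screen on simulated data.
# All values (ability scores, group labels, effect sizes) are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(size=n)         # total test score (proxy for knowledge)
group = rng.integers(0, 2, size=n)   # 0 = reference group, 1 = focal group

# Simulate an item that is slightly harder for the focal group at equal ability
logit = 0.9 * ability - 0.4 * group
correct = rng.random(n) < 1 / (1 + np.exp(-logit))

# Uniform DIF check: does group membership predict correctness after
# conditioning on ability?
X = sm.add_constant(np.column_stack([ability, group]))
fit = sm.Logit(correct.astype(int), X).fit(disp=False)
group_coef, group_p = fit.params[2], fit.pvalues[2]

print(f"group effect = {group_coef:.2f}, p = {group_p:.4f}")
# A statistically and practically significant group effect would flag the item
# for expert review -- the statistics alone do not establish bias.
```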

While DIF checks have been used for several decades as a best practice in standardized testing, a comparable process in the use of AI for assessment purposes does not yet exist. There also is little historical precedent indicating that for-profit educational companies will self-govern and self-regulate without a larger set of guidelines and expectations from a governing body, such as the federal government. 

We are at a critical juncture as school districts begin adopting AI tools with minimal guidance or guardrails, and all signs point to increased use of AI in education. The U.S. Department of Education has an opportunity to take a proactive approach to ensuring AI fairness through strategic programs of support for school leadership, developers in educational technology, and experts in the field. The federal government should bring all educational stakeholders under a common vision for AI fairness while adoption of AI for educational use is still in its early stages. 

Plan of Action 

To address this situation, the Department of Education’s Office of the Chief Data Officer should lead development of a national resource that provides direct technical assistance to school leadership, supports software developers and vendors of AI tools in creating quality tech, and invests resources to create solutions that can be used by both school leaders and application developers. This office is already responsible for data management and asset policies, and provides resources on grants and artificial intelligence for the field. The implementation of these resources would likely be carried out via grants to external actors with sufficient technical expertise, given the rapid pace of innovation in the private and academic research sectors. Leading the effort from this office ensures that these advances are answering the most important questions and can integrate them into policy standards and requirements for education solutions. Congress should allocate additional funding to the Department of Education to support the development of a technical assistance program for school districts, establish new grants for fairness evaluation tools that span the full development lifecycle, and pursue an R&D agenda for AI fairness in education. While it is hard to provide an exact estimate, similar existing programs currently cost the Department of Education between $4 and $30 million a year. 

Action 1. The Department of Education Should Provide Independent Support for School Leadership Through a Fair AI Technical Assistance Center (FAIR-AI-TAC) 

School administrators are hearing about the promise and concerns of AI solutions in the popular press, from parents, and from students. They are also being bombarded by education technology providers with new applications of AI within existing tools and through new solutions. 

These busy school leaders do not have time to learn the details of AI and bias analysis, nor do they have the technical background required to conduct deep technical evaluations of fairness within AI applications. Leaders are forced to either reject these innovations or implement them and expose their students to significant potential risk with the promise of improved learning. This is not an acceptable status quo.  

To address these issues, the Department of Education should create an AI Technical Assistance Center (the Center) that is tasked with providing direct guidance to state and local education leaders who want to incorporate AI tools fairly and effectively. The Center should be staffed by a team of professionals with expertise in data science, data safety, ethics, education, and AI system evaluation. Additionally, the Center should operate independently of AI tool vendors to maintain objectivity.

There is precedent for this type of technical support. The U.S. Department of Education’s Privacy Technical Assistance Center (PTAC) provides guidance related to data privacy and security procedures and processes to meet FERPA guidelines; they operate a help desk via phone or email, develop training materials for broad use, and provide targeted training and technical assistance for leaders. A similar kind of center could be stood up to support leaders in education who need support evaluating proposed policy or procurement decisions.  

This Center should provide a structured consulting service offering different levels of expertise based on each stakeholder's needs and the potential impact on learners of the system or tool being evaluated; this should range from basic AI literacy to active support in choosing technological solutions for educational purposes. The Center should partner with external organizations to develop a certification system for high-quality AI educational tools that have passed a series of fairness checks. A fairness certification (operationalized by third-party evaluators) would make it much easier for school leaders to recognize and adopt fair AI solutions that meet student needs. 

Action 2. The Department of Education Should Provide Expert Services, Data, and Grants for EdTech Developers 

There are many educational technology developers with AI-powered innovations. Even when well-intentioned, some of these tools do not achieve their desired impacts or may be unintentionally unsafe due to a lack of processes and tests for fairness and safety.

Educational Technology developers generally operate under significant constraints when incorporating AI models into their tools and applications. Student data is often highly detailed and deeply personal, potentially containing financial, disability, and educational status information that is currently protected by FERPA, which makes it unavailable for use in AI model training or testing. 

Developers need safe, legal, and quality datasets that they can use for testing for bias, as well as appropriate bias evaluation tools. There are several promising examples of these types of applications and new approaches to data security, such as the recently awarded NSF SafeInsights project, which allows analysis without disclosing the underlying data. In addition, philanthropically-funded organizations such as the Allen Institute for AI have released LLM evaluation tools that could be adapted and provided to Education Technology developers for testing. A vetted set of evaluation tools, along with more detailed technical resources and instructions for how to use them would encourage developers to incorporate bias evaluations early and often. Currently, there are very few market incentives or existing requirements that push developers to invest the necessary time or resources into this type of fairness analysis. Thus, the government has a key role to play here.

The Department of Education should also fund a new grant program that tasks grantees with developing a robust and independently validated third-party evaluation system that checks for fairness violations and biases throughout the model development process from pre-processing of data, to the actual AI use, to testing after AI results are created. This approach would support developers in ensuring that the tools they are publishing meet an agreed-upon minimum threshold for safe and fair use and could provide additional justification for the adoption of AI tools by school administrators.

Action 3. The Department of Education Should Develop Better Fairness R&D Tools with Researchers 

There is still no consensus on best practices for how to ensure that AI tools are fair. As AI capabilities evolve, the field needs an ongoing vetted set of analyses and approaches that will ensure that any tools being used in an educational context are safe and fair for use with no unintended consequences.

The Department of Education should lead the creation of a working group or task force composed of subject matter experts from education, educational technology, educational measurement, and the larger AI field to identify the state of the art in existing fairness approaches for education technology and assessment applications, with a focus on modernized conceptions of identity. This task force would be an inter-organizational group including representatives from several federal offices, such as the Office of Educational Technology and the Chief Data Office, as well as prominent experts from industry and academia. An initial convening could be held alongside leading national conferences that already attract thousands of attendees conducting cutting-edge education research (such as the American Educational Research Association and the National Council on Measurement in Education).

The working group's mandate should include creating a set of recommendations for federal funding to advance research on evaluating AI educational tools for fairness and efficacy. This research agenda would likely span multiple agencies, including NIST, the Institute of Education Sciences (IES) of the U.S. Department of Education, and the National Science Foundation. There are existing models for funding early-stage research and development with applied approaches, including the IES "Accelerate, Transform, Scale" programs, which integrate learning sciences theory with efforts to scale it through applied education technology, and generative AI research centers that have the infrastructure and mandates to conduct this type of applied research. 

Additionally, the working group should recommend the selection of a specialized group of researchers who would contribute ongoing research into new empirically based approaches to AI fairness for use by the larger field. This work might include developing new datasets that deliberately probe for instances of bias and stereotypes, such as the CrowS-Pairs dataset. It might build on current cutting-edge research into the specific contributions of variables and elements of LLMs that directly produce biased AI scores, such as work being done by the AI company Anthropic. It might compare different foundation LLMs and demonstrate specific areas of bias in their output. It might also take the form of collaborative efforts between organizations, such as the development of the RSM-Tool, which looks for biased scoring, or an improved auditing tool for any portion of the model development pipeline. In general, the field does not yet have a set of universally agreed-upon, actionable tools and approaches that can be used across contexts and applications; this research team would help create them.

Finally, the working group should recommend policies and standards that would incentivize vendors and developers working on AI education tools to adopt fairness evaluations and share their results.

Conclusion

As AI-based tools continue to be used for educational purposes, there is an urgent need to develop new approaches to evaluating the fairness of these solutions that include modern conceptions of student belonging and identity. This effort should be led by the Department of Education, through the Office of the Chief Data Officer, given the technical nature of the services and the relationship with sensitive data sources. While the Chief Data Officer should provide direction and leadership for the project, partnering with external organizations through federal grant processes would provide the capacity needed to fulfill the mandate described in this memo. As we move into an age of widespread AI adoption, AI tools for education will be increasingly used in classrooms and in homes. It is therefore imperative that robust fairness approaches are deployed before a new tool is used, to protect our students and to protect developers and administrators from potential litigation, loss of reputation, and other negative outcomes.

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable and safe future that we all hope for whoever takes office in 2025 and beyond.

Frequently Asked Questions
What are some examples of what is currently being done to ensure fairness in AI applications for educational purposes?

When AI is used to grade student work, fairness is evaluated by comparing the scores assigned by AI to those assigned by human graders across different demographic groups. This is often done using statistical metrics, such as the standardized mean difference (SMD), to detect any additional bias introduced by the AI. A common benchmark is an SMD of 0.15; values above this threshold suggest potential machine bias relative to human scores. However, there is a need for more guidance on how to address cases where SMD values exceed this threshold.


In addition to SMD, other metrics like exact agreement, exact + adjacent agreement, correlation, and Quadratic Weighted Kappa are often used to assess the consistency and alignment between human and AI-generated scores. While these methods provide valuable insights, further research is needed to ensure these metrics are robust, resistant to manipulation, and appropriately tailored to specific use cases, data types, and varying levels of importance.
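
As an illustration of how these checks might be computed in practice, the sketch below calculates a subgroup standardized mean difference (using one common pooled-variance formulation) and quadratic weighted kappa on fabricated human and AI scores. The score scale, group labels, and data are hypothetical; only the 0.15 flag threshold comes from the discussion above.

```python
# Illustrative only: compares AI scores to human scores with two of the
# metrics discussed above, on fabricated data.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
human = rng.integers(0, 5, size=500)                       # human ratings on a 0-4 scale
ai = np.clip(human + rng.integers(-1, 2, size=500), 0, 4)  # noisy AI ratings
group = rng.integers(0, 2, size=500)                       # 1 = focal subgroup

def smd(ai_scores, human_scores):
    """One common SMD formulation: mean gap over the pooled standard deviation."""
    pooled_sd = np.sqrt((ai_scores.std(ddof=1) ** 2 + human_scores.std(ddof=1) ** 2) / 2)
    return (ai_scores.mean() - human_scores.mean()) / pooled_sd

smd_focal = smd(ai[group == 1], human[group == 1])
qwk = cohen_kappa_score(human, ai, weights="quadratic")

print(f"focal-group SMD = {smd_focal:.3f}  (common flag threshold: |SMD| > 0.15)")
print(f"quadratic weighted kappa = {qwk:.3f}")
```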

What are some concerns about using AI in education for students with diverse and overlapping identities?

Existing approaches to demographic post hoc fairness analysis assume two discrete populations that can be compared: for example, students from African-American families vs. students not from African-American families, students from an English language learner family background vs. students who are not, and other known family characteristics. In practice, however, people do not experience these discrete identities. Since at least the 1980s, contemporary sociological theories have emphasized that a person's identity is contextual, hybrid, and fluid. One current approach to identity that integrates equity concerns and has been applied to AI is "intersectional identity" theory. This approach has begun to yield promising new methods that bring contemporary conceptions of identity into automated fairness evaluation of AI. Measuring every interaction between demographic variables produces subgroups too small for reliable analysis; these interactions therefore need to be prioritized using theory, design principles, or more advanced statistical techniques (e.g., dimensionality reduction).
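
The sample-size problem described above can be illustrated with a short sketch: crossing even three demographic variables produces many intersectional cells, only some of which are large enough to evaluate reliably. All data, variable names, and the minimum-cell-size threshold below are hypothetical.

```python
# Illustrative sketch of intersectional subgroup analysis and its sample-size limits.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "race": rng.choice(["A", "B", "C", "D"], size=n, p=[0.5, 0.25, 0.15, 0.1]),
    "ell": rng.choice(["ELL", "non-ELL"], size=n, p=[0.1, 0.9]),
    "disability": rng.choice(["IEP", "no IEP"], size=n, p=[0.15, 0.85]),
    "score_gap": rng.normal(0, 1, size=n),   # stand-in for an AI-vs-human score gap
})

MIN_N = 30  # arbitrary reliability floor for this sketch
cells = df.groupby(["race", "ell", "disability"])["score_gap"].agg(["size", "mean"])
evaluable = cells[cells["size"] >= MIN_N]

print(f"{len(cells)} intersectional cells, {len(evaluable)} with n >= {MIN_N}")
print(evaluable.sort_values("mean").head())
```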

Driving Equitable Healthcare Innovations through an AI for Medicaid (AIM) Initiative

Artificial intelligence (AI) has transformative potential in the public health space. In an era when millions of Americans have limited access to high-quality healthcare services, AI-based tools and applications can enable remote diagnostics, drive efficiencies in the implementation of public health interventions, and support clinical decision-making in low-resource settings. However, innovation driven primarily by the private sector today may be exacerbating existing disparities by training models on homogeneous datasets and building tools that primarily benefit high socioeconomic status (SES) populations.

To address this gap, the Center for Medicare and Medicaid Innovation (CMMI) should create an AI for Medicaid (AIM) Initiative to distribute competitive grants to state Medicaid programs (in partnership with the private sector) for pilot AI solutions that lower costs and improve care delivery for rural and low-income populations covered by Medicaid. 

Challenge & Opportunity

In 2022, the United States spent $4.5 trillion on healthcare, accounting for 17.3% of GDP. Despite spending far more on healthcare per capita than other high-income countries, the United States has significantly worse outcomes, including lower life expectancy, higher death rates from avoidable causes, and less access to healthcare services. Further, the 80 million low-income Americans who rely on state-administered Medicaid programs often have below-average health outcomes and the least access to healthcare services. 

AI has the potential to transform the healthcare system, but innovation driven solely by the private sector risks exacerbating the inequities described above. Algorithms are often trained on datasets that do not represent the underlying population; in many cases, these training biases yield tools and models that perform poorly for racial minorities, people living with comorbidities, and people of low SES. For example, until January 2023, the model used to prioritize patients for kidney transplants systematically ranked Black patients lower than White patients; the race component was identified and removed only after advocacy efforts within the medical community. AI models, while significantly more powerful than traditional predictive algorithms, are also more difficult to understand and engineer, making them more likely to perpetuate such biases. 

Additionally, startups innovating in the digital health space today are not incentivized to develop solutions for marginalized populations. For example, in FY 2022, the top 10 startups focused on Medicaid received only $1.5B in private funding, while their Medicare Advantage (MA)-focused counterparts received over $20B. Medicaid's lower margins are not attractive to investors, so digital health development targets populations that are already well insured and have greater access to care.

The Federal Government is uniquely positioned to bridge the incentive gap between developers of AI-based tools in the private sector and American communities who would benefit most from said tools. Accordingly, the Center for Medicare and Medicaid Innovation (CMMI) should launch the AI for Medicaid (AIM) Initiative to incentivize and pilot novel AI healthcare tools and solutions targeting Medicaid recipients. Precedents in other countries demonstrate early success in state incentives unlocking health AI innovations – in 2023, the United Kingdom’s National Health Service (NHS) partnered with Deep Medical to pilot AI software that streamlines services by predicting and mitigating missed appointment risk. The successful pilot is now being adopted more broadly and is projected to save the NHS over $30M annually in the coming years. 

The AIM Initiative, guided by the structure of the former Medicaid Innovation Accelerator Program (IAP), President Biden’s executive order on integrating equity into AI development, and HHS’ Equity Plan (2022), will encourage the private sector to partner with State Medicaid programs on solutions that benefit rural and low-income Americans covered by Medicaid and drive efficiencies in the overall healthcare system. 

Plan of Action

CMMI will launch and operate the AIM Initiative within the Department of Health and Human Services (HHS). $20M of HHS’ annual budget request will be allocated towards the program. State Medicaid programs, in partnership with the private sector, will be invited to submit proposals for competitive grants. In addition to funding, CMMI will leverage the former structure of the Medicaid IAP program to provide state Medicaid agencies with technical assistance throughout their participation in the AIM Initiative. The programs ultimately selected for pilot funding will be monitored and evaluated for broader implementation in the future. 

Sample Detailed Timeline

Risks and Limitations

Conclusion

The AI for Medicaid Initiative is an important step in ensuring the promise of artificial intelligence in healthcare extends to all Americans. The initiative will enable the piloting of a range of solutions at a relatively low cost, engage with stakeholders across the public and private sectors, and position the United States as a leader in healthcare AI technologies. Leveraging state incentives to address a critical market failure in the digital health space can additionally unlock significant efficiencies within the Medicaid program and the broader healthcare system. The rural and low-income Americans reliant on Medicaid have too often been an afterthought in access to healthcare services and technologies – the AIM Initiative provides an opportunity to address this health equity gap.

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable and safe future that we all hope for whoever takes office in 2025 and beyond.

Accelerating Materials Science with AI and Robotics

Innovations in materials science enable innumerable downstream innovations: steel enabled skyscrapers, and novel configurations of silicon enabled microelectronics. Yet progress in materials science has slowed in recent years. Fundamentally, this is because there is a vast universe of potential materials, and the only way to discover which among them are most useful is to experiment. Today, those experiments are largely conducted by hand. Innovations in artificial intelligence and robotics will allow us to accelerate the search process using foundation AI models for science research and automate much of the experimentation with robotic, self-driving labs. This policy memo recommends the Department of Energy (DOE) lead this effort because of its unique expertise in supercomputing, AI, and its large network of National Labs. 

Challenge and Opportunity

Take a look at your smartphone. How long does its battery last? How durable is its frame? How tough is its screen? How fast and efficient are the chips inside it?

Each of these questions implicates materials science in fundamental ways. The limits of our technological capabilities are defined by the limits of what we can build, and what we can build is defined by what materials we have at our disposal. The early eras of human history are named for materials: the Stone Age, the Bronze Age, the Iron Age. Even today, the cradle of American innovation is Silicon Valley, a reminder that even our digital era is enabled by finding innovative ways to assemble matter to accomplish novel things. 

Materials science has been a driver of economic growth and innovation for decades. Improvements to silicon purification and processing—painstakingly worked on in labs for decades—fundamentally enabled silicon-based semiconductors, a $600 billion industry today that McKinsey recently projected would double in size by 2030. The entire digital economy, conservatively estimated by the Bureau of Economic Analysis (BEA) at $3.7 trillion in the U.S. alone, in turn, rests on semiconductors. Plastics, another profound materials science innovation, are estimated to have generated more than $500 billion in economic value in the U.S. last year. The quantitative benefits are staggering, but even qualitatively, it is impossible to imagine modern life without these materials. 

However, present-day materials are beginning to show their age. We need better batteries to accelerate the transition to clean energy. We may be approaching the limits of traditional methods of manufacturing semiconductors in the next decade. We require exotic new forms of magnets to bring technologies like nuclear fusion to life. We need materials with better thermal properties to improve spacecraft. 

Yet materials science and engineering—the disciplines of discovering and learning to use new materials—have slowed down in recent decades. The low-hanging fruit has been plucked, and the easy discoveries are old news. We’re approaching the limits of what our materials can do because we are also approaching the limits of what the traditional practice of materials science can do. 

Today, materials science proceeds at much the same pace as it did half a century ago: manually, with small academic labs and graduate students formulating potential new combinations of elements, synthesizing those combinations, and studying their characteristics. Because there are more ways to configure matter than there are atoms in the universe, manually searching through the space of possible materials is an impossible task. 

Fortunately, AI and robotics present an opportunity to automate that process. AI foundation models for physics and chemistry can be used to simulate potential materials with unprecedented speed and low cost compared to traditional ab initio methods. Robotic labs (also known as “self-driving labs”) can automate the manual process of performing experiments, allowing scientists to synthesize, validate, and characterize new materials twenty-four hours a day at dramatically lower costs. The experiments will generate valuable data for further refining the foundation models, resulting in a positive feedback loop. AI language models like OpenAI’s GPT-4 can write summaries of experimental results and even help ideate new experiments. The scientists and their grad students, freed from this manual and often tedious labor, can do what humans do best: think creatively and imaginatively. 

Achieving this goal will require a coordinated effort, significant investment, and expertise at the frontiers of science and engineering. Because much of materials science is basic R&D—too far from commercialization to attract private investment—there is a unique opportunity for the federal government to lead the way. As with much scientific R&D, the economic benefits of new materials science discoveries may take time to emerge. One literature review estimated that it can take roughly 20 years for basic research to translate to economic growth. Research indicates that the returns—once they materialize—are significant. A study from the Federal Reserve Bank of Dallas suggests a return of 150-300% on federal R&D spending. 

The best-positioned department within the federal government to coordinate this effort is the DOE, which has many of the key ingredients in place: a demonstrated track record of building and maintaining the supercomputing facilities required to make physics-based AI models, unparalleled scientific datasets with which to train those models collected over decades of work by national labs and other DOE facilities, and a skilled scientific and engineering workforce capable of bringing challenging projects to fruition.  

Plan of Action

Achieving the goal of using AI and robotics to simulate potential materials with unprecedented speed and low cost, and benefit from the discoveries, rests on five key pillars: 

  1. Creating large physics and chemistry datasets for foundation model training (estimated cost: $100 million);
  2. Developing foundation AI models for materials science discovery, either independently or in collaboration with the private sector (estimated cost: $10-100 million, depending on the nature of the collaboration);
  3. Building 1-2 pilot self-driving labs (SDLs) aimed at establishing best practices, building a supply chain for robotics and other equipment, and validating the scientific merit of SDLs (estimated cost: $20-40 million);
  4. Making self-driving labs an official priority of the DOE’s preexisting FASST initiative (described below);
  5. Directing the DOE’s new Foundation for Energy Security and Innovation (FESI) to prioritize establishing fellowships and public-private partnerships to support items (1) and (2), both financially and with human capital.

The total cost of the proposal, then, is estimated at between $130-240 million. The potential return on this investment, though, is far higher. Moderate improvements to battery materials could drive tens or hundreds of billions of dollars in value. Discovery of a “holy grail” material, such as a room-temperature, ambient-pressure superconductor, could create trillions of dollars in value. 

Creating Materials Science Foundation Model Datasets

Before a large materials science foundation model can be trained, vast datasets must be assembled. DOE, through its large network of scientific facilities including particle colliders, observatories, supercomputers, and other experimental sites, collects enormous quantities of data, but this, unfortunately, is only the beginning. DOE's data infrastructure is out of date and fragmented across different user facilities. Data access and retention policies make sharing and combining different datasets difficult or impossible. 

All of these policy and infrastructural decisions were made far before training large-scale foundation models was a priority. They will have to be changed to capitalize on the newfound opportunity of AI. Existing DOE data will have to be reorganized into formats and within technical infrastructure suited to training foundation models. In some cases, data access and retention policies will need to be relaxed or otherwise modified. 

In other cases, however, highly sensitive data will need to be integrated in more sophisticated ways. A 2023 DOE report, recognizing the problems with DOE data infrastructure, suggests developing federated learning capabilities, an active area of research in the broader machine learning community, which would allow data to be used for training without being shared. This would, the report argues, "allow access and connections to the information through access control processes that are developed explicitly for multilevel privacy."
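
As a rough illustration of the federated learning concept referenced in the report, the sketch below runs federated averaging over three simulated "sites": each site fits a local update on data that never leaves it, and only model parameters are aggregated. This is a toy linear model, not DOE infrastructure, and it omits the access-control and privacy machinery (secure aggregation, differential privacy) that a real multilevel-privacy system would require.

```python
# Minimal federated-averaging sketch: model updates move between sites, raw data does not.
import numpy as np

rng = np.random.default_rng(3)
true_w = np.array([2.0, -1.0, 0.5])

def make_site_data(n):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

sites = [make_site_data(n) for n in (200, 500, 300)]  # three facilities, data stays local
w = np.zeros(3)                                       # shared global model

for round_num in range(20):
    local_ws, weights = [], []
    for X, y in sites:
        w_local = w.copy()
        for _ in range(5):                            # a few local gradient steps
            grad = 2 * X.T @ (X @ w_local - y) / len(y)
            w_local -= 0.05 * grad
        local_ws.append(w_local)
        weights.append(len(y))
    # Only parameter vectors are aggregated; raw records never leave a site.
    w = np.average(local_ws, axis=0, weights=weights)

print("federated estimate:", np.round(w, 3))
```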

This work will require deep collaboration between data scientists, machine learning scientists and engineers, and domain-specific scientists. It is, by far, the least glamorous part of the process, yet it is the necessary groundwork for all progress to follow. 

Building AI Foundation Models for Science

Fundamentally, AI is a sophisticated form of statistics. Deep learning, the broad approach that has undergirded all advances in AI over the past decade, allows AI models to uncover deep patterns in extremely complex datasets, such as all the content on the internet, the genomes of millions of organisms, or the structures of thousands of proteins and other biomolecules. Models of this kind are sometimes loosely referred to as “foundation models.”  

Foundation models for materials science can take many different forms, incorporating various aspects of physics, chemistry, and even—for the emerging field of biomaterials—biology. Broadly speaking, foundation models can help materials science in two ways: inverse design and property prediction. Inverse design allows scientists to input a given set of desired characteristics (toughness, brittleness, heat resistance, electrical conductivity, etc.) and receive a prediction for what material might be able to achieve those properties. Property prediction is the opposite flow of information, inputting a given material and receiving a prediction of what properties it will have in the real world. 
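
To make the two workflows concrete, the sketch below uses a small regression model as a stand-in for a materials foundation model: property prediction maps a candidate composition to a predicted property, while a toy version of inverse design screens a pool of candidate compositions against a target property value. Compositions, the "property," and the brute-force screen are all illustrative assumptions; real systems rely on far richer representations and on generative or optimization methods.

```python
# Toy sketch of property prediction and inverse design. A random-forest surrogate
# stands in for a foundation model; compositions and "properties" are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)

# Synthetic training data: 3-component composition fractions -> one scalar property
X_train = rng.dirichlet(np.ones(3), size=500)
y_train = 2 * X_train[:, 0] - X_train[:, 1] + 0.5 * X_train[:, 2] + rng.normal(0, 0.05, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Property prediction: composition in, predicted property out
candidate = np.array([[0.6, 0.3, 0.1]])
print("predicted property:", model.predict(candidate)[0])

# Inverse design (toy version): screen candidates for a target property value
target = 1.5
pool = rng.dirichlet(np.ones(3), size=10_000)
scores = np.abs(model.predict(pool) - target)
best = pool[np.argsort(scores)[:3]]
print("top candidate compositions for target:", np.round(best, 3))
```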

DOE has already proposed creating AI foundation models for materials science as part of its Frontiers in Artificial Intelligence for Science, Security and Technology (FASST) initiative. While this initiative contains numerous other AI-related science and technology objectives, supporting it would enable the creation of new foundation models, which can in turn be used to support the broader materials science work. 

DOE’s long history of stewarding America’s national labs makes it the best-suited home for this proposal. DOE labs and other DOE sub-agencies have decades of data from particle accelerators, nuclear fusion reactors, and other specialized equipment rarely seen in other facilities. These labs have performed hundreds of thousands of experiments in physics and chemistry over their lifetimes, and over time, DOE has created standardized data collection practices. AI models are defined by the data that they are trained with, and DOE has some of the most comprehensive physics and chemistry datasets in the country—if not the world. 

The foundation models created by DOE should be made available to scientists. The extent of that availability should be determined by the sensitivity of the data used to train the model and other potential risks associated with broad availability. If, for example, a model was created using purely internal or otherwise sensitive DOE datasets, it might have to be made available only to select audiences with usage monitored; otherwise, there is a risk of exfiltrating sensitive training data. If there are no such data security concerns, DOE could choose to fully open source the models, meaning their weights and code would be available to the general public. Regardless of how the models themselves are distributed, the fruits of all research enabled by both DOE foundation models and self-driving labs should be made available to the academic community and broader public. 

Scaling Self-Driving Labs

Self-driving labs are largely automated facilities that allow robotic equipment to autonomously conduct scientific experiments with human supervision. They are well-suited to relatively simple, routine experiments—the exact kind involved in much of materials science. Recent advancements in robotics have been driven by a combination of cheaper hardware and enhanced AI models. While fully autonomous humanoid robots capable of automating arbitrary manual labor are likely years away, it is now possible to configure facilities to automate a broad range of scripted tasks. 

Many experiments in materials science involve making iterative tweaks to variables within the same broad experimental design. For example, a grad student might tweak the ratios of the elements that constitute the material, or change the temperature at which the elements are combined. These are highly automatable tasks. Furthermore, by allowing multiple experiments to be conducted in parallel, self-driving labs allow scientists to rapidly accelerate the pace at which they conduct their work. 
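
A closed-loop version of this iterative tweaking might look like the sketch below, in which a planner proposes small changes to an element ratio and a synthesis temperature, a stand-in function plays the role of the robotic synthesis-and-characterization run, and the loop keeps the best recipe found so far. The response surface, parameter names, and random-perturbation "planner" are hypothetical; an actual self-driving lab would pair real instrument control with a principled optimizer such as Bayesian optimization.

```python
# Illustrative closed-loop experiment sketch; run_experiment is a hypothetical
# stand-in for a robotic synthesis-and-characterization run.
import numpy as np

rng = np.random.default_rng(5)

def run_experiment(ratio_a, temperature_c):
    """Synthetic noisy response surface standing in for a measured figure of merit."""
    ideal = -(ratio_a - 0.35) ** 2 - ((temperature_c - 450) / 300) ** 2
    return ideal + rng.normal(scale=0.01)

best = {"ratio_a": 0.5, "temperature_c": 300.0, "merit": -np.inf}
for trial in range(50):
    # Propose a small tweak around the current best recipe
    ratio = float(np.clip(best["ratio_a"] + rng.normal(scale=0.05), 0, 1))
    temp = float(np.clip(best["temperature_c"] + rng.normal(scale=25), 100, 900))
    merit = run_experiment(ratio, temp)
    if merit > best["merit"]:
        best = {"ratio_a": ratio, "temperature_c": temp, "merit": merit}

print({k: round(v, 3) for k, v in best.items()})
```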

Creating a successful large-scale self-driving lab will require collaboration with private sector partners, particularly robot manufacturers and the creators of AI models for robotics. Fortunately, the United States has many such firms. Therefore, DOE should initiate a competitive bidding process for the robotic equipment that will be housed within its self-driving labs. Because DOE has experience in building lab facilities, it should directly oversee the construction of the self-driving lab itself. 

The United States already has several small-scale self-driving labs, primarily led by investments at DOE National Labs. The small size of these projects, however, makes it difficult to achieve the economies of scale that are necessary for self-driving labs to become an enduring part of America’s scientific ecosystem. 

AI creates additional opportunities to expand automated materials science. Frontier language and multi-modal models, such as OpenAI’s GPT-4o, Anthropic’s Claude 3.5, and Google’s Gemini family, have already been used to ideate scientific experiments, including directing a robotic lab in the fully autonomous synthesis of a known chemical compound. These models would not operate with full autonomy. Instead, scientists would direct the inquiry and the design of the experiment, with the models autonomously suggesting variables to tweak. 

Modern frontier models have substantial knowledge in all fields of science, and can hold all of the academic literature relevant to a specific niche of materials science within their active attention. This combination means that they have—when paired with a trained human—the scientific intuition to iteratively tweak an experimental design. They can also write the code necessary to direct the robots in the self-driving lab. Finally, they can write summaries of the experimental results—including the failures. This is crucial, because, given the constraints on their time, scientists today often only report their successes in published writing. Yet failures are just as important to document publicly to avoid other scientists duplicating their efforts. 

Once constructed, this self-driving lab infrastructure can be a resource made available as another DOE user facility to materials scientists across the country, much as DOE supercomputers are today. DOE already has a robust process and infrastructure in place to share in-demand resources among different scientists, again underscoring why the Department is well-positioned to lead this endeavor. 

Conclusion

Materials science faces a grand challenge, yet an even grander opportunity. Room-temperature, ambient-pressure superconductors—permitted by the laws of physics but as-yet undiscovered—could transform consumer electronics, clean energy, transportation, and even space travel. New forms of magnets could enable a wide range of cutting-edge technologies, such as nuclear fusion reactors. High-performance ceramics could improve reusable rockets and hypersonic aircraft. The opportunities are limitless. 

With a coordinated effort led by DOE, the federal government can demonstrate to Americans that scientific innovation and technological progress can still deliver profound improvements to daily life. It can pave the way for a new approach to science firmly rooted in modern technology, creating an example for other areas of science to follow. Perhaps most importantly, it can make Americans excited about the future—something that has been sorely lacking in American society in recent decades. 

AI is a radically transformative technology. Contemplating that transformation in the abstract almost inevitably leads to anxiety and fear. There are legislative proposals, white papers, speeches, blog posts, and tweets about using AI to positive ends. Yet merely talking about positive uses of AI is insufficient: the technology is ready, and the opportunities are there. Now is the time to act. 

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable and safe future that we all hope for whoever takes office in 2025 and beyond.

Frequently Asked Questions
What are the misuse or safety risks associated with self-driving labs?

Compared to “cloud labs” for biology and chemistry, the risks associated with self-driving labs for materials science are low. In a cloud lab equipped with nucleic acid synthesis machines, for example, genetic sequences need to be screened carefully to ensure that they are not dangerous pathogens—a nontrivial task. There are not analogous risks for most materials science applications.


However, given the dual-use nature of many novel materials, any self-driving lab would need to have strong cybersecurity and intellectual property protections. Scientists using self-driving lab facilities would need to be carefully screened by DOE; fortunately, DOE already has this kind of screening infrastructure in place for determining access to its supercomputing facilities.

What classes of materials would benefit most from automated synthesis and characterization?

Not all materials involve easily repeatable, and hence automatable, experiments for synthesis and characterization. But many important classes of materials do, including:

  • Thin films and coatings
  • Photonic and optoelectronic materials such as perovskites (used for solar panels)
  • Polymers and monomers
  • Battery and energy storage materials

Over time, additional classes of materials can be added.

Beyond Congressional funding, what additional resources can DOE draw on for this project?

DOE can and should be creative in finding resources beyond public funding for this project. Collaborations between DOE and private-sector AI firms, on both foundation AI models and the scaling of self-driving labs, can be uniquely facilitated by DOE’s new Foundation for Energy Security and Innovation (FESI), a private foundation created by DOE to support scientific fellowships, public-private partnerships, and other key mission-related initiatives.

Do foundation models for materials science currently exist?

Yes. Private firms have recently demonstrated the promise of such models. In late 2023, Google DeepMind unveiled GNoME, a materials science model that identified hundreds of thousands of potentially stable new materials (though these candidates still need to be experimentally validated). Microsoft’s MatterGen model pushed in a similar direction. Both efforts were developed in collaboration with DOE National Labs (Lawrence Berkeley in the case of DeepMind, and Pacific Northwest in the case of Microsoft).

America’s Teachers Innovate: A National Talent Surge for Teaching in the AI Era

Thanks to Melissa Moritz, Patricia Saenz-Armstrong, and Meghan Grady for their input on this memo.

Teaching our young children to be productive and engaged participants in our society and economy is, alongside national defense, the most essential job in our country. Yet the competitiveness and appeal of teaching in the United States have plummeted over the past decade. At least 55,000 teaching positions went unfilled this year, and long-term forecasts suggest annual shortages will double to 100,000. Moreover, teachers report little confidence both in their ability to teach the critical digital skills needed for an AI-enabled future and in the profession at large. Efforts in economic peer countries such as Canada and China demonstrate that reversing this trend is feasible. The new Administration should announce a national talent surge to identify, scale, and recruit into innovative teacher preparation models, expand teacher leadership opportunities, and boost the profession’s prestige. “America’s Teachers Innovate” is an eight-part executive action plan to be coordinated by the White House Office of Science and Technology Policy (OSTP), with implementation support through GSA’s Challenge.Gov and accompanied by new competitive priorities in existing National Science Foundation (NSF), Department of Education (ED), Department of Labor (DoL), and Department of Defense Education Activity (DoDEA) programs. 

Challenge and Opportunity 

Artificial Intelligence may add an estimated $2.6 trillion to $4.4 trillion annually to the global economy. Yet if the U.S. does not give its population the training to leverage these technologies effectively, it may watch the majority of this wealth flow to other countries over the next few decades while American workers are automated out of, rather than empowered by, AI deployment in their sectors. The students who gain the digital, data, and AI foundations to work in tandem with these systems – currently only 5% of graduating high school students in the U.S. – will fare better in a modern job market than the majority who lack them. Across both countries and communities, the AI skills gap will supercharge existing digital divides and dramatically compound economic inequality. 

China, India, Germany, Canada, and the U.K. have all made investments to dramatically reshape the student experience for the world of AI and to train teachers to educate a modern, digitally prepared workforce. While the U.S. made early research and development investments in computer science and data science education through the National Science Foundation, it has no teacher workforce ready to implement these innovations in curriculum or educational technology. The number of individuals completing a teacher preparation program has fallen 25% over the past decade; long-term forecasts suggest annual shortages of at least 100,000 teachers; teachers themselves are discouraging others from joining the profession (especially in STEM); and preparing to teach digital skills such as computer science was the least popular option for prospective educators to pursue. In 2022, even Harvard discontinued its Undergraduate Teacher Education Program entirely, citing low interest and enrollment. There is still consistent evidence that young people, and even current professionals, remain interested in teaching as a possible career; the task is to create the conditions that translate that interest into action. U.S. policymakers have a narrow window to leverage the strong interest in AI to energize the education workforce and ensure our future graduates are globally competitive on the digital frontier. 

Plan of Action 

America’s teaching profession needs a coordinated national strategy to reverse decades of decline and concurrently reinvigorate the sector for a new (and digital) industrial revolution now moving at an exponential pace. Key levers for this work include expanding the number of leadership opportunities for educators; identifying and scaling successful evidence-based models such as UTeach, residency-based programs, or National Writing Project’s peer-to-peer training sites; scaling registered apprenticeship programs or Grow Your Own programs along with the nation’s largest teacher colleges; and leveraging the platform of the President to boost recognition and prestige of the teaching profession. 

The White House Office of Science and Technology Policy (OSTP) should coordinate a set of Executive Actions within the first 100 days of the next administration, including: 

Recommendation 1. Launch a Grand Challenge for AI-Era Teacher Preparation 

Create a national challenge via www.Challenge.Gov to identify the most innovative teacher recruitment, preparation, and training programs to prepare and retain educators for teaching in the era of AI. Challenge requirements should be minimal and flexible to encourage innovation, but could include the creation of teacher leadership opportunities, peer-network sites for professionals, and digital classroom resource exchanges. A challenge prompt could replicate the model of 100Kin10 or even leverage the existing network. 

Recommendation 2. Update Areas of National Need 

To enable existing scholarship programs to support AI readiness, the U.S. Department of Education should add “Artificial Intelligence,” “Data Science,” and “Machine Learning” to the GAANN Areas of National Need under the Computer Science and Mathematics categories, expanding eligibility for master’s-level scholarships for teachers to pursue additional study in these critical areas. The number of higher education programs in data science education has increased significantly in the past five years, with a small but growing number of emerging artificial intelligence programs.  

Recommendation 3. Expand and Simplify Key Programs for Technology-Focused Training

The President should direct the U.S. Secretary of Education, the National Science Foundation Director, and the Department of Defense Education Activity Director to add “Artificial Intelligence, Data Science, Computer Science” as competitive priorities, where appropriate, for existing grant or support programs that directly influence the national direction of teacher training and preparation, including the Teacher Quality Partnership program (ED), SEED (ED), the Hawkins Program (ED), the STEM Corps (NSF), the Robert Noyce Scholarship Program (NSF), the DoDEA Professional Learning Division, and the Apprenticeship Building America grants from the U.S. Department of Labor. These terms could be added under prior “STEM” competitive priorities, such as those established for “Computer Science” by the STEM Education Acts of 2014 and 2015, and framed under “Digital Frontier Technologies.” 

Additionally, the U.S. Department of Education should increase funding allocations for programs at the ESSA “Demonstrates a Rationale” evidence tier, expanding the flexibility of existing grant programs to accommodate emerging technology proposals. Because AI systems update quickly, few applicants have the opportunity to conduct rigorous evaluation studies or randomized controlled trials (RCTs) within the timespan of an ED grant program application window.

Additionally, the National Science Foundation should relaunch the 2014 Application Burden Taskforce to identify the greatest barriers in NSF application processes, update digital review infrastructure, review or modernize application criteria to recognize present-day technology realities, and set a 2-year deadline for recommendations to be implemented agency-wide. This ensures earlier-stage projects and non-traditional applicants (e.g. nonprofits, local education agencies, individual schools) can realistically pursue NSF funding. Recommendations may include a “tiered” approach for requirements based on grant size or applying institution. 

Recommendation 4. Convene 100 Teacher Prep Programs for Action

The White House Office of Science & Technology Policy (OSTP) should host a national convening of nationally representative colleges of education and teacher preparation programs to 1) catalyze modernization efforts of program experiences and training content, and 2) develop recruitment strategies to revitalize interest in the teaching profession. A White House summit would help call attention to falling enrollment in teacher preparation programs; highlight innovative training models to recruit and retrain additional graduates; and create a deadline for states, districts, and private philanthropy to invest in teacher preparation programs. By leveraging the convening power of the White House, the Administration could make a profound impact on the teacher preparation ecosystem. 

The administration should also consider announcing additional incentives or planning grants for regional or state-level teams to 1) catalyze K-12 educator Registered Apprenticeship Program (RAP) applications to the Department of Labor and 2) modernize teacher preparation programs to incorporate introductory computer science, data science, artificial intelligence, cybersecurity, and other “digital frontier skills,” via the grant programs in Recommendation 3 or via expanded eligibility under the Higher Education Act.  

Recommendation 5. Launch a Digital “White House Data Science Fair”

Despite a bipartisan commitment to continue the annual White House Science Fair, the tradition ended in 2017. OSTP and the Committee on Science, Technology, Engineering, and Mathematics Education (CoSTEM) should resume the White House Science Fair and add a national “White House Data Science Fair,” a digital rendition of the Fair for the AI era. K-12 and undergraduate student teams would have the opportunity to submit creative or customized applications of AI tools, machine learning projects (similar to Kaggle competitions), applications of robotics, and data analysis projects centered on their own communities or on global problems (climate change, global poverty, housing, etc.), under the mentorship of K-12 teachers. Similar to the original White House Science Fair, this recognition could draw from existing student competitions that have arisen over the past few years, including in Cleveland, Seattle, and nationally via AP courses and out-of-school contexts. Partner federal agencies should be encouraged to contribute their own educational resources and datasets through FC-STEM coordination, enabling students to work on a variety of topics across domains and interests (e.g., NASA, the U.S. Census Bureau, the Bureau of Labor Statistics).

Recommendation 6. Announce a National Teacher Talent Surge at the State of the Union

The President should launch a national teacher talent surge under the banner of “America’s Teachers Innovate,” a multi-agency communications campaign to reinvigorate the teaching profession and increase the number of teachers completing undergraduate or graduate degrees each year by 100,000. This announcement would follow the First 100 Days in office, allowing Recommendations 1-5 to be implemented and/or planned. The “America’s Teachers Innovate” campaign would include:

A national commitments campaign for investing in the future of American teaching, facilitated by the White House, involving State Education Agencies (SEAs) and Governors, the 100 largest school districts, industry, and philanthropy. Many U.S. education organizations are ready to take action. Commitments could include targeted scholarships to incentivize students to enter the profession, new grant programs for summer professional learning, and restructuring teacher payroll to become salaried annual jobs instead of nine-month compensation (see Discover Bank: “Surviving the Summer Paycheck Gap”).

Expansion of the Presidential Awards for Excellence in Mathematics and Science Teaching (PAEMST) program to include Data Science, Cybersecurity, AI, and other emerging technology areas, or a renaming of the program for wider eligibility across today’s STEM umbrella. Additionally, the PAEMST program should resume in-person award ceremonies, which were discontinued during COVID disruptions and have not been offered since, rather than relying on press releases alone. Several national STEM organizations and teacher associations have requested that these events return.

Student loan relief through the Teacher Loan Forgiveness (TLF) program for teachers who commit to five or more years in the classroom. New research suggests the lifetime return on a college degree for education majors is near zero, ranking only above a degree in fine arts. Via executive order, the administration should add “computer science, data science, and artificial intelligence” to the list of “highly qualified teacher” subject areas eligible for $17,500 in loan forgiveness.

An annual recruitment drive at college campus job fairs, facilitated directly under the banner of the White House Office of Science & Technology Policy (OSTP), to raise awareness of the aforementioned programs among undergraduate students at formative career choice points.

Recommendation 7. Direct IES and BLS to Support Teacher Shortage Forecasting Infrastructure

The IES Director and the BLS Commissioner should 1) establish a special joint task force to better link existing federal data across agencies and enable cross-state collaboration on the teacher workforce, 2) support state capacity-building for interoperable teacher workforce data systems through competitive grant priorities in the State Longitudinal Data Systems (SLDS) program at IES and the Apprenticeship Building America (ABA) program (Category 1 grants), and 3) recommend a review criteria question on education workforce data and forecasting in future EDA Tech Hub phases. The vast majority of states do not currently have adequate data systems in place to track total demand (teacher vacancies), likely supply (teachers completing preparation programs), and retention and mobility (teachers leaving the profession or relocating) in near or real time. Creating estimates even for this brief was challenging and subject to uncertainty. Without this visibility into the nuances of teacher supply, demand, and retention, school systems cannot accurately forecast and strategically fill classrooms.

Recommendation 8. Direct the NSF to Expand Focus on Translating Evidence on AI Teaching to Schools and Districts.

The NSF Discovery Research PreK-12 Resource Center on Transformative Education Research and Translation (DRK-12 RC) program is intended to select intellectual partners as NSF seeks to enhance the overall influence and reach of the DRK-12 Program’s research and development investments. The DRK-12 RC program could be used to work with multi-sector constituencies to accelerate the identification and scaling of evidence-based practices for AI, data science, computer science, and other emerging tech fields. Currently, the program is anticipated to make only a single DRK-12 RC award; it should be scaled to establish at least three centers, one each for AI, integrated data science, and computer science, to ensure digitally powered STEM education for all students. 

Conclusion 

China ranked first in the most recent Global Teacher Status Index, which measures the prestige, respect, and attractiveness of the teaching profession in a given country; the United States ranked just below Panama. The speed of AI means educational investments made by other countries have a compounding impact, and any misstep can place the United States far behind – if we aren’t already. Emerging digital threats from other major powers, the increasing fluidity of talent and labor, and a remote-work economy make our education system the primary lever for keeping America competitive in a fast-changing global environment. The time is ripe for a new Nation at Risk-level effort, if not action on the scale of the original National Defense Education Act of 1958 or the more recent America COMPETES Act. The next administration should take decisive action to rebuild our country’s teacher workforce and prepare our students for a future that may look very different from our current one.

This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable and safe future that we all hope for whoever takes office in 2025 and beyond.

This memo was developed in partnership with the Alliance for Learning Innovation, a coalition dedicated to advocating for building a better research and development infrastructure in education for the benefit of all students. Read more education R&D memos developed in partnership with ALI here.

Frequently Asked Questions
How many more teachers do we need?

Approximately 100,000 more per year. The U.S. has 3.2 million public school teachers and 0.5 million private school teachers (NCES, 2022). According to U.S. Department of Education data, 8% of public and 12% of private school teachers exit the profession each year (-316,000), a rate that has remained relatively steady since 2012, while long-term estimates of re-entry continue to hover near 20% (+63,000). Unfortunately, the number of new teachers completing either traditional or alternative preparation programs has steadily declined over the past decade to roughly 159,000 per year. As a result of this gap, active vacancies continue to increase each year, and more than 270,000 educators are now cumulatively underqualified for their current roles, presumably filling in for absences caused by the widening gap. These predictions were made as early as 2016 (p. 2) and now appear to be becoming a reality. Absent any changes, the total shortage of vacant or underqualified teaching positions could reach a deficit of between 700,000 and 1,000,000 by 2035.


The above shortage estimate assumes a base of 50,000 vacancies and 270,000 underqualified teachers as of the most recent available data, and an annual net flow of roughly -94,000 teachers (entries minus exits, including re-entrants) in 2023-2024. The range incorporates uncertainty around a slight (3%-5%) annual improvement in preparation over the status quo, driven by the growth of alternative licensure pathways such as Grow Your Own and apprenticeship programs, through 2035. For the exit rate, the most conservative estimates suggest 5% and the highest reach 50%; however, assembled state-level data suggest a 7.9% exit rate, similar to the NCES estimate (8%). Population forecasts for K-12 students (individuals aged 14-17) imply slight declines by 2035, based on U.S. Census estimates. Taken together, more optimistic assumptions result in a net cumulative shortage closer to -700,000 teachers, while worst-case estimates may exceed -1,000,000.
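For readers who want to trace the arithmetic behind these figures, the short Python sketch below reproduces the annual net-flow estimate from the numbers cited above. The rates and counts are the memo’s own estimates, not new data, and vary by source and year.

    # Sketch of the annual net-flow arithmetic cited above; all figures are the
    # memo's own estimates and vary by source and year.
    public_teachers = 3_200_000
    private_teachers = 500_000

    exits = 0.08 * public_teachers + 0.12 * private_teachers   # ~316,000 leave per year
    re_entrants = 0.20 * exits                                  # ~63,000 return per year
    new_completers = 159_000                                    # traditional + alternative programs

    net_flow = new_completers + re_entrants - exits             # roughly -94,000 teachers per year
    print(f"Approximate annual net change: {net_flow:,.0f} teachers")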

Why not replace human teachers with AI tutors or digital lectures?

Early versions of AI-powered tutoring show significant promise but have not yet lived up to expectations. Automated tutors have produced frustrating experiences for users, led students to perform worse on tests than peers who used no outside support, and have yet to extend successfully to other problem areas in school subjects (such as mathematics). We should expect AI tools to improve over time and become more additive for learning specific concepts, especially repetitive or generalizable tasks requiring frequent practice, such as sentence writing or paragraph structure, which has the potential to make classroom time more useful and higher-impact. However, AI will struggle to replace other critical classroom needs inherent to children and adolescents, including classroom behavioral management, social motivation to learn, mentorship relationships, facilitating collaboration between students for project-based learning, and improving quality of work beyond accuracy or pre-prompted, rubric-based scoring. Teachers consistently report student interest as a top barrier to continued learning; digital curricula and AI automation may sustain that interest effectively for a short period, but cannot do so for the full duration of a student’s K-12 experience.

How much would the proposal cost?

Aside from Office of Science & Technology Policy (OSTP) staff time, the proposal would cost roughly the sum of 1) Recommendation #1’s Grand Challenge (estimated at $5 million), 2) Recommendation #6’s student loan relief component (to be calculated by OMB), and 3) Recommendation #8’s expansion of the NSF DRK-12 Resource Center on Transformative Education Research and Translation from one award to three ($10 million).

What could Congress do to support this work?

These proposed executive actions complement a bipartisan legislative proposal, “A National Training Program for AI-Ready Students,” which would invest in a national network of training sites for in-service teachers, provide grant dollars to support the expansion of teacher preparation programs, and help reset teacher payroll structures from 9 months to 12 months. Either proposal can be implemented independently of the other, but they are stronger together.

Three Artificial Intelligence Bills Endorsed by Federation of American Scientists Advance from the House Committee

Proposed bills advance research ecosystems, economic development, and education access and move now to the U.S. House of Representatives for a vote

Washington, D.C. – September 12, 2024 – Three proposed artificial intelligence bills endorsed by the Federation of American Scientists (FAS), a nonpartisan science think tank, advanced from a House Science, Space, and Technology Committee markup held on September 11, 2024. These bills received bipartisan support and will now be reported to the full chamber. The three bills are: H.R. 9403, the Expanding AI Voices Act, co-sponsored by Rep. Vince Fong (CA-20) and Rep. Andrea Salinas (OR-06); H.R. 9197, the Small Business AI Act, co-sponsored by Rep. Mike Collins (GA-10) and Rep. Haley Stevens (MI-11); and the National Science Foundation Artificial Intelligence Education Act of 2024 (NSF AI Education Act), co-sponsored by Rep. Valerie Foushee (NC-04) and Rep. Frank Lucas (OK-03).

“FAS endorsed these bills based on the evaluation of their strengths. Among these are the development of infrastructure to develop AI safely and responsibly; the deployment of resources to ensure development benefits more equitably across our economy; and investment in the talent pool necessary for this consequential, emerging technology,” says Dan Correa, CEO of FAS.

“These three bills pave a vision for the equitable and safe use of AI in the U.S. Both the Expanding AI Voices Act and the NSF AI Education Act will create opportunities for underrepresented voices to have a say in how AI is developed and deployed. Additionally, the Small Business AI Act will ensure that an important sector of our society feels empowered to use AI safely and securely,” says Clara Langevin, FAS AI Policy Specialist. 

Expanding AI Voices Act

The Expanding AI Voices Act will support a broad and diverse interdisciplinary research community for the advancement of artificial intelligence and AI-powered innovation, through partnerships and capacity building at certain institutions of higher education, to expand AI capacity among populations historically underrepresented in STEM.

Specifically, the Expanding AI Voices Act of 2024 will:

Small Business AI Act

Emerging science is central to new and established small businesses, across industries and around the country. This bill will require the Director of the National Institute of Standards and Technology (NIST) to develop resources for small businesses in utilizing artificial intelligence, and for other purposes. 

National Science Foundation Artificial Intelligence Education Act of 2024 (NSF AI Education Act)

The NSF AI Education Act builds on the National Artificial Intelligence Initiative Act of 2020 (15 U.S.C. 9451) to bolster educational skills in AI through new learning initiatives and workforce training programs. Specifically, the bill will: 

###

ABOUT FAS

The Federation of American Scientists (FAS) works to advance progress on a broad suite of contemporary issues where science, technology, and innovation policy can deliver dramatic progress, and seeks to ensure that scientific and technical expertise have a seat at the policymaking table. Established in 1945 by scientists in response to the atomic bomb, FAS continues to work on behalf of a safer, more equitable, and more peaceful world. More information at fas.org.

GenAI in Education Research Accelerator (GenAiRA)

The United States faces a critical challenge in addressing the persistent learning opportunity gaps in math and reading, particularly among disadvantaged student subgroups. According to the 2022 National Assessment of Educational Progress (NAEP) data, only 37% of fourth-grade students performed at or above the proficient level in math, and 33% in reading. The rapid advancement of generative AI (GenAI) technologies presents an unprecedented opportunity to bridge these gaps by providing personalized learning experiences and targeted support. However, the current mismatch between the speed of GenAI innovation and the lengthy traditional research pathways hinders the thorough evaluation of these technologies before widespread adoption, potentially leading to unintended negative consequences.

Failure to adapt our research and regulatory processes to keep pace with the development of GenAI technologies could expose students to ineffective or harmful educational tools, exacerbate existing inequities, and hinder our ability to prepare all students for success in an increasingly complex and technology-driven world. The education sector must act with urgency to establish the necessary infrastructure, expertise, and collaborative partnerships to ensure that GenAI-powered tools are rigorously evaluated, continuously improved, and equitably implemented to benefit all students.

To address this challenge, we propose three key recommendations for congressional action:

  1. Establish the GenAI in Education Research Accelerator Program (GenAiRA) within the Institute of Education Sciences (IES) to support and expedite efficacy research on GenAI-powered educational tools.
  2. Adapt IES research and evaluation processes to create a framework for the rapid assessment of GenAI-enabled educational technology, including alternative research designs and evidence standards.
  3. Support the establishment of a GenAI Education Research and Innovation Consortium, bringing together schools, researchers, and education technology (EdTech) developers to participate in rapid cycle studies and continuous improvement of GenAI tools.

By implementing these recommendations, Congress can foster a more responsive and evidence-based ecosystem for GenAI-powered educational tools, ensuring that they are equitable, effective, and safe for all students. This comprehensive approach will help unlock the transformative potential of GenAI to address persistent learning opportunity gaps and improve outcomes for all learners, while maintaining scientific rigor and prioritizing student well-being.

During the preparation of this work, the authors used the tool Claude 3 Opus (by Anthropic) to help clarify and synthesize, and add accessible language around concepts and ideas generated by members of the team. The authors reviewed and edited the content as needed and take full responsibility for the content of this publication.

Challenge and Opportunity

Widening Learning Opportunity Gap 

NAEP data reveals that many U.S. students, especially those from disadvantaged subgroups, are not achieving proficiency in math and reading. In 2022, only 37% of fourth-graders performed at or above the NAEP proficient level in math, and 33% in reading—the lowest levels in over a decade. Disparities are more profound when disaggregated by race, ethnicity, and socioeconomic status; for example, only 17% of Black students and 21% of Hispanic students reached reading proficiency, compared to 42% of white students.

Rapid AI Evolution

GenAI is a transformative technology that enables rapid development and personalization of educational content and tools, addressing unmet needs in education such as limited resources, scarce 1:1 teaching time, and uneven teacher quality. However, that rapid pace also raises concerns about premature adoption of unvetted tools, which could negatively impact students’ educational achievement. Unvetted GenAI tools may introduce misconceptions, provide incorrect guidance, or be misaligned with curriculum standards, leading to gaps in students’ understanding of foundational concepts. If used for an extended period, particularly with vulnerable learners, these tools could have a long-term impact on learning foundations that may be difficult to remedy.

On the other hand, carefully designed, trained, and vetted GenAI models that have undergone rapid cycle studies and design iterations based on data have the potential to effectively address students’ misconceptions, build solid learning foundations, and provide personalized, adaptive support to learners. These tools could accelerate progress and close learning opportunity gaps at an unprecedented scale.

Slow Vetting Processes 

The rapid pace of AI development poses significant challenges for traditional research and evaluation processes in education. Efficacy research, particularly studies sponsored by the IES or other Department of Education entities, is a lengthy, resource-intensive, and often onerous process that can take years to complete. Randomized controlled trials and longitudinal studies struggle to keep up with the speed of AI innovation: by the time a study is completed, the AI-powered tool may have already undergone multiple iterations or been replaced.

It can be difficult to recruit and sustain school and teacher participation in efficacy research due to the significant time and effort required from educators. Moreover, obtaining certifications and approvals for research can be complex and time-consuming, as researchers must navigate institutional review boards, data privacy regulations, and ethical guidelines, which can delay the start of a study by months or even years.

Many EdTech developers find themselves in a catch-22 situation, where their products are already being adopted by schools and educators, yet they are simultaneously expected to participate in lengthy and expensive research studies to prove efficacy. The time and resources required to engage in such research can be a significant burden for EdTech companies, especially start-ups and small businesses, which may prefer to focus on iterating and improving their products based on real-world feedback. As a result, many EdTech developers may be reluctant to participate in traditional efficacy research, further exacerbating the disconnect between the rapid pace of AI innovation and the slow process of evaluating the effectiveness of these tools in educational settings.

Gaps in Existing Efforts and Programs

While federal initiatives like SEERNet and ExpandAI have made strides in supporting AI and education research and development, they may not be fully equipped to address the specific challenges and opportunities presented by GenAI.

In particular, traditional approaches to efficacy research and evaluation may not be well suited to evaluating the potential benefits and outcomes associated with GenAI-powered tools in the short term, particularly when assessing whether a program shows enough promise to warrant wider deployment with students. 

A New Approach 

To address these challenges and bridge the gap between GenAI innovation and efficacy research, we need a new approach to streamline the research process, reduce the burden on educators and schools, and provide timely and actionable insights into the effectiveness of GenAI-powered tools. This may involve alternative study designs, such as rapid cycle evaluations or single-case research, and developing new incentive structures and support systems to encourage and facilitate the participation of teachers, schools, and product developers in research studies.

GenAiRA aims to tackle these challenges by providing resources, guidance, and infrastructure to support more agile and responsive efficacy research in the education sciences. By fostering collaboration among researchers, developers, and educators, and promoting innovative approaches to evaluation, this program can help ensure that the development and adoption of AI-powered tools in education are guided by rigorous, timely, and actionable evidence—while simultaneously mitigating risks to students.

Learning from Other Sectors 

Valuable lessons can be drawn from other fields that have faced similar balancing acts between innovation, research, and safety. Two notable examples are the U.S. Food and Drug Administration’s (FDA) expedited review pathways for drug development and the National Institutes of Health’s (NIH) Clinical and Translational Science Awards (CTSA) program for accelerating medical research.

Example 1: The FDA Model

The FDA’s expedited review programs, such as Fast Track, Breakthrough Therapy, Accelerated Approval, and Priority Review, are designed to speed up the development and approval of drugs that address unmet medical needs or provide significant improvements over existing treatments. These pathways recognize that, in certain cases, the benefits of bringing a potentially life-saving drug to market quickly may outweigh the risks associated with a more limited evidence base at the time of approval.

Key features include:

  1. Early and frequent communication between the FDA and drug developers to provide guidance and feedback throughout the development process.
  2. Flexibility in clinical trial design and evidence requirements, such as allowing the use of surrogate endpoints or single-arm studies in certain cases.
  3. Rolling review of application materials, allowing drug developers to submit portions of their application as they become available rather than waiting for the entire package to be complete.
  4. Shortened review timelines, with the FDA committing to reviewing and making a decision on an application within a specified timeframe (e.g., six months for Priority Review).

These features can accelerate the development and approval process while still ensuring that drugs meet standards for safety and effectiveness. They also acknowledge that the evidence base for a drug may evolve over time, with post-approval studies and monitoring playing a crucial role in confirming the drug’s benefits and identifying any rare or long-term side effects.

Example 2: The CTSA Program

The NIH’s CTSA program established a national network of academic medical centers, research institutions, and community partners to accelerate the translation of research findings into clinical practice and improve patient outcomes.

Key features include:

  1. Collaborative research infrastructure, consisting of a network of institutions and partners that work together to conduct translational research, share resources and expertise, and disseminate best practices.
  2. Streamlined research processes with standardized protocols, templates, and tools to facilitate the rapid design, approval, and implementation of research studies across the network.
  3. Training and development of researchers and clinicians to build a workforce equipped to conduct innovative and rigorous translational research.
  4. Community engagement in the research process to ensure that studies are responsive to real-world needs and priorities.

By learning from the successes and principles of the FDA’s expedited review pathways and the NIH’s CTSA program, the education sector can develop its own innovative approach to accelerating the responsible development, evaluation, and deployment of GenAI-powered tools, as outlined in the following plan of action.

Plan of Action

To address the challenges and opportunities presented by GenAI in education, we propose the following three key recommendations for congressional action and the evolution of existing programs.

Recommendation 1. Establish the GenAI in Education Research Accelerator Program (GenAiRA).

Congress should establish the GenAiRA, housed in the IES, to support and expedite efficacy research on GenAI-powered educational tools and programs. This program will:

  1. Provide funding and resources to researchers and educators to conduct rigorous, timely, and cost-effective efficacy studies on promising AI-based solutions that address achievement gaps.
  2. Create guidelines and offer webinars and technical assistance to researchers, educators, and developers to build expertise in the responsible design, implementation, and evaluation of GenAI-powered tools in education.
  3. Foster collaboration and knowledge-sharing among researchers, educators, and GenAI developers to facilitate the rapid translation of research findings into practice and continuously improve GenAI-powered tools.
  4. Develop and disseminate best practices, guidelines, and ethical frameworks for responsible development and deployment of GenAI-enabled educational technology tools in educational settings, focusing on addressing bias, accuracy, privacy, and student agency issues.

Recommendation 2. Under the auspices of GenAiRA, adapt IES research and evaluation processes to create a framework to evaluate GenAI-enabled educational technology.

In consultation with experts in educational research and AI, IES will develop a framework that:

  1. Identifies existing research designs and creates alternative research designs (e.g., quasi-experimental studies, rapid short evaluations) suitable for generating credible evidence of effectiveness while being more responsive to the rapid pace of AI innovation. 
  2. Establishes evidence-quality guidelines for rapid evaluation, including minimum sample sizes, study duration, effect size, and targeted population.
  3. Funds replication studies and expansion studies to determine impact in different contexts or with different populations (e.g., students with IEPs and English learners).
  4. Provides guidance to districts on how to interpret and apply evidence from different types of studies to inform decision-making around adopting and using AI technologies in education.   

Recommendation 3. Establish a GenAI Education Research and Innovation Consortium.

Congress should provide funding and incentives for IES to establish a GenAI Education Research and Innovation Consortium that brings together a network of “innovation schools,” research institutions, and EdTech developers committed to participating in rapid cycle studies and continuous improvement of GenAI tools in education. This approach will ensure that AI tools are developed and implemented in a way that is responsive to the needs and values of educators, students, and communities.

To support this consortium, Congress should:

  1. Allocate funds for the IES to provide grants and resources to schools, research institutions, and EdTech developers that meet established criteria for participation in the consortium, such as demonstrated commitment to innovation, research capacity, and ethical standards.
  2. Direct IES to work with programs like SEERNet and ExpandAI to identify and match potential consortium members, provide guidance and oversight to ensure that research studies meet rigorous standards for quality and ethics, and disseminate findings and best practices to the broader education community.
  3. Encourage the development of standardized protocols and templates for data sharing, privacy protection, and informed consent within the consortium, to reduce the time and effort required for each individual study and streamline administrative processes.
  4. Incentivize participation in the consortium by offering resources and support for schools, researchers, and developers, such as access to funding opportunities, technical assistance, and professional development resources.
  5. Require the establishment of a central repository of research findings and best practices generated through rapid cycle evaluations conducted within the consortium, to facilitate the broader dissemination and adoption of effective GenAI-powered tools.

Conclusion 

Persistent learning opportunity gaps in math and reading, particularly among disadvantaged students, are a systemic challenge requiring innovative solutions. GenAI-powered educational tools offer potential for personalizing learning, identifying misconceptions, and providing tailored support. However, the mismatch between the pace of GenAI innovation and lengthy traditional research pathways impedes thorough vetting of these technologies to ensure they are equitable, effective, and safe before widespread adoption.

GenAiRA and development of alternative research frameworks provide a comprehensive approach to bridge the divide between GenAI’s rapid progress and the need for thorough evaluation in education. Leveraging existing partnerships, research infrastructure, and data sources can expedite the research process while maintaining scientific rigor and prioritizing student well-being.

The plan of action creates a roadmap for responsibly harnessing GenAI’s potential in education. Identifying appropriate congressional mechanisms for establishing the accelerator program, such as creating a new bill or incorporating language into upcoming legislation, can ensure this critical initiative receives necessary funding and oversight.

This comprehensive strategy charts a path toward equitable, personalized learning facilitated by GenAI while upholding the highest standards of evidence. Aligning GenAI innovation with rigorous research and prioritizing the needs of underserved student populations can unlock the transformative potential of these technologies to address persistent achievement gaps and improve outcomes for all learners.

This idea is part of our AI Legislation Policy Sprint. To see all of the policy ideas spanning innovation, education, healthcare, and trust, safety, and privacy, head to our sprint landing page.

Frequently Asked Questions
What makes AI and GenAI-powered educational tools different from traditional educational technologies?
AI and GenAI-powered educational tools differ from traditional educational technologies in their speed of development and deployment, as AI-generated content can be created and deployed extremely quickly, often with little time taken for thorough testing and evaluation. Additionally, AI-powered tools can generate content dynamically based on user inputs and interactions, meaning that the content presented to each student may be different every time, making it inherently more time-consuming to test and evaluate compared to fixed, pre-written content. Also, the ability of AI-powered tools to rapidly generate and disseminate educational content at scale means that any issues or flaws in the technology can have far-reaching consequences, potentially impacting large numbers of students across multiple schools and districts.
How do gaps in early grades impact students’ long-term educational outcomes and opportunities?
Students who fall behind in math and reading in the early years are more likely to struggle academically in later grades, leading to lower graduation rates, reduced college enrollment, and limited career opportunities.
What are some of the limitations of current educational interventions in addressing these learning opportunity gaps?
Current educational interventions often take a one-size-fits-all approach, failing to address the unique learning needs of individual students. They may also lack the ability to provide immediate feedback and adapt instruction in real-time based on student performance.
How has the rapid advancement of AI and GenAI technologies created new opportunities for personalized learning and targeted support?
Advancements such as machine learning and natural language processing have enabled the development of educational tools that can analyze vast amounts of student data, identify patterns in learning behavior, and provide customized recommendations and support. Personalization can include recommendations for what topics to learn and when, but also adjustments to finer details like amount and types of feedback and support provided. Further, content can be adjusted to make it more accessible to students, both from a language standpoint (dynamic translation) and a cultural one (culturally relevant contexts and characters). In the past, these types of adjustments were not feasible due to the labor involved in building them. With GenAI, this level of personalization will become commonplace and expected.
What are the potential risks or unintended consequences of implementing AI-powered educational tools without sufficient evidence of their effectiveness or safety?

Implementing AI and GenAI-powered educational tools without sufficient evidence of their effectiveness or safety could lead to the widespread use of ineffective interventions. If these tools fail to improve student outcomes or even hinder learning progress, they can have long-lasting negative consequences for students’ academic attainment and self-perception as learners.


When students are exposed to ineffective educational tools, they may struggle to grasp key concepts, leading to gaps in their knowledge and skills. Over time, these gaps can compound, leaving students ill-prepared for future learning challenges and limiting their academic and career opportunities. Moreover, repeated experiences of frustration and failure with educational technologies can erode students’ confidence, motivation, and engagement with learning.


This erosion of learner identity can be particularly damaging for students from disadvantaged backgrounds, who may already face additional barriers to academic success. If AI-powered tools fail to provide effective support and personalization, these students may fall even further behind their peers, exacerbating existing educational inequities.

How can we ensure that AI and GenAI-powered educational tools are developed and implemented in an equitable manner, benefiting all students, especially those from disadvantaged backgrounds?
By prioritizing research and funding for interventions that target the unique needs of disadvantaged student populations. We must also engage diverse stakeholders, including educators, parents, and community members, in the design and evaluation process to ensure that these tools are culturally responsive and address the specific challenges faced by different communities.
How can educators, parents, and policymakers stay informed about the latest developments in AI-powered educational tools and make informed decisions about their adoption and use?
Educators, parents, and policymakers can stay informed by engaging with resources, guidance and programs developed by organizations like the Office of Educational Technology, Institute of Education Sciences, EDSAFE AI Alliance and others on the opportunities and risks of AI/GenAI in education.

A Safe Harbor for AI Researchers: Promoting Safety and Trustworthiness Through Good-Faith Research

Artificial intelligence (AI) companies disincentivize safety research by implicitly threatening to ban independent researchers that demonstrate safety flaws in their systems. While Congress encourages companies to provide bug bounties and protections for security research, this is not yet the case for AI safety research. Without independent research, we do not know if the AI systems that are being deployed today are safe or if they pose widespread risks that have yet to be discovered, including risks to U.S. national security. While companies conduct adversarial testing in advance of deploying generative AI models, they fail to adequately test their models after they are deployed as part of an evolving product or service. Therefore, Congress should promote the safety and trustworthiness of AI systems by establishing bug bounties for AI safety via the Chief Digital and Artificial Intelligence Office and creating a safe harbor for research on generative AI platforms as part of the Platform Accountability and Transparency Act.

Challenge and Opportunity 

In July 2023, the world’s top AI companies signed voluntary commitments at the White House, pledging to “incent third-party discovery and reporting of issues and vulnerabilities.” Almost a year later, few of the signatories have lived up to this commitment. While some companies do reward researchers for finding security flaws in their AI systems, few companies strongly encourage research on safety or provide concrete protections for good-faith research practices. Instead, leading generative AI companies’ Terms of Service legally prohibit safety and trustworthiness research, in effect threatening anyone who conducts such research with bans from their platforms or even legal action.

In March 2024, over 350 leading AI researchers and advocates signed an open letter calling for “a safe harbor for independent AI evaluation.” The researchers noted that generative AI companies offer no legal protections for independent safety researchers, even though this research is critical to identifying safety issues in AI models and systems. The letter stated: “whereas security research on traditional software has established voluntary protections from companies (‘safe harbors’), clear norms from vulnerability disclosure policies, and legal protections from the DOJ, trustworthiness and safety research on AI systems has few such protections.” 

In the months since the letter was released, companies have continued to be opaque about key aspects of their most powerful AI systems, such as the data used to build their models. If a researcher wants to test whether AI systems like ChatGPT, Claude, or Gemini can be jailbroken such that they pose a threat to U.S. national security, they are not allowed to do so as companies proscribe such research. Developers of generative AI models tout the safety of their systems based on internal red-teaming, but there is no way for the federal government or independent researchers to validate these results, as companies do not release reproducible evaluations. 

Generative AI companies also impose barriers on their platforms that limit good-faith research. Unlike much of the web, the content on generative AI platforms is not publicly available, meaning that users need accounts to access AI-generated content and these accounts can be restricted by the company that owns the platform. In addition, companies like Google, Amazon, Microsoft, and OpenAI block certain requests that users might make of their AI models and limit the functionality of their models to prevent researchers from unearthing issues related to safety or trustworthiness.

Similar issues plague social media, as companies take steps to prevent researchers and journalists from conducting investigations on their platforms. Social media researchers face liability under the Computer Fraud and Abuse Act and Section 1201 of the Digital Millennium Copyright Act among other laws, which has had a chilling effect on such research and worsened the spread of misinformation online. The stakes are even higher for AI, which has the potential not only to turbocharge misinformation but also to provide U.S. adversaries like China and Russia with material strategic advantages. While legislation like the Platform Accountability and Transparency Act would enable research on recommendation algorithms, proposals that grant researchers access to platform data do not consider generative AI platforms to be in scope.

Congress can safeguard U.S. national security by promoting independent AI safety research. Conducting pre-deployment risk assessments is insufficient in a world where tens of millions of Americans are using generative AI—we need real-time assessments of the risks posed by AI systems after they are deployed as well. Big Tech should not be taken at its word when it says that its AI systems cannot be used by malicious actors to generate malware or spy on Americans. The best way to ensure the safety of generative AI systems is to empower the thousands of cutting-edge researchers at U.S. universities who are eager to stress test these systems. Especially for general-purpose technologies, small corporate safety teams are not sufficient to evaluate the full range of potential risks, whereas the independent research community can do so thoroughly.

Figure 1. What access protections do AI companies provide for independent safety research? Source: Longpre et al., “A Safe Harbor for AI Evaluation and Red Teaming.”

Plan of Action

Congress should enable independent AI safety and trustworthiness researchers by adopting two new policies. First, Congress should incentivize AI safety research by creating algorithmic bug bounties for this kind of work. AI companies often do not incentivize research that could reveal safety flaws in their systems, even though the government will be a major client for these systems. Even small incentives can go a long way, as there are thousands of AI researchers capable of demonstrating such flaws. This would also entail establishing mechanisms through which safety flaws or vulnerabilities in AI models can be disclosed, akin to a help line for AI systems.

Second, Congress should require AI platform companies, such as Google, Amazon, Microsoft, and OpenAI to share data with researchers regarding their AI systems. As with social media platforms, generative AI platforms mediate the behavior of millions of people through the algorithms they produce and the decisions they enable. Companies that operate application programming interfaces used by tens of thousands of enterprises should share basic information about their platforms with researchers to facilitate external oversight of these consequential technologies. 

Taken together, vulnerability disclosure incentivized through algorithmic bug bounties and protections for researchers enabled by safe harbors would substantially improve the safety and trustworthiness of generative AI systems. Congress should prioritize mitigating the risks of generative AI systems and protecting the researchers who expose them.

Recommendation 1. Establish algorithmic bug bounties for AI safety.

As part of the FY2024 National Defense Authorization Act (NDAA), Congress established “Artificial Intelligence Bug Bounty Programs” requiring that within 180 days “the Chief Digital and Artificial Intelligence Officer of the Department of Defense shall develop a bug bounty program for foundational artificial intelligence models being integrated into the missions and operations of the Department of Defense.” However, these bug bounties extend only to security vulnerabilities. In the FY2025 NDAA, this bug bounty program should be expanded to include AI safety. See below for draft legislative language to this effect. 

Recommendation 2. Create legal protections for AI researchers.

Section 9 of the proposed Platform Accountability and Transparency Act (PATA) would establish a “safe harbor for research on social media platforms.” This likely excludes major generative AI platforms such as Google Cloud, Amazon Web Services, Microsoft Azure, and OpenAI’s API, meaning that researchers have no legal protections when conducting safety research on generative AI models via these platforms. PATA and other legislative proposals related to AI should incorporate a safe harbor for research on generative AI platforms.

Conclusion

The need for independent AI evaluation has garnered significant support from academics, journalists, and civil society. Safe harbor for AI safety and trustworthiness researchers is a minimum fundamental protection against the risks posed by generative AI systems, including risks to national security. Congress has an important opportunity to act before it’s too late.

This idea is part of our AI Legislation Policy Sprint. To see all of the policy ideas spanning innovation, education, healthcare, and trust, safety, and privacy, head to our sprint landing page.

Frequently Asked Questions
Do companies support this idea?
Some companies are supportive of this idea, but many legal teams are risk averse, especially when there is no legal obligation to offer safe harbor. Multiple companies have indicated they will not change their policies and practices until the government compels them to do so.
Wouldn’t allowing for more safety testing come with safety risks?
Safety testing does not create additional safety risks. In the absence of widespread safety testing, flaws will still be found by foreign adversaries; we simply would not know they existed. Security through obscurity has long been discredited. Furthermore, safe harbors only protect research that is conducted according to strict rules regarding what constitutes good-faith research.
What federal agencies have relevant authorities here?
The National Institute of Standards and Technology (NIST), the Federal Trade Commission (FTC), and the National Science Foundation (NSF) are among the most important federal entities in this area. Under President Biden’s AI executive order, NIST is responsible for drafting guidance on red teaming among other issues, which could include protections for independent researchers. FTC has jurisdiction over competition and consumer protection issues related to generative AI, both of which relate to researcher access. NSF has launched the National AI Research Resource Pilot, which can help scale up researcher access as AI companies provide compute credits via the pilot.
How does this intersect with the Copyright Office’s triennial Section 1201 DMCA proceeding?

The authors of this memorandum as well as the academic paper underlying it submitted a comment to the Copyright Office in support of an exemption to DMCA for AI safety and trustworthiness research. The Computer Crime and Intellectual Property Section of the U.S. Department of Justice’s Criminal Division and Senator Mark Warner have also endorsed such an exemption. However, a DMCA exemption regarding research on AI bias, trustworthiness, and safety alone would not be sufficient to assuage the concerns of AI researchers, as they may still face liability under other statutes such as the Computer Fraud and Abuse Act.

Are researchers really limited by what AI companies are doing? I see lots of academic research on these topics.

Much of this research is currently conducted by research labs with direct connections to the AI companies they are assessing. Researchers who are less well connected, of whom there are thousands, may be unwilling to take the legal or personal risk of violating companies’ Terms of Service. See our academic paper on this topic for further details on this and other questions.

How might language from the FY2024 NDAA be adapted to bug bounties for AI safety?

See draft legislative language below, building on Sec. 1542 of the FY2024 NDAA:


SEC. X. EXPANSION OF ARTIFICIAL INTELLIGENCE BUG BOUNTY PROGRAMS.


(a) Update to Program for Foundational Artificial Intelligence Products Being Integrated Within Department of Defense.—


(1) Development required.—Not later than 180 days after the date of the enactment of this Act and subject to the availability of appropriations, the Chief Digital and Artificial Intelligence Officer of the Department of Defense shall expand its bug bounty program for foundational artificial intelligence models being integrated into the missions and operations of the Department of Defense to include unsafe model behaviors in addition to security vulnerabilities.


(2) Collaboration.—In expanding the program under paragraph (1), the Chief Digital and Artificial Intelligence Officer may collaborate with the heads of other Federal departments and agencies with expertise in cybersecurity and artificial intelligence.


(3) Implementation authorized.—The Chief Digital and Artificial Intelligence Officer may carry out the program described in subsection (a).


(4) Contracts.—The Secretary of Defense shall ensure, as may be appropriate, that whenever the Secretary enters into any contract, such contract allows for participation in the bug bounty program under paragraph (1).


(5) Rule of construction.—Nothing in this subsection shall be construed to require—


(A) the use of any foundational artificial intelligence model; or


(B) the implementation of the program developed under paragraph (1) for the purpose of the integration of a foundational artificial intelligence model into the missions or operations of the Department of Defense.

Update COPPA 2.0 to Strengthen Children’s Online Voice Privacy in the AI Era

Emerging technologies like artificial intelligence (AI) are changing the way humans interact with machines. As AI has made huge progress over the last decade, traditional processing of modalities such as text, voice, image, and video has been replaced by large, data-driven AI models. These models were primarily designed to let machines comprehend varied data and perform tasks without human intervention. Now, with the emergence of generative AI like ChatGPT, these models can also generate data such as text, voice, images, or video. Policymakers across the globe are struggling to draft legislation that governs the ethical use of data and regulates the creation of safe, secure, and trustworthy AI models.

Data privacy is a major concern with the advent of AI technology. Actions by the U.S. Congress, such as the proposed American Privacy Rights Act, aim to establish strong data privacy rights. With AI applications for children emerging, protecting children’s privacy and safeguarding their personal information is also a legislative challenge.

Congress must act to protect children’s voice privacy before it’s too late. Companies that store children’s voice recordings and use them for profit-driven applications (or advertising) without parental consent pose serious privacy threats to children and families. The proposed revisions to the Children’s Online Privacy Protection Act (COPPA) aim to restrict companies’ capacity to profit from children’s data and transfer the responsibility of compliance from parents to companies. However, several measures in the proposed legislation need more clarity and additional guidelines.

Challenge and Opportunity 

Human voice1 is one of the most popular modalities for AI technology. Advancements in voice AI technology, such as the voice assistants in smartphones (Siri, Google, Bixby, Alexa, etc.), have made many day-to-day activities easier; however, there are also emerging threats from voice AI and a lack of regulations governing voice data and voice AI applications. One example is AI voice impersonation scams. Using the latest voice AI technology,2 a high-quality personalized voice recording can be generated from as little as 15 seconds of the speaker’s recorded voice. A technology rat race among Big Tech has begun, as companies try to achieve the same result from recordings only a few seconds long. Scammers have increasingly been using this technology for their benefit. OpenAI, the creator of ChatGPT, recently developed a product called Voice Engine but refrained from commercializing it, acknowledging that the technology poses “serious risks,” especially in an election year.

A voice recording contains deeply personal information about a speaker and can be used to identify a target speaker among recordings of multiple speakers. Emerging research in voice AI suggests that voice recordings can also support medical and health-related inferences, as well as the identification of characteristics such as age and height. When cloud-based applications are used, privacy concerns also arise during voice data transfer and from data storage leaks caused by noncompliance with data collection and storage requirements. The threats from misuse of voice data and voice AI technology are therefore enormous.
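As a rough illustration of why a voice recording functions as biometric personal information, the sketch below matches a query recording against a small set of enrolled recordings by comparing averaged MFCC features with cosine similarity. It assumes the open-source librosa and numpy packages and hypothetical file names; production speaker-recognition systems rely on far more powerful neural speaker embeddings, so this is only a toy demonstration of the underlying idea.

# Toy sketch: matching an unknown voice recording to known speakers by
# comparing averaged MFCC features. Real systems use neural speaker
# embeddings and are far more accurate; file names are hypothetical.
import numpy as np
import librosa

def voice_fingerprint(path: str) -> np.ndarray:
    """Load a recording and summarize it as an averaged MFCC vector."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical "enrolled" recordings of known speakers.
enrolled = {
    "speaker_A": voice_fingerprint("speaker_a.wav"),
    "speaker_B": voice_fingerprint("speaker_b.wav"),
}

# Match a new recording to the closest enrolled speaker.
query = voice_fingerprint("unknown_voice.wav")
best_match = max(enrolled, key=lambda name: cosine_similarity(query, enrolled[name]))
print("Query recording most closely matches:", best_match)

Even this crude approach hints at why regulators treat voice prints as personal information: the same signal that makes a voice assistant useful also identifies the speaker.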

Social media services, educational technology, online games, and smart toys are just a few of the children’s services that have started adopting voice technology (e.g., Alexa for Kids). Any service operator (or company) collecting and using children’s personal information, including their voice, is bound by the Children’s Online Privacy Protection Act (COPPA). The Federal Trade Commission (FTC) is the enforcing federal agency for COPPA. However, several companies have recently violated COPPA by collecting personal information from children without parental consent and using it for advertising and to maximize their platforms’ profits. “Amazon’s history of misleading parents, keeping children’s recordings indefinitely, and flouting parents’ deletion requests violated COPPA and sacrificed privacy for profits,” said Samuel Levine of the FTC’s Bureau of Consumer Protection. The FTC alleges that Amazon maintained records of children’s data, disregarded parents’ deletion requests, and trained its voice AI algorithms on that data.

Children’s spoken characteristics differ from those of adults; thus, developing voice AI technology for children is more challenging. Most commercial voice-AI-enabled services work smoothly for adults, but their accuracy in understanding children’s voices is often limited. Another challenge is the relatively sparse availability of children’s voice data for training AI models. Big Tech is therefore looking for ways to acquire as much children’s voice data as possible to train voice AI models. The data challenge affects not only industry but also academic research on the subject, given very limited data availability and children’s widely varying spoken-language skills. However, misusing acquired data, especially without consent, is not a solution, and operators must be penalized for such actions.

Considering operators’ recent violations of COPPA, and with the goal of strengthening compliance and preventing the misuse of personal information such as voice, Congress is updating COPPA with new legislation. The updates propose to extend and refine the definitions of “operator,” “personal information” (including voice prints), “consent,” and “website/service/application” (including devices connected to the internet), as well as the guidelines for the “collection, use, disclosure, and deletion of personal information.” These updates are especially critical when users’ (or consumers’) personal information can serve as valuable data for operators’ profit-driven applications and be misused in the absence of federal regulation. The FTC acknowledges that the current version of COPPA is insufficient; these updates would also enable the FTC to act against noncompliant operators and take stricter enforcement action.

Plan of Action 

The Children and Teens’ Online Privacy Protection Act (COPPA 2.0) has been proposed in both the Senate and House to update COPPA for the modern internet age, with a renewed focus on limiting misuse of children’s personal data (including voice recordings). This proposed legislation has gained momentum and bipartisan support. However, the text in this legislation could still be updated to ensure consumer privacy and support future innovation.

Recommendation 1. Clarify the exclusion clause for audio files. 

An exclusion clause has been added to this legislation specifically for audio files containing a child’s voice, declaring that a collected audio file is not considered personal information if it meets certain criteria. The clause adopts a more expansive audio file exception, particularly to allow operators to provide certain features to their users (or consumers).

While having only the text “only uses the voice within the audio file solely as a replacement for written words”3 might be overly restrictive for voice-based applications, the text “to perform a task” might open the use of audio files for any task that could benefit operators. The task should relate only to performing a request or providing a service to the user, and that needs to be clarified in the text. Potential misuse of this text could include (1) training AI models for tasks that might help operators provide a service to the user, especially for personalization, or (2) extracting and storing “audio features”4 (most voice AI models are trained on audio features rather than the raw audio itself). Operators might argue that extracting audio features is necessary as part of the algorithm that assists in providing a service to the user. Therefore, the phrasing “to perform a task” in this exclusion is open-ended and should be modified as suggested:

Current text: “(iii) only uses the voice within the audio file solely as a replacement for written words, to perform a task, or engage with a website, online service, online application, or mobile application, such as to perform a search or fulfill a verbal instruction or request; and”

Suggested text: “(iii) only uses the voice within the audio file solely as a replacement for written words, to only perform a task to engage with a website, online service, online application, or mobile application, such as to perform a search or fulfill a verbal instruction or request; and”

On a similar note, legislators should consider adding the term “audio features.” Audio features alone are enough to train voice AI models and develop any voice-related application, even if the original audio file is deleted. Therefore, the deletion requirement in the exclusion clause should be modified as suggested:

Current text: “(iv) only maintains the audio file long enough to complete the stated purpose and then immediately deletes the audio file and does not make any other use of the audio file prior to deletion.”

Suggested text: “(iv) only maintains the audio file long enough to complete the stated purpose and then immediately deletes the audio file and any extracted audio features and does not make any other use of the audio file (or extracted audio features) prior to deletion.”

Adding this clarity to the exclusion will help prevent misuse of children’s voices for tasks that primarily benefit operators and will also ensure that operators delete every form of the audio that could be used to train AI models.
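To make the concern about retained audio features concrete, the sketch below (a minimal illustration assuming the librosa and numpy packages and a hypothetical recording) extracts MFCC features from an audio file, saves them, and deletes the original recording. The saved features remain usable for model training, which is why the suggested deletion language explicitly covers extracted features.

# Sketch: extracted audio features outlive the deleted recording.
# Assumes the librosa and numpy packages; file names are hypothetical.
import os
import numpy as np
import librosa

RECORDING = "child_request.wav"   # hypothetical audio file collected by an operator

# 1. Extract MFCC features, the kind of representation voice AI models train on.
audio, sr = librosa.load(RECORDING, sr=16000)
features = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)

# 2. Persist the features, then delete the raw audio file.
np.save("child_request_features.npy", features)
os.remove(RECORDING)

# 3. The raw audio is gone, but the stored features can still feed a training set.
reloaded = np.load("child_request_features.npy")
print("Original audio deleted:", not os.path.exists(RECORDING))
print("Feature matrix still available for training:", reloaded.shape)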

Recommendation 2. Add guidelines on the deidentification of audio files to enhance innovation. 

A deidentified audio file is one that cannot be used to identify the speaker whose voice is recorded in that file. The legislative text of COPPA 2.0 does not mention deidentification or provide any guidelines on how to deidentify an audio file. Such guidelines would not only protect users’ privacy but also allow operators to use deidentified audio files to add features and improve their products. The guidelines could include steps to be followed by operators as well as additional commitments from operators.

The steps include: 

The commitments include: 

Following these guidelines might be expensive for operators; however, it is crucial to take as many precautions as possible. The deidentification steps operators currently apply to audio files are not sufficient, and there have been numerous instances in which anonymized data has been reidentified, according to a statement released by a group of State Attorneys General. The proposed guidelines would allow operators to deidentify audio files and use them for product development, letting innovation in voice AI technology for children flourish.
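For illustration only, the sketch below (assuming the librosa and soundfile packages and hypothetical file names) applies a naive pitch shift as a stand-in for one deidentification step. As the reidentification incidents cited above suggest, such simple transformations are generally not sufficient on their own, which is exactly why explicit, enforceable guidelines are needed.

# Naive deidentification sketch: pitch-shift a recording so the voice is less
# recognizable. This alone is NOT sufficient deidentification; it only
# illustrates the kind of step that guidelines would need to specify and go beyond.
# Assumes the librosa and soundfile packages; file names are hypothetical.
import librosa
import soundfile as sf

audio, sr = librosa.load("original_child_voice.wav", sr=16000)

# Shift the pitch up by four semitones to obscure the speaker's natural voice.
shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=4)

sf.write("deidentified_child_voice.wav", shifted, sr)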

Recommendation 3. Add AI-generated avatars in the definition of personal information.

With the emerging applications of generative AI and the growing use of virtual reality for education (in classrooms) and leisure (in online games), “AI-based avatar generation from a child’s image, audio, or video” should be added to the legislative definition of “personal information.” Digital representations of the human user (avatars) are increasingly how users see and interact with virtual reality environments and with other users.

Conclusion 

As new applications of AI emerge, operators must ensure compliance in the collection and use of consumers’ personal information and safety in the design of their products using that data, especially when dealing with vulnerable populations like children. Since the original passage of COPPA in 1998, how consumers use online services for day-to-day activities, including educational technology and amusement for children, has changed dramatically. This ever-changing scope and reach of online services require strong legislative action to bring online privacy standards into the 21st century.  Without a doubt, COPPA 2.0 will lead this regulatory drive not only to protect children’s personal information collected by online services and operators from misuse but also to ensure that the burden of compliance rests on the operators rather than on parents. These recommendations will help strengthen the protections of COPPA 2.0 even further while leaving open avenues for innovation in voice AI technology for children.

This idea is part of our AI Legislation Policy Sprint. To see all of the policy ideas spanning innovation, education, healthcare, and trust, safety, and privacy, head to our sprint landing page.