Automating Scientific Discovery: A Research Agenda for Advancing Self-Driving Labs

Despite significant advances in scientific tools and methods, the traditional, labor-intensive model of scientific research in materials discovery has seen little innovation. The reliance on highly skilled but underpaid graduate students to run experiments by hand limits the labor productivity of our scientific ecosystem. An emerging class of technology platforms known as Self-Driving Labs (SDLs), which use commoditized robotics and artificial intelligence for automated experimentation, presents a potential solution to these challenges.

SDLs are not just theoretical constructs but have already been implemented at small scales in a few labs. An ARPA-E-funded Grand Challenge could drive funding, innovation, and development of SDLs, accelerating their integration into the scientific process. A Focused Research Organization (FRO) can also help create more modular and open-source components for SDLs and can be funded by philanthropies or the Department of Energy’s (DOE) new foundation. With additional funding, DOE national labs can also establish user facilities for scientists across the country to gain more experience working with autonomous scientific discovery platforms. In an era of strategic competition, funding emerging technology platforms like SDLs is all the more important to help the United States maintain its lead in materials innovation.

Challenge and Opportunity

New scientific ideas are critical for technological progress. These ideas often form the seed insight for creating new technologies: lighter cars that are more energy efficient, stronger submarines to support national security, and more efficient clean energy technologies like solar panels and offshore wind. While the past several centuries have seen incredible progress in scientific understanding, the fundamental labor structure of how we do science has not changed. Our microscopes have become far more sophisticated, yet the actual synthesizing and testing of new materials is still laboriously done in university laboratories by highly knowledgeable graduate students. The lack of innovation in how we organize and use scientific labor may help account for the stagnation of research labor productivity, a primary cause of concerns about the slowing of scientific progress. Indeed, analysis of the scientific literature suggests that scientific papers are becoming less disruptive over time and that new ideas are getting harder to find. The slowing rate of new scientific ideas, particularly in the discovery of new materials or advances in materials efficiency, poses a substantial risk, potentially costing billions of dollars in economic value and jeopardizing global competitiveness. However, incredible advances in artificial intelligence (AI) coupled with the rise of cheap but robust robot arms are leading to a promising new paradigm of materials discovery and innovation: Self-Driving Labs. An SDL is a platform where material synthesis and characterization are performed by robots, with AI models intelligently selecting new material designs to test based on previous experimental results. These platforms enable researchers to rapidly explore and optimize designs within otherwise unfeasibly large search spaces.

Today, most materials science labs are organized around a faculty member or principal investigator (PI), who manages a team of graduate students. Each graduate student designs experiments and hypotheses in collaboration with the PI and then executes the experiment, synthesizing the material and characterizing its properties. Unfortunately, that last step is often the most laborious and time-consuming. This sequential approach to material discovery, in which highly knowledgeable graduate students spend large portions of their time doing manual wet lab work, rate-limits the number of experiments and potential discoveries a given lab group can produce. SDLs can significantly improve the labor productivity of our scientific enterprise, freeing highly skilled graduate students from menial experimental labor to craft new theories or distill novel insights from autonomously collected data. They also yield more reproducible outcomes, because experiments are run by code-driven motors rather than by humans, who may forget to include certain experimental details or have natural variations between procedures.

Self-Driving Labs are not a pipe dream. The biotech industry has spent decades developing advanced high-throughput synthesis and automation. For instance, while in the 1970s statins (one of the most successful families of cholesterol-lowering drugs) were discovered in part by a researcher manually testing 3,800 cultures over a year, today companies like AstraZeneca invest millions of dollars in automation and high-throughput research equipment (see figure 1). While drug and material discovery share some characteristics (e.g., combinatorially large search spaces and high impact of discovery), materials R&D has historically seen fewer capital investments in automation, primarily because it sits further upstream from where private investments anticipate predictable returns. There are, however, a few notable examples of SDLs being developed today. For instance, researchers at Boston University used a robot arm to test 3D-printed designs for uniaxial compression energy absorption, an important mechanical property for designing stronger structures in civil engineering and aerospace. A Bayesian optimizer was then used to iterate over 25,000 designs in a search space with trillions of possible candidates, which led to an optimized structure with the highest recorded mechanical energy absorption to date. Researchers at North Carolina State University used a microfluidic platform to autonomously synthesize more than 100 quantum dot formulations, discovering formulations that were better than the previous state of the art in that material family.
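To make the closed loop concrete, the sketch below illustrates the kind of experiment-selection cycle described above: a surrogate model is fit to all results collected so far, an acquisition function proposes the next design, and the robot runs that experiment. This is a minimal illustration only, not the Boston University or NC State implementation; it assumes Python with NumPy and scikit-learn, and the `run_experiment` function is a hypothetical stand-in for robotic synthesis and characterization.

```python
"""Minimal, illustrative sketch of a self-driving-lab loop (not a real system).

A Gaussian-process surrogate is fit to the experiments run so far, an
upper-confidence-bound acquisition rule picks the next candidate design,
and `run_experiment` stands in for robotic synthesis + characterization.
"""
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def run_experiment(x: np.ndarray) -> float:
    # Hypothetical placeholder: synthesize design `x`, characterize it,
    # and return the measured property of interest (here, a toy function).
    return float(-np.sum((x - 0.3) ** 2) + 0.01 * np.random.randn())


rng = np.random.default_rng(0)
candidates = rng.random((5000, 4))   # 5,000 candidate formulations, 4 design parameters
X = rng.random((5, 4))               # a handful of seed experiments
y = np.array([run_experiment(x) for x in X])

for _ in range(25):                  # each iteration = one autonomous experiment
    surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    surrogate.fit(X, y)
    mu, sigma = surrogate.predict(candidates, return_std=True)
    ucb = mu + 2.0 * sigma           # favor designs that are promising or uncertain
    x_next = candidates[np.argmax(ucb)]
    y_next = run_experiment(x_next)  # the robot executes the chosen experiment
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)

print("Best measured property after 30 experiments:", y.max())
```

The same loop structure applies whether the candidates are 3D-printed geometries or quantum dot formulations; the substantive engineering work in a real SDL lies in the robotic synthesis, the in-line characterization, and problem-specific choices of surrogate model and acquisition function.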

These first-of-a-kind SDLs have shown exciting initial results, demonstrating their ability to discover new material designs from a haystack of thousands to trillions of possible candidates, far more than any human researcher could grasp. However, SDLs are still an emerging technology platform. To scale them up and realize their full potential, the federal government will need to make significant and coordinated research investments to derisk this materials innovation platform and demonstrate the return on capital before the private sector is willing to invest.

Other nations are beginning to recognize the importance of a structured approach to funding SDLs: the University of Toronto's Alán Aspuru-Guzik, a former Harvard professor who left the United States in 2018, has created the Acceleration Consortium to deploy these SDLs and recently received $200 million in research funding, Canada's largest-ever research grant. In an era of strategic competition and climate challenges, maintaining U.S. competitiveness in materials innovation is more important than ever. Building a strong research program to fund, build, and deploy SDLs in research labs should be a part of the U.S. innovation portfolio.

Plan of Action

While several labs in the United States are working on SDLs, they have all received small, ad hoc grants that are not coordinated in any way. A federal funding program dedicated to self-driving labs does not currently exist. As a result, existing SDLs are constrained to low-hanging-fruit material systems (e.g., microfluidics), and the lack of patient capital hinders labs' ability to scale these systems and realize their true potential. A coordinated U.S. research program for Self-Driving Labs should:

Initiate an ARPA-E SDL Grand Challenge: Drawing inspiration from DARPA's previous grand challenges that catalyzed advancements in self-driving vehicles, ARPA-E should establish a Grand Challenge to catalyze state-of-the-art advancements in SDLs for scientific research. This challenge would involve an open call for teams to submit proposals for SDL projects, with a transparent set of performance metrics and benchmarks. Successful applicants would then receive funding to develop SDLs that demonstrate breakthroughs in automated scientific research. A projected budget for this initiative is $30 million, divided among six selected teams, each receiving $5 million over a four-year period to build and validate their SDL concepts. While ARPA-E is best positioned in terms of authority and funding flexibility, other institutions like the National Science Foundation (NSF) or DARPA itself could also fund similar programs.

Establish a Focused Research Organization to open-source SDL components: This FRO would be responsible for developing modular, open-source hardware and software specifically designed for SDL applications. Creating common standards for both the hardware and software needed for SDLs will make the technology more accessible and encourage wider adoption. The FRO would also conduct research on how automation via SDLs is likely to reshape labor roles within scientific research and provide best practices on how to incorporate SDLs into scientific workflows. A proposed operational timeframe for this organization is five years, with an estimated budget of $18 million over that period. The organization would work on prototyping SDL-specific hardware solutions and make them available on an open-source basis to foster wider community participation and iterative improvement. An FRO could be spun out of the DOE's new Foundation for Energy Security and Innovation (FESI), which would further establish the DOE's role as an innovative science funder and be an exciting opportunity for FESI to work with nontraditional technical organizations. Using FESI would not require any new authorities and could leverage philanthropic funding rather than requiring congressional appropriations.

Provide dedicated funding for the DOE national labs to build self-driving lab user facilities, so the United States can build institutional expertise in SDL operations and allow other U.S. scientists to familiarize themselves with these platforms. This funding can be specifically set aside by the DOE Office of Science or through line-item appropriations from Congress. Existing prototype SDLs that have emerged in the past several years, like Argonne National Laboratory's Rapid Prototyping Lab or Berkeley Lab's A-Lab, lack sustained DOE funding but could be scaled up and supported with only $50 million in total over the next five years. SDLs are also one of the primary applications identified by the national labs in the “AI for Science, Energy, and Security” report, demonstrating the labs' willingness to build out this infrastructure and underscoring the strategic importance the scientific research community places on SDLs.

Frequently Asked Questions
What factors determine whether an SDL is appropriate for materials innovation?

As with any new laboratory technique, SDLs are not necessarily an appropriate tool for everything. Given that their main benefit lies in automation and the ability to rapidly iterate through designs experimentally, SDLs are likely best suited for:



  • Material families with combinatorially large design spaces that lack clear design theories or numerical models (e.g., metal organic frameworks, perovskites)

  • Experiments where synthesis and characterization are either relatively quick or cheap and are amenable to automated handling (e.g., UV-vis spectroscopy is a relatively simple in-situ characterization technique)

  • Scientific fields where numerical models are not accurate enough to use for training surrogate models or where there is a lack of experimental data repositories (e.g., the challenges of using density functional theory in material science as a reliable surrogate model)


These heuristics are offered as guidelines; it will take a full-fledged program with real results to determine which material systems are most amenable to SDL disruption.

What aren’t SDLs?

When it comes to exciting new technologies, there can be incentives to misuse terms. A Self-Driving Lab can be precisely defined as the automation of both material synthesis and characterization, with some degree of intelligent, automated decision-making in the loop. Based on this definition, here are common classes of experiments that are not SDLs:



  • High-throughput synthesis, where automation allows many different material formulations to be synthesized rapidly in parallel (lacks characterization and AI in the loop)

  • Using an AI surrogate model trained on numerical simulations, which relies on software-only results. Using an AI surrogate model to predict an optimal material and then synthesizing it is also not an SDL, though it is certainly still quite an accomplishment for AI in science (this approach lacks the discovery of synthesis procedures and requires numerical models or preexisting data, neither of which is always readily available in the material sciences).

Will SDLs “automate” away scientists? How will they change the labor structure of science?

SDLs, like every other technology that we have adopted over the years, eliminate routine tasks that scientists must currently spend their time on. They will allow scientists to spend more time understanding scientific data, validating theories, and developing models for further experiments. They can automate routine tasks but not the job of being a scientist.


However, because SDLs require more firmware and software, they may favor larger facilities that can retain long-term technicians and engineers to maintain and customize SDL platforms for various applications. An FRO could help address this asymmetry by developing open-source, modular software that smaller labs can adopt more easily.

The Biorevolution is Underway. Now is the Time for Biology to Harness the Potential of Artificial Intelligence

The Federation of American Scientists (FAS) Makes Five Policy Recommendations to Maximize Opportunity and Minimize Risk at the Intersection of Biology and Artificial Intelligence

Washington, DC – December 12, 2023 – Today the Federation of American Scientists (FAS) released federal policy recommendations to address potential threats AI poses to bioscience and the surging bioeconomy. The five recommendations presented by experts are detailed in a series of memos.

Read each of these recommendations, plus an introduction from Nazish Jeffery, at this link.

ABOUT FAS

The Federation of American Scientists (FAS) works to advance progress on a broad suite of contemporary issues where science, technology, and innovation policy can deliver dramatic progress, and seeks to ensure that scientific and technical expertise have a seat at the policymaking table. Established in 1945 by scientists in response to the atomic bomb, FAS continues to work on behalf of a safer, more equitable, and more peaceful world. More information at fas.org.

###

Tracking AI Provisions in FY 2024 Appropriations Bills

As Congress moves forward with the appropriations process, both the House and Senate have proposed various provisions related to artificial intelligence (AI) and machine learning (ML) across different spending bills. These proposals reflect the growing importance and adoption of AI/ML technologies across many areas of government.

Below, we summarize the AI/ML provisions for each appropriations bill in tables comparing the Senate and House versions.

Both chambers provide significant funding increases for AI research at science agencies like the National Science Foundation (NSF), the National Institute of Standards and Technology (NIST), and the Department of Energy's (DOE) Office of Science. For example, the Senate recommends $135 million for AI initiatives across DOE's Office of Science, while the House includes $20 million for NSF to research AI explainability. NIST sees a $68 million funding increase in the Senate bill for its measurement labs and research, and a $15 million increase in the House bill.

The provisions overall seem focused on practical AI applications and boosting research, rather than ideological battles. The language in both chambers’ bills is framed in terms of maintaining US leadership and competitiveness, which tends to avoid partisan divisions. The House justifies more of its spending on AI in tones that are hawkish toward China. The Senate bills tend to have more congressionally directed spending items, or earmarks, related to AI.

Both bills demonstrate interest in AI applications like agricultural forecasting, autonomous vehicles, and utilizing AI to modernize government operations. But the Senate more explicitly directs agencies to adopt AI to improve such programs, and in some cases, such as NIST funding, the Senate is more fiscally generous. Overall, the Senate bill reports and bill summaries are more specific in the language and observations around AI, with 65 provisions related to AI or machine learning, compared to 44 in the House, across all appropriations bills. This potentially reflects a somewhat higher level of interest within the Senate Appropriations committee on the topic.

While both chambers agree on boosting AI research funding, the Senate takes a more top-down approach prescribing funding for AI initiatives while the House allows more agency discretion. Differences also emerge regarding perspectives on AI oversight and governance. Clearly, there will be a lot of coordination needed to align on AI funding priorities when (and if) these bills go to conference.

This tracker will be updated as the appropriations process continues.

Agriculture

AI Provisions in 2023 Appropriations Bills: Agriculture

| Provision | House Summary | Senate Summary | Status | Source |
| --- | --- | --- | --- | --- |
| Agricultural data security | N/A | Provides $2M for ARS and university collaboration on ag data security research | Passed Senate 11/1/2023 | S. P.18 |
| BARD programming | Expands BARD (USDA-Israel collaboration) to include AI | Expands BARD (USDA-Israel collaboration) to include AI | Passed Senate 11/1/2023; Passed House Appropriations 6/20/2023 | S. P.19; H. P.17 |
| Sensor fusion research | N/A | Provides $1M for ARS sensor/AI research for environmental monitoring | Passed Senate 11/1/2023 | S. P.23 |
| Poultry processing research | N/A | Provides funds for ARS poultry research including AI/automation | Passed Senate 11/1/2023 | S. P.28 |
| Predictive crop modeling | N/A | Provides $1M for ARS predictive crop modeling using AI | Passed Senate 11/1/2023 | S. P.28-29 |
| Agricultural robotics | N/A | Encourages NIFA ag robotics research | Passed Senate 11/1/2023 | S. P.37 |
| AI Research Institutes | Encourages NIFA support for AI Research Institutes | Encourages NIFA support for AI Research Institutes | Passed Senate 11/1/2023; Passed House Appropriations 6/20/2023 | S. P.37; H. P.25 |
| SNAP fraud detection | N/A | Encourages FNS to use data mining/ML for SNAP fraud detection | Passed Senate 11/1/2023 | S. P.106 |
| Food labeling accuracy | Directs FDA to evaluate AI tools for food labeling compliance | Directs FDA to evaluate AI tools for food labeling compliance | Passed Senate 11/1/2023; Passed House Appropriations 6/20/2023 | S. P.123; H. P.78 |
| Dairy robotics funding | N/A | Provides funding for robotic dairy milker (ARS) | Passed Senate 11/1/2023 | S. P.151 |
| Specialty crop resilience | Supports NIFA specialty crop resilience research using AI | N/A | Passed House Appropriations 6/20/2023, failed on House Floor 9/28/23 | H. P.24 |

Commerce, Science & Justice

AI Provisions in 2023 Appropriations Bills: Commerce, Science, & Justice

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| NIST AI funding | Provides $15M for NIST AI standards and risk framework | Directs NIST to continue AI standards and risk framework work | Passed House Subcommittee 7/14/2023; Passed Senate Committee 7/13/2023 | H. Explanatory Materials p19, 22; S. p23 |
| NOAA autonomous systems | Provides $21.7M for NOAA autonomous maritime systems | N/A | Passed House Subcommittee 7/14/2023 | H. Explanatory Materials p27, 35 |
| NOAA computing upgrades | N/A | Provides $60M to NOAA including $5M for AI weather data processing | Passed Senate Committee 7/13/2023 | S. p62 |
| NASA aviation autonomy | Provides $10M+ for NASA autonomous flight systems | N/A | Passed House Subcommittee 7/14/2023 | H. Explanatory Materials p87-88, 91, 94 |
| NSF AI explainability | Provides up to $20M for NSF AI explainability research | N/A | Passed House Subcommittee 7/14/2023 | H. Explanatory Materials p97 |
| NSF AI workforce | Encourages NSF AI workforce development | Supports NSF AI workforce development | Passed House Subcommittee 7/14/2023; Passed Senate Committee 7/13/2023 | H. Explanatory Materials p101; S. p171-2 |
| NWS translation | N/A | Encourages NWS to use AI for weather translations | Passed Senate Committee 7/13/2023 | S. p54 |
| DEA digital evidence | N/A | Urges DEA to adopt AI for digital evidence analysis | Passed Senate Committee 7/13/2023 | S. p100-101 |
| NASA AI investments | N/A | Notes AI as key space technology area | Passed Senate Committee 7/13/2023 | S. p157 |
| NSF AI transparency | N/A | Encourages NSF AI transparency research | Passed Senate Committee 7/13/2023 | S. p172 |
| NSF computing resources | Encourages NSF support for AI computing resources | Encourages NSF AI computing resources | Passed House Subcommittee 7/14/2023; Passed Senate Committee 7/13/2023 | H. Explanatory Materials p99; S. p173 |
| EEOC AI bias | N/A | Directs EEOC to report on AI bias in hiring | Passed Senate Committee 7/13/2023 | S. p181 |

Energy & Water Development

AI Provisions in 2023 Appropriations Bills: Energy & Water Development

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| Climate modeling research | N/A | Provides funding for DOE Office of Science AI/ML tools to improve climate modeling and analysis of low-dose radiation impacts | Passed Senate committee 7/20/23 | S. p120 |
| Broad AI/ML research program | N/A | Establishes $135M cross-cutting AI/ML research program across DOE Office of Science | Passed Senate committee 7/20/23 | S. p116 |
| Quantum computing algorithms | N/A | Supports DOE Office of Science research on algorithms for future quantum computers | Passed Senate committee 7/20/23 | S. p117 |
| Exascale computing software | N/A | Provides funding to DOE Office of Science to maintain and advance software for exascale systems | Passed Senate committee 7/20/23 | S. p117 |
| Advanced computing strategy | N/A | Directs DOE to brief Congress on advanced computing strategy and investments | Passed Senate committee 7/20/23 | S. p118 |
| Battery interface design | N/A | Encourages DOE Office of Science research using AI/ML tools for battery interface design | Passed Senate committee 7/20/23 | S. p118-9 |
| Removal of AI office funding | N/A | Eliminates $1M in funding for DOE's Artificial Intelligence and Technology Office | Passed Senate committee 7/20/23 | S. p150 |
| AI for dredging optimization | N/A | Provides funding for AI/ML tools to optimize Army Corps dredging operations | Passed Senate committee 7/20/23 | S. p47 |
| AI for CO2 capture | Provides $5M for universities to research AI/ML for DOE FE CO2 capture | N/A | Passed House 10/26/23 | H. p110 |
| Cyber-physical systems security | Provides $5M for university research on resilient cyber-physical systems for DOE CESER | N/A | Passed House 10/26/23 | H. p100 |

Financial Services & Government

AI Provisions in 2023 Appropriations Bills: Financial Services and General Government Bill

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| IRS customer service AI | N/A | Directs IRS to study using AI chatbots for customer service and provide briefing | Passed Senate Appropriations Committee 7/13/2023 | S. p25-26 |
| FTC AI oversight | N/A | Supports FTC oversight of AI for consumer protection and competition | Passed Senate Appropriations Committee 7/13/2023 | S. p68 |
| George Mason University AI research | N/A | Provides $1M to George Mason University Center for AI Innovation for Economic Competitiveness for AI research | Passed Senate Appropriations Committee 7/13/2023 | S. p110 |

Homeland Security

AI Provisions in 2023 Appropriations Bills: Homeland Security

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| CBP AI for screening | N/A | Encourages CBP to prioritize testing and acquiring AI/ML tools for vehicle and cargo screening | Passed Senate Appropriations Committee 7/27/2023 | S. p4, 29-30 |
| CBP northern border tech | N/A | Directs CBP to brief Congress on autonomous systems for northern border | Passed Senate Appropriations Committee 7/27/2023 | S. p31 |
| CBP AI for mail facilities | N/A | Provides funds for CBP to use AI/ML to detect smuggled drugs in international mail | Passed Senate Appropriations Committee 7/27/2023 | S. p37-38 |
| CBP contract analysis | N/A | Directs CBP to use AI/ML tools to analyze and consolidate contracts | Passed Senate Appropriations Committee 7/27/2023 | S. p44 |
| Integrated border surveillance | N/A | Provides $86M for integrated border surveillance towers with autonomy | Passed Senate Appropriations Committee 7/27/2023 | S. p45 |
| CBP screening improvements | N/A | Directs CBP to incorporate AI/ML into screening systems | Passed Senate Appropriations Committee 7/27/2023 | S. p46 |
| ICE data systems | N/A | Directs ICE to brief Congress on use of AI/ML in data systems | Passed Senate Appropriations Committee 7/27/2023 | S. p56-57 |
| Coast Guard tech upgrades | N/A | Encourages Coast Guard to develop AI/ML for search and rescue | Passed Senate Appropriations Committee 7/27/2023 | S. p77 |
| DHS cybersecurity research | N/A | Provides university research funds for cybersecurity AI/ML | Passed Senate Appropriations Committee 7/27/2023 | S. p115 |
| DHS IoT research | N/A | Encourages DHS research on IoT and AI/ML for infrastructure security | Passed Senate Appropriations Committee 7/27/2023 | S. p116 |
| CBP surveillance towers | Provides $21M for CBP autonomous surveillance towers | N/A | Passed House 9/28/2023 | H. p5, 21 |
| Coast Guard tech upgrades | Provides $10M to Coast Guard for AI and autonomous capabilities | N/A | Passed House 9/28/2023 | H. p7 |
| CBP targeting improvements | Encourages CBP review of AI for targeting center | N/A | Passed House 9/28/2023 | H. p26 |
| CBP maritime surveillance | Directs CBP to brief Congress on maritime autonomous surveillance | N/A | Passed House 9/28/2023 | H. p29, 47 |
| CBP tower deployment | Directs CBP to deploy autonomous surveillance towers | N/A | Passed House 9/28/2023 | H. p30 |
| CBP inspection tech | Provides $12.6M for CBP AI/ML screening tools | N/A | Passed House 9/28/2023 | H. p32 |
| ICE detention operations | Directs ICE to brief Congress on using AI for detention operations | N/A | Passed House 9/28/2023 | H. p39-40 |
| TSA screening improvements | Recognizes TSA efforts to develop AI screening algorithms | N/A | Passed House 9/28/2023 | H. p44-45 |

Interior & Environment

AI Provisions in 2023 Appropriations Bills: Interior & Environment Comparison

| Provision | House Summary | Senate Summary | Status |
| --- | --- | --- | --- |
| Wildfire modeling and AI | Encourages Forest Service and BLM collaboration with NOAA on wildfire modeling and AI solutions | N/A | Passed House 11/3/23 |
| Advanced computing for water monitoring | Provides funding for USGS work with academics on advanced computing techniques for water monitoring | Provides funding for USGS to work with academics on advanced computing techniques for water monitoring and requests briefing | Passed Senate Appropriations Committee 7/27/2023; Passed House Appropriations Committee 7/19/2023 |

Labor, HHS & Education

AI Provisions in 2023 Appropriations Bills: Labor, HHS, and Education Bill

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| Robotics manufacturing training | N/A | Encourages DOL to prioritize robotics and manufacturing training programs | Passed Senate Appropriations Committee 7/27/2023 | S. p9 |
| NIH AI/ML research | N/A | Provides $135M for NIH AI/ML research and seeks update on ethics standards | Passed Senate Appropriations Committee 7/27/2023 | S. p118-9 |
| NIH AI for research prioritization | N/A | Supports NIH using AI to optimize research investments | Passed Senate Appropriations Committee 7/27/2023 | S. p124 |
| IES collaboration on AI research | N/A | Encourages IES collaboration with NSF on AI education research | Passed Senate Appropriations Committee 7/27/2023 | S. p248 |
| Robotics education earmarks | N/A | Includes 3 earmarks totaling $2.06M for robotics education programs | Passed Senate Appropriations Committee 7/27/2023 | S. p285, 340, 355 |

Legislative Branch

AI Provisions in 2023 Appropriations Bills: Legislative Branch

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| Copyright Office AI study | N/A | Directs Copyright Office to brief Congress on AI and copyright issues | Passed Senate Appropriations Committee 7/13/2023 | S. p49 |
| GAO AI oversight | N/A | Encourages GAO's STAA team to continue AI oversight for Congress | Passed Senate Appropriations Committee 7/13/2023 | S. p55 |
| House AI working group | Directs House CAO to formalize AI working group and produce AI report | N/A | Passed House 11/1/2023 | H. p8-9 |
| AI captioning study | Directs accessibility office to study AI captioning for committees | N/A | Passed House 11/1/2023 | H. p13 |

Military Construction & VA

AI Provisions in 2023 Appropriations Bills: Military Construction & VA

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| VA autonomous robots | Encourages VA to consider autonomous robots in hospital planning and requests report | Directs VA to report on cost savings from using autonomous robots at hospitals | Passed Senate 11/1/2023; Passed House 7/27/2023 | S. p48; H. p32, 46 |
| VA bioelectronics research | Encourages VA research combining bioelectronics, AI/ML for treatment | N/A | Passed House 7/27/2023 | H. p33 |

State & Foreign Operations

AI Provisions in 2023 Appropriations Bills: State & Foreign Operations

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| Technology diplomacy training | N/A | Directs State Dept to address deficiencies in AI training for technology diplomacy | Passed Senate Appropriations Committee 7/20/2023 | S. p15 |
| NATO emerging tech investments | N/A | Encourages NATO to invest in AI/ML capabilities | Passed Senate Appropriations Committee 7/20/2023 | S. p27 |

Transportation & HUD

AI Provisions in 2023 Appropriations Bills: Transportation & HUD

| Provision | House Summary | Senate Summary | Status | Page |
| --- | --- | --- | --- | --- |
| Rural AV research | N/A | Concerned about delays in DOT awarding prior rural AV research funds; may require briefing | Passed Senate 11/1/2023 | S. p21-22 |
| Crashworthiness standards | Directs NHTSA to continue research on lightweight materials for AVs | Directs NHTSA to continue research on lightweight materials for AVs | Passed Senate 11/1/2023; Passed House 7/18/2023 | S. p59; H. p41 |
| Airport taxiing system | Encourages FAA to evaluate autonomous airport taxiing system | N/A | Passed House 7/18/2023 | H. p30 |
| AV regulatory framework | Directs NHTSA to submit biannual reports on AV rulemaking activities | N/A | Passed House 7/18/2023 | H. p42 |

Unlocking American Competitiveness: Understanding the Reshaped Visa Policies under the AI Executive Order

The intensifying competition for global talent has made it necessary to evaluate and update the policies concerning international visa holders in the United States. Recognizing this, President Biden has directed various agencies, through the upcoming AI Executive Order (EO), to consider policy changes aimed at improving processes and conditions for legal foreign workers, students, researchers, and scholars. The EO recognizes that attracting global talent is vital for continued U.S. economic growth and enhanced competitiveness.

Here we offer a comprehensive analysis of the potential impacts and beneficiaries of several key provisions of this EO. The provisions considered here fall into six categories: domestic revalidation for J-1 and F-1 visas; modernization of H-1B visa rules; updates to the J-1 Exchange Visitor Skills List; the introduction of a Global AI Talent Attraction Program; an RFI seeking updates to DOL's Schedule A; and policy manual updates for O-1A, EB-1, EB-2, and the International Entrepreneur Rule. Each policy change has the potential to advance America's ability to draw in the international experts who contribute enormously to our innovation-driven economy.

Domestic Revalidation for J-1 and F-1 Visas

The EO directive on expanding domestic revalidation for J-1 research scholars and F-1 STEM visa students simplifies and streamlines the renewal process for a large number of visa holders. 

There are currently approximately 900,000 international students in the U.S., nearly half of whom are enrolled in STEM fields. This policy change therefore has the potential to affect almost 450,000 international students, including those participating in optional practical training (OPT). Many of the affected individuals are scholars with advanced degrees, as nearly half of all STEM PhDs are awarded to international students.

One of the significant benefits offered by this EO directive is the reduction in processing times and associated costs. It also improves convenience for these students and scholars. For example, many of the hundreds of thousands of STEM students will no longer be obligated to spend excessive amounts on travel to their home country for a 10-minute interview at an embassy.

Beyond saving costs, this directive also allows students to attend international conferences more easily and travel without worrying about spending a month away from their vital research while waiting for visa renewal back home.

Expanding domestic revalidation to F and J visa holders was initially suggested by the Secure Borders and Open Doors Advisory Committee in January 2008, indicating its long-standing relevance and importance. By implementing it, we not only enhance efficiency but also foster a more supportive environment for international students contributing significantly to our scientific research community.

Modernization of H-1B Visa Rules

The EO directive to update the rules surrounding H-1B visas would positively impact the more than 500,000 current H-1B visa holders. The Department of Homeland Security recently released a Notice of Proposed Rulemaking to reform the H-1B visa rules. It would allow these visa holders to transition into new jobs more easily, give them more predictability and certainty in the renewal process and more flexibility to apply their skills, and allow entrepreneurs to access the H-1B visa more effectively. Last year, 206,002 initial and continuing H-1Bs were issued, and the new rules would apply to similar numbers in FY2025. What amplifies this modification's impact is its potential crossover with EB-1 and EB-2 petitioners waiting on green cards, currently more than 400,000 petitions.

Additionally, the modernization would address the issue of multiple applications per applicant. This has been a controversial issue in the H-1B visa program, as companies would often file multiple registrations for the same employee, inflating demand against the yearly quota and reducing the chances for others. The modernization could address this problem by introducing clear rules or restrictions on the number of applications per applicant. USCIS recently launched fraud investigations into several companies engaging in this practice.

Updates to J-1 Exchange Visitor Skills List

The EO directive to revamp the skills list will align it with evolving global labor market needs. Nearly 37,000 of the J-1 visas issued in 2022 went to professors, research scholars, and short-term scholars, mainly from China and India (nearly 40% of the total). This update not only expands the opportunities available to these participants but also tackles critical skill gaps in fields like AI in the U.S. Once the J-1 skills list is updated to reflect the realities of today's global labor market, it will allow thousands of additional high-skilled J-1 visa holders to apply for other visa categories immediately, without spending two years in their countries of origin, as laid out in this recent brief by the Federation of American Scientists.

Global AI Talent Attraction Program

Because AI talent is global, the EO directive on using the State Department's public diplomacy function is strategically important. By hosting overseas events to appeal to crucial talent bases abroad, we can help meet the U.S. tech industry's demand for AI talent, which has risen steeply in recent years. While 59% of top-tier AI researchers work in the U.S., only 20% of them received their undergraduate degrees in the U.S. Only 35% of the most elite AI researchers (the top 0.5%) received their undergraduate degrees in the U.S., but 65% of them work in the U.S. The establishment of a Global AI Talent Attraction Program by the State Department will double down on this uniquely American advantage.

Schedule A Update & DOL’s RFI

Schedule A is a list of occupations for which the U.S. Department of Labor (DOL) has determined there are not sufficient U.S. workers who are able, willing, qualified, and available. Foreign workers in these occupations can therefore receive a green card faster, because the employer does not need to go through the labor certification process. Schedule A Group I was created in 1965 and has remained unchanged since 1991. If the DOL were to update Schedule A, it would impact foreign workers and employers in several ways, depending on how the list changes.

Foreign workers in occupations on Schedule A do not have to go through the PERM (Program Electronic Review Management) labor certification process, which otherwise takes on average 300 days to complete, because the Department of Labor has already determined that there are not sufficient U.S. workers available for those occupations. An updated Schedule A could significantly cut the number of PERM applications filed, which are currently at high volumes (over 86,000 already filed by the end of FY23 Q3). While the EO only calls for an RFI seeking information on the Schedule A list, this is a critical first step toward an eventual update that is badly needed.

Policy Manual Updates for O-1A, EB-1, EB-2 and International Entrepreneur Rule

The EO’s directive to DHS to modernize pathways for experts in AI and other emerging technologies will have profound effects on the U.S. tech industry. Fields such as Artificial Intelligence (AI), Quantum computing, Biotechnology, etc., are increasingly crucial in defining global technology leadership and national security. As per the NSCAI report, the U.S. significantly lags behind in terms of AI expertise due to severe immigration challenges.

The modernization would likely include clarifying and updating the criteria defining ‘extraordinary ability’ and ‘exceptional ability’ under the O-1A, EB-1, and EB-2 visas, making them more inclusive of talent in emerging tech fields. For instance, the current ‘extraordinary ability’ category is restrictive toward researchers, as it favors those who have received significant international awards or recognition, which is rare in most early-stage research careers. Similarly, although the O-1A and EB-1 are both designed for individuals with extraordinary ability, the criteria for the EB-1 are more restrictive than those for the O-1A; bringing the two in line would give O-1A holders a more predictable path to transition to an EB-1. Such updates also extend to the International Entrepreneur Rule, giving startup founders working on critical technologies more straightforward access to the U.S.

Altogether, these updates could lead to a surge in visa applications under the O-1A, EB-1, and EB-2 categories and increase entrepreneurship within emerging tech sectors. In turn, this provision would bolster the United States' global competitive advantage by attracting top-performing individuals working on critical technologies worldwide.

Enhanced Informational Resources and Transparency

The directives in Section 4 instruct an array of senior officials to create informational resources that demystify options for experts in critical technologies intending to work in the U.S. The provision’s ramifications include:

Streamlining Visa Services 

This area of the order directly addresses immigration policy with a view to accelerating access for talented individuals in emerging tech fields. 

Using Discretionary Authorities to Support and Attract AI Talent

The EO’s directive to the Secretary of State and Secretary of Homeland Security to use discretionary authorities—consistent with applicable law and implementing regulations—to support and attract foreign nationals with special skills in AI seeking to work, study, or conduct research in the U.S. could have enormous implications. 

One way this provision could be implemented is through the use of public benefit parole. Offering parole to elite AI researchers who may otherwise be stuck in decades-long backlogs (or who are trying to escape authoritarian regimes) could significantly increase the inflow of intellectual talent into the U.S. Public benefit parole is also the basis for the International Entrepreneur Rule. Given that other countries are actively poaching talent from the U.S. because of our decades-long visa backlogs, creating a public benefit parole program for researchers in AI and other emerging technology areas could prove extremely valuable. These researchers could then be allowed to stay and work in the U.S. provided they are able to demonstrate (on an individual basis) that their stay in the U.S. would provide a significant public benefit through their AI research and development efforts.

Another potential use of this discretionary authority would be for the Department of State to issue a memo announcing a one-time recapture of certain immigrant visa cap numbers to redress prior agency failures to issue visas. There is precedent for this: between 1968 and 1976, the government incorrectly charged Cuban refugees to the Western Hemisphere limitation, causing immigrants from Western Hemisphere countries to face longer wait times, an error the government later openly acknowledged. To remedy the situation, the government recaptured over 140,000 visas from prior fiscal years on its own authority and issued them to other immigrants who were caught in the Western Hemisphere backlog.

In the past, considerable numbers of green cards have gone unused due to administrative factors. Recapturing these missed opportunities could immediately benefit a sizable number of immigrants, including those possessing AI skills and waiting for green card availability. For instance, if a hypothetical 300,000 green cards that went unallocated due to administrative failures were recaptured, the immigration process could be expedited for a similar number of individuals.

Finally, as the Federation of American Scientists argued in an earlier brief, it is essential that the Secretary of State and the Secretary of Homeland Security extend the visa interview waivers indefinitely, given the significant backlogs the State Department faces at several consular posts, which are preventing researchers from traveling to the U.S.

In August 2020, Secretary Pompeo announced that applicants seeking a visa in the same category they previously held would be eligible for an interview waiver if their visa had expired within the previous 24 months. Before this, the window for an interview waiver was only 12 months. In December 2020, just two days before this policy was set to expire, DOS extended it through the end of March 2021. In March 2021, the window was doubled again, from 24 to 48 months, and the policy was extended through December 31, 2021. In September 2021, DOS also approved waivers through the remainder of 2021 for applicants for F, M, and academic J visas from Visa Waiver Program countries who had previously been issued a visa.

In December 2021, DOS extended its then-existing policies (with some minor modifications) through December 2022. Moreover, the policy that individuals renewing a visa in the same category as one that expired in the preceding 48 months may be eligible for issuance without an interview was made a standing policy of the State Department and added to the department's Foreign Affairs Manual for consular officers. In December 2022, DOS announced another extension of these policies, which are set to expire at the end of 2023.

As the State Department recently noted: “These interview waiver authorities have reduced visa appointment wait times at many embassies and consulates by freeing up in-person interview appointments for other applicants who require an interview. Nearly half of the almost seven million nonimmigrant visas the Department issued in Fiscal Year 2022 were adjudicated without an in-person interview. We are successfully lowering visa wait times worldwide, following closures during the pandemic, and making every effort to further reduce those wait times as quickly as possible, including for first-time tourist visa applicants. Embassies and consulates may still require an in-person interview on a case-by-case basis and dependent upon local conditions.”

These changes would also benefit U.S. companies and research institutions, who often struggle to retain and attract international AI talent due to the lengthy immigration process and uncertain outcomes. In addition, exercising parole authority can open a new gateway for attracting highly skilled AI talent that might have otherwise chosen other countries due to the rigid U.S. immigration system. 

The use of such authorities can result in a transformational change for AI research and development in the U.S. However, all of these outcomes depend entirely on the actual changes made to existing policies, a task that many acknowledge will require serious thought to strike a balance between remaining advantageously selective and being inclusive enough.

In summary, these provisions would carry massive impacts, enabling the United States to retain foreign talent vital to sectors including but not limited to education, technology, and healthcare, all of which fuels our national economic growth.

‘Safe, Secure, and Trustworthy AI’ is Necessary and Urgent, Says Federation of American Scientists (FAS)

Researchers at the nonpartisan science think tank support Biden’s executive order on the use of artificial intelligence in government

Washington, DC – October 30, 2023 – Saying that action is urgent and necessary, researchers at the Federation of American Scientists (FAS) state their approval of today's Executive Order (EO) on the federal government's use of artificial intelligence (AI). The EO, signed by President Biden this morning, directs federal departments and agencies to assess their AI workforce requirements and their preparedness to deploy AI tools and secure against threats. The EO mandates that, within six months of its release, certain agencies (DoD, DHS, DOE, NSA, CIA, FBI) submit assessments to the president on how AI capabilities could be leveraged within government. Additional provisions address high-skilled immigration, cybersecurity, and red-teaming requirements for Large Language Models (LLMs) before they can be used by federal workers. Also introduced is AI.gov, intended to be the government's key landing page on AI issues and to serve as a one-stop shop for all things AI, including workforce training efforts.

“We applaud the Biden Administration for taking action to direct the development and use of this critical technology to the best outcomes,” says Dan Correa, CEO of FAS. “Our organization was founded in 1945 when scientists understood the tremendous ramifications of atomic energy; while we don’t think AI represents that kind of existential threat, we do see AI as a rapidly evolving technology with massive societal implications that demands the public’s attention now, especially when applied to government services.”

He continues: “We are pleased to see the provisions outlined in this comprehensive executive order as a significant step in the right direction. This landmark document recognizes not only the pivotal role AI and advanced technologies play in shaping our future but also the absolute importance of nurturing the human minds behind these fields.”

Key provisions in the EO include: bolstering AI capabilities and defenses within the federal government; building up the federal AI workforce; attracting and retaining foreign AI talent; red-teaming requirements before AI capabilities can be used by the government; and a “know your customer” framework similar to others used in cloud computing. Also included is a provision for ongoing public comment regarding open source LLMs as the technology evolves.

“The inclusion of red teaming exercises and guidance to agencies on using powerful AI models demonstrates a proactive and prudent stance towards managing risks and rewards,” says Divyansh Kaushik, associate director of Emerging Technologies and National Security at FAS. “This intersectional approach ensures we uphold safety while pushing the frontiers of knowledge.

“In essence, this executive order unequivocally underscores the tight-knit relationship between technology and national security, reinforcing its commitment to harnessing science for technological prowess and societal well-being. As we watch these policies unfold, we are hopeful that they will serve as a beacon attracting advancements in AI and emerging technologies, paving the way for a stronger, more prosperous future.”

Kaushik goes on to explain that focusing on immigration measures to support the inflow of technology expertise, with initiatives like H-1B modernization, J-1 skills list updates, and domestic visa revalidation, bolsters America's ability to attract and retain global talent.

The EO is expected to serve as a sign that the United States is setting a standard for AI use that allies and partners can emulate. This week Vice President Harris and Commerce Secretary Raimondo are scheduled to travel to the UK for an AI Safety Summit on November 1-2, where the Vice President is planning to deliver a major address on AI policy.


ABOUT FAS

The Federation of American Scientists (FAS) works to advance progress on a broad suite of contemporary issues where science, technology, and innovation policy can deliver dramatic progress, and seeks to ensure that scientific and technical expertise have a seat at the policymaking table. Established in 1945 by scientists in response to the atomic bomb, FAS continues to work on behalf of a safer, more equitable, and more peaceful world. More information at fas.org.

AI in Action: Recommendations for AI Policy in Health, Education, and Labor

The Ranking Member of the Senate Committee on Health, Education, Labor, & Pensions (HELP) recently requested information regarding AI in our healthcare system, in the classroom, and in the workplace. The Federation of American Scientists was happy to provide feedback on the Committee’s questions. Targeted investments and a clear-eyed vision of the future of AI in these domains will allow the U.S. to reap more of the potential benefits of AI while preventing some of the costs.

This response provides recommendations on leveraging AI to improve education, healthcare, and the future of work.

Overall, with thoughtful oversight and human-centric design, AI promises immense benefits across these sectors. But responsible governance is crucial, as is inclusive development and ongoing risk assessment. By bringing together stakeholders, the U.S. can lead in advancing ethical, high-impact applications of AI.

Education

The Federation of American Scientists (FAS) co-leads the Alliance for Learning Innovation (ALI), a coalition of cross-sector organizations seeking to build a stronger, more competitive research and development (R&D) infrastructure in U.S. education. As was noted in the ALI Coalition’s response to White House Office of Science & Technology Policy’s “Request for Information: National Priorities for Artificial Intelligence,” FAS sees great promise and opportunity for artificial intelligence to improve education, equity, economic opportunity, and national security. In order to realize this opportunity and mitigate risks, we must ensure that the U.S. has a robust, inclusive, and updated education R&D ecosystem that crosscuts federal agencies.

What Should The Federal Role Be In Supporting AI In Education?

Research And Development

The U.S. government should prioritize funding and supporting R&D in the field of AI to ensure that the U.S. remains on the cutting edge of this technology. One strong existing federal example is the AI Institutes program supported by the National Science Foundation (NSF) and the U.S. Department of Education (ED). Earlier this year, NSF and the Institute of Education Sciences (IES) established the AI Institute for Exceptional Children, which capitalizes on the latest AI research to serve children with speech and language pathology needs. Communities would benefit from additional AI Institutes that meet the moment and deliver solutions for today's teaching and learning challenges.

Expanding Research Grant Programs

Federal agencies, and specifically IES, should build upon existing training programs for broadening participation and create dedicated research grant programs for minority-serving institutions with an emphasis on AI research. While the IES Pathways program has had success in diversifying education research training programs, more needs to be done at the predoctoral and postdoctoral levels.

National Center For Advanced Development In Education

Another key opportunity to support transformational AI research and development in the United States is to establish a National Center for Advanced Development in Education (NCADE). Modeled after the Defense Advanced Research Projects Agency (DARPA), NCADE would support large-scale, innovative projects that require a more nimble and responsive program management approach than is currently in place. The Center would focus on breakthrough technologies, new pedagogical approaches, innovative learning models, and more efficient, reliable, and valid forms of assessment. By creating NCADE, Congress can seed the development and use of artificial intelligence to support teaching, personalize learning, support English language learner (ELL) students, and analyze speech and reading.

How Can We Ensure That AI Systems Are Designed, Developed, And Deployed In A Manner That Protects People’s Rights And Safety?

First and foremost, we need to ensure that underserved communities, minors, individuals with disabilities and the civil rights organizations that support them are at the table throughout the design process for AI tools and products. In particular, we need to ensure that research is led and driven locally and by those who are closest to the challenges, namely educators, parents, students, and local and state leaders.

When thoughtfully and inclusively designed, AI has the potential to enhance equity by providing more personalized learning for students and by supporting educators to address the individual and diverse needs in their classrooms. For example, AI could be utilized in teacher preparation programs to ensure that educators have access to more diverse experiences during their pre-service experiences. AI can also provide benefits and services to students and families who currently do not have access to those resources due to a lack of human capital.

Labor

What Role Will AI Play In Creating New Jobs?

AI can serve as a powerful tool for workforce systems, employers, and employees alike to drive job creation and upskilling. For instance, investment in large language models (LLMs) that scrape and synthesize real-time labor market information (LMI) can better inform employers and industry consortia about pervasive skills gaps. Currently, most advanced real-time LMI products exist behind paywalls, but Congress should consider investing in this information as a public good to forge a more competitive labor market.

The wide-scale commercialization of AI/ML-based products and services will also create new types of jobs and occupations. Contrary to popular belief, many industries that face some level of automation will still require trained employees who can pivot to emerging needs in a way that offsets the obsolescence of other roles. Through place-based partnerships between employers and training institutions (e.g., community colleges and work-based learning programs), localities can reinvest in their workers to provide transition opportunities and close labor market gaps.

What Role Will AI Standards Play In Regulatory And Self-Regulatory Efforts?

AI standards will serve as a crucial foundation as the U.S. government and industries navigate AI’s impacts on the workforce. The NIST AI Risk Management Framework provides a methodology for organizations to assess and mitigate risks across the AI lifecycle. This could enable more responsible automation in HR contexts—for example, helping ensure bias mitigation in algorithmic hiring tools. On the policy side, lawmakers drafting regulations around AI and employment will likely reference and even codify elements of the Framework.

On the industry side, responsible technology leaders are already using the NIST AI RMF for self-regulation. By proactively auditing and mitigating risks in internal AI systems, companies can build public trust and reduce the need for excessive government intervention. Though policymakers still have an oversight role, widespread self-regulation using shared frameworks is at this point the most efficient path for safe and responsible AI across the labor market.

Healthcare

What Updates To The Regulatory Frameworks For Drugs And Biologics Should Congress Consider To Facilitate Innovation In AI Applications?

Congress has an opportunity to update regulations to enable responsible innovation and oversight for AI applications in biopharma. For example, Congress could consider expanding the FDA’s mandate and capacity to require upfront risk assessments before deployment of particularly high-risk or dual-use bio-AI systems. This approach is currently used by DARPA for some autonomous and biological technologies.

Additionally, thoughtful reporting requirements could be instituted for entities developing advanced bio-AI models above a certain capability threshold. This transparency would allow for monitoring of dual-use risks while avoiding overregulation of basic research. 

How Can The FDA Improve The Use Of AI In Medical Devices? 

Ensuring That Analysis Of Subpopulation Performance Is A Key Component Of The Review Process For AI Tools

Analyzing data on the subpopulation performance of medical devices should be one key component of any comprehensive effort to advance equity in medical innovation. We appreciate the recommendations in the GOP HELP white paper asking developers to document the performance of their devices on various subpopulations when considering updates and modifications. It will be essential to assess subpopulation performance to mitigate harms that may otherwise arise—especially if an argument for equity is made for a certain product. 

Clarifying The Role Of Real-World Evidence In Approvals

Identifying performance concerns in subpopulations and across different medical environments will most likely require collecting real-world evidence on how these tools perform in the wild. The role of real-world evidence in the regulatory approval process for market surveillance and updates should be defined more clearly in this guidance.

How Can AI Be Best Adopted To Not Inappropriately Deny Patients Care?

AI Centers of Excellence could be established to develop demonstration AI tools for specific care populations and care environments. For example, FAS has published a Day One Memo proposing an AI Center of Excellence for Maternal Health to bring together data sources, then analyze, diagnose, and address maternal health disparities, all while demonstrating trustworthy and responsible AI principles. The benefits of AI Centers of Excellence are two-fold: they provide an opportunity for coordination across the federal government, and they can evaluate existing datasets to establish high-priority, high-impact applications of AI-enabled research for improving clinical care guidelines and tools for healthcare providers. 

The AI Center of Excellence model demonstrates the power of coordinating and thoughtfully applying AI tools across disparate federal data sources to address urgent public health needs. Similar centers could be established to tackle other complex challenges at the intersection of health, environmental, socioeconomic, and demographic factors. For example, an AI Center focused on childhood asthma could integrate housing data, EPA air quality data, Medicaid records, and school absenteeism data to understand and predict asthma triggers.

Harnessing the Promise of AI

Artificial intelligence holds tremendous potential to transform education, healthcare, and work for the better. But realizing these benefits in an equitable, ethical way requires proactive collaboration between policymakers, researchers, civil society groups, and industry.

The recommendations outlined here aim to strike a balance—enabling innovation and growth while centering human needs and mitigating risks. This requires robust funding for R&D, modernized regulations, voluntary standards, and inclusive design principles. Ongoing oversight and impact evaluation will be key, as will coordination across agencies and stakeholders.

Trust Issues: An Analysis of NSF’s Funding for Trustworthy AI

Below, we analyze AI R&D grants from the National Science Foundation’s Computer and Information Science and Engineering (NSF CISE) directorate to estimate how much of this funding supports “trustworthy AI” research. NSF has not published an overview of its funding for such work within AI. By reviewing a random sample of grants awarded in fiscal years 2018-2022, we estimate that ~10-15% of annual AI funding supports trustworthy AI research areas, including interpretability, robustness, privacy preservation, and fairness, despite an increased focus on trustworthy AI in NSF’s strategic plan and in public statements by key NSF and White House officials. Robustness receives the largest share (~6% annually), while interpretability and fairness each receive ~2%. Funding for privacy-preserving machine learning has risen significantly, from ~0.1% to ~5%. We suggest that NSF increase funding for trustworthy AI research, including through specific programs and solicitations that address critical AI trustworthiness issues. We also recommend that NSF consider trustworthiness when assessing all AI grant applications and prioritize projects that enhance the safety of foundation models.

Background on Federal AI R&D

Federal R&D funding has been critical to AI research, especially a decade ago when machine learning (ML) tools had less potential for wide use and received limited private investment. Much of the early AI development occurred in academic labs that were mainly federally funded, forming the foundation for modern ML insights and attracting large-scale private investment. With private sector investment now outstripping public investment and driving notable AI advances, federal funding agencies are reevaluating their role in this area. The key question is how public investment can complement private investment to advance AI research that benefits American wellbeing.

Figure 1.

Inspiration for chart from Our World in Data

The Growing Importance of Trustworthy AI R&D

A growing priority within the discourse of national AI strategy is the advancement of “trustworthy AI”. Per the National Institute of Standards and Technology (NIST), trustworthy AI refers to AI systems that are safe, reliable, interpretable, robust, demonstrate respect for privacy, and have harmful biases mitigated. Though terms such as “trustworthy AI”, “safe AI”, “responsible AI”, and “beneficial AI” are not precisely defined, they are an important part of the government’s characterization of high-level AI R&D strategy. We aim to make these concepts more concrete in this report by focusing on specific research directions that bolster these desirable attributes in ML models. We start by discussing the increasing emphasis on such goals that we observe in government strategies and certain program solicitations.

This increased focus has been reflected in many government strategy documents in recent years. Both the 2016 National AI R&D Strategic Plan and its 2019 update from the National Science and Technology Council pinpointed trustworthiness in AI as a crucial objective. This was reiterated even more emphatically in the 2023 revision, which stressed ensuring the confidence and reliability of AI systems as especially significant objectives. The plan also underlined how the rapid proliferation of AI models has made efforts to improve AI safety more urgent. Public feedback on previous versions of the plan highlights an expanded priority across academia, industry, and society at large for AI models that are safe, transparent, and equitable and that respect privacy. NSF’s FY2024 budget proposal articulated its primary intention of advancing “the frontiers of trustworthy AI”, a departure from earlier years’ broader emphasis on seeding future advances across many areas of human endeavor.

Concrete manifestations of this increasing emphasis on trustworthy AI can be seen not only in high-level discussions of strategy, but also in specific programs designed to advance trustworthiness in AI models. One of the seven new NSF AI institutes established recently focuses exclusively on “trustworthy AI”. Other programs, like NSF’s Fairness in Artificial Intelligence and Safe Learning-Enabled Systems programs, focus chiefly on cultivating dimensions of trustworthy AI research.

Despite their value, these individual programs focused on AI trustworthiness form only a small fraction of the NSF’s total funding for AI R&D: roughly $20 million per year against nearly $800 million per year in overall AI R&D funding. It remains unclear how much the mounting concern surrounding trustworthy and responsible AI influences NSF’s broader funding commitments. In this paper, we provide an initial investigation of this question by estimating the proportion of grants over the past five fiscal years (FY 2018-2022) from NSF’s CISE directorate (the primary funder of AI R&D within NSF) that support a few key research directions within trustworthy AI: interpretability, robustness, fairness, and privacy preservation.

Please treat our approximations cautiously; they are neither exact nor conclusive answers to this question. Our methodology relies heavily on individual judgment calls in categorizing loosely defined grant types within a sample of the overall grants. Our goal is to offer an initial look at federal funding trends for trustworthy AI research.

Methodology

We used NSF’s online database of awarded grants from the CISE directorate to facilitate our research. First, we identified a representative set of AI R&D-focused grants (“AI grants”) funded by NSF’s CISE directorate across fiscal years 2018-2022. We then drew a random sample of these grants and manually classified them according to predetermined research directions relevant to trustworthy AI. An overview of this process is given below, with details on each step of our methodology provided in the Appendix.

  1. Search: Using NSF’s online award search feature, we extracted a near-comprehensive collection of abstracts of grants awarded by NSF’s CISE directorate during fiscal years 2018-2022. Since the search function relies on keywords, we prioritized high recall over high precision, yielding an overly inclusive result set of close to 1,000 grants annually. We believe this initial set encompasses nearly all AI grants from NSF’s CISE directorate while also including numerous non-AI-centric R&D awards.
  2. Sample: For each fiscal year, we drew a random subset of 100 abstracts (approximately 10% of the total abstracts extracted). This sample size strikes a balance between manageability for manual categorization and sufficient size for reasonably accurate funding estimates.
  3. Sort: Based on prevailing definitions of trustworthy AI, we defined four clusters of research directions: i) interpretability/explainability, ii) robustness/safety, iii) fairness, and iv) privacy preservation. To provide useful contrasts with trustworthy AI funding, we designated two additional categories: v) capabilities and vi) applications of AI. Here, “capabilities” refers to work that pushes forward model performance, and “application of AI” refers to work that leverages existing AI techniques to make progress in other domains. Non-AI-centric grants were sorted out of our sample and marked as “other” at this stage. Each grant in our sample was manually classified into one or more of these research directions based on its primary focus, with secondary or tertiary directions assigned where applicable; additional specifics on this sorting process are provided in the Appendix.

Findings

Based on our sorting process, we estimate the proportion of AI grant funds from NSF’s CISE directorate that is primarily directed at each of our trustworthy AI research directions.
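As a rough illustration of how such an estimate can be computed from the sorted sample, the sketch below totals sampled award dollars by assigned primary direction and divides by the sample’s total AI-grant dollars. The record keys (fiscal_year, amount, primary_direction) are hypothetical stand-ins for our spreadsheet columns, not the exact fields we used.

```python
from collections import defaultdict

TRUSTWORTHY = {"interpretability", "robustness", "fairness", "privacy"}

def funding_shares(grants):
    """Estimate each primary research direction's share of sampled
    AI-grant dollars, per fiscal year. `grants` is an iterable of dicts
    with hypothetical keys: 'fiscal_year', 'amount', 'primary_direction'."""
    totals = defaultdict(float)        # total AI-grant dollars per year
    by_direction = defaultdict(float)  # dollars per (year, direction)
    for g in grants:
        if g["primary_direction"] == "other":  # non-AI grants are excluded
            continue
        totals[g["fiscal_year"]] += g["amount"]
        by_direction[(g["fiscal_year"], g["primary_direction"])] += g["amount"]
    return {key: amount / totals[key[0]] for key, amount in by_direction.items()}

# Toy example: combined trustworthy-AI share for FY 2020.
sample = [
    {"fiscal_year": 2020, "amount": 500_000, "primary_direction": "capabilities"},
    {"fiscal_year": 2020, "amount": 300_000, "primary_direction": "robustness"},
    {"fiscal_year": 2020, "amount": 200_000, "primary_direction": "applications"},
]
shares = funding_shares(sample)
trustworthy_share = sum(v for (year, d), v in shares.items()
                        if year == 2020 and d in TRUSTWORTHY)
print(f"Trustworthy AI share, FY2020 (toy data): {trustworthy_share:.0%}")
```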

Figure 2.

As depicted in Figure 2, the collective proportion of CISE funds allocated to trustworthy AI research directions usually varies from approximately 10% to around 15% of the total AI funds per annum. However, there are no noticeable positive or negative trends in this overall metric, indicating that over the five-year period examined, there were no dramatic shifts in the funding proportion assigned to trustworthy AI projects. 

Considering secondary and tertiary research directions

As previously noted, several grants under consideration appeared to have secondary or tertiary focuses or seemed to strive for research goals which bridge different research directions. We estimate that over the five-year evaluation period, roughly 18% of grant funds were directed to projects having at least a partial focus on trustworthy AI.

Figure 3.

Specific Research Directions

Robustness/safety

Presently, ML systems tend to fail unpredictably when confronted with situations considerably different from their training scenarios (non-iid settings). This failure propensity can cause serious harm, especially in high-risk environments. To reduce such threats, robustness and safety research aims to enhance system reliability in new domains and to mitigate catastrophic failures in situations the system was not trained for.1 Additionally, this category encompasses projects that identify potential risks and failure modes to inform further safety improvements.

Over the past five years, our analysis shows that research pertaining to robustness is typically the most funded trustworthy AI direction, representing about 6% of the total funds allocated by CISE to AI research. However, no definite trends have been identified concerning funding directed at robustness over this period.

Figure 4.

Interpretability/explainability

Explaining why a machine learning model produces a particular output for a given input is still an unsolved problem.2 Research on interpretability or explainability aims to devise methods for better understanding the decision-making processes of machine learning models and to design decision systems that are easier to interpret.

Over the investigated years, funding supporting interpretability and explainability shows no substantial growth, accounting for approximately 2% of all AI funds on average.

Figure 5.

Fairness/non-discrimination

ML systems often reflect and exacerbate biases present in their training data. To counter these issues, research on fairness and non-discrimination works toward creating systems that avoid such biases. This area of study frequently involves exploring ways to reduce dataset bias, developing bias-assessment metrics for current models, and devising other bias-reduction strategies for ML models.3

Funding allocated to this area also generally accounts for around 2% of annual AI funds. Our data did not reveal any discernible trend in fairness/non-discrimination-oriented funding over the examined period.

Figure 6.

Privacy-preservation

Training AI systems typically requires large volumes of data that can include personal information, so privacy preservation is crucial. In response to this concern, privacy-preserving machine learning research aims to develop methods capable of safeguarding private information.4

Over the studied years, funding for privacy-preserving machine learning grows significantly, from under 1% in 2018 (the smallest share among our examined research directions) to over 6% in 2022 (the largest among the trustworthy AI research directions we examined). The increase begins around fiscal year 2020, though its cause remains unclear.

Figure 7.

Recommendations

NSF should continue to carefully consider the role that its funding can play in the overall AI R&D portfolio, taking into account both private and public investment. Trustworthy AI research presents a strong opportunity for public investment: many lines of research within trustworthy AI may be under-incentivized in industry and can be usefully pursued by academics. Concretely, NSF could:


Appendix

Methodology

For this investigation, we aim to estimate the proportion of AI grant funding from NSF’s CISE directorate which supports research that is relevant to trustworthy AI. To do this, we rely on publicly-provided data of awarded grants from NSF’s CISE directorate, accessed via NSF’s online award search feature. We first aim to identify, for each of the examined fiscal years, a set of AI-focused grants (“AI grants”) from NSF’s CISE directorate. From this set, we draw a random sample of grants, which we manually sort into our selected trustworthy AI research directions. We go into more detail on each of these steps below. 

How did we choose this question? 

We touch on some of the motivation for this question in the introduction above. We investigate NSF’s CISE directorate because it is the primary directorate within NSF for AI research, and because focusing on one directorate (rather than some broader focus, like NSF as a whole) allows for a more focused investigation. Future work could examine other directorates within NSF or other R&D agencies for which grant awards are publicly available. 

We focus on estimating trustworthy AI funding as a proportion of total AI funding because our goal is to analyze how trustworthy AI is prioritized relative to other AI work, and because this framing could be more action-guiding for funders like NSF that are choosing which research directions within AI to prioritize.

Search (identifying a list of AI grants from NSF’s CISE Directorate)

To identify a set of AI grants from NSF’s CISE directorate, we used the advanced award search feature on NSF’s website. We conducted the following search:

This search yielded a set of ~1,000 grants for each fiscal year. The set was over-inclusive, containing many grants that were not focused on AI, because we aimed for high recall rather than high precision when choosing our keywords; our goal was to find a set of grants that would include all of the relevant AI grants made by NSF’s CISE directorate. We sort out false positives, i.e., grants not focused on AI, in the subsequent “sorting” phase.

Sampling

We assigned a random number to each grant returned by our initial search and then sorted the grants from smallest to largest by that number. For each year, we copied the 100 grants with the smallest randomly assigned numbers into a new spreadsheet, which we used for the subsequent “sorting” step.

We now had a random sample of 500 grants (100 for each fiscal year) from the larger set of ~5,000 grants identified in the search phase. We chose this sample size because it was manageable for manual sorting, and we did not anticipate large shifts in the estimated proportions if we expanded from a ~10% sample to, say, 20% or 30%.
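A minimal sketch of this sampling step is below, assuming the search results were exported to a CSV; the file name and column names are hypothetical placeholders rather than the actual export format.

```python
import random

import pandas as pd

SAMPLE_SIZE = 100  # grants per fiscal year
random.seed(0)     # fixed seed so the draw is reproducible

# Hypothetical export of the award-search results; assumed columns:
# 'award_id', 'fiscal_year', 'title', 'abstract', 'amount'.
grants = pd.read_csv("cise_award_search_results.csv")

# Assign each grant a random number, then keep the 100 smallest per year,
# mirroring the spreadsheet procedure described above.
grants["rand"] = [random.random() for _ in range(len(grants))]
sample = (
    grants.sort_values("rand")
          .groupby("fiscal_year", group_keys=False)
          .head(SAMPLE_SIZE)
)

# One sheet of sampled grants, ready for manual sorting.
sample.to_csv("sampled_grants_for_sorting.csv", index=False)
```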

Identifying Trustworthy AI Research Directions

We aimed to identify a set of broad research directions that would be especially useful for promoting trustworthy properties in AI systems and that could serve as our categories in the subsequent manual sorting phase. We consulted various definitions of trustworthy AI, relying most heavily on the definition provided by NIST: “characteristics of trustworthy AI include valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed.” We also consulted several lists of trustworthy AI research directions, identifying those that appeared to us to be of particular importance for trustworthy AI. Based on the above process, we identified the following clusters of trustworthy AI research: interpretability/explainability, robustness/safety, fairness/non-discrimination, and privacy preservation.

It is important to note here that none of these research areas is crisply defined, but we thought these clusters provided a useful, high-level way to break trustworthy AI research down into broad categories.

In the subsequent steps, we aim to compare the amount of grant funds that are specifically aimed at promoting the above trustworthy AI research directions with the amount of funds which are directed towards improving AI systems’ capabilities in general, or simply applying AI to other classes of problems.

Sorting

For our randomly sampled set of 500 grants, we aimed to sort each grant according to its intended research direction. 

For each grant, we a) read the title and the abstract of the grant and b) assigned the grant a primary research direction and, if applicable, a secondary and tertiary research direction. Secondary and tertiary research directions were not assigned to every grant, but were chosen for some grants that stood out to us as having a few different objectives. We provide examples of some of these “overlapping” grants below; a minimal sketch of the kind of record this sorting produces appears after the category list.

We sorted grants into the following categories:

  1. Capabilities
    1. This category was used for projects that are primarily aimed at advancing the capabilities of AI systems, by making them more competent at some task, or for research which could be used to push forward the frontier of capabilities for AI systems. 
    2. This category also includes investments in resources that are generally useful for AI research, e.g. computing clusters at universities. 
    3. Example: A project which aims to develop a new ML model which achieves SOTA performance on a computer vision benchmark.
  2. Application of AI/ML.
    1. This category was used for projects which apply existing ML/AI techniques to research questions in other domains. 
    2. Example: A grant which uses some machine learning techniques to analyze large sets of data on precipitation, temperature, etc. to test a hypothesis in climatology.
  3. Interpretability/explainability.
    1. This category was used for projects which aim to make AI systems more interpretable or explainable by allowing for a better understanding of their decision-making process. Here, we included both projects which offer methods for better interpreting existing models and projects which offer new training methods that are easier to interpret.
    2. Example: A project which determines the features of a resume that make it more or less likely to be scored positively by a resume-ranking algorithm.
  4. Robustness/safety
    1. This category was used for projects which aim to make AI systems more robust to distribution shifts and adversarial inputs, and more reliable in unfamiliar circumstances. Here, we include both projects which introduce methods for making existing systems more robust, and those which introduce new techniques that are more robust in general. 
    2. Example: A project which explores new methods for providing systems with training data that causes a computer vision model to learn robustly useful patterns from data, rather than spurious ones. 
  5. Fairness/non-discrimination
    1. This category was used for projects which aim to make AI systems less likely to entrench or reflect harmful biases. Here, we focus on work directly geared at making models themselves less biased. Many project abstracts described efforts to include researchers from underrepresented populations in the research process, which we chose not to include because of our focus on model behavior.
    2. Example: A project which aims to design techniques for “training out” certain undesirable racial or gender biases.
  6. Privacy preservation
    1. This category was used for projects which aim to make AI systems less privacy-invading. 
    2. Example: A project which provides a new algorithm that allows a model to learn desired behavior without using private data. 
  7. Other
    1. This category was used for grants which are not focused on AI. As mentioned above, the random sample included many grants which were not AI grants, and these could be removed as “other.”
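For concreteness, the kind of record this sorting produces might look like the sketch below; the class name, field names, and helper method are illustrative assumptions rather than the actual spreadsheet schema we used.

```python
from dataclasses import dataclass
from typing import Optional

TRUSTWORTHY_DIRECTIONS = {
    "interpretability/explainability",
    "robustness/safety",
    "fairness/non-discrimination",
    "privacy preservation",
}

@dataclass
class SortedGrant:
    """One manually sorted grant from the random sample (illustrative)."""
    award_id: str
    fiscal_year: int
    amount: float                    # award size in dollars
    primary: str                     # one of the seven categories above
    secondary: Optional[str] = None  # assigned only for some grants
    tertiary: Optional[str] = None

    def trustworthy_focus(self) -> bool:
        """True if any assigned direction is a trustworthy AI direction,
        matching the 'at least a partial focus' estimate in the Findings."""
        directions = {self.primary, self.secondary, self.tertiary}
        return bool(directions & TRUSTWORTHY_DIRECTIONS)
```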

Some caveats and clarifications on our sorting process

This sorting focuses on the apparent intentions and goals of the research as stated in the abstracts and titles, as these are the aspects of each grant the NSF award search feature makes readily viewable. Our process may therefore miss research objectives which are outlined in the full grant application (and not within the abstract and title). 

A focus on specific research directions

We chose to focus on specific research agendas within trustworthy and responsible AI, rather than sorting grants into a binary of “trustworthy” or “not trustworthy,” in order to bring greater clarity to our grant sorting process. We still make judgment calls with regard to which individual research agendas are being promoted by various grants, but we hope that such a sorting approach allows for greater agreement.

As mentioned above, we also assigned secondary and tertiary research directions to some of these grants. You can view the grants in the sample and how we sorted each here. Below, we offer some examples of the kinds of grants which we would sort into these categories.

Examples of Grants with Multiple Research Directions

To summarize: in the sorting phase, we read the title and abstract of each grant in our random sample, and assigned these grants to a research direction. Many grants received only a “primary” research direction, though some received secondary and tertiary research directions as well. This sorting was based on our understanding of the main goals of the project, based on the description provided by the project title and abstract.

Revolutionary Advances in AI Won’t Wait

The Pentagon has turned innovation into a buzzword, and everyone can agree on the need for faster innovation. It seems a new innovation office is created every week. Yet when it comes to AI, the DoD is still moving too slowly, hampered by a sluggish procurement process. How can it make innovation core to the organization and leverage the latest technological developments?

We have to first understand what type of innovation is needed. As Harvard Business School professor Clayton Christensen wrote, there are two types of innovation: sustaining and disruptive. Sustaining innovation makes existing products and services better. It’s associated with incremental improvements, like adding new features to a smartphone or boosting the performance of the engine on a car, in pursuit of better performance and higher profits.

Disruptive innovation occurs when a firm with fewer resources challenges one of the bigger incumbents, typically either with a lower-cost business model or by targeting a new customer segment. Disruptive firms can start with fewer resources because they have less overhead and fewer fixed costs, and they often leverage new technologies.

Initially, a disruptor goes unnoticed by an incumbent, who is focused on capturing more profitable customers through incremental improvements. Over time, though, the disruptor grows enough to capture large market share, threatening to replace the incumbent altogether.

Intel Illustrates Both Types of Innovation

Intel serves as an illustrative example of both types of innovation. It was the first company to manufacture DRAM memory chips, creating a whole new market. However, as it focused on sustaining innovation, it was disrupted by low-cost Japanese firms that were able to offer the same DRAM memory chips at a lower cost. Intel then pivoted to focus on microprocessors, disrupting the personal computer industry. However, more recently, Intel is at risk of being disrupted again, this time by lower-power microprocessors, like ARM, and application-specific processors, like Nvidia GPUs.

The DoD, like the large incumbent it is, has become good at sustaining innovation. Its acquisitions process first outlines the capabilities it needs, then sets budgets, and finally purchases what external partners provide. Each part of this – the culture, the procedures, the roles, the rules – has been optimized over time for sustaining innovation. This lengthy, three-part process has allowed the Pentagon to invest in steadily improving hardware, like submarines and airplanes, and the defense industrial base has followed suit, consolidating to just five major defense contractors that can provide the desired sustaining innovation.

The problem is that we are now in an era of disruptive innovation, and a focus on sustaining innovation doesn’t work against disruptive innovation. As a result of decreasing defense budgets in the 1990s and a parallel increase in private-sector funding, companies now lead the way on innovation. With emerging technologies like drones, artificial intelligence, and quantum computing advancing every month in the private sector, a years-long process to outline capabilities and define budgets won’t work: by the time the requirements are defined and shared, the technology will have moved on, rendering the old requirements obsolete. To illustrate the speed of change, consider that the National Security Commission on Artificial Intelligence’s lengthy 2021 report on how the U.S. can win in the AI era failed to mention generative AI or large language models, which have seen revolutionary advances in just the past few years. Innovation is happening faster than our ability to write reports or define capabilities.

The Small, Daring, and Nimble Prevail

So how does an organization respond to the threat of disruptive innovation? It must create an entirely new business unit, with new people, processes, and culture. The existing organization has been optimized to meet the current threat in every way, so in many respects the new unit has to start over while still leveraging the resources and knowledge the organization has accumulated.

Ford learned this lesson the hard way. After trying to intermix production of internal combustion cars and electric vehicles for years, Ford recently carved out the EV group into a separate business unit. The justification? The “two businesses required different skills and mind-sets that would clash and hinder each area if they remained parts of one organization”, reported the New York Times after speaking with Jim Farley, the CEO of Ford.

When the personal computer was first introduced by Apple, IBM took it seriously and recognized the threat to its mainframe business. Due to bureaucratic and internal controls, however, its product development process took four or five years. The industry was moving too quickly for that. To respond, the CEO created a secretive, independent team of just 40 people. The result? The IBM personal computer was ready to ship just one year later.

One of the most famous examples of creating a new business unit comes from the defense space: Skunkworks. Facing the threat of German aircraft in World War II, the Army Air Forces asked Lockheed to design a plane that could fly at 600 mph, 200 mph faster than Lockheed’s existing planes – and they wanted a working prototype in just 180 days. With the company already at capacity, a small group of engineers, calling themselves Skunkworks, set up shop in a different building with limited resources – and miraculously hit the goal ahead of schedule. Their speed was attributed to their ability to avoid Lockheed’s bureaucratic processes. Skunkworks would expand over the years and go on to build some of the most famous Air Force planes, including the U-2 and SR-71.

DoD’s Innovation Approach to Date

The DoD appears to be re-learning these lessons today. Its innovation pipeline is bogged down by bureaucracy and internal controls. Faced with a Chinese military that is investing heavily in AI and moving toward AI-enabled warfare, the DoD has finally realized that it cannot rely on sustaining innovation to win. It must reorganize itself to respond to the disruptive threat.

It has created a wave of new pathways to accelerate the adoption of emerging technologies. SBIR open topics, the Defense Innovation Unit, SOFWERX, the Office of Strategic Capital, and the National Security Innovation Capital program are all initiatives created in the spirit of Skunkworks or the “new business unit”. Major commands are doing it too, with the emergence of innovation units like Navy Task Force 59 in CENTCOM.

These initiatives are all attempts to respond to the disruption by opening up alternative pathways to fund and acquire technology. SBIR open topics, for example, have been found to be more effective than traditional approaches because they don’t require the DoD to list requirements up front, instead allowing it to quickly follow along with commercial and academic innovation.

Making the DoD More Agile 

Some of these initiatives will work, others won’t. The advantage of DoD is that it has the resources and institutional heft to create multiple such “new business units” that try a variety of approaches, provided Congress continues to fund them.

From there, it must learn which approaches work best for accelerating the adoption of emerging technologies and pick a winner, scaling that approach to replace its core acquisitions process. These new pathways must be integrated into the main organization; otherwise they risk remaining fringe programs with limited impact. The best contractors from these new pathways will also have to scale up, disrupting the defense industrial base. Only with these new operating and business models – along with new funding policies and culture – can the DoD become proficient at acquiring the latest technologies. Scaling up the new business units is the only way to do so.

The path forward is clear. The hard work to reform the acquisitions process must begin by co-opting the strengths of these new innovation pathways. The good news is that the DoD, through its large and varied research programs, partnerships, and funding, has clear visibility into emerging and future technologies. Now it must figure out how to scale the new innovation programs or risk getting disrupted.

FY24 NDAA AI Tracker

As both the House and Senate gear up to vote on the National Defense Authorization Act (NDAA), FAS is launching this live blog post to track all proposals around artificial intelligence (AI) that have been included in the NDAA. In this rapidly evolving field, these provisions indicate how AI now plays a pivotal role in our defense strategies and national security framework. This tracker will be updated following major updates.

Senate NDAA. This table summarizes the provisions related to AI from the version of the Senate NDAA that advanced out of committee on July 11. Links to the section of the bill describing these provisions can be found in the “section” column. Provisions that have been added in the manager’s package are in red font. Updates from Senate Appropriations committee and the House NDAA are in blue.

Senate NDAA Provisions
Provision | Summary | Section
Generative AI Detection and Watermark Competition | Directs the Under Secretary of Defense for Research and Engineering to create a competition for technology that detects and watermarks the use of generative artificial intelligence. | 218
DoD Prize Competitions for Business Systems Modernization | Authorizes competitions to improve military business systems, emphasizing the integration of AI where possible. | 221
Broad review and update of DoD AI Strategy | Directs the Secretary of Defense to perform a periodic review and update of its 2018 AI strategy, and to develop and issue new guidance on a broad range of AI issues, including adoption of AI within DoD, ethical principles for AI, mitigation of bias in AI, cybersecurity of generative AI, and more. | 222
Strategy and assessment on use of automation and AI for shipyard optimization | Directs the development of a strategy on the use of AI for Navy shipyard logistics. | 332
Strategy for talent development and management of DoD Computer Programming Workforce | Establishes a policy for “appropriate” talent development and management policies, including for AI skills. | 1081
Sense of the Senate Resolution in Support of NATO | Offers support for NATO and NATO’s DIANA program as critical to AI and other strategic priorities. | 1238, 1239
Enhancing defense partnership with India | Directs DoD to enhance the defense partnership with India, including collaboration on AI as one potential priority area. | 1251
Specification of Duties for Electronic Warfare Executive Committee | Amends U.S. code to specify the duties of the Electronic Warfare Executive Committee, including an assessment of the need for automated, AI/ML-based electronic warfare capabilities. | 1541
Next Generation Cyber Red Teams | Directs the DoD and NSA to submit a plan to modernize cyber red-teaming capabilities, ensuring the ability to emulate possible threats, including from AI. | 1604
Management of Data Assets by Chief Digital Officer | Outlines responsibilities for the CDAO to provide data analytics capabilities needed for the “global cyber-social domain.” | 1605
Developing Digital Content Provenance Course | Directs the Director of Defense Media Activity to develop a course on digital content provenance, including digital forgeries developed with AI systems, e.g., AI-generated “deepfakes.” | 1622
Report on Artificial Intelligence Regulation in Financial Services Industry | Directs regulators of the financial services industry to produce reports analyzing how AI is and ought to be used by the industry and by regulators. | 6096
AI Bug Bounty Programs | Directs the CDAO to develop a bug bounty program for AI foundation models that are being integrated into DoD operations. | 6097
Vulnerability analysis study for AI-enabled military applications | Directs the CDAO to complete a study analyzing vulnerabilities to the privacy, security, and accuracy of AI-enabled military applications, as well as R&D needs for such applications, including foundation models. | 6098
Report on Data Sharing and Coordination | Directs the Secretary of Defense to submit a report on ways to improve data sharing across DoD. | 6099
Establishment of Chief AI Officer of the Department of State | Establishes within the Department of State a Chief AI Officer, who may also serve as Chief Data Officer, to oversee adoption of AI in the Department and to advise the Secretary of State on the use of AI in conducting data-informed diplomacy. | 6303

House NDAA. This table summarizes the provisions related to AI from the version of the House NDAA that advanced out of committee. Links to the section of the bill describing these provisions can be found in the “section” column.

House NDAA Provisions
Provision | Summary | Section
Process to ensure the responsible development and use of artificial intelligence | Directs the CDAO to develop a process for assessing whether AI technology used by DoD is functioning responsibly, including through the development of clear standards, and to amend AI technology as needed. | 220
Intellectual property strategy | Directs DoD to develop an intellectual property strategy to enhance capabilities in procurement of emerging technologies and capabilities. | 263
Study on establishment of centralized platform for development and testing of autonomy software | Directs the Secretary of Defense and the CDAO to conduct a study assessing the feasibility and advisability of developing a centralized platform to develop and test autonomous software. | 264
Congressional notification of changes to Department of Defense policy on autonomy in weapon systems | Requires that Congress be notified of changes to DoD Directive 3000.09 (on autonomy in weapon systems) within 30 days of any changes. | 266
Sense of Congress on dual use innovative technology for the robotic combat vehicle of the Army | Offers support for the Army’s acquisition strategy for the Robotic Combat Vehicle program, and recommends that the Army consider a similar framework for future similar programs. | 267
Pilot program on optimization of aerial refueling and fuel management in contested logistics environments through use of artificial intelligence | Directs the CDAO, USD(A&S), and the Air Force to develop a pilot program to optimize the logistics of aerial refueling and to consider the use of AI technology to help with this mission. | 266
Modification to acquisition authority of the senior official with principal responsibility for artificial intelligence and machine learning | Increases annual acquisition authority for the CDAO from $75M to $125M, and extends this authority from 2025 to 2029. | 827
Framework for classification of autonomous capabilities | Directs the CDAO and others within DoD to establish a department-wide classification framework for autonomous capabilities to enable easier use of autonomous systems in the department. | 930

Funding Comparison. The following tables compare the funding requested in the President’s budget to funds that are authorized in current House and Senate versions of the NDAA. All amounts are in thousands of dollars.

Funding Comparison
Program | Requested | Authorized in House | Authorized in Senate | NEW! Passed in Senate Approps 7/27 | NEW! Passed in full House 9/28
Other Procurement, Army–Engineer (non-construction) equipment: Robotics and Applique Systems | 68,893 | 68,893 | 68,893 | 65,118 (-8,775 for “Effort previously funded,” +5,000 for “Soldier borne sensor”) | 73,893 (+5,000 for “Soldier borne sensor”)
AI/ML Basic Research, Army | 10,708 | 10,708 | 10,708 | 10,708 | 10,708
AI/ML Technologies, Army | 24,142 | 24,142 | 24,142 | 27,142 (+3,000 for “Automated battle damage assessment and adjust fire”) | 24
AI/ML Advanced Technologies, Army | 13,187 | 15,687 (+2,500 for “Autonomous Long Range Resupply”) | 18,187 (+5,000 for “Tactical AI & ML”) | 24,687 (+11,500 for “Cognitive computing architecture for military systems”) | 13,187
AI Decision Aids for Army Missile Defense Systems Integration | 0 | 6,000 | 0 | 0 | 0
Robotics Development, Army | 3,024 | 3,024 | 3,024 | 3,024 | 3,024
Ground Robotics, Army | 35,319 | 35,319 | 35,319 | 17,337 (-17,982 for “SMET Inc II early to need”) | 45,319 (+10,000 for “common robotic controller”)
Applied Research, Navy: Long endurance mobile autonomous passive acoustic sensing research | 0 | 2,500 | 0 | 0 | 0
Advanced Components, Navy: Autonomous surface and underwater dual-modality vehicles | 0 | 5,000 | 0 | 3,000 | 0
Air Force University Affiliated Research Center (UARC)—Tactical Autonomy | 8,018 | 8,018 | 8,018 | 8,018 | 8,018
Air Force Applied Research: Secure Interference Avoiding Connectivity of Autonomous AI Machines | 0 | 3,000 | 5,000 | 0 | 0
Air Force Advanced Technology Development: Semiautonomous adversary air platform | 0 | 0 | 10,000 | 0 | 0
Advanced Technology Development, Air Force: High accuracy robotics | 0 | 2,500 | 0 | 0 | 0
Air Force Autonomous Collaborative Platforms | 118,826 | 176,013 (+75,000 for Project 647123: Air-Air Refueling TMRR, -17,813 for technical realignment) | 101,013 (-17,813 for DAF requested realignment of funds) | 101,013 | 101,013
Space Force: Machine Learning Techniques for Radio Frequency (RF) Signal Monitoring and Interference Detection | 0 | 10,000 | 0 | 0 | 0
Defense-wide: Autonomous resupply for contested logistics | 0 | 2,500 | 0 | 0 | 0
Military Construction–Pennsylvania Navy Naval Surface Warfare Center Philadelphia: AI Machinery Control Development Center | 0 | 88,200 | 88,200 | 0 | 0
Intelligent Autonomous Systems for Seabed Warfare | 0 | 0 | 7,000 | 5,000 | 0

Funding for Office of Chief Digital and Artificial Intelligence Officer
Program | Requested | Authorized in House | Authorized in Senate | NEW! Passed in Senate Approps | NEW! Passed in full House
Advanced Component Development and Prototypes | 34,350 | 34,350 | 34,350 | 34,350 | 34,350
System Development and Demonstration | 615,245 | 570,246 (-40,000 for “insufficient justification,” -5,000 for “program decrease”) | 615,246 | 246,003 (-369,243, mostly for functional transfers to JADC2 and Alpha-1) | 704,527 (+89,281, mostly for “management innovation pilot” and transfers from other programs for “enterprise digital alignment”)
Research, Development, Test, and Evaluation | 17,247 | 17,247 | 17,247 | 6,882 (-10,365, “Functional transfer to line 130B for ALPHA-1”) | 13,447 (-3,800 for “excess growth”)
Senior Leadership Training Courses | 0 | 2,750 | 0 | 0 | 0
ALPHA-1 | 0 | 0 | 0 | 222,723 | 0


On Senate Approps Provisions

The Senate Appropriations Committee generally provided what was requested in the White House’s budget regarding artificial intelligence (AI) and machine learning (ML), or exceeded it. AI was one of the top-line takeaways from the Committee’s summary of the defense appropriations bill. Particular attention has been paid to initiatives that cut across the Department of Defense, especially the Chief Digital and Artificial Intelligence Office (CDAO) and a new initiative called Alpha-1. The Committee is supportive of Joint All-Domain Command and Control (JADC2) integration and the recommendations of the National Security Commission on Artificial Intelligence (NSCAI).

On House final bill provisions

Like the Senate Appropriations bill, the House of Representatives’ final bill generally provided or exceeded what was requested in the White House budget regarding AI and ML. However, in contrast to the Senate Appropriations bill, AI was not a particularly high-priority takeaway in the House’s summary. The only note about AI in the House Appropriations Committee’s summary of the bill was in the context of digital transformation of business practices. Program increases were spread throughout the branches’ Research, Development, Test, and Evaluation budgets, with a particular concentration of increased funding for the Defense Innovation Unit’s AI-related budget.

Six Policy Ideas for the National AI Strategy

The White House Office of Science and Technology Policy (OSTP) has sought public input for the Biden administration’s National AI Strategy, acknowledging the potential benefits and risks of advanced AI. The Federation of American Scientists (FAS) was happy to recommend specific actions for federal agencies to safeguard Americans’ rights and safety. With U.S. companies creating powerful frontier AI models, the federal government must guide this technology’s growth toward public benefit and risk mitigation.

Recommendation 1: OSTP should work with a suitable agency to develop and implement a pre-deployment risk assessment protocol that applies to any frontier AI model.

Before launching a frontier AI system, developers must ensure safety, trustworthiness, and reliability through pre-deployment risk assessment. This protocol would require a thorough analysis of potential risks and vulnerabilities in AI models before they reach users.

We advocate for increased funding for the National Institute of Standards and Technology (NIST) to enhance its risk measurement capacity and develop robust benchmarks for AI model risk assessment. Building upon NIST’s AI Risk Management Framework (RMF) would help standardize evaluation metrics while accommodating cases, such as open-source models, academic research, and fine-tuned models, that differ from large-lab systems like OpenAI’s GPT-4.

We propose that the Federal Trade Commission (FTC), under Section 5 of the FTC Act, implement and enforce this pre-deployment risk assessment strategy. The FTC’s role in preventing unfair or deceptive practices in commerce aligns with mitigating potential risks from AI systems.

Recommendation 2: Adherence to the appropriate risk management framework should be compulsory for any AI-related project that receives federal funding.

The U.S. government, as a significant funder of AI through contracts and grants, has both a responsibility and an opportunity: a responsibility to ensure that its AI applications meet a high bar for risk management, and an opportunity to strengthen a culture of safety in AI development more broadly. Adherence to a risk management framework should be a prerequisite for AI projects seeking federal funds.

Currently, voluntary guidelines such as NIST’s AI RMF exist, but we propose making these compulsory. Agencies should require contractors to document and verify the risk management practices in place for the contract. For agencies that do not have their own guidelines, the NIST AI RMF should be used. And the NSF should require documentation of the grantee’s compliance with the NIST AI RMF in grant applications for AI projects. This approach will ensure all federally funded AI initiatives maintain a high bar for risk management.

Recommendation 3: NSF should increase its funding for “trustworthy AI” R&D.

“Trustworthy AI” refers to AI systems that are reliable, safe, transparent, privacy-enhanced, and unbiased. While NSF is a key non-military funder of AI R&D in the U.S., our rough estimates indicate that its investment in fields promoting trustworthiness has remained relatively static, accounting for only 10-15% of all AI grant funding. Given its roughly $800 million annual AI-related budget, we recommend that NSF direct a larger share of grants toward research in trustworthy AI.

To enable this shift, NSF could stimulate trustworthy AI research through specific solicitations; launch targeted programs in this area; and incorporate a “trustworthy AI” section in funding applications, prompting researchers to outline the trustworthiness of their projects. This would help evaluate AI project impacts and promote proposals with significant potential in trustworthy AI. Lastly, researchers could be requested or mandated to apply the NIST AI RMF during their studies.

Recommendation 4: FedRAMP should be broadened to cover AI applications contracted for by the federal government.

The Federal Risk and Authorization Management Program (FedRAMP) is a government-wide initiative that standardizes security protocols for cloud services. Given the rising utilization of AI services in federal operations, a similar system of security standards should apply to these services, since they are responsible for managing highly sensitive data related to national security and individual privacy.

Expanding FedRAMP’s mandate to include AI services is a logical next step in ensuring the secure integration of advanced technologies into federal operations. Applying a framework like FedRAMP to AI services would involve establishing robust security standards specific to AI, such as secure data handling, model transparency, and robustness against adversarial attacks. The expanded FedRAMP program would streamline AI integration into federal operations and avoid repetitive security assessments.

Recommendation 5: The Department of Homeland Security should establish an AI incidents database.

The Department of Homeland Security (DHS) should create a centralized AI Incidents Database detailing AI-related breaches, failures, and misuse across industries. Its existing authorization under the Homeland Security Act of 2002 equips DHS for this role. Such a database would increase understanding of AI risks, help mitigate them, and build trust in the safety and security of AI systems.

Voluntary reporting from AI stakeholders should be encouraged while preserving data confidentiality. To be effective, anonymized or aggregated data should be shared with AI developers, researchers, and policymakers to build a better understanding of AI risks. DHS could draw on existing databases, such as those maintained by the Partnership on AI and the Center for Security and Emerging Technology, and adapt reporting methods from established initiatives like the Financial Services Information Sharing and Analysis Center.
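To make this concrete, a minimal sketch of what one record in such a database might contain is below; the class and field names are hypothetical illustrations, not a proposed DHS standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class AIIncidentRecord:
    """One voluntarily reported AI incident (illustrative schema only)."""
    incident_id: str                 # stable identifier for cross-referencing
    reported_on: date
    sector: str                      # e.g., "healthcare", "finance", "transportation"
    system_description: str          # what the AI system does, kept non-identifying
    incident_type: str               # e.g., "failure", "breach", "misuse"
    harm_summary: str                # who was affected and how
    mitigations_taken: Optional[str] = None
    anonymized: bool = True          # confidentiality preserved before sharing
    tags: List[str] = field(default_factory=list)

# Example record, as a reporter might submit it.
example = AIIncidentRecord(
    incident_id="2024-0001",
    reported_on=date(2024, 1, 15),
    sector="healthcare",
    system_description="Triage-prioritization model used by a hospital network",
    incident_type="failure",
    harm_summary="Systematically under-prioritized one patient subgroup",
    tags=["bias", "clinical-decision-support"],
)
```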

Recommendation 6: OSTP should work with agencies to streamline the process of granting Interested Agency Waivers to AI researchers on J-1 visas.

The ongoing global competition in AI necessitates attracting and retaining a diverse, highly skilled talent pool. The US J-1 Exchange Visitor Program, often used by visiting researchers, requires some participants to return home for two years before applying for permanent residence.

Federal agencies can waive this requirement for certain individuals via an “Interested Government Agency” (IGA) request. Agencies should establish a transparent, predictable process for AI researchers to apply for such waivers. The OSTP should collaborate with agencies to streamline this process. Taking cues from the Department of Defense’s structured application process, including a dedicated webpage, application checklist, and sample sponsor letter, could prove highly beneficial for improving the transition of AI talent to permanent residency in the US.
Review the details of these proposals in our public comment.

How Do OpenAI’s Efforts To Make GPT-4 “Safer” Stack Up Against The NIST AI Risk Management Framework?

In March, OpenAI released GPT-4, another milestone in a wave of recent AI progress. This is OpenAI’s most advanced model yet, and it is already being deployed broadly to millions of users and businesses, with the potential for drastic effects across a range of industries.

But before releasing a new, powerful system like GPT-4 to millions of users, a crucial question is: “How can we know that this system is safe, trustworthy, and reliable enough to be released?” Currently, this is a question that leading AI labs are, for the most part, free to answer on their own. But the issue has garnered greater attention as many have become worried that current pre-deployment risk assessment and mitigation methods, like those used by OpenAI, are insufficient to prevent potential harms, including the spread of misinformation at scale, the entrenchment of societal inequities, misuse by bad actors, and catastrophic accidents.

This concern is central to a recent open letter, signed by several leading machine learning (ML) researchers and industry leaders, which calls for a 6-month pause on the training of AI systems “more powerful” than GPT-4 to allow more time for, among other things, the development of strong standards which would “ensure that systems adhering to them are safe beyond a reasonable doubt” before deployment. There’s a lot of disagreement over this letter, from experts who contest the letter’s basic narrative, to others who think that the pause is “a terrible idea” because it would unnecessarily halt beneficial innovation (not to mention that it would be impossible to implement). But almost all of the participants in this conversation tend to agree, pause or no, that the question of how to assess and manage risks of an AI system before actually deploying it is an important one. 

A natural place to look for guidance here is the National Institute of Standards and Technology (NIST), which released its AI Risk Management Framework (AI RMF) and an associated playbook in January. NIST is leading the government’s work to set technical standards and consensus guidelines for managing risks from AI systems, and some cite its standard-setting work as a potential basis for future regulatory efforts.

In this piece, we walk through what OpenAI actually did to test and improve GPT-4’s safety before deciding to release it, the limitations of that approach, and how it compares to current best practices recommended by NIST. We conclude with some recommendations for Congress, NIST, industry labs like OpenAI, and funders.

What did OpenAI do before deploying GPT-4? 

OpenAI claims to have taken several steps to make their system “safer and more aligned”. What are those steps? OpenAI describes these in the GPT-4 “system card,” a document which outlines how OpenAI managed and mitigated risks from GPT-4 before deploying it. Here’s a simplified version of what that process looked like:

Was this enough? 

Though OpenAI says they significantly reduced the rates of undesired model behavior through the above process, the controls put in place are not robust, and methods for mitigating bad model behavior are still leaky and imperfect. 

OpenAI did not eliminate the risks they identified. The system card documents numerous failures of the current version of GPT-4, including an example in which it agrees to “generate a program calculating attractiveness as a function of gender and race.” 

Current efforts to measure risks also need work, according to GPT-4 red teamers. The Alignment Research Center (ARC), which assessed these models for “emergent” risks, says that “the testing we’ve done so far is insufficient for many reasons, but we hope that the rigor of evaluations will scale up as AI systems become more capable.” Another GPT-4 red teamer, Aviv Ovadya, says that “if red-teaming GPT-4 taught me anything, it is that red teaming alone is not enough.” Ovadya recommends that future pre-deployment risk assessment efforts be improved using “violet teaming,” in which companies identify “how a system (e.g., GPT-4) might harm an institution or public good, and then support the development of tools using that same system to defend the institution or public good.”

Since current efforts to measure and mitigate the risks of advanced systems are imperfect, the question comes down to when they are “good enough.” What levels of risk are acceptable? Today, industry labs like OpenAI can mostly rely on their own judgment when answering this question, but there are many different standards that could be used. Amba Kak, the executive director of the AI Now Institute, suggests a more stringent standard, arguing that regulators should require AI companies “to prove that they’re going to do no harm” before releasing a system. To meet such a standard, new and much more systematic risk management and measurement approaches would be needed.

How did OpenAI’s efforts map on to NIST’s Risk Management Framework? 

NIST’s AI RMF Core consists of four main “functions,” broad outcomes which AI developers can aim for as they develop and deploy their systems: map, measure, manage, and govern. 

Framework users can map the overall context in which a system will be used to determine relevant risks that should be “on their radar” in that identified context. They can then measure identified risks quantitatively or qualitatively, before finally managing them, acting to mitigate risks based on projected impact. The govern function is about having a well-functioning culture of risk management to support effective implementation of the three other functions.
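To make these four functions concrete, below is a minimal, purely illustrative sketch of how a developer might organize a risk register around them. The class and field names are our own assumptions and are not part of NIST’s framework or Playbook.

```python
from dataclasses import dataclass, field

# Illustrative only: a toy risk register loosely organized around the four RMF
# Core functions (map, measure, manage, govern). Names are hypothetical.

@dataclass
class Risk:
    description: str        # mapped in context, e.g. "model produces discriminatory outputs"
    measurement: str = ""   # how the risk was measured, e.g. "red-team prompt suite, 3% failure rate"
    mitigation: str = ""    # how the risk is managed, e.g. "refusal training plus content filter"
    accepted: bool = False  # governance decision: is the residual risk within tolerance?

@dataclass
class RiskRegister:
    system_name: str
    risks: list[Risk] = field(default_factory=list)

    def unresolved(self) -> list[Risk]:
        """Risks that have been mapped but not yet measured, managed, and signed off."""
        return [r for r in self.risks if not (r.measurement and r.mitigation and r.accepted)]

register = RiskRegister("example-llm")
register.risks.append(Risk("model agrees to generate discriminatory code"))
print(len(register.unresolved()))  # 1 until the risk is measured, mitigated, and accepted
```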

Looking back to OpenAI’s process before releasing GPT-4, we can see how their actions would align with each function in the RMF Core. This is not to say that OpenAI applied the RMF in its work; we’re merely trying to assess how their efforts might align with the RMF.  

Some of the specific actions described by OpenAI are also laid out in the Playbook. The Measure 2.7 function highlights “red-teaming” activities as a way to assess an AI system’s “security and resilience,” for example.

NIST’s resources provide a helpful overview of considerations and best practices that can be taken into account when managing AI risks, but they are not currently designed to provide concrete standards or metrics by which one can assess whether the practices taken by a given lab are “adequate.” In order to develop such standards, more work would be needed. To give some examples of current guidance that could be clarified or made more concrete:

So, across NIST’s AI RMF, while determining whether a given “outcome” has been achieved could be up for debate, nothing stops developers from going above and beyond the perceived minimum (and we believe they should). This is not a bug of the framework as it is currently designed but rather a feature, as the RMF “does not prescribe risk tolerance.” However, more work is needed to establish both stricter guidelines that leading labs can follow to mitigate risks from leading AI systems, and concrete standards and methods for measuring risk on top of which regulations could be built.

Recommendations

There are a few ways that standards for pre-deployment risk assessment and mitigation for frontier systems can be improved: 

Congress

NIST

Industry Labs

Funders

Creating Auditing Tools for AI Equity

The unregulated use of algorithmic decision-making systems (ADS)—systems that crunch large amounts of personal data and derive relationships between data points—has negatively affected millions of Americans. These systems impact equitable access to education, housing, employment, and healthcare, with life-altering effects. For example, commercial algorithms used to guide health decisions for approximately 200 million people in the United States each year were found to systematically discriminate against Black patients, reducing, by more than half, the number of Black patients who were identified as needing extra care.

One way to combat algorithmic harm is by conducting system audits, yet there are currently no standards for auditing AI systems at the scale necessary to ensure that they operate legally, safely, and in the public interest. According to one research study examining the ecosystem of AI audits, only one percent of AI auditors believe that current regulation is sufficient. 

To address this problem, the National Institute of Standards and Technology (NIST) should invest in the development of comprehensive AI auditing tools, and federal agencies with the charge of protecting civil rights and liberties should collaborate with NIST to develop these tools and push for comprehensive system audits. 

These auditing tools would help the enforcement arms of these federal agencies save time and money while fulfilling their statutory duties. Additionally, there is a pressing need to develop these tools now, with Executive Order 13985 instructing agencies to “focus their civil rights authorities and offices on emerging threats, such as algorithmic discrimination in automated technology.”

Challenge and Opportunity

The use of AI systems across all aspects of life has become commonplace as a way to improve decision-making and automate routine tasks. However, their unchecked use can perpetuate historical inequities, such as discrimination and bias, while also potentially violating American civil rights.

Algorithmic decision-making systems are often used in prioritization, classification, association, and filtering tasks in a way that is heavily automated. ADS become a threat when people uncritically rely on the outputs of a system, use them as a replacement for human decision-making, or use systems with no knowledge of how they were developed. These systems, while extremely useful and cost-saving in many circumstances, must be created in a way that is equitable and secure. 

Ensuring the legal and safe use of ADS begins with recognizing the challenges that the federal government faces. On the one hand, the government wants to avoid devoting excessive resources to managing these systems. With new AI system releases happening every day, it is becoming unreasonable to oversee every system closely. On the other hand, we cannot blindly trust all developers and users to make appropriate choices with ADS.

This is where tools for the AI development lifecycle come into play, offering a third alternative between constant monitoring and blind trust. By implementing auditing tools and signatory practices, AI developers will be able to demonstrate compliance with preexisting and well-defined standards while enhancing the security and equity of their systems. 

Due to the extensive scope and diverse applications of AI systems, it would be difficult for the government to create a centralized body to oversee all systems or to demand that each agency develop solutions on its own. Instead, some responsibility should be shifted to AI developers and users, as they possess the specialized knowledge and motivation to maintain properly functioning systems. This allows the enforcement arms of federal agencies tasked with protecting the public to focus on what they do best: safeguarding citizens’ civil rights and liberties.

Plan of Action

To ensure security and verification throughout the AI development lifecycle, a suite of auditing tools is necessary. These tools should help enable the outcomes we care about: fairness, equity, and legality. The results of these audits should be reported, for example in an immutable ledger that is only accessible by authorized developers and enforcement bodies, or through a verifiable code-signing mechanism. We leave the specifics of reporting and documentation to the stakeholders involved, as each agency may have different reporting structures and needs. Other possible options, such as manual audits or audits conducted without the use of tools, may not provide the same level of efficiency, scalability, transparency, accuracy, or security.
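As one purely illustrative way the “immutable ledger” option could work, audit results might be appended to a hash-chained log in which each entry commits to the previous one, so that tampering with any past record is detectable. The record fields and helper names below are our own assumptions, a sketch rather than a prescribed design:

```python
import hashlib
import json
from datetime import datetime, timezone

# Sketch of a hash-chained audit log: each entry stores the hash of the
# previous entry, so altering any historical record breaks the chain.

def _hash_entry(entry: dict) -> str:
    canonical = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def append_audit_record(ledger: list, system: str, audit: str, result: dict) -> None:
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    entry = {
        "system": system,
        "audit": audit,                  # e.g. "disparate-impact check"
        "result": result,                # e.g. {"disparate_impact": 0.92}
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
    }
    entry["hash"] = _hash_entry({k: v for k, v in entry.items() if k != "hash"})
    ledger.append(entry)

def verify_ledger(ledger: list) -> bool:
    """Recompute every hash and check the chain; returns False if any record was altered."""
    prev_hash = "0" * 64
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _hash_entry(body):
            return False
        prev_hash = entry["hash"]
    return True

ledger = []
append_audit_record(ledger, "loan-screening-model-v2", "disparate-impact check", {"disparate_impact": 0.92})
assert verify_ledger(ledger)
```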

The federal government’s role is to provide the necessary tools and processes for self-regulatory practices. Heavy-handed regulations or excessive government oversight are not well-received in the tech industry, which argues that they tend to stifle innovation and competition. AI developers also have concerns about safeguarding their proprietary information and users’ personal data, particularly in light of data protection laws.

Auditing tools provide a solution to this challenge by enabling AI developers to share and report information in a transparent manner while still protecting sensitive information. This allows for a balance between transparency and privacy, providing the necessary trust for a self-regulating ecosystem.

Solution Technical Requirements

A general machine learning lifecycle, with examples of what system developers at each stage would be responsible for signing off on when using the security and equity tools in the lifecycle. These developers may represent companies, teams, or individuals.

The equity tool and process, funded and developed by government agencies such as NIST, would consist of a combination of (1) AI auditing tools for security and fairness (which could be based on or incorporate open source tools such as AI Fairness 360 and the Adversarial Robustness Toolbox), and (2) a standardized process and guidance for integrating these checks (which could be based on or incorporate guidance such as the U.S. Government Accountability Office’s Artificial Intelligence: An Accountability Framework for Federal Agencies and Other Entities).1
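As a small, hypothetical example of the kind of check such a tool could package, the snippet below computes two standard group-fairness metrics (statistical parity difference and disparate impact) directly with pandas. Toolkits such as AI Fairness 360 provide these and many other metrics out of the box, so this is only a sketch of the underlying idea, using made-up data:

```python
import pandas as pd

# Toy audit data: one row per applicant, with a protected attribute ("group")
# and the ADS's binary decision (1 = favorable outcome). Entirely illustrative.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,    1,   1,   0,   1,   0,   0,   0],
})

rate_privileged   = df.loc[df["group"] == "A", "selected"].mean()  # selection rate, group A
rate_unprivileged = df.loc[df["group"] == "B", "selected"].mean()  # selection rate, group B

statistical_parity_difference = rate_unprivileged - rate_privileged
disparate_impact = rate_unprivileged / rate_privileged

print(f"Statistical parity difference: {statistical_parity_difference:.2f}")
print(f"Disparate impact ratio:        {disparate_impact:.2f}")

# A common (but context-dependent) screening rule flags ratios below 0.8,
# echoing the "four-fifths" guideline used in employment contexts.
if disparate_impact < 0.8:
    print("Flag for review: selection rates differ substantially across groups.")
```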

Dioptra, a recent effort between NIST and the National Cybersecurity Center of Excellence (NCCoE) to build machine learning testbeds for security and robustness, is an excellent example of the type of lifecycle management application that would ideally be developed. Failure to protect civil rights and ensure equitable outcomes must be treated as seriously as security flaws, as both impact our national security and quality of life. 

Equity considerations should be applied across the entire lifecycle; training data is not the only possible source of problems. Inappropriate data handling, model selection, algorithm design, and deployment also contribute to unjust outcomes. This is why pairing tools with specific guidance is essential.

As some scholars note, “There is currently no available general and comparative guidance on which tool is useful or appropriate for which purpose or audience. This limits the accessibility and usability of the toolkits and results in a risk that a practitioner would select a sub-optimal or inappropriate tool for their use case, or simply use the first one found without being conscious of the approach they are selecting over others.”

Companies utilizing the various packaged tools on their ADS could sign off on the results using code signing. This would create a record that these organizations ran these audits along their development lifecycle and received satisfactory outcomes. 

We envision a suite of auditing tools, each applying to a specific agency and enforcement task. Precedents for this type of technology already exist. Much like security became a part of the software development lifecycle with guidance developed by NIST, equity and fairness should be integrated into the AI lifecycle as well. NIST could spearhead a government-wide initiative on AI auditing tools, leading the guidance, distribution, and maintenance of such tools. NIST is an appropriate choice considering its history of evaluating technology and providing guidance around the development and use of specific AI applications, such as the NIST-led Face Recognition Vendor Test (FRVT).

Areas of Impact & Agencies / Departments Involved

Security & Justice: The U.S. Department of Justice, Civil Rights Division, Special Litigation Section; the Department of Homeland Security; U.S. Customs and Border Protection; the U.S. Marshals Service

Public & Social Sector: The U.S. Department of Housing and Urban Development’s Office of Fair Housing and Equal Opportunity

Education: The U.S. Department of Education

Environment: The U.S. Department of Agriculture, Office of the Assistant Secretary for Civil Rights; the Federal Energy Regulatory Commission; the Environmental Protection Agency

Crisis Response: The Federal Emergency Management Agency

Health & Hunger: The U.S. Department of Health and Human Services, Office for Civil Rights; the Centers for Disease Control and Prevention; the Food and Drug Administration

Economic: The Equal Employment Opportunity Commission; the U.S. Department of Labor, Office of Federal Contract Compliance Programs

Infrastructure: The U.S. Department of Transportation, Office of Civil Rights; the Federal Aviation Administration; the Federal Highway Administration

Information Verification & Validation: The Federal Trade Commission; the Federal Communications Commission; the Securities and Exchange Commission

Many of these tools are open source and free to the public. A first step could be combining these tools with agency-specific standards and plain language explanations of their implementation process.

Benefits

These tools would provide several benefits to federal agencies and developers alike. First, they allow organizations to protect their data and proprietary information while performing audits. Any audits, whether on the data, model, or overall outcomes, would be run and reported by the developers themselves. Developers of these systems are the best choice for this task since ADS applications vary widely, and the particular audits needed depend on the application. 

Second, while many developers may opt to use these tools voluntarily, standardizing and mandating their use would make it straightforward to assess any system suspected of violating the law. In this way, the federal government will be able to manage standards more efficiently and effectively.

Third, although this tool would be designed for the AI lifecycle that results in ADS, it can also be applied to traditional auditing processes. Metrics and evaluation criteria will need to be developed based on existing legal standards and evaluation processes; once these metrics are distilled for incorporation into a specific tool, this tool can be applied to non-ADS data as well, such as outcomes or final metrics from traditional audits.

Fourth, we believe that a strong signal from the government that equity considerations in ADS are important and easily enforceable will impact AI applications more broadly, normalizing these considerations.   

Example of Opportunity

An agency that might use this tool is the Department of Housing and Urban Development (HUD), whose purpose is to ensure that housing providers do not discriminate based on race, color, religion, national origin, sex, familial status, or disability.

To enforce these standards, HUD, which is responsible for 21,000 audits a year, investigates and audits housing providers to assess compliance with the Fair Housing Act, the Equal Credit Opportunity Act, and other related regulations. During these audits, HUD may review a provider’s policies, procedures, and records, as well as conduct on-site inspections and tests to determine compliance. 

Using an AI auditing tool could streamline and enhance HUD’s auditing processes. In cases where ADS were used and suspected of harm, HUD could ask for verification that an auditing process was completed and that specific metrics were met, or require that such a process be undertaken and its results reported to the agency.

Noncompliance with legal standards of nondiscrimination would apply to ADS developers as well, and we envision the enforcement arms of protection agencies would apply the same penalties in these situations as they would in non-ADS cases.

R&D

To make this approach feasible, NIST will require funding and policy support. The recent CHIPS and Science Act has provisions to support NIST’s role in developing “trustworthy artificial intelligence and data science,” including the testbeds mentioned above. Research and development can be partially contracted out to universities and other national laboratories, or pursued through partnerships and contracts with private companies and organizations.

The first iterations will need to be developed in partnership with an agency interested in integrating an auditing tool into its processes. The specific tools and guidance developed by NIST must be applicable to each agency’s use case. 

The auditing process would include auditing data, models, and other information vital to understanding a system’s impact and use, informed by existing regulations/guidelines. If a system is found to be noncompliant, the enforcement agency has the authority to impose penalties or require changes to be made to the system.

Pilot program

NIST should develop a pilot program to test the feasibility of AI auditing. It should be conducted on a smaller group of systems to test the effectiveness of the AI auditing tools and guidance and to identify any potential issues or areas for improvement. NIST should use the results of the pilot program to inform the development of standards and guidelines for AI auditing moving forward.

Collaborative efforts

Achieving a self-regulating ecosystem requires collaboration. The federal government should work with industry experts and stakeholders to develop the necessary tools and practices for self-regulation.

A multistakeholder team drawing from NIST, federal agency issue experts, and ADS developers should be established during the development and testing of the tools. Collaborative efforts will help delineate responsibilities, with AI creators and users responsible for implementing and maintaining compliance with the standards and guidelines, and agency enforcement arms responsible for ensuring continued compliance.

Regular monitoring and updates

The enforcement agencies will continuously monitor and update the standards and guidelines to keep them up to date with the latest advancements and to ensure that AI systems continue to meet the legal and ethical standards set forth by the government.

Transparency and record-keeping

Code-signing technology can be used to provide transparency and record-keeping for ADS. This can be used to store information on the auditing outcomes of the ADS, making reporting easy and verifiable and providing a level of accountability to users of these systems.

Conclusion

Creating auditing tools for ADS presents a significant opportunity to enhance equity, transparency, accountability, and compliance with legal and ethical standards. The federal government can play a crucial role in this effort by investing in the research and development of tools, developing guidelines, gathering stakeholders, and enforcing compliance. By taking these steps, the government can help ensure that ADS are developed and used in a manner that is safe, fair, and equitable.

WHAT IS AN ALGORITHMIC DECISION-MAKING SYSTEM?
An algorithmic decision-making system (ADS) is software that uses algorithms to make decisions or take actions based on data inputs, sometimes without human intervention. ADS are used in a wide range of applications, from customer service chatbots to screening job applications to medical diagnosis systems. ADS are designed to analyze data and make decisions or predictions based on that data, which can help automate routine or repetitive tasks, improve efficiency, and reduce errors. However, ADS can also raise ethical and legal concerns, particularly when it comes to bias and privacy.
WHAT IS AN ALGORITHMIC AUDIT?
An algorithmic audit is a process that examines automated decision-making systems and algorithms to ensure that they are fair, transparent, and accountable. Algorithmic audits are typically conducted by independent third-party auditors or specialized teams within organizations. These audits examine various aspects of the algorithm, such as the data inputs, the decision-making process, and the outcomes produced, to identify any biases or errors. The goal is to ensure that the system operates in a manner consistent with ethical and legal standards and to identify opportunities to improve the system’s accuracy and fairness.
WHAT IS CODE SIGNING, AND WHY IS IT INVOLVED?
Code signing is the process of digitally signing software and code to verify the integrity and authenticity of the code. It involves adding a digital signature to the code, which is a unique cryptographic hash that is generated using a private key held by the code signer. The signature is then embedded into the code along with other metadata.

Code signing is used to establish trust in code that is distributed over the internet or other networks. By digitally signing the code, the code signer is vouching for its identity and taking responsibility for its contents. When users download code that has been signed, their computer or device can verify that the code has not been tampered with and that it comes from a trusted source.

Code signing can be extended to all parts of the AI lifecycle as a means of verifying the authenticity, integrity, and function of a particular piece of code or a larger process. After each step in the auditing process, code signing enables developers to leave a well-documented trail for enforcement bodies/auditors to follow if a system were suspected of unfair discrimination or unsafe operation.
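To make this concrete, the sketch below signs a canonicalized audit report with an Ed25519 key using the open source Python cryptography library and then verifies the signature. The report fields are hypothetical, and a real deployment would also need key management and a way to bind public keys to organizations:

```python
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Hypothetical audit report produced at one step of the lifecycle.
report = {
    "system": "tenant-screening-model-v3",
    "step": "pre-deployment fairness audit",
    "metrics": {"disparate_impact": 0.91, "statistical_parity_difference": -0.03},
}
payload = json.dumps(report, sort_keys=True).encode("utf-8")  # canonical bytes to sign

# The developer signs the report with a private key they control...
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(payload)

# ...and an enforcement body holding the matching public key can later check
# that the report is authentic and has not been altered.
public_key = private_key.public_key()
try:
    public_key.verify(signature, payload)
    print("Signature valid: report is authentic and unmodified.")
except InvalidSignature:
    print("Signature check failed: report was altered or not signed by this key.")
```

Verification fails if either the report bytes or the signature are changed, which is what allows an auditor or enforcement body to trust the recorded trail.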

Code signing is not essential for this project’s success, and we believe that the specifics of the auditing process, including documentation, are best left to individual agencies and their needs. However, code signing could be a useful piece of any tools developed.
WHAT IS AN AI AUDITOR?
An AI auditor is a professional who evaluates and ensures the fairness, transparency, and accountability of AI systems. AI auditors often have experience in risk management, IT or cybersecurity auditing, or engineering, and use frameworks such as the IIA’s AI Framework, the COSO ERM Framework, or the U.S. GAO’s Artificial Intelligence: An Accountability Framework for Federal Agencies and Other Entities. Much like other IT auditors, they review and audit the development, deployment, and operation of systems to ensure that they align with business objectives and legal standards. More than auditors in other fields, AI auditors have also seen a push to consider sociotechnical issues. This includes analyzing the underlying algorithms and data used to develop the AI system, assessing its impact on various stakeholders, and recommending improvements to ensure that it is being used effectively.
WHY SHOULD THE FEDERAL GOVERNMENT BE THE ENTITY TO ACT RATHER THAN THE PRIVATE SECTOR OR STATE/LOCAL GOVERNMENT?
The federal government is uniquely positioned to take the lead on this issue because of its responsibility to protect civil rights and ensure compliance with federal laws and regulations. The federal government can provide the necessary resources, expertise, and implementation guidance to ensure that AI systems are audited in a fair, equitable, and transparent manner.
WHO IS LIKELY TO PUSH BACK ON THIS PROPOSAL AND HOW CAN THAT HURDLE BE OVERCOME?
Industry stakeholders may be resistant to these changes. They should be engaged in the development of tools and guidelines so that their concerns can be addressed, and effort should be made to clearly communicate the benefits of increased accountability and transparency for both the industry and the public. Collaboration and transparency are key to overcoming potential hurdles, as is making any tools produced user-friendly and accessible.

Additionally, there may be pushback on the tool design. It is important to remember that currently, engineers often use fairness tools at the end of a development process, as a last box to check, instead of as an integrated part of the AI development lifecycle. These concerns can be addressed by emphasizing the comprehensive approach taken and by developing the necessary guidance to accompany these tools—which does not currently exist.
WHAT ARE SOME OTHER EXAMPLES OF HOW AI HAS HARMED SOCIETY?
Example #1: Healthcare

New York regulators are calling on UnitedHealth Group to either stop using, or prove there is no problem with, a company-made algorithm that researchers say exhibited significant racial bias. This algorithm, which UnitedHealth Group sells to hospitals for assessing the health risks of patients, assigned similar risk scores to white patients and Black patients despite the Black patients being considerably sicker.

In this case, researchers found that changing just one parameter could generate “an 84% reduction in bias.” If we had specific information on the parameters going into the model and how they were weighted, a record-keeping system could show how particular interventions affected the model’s output.

Bias in AI systems used in healthcare could potentially violate the Constitution’s Equal Protection Clause, which prohibits discrimination on the basis of race. If the algorithm is found to have a disproportionately negative impact on a certain racial group, this could be considered discrimination. It could also potentially violate the Due Process Clause, which protects against arbitrary or unfair treatment by the government or a government actor. If an algorithm used by hospitals, which are often funded by the government or regulated by government agencies, is found to exhibit significant racial bias, this could be considered unfair or arbitrary treatment.

Example #2: Policing

A UN panel on the Elimination of Racial Discrimination has raised concern over the increasing use of technologies like facial recognition in law enforcement and immigration, warning that it can exacerbate racism and xenophobia and potentially lead to human rights violations. The panel noted that while AI can enhance performance in some areas, it can also have the opposite effect as it reduces trust and cooperation from communities exposed to discriminatory law enforcement. Furthermore, the panel highlights the risk that these technologies could draw on biased data, creating a “vicious cycle” of overpolicing in certain areas and more arrests. It recommends more transparency in the design and implementation of algorithms used in profiling and the implementation of independent mechanisms for handling complaints.

A case study on the Chicago Police Department’s (CPD) Strategic Subject List (SSL) discusses an algorithm-driven technology used by the department to identify individuals at high risk of being involved in gun violence and to inform its policing strategies. However, a study by the RAND Corporation on an early version of the SSL found that it was not successful in reducing gun violence or the likelihood of victimization, and that inclusion on the SSL only had a direct effect on arrests. The study also raised significant privacy and civil rights concerns. Additionally, findings reveal that more than one-third of individuals on the SSL have never been arrested or been the victim of a crime, yet approximately 70% of that cohort received a high-risk score. Furthermore, 56% of Black men under the age of 30 in Chicago have a risk score on the SSL. This demographic has also been disproportionately affected by the CPD’s past discriminatory practices, including the torture of Black men between 1972 and 1994, unlawful stops and frisks performed disproportionately on Black residents, a pattern or practice of unconstitutional use of force, poor data collection, and systemic deficiencies in training, supervision, and accountability systems, with conduct disproportionately affecting Black and Latino residents.

Predictive policing, which uses data and algorithms to try to predict where crimes are likely to occur, has been criticized for reproducing and reinforcing biases in the criminal justice system. This can lead to discriminatory practices and violations of the Fourth Amendment’s prohibition on unreasonable searches and seizures, as well as the Fourteenth Amendment’s guarantee of equal protection under the law. Additionally, bias in policing more generally can also violate these constitutional provisions, as well as potentially violating the Fourth Amendment’s prohibition on excessive force.

Example #3: Recruiting

ADS in recruiting crunch large amounts of personal data and, given some objective, derive relationships between data points. The aim is to use systems capable of processing more data than a human ever could to uncover hidden relationships and trends that will then provide insights for people making all types of difficult decisions.

Hiring managers across different industries use ADS every day to aid in the decision-making process. In fact, a 2020 study reported that 55% of human resources leaders in the United States use predictive algorithms across their business practices, including hiring decisions.

For example, employers use ADS to screen and assess candidates during the recruitment process and to identify best-fit candidates based on publicly available information. Some systems even analyze facial expressions during interviews to assess personalities. These systems promise organizations a faster, more efficient hiring process. ADS do theoretically have the potential to create a fairer, qualification-based hiring process that removes the effects of human bias. However, they also possess just as much potential to codify new and existing prejudice across the job application and hiring process.

The use of ADS in recruiting could potentially violate several federal laws, including anti-discrimination statutes such as Title VII of the Civil Rights Act of 1964 and the Americans with Disabilities Act. These laws prohibit discrimination on the basis of race, gender, and disability, among other protected characteristics, in the workplace. Additionally, these systems could also potentially violate the right to privacy and the due process rights of job applicants. If the systems are found to be discriminatory or to violate these laws, they could result in legal action against the employers.
WHAT OPEN-SOURCE TOOLS COULD BE LEVERAGED FOR THIS PROJECT?
Aequitas, Accenture Algorithmic Fairness, Alibi Explain, AllenNLP, BlackBox Auditing, DebiasWE, DiCE, ErrorAnalysis, EthicalML xAI, Facebook DynaBoard, Fairlearn, FairSight, FairTest, FairVis, FoolBox, Google Explainable AI, Google KnowYourData, Google ML Fairness Gym, Google PAIR Facets, Google PAIR Language Interpretability Tool, Google PAIR Saliency, Google PAIR What-If Tool, IBM Adversarial Robustness Toolbox, IBM AI Fairness 360, IBM AI Explainability 360, Lime, MLI, ODI Data Ethics Canvas, Parity, PET Repository, PwC Responsible AI Toolkit, Pymetrics audit-AI, RAN-debias, REVISE, Saidot, SciKit Fairness, Skater, Spatial Equity Data Tool, TCAV, UnBias Fairness Toolkit