SOURCE CODE: A Policy Agenda for Fostering Trust and Fairness in AI

AI systems are rapidly becoming part of the machinery of public life, but they sit on shaky foundations. We have seen AI deployed to screen for cancer, assist people with disabilities, and even address complex environmental challenges. Yet as these systems are deployed in increasingly consequential settings, the promise of AI to expand opportunity, increase effectiveness, and boost productivity has been accompanied by harms that are no longer hypothetical. These harms fall into several recurring categories: systems can misallocate resources, misrepresent groups, fail to function reliably, or be deployed for illegitimate purposes, even when the technology works as intended.

For example, AI has affected who gets allocated critical resources. A widely used healthcare algorithm underestimated the needs of Black patients, limiting access to care. In finance, algorithmic decision-making has produced discriminatory outcomes in lending and underwriting.

Another observed harm is that failures in AI rollout can affect how people and communities are represented. AI systems have been shown to reinforce harmful stereotypes or render certain groups invisible altogether, particularly when they are trained on incomplete or biased data. A well-known example is facial recognition technology, which has been shown to perform significantly worse on darker-skinned individuals, raising concerns about misidentification and disproportionate surveillance.

Still other harms are failures of basic system functionality. Gunshot detection systems have generated large numbers of false alerts, distorting how evidence is used in criminal proceedings. The Michigan Integrated Data Automated System (MiDAS), which was used to find instances of fraud in state unemployment benefits, was incorrect in 85% of its fraud determinations.

Another category of concern arises not from failures in system design or performance, but from how AI systems are used in practice: situations in which systems operate as intended yet still produce harmful outcomes. For example, algorithmic management tools in the workplace could intensify worker surveillance, destabilize scheduling, or reduce worker autonomy, even when operating accurately.

These harms explain why AI is facing a crisis of public trust. A June 2025 Pew study found that half of U.S. adults feel “more concerned than excited about the growing use of AI,” with only a small minority expressing optimism. AI cannot deliver broad public benefits, such as improved public services, if the people affected by it do not trust the systems shaping their lives. The public will not trust abstract statements of fairness, transparency, responsibility, or legitimacy. Trust will be rebuilt only if those commitments are translated into procedural and institutional mechanisms: procurement rules, public engagement processes, sector-specific safeguards, and holistic remedies. Such mechanisms can help ensure that AI is used legitimately in the public interest.

In many cases, the concern is whether the technology should even be used. This question raises deeper considerations around the concentration of power, human dignity, and the conditions under which innovation actually benefits the public. While AI can be used to support societal benefits, such as helping overworked healthcare practitioners, it can also be used in ways that harm human dignity, such as through surveillance or by restricting fair access to benefits, or that create new vulnerabilities, such as cybersecurity and data privacy risks. Building fairness and trust in AI requires more than improving system performance. Policymakers and the public must ask whether particular uses are legitimate, who benefits, who bears the risks, and what limits they should set and enforce.

Because these questions cannot be answered by abstract principles alone, the Federation of American Scientists worked with experts and practitioners across civil society and academia to bring together a policy agenda with ten actionable and high-impact solutions. We did this through our SOURCE CODE: AI Trust and Fairness Policy Sprint. When problems are urgent, institutions are uncertain, and traditional policymaking moves too slowly, our policy sprints create space to bring together experts across disciplines, from academics to technologists, advocates, and practitioners, and empower them to move quickly from diagnosis to action. Instead of debating what trust and fairness mean in the abstract, this sprint focused on what they look like in practice, and how they can be operationalized through specific policy levers.

This paper proceeds in three parts. First, we examine how fairness and trust are understood across different contexts, and why that creates friction in how we map the space. Second, we explore the challenges of implementing policy designed to install fairness guardrails. We highlight how gaps in policy, capacity, and real-world conditions can undermine even well-intentioned systems. Finally, we present a set of policy strategies across these key levers, offering actionable pathways for building fairer and more trustworthy AI systems.

What do we mean by ‘trust and fairness’ in AI?

Fairness and trust are often invoked as key components of AI governance and policy, but they can seem nebulous depending on the context and the community at hand. In general, we consider public trust to be the extent to which people see systems and institutions as reliable, accountable, and responsive to harms. AI fairness broadly concerns whether AI systems distribute benefits and burdens in ways that can be justified within a particular social, legal, and institutional context. Together, fairness and trust point to a broader question of legitimacy: whether an AI system should be used at all, for what purpose, and under what auspices.

This section reflects on how different stakeholders view fairness and public trust, examining each perspective before turning to existing legal instruments, policy gaps, and what is needed for effective policy implementation in this field.

On fairness

Fairness in AI is not a single concept but is defined differently across technical, legal, and social domains. This plurality reflects how AI systems are “sociotechnical systems”: their effects depend not only on data and algorithms but also on the institutions, incentives, rules, and human decisions that shape their deployment.

In examining how society in general views fairness, literature shows that individuals often understand fairness not as a clearly defined principle but in contrast to experiences of unfairness; that is, as the absence of harm. This means that individuals, communities, and different cultures perceive fairness differently based on their own lived experiences. In terms of AI fairness, society views fairness as specifically related to terms such as “equity, consistency, non-discrimination, impartiality, justice, honesty, and reasonableness.”

When evaluating fairness in AI systems, technical literature often distinguishes between two broad concepts. The first, individual fairness, asks whether similar individuals are treated comparably for a given task. This approach requires defining which characteristics are relevant to the task and what it means for two people to be meaningfully similar. The second, group fairness, examines whether outcomes are distributed unequally across groups. For example, in hiring, one group fairness approach might ask whether candidates from different demographic groups are selected at similar rates, while other approaches might focus on whether error rates or predictive accuracy differ across groups.
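The selection-rate version of group fairness described above can be illustrated with a minimal sketch. The applicant counts, the group labels "A" and "B", and the function names are hypothetical and chosen only for illustration; they do not come from any real hiring system, and what ratio should count as acceptable is a policy judgment this sketch does not settle.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of candidates selected within each group.

    `records` is an iterable of (group_label, was_selected) pairs.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def selection_rate_ratio(rates):
    """Lowest group selection rate divided by the highest (1.0 means equal rates)."""
    return min(rates.values()) / max(rates.values())


# Hypothetical applicant pool: 100 candidates in each group.
applicants = (
    [("A", True)] * 30 + [("A", False)] * 70
    + [("B", True)] * 15 + [("B", False)] * 85
)

rates = selection_rates(applicants)
ratio = selection_rate_ratio(rates)
print(rates)
print(round(ratio, 2))
```

In this made-up pool, group B is selected at half the rate of group A. An approach based on error rates or predictive accuracy would need additional information, namely how the selected and rejected candidates actually performed, which is one reason the two families of metrics can point in different directions.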

There are differences of opinion about what actually constitutes equal or similar outcomes in these definitions and how to predict them. On the one hand, historical data may be treated as a valid basis for predicting future outcomes. On the other hand, historical data itself could be shaped by historical and structural inequities, causing systems trained on it to reproduce existing patterns of discrimination. These competing considerations around historical data can create situations where an AI system meets one definition of fairness while its actual outcomes impose unequal harms on a group or individual. For example, the ProPublica investigation of the COMPAS risk assessment tool, an algorithm used to support criminal justice officials’ decisions on bail, sentencing, and early release, found that Black defendants who did not go on to reoffend were nearly twice as likely as white defendants to be labeled high risk. In theory, COMPAS satisfied one fairness criterion, in this case predictive parity, which means that a given risk score corresponds to a similar likelihood of reoffending across groups, but it also propagated the systemic inequalities of the U.S. criminal justice system.
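The tension between fairness definitions can be made concrete with a small sketch. The counts below are hypothetical and are not COMPAS data; the class and variable names are likewise assumptions made for illustration. The sketch shows two groups for which a "high risk" label is equally reliable (equal precision, one common way of expressing predictive parity) while people who do not reoffend are wrongly flagged at very different rates.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Confusion-matrix counts for one group (hypothetical numbers)."""
    true_pos: int   # flagged high risk, did reoffend
    false_pos: int  # flagged high risk, did not reoffend
    true_neg: int   # not flagged, did not reoffend
    false_neg: int  # not flagged, did reoffend

    def precision(self) -> float:
        """Share of people flagged high risk who did reoffend."""
        return self.true_pos / (self.true_pos + self.false_pos)

    def false_positive_rate(self) -> float:
        """Share of people who did not reoffend but were flagged anyway."""
        return self.false_pos / (self.false_pos + self.true_neg)


# Two groups of 1,000 people with different underlying reoffense rates
# (50% in group X, 30% in group Y); both figures are invented.
group_x = Outcomes(true_pos=300, false_pos=200, true_neg=300, false_neg=200)
group_y = Outcomes(true_pos=150, false_pos=100, true_neg=600, false_neg=150)

for name, group in (("X", group_x), ("Y", group_y)):
    print(name, round(group.precision(), 2), round(group.false_positive_rate(), 2))
```

Under these assumptions, a "high risk" label is right 60% of the time in both groups, yet 40% of non-reoffenders in group X are flagged compared with about 14% in group Y. When underlying rates differ between groups, equalizing one of these measures generally leaves the other unequal, which is why choosing a fairness definition is a policy decision and not only a technical one.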

Fairness cannot be understood as a fixed technical standard, but rather as a contested concept shaped by social, institutional, and legal contexts. Determining which definition of fairness should govern a particular AI system is a nuanced decision, one that requires a deep understanding of the sectoral context, stakeholders, and other elements within the scope of the AI system’s deployment. This becomes even more challenging when relying on existing legal frameworks that may only partially address the complexities of AI-driven decision-making.

On public trust

What does it mean for technology itself to be worthy of trust and, in turn, of adoption? Public trust in AI is not generated by technical performance alone. It is built when people can see that an AI system serves a legitimate purpose, works reliably in its deployment context, and remains subject to meaningful human oversight, public accountability, and remedies when things go wrong. We also explicitly see fairness as a component of public trust: the public will not trust AI if it views the technology’s allocation of opportunities and burdens as unfair. Ultimately, public trust depends on the public seeing the use of AI as justified and legitimate. If that trust falters, AI use will be seen as harmful to society. For example, the prospect of AI-driven worker substitution is a major source of public concern, raising questions about whether the use of AI to replace human labor is legitimate. If such a substitution occurs at scale, it could further erode public trust in AI systems and their deployment.

Public sector adoption of AI is where governance approaches are first tested in practice, shaping both regulatory norms and broader public expectations. When government agencies deploy AI systems, they are effectively signaling what responsible use looks like. As a result, failures in public sector systems can have outsized consequences. When government use of AI leads to unfair outcomes, unreliable decisions, or a lack of accountability, it can erode trust not only in AI but in the government itself. For example, the use of automated prior authorization systems in Medicare Advantage has been associated with higher denial rates and barriers to medically necessary post-acute care, something that could directly affect public attitudes towards AI adoption in government services.

Across the federal government, successive administrations have invoked the idea that public trust in AI systems is essential to the technology’s adoption across the public sphere. For example, Executive Order 13859 under the first Trump administration explicitly called for the use of AI in a manner that “fosters public trust and confidence,” a sentiment that carried over into Biden-era executive actions and remains in the current Trump administration OMB guidance, which defines how the federal government uses and acquires AI. These efforts have generally focused on identifying broad, high-risk uses of AI in the federal government and then pairing them with risk-mitigation requirements, leaving federal agencies to define implementation details and build the internal capacity to identify and enforce protections against AI uses that could erode public trust.

How can existing law be used as a tool for trust and fairness?

Existing anti-discrimination law provides an important, but incomplete, set of tools for addressing AI-related harms. In employment, Title VII of the Civil Rights Act of 1964 prohibits discrimination based on race, color, religion, sex, and national origin. Later case law and statutory amendments, including the Civil Rights Act of 1991, developed a framework that distinguishes between disparate treatment, where a person is intentionally treated differently because of a protected characteristic, and disparate impact, where a facially neutral practice causes unjustified adverse effects for a protected group. This framework is highly relevant to AI systems, which may produce unequal outcomes even when they do not explicitly use protected characteristics.

Existing statutory tools are also directly applicable to AI-related consumer harms that occur in specific sectors. The Equal Credit Opportunity Act (ECOA), for instance, remains a powerful tool for addressing bias and discrimination in financial services, especially in cases where discrimination arises from algorithmic decision-making outputs playing a determinative role in financial outcomes such as loan decisions. Recent enforcement actions clearly demonstrate how these existing laws can be applied in practice to algorithmic harms to build the much-needed public trust in AI systems. In 2025, the Massachusetts Attorney General’s Office fined a financial services company that used AI for student loan underwriting after determining that its algorithmic outputs were discriminatory. As part of the legal remedy, the company was required to inventory its models and retrain them to comply with anti-discrimination, consumer protection, and fair lending laws. Similarly, the Federal Trade Commission has already used its authority to address harmful AI deployments. Its settlement with Rite Aid over the use of facial recognition technology included an unfairness claim, underscoring that discriminatory or harmful AI practices can fall squarely within existing prohibitions on “unfair” conduct. These examples illustrate that while regulators are able to act, such interventions are reactive, occurring after harms have already materialized, and may not provide detailed guidance for how systems should be designed or governed in advance.

Yet, without statutory reform, today’s reliance on existing law and legal frameworks is a matter of necessity rather than choice. This is in part because legal frameworks typically do not define fairness in the abstract. Instead, the law operationalizes it through established doctrines, standards, and enforcement mechanisms. Many emerging AI concerns can be partially mapped onto well-established legal principles with effective recourse and remedies for consumers. This continuity becomes especially apparent in consumer-facing contexts.

For example, trust and fairness in consumer-facing technologies such as AI closely align with longstanding notions of consumer product safety. As such, public concerns about whether AI systems are reliable, transparent, and non-harmful mirror traditional expectations that the physical products consumers purchase should not pose undue risks or potential harms. In addition to notions of what constitutes product safety, the American legal system has long articulated what constitutes “unfair” conduct through statutes such as the Federal Trade Commission Act and the Dodd-Frank Act’s prohibition on unfair, deceptive, or abusive acts or practices (UDAAP).

In the absence of clear enforcement and interpretive guidance, however, legal gaps can translate into diffuse or ambiguous accountability, undermining public confidence in both the technologies themselves and the institutions responsible for overseeing them. One example of this dynamic is the bipartisan frustration over digital payment apps, which has fed widespread unease and declining public trust in technology. Apps that use AI tools for automated content moderation or fraud detection can inadvertently “debank” otherwise welcome clients by terminating their accounts.

Policymakers therefore face a difficult bind: without more deliberate efforts to operationalize fairness and accountability, public trust will remain elusive. Legal standards alone are not enough; they must be translated into systems that people experience as fair, reliable, and responsive to harm.

In the absence of new laws that directly address the challenges posed by artificial intelligence, practitioners must rely on existing legal frameworks that offer only partial mechanisms for accountability and redress when harm arises. This creates particular challenges in areas where concepts such as “trust” and “fairness” in AI systems are not explicitly codified, and where legal precedent is still limited or emerging. Existing legal frameworks may provide an important starting point for building public trust, but they do not fully capture the range of concerns raised by AI systems or the conditions needed to sustain public trust over time.

The challenge of implementation: why good intentions fail to produce trusted outcomes

In a rapidly evolving policy climate, governments across jurisdictions have implemented or proposed policies to create fairer, more trustworthy algorithmic systems and to protect people from harms. When designing policy interventions, it is important to take into account how prepared institutions are to take on these responsibilities, and what resources, such as talent, processes, and even data availability, exist to ensure that a new policy has a fighting chance to succeed.

Take, for example, federal agencies’ implementation of OMB Memorandum M-24-10, a government-wide guidance document on how the federal government uses and acquires AI. Agencies are supposed to publish a compliance plan describing the processes they will undertake to follow that guidance. In our analysis of these compliance plans, we found that agencies vary in technical expertise, staffing capacity, and institutional resources, which leads to inconsistent compliance and fragmented oversight practices. In this case, governance frameworks alone are insufficient; effective implementation depends on sustained investment in technical talent and administrative capacity.

At the state level, California’s implementation of Assembly Bill 302 illustrates a similar challenge of translating AI governance policies into accountability mechanisms. Although the law required the state to inventory high-risk automated decision systems used by agencies, California’s first public use-case inventory incorrectly stated that no such systems were in use, despite numerous publicly documented examples. This failure stemmed from weak implementation practices, including an informal reporting process that relied largely on agency self-reporting through email surveys.

New York City’s Local Law 144, which codified bias auditing requirements for automated decision-making systems, has also faced its own constraints. For example, an audit of the law found that implementation challenges were compounded by limited mechanisms for leveraging expertise through interdepartmental collaboration. In addition, limitations in data quality and availability significantly constrain the ability to evaluate explicit bias or disparate impact. In New York City, this challenge was particularly evident during implementation, as agencies struggled with both limited test data and the widespread absence of key demographic information needed to assess bias. In many cases, employers had not collected demographic data on applicants at all, and where such data did exist, it was often incomplete, leaving more applicants without demographic information than with it. This made it difficult to evaluate potential bias in automated decision-making systems used in employment decisions.

These barriers to implementation are a key motivation for the SOURCE CODE: AI Trust and Fairness Policy Sprint. In each of our memos, we take into account the resources, the stakeholders, and the capacity of each institution in bringing a policy idea to fruition. In developing our policy agenda, we have worked with experts across civil society and academia to identify solutions that are responsive to public concerns while remaining attentive to the realities of policy implementation. Across the policy memos, we outline actionable proposals that address several core policy levers that can help move AI governance from principle to practice.

Policy Levers to Advance AI Fairness and Build Public Trust

In undertaking this sprint, our focus is on how policy proposals can be implemented in light of current institutional and political realities. We recognize that broader debates over the values and governance of AI, including recent federal actions, have created uncertainty around the durability and implementation of comprehensive AI governance frameworks. Rather than concentrating on a single theory of AI governance, our sprint examines ten targeted ideas that can influence outcomes across different jurisdictions to provide trustworthy and fair outcomes for AI use. Our ideas can be categorized across four policy levers: government use of AI, public engagement, sector-specific interventions, and remedies.

Guardrails in Government Use of AI

Government use of AI represents one of the most immediate and consequential opportunities to shape how these systems function in practice. When public agencies adopt AI, they are not merely deploying tools but also setting precedents that can influence how systems are designed and governed more broadly. Public procurement, in this context, emerges as a critical but underexamined lever. The terms governments set when acquiring AI systems shape what vendors disclose, how systems are assessed, and what safeguards are built in from the outset. Several of the memos look at how AI is acquired across sectors of public service such as education, law enforcement, and healthcare; at specific risk mitigation tools; and at what final procurement agreements between governments and vendors should consider. Here are the policy ideas that specifically look at guardrails for government use of AI:

How State Governments Should Purchase AI to Ensure Fair, Transparent, and Accountable Use by Jae Yeon Kim and Aniket Kesari
How State Leaders Can Put People First in AI Decision-Making by Nicole Ozer and Brady Hirsch
Prioritize Student Safety in K-12 Education By Establishing AI Procurement Guardrails by J.B. Branch
The Federal Government Should Pilot a Decision Subject Representative Program for AI Systems Inspired by the FDA by Anna Lenhart
How to Safely Bring AI into Law Enforcement Through AI-Generated Police Reports by Jon M. Peha

Public Engagement

Public engagement is often treated as an afterthought in policymaking, in part because it is difficult to execute well. Policymakers and affected communities operate in different technical and cultural languages, making sustained dialogue challenging. Yet meaningful public engagement in AI deployment can be a critical step to ensuring that AI use is both appropriate and trustworthy. Systems developed and deployed without concrete input from affected communities risk entrenching harm and undermining trust. Therefore, we have looked at practical ways that public engagement can be institutionalized at the federal, state, and local levels, as well as deployed to bring underrepresented communities, such as rural populations, into the policymaking context. Here are policy ideas that work through what public engagement should look like in practice:

The Federal Government Should Pilot a Decision Subject Representative Program for AI Systems Inspired by the FDA by Anna Lenhart
Empowering Communities through Community Benefit Agreements in AI-Fueled Data Center Development by Liza Paudel
FairCare Verification Offers a Human-Centered Path for AI in Medicaid by Y. Tony Yang
How State Leaders Can Put People First in AI Decision-Making by Nicole Ozer and Brady Hirsch
Making Rural Communities Visible in Artificial Intelligence Through Rural Proofing in Kansas and Beyond by Ziwei Qi, Tatiana Lin, and Ayokunle Olagoke

Sector-Specific Interventions

AI does not operate in a vacuum, and neither should its governance. Each sector presents distinct risks, regulatory landscapes, and implementation challenges that must be accounted for in policy design. In K-12 education, procurement processes must consider privacy measures for underage individuals, surveillance, and the outsourcing of pedagogical judgment. Systems used to draft police reports risk introducing unverified or fabricated information into official records, underscoring the need for defined standards on use, human oversight, and disclosure. In labor markets, AI systems are reshaping wages, working conditions, job protections, and income stability. Existing sectoral protections will need to address algorithmic management, including requirements for transparency, notification, and avenues for contesting automated decisions. Healthcare has long been a contested space for automated decision-making systems because of its direct impact on access to care and quality of life. Here are policy proposals spanning healthcare, education, labor, and law enforcement:

FairCare Verification Offers a Human-Centered Path for AI in Medicaid by Y. Tony Yang
Move Algorithmic-Driven Pay and Scheduling Systems From Surveillance Pay to Fair Wages by Wilneida Negrón
How to Safely Bring AI into Law Enforcement Through AI-Generated Police Reports by Jon M. Peha
Making Rural Communities Visible in Artificial Intelligence Through Rural Proofing in Kansas and Beyond by Ziwei Qi, Tatiana Lin, and Ayokunle Olagoke
Prioritize Student Safety in K-12 Education By Establishing AI Procurement Guardrails by J.B. Branch

Redress and Remedies

Although much of AI governance focuses on preventing harm, no system of safeguards will be perfect. This raises a critical question: what does redress look like when harms do occur? Existing approaches to recourse often fall short when applied in the real world. Incentives do not always align with accountability, and agencies or vendors may face legal, financial, or operational constraints that limit the availability or effectiveness of recourse mechanisms. These gaps point to the need to think more expansively about redress; we need to encode the individual right to contest decisions and embed it within a broader system of accountability. What institutional structures are needed to support meaningful recourse? And what forms of remedy, whether procedural, financial, or community-based, are appropriate? Here are two ideas that make harms structurally correctable:

Settlement Wins Against Big Tech Should Underwrite Digital Resilience Funds by Gaurav Laroia and Charlotte Slaiman
Empowering Communities through Community Benefit Agreements in AI-Fueled Data Center Development by Liza Paudel

From Ideas to Action

Our SOURCE CODE: AI Trust and Fairness Policy Sprint aims to advance the detailed policy solutions needed to foster public trust and implement fairness in the adoption of AI across diverse domains, from healthcare and government benefits to rural access, education, and worker protections.

We hope readers will engage deeply with these proposals, help bring them into practice, and build on them, developing new ideas that push this work even further. These ten proposals are not comprehensive, nor do they capture the full landscape of challenges that AI governance must address, from market concentration and labor displacement to infrastructure impacts and frontier-model risks. Rather, they are intended as an actionable starting point, an effort to illustrate what a detailed, implementable policy can look like.

The next step is to test these ideas in practice, learn from their successes and shortcomings, and translate those lessons into stronger governance frameworks. We hope this work serves as a foundation for a growing coalition of policymakers, practitioners, and communities committed to building AI systems that are fair, trustworthy, and accountable.

Empowering Communities through Community Benefit Agreements in AI-Fueled Data Center Development

The United States is experiencing an unprecedented surge in data center construction driven by AI infrastructure demand. Over 5,000 facilities are operating today, with investments of $400 billion in 2025 and an estimated $1.8 trillion between 2024 and 2030. This capital is arriving faster than environmental review processes, utility planning cycles, and community engagement frameworks were designed to accommodate. The consequences for communities are serious and well-documented: rising electricity bills, massive water consumption, e-waste, noise and light pollution, and billions in tax subsidies to some of the world’s most profitable corporations, often without meaningful public disclosure. These harms do not fall evenly, with communities of color and low-income neighborhoods already carrying disproportionate burdens.

Community Benefit Agreements (CBAs) are a legally binding, enforceable tool that allows communities to secure real commitments from data center developers before development proceeds. When properly structured — with specific numeric targets, secured financial obligations, independent monitoring, and meaningful enforcement — CBAs transform data center deals into durable community partnerships. Drawing on practitioner expertise from dozens of negotiations across sectors, emerging AI data center agreements, and new research on community harm and regulatory gaps, this memo makes the case for CBAs and provides a practical policy playbook for using them effectively, including potential provisions and considerations like enforceable harm mitigations, meaningful community investment, and lasting accountability mechanisms, to surface broad community needs while remaining adaptable to local contexts.

Challenge and Opportunity

Harms to Communities from Rapid Expansion of AI Infrastructure

U.S. data centers consumed 183 TWh of electricity in 2024 – more than 4% of total national consumption and roughly equivalent to the annual electricity demand of Pakistan – and that figure is only projected to grow, to roughly 17% more by 2030. A typical AI-focused hyperscale facility consumes as much electricity as 100,000 households; the largest under construction are expected to use 20 times as much. The scale is such that AI data center demand in Virginia alone contributed to an 833% increase in regional capacity market auction prices (what electricity utilities and grid operators pay to ensure there will be enough power generation available during peak demand periods) for 2025–2026. These pressures not only translate directly into costs for ordinary ratepayers; because they are structural costs baked into the grid, they also make it harder for communities to see, contest, or hold anyone accountable for the surge. Electricity prices in some data center-heavy regions have surged over 250% in five years, with estimates predicting data center electricity demand could double or even triple by 2028.

The scale of harm to nearby communities extends beyond electricity prices to increased water usage, e-waste, air and noise pollution, and adverse health effects. A single large data center can use up to 5 million gallons of water a day (with about a quarter of the usage from direct cooling), equivalent to the consumption of a city of 50,000 people. Additionally, hardware disposal is projected to generate 1.2–5 million metric tons of e-waste from generative AI alone between 2020 and 2030. Diesel backup generators, used at almost every facility, emit particulate matter classified by the EPA as a likely human carcinogen. They also emit 200–600 times more harmful nitrogen oxides than natural gas plants per unit of electricity produced. Researchers estimate that data center backup generators in Virginia, operating at just 10% of permitted levels, could already cause 14,000 asthma symptom cases and 13–19 deaths annually, with public health costs of $220–$300 million per year spreading across multiple states, and with communities of color, low-income communities, and rural communities paying the bulk of that price.

But perhaps the most underappreciated community harm from the data center boom is fiscal: the extraordinary scale of tax subsidies that state and local governments have extended to some of the world’s most profitable companies, frequently without meaningful public disclosure or community input. Good Jobs First, which tracks corporate subsidies nationally, found that in 10 of the 20 states disclosing data center subsidy costs, programs cost over $100 million per year. The opacity of these arrangements is striking: of 36 states with data center subsidy programs, only 11 publicly disclose which companies receive benefits. Virginia, the world’s largest data center market, forgoes nearly $1 billion annually in state and local revenue without telling the public which companies receive the money or how much each receives. Moreover, data centers, once fully built and operational, employ on average only 157 permanent workers – an extraordinarily low jobs return on billions in public subsidy, averaging $1.4 million to $2.1 million in subsidies per permanent job. Additionally, companies frequently hide behind non-disclosure agreements (NDAs) to avoid public input and scrutiny, especially on critical details about energy use, water consumption, and sometimes even the identity of the data center operator.

Centering Community Needs in AI Infrastructure Development

As data centers have proliferated and these harms have begun to be documented, the backlash against new developments has grown. Data Center Watch, which tracks grassroots opposition to large-scale projects across 28 U.S. states, found that between May 2024 and March 2025, $64 billion worth of data center projects were blocked or delayed by local opposition. In Q2 2025 alone, more project disruptions occurred than in the previous two years combined. Opposition is bipartisan and geographically broad. Recent nationwide polling found that 70% of Americans oppose data center construction nearby, with nearly half “strongly” opposed – a far lower acceptance rate than for gas plants, wind farms, or nuclear facilities.

This issue is an urgent priority now because while public concern over rising energy rates, water usage, and unchecked development is growing, no comprehensive mechanism currently exists to align the interests of communities, developers, and local governments.

As AI companies promise large-scale and incredible societal benefits from AI, they can show they are serious by making sure the data centers they are building to power the AI future benefit the communities they’re in.

Why Community Benefit Agreements?

CBAs are legally binding agreements, negotiated between developers and community stakeholders, that secure enforceable commitments before development proceeds. Adapted from their successful use in bank merger oversight (under the Community Reinvestment Act) and clean energy project approvals, CBAs can:

Establish environmental monitoring and reporting requirements more stringent than applicable permits.
Secure financial contributions to community investment funds, backed by letters of credit that allow enforcement without costly litigation.
Lock in local hiring commitments with specific numeric targets and apprenticeship pipelines.
Create Community Advisory Boards with real authority and ongoing oversight throughout the life of the project.
Make transparent what would otherwise remain hidden: water consumption, energy use, tax benefits, and environmental commitments.

In the absence of broader legislative and regulatory protections, CBAs offer a promising, underutilized, and legally binding tool to ensure adequate harm mitigation and to give communities the potential to share in the opportunities, and not just the costs, of AI infrastructure. They have the additional benefit that they can be tailored specifically to a community’s needs.

For instance, in late 2025, the city of Lancaster negotiated a legally binding CBA with the developers of the Lancaster AI Hub before construction was finalized, securing $20 million in community contributions. Key wins include a hard cap of 20,000 gallons per day of municipal water use per campus, a 100% clean energy requirement backed by tiered financial penalties of up to $10 million per building, strict noise limits tied to pre-construction ambient levels, and full public records transparency.

The agreement also commits developers to a local hiring plan, free first-responder training, and ongoing community engagement, demonstrating that municipalities can extract meaningful, enforceable protections from data center developers when they engage before key approvals are locked in. Of note, the city negotiated the CBA in this case, but communities themselves can win the same provisions in a legally binding CBA by working with community leaders, community-based organizations, and local policymakers, with enforcement mechanisms built in for effectiveness.

Importantly, CBAs do not require communities to support a project. They are negotiated exchanges. If a developer will not make commitments adequate to the community’s concerns, opposition — including calls for moratoriums — remains a legitimate and more appropriate response. The credibility of that alternative is precisely what gives CBA negotiations their teeth.

Because policymaking, legislation, and other broader reforms can take time, CBAs can be a particularly useful interim governance mechanism to meet the urgency of this moment.

Why now?

Hyperscalers are urgently racing to secure sites, power contracts, and permits to meet AI demand. Because time to power is crucial for data center companies, communities and municipalities have genuine leverage right now, alongside the need, urgency, and tools and resources to engage. Data center developments face political opposition that is delaying billions of dollars in projects. They need community support, or at minimum community acquiescence, to move through permitting processes that require public hearings, board votes, and environmental reviews.

Projected and current investments run into the billions of dollars, their effects on communities are already being felt with more to come, and slower-moving broader reforms are not yet in place. CBAs are therefore not just a useful interim governance tool that can fill an urgent need; this is also the moment of maximum policy leverage.

Plan of Action

States should not rely on voluntary developer promises. They should create a statutory and regulatory framework that makes robust CBAs a condition for approval or subsidy in high-impact data center projects.

We recommend that CBAs be used, under defined conditions, as a policy tool for facilitation and solutions-building that meets the objectives of all three parties: communities, developers, and local governments. Local policymakers should treat CBAs as a lever that enables communities to provide direct input, occupy an established space to negotiate impacts and mitigations, and secure reinvestment in ways that benefit the community.

Local governments can require CBAs (working alongside community-based organizations and other community leaders) if developers apply for permits, zoning, or other approvals to build out data centers – such that planning departments, zoning boards, or city councils can condition approval on compliance and can then impose penalties, delay permits, or revoke approvals if terms aren’t met.

The following recommendations highlight specific provisions that policymakers at the local governmental level (like the City of Lancaster for the Lancaster data center CBA) and community-based organizations advocating and negotiating on behalf of communities can use to protect communities from harm and establish fairness, transparency, and accountability in the data center development process. As others such as the Brookings Institution and the National Association for the Advancement of Colored People (NAACP) have outlined and advocated in detail, these provisions represent emerging best practices at this juncture. Key provisions and their priority ratings are also summarized in Summary Table 1 at the end of this proposal.

Recommendation 1. Policymakers (and CBOs and community leaders negotiating on behalf of communities) should utilize specific provisions to address harms and provide mitigations, to increase transparency, and to steward ongoing governance and accountability.

Harm Remediation

Prohibit cost-shifting of energy rates to ratepayers. The impacts of a project’s energy demand on electricity affordability, grid infrastructure, and ratepayers are among the harms felt most directly by communities. CBAs should include measures to prevent or offset disproportionate burdens on residential customers and frontline communities, such as requiring developers to front the costs of any infrastructure upgrades and interconnection, or creating a new rate class for data centers (as in Oregon or Virginia).

Require developers to go beyond regulatory compliance on environmental protections. The Lancaster CBA requires selective catalytic reduction on generators. In California, mitigations required or negotiated under the California Environmental Quality Act (CEQA) have included fence-line monitoring, health risk assessments, and restrictions more stringent than state permits. Every CBA should include independent real-time air monitoring with publicly available data, a community health fund financed by the developer, and diesel emission standards that go beyond what permits require.

Require prioritization and usage of clean energy. The Lancaster CBA, for instance, requires 100% clean sourcing, with tiered penalties of $2.5M–$10M per building backed by a $10M Letter of Credit, and penalty proceeds directed to a Sustainable Development and Clean Energy Fund. CBAs should add third-party verification of Renewable Energy Certificates (RECs) and prohibit characterizing REC purchases as equivalent to direct clean energy generation without explicit disclosure. Where full clean energy sourcing is not immediately achievable, the agreement should set ratcheting milestones that raise the clean energy requirement over time.

Set a hard numeric cap on water usage with public reporting. Given the documented conflicts over water in drought-prone regions, water provisions are increasingly among the most contentious and most important elements of data center CBAs. The Lancaster CBA’s 20,000-gallon-per-day municipal water cap per campus, combined with closed-loop cooling requirements, is a strong model. CBAs should add quarterly public consumption reporting and a renegotiation trigger if operations expand beyond the scope contemplated at execution.
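As a rough illustration of how a daily cap and quarterly public reporting could be checked in practice, the Python sketch below groups metered readings by calendar quarter and flags days above the cap. Only the 20,000-gallon figure comes from the Lancaster example; the `DailyReading` record, the `quarterly_report` function, and the sample readings are hypothetical and would need to be adapted to whatever metering data a given agreement actually requires.

```python
from dataclasses import dataclass
from datetime import date

# Cap from the Lancaster example: gallons of municipal water per day, per campus.
DAILY_CAP_GALLONS = 20_000


@dataclass(frozen=True)
class DailyReading:
    """One metered day of municipal water use (hypothetical record layout)."""
    day: date
    gallons: float


def quarter_of(day: date) -> str:
    """Return a label such as '2026-Q1' for the calendar quarter containing `day`."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def quarterly_report(readings, cap=DAILY_CAP_GALLONS):
    """Summarise readings per quarter: total use, peak day, and days over the cap."""
    report = {}
    for r in readings:
        q = report.setdefault(
            quarter_of(r.day),
            {"total_gallons": 0.0, "peak_gallons": 0.0, "days_over_cap": []},
        )
        q["total_gallons"] += r.gallons
        q["peak_gallons"] = max(q["peak_gallons"], r.gallons)
        if r.gallons > cap:
            q["days_over_cap"].append(r.day)
    return report


# Made-up readings, for illustration only.
sample = [
    DailyReading(date(2026, 1, 5), 18_500),
    DailyReading(date(2026, 1, 6), 21_200),  # exceeds the cap
    DailyReading(date(2026, 4, 1), 19_900),
]
report = quarterly_report(sample)
for quarter, summary in sorted(report.items()):
    print(quarter, summary)
```

A public dashboard could publish the per-quarter summary directly, with any non-empty `days_over_cap` list serving as the trigger for whatever penalty or review process the agreement defines.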

Transparency, Governance & Accountability

Mandate public dashboards with ongoing reporting. These should include water usage, energy usage, as well as pollution metrics like the amount of time spent on backup diesel generators or noise decibels.
Require full public disclosure of all tax incentives, Payments in Lieu of Taxes (PILOTs), and government subsidies received by the developer. Given that 25 of 36 states with data center subsidy programs do not disclose recipients, communities must insist on transparency in the CBA itself.
Conduct impact assessments, including equity impact assessments.
Create a Board with real enforcement authority. Every CBA needs a Community Advisory Board (CAB) with seats for environmental justice representatives and community residents (not just officials). The CAB should have the authority to commission independent audits, impose defined financial penalties for violations, and seek injunctive relief directly, and it should serve as the responsible entity for the community fund.
Make enforcement penalties for violations clear and escalating. Community negotiators should insist on specific, escalating financial penalties for violations — not vague remediation language — with enforcement authority vested in the CAB.
Include sunset and renegotiation triggers. Include mandatory renegotiation at five-year intervals or upon material changes in facility scope, ownership, or energy consumption. There should also be clear processes for any potential decommissioning and long-term liability, to avoid stranded assets and local residents left footing the bill. These could include decommissioning bonds (tied to facility footprint or power draw) posted at execution, a funded remediation escrow, and a specific site restoration timeline.

Recommendation 2. Policymakers and CBOs negotiating on behalf of communities should require investment in communities as a baseline condition for any equitable agreement.

Beneath the gold rush of data centers and AI lie real places, real people, and real resources being quietly consumed in service of extraordinary profits. The companies cashing in are among the wealthiest in history – and that wealth is being built, quite literally, on the foundations of local communities: their land, their water, their power grids, their roads, their first responders, and their environment. The economic rewards generated need to reflect that. Communities supplying these resources and shouldering the associated burdens cannot be sidelined as the immense profits generated flow elsewhere.

Aside from harm remediation, a CBA and its associated preparation and processes can serve as a platform to uncover, understand, and elevate broad community needs. Agreements should include provisions that specifically address these needs, moving towards a more balanced and equitable distribution of the costs and benefits of AI development, given the wide ramifications of data center developments in host communities.

Establish a Community Fund: CBA community funds can support locally determined priorities such as broadband access, AI and digital literacy programs, just transition pathways with apprenticeships and training, healthcare, and quality-of-life upgrades like parks and art, ensuring that the wealth generated by AI infrastructure is reinvested in the communities hosting it. They can also be used to offset any ratepayer costs of infrastructure upgrades that fall on parties other than the data center developers. Critically, nondisclosure agreements (NDAs) on government incentive terms must be prohibited, ensuring that subsidy arrangements are publicly accessible and communities can assess whether tax concessions are being offset by CBA commitments.
Set Numeric Workforce Targets and Prohibit Misclassification: Workforce provisions should include specific local hiring targets – typically 30–50% of construction labor hours from defined geographies – written into the CBA itself rather than deferred to post-execution plans. They should also include an explicit clause prohibiting worker misclassification under the Fair Labor Standards Act (FLSA). Because operational data centers average only 157 permanent employees, workforce provisions should focus primarily on the construction phase, while leveraging the developer’s long-term presence to fund broader workforce training initiatives, including AI just transition opportunities, in the community.

Secure Financial Commitments with Letters of Credit: Payments should be secured by a Letter of Credit or corporate guarantee from a sufficiently capitalized entity, with payment triggers tied to specific construction and operational milestones. For example, Lancaster commits $20M total, secured by a $20M Letter of Credit or corporate guarantee from a $100M+ net-worth entity, with payments triggered at construction financing and operations commencement per building.
Explore Diverse Community Wealth-Sharing Mechanisms: Beyond direct cash funds, CBAs can incorporate a range of wealth-sharing tools such as community land trusts, local equity stakes in the facility, revenue-sharing agreements tied to facility profits, or dedicated funds for affordable housing and small business development – ensuring communities build lasting economic power rather than receiving one-time payments.
Address AI-Specific Infrastructure Concerns: Although not as common yet, CBAs can also consider specific provisions addressing AI operations, data sourcing practices, and the risks of long-term infrastructure lock-in associated with AI systems.
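To make the numeric workforce target concrete, the Python sketch below computes the share of construction labor hours worked by residents of a defined geography and compares it with a target. It is a minimal illustration under stated assumptions: the function names, the use of zip codes to define the local geography, and every figure in the sample data are hypothetical, and the 30% default is simply the low end of the 30–50% range mentioned above.

```python
def local_hire_share(hours_by_zip, local_zips):
    """Fraction of total labor hours worked by residents of the local zip codes."""
    total_hours = sum(hours_by_zip.values())
    if total_hours == 0:
        return 0.0  # no hours reported yet, so no share can be credited
    local_hours = sum(h for z, h in hours_by_zip.items() if z in local_zips)
    return local_hours / total_hours


def meets_target(hours_by_zip, local_zips, target=0.30):
    """True when the local share of labor hours reaches the agreed target."""
    return local_hire_share(hours_by_zip, local_zips) >= target


# Placeholder zip codes and made-up hours, for illustration only.
reported_hours = {"00001": 1200.0, "00002": 800.0, "99999": 3000.0}
local_area = {"00001", "00002"}

share = local_hire_share(reported_hours, local_area)
print(f"Local share of labor hours: {share:.0%}")
print("Meets 30% target:", meets_target(reported_hours, local_area))
print("Meets 50% target:", meets_target(reported_hours, local_area, target=0.50))
```

Measuring hours rather than headcount follows the text’s framing of the target as a share of construction labor hours; an agreement could run the same check in each annual reporting period.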

Recommendation 3. Policymakers (and/or community negotiators) should proactively identify and put the supporting mechanisms in place for meaningful representation, negotiation, enforcement, and accountability.

The most common CBA failures are not in the provisions communities demand – they are in process and enforcement structure. When poorly structured, or negotiated after key approvals are in hand, CBAs can give the appearance of community benefit while delivering very little.

CBAs are effective only under certain conditions: investing in and strengthening community-level organizing and coalition-building, providing training and workshops on provisions and negotiations, providing thoughtful representation to prevent takeover, and building robust enforcement mechanisms so that benefits are delivered in practice. The legal history and use of CBAs in the bank merger approval process, and in California’s CEQA “Opt-In” process that requires a CBA, offer important lessons about levers, enforceability, and accountability, as well as about the negotiation and power-building process. These are listed below.

Negotiate Early. Treat CBA execution as a precondition of permitting support, because negotiating leverage is greatest before approvals are granted. Work with local government officials to make clear to developers that permitting support is conditioned on a satisfactory CBA. The Lancaster CBA was negotiated after zoning opinions had been issued and demolition had begun, and its gaps (no specific hiring targets, no independent community board, no air monitoring) directly reflect that reduced leverage.

Build a United Coalition. Organize internally before engaging the developer, presenting a united front through a Community Advisory Board and a mediator if necessary – the coalition should exclude both intractable opponents and members prepared to support the project without a CBA.
Establish Ground Rules First. Before negotiating specifics, use a memorandum of understanding (MOU) to set the terms for timeline, information-sharing, representation, and dispute resolution. This also prevents developers from selectively engaging sympathetic stakeholders while sidelining community members most directly affected.
Secure Legal and Technical Representation, Ideally Under a Cost-Recovery Agreement with the Developer. Hire legal counsel with energy and environmental expertise and a technical expert to interpret site assessments, emissions modeling, and energy projections – unrepresented communities are structurally disadvantaged at the negotiating table. Negotiate a cost-recovery agreement requiring the developer to pay for community-selected legal counsel and technical experts, a practice well-established in permitting and utility interconnections that should become standard in CBA negotiations.

Require Developer-Funded Community Review. Ask the developer to fund community technical review – a precedent well-established in CEQA practice. This can be coupled with a due diligence phase in the negotiation, in which documentation is provided to the community coalition for review and recommendations.
Demand Numeric Targets, Not Aspirational Language. Replace “good faith efforts” and vague commitments with specific, measurable targets subject to annual reporting and financial penalties for non-compliance, as bank merger advocates successfully did with dollar-denominated, geo-specific lending commitments.
Prohibit NDAs on Environmental and Financial Data. Do not allow nondisclosure agreements on monitoring data, permits, consumption reports, or government incentive terms – the notorious Memphis xAI case, in which more than 30 unpermitted turbines operated in secret with health and environmental consequences for the community, illustrates the consequences of unchecked secrecy. Lancaster’s CBA also correctly designates the CBA as a public record under Pennsylvania’s Right-to-Know Law.
Negotiate CBAs and PILOT Agreements Together. CBA and payment-in-lieu-of-tax agreements (PILOT) must be negotiated in tandem with a cap on total payments, ensuring community investment funds supplement, and do not substitute for, any expected tax revenue.
Specify Any Fund Governance in the Agreement. Ambiguous collective fund governance renders financial commitments meaningless – specify committee composition, voting rules, permitted uses, and annual reporting directly in the CBA.
Frame Agreements Around Impact Mitigation, Not Approval. Require developers to first identify community concerns and propose mitigation before discussing payments. Framing money as the price of approval produces smaller commitments and less community ownership of outcomes.
Know When CBAs Are Not the Right Tool. CBAs cannot substitute for strong environmental permitting, transparent subsidy disclosure, or robust utility regulation, and should not be pursued when permits are already in place, transparency has been denied, or a developer-backed document is being falsely presented as a community agreement. There are plenty of situations where opposition or moratorium might be more appropriate. Know the limitations of CBAs – their scope is limited to what the contracting parties agree to, and their enforceability depends on clear terms, specific metrics, secured financial obligations, and parties with the legal standing and resources to enforce them.

Conclusion

The extraordinary wealth generated by the AI data center boom is being built on community land, water, electricity, and environmental capacity. Yet, the communities bearing these burdens are seeing little of the benefit. The hyperscalers behind this buildout are among the most valuable companies in human history, and the AI services running on this infrastructure will generate billions in revenue. None of this wealth is created in a vacuum: it is created in specific places, using specific community resources, and the communities providing those resources deserve a meaningful share of the value they help create.
The current pattern in which vulnerable communities absorb the largest burdens, profitable companies receive the largest subsidies, and benefits flow primarily to shareholders, is neither inevitable nor acceptable. It reflects choices being made right now, as the buildout accelerates and the patterns of harm and benefit are being set. CBAs are a tool to make different choices: to insist that the communities hosting AI infrastructure share genuinely in its benefits, and that the costs of that infrastructure – to air quality, water systems, grid reliability, and community character – are borne by those who profit from it, not by those who simply happen to live nearby. The time to act is now.

Summary Table 1. Key Provisions of Data Center CBAs

Provision area	Key community protections & commitments	Priority
Environmental protections	Binding diesel generator emission limits beyond permit minimums; noise limits tied to pre-construction ambient (day and night); independent real-time air and water monitoring with public data; cumulative impact analysis for clustered facilities; proximity assessment for environmental justice communities	Critical
Clean energy	100% clean sourcing commitment; tiered financial penalties backed by Letter of Credit; third-party REC verification; prohibition on ratepayer cost pass-through for grid upgrades; annual public consumption reporting; energy ratcheting milestones where full compliance is not immediate	Critical
Water usage	Specific daily cap on municipal water use; closed-loop cooling requirement; wastewater capacity compliance; quarterly public reporting on consumption; renegotiation trigger if facility scope expands materially	High
Fiscal contributions & transparency	Dollar-specific community investment fund with milestone-triggered payments; secured by Letter of Credit or corporate guarantee; full public disclosure of all tax incentives and PILOTs; no NDAs on public finance data; fund governance (committee composition, voting rules, permitted uses) specified in the agreement	Critical
Workforce development	Local hire percentage targets for construction and operations; prevailing wage standards; apprenticeship and training pathways; targeted outreach to underserved zip codes; explicit FLSA anti-misclassification clause	High
Governance & enforcement	Community Advisory Board with independent monitoring authority and seats for community residents; escalating financial penalties; grievance mechanism with binding arbitration; right to seek injunctive relief; annual public reporting to governing body; decommissioning plan, bonding requirements, and remediation escrow; regular equity impact assessments	Critical

Priority ratings reflect the degree to which a provision is foundational to meaningful community protection. All provisions should be adapted to local context and available negotiating leverage.

Frequently Asked Questions

What are the limitations of CBAs? When are they potentially not the ideal tool?

CBAs are a powerful tool but are not a substitute for strong state and federal environmental permitting, transparent subsidy disclosure laws, or robust utility regulation protecting ratepayers. Their enforceability depends on clear terms, specific metrics, secured financial obligations, and parties with the legal standing and resources to enforce them. When permits are already in place, transparency has been denied, or a developer-backed document is being presented as a community agreement, opposition or a moratorium may be more appropriate than a CBA negotiation. However, especially as broader reforms can take time, CBAs are useful as an interim governance mechanism.

Are CBAs legally enforceable?

Yes. CBAs are legally binding contracts enforceable in court. Provisions backed by Letters of Credit can be enforced by drawing on the letter without costly litigation. Injunctive relief and specific performance are also available remedies in most jurisdictions.

Do CBAs require communities to support the project?

No. CBAs are negotiated exchanges. The community provides a path through the permitting process; the developer provides binding commitments. If commitments are inadequate, communities retain the right to oppose the project. The credibility of that option is what gives negotiations their leverage.

What if the developer won’t negotiate?

Community leverage mechanisms include direct lobbying of elected officials, media engagement, social media amplification, community organizing and protests, and formal procedural interventions such as CEQA comment periods. Coalitions should be prepared to escalate. In some cases, formal opposition or a moratorium is the appropriate response.

How are CBA funds governed?

Fund governance must be specified in the CBA itself — committee composition, voting rules, permitted uses, and annual reporting requirements. Ambiguous governance renders financial commitments meaningless in practice. The Lancaster CBA’s joint committee model is one approach; stronger versions include community representatives with independent authority and the ability to commission audits.

How does a CBA interact with tax abatement or PILOT agreements?

CBAs and payment-in-lieu-of-tax agreements must be negotiated together, with a clear understanding of total community obligations, ensuring community investment funds supplement rather than substitute for expected tax revenue. Communities should resist any framing in which CBA contributions are treated as the price for subsidies.

What are some successful examples of CBAs being used effectively?

Lancaster, PA, 2025

The City of Lancaster negotiated a legally binding CBA with the developers of the Lancaster AI Hub before construction was finalized, securing $20 million in community contributions. Key wins include a hard cap of 20,000 gallons per day of municipal water use per campus, a 100% clean energy requirement backed by tiered financial penalties of up to $10 million per building, strict noise limits tied to pre-construction ambient levels, and full public records transparency. The agreement also commits developers to a local hiring plan, free first-responder training, and ongoing community engagement — demonstrating that municipalities can extract meaningful, enforceable protections from data center developers when they engage before key approvals are locked in.

Nashville MLS Soccer, Nashville, TN, 2018

A coalition called Stand Up Nashville successfully advocated for this CBA in connection with a soccer stadium development project. The CBA includes, among other things, commitments on jobs that pay a living wage, hiring priorities, affordable housing, and a childcare center. As part of this CBA, Stand Up Nashville committed to support rezoning legislation for the stadium, which was widely opposed before the CBA. Nashville’s Mayor eventually supported the stadium project in large part due to the CBA.

Facebook Campus Expansion CBA, Menlo Park, CA, 2016

This CBA, associated with an office expansion, is between Facebook and a coalition of community groups. In this agreement, Facebook made an almost $20 million commitment to affordable housing in the area, which led to an additional $60 million in other donor commitments.

What are some key references on CBAs and data centers?

Brookings: Why community benefit agreements are necessary for data centers | Brookings

NAACP: CBA Template for Data Centers

Good Jobs First: Key Reforms: Community Benefits Agreements

Kapor Foundation: The Unequal Burden of Data Centers

AI Now Institute: North Star Data Center Policy Toolkit: State and Local Policy Interventions to Stop Rampant AI Data Center Expansion

What is the typical CBA process like?

From NAACP’s CBA Guide

In practice, this can mean:
1. The initial agreement pays for legal counsel and technical support, selected by and managed by the community coalition.
2. The next phase is either: (1) an agreement to establish binding requirements for transparency, impact studies, labor standards, and equity protections, which is contained in Article 3 of the template; OR (2) a due diligence phase, which requests information provided in Article 3.
3. An amendment is negotiated after the community has access to impact information on electric, environmental, housing, and infrastructure demands, which could be an amendment specifying the exact dollar amounts and project-specific mitigation measures.

This approach allows communities to understand the scale and type of impacts before finalizing the financial structure of the Community Benefits Agreement, while maintaining leverage and ensuring that non-opposition is tied to a complete, enforceable package of commitments.

From PolicyLink CBA Toolkit:

Unless developers face significant public pressure and/or legal leverage that jeopardizes public approval, developers are unlikely to compromise. A coalition may exert leverage to bring the developer to the table in a variety of ways: direct lobbying of elected officials and city staff, notifying any reporters covering the issue that the community has significant concerns, using social media to amplify the community’s voice and raise support, protests at the worksite or at City Hall, or artist-led community responses, like chalk art at the site or near City Hall.

Stakeholders & Roles:

A community coalition can include stakeholders such as individual residents, neighborhood councils, faith groups, local non-profits, local businesses, PTAs, and housing advocates. City administration staff and elected leaders can demonstrate inclusive leadership by (1) providing transparency around the project; (2) insisting on broad community support for project approval; and (3) encouraging CBA negotiations, without trying to influence them. Two to four coalition representatives should contact the elected officials (or city council staff) most involved in the proposed project and brief them on the coalition, its priorities, and any engagement it has had or plans to have with the developer. The coalition representatives should ask that the officials condition a vote in favor of the project upon the developer’s support for the coalition’s priorities.

Elected officials can be important allies in a CBA negotiation because they can persuade their colleagues on council to delay a vote on the project to allow more time for the coalition to negotiate with the developer. They can also apply pressure on the developer to reach an agreement with the coalition. The coalition should assess whether it can count on commitments of support from a majority of the committee and/or council members. Particularly if a coalition is new, support from key elected officials will help bring developers to the table. It may be necessary to take legal action against objectionable aspects of the development to inspire a willingness to negotiate.

Settlement Wins Against Big Tech Should Underwrite Digital Resilience Funds

Historically large penalties have been insufficient to create durable and effective deterrence against corporate wrongdoing. The search for a better approach has bedeviled regulatory enforcers, legislators, attorneys general, and the judiciary. This challenge has been especially acute as enforcers have attempted to rein in the worst violations of the largest technology companies as we transition from the social media era to the AI era. These companies’ scale and market power allow them to absorb even historic penalties as the cost of doing business, blunting the effectiveness of civil litigation and regulatory fines.

The stakes for more effective deterrence and a more robust remedies toolkit are rapidly compounding. Many emerging AI-related harms, including AI-induced psychosis, maladapted socialization, deepfake-driven bullying and harassment, suicide coaching, and declines in children’s literacy, bear the hallmarks of a public health crisis or environmental disaster rather than just discrete consumer injuries. The scale of these externalities invites greater prosecutorial and regulatory scrutiny but also demands a more creative enforcement playbook. When historic fines against these companies and their predecessors disappear into general treasuries, those funds remain largely inert instead of helping the public defend itself.

Injunctive relief and headline fines are important enforcement mechanisms but if enforcement is to reach its deterrent potential and protect the public in the advanced algorithmic era, we must recognize that penalizing corporate misconduct is only half the battle. By allocating funds from tech settlements to investments in broad-based consumer education, digital literacy, independent researchers, or new enforcement and investigatory infrastructure, state attorneys general and the judiciary can transform these otherwise inert dollars into a sustained and active defense against digital harms.

Challenge and Opportunity

The Federal Trade Commission’s historic $5 billion settlement with Facebook in 2019 is perhaps the clearest example of a broken enforcement model. At the time of its announcement, the penalty was the largest ever imposed by the FTC on a company for violating consumer privacy. Even as a majority of the Commission approved the settlement, Commissioners Rebecca Slaughter and Rohit Chopra warned in their dissents that the penalty was unlikely to meaningfully deter the company or the broader market. They were right. The settlement imposed some compliance obligations, but none challenged the company’s underlying business model of aggressive data harvesting. The company’s stock price rose after the announcement. Within a few years the FTC sought to reopen its privacy orders against Meta over subsequent alleged privacy violations, illustrating the failure of the penalty to sustainably alter corporate behavior.

The Facebook settlement was hopefully the high watermark of a certain kind of enforcement paradigm. Fines should be larger. Behavioral and structural remedies should be stronger and imposed more often. Vital work has been done to turn that page and institute meaningful controls on data abuses and exploitative design. But, as we continue to use fines and penalties we have to confront a limitation in the enforcement model. When those dollars disappear into state or federal treasuries they do little to address systemic technological disruption. To protect the public, enforcers can put settlement dollars to work. We need to invest directly in the public so our society is prepared to handle this wave of technological disruption.

Inert Fines, Federal Constraints, and State Action

What if the $5 billion Facebook settlement had been put to better use?

Imagine if even a portion of those funds had supported a sustained nationwide consumer education effort on digital literacy and the harms of social media use. A fraction of that money could have supported public education campaigns teaching about manipulative design practices and how we can take our autonomy back. The fine itself only punished the company’s past conduct on the supply side; investing that money in public education could have helped shift the demand side, changing the user behavior in the market that made these products profitable.

Instead, as with most federal settlements, the law required that money to flow from Facebook directly into the federal treasury. Federal enforcers have limited ability to direct those funds towards targeted public education or resilience efforts (with the notable exception of the Consumer Financial Protection Bureau (CFPB), which is allowed to direct civil penalties to a special consumer education fund).

While federal regulators like the Federal Trade Commission and the Department of Justice have obtained some landmark penalties, state attorneys general have increasingly become the primary defenders of Americans’ digital rights. In recent years the states have secured billions of dollars through aggressive enforcement: a $700 million settlement over Google’s app store practices; $391.5 million in a multistate effort over deceptive location tracking; and $1.4 billion from Meta for “using facial recognition without users’ permission”. The list goes on.

Crucially, state attorneys general have different constraints on how civil penalty funds may be used. Many states have their own Unfair or Deceptive Acts or Practices (UDAP) statutes, in addition to a variety of consumer protection laws. Under common law practice and state statutes many AGs have more leeway in directing their settlement funds to organizations and causes “consistent with the objectives and purposes of the underlying cause of action”. Through multistate settlements, AGs have repeatedly demonstrated their ability to coordinate enforcement and reshape industry practices on a national scale.

But, it’s fair to ask: how much will even a $1.4 billion payout change a company’s underlying market incentives and consumer behavior? What might that $1.4 billion accomplish if even a portion were invested in changing consumer behavior through consumer education, digital literacy, independent research, and resilience building?

Crafting forward-looking structural and behavioral remedies in a fast-changing industry is important and difficult. It was only in the past few years, decades into the internet era, that the Federal Trade Commission embraced data minimization and algorithmic deletion as the appropriate remedies for data abuses. Finding the appropriate remedies for AI-related harms is crucial work, but it will take time. If we put the dollars to work, negotiated settlements can help build deterrence and prevention right now.

Successful Lawsuits Against Defective Algorithms, Addictive Product Design

Recent state-level actions prove we can change the enforcement paradigm. Two recent cases target the root of these digital harms: defective algorithms and addictive product design. By framing these platforms as defective products engineered to exploit children, enforcers have bypassed the traditional tech liability shield. This breakthrough could open the floodgates for systemic accountability.

In March, a California jury awarded a 20-year-old plaintiff $6 million after finding that Meta and YouTube negligently designed their platforms and caused severe mental health crises under a theory of defective products liability. In the same month a New Mexico jury levied a historic $375 million penalty against Meta for violating the state’s Unfair Practices Act by misleading parents about the safety of its products, thereby enabling child exploitation.

These verdicts could be bellwethers for a wave of impending litigation and settlements. Currently, a historic and “sprawling” set of consolidated lawsuits, known as Multidistrict Litigation (MDL) 3047, is proceeding in federal court. This litigation includes 41 attorneys general, hundreds of school districts, and thousands of individual personal injury suits, all consolidated and contesting the “‘unreasonably dangerous’ design of social media platforms.”

These cases, and others, mean billions of dollars may soon be changing hands. The critical policy question for state enforcers is whether those funds, after class members and direct victims are made whole through restitution, will disappear into general treasuries or be used to address the real problems at hand.

Plan of Action

Putting Settlement Dollars to Work

We propose putting settlement dollars to work through the creation of a Digital Resilience Fund. This is not a radical departure from current enforcement norms. Rather, it’s a call to accelerate adoption of a model that has been successfully deployed across other industries, as seen in Table 1. One of those examples, the Truth Initiative, directly informs this proposal.

Table 1. Settlement Funds, Examples from Finance to Public Health

The Tobacco Master Settlement Agreement (1998) involved the combined effort of 52 states and territories to settle state lawsuits recovering billions in medical expenses. In doing so it also set up The Truth Initiative. The initiative was a culture-shift media campaign to change the narrative around smoking to help break the cycle of addiction on the demand side.

The National Mortgage Settlement (2012) was a combined effort of 49 state AGs, plus DC and the Federal Government against mortgage servicers for automatically signing foreclosure documents without verifying if the underlying information was correct in violation of the law. The settlement didn’t just penalize banks. It directed billions towards foreclosure prevention and housing counseling.

The Volkswagen Emissions Settlement (2016) was a federal and state settlement that resolved claims that the car maker installed illegal devices to cheat on emissions tests. The settlement required VW to invest $2.7 billion into an independent environmental mitigation trust. All 50 states, DC, Puerto Rico, and tribal governments were beneficiaries. It paid for the replacement of old diesel engines and clean school buses, as well as establishing the Electrify America EV charging network.

The National Opioid Settlement Agreement (2021) resolved thousands of lawsuits in the wake of the opioid epidemic. State AGs, in addition to regular injunctive relief, mandated that companies pay into abatement funds that must be spent on remediation and prevention of opioid-based harms.

Local Community Benefits Fund (2026) demonstrates that the abatement and harm mitigation model isn’t just reserved for massive national cases. Localities have used this too. In 2026 the Bay Area Air Quality Management District launched the Local Community Benefits Fund (Bay REPAIR program) to direct monetary penalties from air quality violations back into community projects to improve public health.

State-Directed Charitable Restitution: The abatement model is already used by some state AGs. The New York Attorney General’s Office, for example, routinely directs settlement funds from corporate misconduct cases directly into community organizations and public grants. From distributing millions in environmental settlements to local botanical gardens, to directing the remnants of dissolved and mismanaged nonprofits to community charities, some state enforcers have already used this framework to redirect recovered funds towards the public interest.

In each of the cases in Table 1 enforcers recognized that penalties alone weren’t sufficient to ameliorate the harm from the underlying legal violations. Settlement dollars disappearing into general treasuries would have been a disservice. As victim advocates frequently note, one of the most profound ways to honor victims is by preventing others from becoming victims, whether the threat is from opioids, unlawful financial practices, smoking, or pollution.

Further, the time to act is now. The public is already convinced there’s a problem with AI accountability. Recent Gallup polling reveals a fascinating paradox regarding the next generation of consumers. While over half of Generation Z uses generative AI on a weekly basis, their optimism about the technology is plummeting. They are increasingly anxious about the technology’s impact, with majorities expressing fear that AI will come at a high cost to human creativity, critical thinking, and learning. The public can feel the ground shifting, but lacks the tools to fight back.

Recommendation 1. Establish a Digital Resilience Fund

The previous examples of settlement agreements all exemplify an important principle: settlement dollars ought not to be inert. Directing how those dollars are spent is an additional tool for deterring corporate wrongdoing, mitigating digital and AI harms, and hardening society to deal more effectively with disruptive technology. An educated public acts as a deterrent and could help steer the market towards deploying technologies that serve, rather than exploit, their users.

State AGs alone, or in concert with each other, or with legislators, can begin redirecting a portion of major technology settlement proceeds into a fund focused on education, research, and harm mitigation related to AI. These funds could be administered through an independent nonprofit (like the Truth Initiative), through an existing public foundation structure, or through state-level grant programs (like the opioid abatement funds). The precise institutional form is less important than the principle that settlement or fine dollars tied to AI-related harms ought to be used to build society’s capacity to stop or withstand those harms.

What a Digital Resilience Fund Could Do

Depending on the size of the settlement and the scope of the underlying harm enforcers and lawmakers could scale these funds across a range of initiatives:

Fund Counter-Marketing and Awareness Campaigns: A fund could drastically modernize consumer education around technology. The Truth Initiative showed that a well funded and sophisticated campaign can change behavior. To compete against billion-dollar engagement engines, we need compelling communications that resonate with the public. We envision a “touch grass” message delivered with cultural fluency on the social platforms where harms occur – meeting the moment and helping people make different, more informed, choices.
Support Independent Research and Monitoring: Money for research means we’ll be equipped with a better understanding of how these tools affect behavior and mental well-being. Researchers could also identify evidence-based interventions that help save lives. The research could then be translated into public education materials and materials for evidence-based remedies, regulation, and legislation.
Support Digital Literacy and AI Education at Scale: Counter-marketing can raise awareness; digital literacy can build skills for this media era. This could mean grants to schools, libraries, or community organizations to teach students, educators, and families how AI systems can shape behavior, how to navigate a changing information environment, or how deepfakes can erode trust. As documented by the OECD, a whole-of-society approach to media literacy can be extremely effective against disinformation.
Act as a Nimble Response Mechanism: As AI tools become more autonomous and agentic, new risks will emerge faster than legislation or litigation can mitigate them. A resilience campaign could launch educator toolkits or literacy campaigns as a first step while legislative efforts and litigation strategies are ongoing.
Educate and Protect the Labor Force: AI and algorithmic harms will extend beyond social media. For settlements involving hiring software, worker surveillance, or discriminatory models, public education campaigns could also educate workers about their rights. Do professionals who work primarily on computers know that by “using AI” in their daily work, they may be training their replacements? Do they know how to communicate with their coworkers without being surveilled so they can take collective action?
Establish a Multistate Investigatory and Research Apparatus: As comprehensive federal tech regulation remains stalled by gridlock, state enforcers have become the primary defenders of consumer rights. By pooling settlement resources, a coalition of states could establish shared or parallel investigatory infrastructure. Recent initiatives like the Governor’s Public Health Alliance show that states already have the logistical framework to pool expertise and coordinate parallel responses when federal infrastructure is lacking. This would provide state regulators, AGs, and legislators the dedicated technical expertise, auditing capacity, and ongoing monitoring of the market needed to support future litigation and evidence-based regulation without waiting for federal action.

Together, these recommendations all work to influence cultural change about how our society views AI and evaluates corporate harms. Luckily, we have evidence that this kind of investment can be successful.

The Evidence for Culture Change Efficacy

It’s easy to look back at the 1980s “Just Say No” era and wonder if public education campaigns can actually do anything to change entrenched consumer behavior. But the data tells a different story. Well-funded and targeted campaigns have made a difference.

The Truth Initiative is the gold standard. Instead of dryly lecturing teenagers, the campaign exposed the manipulative marketing tactics of tobacco executives and helped cause a collapse in teen smoking, which dropped from nearly 23% in 2000 to less than 2% today. Peer-reviewed studies have shown that in just one year the campaign prevented 300,000 kids and young adults from becoming smokers.

The collapse in smoking is a generational public health accomplishment, but other interventions around the world have shown that public education works:

Finland has used a whole of society education approach for decades to combat disinformation.
The Real Cost: The FDA has used digital and social media advertising to drive declines in teen vaping.
This Girl Can: The UK tackled the gender gap in sports, inspiring over 1.6 million women to start exercising by dismantling cultural stigmas.
Time to Change & It’s Not OK: The UK and New Zealand successfully used massive awareness campaigns to measurably reduce mental health discrimination and shift community norms around family violence.
Dumb Ways to Die: Australia used a humorous digital campaign to secure a 20 percent drop in transit accidents.

When combined effectively, litigation, regulation, and education have a proven track record of changing social behavior. Protecting the public from the tech industry’s predatory business models and the next wave of AI harms is an enormous challenge, but we have the evidence that trying to build a healthier digital culture is absolutely worth the effort.

Guardrails and Guidance

To maintain public trust and prevent misuse of the funds it collects, any Digital Resilience Fund or similar initiative must operate under a narrow mandate focused on the remediation and prevention of AI-related harms and follow the best practices set forward in previous settlements. For example, the National Opioid Settlement Agreement provided a list of approved uses for funds focused solely on abatement. Some states have also instituted public dashboards to track spending of settlement dollars in a transparent way.

While many AGs already have the authority to direct these funds through settlement agreements, ultimately codifying them through state legislative frameworks may provide greater predictability and transparency for their long-term operation. Legislation may also be necessary to allow fines and penalties, not just settlements, to contribute to the fund in some states.

Conclusion

An informed public is a valuable partner in deterring corporate malfeasance.

Fines must be large enough to penalize lawbreaking, and structural and behavioral remedies must aggressively dismantle harmful corporate practices, especially with regard to the growing power of AI companies. These are the core instruments of any effective enforcement toolkit. However, if we really want to change these companies’ behavior, we have to change the market they operate in.

A well-funded digital literacy and culture campaign could fill this gap. By giving ordinary people the skills to spot deepfakes, resist manipulative algorithms, and protect their mental health, we empower them to demand safer products.

State attorneys general have an incredible opportunity to build on the historic work they have already done. From the Tobacco Master Settlement to the Opioid abatement funds, the states have proven themselves as the primary architects of massive, society saving interventions.

As algorithmic harms increasingly mirror environmental disasters and public health crises, our response must be equally systemic. The next wave of technology settlements offers a generational opportunity to look beyond the standard playbook. Rather than treating historic recoveries as a simple windfall for state treasuries, enforcers must deploy these funds to protect our communities and build a stronger foundation for our democracy.

Frequently Asked Questions

Was the creation of a public education fund raised in the Google Search antitrust case?

Yes, it was. Colorado requested the establishment of a “public education fund” as one of the remedies in the Google search monopoly case. Judge Amit Mehta declined to sign off on it noting 1) the states did not draw any connection between Google’s distribution agreements and the public’s perceptions of other search engines as a prong of the Sherman Act allegations and 2) the state’s “lack of specifics [about the potential program] is fatal”. Helpfully, this lays out a guide for future litigants to win a public education fund by drawing this connection and providing those specifics. In cases relating to consumer protection, deceptive practices, and product liability, consumer perceptions are central, so it will be even easier to demonstrate a connection to public education. The first enforcer to win such a remedy would serve as a model for others, creating a snowball effect. Note: This decision was regarding a court ordered remedy, and does not limit settlements.

Will state legislatures reject this proposal in order to prioritize other needs?

This may be a significant bureaucratic hurdle. From Texas to the District of Columbia, enforcers frequently rely on attorney’s fees, litigation support funds, and settlement recoveries to self-fund a significant percentage of their own agencies’ operations. Diverting massive tech settlements, in part or in whole, to third-party resilience funds requires AGs to recognize the value to their constituents, and even future enforcement actions, of having this additional research and an educated populace. The research function of the fund could uncover future enforcement targets. State legislative frameworks are the most durable vehicle for creating a digital resilience fund. They rebalance the need for AG offices to fund their operations and the need to build these forward-looking programs. State legislatures answer to constituents, who may be increasingly angry and increasingly organized to demand that dollars brought in from these enforcement actions go to directly addressing the causes of these harms.

Could the Federal Trade Commission Adopt this Proposal?

Not under current law. Under the Miscellaneous Receipts Act, federal settlement dollars are required to be deposited into the federal treasury. As discussed earlier, the Consumer Financial Protection Bureau has explicit authority under Dodd-Frank to use its civil penalty fund for consumer education and financial literacy. We support Congress amending the FTC Act to allow agencies like the FTC to do the same. Until then, states are the most viable actors for this model.

Face Recognition Performance, Bias, and the Limits of Technical Fixes

Christopher Gatlin was arrested for a brutal assault he didn’t commit after AI Face Recognition Technology (FRT) said he matched the suspect. He spent 17 months behind bars, and clearing his name took two years. As of March 2026, there were at least nine documented U.S. wrongful arrests tied to face recognition misidentification, mostly involving Black people. From 2012 to 2020 Rite Aid customers, disproportionately in non-white neighborhoods, were flagged by FRT as shoplifters, confronted, and sometimes expelled, and in one case an 11-year-old girl was searched, all on the basis of bad matches.

Errors made by FRT are one cause of these harms, and these systems are known to make more errors on certain populations, including Black people, women, East Asians, and older people. But the way these systems are used by humans is a key component of these errors. Christopher Gatlin was identified based on a grainy photo of a hooded, partially obscured face, which could not be expected to lead to reliable identification. Moreover, police arrested him despite a lack of corroborating evidence. Harms caused by Rite Aid were due in part to a decision to mainly deploy face recognition in disproportionately non-white communities, as well as a lack of proper user training and the use of poor quality photos.

At the same time, face recognition does provide real benefits. In controlled, cooperative settings such as unlocking phones, banking apps, or passport verification, modern systems can be highly accurate. NIST evaluations show dramatic improvement over time, with errors occurring about one time in 1,000, depending on conditions. Millions of Americans use face recognition daily for convenience and security.

In tasks involving uncontrolled settings with uncooperative subjects, however, such as identifying people from surveillance images, accuracy is much lower and more difficult to measure. Law enforcement and child-protection organizations have still used face recognition to identify suspects, locate missing children, and support trafficking investigations, but the potential for harm from inaccurate results in high-stakes settings is much greater. Furthermore, the effect of biased performance is magnified in these uncontrolled settings, in which the number of errors seems to be much greater for some subpopulations. This report focuses on the causes of this bias, its potential harms, and possible steps to reduce these harms. The use of face recognition in mass surveillance obviously raises other serious potential concerns, but these are outside the scope of this report.

Harms from FRT result both from technical errors and flaws in the ways humans use these systems. This suggests two parallel strategies for reducing the negative effects of biased face recognition. One approach is to reduce the bias in face recognition systems directly. Bias can occur due to training FRT using biased datasets that do not accurately reflect the demographics of the overall population. This can be difficult to eliminate due to the massive scale of data used to train FRT, which makes it difficult to control or even understand the demographics of the data. But further efforts can be made to reduce demographic bias in the data. Numerous other external factors that are more difficult to control may also create biased performance. Consequently, in the near term it may be practical to reduce, but not to completely eliminate biased performance.

A complementary approach to reducing harms from biased face recognition is to ensure that FRT are used appropriately by human operators. This solution is much easier to implement in the near term than the previous technical solution. It is not sufficient, however, simply to ensure there is a human in the loop confirming the results of FRT, since often FRT are more accurate than humans, their errors occur on challenging cases, and people may be unable to correct these errors. Behavioral policy interventions range from research aimed at better measuring bias and understanding when FRT results are not trustworthy to clear standards for how human operators use and interpret the results of FRT and restricting the use of FRT when potential harms outweigh the benefits.

In this report we provide an overview of face recognition performance and differential performance between different demographic groups. We summarize results from the National Institute of Standards and Technology assessing performance of numerous commercial face recognition systems. And we provide an overview of potential policies to reduce harms from face recognition bias.

Acknowledgements

Our understanding of this topic has benefitted greatly from conversations with Kevin Bowyer, Leah Frazier, Patrick Grother, Anil Jain, Brendan Klare, Alice O’Toole, Jonathan Phillips, Jay Stanley, and Nathan Wessler. We also received insightful comments and suggestions from Clara Langevin and Caroline Siegal Singh. Any failure in understanding is due to the authors.

1.1 Introduction

1.2 Face Recognition Technology Has Caused Significant Harms

1.3 Face Recognition Technology is Increasingly Widely Used

1.4 Face Recognition Difficulty Varies Significantly

1.5 What Do We Mean by Bias in Face Recognition?

1.6 Outline of the Rest of the Report

2.1 Glossary

3.1 How Face Recognition Works

3.2 A Brief History of Face Recognition

3.3 How Face Recognition Models Are Trained

4.1 Face Recognition in Use Today

5.1 Face Recognition Performance Across Different Conditions

5.2 Challenges in Real-World Face Recognition

6.1 Defining and Measuring Bias in Face Recognition

6.2 Absolute vs. Relative Error

6.3 NIST experiments on Demographic Variation

6.4 Challenges in Measuring Bias in Face Recognition

7.1 Sources of Bias in Face Recognition Systems

7.2 The Contribution of Dataset Bias

7.3 Sources of Bias Beyond the Data

7.4 Reductions in Bias Over Time

8.1 The Human Factor: Face Recognition Systems as part of a Socio-Technical System

8.2 Limitations of Human Oversight

8.3 User Errors

9.1 Policy Interventions to Address Bias in Face Recognition Systems

9.2 Research

9.3 Measure and Reduce Bias

9.4 Regulate Sociotechnical use of Face Recognition

10.1 Conclusions

Appendix: Variations in Bias Over Time

Introduction

Face Recognition Technology Has Caused Significant Harms

Improper development or use of face recognition technology (FRT) can lead to serious harms. One such example occurred in 2020 when Christopher Gatlin was arrested for a brutal assault he didn’t commit after a face recognition system proposed him as a possible match for the suspect. He spent 17 months behind bars, and clearing his name took two years. Porcha Woodruff, eight months pregnant, spent 11 hours in detention for a carjacking after another bad match, even though surveillance footage showed the suspect was not pregnant. As of March 2026, there are at least nine documented U.S. wrongful arrests tied to face recognition misidentification.

In another example of this dynamic, Rite Aid, a major pharmacy chain, deployed face recognition technology widely in stores to spot alleged serial shoplifters. Impacted customers, disproportionately in non-white neighborhoods, were flagged, confronted, and sometimes banned from stores, and in one case an 11-year-old girl was searched, all on the basis of bad facial recognition matches. Federal regulators later banned the company from deploying facial recognition technology in stores for five years, noting higher false-positive rates in stores serving predominantly Black and Asian communities and inadequate pre-deployment safeguards.

These instances of incorrect matching and arrests have mostly involved non-white people. But, while errors may be more prevalent among these populations, as FRT use grows it can increasingly affect all people. For example, police recently released a white Tennessee grandmother who had been wrongly jailed for nearly six months based on FRT results. She was arrested while babysitting four children, accused of committing bank fraud in North Dakota, although she had never been there. Unable to pay her bills, she lost her home.

Figure 1. On the left is a surveillance photo taken at a crime scene. On the right is the image of Robert Williams that was incorrectly matched to this photo by an automatic face recognition system.

The harms described above were instigated by flawed matches produced by FRT—computational models that perform face recognition. However, these models always form part of a larger system in which humans apply FRT to some task. The failures were not just the product of a bad model, but of human failure to follow effective procedures. In many cases, face recognition searches are performed using low resolution images, with faces partially obscured. Figure 1 shows the surveillance photo used to identify Robert Williams, who was wrongly arrested for theft on the basis of this image. He later stated, “My daughters can’t unsee me being handcuffed and put into a police car.” In some cases, police have violated accepted practice with suggestive remarks that prompt witnesses to confirm the results of automatic face recognition technology. In the Rite Aid case, poor employee training, the use of low quality images, and many other deployment decisions contributed to a large number of mistaken identifications.

Face Recognition Technology is Increasingly Widely Used

Face recognition technology has become increasingly accurate and widely adopted. It is estimated that 131 million Americans use face recognition on a daily basis for applications such as unlocking their phones or banking apps, providing convenience and improving security. FRT usage is especially prevalent in applications in which the person being recognized cooperates with the system. In controlled, cooperative settings, face recognition systems have improved rapidly, with error rates roughly halving every two years in some evaluations. Under ideal conditions, top-performing systems may make a mistake only once in several hundred attempts.

Face recognition is also increasingly used by law enforcement agencies to identify uncooperative subjects, identify criminal suspects, and find missing children. Its use in surveillance is also growing. For example, Immigration and Customs Enforcement (ICE) is using FRT to identify people and determine their immigration status. In these applications, FRT often successfully identifies individuals, but its accuracy is not as high, and the potential for harmful errors increases. An incorrect match in this setting can result in wrongful detention or deportation of American citizens. As face recognition use grows, so will its benefits and harms, making it an urgent matter to understand its properties, impact, and effective policy interventions.

Figure 2. Each column shows a pair of images of the same person. Experimental subjects find the images on the left easiest to match, while it is most difficult to determine that the images on the right come from the same individual.

Face Recognition Difficulty Varies Significantly

The difficulty of face recognition problems varies tremendously depending on the setting. Figure 1 has already shown a difficult operational setting, in which a poor quality surveillance image must be matched. A human examining these images has a hard time telling whether they are of the same person. Figure 2 shows that even when images are of good quality, it is not always easy to tell whether they come from the same person, due to changes in things like hairstyle.

What Do We Mean by Bias in Face Recognition?

Bias in face recognition has been the subject of significant public concern and extensive research over the past decade, particularly as these systems have been deployed in high-stakes settings such as law enforcement and surveillance. This report examines the nature, causes, and consequences of this bias, and in this introduction we begin with a brief discussion of what we mean by “bias”.

Face recognition is meant to solve a problem that has an objectively correct solution: do these two images come from the same person? We say the system displays bias against certain demographic groups if it makes more errors on these groups than on the general population. We will use the terms “bias” and “differential performance” interchangeably.

FRT have consistently shown worse performance on women than men and worse performance on Black people than on white people, and many FRT display worse performance on East Asian people than white Americans. One way that bias can occur is through training FRT models using unbalanced data that better represents some groups. When this occurs, bias can be mitigated by augmenting the training set to represent different groups more equally.

However, defining demographic subgroups exactly can be difficult, making it hard to balance data. Studies that compare performance on men and women generally ignore subtleties of gender identity. Groups of Black or white people used in studies certainly contain many individuals of mixed race and, for example, Black people in the United States might have a different distribution of traits than Black people from East Africa. Different studies sample demographic subgroups in different ways, and therefore may not be evaluating exactly the same questions.

Moreover, it is unclear how best to define demographic subgroups. For example, is it more fruitful to measure differential performance between white and Black people, or between light-skinned and dark-skinned people? Black people can differ from white people not just in skin tone but also in structural properties of their face. At this time, it is unclear which aspects of appearance account for differential performance and how this would align with all possible subgroups. Most studies have been limited to a few broad demographic categories and it is not known, for example, whether performance would differ between specific nationality groups within a similar region such as Vietnamese and Korean people.

Outline of the Rest of the Report

This report aims to provide the background necessary to assess the trajectory and risks of bias in face recognition technology. We do not address other important concerns about FRT, such as maintenance of privacy and the use of FRT in mass surveillance.

In the next section we will briefly describe how face recognition systems work. We will then discuss the world-wide scope of face recognition. Next we summarize the accuracy of FRT and how this has progressed. We then discuss the nature of bias in FRT, and consider the causes of this bias. Next we consider FRT as part of a socio-technical system, and the impact of human users on FRT harms. Finally, we suggest possible policy interventions to reduce these harms.

This report makes the following points:

1. Improvements in accuracy have not eliminated bias.

Face recognition systems have become significantly more accurate in recent years, but they continue to exhibit differential performance across demographic groups.

2. Bias is difficult to measure and difficult to fully eliminate.

In real-world, uncontrolled settings, bias is harder to quantify and may be larger than benchmark results suggest. While technical interventions can reduce disparities, there is no simple or complete solution.

3. Harms arise from both technical errors and how systems are used.

Errors in face recognition can lead to significant harms, including wrongful arrests and other adverse outcomes. These harms are often amplified by deployment decisions, such as where systems are used and how results are interpreted.

4. Face recognition should be understood as a sociotechnical system.

Bias and harm arise not only from the underlying models, but also from human judgment and organizational practices. Inappropriate use of face recognition results can be more significant than technical error.

5. Policy interventions can reduce harms even without perfect technical solutions.

Effective policies include improving transparency and evaluation and supporting research on real-world performance. Furthermore, simply having humans check the results of FRT is not sufficient to avoid errors; avoiding them requires establishing clear, detailed protocols governing when and how face recognition may be used.

6. Governance of use is as important as improving the technology.

Auditing data and system outputs, developing tools that signal when results are unreliable, and enforcing strict use protocols can significantly reduce the risk that errors lead to harmful outcomes.

Glossary

How Face Recognition Works

Face recognition is based on machine learning, and highly dependent on the use of large-scale data sets. This data is difficult to carefully control or characterize.

Face recognition refers to the process of automatically identifying a person from a photo. It is divided into two tasks. In verification (or one-to-one matching), two images of faces are compared to provide a yes/no answer to the question of whether they come from the same person. This is used, for example, in border control, when a live image of someone may be compared to their passport photo. In identification (or one-to-many matching), a single probe face image is compared to a potentially large gallery of images to determine which faces in the gallery, if any, match the probe image. The gallery might contain, for example, mug shot images of people who have been arrested, driver’s license photos, images of people who have been barred from access to casinos, or a large collection of images scraped from the internet. A system performing identification might declare that it finds no match, return a single match, or return a potentially large collection of images that might resemble the probe image. In the latter case it is expected that these potential matches will be assessed by the user to identify valid matches. FRT may also return a confidence level for each match, although these confidence levels may not correspond to the true probability that the match is right.

A Brief History of Face Recognition

The first fully automatic face recognition system was developed 50 years ago as the subject of the PhD thesis of Takeo Kanade, who went on to become one of the pioneers in the field of computer vision. It identified landmarks on the face, such as the corner of the mouth, and used their position to compare images. Early methods like this, based on face geometry, had limited effectiveness. Scientists began to develop more useful and accurate face recognition systems through the growing use of machine learning, beginning in the late 1990s. These methods are trained with numerous face images, called a training set, to automatically extract representations of faces that can be used to compare them more robustly.

Progress accelerated rapidly as researchers began to appreciate the power of using an approach known as neural networks, which allowed them to leverage massive datasets of faces to “teach” the computer how to recognize new faces. While neural networks were used by FRT by the late ’90s, their use became dominant in the mid-2010s after further breakthroughs in machine learning with large neural networks, a technique known as deep learning. Since the mid-2010s, improvements in model architectures, training methods, and data scale have driven substantial gains in measured accuracy, especially on standardized benchmarks. At the same time, these advances have enabled rapid adoption of face recognition across a range of applications, from smartphone authentication to large-scale identification systems used by governments and private firms, even as performance in real-world settings remains highly dependent on context.

How Face Recognition Models Are Trained

To perform accurately, an FRT must be able to determine that two images of the same person are similar, even if the images are taken at different times, from different viewpoints, or under different lighting conditions. This is done by training the machine learning model to extract a representation that captures facial properties that can distinguish one person from another, but that are not significantly affected by viewing conditions or even some aging. The similarity between two faces can then be given a numerical score that reflects how closely the representations of the two faces agree.

In its simplest form, training occurs by incrementally adjusting the parameters of a neural network. In most current publicly available systems these parameters consist of tens of millions of numbers that control the network’s behavior. If it is shown two images of the same person, the parameters are adjusted to increase the similarity score. If the images are of two different people, parameters are changed to lower the score. Once the model is trained, if two images produce a similarity score above a chosen number, known as the cutoff, the system declares the two images to be the same person; if it falls below that cutoff, the system says they are different.
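The adjustment process described above can be sketched with a deliberately tiny model. This is an illustration only, not how any production system is trained: the two-number "images", the per-feature weights, the step size, and the function names are all invented here, and real systems adjust millions of parameters using normalized representations and more carefully designed training objectives.

```python
def score(weights, img_a, img_b):
    """Similarity score: dot product of the two weighted feature vectors."""
    return sum(w * w * a * b for w, a, b in zip(weights, img_a, img_b))


def update(weights, img_a, img_b, same_person, step=0.05):
    """Nudge each parameter to raise the score for a same-person pair
    and to lower it for a different-person pair."""
    direction = 1.0 if same_person else -1.0
    # The derivative of the score with respect to weight i is 2 * w_i * a_i * b_i.
    return [w + direction * step * 2.0 * w * a * b
            for w, a, b in zip(weights, img_a, img_b)]


# Hypothetical images described by two numbers: [identity cue, lighting cue].
same_pair = ([1.0, 1.0], [1.0, -1.0])   # same person, opposite lighting
diff_pair = ([1.0, 1.0], [-1.0, 1.0])   # different people, same lighting

initial_weights = [1.0, 1.0]
weights = list(initial_weights)
for _ in range(10):
    weights = update(weights, *same_pair, same_person=True)
    weights = update(weights, *diff_pair, same_person=False)

same_score = score(weights, *same_pair)
diff_score = score(weights, *diff_pair)
```

In this toy setting both pairs start with a score of zero. After training, the weight on the identity cue has grown and the weight on the lighting cue has shrunk, so the same-person pair scores above zero and the different-person pair scores below it.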

Once the model has been trained, it can perform identification using a gallery of faces by comparing a representation of the probe to representations of the gallery images. Notably, it can verify or identify images of people who were not in the training set, because it has learned a general representation that should apply to any face.
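As a rough sketch of how verification and identification use these representations, the example below compares short lists of numbers with cosine similarity and applies a cutoff. Cosine similarity is one common choice of score, assumed here for illustration; the three-number representations, the names, and the cutoff values are made up, and real representations are much longer.

```python
import math


def cosine_similarity(rep_a, rep_b):
    """Similarity score between two face representations (higher means more alike)."""
    dot = sum(x * y for x, y in zip(rep_a, rep_b))
    norm_a = math.sqrt(sum(x * x for x in rep_a))
    norm_b = math.sqrt(sum(y * y for y in rep_b))
    return dot / (norm_a * norm_b)


def verify(rep_a, rep_b, cutoff):
    """One-to-one matching: does the pair score at or above the cutoff?"""
    return cosine_similarity(rep_a, rep_b) >= cutoff


def identify(probe, gallery, cutoff):
    """One-to-many matching: gallery entries scoring at or above the cutoff, best first."""
    scored = [(name, cosine_similarity(probe, rep)) for name, rep in gallery.items()]
    candidates = [(name, s) for name, s in scored if s >= cutoff]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical gallery and probe representations.
gallery = {
    "person_a": [0.9, 0.1, 0.2],
    "person_b": [0.1, 0.8, 0.3],
    "person_c": [0.2, 0.2, 0.9],
}
probe = [0.85, 0.15, 0.25]

strict_matches = identify(probe, gallery, cutoff=0.95)
loose_matches = identify(probe, gallery, cutoff=0.40)
```

With the strict cutoff only person_a is returned. With the loose cutoff person_c also appears in the candidate list, which is the situation in which a human reviewer is expected to sort valid matches from lookalikes.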

The large data sets used in training are typically scraped from the internet. For example, one influential early data set, Labeled Faces in the Wild, made use of face images detected in Yahoo! news stories, with identifying captions. A number of large scale datasets containing millions of images have been developed using photos of celebrities available on the internet. Some companies, such as Meta and Google, have made use of internal data that users have uploaded and labeled; these training data sets may contain more than 100 million images. Clearview, a face recognition company, claims to use data sets of more than 70 billion face images scraped from the internet. Given the high cost and diminishing returns of training with so many images, it is unlikely that all of these images are used for training, and this large corpus is more likely to be used to form the gallery.

Academic FRT generally train on datasets of images of public figures, such as the MS-Celeb-1M dataset, which contains ten million images of about 100,000 individuals. These massive datasets capture how a person’s appearance can vary with age, lighting, viewpoint, expression, and other conditions, which helps improve accuracy of systems trained on the datasets. Commercial systems do not generally provide details of their training sets, but it is expected that they include similarly large sets of images scraped from the internet, or provided by users, as in the case of Google and Meta. However, because these data sets are assembled at enormous scale—often from uncontrolled sources—they are difficult to audit, regulate, or correct when they embed systematic biases.

Face Recognition in Use Today

Face recognition use is increasing rapidly, becoming more prevalent in numerous high-stakes applications.

The global face recognition market was almost nine billion dollars in 2025, with projected growth to over 30 billion by 2034. Over a third of this market is in the U.S., but there is wide adoption of FRT around the world. One of the primary applications of face recognition is to efficiently and reliably identify people. This can make access to financial systems more secure, potentially preventing identity theft. It can also make hospital admissions quicker and more accurate, and speed up passport verification. In these applications, a human subject opts in to using the FRT, cooperating to allow consistency in viewpoint, avoiding unusual facial expressions, and enabling controlled lighting. This leads to highly accurate systems. In many cases, such as using FRT to unlock cell phones, users opt in to the technology for added convenience and device security. When entering the country, U.S. citizens may opt in to face recognition systems, and their photos are deleted after 12 hours, while non-citizens are required to participate, with photos retained for 75 years.

Face recognition is also widely used in surveillance and law enforcement. Ten percent of U.S. police departments use FRT. The NYPD made 2,878 arrests resulting from FRT in the first five years of its use. The Metropolitan Police in London report 100 arrests using FRT in conjunction with mounted security cameras, including a suspect accused of kidnapping. Police in New Delhi used FRT to identify almost 3,000 missing children, and FRT has been used to identify refugee children who have been separated from their family. The National Center for Missing & Exploited Children (NCMEC) has used a tool called Spotlight, which makes use of FRT, to identify children who are victims of sex trafficking. In 2023, the FBI worked with NCMEC to identify or arrest 68 suspects of trafficking. A large number of retail stores use FRT to track customers to understand traffic patterns, and despite the Rite Aid case, retailers such as Wegmans still use FRT to spot accused shoplifters. Immigration and Customs Enforcement (ICE) is using FRT to identify people and determine their immigration status.

Face recognition has been widely used for surveillance of the Uyghur population by the Chinese government. FRT are also used by the Israeli government to track and surveil Palestinians.

These applications of face recognition can solve crimes, enhance security and make access more convenient, but also raise troubling concerns about mass surveillance, repression of civil liberties, and high-stakes errors that materially harm people. In surveillance and criminal investigations, subjects are not cooperative, and probe images used are often of poor quality, as illustrated in Figure 1, which produces much higher error rates. An awareness of mass surveillance can also have a chilling effect on people’s ability and willingness to participate in constitutionally protected activities such as protest or dissent.

As face recognition has grown more practical, a large number of companies have developed and marketed FRT. This includes large tech companies such as Amazon, Microsoft, Toshiba, NEC and Apple, and smaller companies that focus more narrowly on face recognition, biometrics and security, such as Clearview, Idemia, and Rank One Computing. Clearview is one of the most widely used by federal and local law enforcement in the U.S.

Early in the development of face recognition technology, the best performing systems were produced by academics and used openly available architectures and data. However, with its rapid commercial growth, state of the art FRT are generally developed by companies that provide little transparency about how they work or what data they use. As we will discuss in more detail, the National Institute of Standards and Technology evaluates the performance of some of these systems, but this evaluation is voluntary and not all companies participate.

Face Recognition Performance Across Different Conditions

Face recognition performance has improved rapidly, but recognition can still be quite difficult in many settings.

Two types of errors can occur in face recognition. With false positives, an FRT incorrectly states that two images come from the same individual. With false negatives, the system incorrectly states that two images do not come from the same individual. The cutoff determines the balance between false positives and false negatives. Raising it makes the system more cautious about declaring a match (reducing false positives) but also more likely to miss legitimate matches (increasing false negatives).
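The trade-off controlled by the cutoff can be illustrated with a small sketch. The similarity scores below are invented for illustration and do not come from any real system; the function name is likewise hypothetical.

```python
def error_rates(same_person_scores, different_person_scores, cutoff):
    """Return (false positive rate, false negative rate) at a given cutoff.

    A pair is declared a match when its score is at or above the cutoff.
    """
    false_positives = sum(1 for s in different_person_scores if s >= cutoff)
    false_negatives = sum(1 for s in same_person_scores if s < cutoff)
    return (false_positives / len(different_person_scores),
            false_negatives / len(same_person_scores))


# Hypothetical scores for pairs known to show the same person...
same_person_scores = [0.99, 0.97, 0.95, 0.91, 0.86, 0.78]
# ...and for pairs known to show different people.
different_person_scores = [0.30, 0.45, 0.62, 0.75, 0.82, 0.88]

rates_by_cutoff = {
    cutoff: error_rates(same_person_scores, different_person_scores, cutoff)
    for cutoff in (0.80, 0.90, 0.99)
}
```

On this made-up data, moving the cutoff from 0.80 to 0.99 removes every false positive but causes five of the six genuine matches to be missed, so neither error type can be driven down without regard to the other.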

Figure 3. The ACLU found that Amazon’s face recognition system matched 28 members of Congress to mugshots of other people.

The significance of this cutoff is illustrated well by the American Civil Liberties Union’s (ACLU’s) evaluation of Amazon’s face recognition system, “Rekognition”, and the subsequent controversy. The ACLU reported that they had tested Rekognition, and that it incorrectly matched 28 members of Congress with people who had been arrested (Figure 3). A significantly disproportionate number of these false matches were people of color. Amazon responded by arguing that although the ACLU had used the default cutoff, or confidence threshold, of 80% for Rekognition, this was more appropriate for finding celebrities on social media, and that their documentation recommended a much more stringent cutoff of 99% for use in high stakes applications such as law enforcement. Amazon also pointed out that the bias in the results may have been due to bias in the gallery of images used by the ACLU. If the ACLU compared images to a gallery that disproportionately contained people of color, it would be more likely to produce false matches for people of color in Congress. The ACLU replied by stressing the dangers of a system that was inaccurate with default thresholds and a lack of guidance for the system’s use.

One lesson from the Amazon Rekognition controversy is that the potential harms of an FRT depend not just on its technical accuracy but also on how users apply these systems. It also provides some indication that Rekognition was more prone to false positive errors when applied to people of color, at least at one significant cutoff threshold.

Figure 4. Three images of a researcher at the National Institute of Standards and Technology. The left image simulates a passport or similar photo, the middle image simulates images that might be taken while going through immigration, the right image simulates an image taken by a kiosk.

Figure 5. Two pairs of images, each pair shows the same person under identical imaging conditions except for a change in lighting (images from the Multi-PIE dataset).

Challenges in Real-World Face Recognition

The most rigorous experiments measuring face recognition accuracy are conducted under tightly controlled conditions. As a result, reported performance often overstates how systems perform in real-world settings, where error rates can be much higher.

The difficulty of face recognition tasks can vary widely. Frequently, identification is carried out by running verification between the probe image and every gallery image. Identification becomes more difficult as the gallery size grows and the number of opportunities for false positive matches increases. The difficulty of face recognition tasks also depends very much on the conditions under which images were taken. For example, in border control, the subject can be required to face the camera with their face fully visible, lighting can be controlled, and camera quality can be ensured.
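A back-of-the-envelope sketch shows why gallery size matters. It assumes, purely for illustration, that each probe-to-gallery comparison has the same small chance of a false positive and that comparisons are independent; real systems violate both assumptions to some degree, and the rate and gallery sizes below are invented.

```python
import math


def prob_any_false_match(per_comparison_fpr, gallery_size):
    """Chance of at least one false positive across the whole gallery,
    assuming independent comparisons with identical false positive rates.

    Computes 1 - (1 - p)^N in a numerically stable way.
    """
    return -math.expm1(gallery_size * math.log1p(-per_comparison_fpr))


# Hypothetical per-comparison false positive rate of one in a million.
fpr = 1e-6
risk_by_gallery = {
    size: prob_any_false_match(fpr, size)
    for size in (1_000, 100_000, 1_000_000)
}
```

Under these assumptions, a one-in-a-million false positive rate per comparison gives about a 0.1% chance of some false match in a gallery of 1,000 images, but about a 63% chance in a gallery of one million.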

Figure 4 shows that even images taken at a kiosk can be much harder to match, due, for example, to changes in viewpoint. Figure 5 illustrates the effect that a change of lighting can have on the difficulty of matching faces. As previously shown in Figure 1, when images come from surveillance cameras, the subject may not be facing the camera, they may not be close to the camera, so image resolution can be low, and their hair or hand or another object may obscure part of the face. Identification with poor imaging conditions may have many orders of magnitude more errors than verification under tightly controlled conditions.

By all metrics, there seems to be little doubt that face recognition accuracy has been improving rapidly. The National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT) evaluations illustrate this increase. NIST evaluates verification performance on two high quality images of frontal facing individuals. From 2020 to 2025 the error rate fell by a factor of three. (They set a threshold for matching to achieve a false positive rate of 0.003%, so about one false identification in 33,000 attempted matches. They then measure the false negative rate, the fraction of correct matches missed. The best performing system as of January 2025 achieved a false negative rate of 0.13%, a little more than one correct match missed in 800.) Similarly, the error rate on an identification task that matched a mug shot probe image to a large gallery of mugshots fell by a factor of five during the same period. (The best performing method, when using a threshold to produce a false positive identification rate of 0.3%, had a false negative error rate of 0.05%. This means that the system would falsely identify a probe image in the gallery (of 1,600,000 mugshots) one time in about 300, while missing a correct match about one time in 2,000.) Some results are shown in Figure 6, as of March 2025. Over a period of decades, NIST has found that errors have generally fallen by about a factor of two every two years. Under controlled conditions, FRT are now much more accurate. For example, on the best performer as of March 30, 2026, when performing verification on two mugshots, using a cutoff set to make a false positive match one time in a million, a false negative failure to find a match will occur one time in 500. This sharp increase in accuracy in a short period has happened alongside widespread adoption in applications like border control or unlocking a phone.
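The "one in N" figures quoted above follow directly from the percentage rates, as the short sketch below shows. The function names and dictionary labels are ours, introduced only for illustration; the percentages are the ones given in the text.

```python
def one_in(rate_percent):
    """Convert an error rate in percent to 'one error in N attempts'."""
    return 100.0 / rate_percent


def reduction_factor(years, halving_period_years=2.0):
    """Factor by which errors fall if they halve every halving_period_years."""
    return 2.0 ** (years / halving_period_years)


quoted_rates_percent = {
    "verification false positive": 0.003,
    "verification false negative": 0.13,
    "identification false positive": 0.3,
    "identification false negative": 0.05,
}
one_in_n = {label: one_in(rate) for label, rate in quoted_rates_percent.items()}

# Halving every two years, sustained for five years.
five_year_factor = reduction_factor(5)
```

A steady halving every two years would correspond to a reduction of roughly 5.7 times over five years, which is in the same range as the factors of three and five reported for 2020 to 2025.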

These experiments represent relatively ideal conditions. FRT in the real world may face much higher failure rates. This can occur due to more challenging imaging conditions, such as using a surveillance image as a probe, instead of a mugshot, or other factors such as changes in the subject’s appearance. For example, when the best performing system at mugshot identification is applied in a scenario in which the gallery contains visa images and the probe is taken from a kiosk, the error rate increases by a factor of about 18 with a false negative error about one time in 30 instead of one time in 500. This is a fairly typical increase, and still represents relatively idealized conditions compared to the most challenging ones.

Defining and Measuring Bias in Face Recognition

Face recognition performs with different levels of accuracy on different demographic groups. As face recognition becomes more accurate, this may limit the effects of this disparity in some applications, but it can still be quite significant in high-stakes applications.

Going back more than 30 years, researchers have observed different rates of accuracy in face recognition systems depending on demographic properties of the subject, including race, gender and age. For example, in 2011 a study showed that Western face recognition algorithms performed better on Caucasian faces than East Asian faces, while East Asian face recognition systems performed better on East Asian faces than Caucasian ones. In 2018, the influential Gender Shades paper examined differential performance not in face recognition, but in a related facial analysis problem of determining gender from a face, showing much poorer performance on images of dark skinned females than light skinned males.

Absolute vs. Relative Error

In considering differential performance, it is important to distinguish between absolute and relative differences in performance. We define the absolute difference in two error rates as the difference between the larger and smaller error. For example, if an FRT produces 2% error on male faces and 4% error on female faces, we would say that the absolute difference is 4% – 2% = 2%. We describe the relative error as the ratio between the larger and smaller value, which in this case would be 4%/2% = 2. As overall performance improves, the absolute error tends to decrease, while the relative error rate might or might not decrease. For example, if a new generation of FRT reduces error on male faces to 1% and reduces error on female faces to 2%, absolute error decreases from 2% to 1%, while relative error remains constant.
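The two measures defined above can be written out as a short sketch, using the same numbers as the example in the text. The function names are ours, and rates are expressed in percent.

```python
def absolute_difference(rate_a, rate_b):
    """Larger error rate minus smaller error rate (in percentage points)."""
    return max(rate_a, rate_b) - min(rate_a, rate_b)


def relative_difference(rate_a, rate_b):
    """Ratio of the larger error rate to the smaller one."""
    return max(rate_a, rate_b) / min(rate_a, rate_b)


# Error rates in percent, taken from the example in the text.
older_system = {"male": 2.0, "female": 4.0}
newer_system = {"male": 1.0, "female": 2.0}

older_abs = absolute_difference(older_system["male"], older_system["female"])
older_rel = relative_difference(older_system["male"], older_system["female"])
newer_abs = absolute_difference(newer_system["male"], newer_system["female"])
newer_rel = relative_difference(newer_system["male"], newer_system["female"])
```

The absolute difference falls from 2 to 1 percentage point between the two generations, while the ratio stays at 2 in both cases.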

Whether absolute or relative error is more important depends on the operational considerations and use of the system. When performance is very high, absolute error will tend to shrink. If this translates into operational settings, then relative error may become unimportant. For example, if an FRT makes a mistake once in a billion queries on one population, and twice in a billion on another, errors for either population may be so rare that they are insignificant. In practice, the impact of absolute error also depends on how widely deployed a system is. As systems become more accurate, they may become more widely deployed, which can paradoxically result in more accurate systems producing more errors.
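The deployment effect can be made concrete by multiplying an error rate by the number of queries. Both systems and all of the numbers below are invented for illustration.

```python
def expected_errors(error_rate, num_queries):
    """Expected number of errors: error rate times number of queries."""
    return error_rate * num_queries


# Hypothetical older system: one error in 1,000 queries, used 100,000 times.
older_errors = expected_errors(1 / 1_000, 100_000)

# Hypothetical newer system: one error in 10,000 queries, used 10,000,000 times.
newer_errors = expected_errors(1 / 10_000, 10_000_000)
```

In this hypothetical, the newer system is ten times more accurate per query but is used one hundred times as often, so it is expected to produce about 1,000 errors where the older system produced about 100.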

Even though current FRT achieve quite low error rates under ideal conditions, these error rates tend to grow much higher under more challenging conditions, and errors can be quite common. Although it is difficult to study error rates accurately under the most challenging conditions, high relative error under ideal conditions may predict relative error that is just as high or higher under challenging conditions that also have high absolute error. That is, while absolute error in operational contexts is of greatest importance, relative error in highly controlled conditions may predict high absolute error in less controlled conditions. Consequently, it is premature to think that FRT are so accurate that relative error is no longer important. A more nuanced view would hold that persistently high relative error rates may be less important for some applications, such as unlocking phones, and still be quite important in other applications, such as criminal investigations.

NIST Experiments on Demographic Variation

Since 2019 NIST has performed extensive evaluations of demographic variations in performance on hundreds of face recognition systems. They have access to large collections of non-public images that they use to evaluate FRT submitted by companies. The large size and private nature of the dataset make it especially unlikely that models are overfit to the data by, for example, selecting parameters that boost their performance on this particular data. NIST computes false negative rates using over a million pairs of images, comparing one high quality image of an individual to a medium quality image of the same person. False positive rates are computed using over a billion pairs of high quality images from different individuals. Image quality reflects applications such as passport checks at airports, but does not include more challenging problems such as police investigations using surveillance footage. All images come with demographic information, including the age, gender and country of origin of the subject. Country of origin is used as a proxy for race, focusing on countries that are less racially diverse, but this is not a perfect proxy.

NIST finds a relatively small demographic variation in false negative rates, in which a correct match is missed, and a much larger variation in false positive rates, in which an incorrect match is accepted. For example, the top performing FRT as of March 2025 produced 358 times as many false positives for West African females over 65 as for Eastern European males aged 35-50, with the false match rate increasing from about one in 15,000 to about one in 50. Among the top ten performing systems, the false positive rate for all West Africans was about 23 times higher, on average, than the rate for Eastern Europeans. The false positive rate for these performers on average is about 4.6 times higher for females than males, and about 2.9 times higher for people over 65 compared to people aged 20-35. The evaluations also show poorer performance on people from South or East Asia, relative to Eastern Europeans. Many additional studies have also found that FRT generally perform better on white people than people from other racial groups, and on males compared to females.

These studies do have important limitations. First, more narrowly defined groups (e.g. West African women over 65) will have less data, leading to noisy estimates, and taking the ratio of two noisy estimates amplifies the noise. Second, images taken in different countries may differ in ways beyond the race of the subject, such as in the types of cameras or lighting used. Third, incorrect labels may have a significant effect on accuracy. If a visa photo is associated with the wrong name, this can lead to a false match, and these incorrect labels may be more prevalent in some countries than others. Finally, measures of bias may vary depending on the specific ways in which performance is measured. The chief scientist of a leading face recognition company has stated that in practice they find differential performance between racial groups of a factor of approximately 1.5, rather than the higher numbers found in NIST studies. (Brendan Klare, personal communication.)
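The point about ratios of noisy estimates can be illustrated with a short calculation. The sketch below is not NIST's method. It assumes each false positive rate is estimated from independent comparisons (a binomial model) and uses the standard first-order approximation that the relative variances of two independent estimates add when one is divided by the other. The two rates are the approximate figures quoted above (one in 50 and one in 15,000); the sample sizes are hypothetical and chosen only for illustration.

```python
import math


def relative_standard_error(rate: float, n_pairs: int) -> float:
    """Approximate relative standard error of an error rate estimated
    from n_pairs independent comparisons (binomial model)."""
    if not 0.0 < rate < 1.0 or n_pairs <= 0:
        raise ValueError("rate must be in (0, 1) and n_pairs must be positive")
    return math.sqrt((1.0 - rate) / (rate * n_pairs))


def ratio_relative_standard_error(rate_a: float, n_a: int,
                                  rate_b: float, n_b: int) -> float:
    """First-order approximation of the relative standard error of
    rate_a / rate_b when the two estimates are independent."""
    rse_a = relative_standard_error(rate_a, n_a)
    rse_b = relative_standard_error(rate_b, n_b)
    # Relative variances add, so the ratio is noisier than either input.
    return math.sqrt(rse_a ** 2 + rse_b ** 2)


# Hypothetical sample sizes: a narrowly defined subgroup versus a broad one.
narrow = ratio_relative_standard_error(1 / 50, 5_000, 1 / 15_000, 500_000)
broad = ratio_relative_standard_error(1 / 50, 500_000, 1 / 15_000, 50_000_000)
print(f"relative error of ratio, narrow groups: {narrow:.3f}")
print(f"relative error of ratio, broad groups:  {broad:.3f}")
```

Under these assumptions, shrinking both sample sizes by a factor of 100 makes the ratio about ten times noisier, which is why disparity figures for very specific subgroups should be read with caution.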

Challenges in Measuring Bias in Face Recognition

There are decades of evidence of differential performance of face recognition between demographic groups, particularly affecting non-white people and females. However, these studies generally make use of relatively high quality images, and may not accurately reflect the degree of differential performance in challenging operational cases, such as the use of surveillance footage in criminal investigations or in identifying people on a watch list. This is because it is quite difficult to accurately characterize and sample images from challenging environments. And while large scale photo collections with known identities and some demographic information exist, such as passport photos, we do not have large scale collections of photos taken in challenging conditions that have this information. While this problem is elusive, there is some evidence that differential performance increases with the difficulty of the recognition task.

Another limitation occurs because races are not well-defined biological categories but social constructs. It is not clear how to systematically divide a population into different races, especially in the case of multi-racial individuals. This is particularly challenging when images are scraped from the internet, and need to be labeled by race. Some studies have focused on skin darkness rather than race, but this is also difficult to determine accurately from photos due to the effect of unknown lighting conditions on apparent skin color. In spite of these limitations, there is a clear consensus among researchers that differences in FRT performance exist between racial groups.

An important question is how differential performance in face recognition is evolving over time. Is this a problem that was initially ignored, but is now being effectively addressed, or one that is recalcitrant? While there is no question that absolute differences in accuracy are shrinking over time, as FRT become more accurate, the behavior of relative differences is less clear. This is difficult to judge, since new test sets come out frequently, and experimental performance is generally measured over an ever changing landscape of conditions. Perhaps the most stable evaluation framework is NIST’s, which has consistently evaluated new FRT under the same conditions including systems developed from 2018 to 2026. Some of the top performing FRT have evolved, with multiple versions being released over this time period. When we examine these, we see that some have significantly reduced the amount of bias over time, while others have not, and have even seen increased bias. This suggests that it may be possible to reduce systematic bias through model design. More details can be found in the appendix.

Sources of Bias in Face Recognition Systems

Bias in face recognition systems arises from a combination of imbalanced training data, differences in image quality and gallery composition, and other technical and operational factors that are difficult to fully control or eliminate.

False negatives often arise when image quality is poor or facial features are obscured, while false positives are more likely when different individuals appear similar to the system, which can be exacerbated by limitations in training data or representation. For example, if we compare two images of the same person, and one of these images is blurry or has bad lighting or low resolution, the images may appear dissimilar due to these effects. FRT are trained to be somewhat robust to changes in viewing condition, but they are still likely to make errors when these changes are large. On the other hand, if a system is trained using few images of one demographic group, the system may not learn representations that distinguish between a wide range of appearances within that group. For example, if one trained an FRT using images of only one Black person, the system would likely learn to associate dark skin with that individual, and would not learn features that effectively distinguish between different Black people. This is an extreme example, but it is generally found that deep neural networks become more effective as the amount of relevant data increases.

We focus on false positive errors, as these show the greatest differences across demographic groups and are most closely associated with documented harms, such as wrongful arrests. In this section, we will discuss two key points. First, while it may be straightforward to improve demographic balance in datasets, completely eliminating demographic bias is complex and difficult. Second, while demographic bias in the data may be responsible for some bias in false positives, it is not necessarily the only source of these differences. Various research results present conflicting evidence of the importance of dataset bias in practice.

The Contribution of Dataset Bias

Face datasets collected in the last 15-20 years have generally consisted of images scraped from the internet. This enables the creation of large scale datasets that capture a wide range of variations in viewing conditions. These datasets often used well-known people with many online photos, without specific regard to accurately representing the distribution of people of different races or genders in the population as a whole. For example, an early and very influential dataset, Labeled Faces in the Wild (LFW), consisted of 77.5% images of men and 22.5% images of women. LFW was based on people who had appeared in Yahoo! news stories that were identified in captions, making it easier to build a large dataset of known people. However, these people were obviously not representative of the overall population.

Some more recent datasets pay closer attention to capturing the true distribution of people in the world. However, creating unbiased datasets can sometimes be a subtle and difficult problem. For example, the BUPT-Balancedface (BUPT) dataset was constructed to have equal numbers of images of Caucasian, Indian, Asian and African faces. However, subsequent analysis revealed that the Asian and Indian faces consistently appeared as a larger size in the dataset. So although the number of images was balanced, the viewing conditions of the images could still vary significantly. This discrepancy might, for example, lead to biased performance at test time.

The reason for systematic biases in datasets is often not well understood, but it is plausible that when scraping images from the internet, photos from different countries might follow different conventions, use different cameras, or differ in myriad other ways. Therefore, to judge whether a dataset is biased is not as simple as counting the number of images from each population.

A deeper difficulty is even defining what it means to have an unbiased dataset. BUPT represented four demographics equally. But it is unclear what should count as a racial category. For example, should Asian faces be counted as one category? Should Chinese and Japanese people be considered two separate racial categories? What about multiracial individuals? The concept of race is not biological, but a social construct that is not well defined. It is also problematic to correctly label the racial origins of large scale datasets, which may contain images of millions of people. It seems clear that paying attention to demographic diversity will produce less biased datasets than building datasets based on arbitrary selection of celebrities. However, it is also clear that creating completely unbiased datasets is an ill-defined problem. Even with a given definition of “unbiased” it remains very challenging and beyond current technology.

There is certainly strong evidence that dataset bias can produce differential performance, and bias can be reduced through improving the training data balance. It has been found that while Western face recognition algorithms perform better on Caucasian faces than on East Asian faces, algorithms developed in East Asia perform better on East Asian faces, a result that is likely due to dataset bias. After the Gender Shades paper demonstrated that Microsoft’s gender identification algorithm performed much more poorly on Black women than white men, Microsoft quickly improved performance dramatically on Black women by balancing its datasets.

Differential performance can also occur because of biases in the gallery data or probe data. When the gallery is formed from images scraped from the internet, the properties and number of these images may vary drastically from individual to individual, or even from group to group. It has been shown, for example, that if one group is more highly represented in the gallery, this will lead to more false positives among that group because there is greater potential for the gallery to contain faces similar to the probe. As another example, if members of one group, such as women, frequently have longer hair that covers more of their face in the probe image, this can also lead to higher error rates. Also, if a gallery image is of low quality, not showing a clear image of the face, it may be matched to a similar low quality probe image of a different person. Rite Aid’s use of low-quality images in its gallery is believed to have contributed to the large number of false matches it produced, which in turn led to customers, disproportionately in non-white neighborhoods, being wrongly flagged, confronted, and sometimes expelled from stores. When companies such as Clearview make use of billions of images scraped from the internet, it is extremely challenging to balance these datasets or ensure uniformity in their quality.
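The effect of gallery representation on false positives can be made concrete with a simplified model. The sketch below assumes, purely for illustration, that a probe is compared independently against each gallery image from the same demographic group, and that each comparison has the same small false match rate. Real systems violate both assumptions, and the function name and numbers are hypothetical.

```python
import math


def prob_any_false_match(false_match_rate: float, gallery_size: int) -> float:
    """Probability that at least one of gallery_size independent
    comparisons produces a false match: 1 - (1 - fmr) ** N."""
    if not 0.0 <= false_match_rate < 1.0:
        raise ValueError("false_match_rate must be in [0, 1)")
    if gallery_size < 0:
        raise ValueError("gallery_size must be non-negative")
    # log1p/expm1 keep the result accurate when the rate is very small.
    return -math.expm1(gallery_size * math.log1p(-false_match_rate))


# Hypothetical: same per-comparison rate, two levels of representation.
p_small = prob_any_false_match(1e-6, 10_000)
p_large = prob_any_false_match(1e-6, 1_000_000)
print(f"10,000 similar gallery images:    {p_small:.4f}")
print(f"1,000,000 similar gallery images: {p_large:.4f}")
```

Even with an identical per-comparison error rate, the group with more gallery images faces a much higher chance that some image is falsely matched, which is the mechanism described above.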

Assessing dataset bias in commercial systems is complicated further by the fact that companies generally do not make their datasets publicly available or disclose many details about them. Moreover, NIST experiments on dataset bias do not make use of the galleries used by commercial systems. Therefore any bias due to galleries would not be detected.

Sources of Bias Beyond the Data

Other factors besides data may also significantly influence differential performance. Some experiments have shown that even balanced datasets do not produce equal performance on men and women, or between races, and that sometimes more biased datasets produce less biased and better results. Furthermore, demographic groups may have properties that make them easier or harder to recognize. For example, there may be greater variation in hairstyle in one gender than another, and males in different countries may have different trends in facial hair. If someone has an unusual beard, for example, this may make him easier to recognize, or harder to recognize if he shaves his beard. It is difficult to determine the effects on differential performance of social conventions affecting appearance. It has also been noted that darker skin may require different types of lighting to bring out the facial structure. This could result in more recognition errors for people with darker skin when lighting is not controlled.

In summary, it is clear that extreme dataset bias produces biased results. It is quite challenging to produce perfectly unbiased datasets, and less clear to what extent the differential performance observed in modern face recognition systems may be due to dataset bias, especially since these systems are built with proprietary data that is not open to public examination.

Reductions in Bias Over Time

From a policy perspective, perhaps the most important question is whether companies have the ability to produce less biased FRT. To address this question we examined NIST measurements of the performance of models produced by leading companies. NIST has assessed the degree of bias in multiple models produced over time by some companies, allowing us to see how their performance has evolved. Based on NIST reports, we find that some companies have significantly reduced the absolute and relative bias in their systems within two or three years of their initial evaluation, while other companies have not reduced relative bias, and in some cases it has increased, even while absolute bias decreases due to improved overall accuracy. Details of this analysis may be found in the appendix.

These results suggest that companies are capable of reducing bias, although this is certainly not definitive. In a conversation with one of the authors, the chief scientist at a leading face recognition company confirmed that NIST evaluations have helped them identify certain variants of differential performance between racial groups, enabling them to take effective steps to proactively identify and reduce bias whenever the company becomes aware of it. (Brendan Klare, personal communication.)

The Human Factor: Face Recognition Systems as part of a Socio-Technical System

Many errors in face recognition are due not just to mistakes by the technology, but to the way in which people make use of it.

The preceding sections focused on the technical properties of face recognition systems. However, these systems do not operate in isolation. They are embedded in what researchers call a sociotechnical system, in which the technology interacts with human judgment and organizational practices. The real-world effects of face recognition therefore depend not only on technical FRT performance, but also on how human users interpret and act on its results. In practice, this interaction can create distinctive failure modes. For example, users may rely too heavily on algorithmic matches without considering other evidence or fail to appreciate how image quality and threshold choices affect reliability.

Limitations of Human Oversight

Some authors argue that these human factors can be structured to correct for technical weaknesses in face recognition systems. One commentator contends that: “it is stunningly easy to build protocols around face recognition that largely wash out the risk of discriminatory impacts…. A simple policy requiring additional confirmation before relying on algorithmic face matches would probably do the trick… one has to wonder why so few researchers who identify bias in artificial intelligence ever go on to ask whether the bias they’ve found could be controlled with such measures.”

However, empirical evidence suggests that this confidence in human oversight may be misplaced. First, FRT tends to make errors on difficult cases, in which humans also make errors. Studies show that humans are unable to identify many of the errors made by automatic systems. Furthermore, human performance on face recognition suffers from similar differential performance as machine learning systems. It has long been known that humans are more accurate in recognizing faces from their own race than from others, a phenomenon dubbed the other-’race’ effect (it has been posited that this also stems from dataset bias, in that people encounter more individuals of their own race than of others). Some work indicates that current automated systems recognize faces more accurately than the typical person, and that in some cases, combining a less effective human judgement with an automatic system may actually lead to lower accuracy than simply using the results of the automatic system. Human judgements can in some cases be used to improve algorithmic accuracy, but it may be difficult to determine when that is the case. In general, we cannot assume that human judgements will be accurate or that human oversight can be counted on to correct errors made by automatic systems.

Figure 7. Christopher Gatlin, right, was identified using the security photo on the left.

User Errors

Consistent with these findings, many of the known cases of false arrests due to FRT errors involved questionable practices by investigators. Christopher Gatlin was arrested for the brutal assault of a security guard, after an FRT flagged him as a possible suspect, based on a low quality image (Figure 7). Police steered the security guard to identify Gatlin, in what they later admitted was improper behavior.

Robert Williams was arrested for burglary one year after the crime, based on applying FRT to a surveillance video. Lacking witnesses, police showed the surveillance video to an employee of the store’s insurance company, who identified Williams from a photo array, although the video was of poor quality and his face was obscured by a shadow (Figure 1). The police failed to take basic steps such as investigating Williams’ alibi. The police chief at the time, James Craig, said that “this was clearly sloppy, sloppy investigative work.” In other cases, police have shown a single suspect’s photo to a witness, violating best practices by being unduly suggestive. This led to an arrest despite the suspect’s convincing alibi.

In cases where FRT lead to false arrests, it seems that police may in fact give undue weight to the results of FRT, rather than catching their errors, an example of “automation bias”. In another case in which recommended procedures were not followed, police were unable to obtain face recognition results due to the low quality of the surveillance image. A detective felt that the surveillance image resembled the actor Woody Harrelson, and used a picture of him to search for matches, rather than the suspect’s photo.

Failures in the use of FRT occur not only in police investigations. In the Rite Aid case mentioned in the introduction, the FTC’s complaint highlighted not just algorithmic errors but significant governance failures in how the system was operated by store employees. The commission found that Rite Aid did not take reasonable steps to train or oversee store employees who were responsible for acting on match alerts, including failing to teach staff how to interpret alerts or warn them that false positives could occur. The company also failed to test or monitor the technology’s accuracy once deployed, enforce image-quality standards, or implement any procedure for tracking false positive alerts and employee responses. As a result, employees in hundreds of stores routinely followed, confronted, searched, or even called police on customers based solely on system alerts—actions taken without meaningful training on the system’s limitations or appropriate safeguards. These shortcomings in training, oversight, and procedural controls were central to the FTC’s determination that Rite Aid had failed to prevent foreseeable consumer harm from the technology’s use.

In summary, it may be difficult for humans to correct mistakes made by algorithms, and in some cases they may place undue confidence in FRT results that are questionable and based on low quality images. In many applications, such as drug stores that are looking for known shoplifters, the people making use of FRT may not be expert investigators or well trained in the appropriate use of these systems.

Policy Interventions to Address Bias in Face Recognition Systems

Many errors can be addressed by better understanding and regulation of the way in which the technology is used.

A wide variety of policy interventions are available to deal with potential harms caused by bias in FRT. These include research, transparency in documenting bias, voluntary or mandatory guidelines governing the use of face recognition, and outright bans on the use of face recognition in certain contexts. As noted above, FRT make positive contributions in law enforcement and other applications, and these positives must be weighed against potential harms in crafting policy. Numerous institutions have suggested policy changes to address bias in FRT, including a comprehensive set of proposals in a recent report from the National Academies.

Research

Federal agencies already support substantial research on face recognition. NIST conducts ongoing evaluations of performance and demographic disparities, and agencies such as the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) have funded foundational research in face recognition systems. However, important gaps remain, particularly in understanding how these systems perform under operational conditions and how human users interact with their outputs. Additional federal funding could expand independent research in these areas, either by strengthening NIST’s evaluation programs or by supporting academic and nonprofit research focused specifically on bias mitigation and real-world deployment risks.

Two research priorities are especially important. First, evaluation frameworks should better reflect real-world conditions. Current large-scale benchmarks often rely on relatively high-quality images, whereas many high-stakes uses—such as criminal investigations—depend on low-resolution or poorly lit surveillance images. While efforts such as the IARPA Janus Surveillance Video Benchmark (IJB-S) dataset have begun to address this issue, broader and more systematic testing under operational conditions would provide policymakers with a clearer understanding of real-world risk.

Second, research is needed to develop tools that help human operators interpret and appropriately limit their reliance on face recognition results. For example, systems could assess probe image quality, estimate the likelihood that a reliable match can be produced, and warn users when results are unlikely to be dependable. Such tools could reduce the risk that investigators or retail employees draw strong conclusions from low-quality, unreliable inputs.

Measure and Reduce Bias

A better understanding of the bias in FRT can inform the procurement decisions of potential customers and encourage companies to take steps to reduce bias. Transparency in bias can be promoted in a number of ways. NIST is already conducting regular and impactful evaluations of bias in FRT, which can be thought of as an application of the Common Task Method (such evaluations have long been common in the computer vision community). This can be continued and potentially expanded. Regulations or government procurement guidelines can be used to incentivize or require companies to participate in evaluations and make these results public. Since criminal investigations are conducted by the government, procurement guidelines are a strong potential lever in promoting transparency. In addition to transparency in performance, these approaches could also be used to promote transparency in the data used to train FR systems. Making training data public may raise significant privacy concerns, but the government could incentivize the release of information describing the data and the steps taken to enhance the demographic balance of these data sets.

Regulate Sociotechnical Use of Face Recognition

If we view FR as part of a sociotechnical system, it makes sense also to govern the way in which face recognition is applied, not just the technical performance of the underlying algorithm. In practice, “responsible use” protocols need to specify who can run searches, what minimum image-quality standards apply, what form results can take, and what documentation and oversight are required. They should also define the permissible purposes for which searches may be conducted, restrict access to trained and certified personnel, require supervisory approval for high-stakes uses, and mandate that face recognition results be treated only as investigative leads rather than as dispositive evidence. Protocols can require minimum similarity thresholds below which no candidate match is returned, prohibit the use of face recognition on images that fall below objective quality metrics, and require contemporaneous documentation explaining why a search was initiated and how results were interpreted.
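As a rough illustration of how some of these protocol elements could be enforced in software, the sketch below gates a search on documented justification and probe image quality, and suppresses candidates below a minimum similarity score. It does not describe any agency's or vendor's actual system. The function name, the quality and similarity scales, and the threshold values are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SearchDecision:
    allowed: bool                    # whether the search was permitted to run
    reason: str                      # recorded for contemporaneous documentation
    leads: List[Tuple[str, float]]   # (candidate id, similarity), best first


def screen_search(probe_quality: float,
                  candidates: List[Tuple[str, float]],
                  justification: str,
                  min_quality: float = 0.6,       # hypothetical threshold
                  min_similarity: float = 0.9     # hypothetical threshold
                  ) -> SearchDecision:
    """Apply protocol checks before any candidate is shown to a user."""
    if not justification.strip():
        return SearchDecision(False, "missing documented justification", [])
    if probe_quality < min_quality:
        return SearchDecision(False, "probe image below quality standard", [])
    # Return no candidate at all if none clears the similarity threshold.
    leads = sorted((c for c in candidates if c[1] >= min_similarity),
                   key=lambda c: c[1], reverse=True)
    if not leads:
        return SearchDecision(True, "no candidate met the similarity threshold", [])
    return SearchDecision(True, "investigative leads only; corroboration required", leads)


decision = screen_search(0.8, [("A", 0.95), ("B", 0.5)], "case file reference on record")
print(decision)
```

The design choice worth noting is that rejection happens before results are displayed, so an operator never sees a weak candidate that could anchor later judgment.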

Additional safeguards could include audit trails of all searches and outcomes, periodic independent audits of performance and demographic disparities, disclosure requirements when face recognition contributed to an arrest or charging decision, and exclusionary consequences if required procedures are not followed. Agencies could also be required to collect and publish aggregate statistics on the number of searches conducted, the rate at which matches lead to arrests, and the frequency of erroneous identifications.

As an example of governance procedures, the FBI has established guidelines on the use of face recognition. These include limiting situations in which it can be used and the type of probe images used. They require that all face queries be evaluated by trained examiners and mandate that face recognition be used for investigative leads that must be corroborated.

As another example, the New York City police department (N.Y.P.D.) has spelled out a detailed protocol for the use of FRT. This requires investigators to submit face images to a special facial identification section of the department (the Real Time Crime Center, Facial Identification Section) that will, for example, ensure that image quality is sufficient and that use of FRT is warranted. The section can reject unsuitable probe images and reviews matches. Critically, a “possible match candidate” is meant to be “treated as an investigative lead only” and does not establish probable cause to make an arrest. The unit also retains records of searches and results. It has been reported that in other localities, investigating officers have accessed FRT directly, without supervision. Specific requirements could be mandated, with legal consequences if they are not followed, such as disallowing evidence produced in subsequent investigation.

However, in spite of N.Y.P.D. guidelines, FRT did lead to the false arrest of Trevis Williams. After FRT identified him as a suspect in a crime, the victim identified him from a photo lineup, although he was eight inches taller and 70 pounds heavier than her initial description of the suspect, in addition to other exculpatory evidence. This illustrates the difficulty of ensuring that guidelines effectively prevent errors and false arrests.

Regulation may be applied not only to government agencies, such as police departments, but also to private companies that are increasingly deploying face recognition systems in commercial settings. Rite Aid’s use of face recognition illustrates how governance failures can arise outside of law enforcement. According to the FTC complaint, “Rite Aid failed to consider or address foreseeable harms to consumers flowing from its use of facial recognition technology, failed to test or assess the technology’s accuracy before or after deployment, failed to enforce image quality standards that were necessary for the technology to function accurately, and failed to take reasonable steps to train and oversee the employees charged with operating the technology in Rite Aid stores.” These deficiencies were not primarily algorithmic; they reflected a lack of risk assessment, testing, training, oversight, and ongoing monitoring.

The FTC’s enforcement action demonstrates that existing consumer protection laws can be applied to address some forms of misuse. However, as commercial deployment expands, more explicit regulatory standards may be necessary to prevent similar failures. Such standards could require companies to conduct pre-deployment accuracy and bias testing, implement image-quality controls, establish employee training and supervision protocols, monitor and document false positive rates, and assess foreseeable risks before using face recognition in customer-facing environments. Clear statutory or regulatory requirements would provide ex ante guardrails rather than relying solely on ex post enforcement after harms have occurred. Regulations could also require clear disclosure when face recognition is used—both to affected individuals and in aggregate public reporting—so that its role in decision-making can be scrutinized, evaluated, and corrected where harms emerge.

Policymakers should be willing to ask if using facial recognition is appropriate at all in certain circumstances. In higher-risk contexts, policymakers could impose outright bans, limit use to specified categories of serious crimes, require a warrant, or mandate corroborating evidence before an individual identified through face recognition is included in a lineup or arrested.

As an example of use restrictions, the state of Maryland has limited the use of automatic face recognition to specific, serious crimes, and requires that defense attorneys be notified when it was used in a case. Montana and Utah require police to obtain warrants for the use of face recognition. In Detroit, police must obtain corroborating evidence before placing a suspect identified through face recognition in a lineup. Several cities have banned the police use of face recognition, including San Francisco and Boston, while Portland has banned the use of face recognition by private entities in all public places.

At the federal level, members of Congress have introduced legislation that would impose a nationwide moratorium on government uses of face recognition technology absent explicit congressional authorization. Together, these restrictions illustrate a broader policy approach: limiting deployment in high-risk settings until adequate safeguards, transparency, and accountability mechanisms are in place.

Conclusions

Face recognition systems have improved dramatically in accuracy over the past decade, and in tightly controlled environments they now perform at very high levels. At the same time, substantial differences in performance across demographic groups persist, particularly in the false positive errors most closely associated with wrongful arrests and other harms. As overall error rates decline, these disparities may matter less in low-risk settings, but increasing deployment in high-stakes and uncontrolled contexts may lead to continued harms.

Technical improvements can reduce some sources of bias. Developers can improve dataset balance, adjust thresholds, and refine model design. However, eliminating differential performance entirely is beyond the current state of the art, particularly in operational environments involving low-quality images and large search databases. Policymakers should not assume that continued technical progress alone will resolve these disparities.

Perhaps most importantly, policymakers should view the regulation of face recognition through a sociotechnical lens, considering the interaction between the technical system and the humans who use it.

We cannot wait for perfect sociotechnical systems, but must govern the deployment of imperfect ones. Policymakers must decide where face recognition is not legitimate. If face recognition is used in high-stakes applications, it should be subject to clear limitations, transparency requirements, and enforceable protocols designed to prevent errors from cascading into wrongful arrests or other serious harms.

Appendix: Variations in Bias Over Time

We examined the performance of face recognition systems evaluated by NIST on different demographic groups. All results are based on data on a verification task, updated on March 5, 2025. More recent data on somewhat different tasks shows similar levels of bias. False positive matches are measured when comparing two high quality, visa-like images of two different people of the same sex, age group and region of birth. Demographic disparities are computed by taking the ratio of the false positive rate for two different demographic groups. For example, the ratio of the false positive rate on faces of people born in Western Africa to the false positive rate for people born in Eastern Europe for the highest performing FRT was 17.42, meaning that a false positive match was 17.42 times as likely for someone from Western Africa.
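The disparity measure described above is a simple ratio of two false positive rates. The sketch below shows the arithmetic. The counts are hypothetical and were chosen only so that the result reproduces the 17.42 ratio quoted in the text; they are not NIST's underlying data.

```python
def false_positive_rate(false_matches: int, impostor_pairs: int) -> float:
    """Fraction of different-person image pairs wrongly accepted as a match."""
    if impostor_pairs <= 0:
        raise ValueError("impostor_pairs must be positive")
    return false_matches / impostor_pairs


def disparity_ratio(fpr_group: float, fpr_reference: float) -> float:
    """How many times more likely a false positive is for the group
    than for the reference group."""
    if fpr_reference <= 0:
        raise ValueError("reference false positive rate must be positive")
    return fpr_group / fpr_reference


# Hypothetical counts, chosen to reproduce the ratio quoted in the text.
fpr_west_africa = false_positive_rate(1_742, 10_000_000)
fpr_eastern_europe = false_positive_rate(100, 10_000_000)
ratio = disparity_ratio(fpr_west_africa, fpr_eastern_europe)
print(f"disparity ratio: {ratio:.2f}")
```

A ratio of 1 would indicate equal false positive rates for the two groups, and values further from 1 indicate larger relative disparity.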

NIST has evaluated differential performance of commercial systems for over five years. Many companies have submitted multiple versions of their FRT over time, as the systems have improved. This allows us to determine how the bias in these systems has changed. We considered the 20 systems with best overall performance, which originated from 12 different companies. Eight of these companies had submitted at least four different versions of their FRT for evaluation, and so we focused on these eight companies.

Figure 8 shows the change in the ratio of differential performance for three pairs of demographic groups. For illustrative purposes, we show results from two different companies. The curves from Sensetime illustrate differential performance that has increased over time, while the curves from Rank One Computing (ROC) show differential performance that has decreased. Solid curves show the ratio of false positives for subjects of West African birth compared to Eastern Europeans. The dashed curves show performance on females compared to males. The dashed-dotted curves show an older age group (65+) compared to a younger cohort (20-35).

Table 1 shows the correlation between the passage of time and the ratio of differential performance for all eight companies. A negative correlation indicates that bias has dropped over time, while a positive correlation shows an overall increase in bias. If the correlation is close to 1 or -1, this means that the change in performance over time is highly consistent, while a correlation close to 0 means that there is no clear trend in the increase or reduction in bias. We can see that Toshiba, Idemia, and ROC have reduced bias across all three ratios, while Sensetime has increased bias, with other companies showing mixed performance.
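To make the interpretation of these correlations concrete, here is a minimal Python sketch. It assumes a Pearson correlation and represents time as months since a vendor's first submission; the report does not specify either choice, so both are assumptions. The three series are hypothetical and do not correspond to any company in Table 1.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: months since first submission, and the disparity ratio
# measured for each submitted version (illustrative values only).
months = [0, 8, 15, 27]
falling_bias = [12.0, 9.5, 7.0, 4.0]   # ratio shrinks with each version
rising_bias = [3.0, 4.5, 6.0, 9.0]     # ratio grows with each version
no_trend = [5.0, 7.0, 4.0, 6.0]        # ratio moves without a clear direction

for label, ratios in [("falling", falling_bias),
                      ("rising", rising_bias),
                      ("no trend", no_trend)]:
    print(f"{label}: r = {pearson(months, ratios):+.2f}")
```

With these made-up series, the falling case yields a correlation near -1, the rising case a correlation near 1, and the third a value near 0, matching the reading of Table 1 given above. With only four or so versions per company, such correlations describe direction and consistency rather than statistical significance.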

Reclaiming Privacy Rights: A Roadmap for Organizations Fighting Digital Surveillance

Federal and local law enforcement have surveilled civil rights activists, organizations, and protesters for decades. Some past victims of government spying include Martin Luther King Jr., Angela Davis, Jane Fonda, the American Indian Movement, the United Farm Workers, and the National Lawyers Guild. These activists and organizations were subjected to traditional surveillance tactics such as wiretapping and infiltration.

Today, surveillance looks different as technological advances have made it increasingly easy to track someone’s whereabouts, communications, and inner thoughts based on browser history, all without leaving an office. This level of digital surveillance has a chilling effect on people’s First Amendment rights, because a person may choose to censor themselves online or be reluctant to engage in political expression, such as attending a protest, due to their fear of being watched and retaliated against.

This report is the result of research that tries to answer the fundamental question: what can civil society do to fight back against the growing trend of widespread digital surveillance, particularly in the state of New York? New York is the focus of this research project because of the state’s widespread use of surveillance technology, particularly in New York City, and the strong activism within the state that works to improve the lives of marginalized communities.

Social justice organizations play an instrumental role in society through their organizing and fighting for civil rights. This report provides these organizations information on current surveillance practices and how these practices may impact the communities that they serve. The first section of the report provides a short roadmap on the recent history of digital surveillance in different contexts such as immigration, environmental justice, the criminal legal system, housing, and the workplace. The next section discusses pending and finalized legislation that could help or hinder efforts to end surveillance. The third section describes strategies organizations can take to help combat surveillance in their communities. Lastly, the report provides a list of legal organizations that are well versed in this arena and attuned to technological advances.

A Current History of Digital Surveillance

Before diving into action, it’s important to provide an overview of the types of surveillance that many communities may be subjected to. This section will demonstrate how widespread surveillance is and provide background stories on the surveillance activist communities face within the immigration, environmental justice, criminal legal system, housing, and workplace contexts.

Immigration

There have been a growing number of surveillance tactics used against activists, migrants, journalists, and attorneys in the immigration space. In 2019, NBC 7 San Diego reported that federal agencies were keeping and sharing a secret database of an attorney, journalists, organizers, and “instigators” who had previously worked at the U.S.-Mexico border. The database contained photos of each person, obtained from the person’s passport or social media accounts. It also included personal information such as the person’s work and travel history, names of their family members, and the kind of vehicle they drive. Some of these individuals reported that while traveling across the border, they were targeted for secondary screening. Border agents took their electronic devices and some individuals believed that the agents performed a warrantless search of their device, though they were unable to verify this. Journalists reported that these invasive actions affected their ability to protect their confidential sources. It’s easy to imagine that this unfounded suspicion and investigation could deter activists and journalists from continuing their work.

This isn’t the only incident of ICE keeping an eye on activists. In July 2021, The Intercept reported that U.S. Immigration and Customs Enforcement (ICE) had been surveilling activists and advocacy groups, such as Project South and Georgia Detention Watch, online and in person. This was done under the guise of safety and security as an ICE spokesperson stated “[l]ike all other law enforcement agencies, ICE follows planned protests to ensure the safety and security of its infrastructure, personnel, officers and all those involved.” Internal emails revealed that ICE officials were using Facebook to follow advocacy groups and ICE was tracking the attendees of their events.

Migrants have also been subjected to government surveillance. Over the last several years, ICE has increased its use of electronic monitoring as an alternative to holding migrants in detention centers. As of March 2024, 183,935 people have been subjected to electronic monitoring by ICE, with 18,518 of those required to wear GPS ankle monitors. In 2018, ICE launched SmartLINK, an app that allows the agency to track a migrant’s whereabouts. As of April 2024, ICE has monitored over 700,000 people through the app. The agency requires migrants to do periodic check-ins using SmartLINK to confirm the user’s identity through voice recognition, geolocation, and facial recognition technology (FRT). The app has access to the user’s phone camera and has the ability to record audio. If a migrant complies with their check-ins for around 14 to 18 months, ICE may remove the person from the app to make room for new migrants who have just arrived in the country. Users of the app have expressed concern about the app’s location tracking, as it may put their undocumented family members at risk. Users have also stated that the app feels just as restrictive and invasive as an ankle monitor. Thirteen immigrant rights organizations found that electronic monitoring is not only harmful to the user’s livelihood but also hampers their personal relationships and their ability to organize in their community.

Surveillance in the immigration space interferes with the ability of migrants to organize and affects journalistic reporting. It also has the tendency to make migrants afraid of being a part of a community or spending time with their undocumented family members because they are aware that they are being watched. This kind of surveillance puts everyone in their circle at risk.

Criminal Legal System

Surveillance has been used in the criminal legal system for decades, as police often use various spying tools to investigate suspects. However, whereas before police would use agents to track a suspect’s movements, today, law enforcement is able to track a suspect from their desk. Law enforcement has been able to use private companies to obtain a person’s personal data such as their cell phone records, location data, web browsing history, and more. This tracking is not limited to suspects, as law enforcement agencies have been reported to subject activists to this level of surveillance as well.

In 2018, Memphis police were accused of spying on Black activists from 2016 until 2017. Memphis Police Department’s Office of Homeland Security (MPD) was accused of creating a Facebook profile to monitor activists in the area. There was one incident in which a community organizer posted a book on their page, and MPD collected the names of everyone who liked the post. With that list, they created a dossier of those individuals and called it “Blue Suede Shoes”. MPD is far from the only law enforcement agency that has collected a list of organizers, but it is unclear what happens with these lists after they’ve been created.

During the 2020 protests, the world experienced a new level of surveillance at the hands of local law enforcement and federal agencies. In 2021, it was reported that six federal agencies used FRT during the 2020 Black Lives Matter (BLM) protests across the United States. The agencies admitted that they did use this technology to identify individuals but they stated it was used to identify those who they suspected had violated laws. In one instance, police officers were able to arrest a protester after using FRT and receiving a match. NYPD has also been accused of using the technology to identify protesters after the event and charge them with crimes.

Environmental Justice

Surveillance has also been found in the environmental justice space, from both law enforcement and private companies. Shanai Matteson is an artist and climate activist based in Palisade, Minnesota. In 2021, Matteson spoke at the “Rally for the Rivers” event, which was organized to protest the construction of a pipeline. At the conclusion of the rally, 200 people left and went to the construction site to protest. At some point, the police arrested a number of protesters, although Matteson was not one of them. However, five months later, law enforcement officials found livestream videos of the event, identified who was at the rally, and charged Matteson with a misdemeanor, accusing her of conspiring to trespass.

During the Dakota Access Pipeline protests, a private company conducted mass surveillance on individuals in an unprecedented way. In 2016, the private security firm TigerSwan was hired by Energy Transfer Partners to surveil Dakota Access Pipeline protesters. TigerSwan monitored protesters’ social media posts, utilized aerial surveillance, employed informants, and used radio eavesdropping to spy on activists. TigerSwan used this information to make lists of “persons of interests” and pressure law enforcement to be more aggressive against the protesters. The firm also shared its intel with local law enforcement agencies and provided evidence to prosecutors to help them build cases against the protesters. After learning that Lee County, Iowa increased bail for protesters, TigerSwan stated in one of its documents that it needed to work more closely with other counties to make sure protesters would be fined or arrested in order to deter them. Because TigerSwan is a private company, it was able to conduct this level of mass surveillance on protesters without much government or judicial oversight.

Housing

Housing is one of the areas where people may least expect surveillance, yet residents of both private and public housing may face it in the near future. Between 2018 and 2019, residents of a Brooklyn apartment complex organized and resisted their landlord’s attempts to install facial recognition cameras within the building. In retaliation, the landlord threatened the organizers with loitering fines and told them, wrongly, that handing out flyers to fellow residents was unlawful. The apartment complex justified its actions by stating that this technology would provide safety and security for its residents.

Public housing facilities have also been accused of installing surveillance systems in their communities without the consent of residents. Some of these systems contain FRT or other forms of artificial intelligence. In Scott County, Virginia, cameras at a public housing facility have FRT that searches for people barred from the facility. In New Bedford, Massachusetts, a surveillance system searches through hours of recordings to locate movement near the doorways in order to identify residents who violate overnight guest rules. The footage has been used to punish and evict residents, who may have a difficult time securing housing in the future as a result of their eviction. While the cameras are only installed in public spaces within these facilities, they still violate people’s privacy rights as residents and their guests are tracked walking to and from their homes, a place that many people consider sacred.

Workplace Surveillance

Workers have been subjected to increasing surveillance over the last few years and one of the most infamous infringers is Amazon. The company has been accused of deploying many tactics in order to stop union organizing such as monitoring employee message boards and private Facebook groups. Amazon has also been accused of posting a job for an intelligence analyst who would be in charge of monitoring labor organizing threats.

Amazon has several resources within their facilities to monitor their employees such as employee ID badges which can be used to track an employee’s location and can allow the company to discover which of their employees are participating in organizing. Amazon facilities have surveillance cameras that are capable of allowing supervisors to track their workers and human monitors who walk around the facilities in order to keep an eye on the workers. Amazon has been accused of identifying union organizers and rotating them throughout the workplace, to prevent the organizers from having prolonged contact with the same employees. One source stated that workers were not allowed to socialize with each other as a manager would come and break them up.

Whole Foods, which is owned by Amazon, has also been accused of using surveillance to track union organizing. It was reported in 2020 that Whole Foods was using a heat map to track stores that could be at risk of unionization based on the distance from the store to the closest union, diversity within the store, team member sentiment, and additional factors.

Digital Surveillance: Where we are now

There have been a few promising federal and state bills introduced in the last few years that would provide vast protections for activists and journalists. On the other hand, there are also recent bills that have been passed that would increase government surveillance and cause more harm to these communities. This section provides a brief overview on where things currently stand.

Federal Legislation

In April 2021, U.S. Senators Ron Wyden (D-OR), Rand Paul (R-KY), and 18 additional senators introduced the Fourth Amendment Is Not For Sale Act. For years, data brokers have been able to sell people’s personal information, such as their location data, to law enforcement and intelligence agencies without judicial oversight. Federal law fails to protect people’s data from being sold in this manner, so this bill would work to close this legal loophole and require the government to obtain a court order to buy a person’s data. This bill would prohibit law enforcement agencies from purchasing a person’s information from a third party, prohibit government agencies from sharing a person’s records with law enforcement and intelligence agencies, and require the agencies to obtain a court order before obtaining someone’s records. This bill was passed in the House and received by the Senate in April 2024 with little movement since then.

Another promising bill is the Protect Reporters from Exploitive State Spying (PRESS) Act, which was introduced in June 2023 by U.S. Senators Ron Wyden (D-OR), Mike Lee (R-UT), and Richard Durbin (D-IL). Law enforcement agencies have been secretly obtaining subpoenas for reporters’ emails and phone records in order to determine their confidential sources. The bill would protect a reporter’s data that is held by a third party from being secretly obtained by the government without the reporter having an opportunity to challenge the subpoena. As of now, this bill has passed the House and has been received in the Senate and referred to the Committee on the Judiciary.

On the opposite end of the spectrum, legislation that expands surveillance has also been passed, such as the National Security Supplemental Appropriations Act, which was introduced in February 2024 and passed in April 2024. The bill provides $204 million to the FBI for DNA collection at the border and $170 million for autonomous surveillance towers, mobile video surveillance systems, and drones at the border.

Digital Surveillance in New York State

Turning to New York specifically, there has been some positive movement towards obtaining information on the prevalence of government surveillance and curtailing the recent overreach as well. Recently, the NYPD was ordered by the New York Supreme Court to disclose 2,700 documents and emails related to its surveillance of the 2020 BLM protests between March and September 2020. This information can provide some clarity into the mystery around what surveillance tools were used during this time period, since much of the information known about this time period has come from FOIA requests instead of the NYPD voluntarily disclosing their surveillance practices.

In 2020, the Public Oversight of Surveillance Technology (POST) Act passed. This act required the NYPD to disclose the surveillance tools it uses and publish the impact of those technologies. NYPD is required to publish reports on these surveillance tools, informing the public about how it plans to use these tools and the potential impacts on New Yorkers’ civil liberties and rights. The Brennan Center has written about the shortcomings of the law, largely due to the NYPD failing to adhere to the provisions. In February 2024, a bill adding provisions to the POST Act was introduced to the New York City Council. The provisions would require NYPD to provide the Department of Investigation a list of all surveillance technologies currently in use and provide their retention policies for the information they collected from the technologies. This bill was referred to the Committee on Public Safety in February 2024.

How to Take Action Against Surveillance

There are numerous ways organizations can take action in order to combat the use of mass surveillance in their communities. This section will provide a few examples of actions that organizations can undertake in protecting their community right now, such as legislative action, forming working groups, sharing protest safety procedures, conducting Freedom of Information Act (FOIA) requests, and spreading the word.

Legislation

As demonstrated above, legislation can provide a promising avenue towards ending the overreach of widespread government surveillance of vulnerable communities. It’s important for organizations to have journalists who are willing to report on the issues their community may be facing, such as in the immigration space. The PRESS Act can help journalists who travel to the U.S.-Mexico border to report on issues affecting migrants and humanitarian organizations. Unfortunately, these journalists have been subjected to intimidation tactics while working on their stories which may prevent them from continuing their work. The PRESS Act would prevent government agencies from secretly obtaining subpoenas for reporters’ sources, but there is additional legislation needed to prevent law enforcement agencies from targeting journalists, activists, and attorneys who are providing assistance to migrants. Law enforcement should be prevented from performing warrantless searches, interrogating these individuals about their work without just cause, and creating dossiers of these individuals with illegitimately obtained personal information.

Legislation would also immensely benefit future protesters exercising their rights to free speech and assembly, and could have prevented many harms that occurred during the BLM protests. Since those protests, a few states and around 18 cities, such as Boston and Portland, have passed legislation banning government agencies from using FRT or laid out restrictions on how the technology can be used. But years later, some of these governments, such as New Orleans and Virginia, which initially banned local police from using the tool, rolled back this legislation and allowed law enforcement to use the technology to investigate crimes. Vermont, a state that previously had a near complete ban on police use of FRT, passed legislation that would allow the police to use it for investigations in certain instances. Pushes can be made in New York and elsewhere to persuade legislators to care about privacy concerns as much as they care about crime.

Legislation can also be pushed to prevent government agencies from surveilling residents in public housing while they are at their homes. Additionally, legislation can prevent law enforcement agencies from making dossiers of individuals based on the content the person follows or likes on social media. There is a lot of room for growth in this arena since the law has failed to keep up with technological advances. Advocacy organizations can propose or draft bill text with other organizations, meet with legislators, or sign onto letters in support of or opposition to pending bills related to digital surveillance and data privacy rights.

Form a local working group to review proposed technology

In 2020, Syracuse mayor Ben Walsh formed the Syracuse Surveillance Technology Working Group, which provides residents an opportunity to comment on proposed uses of surveillance technology by city departments. The group is composed of 12-15 individuals from different community groups in Syracuse, as well as some City of Syracuse employees that are selected by the mayor.

When a city department is interested in utilizing a technology, they submit the request to the working group for review. The group advertises to the public through social media and local news channels to get widespread input. The group obtains comments from the public about their opinion and concerns about the technology and the group conducts their own research as well. The group then produces a report for the city with recommendations and explains how the technology may affect the Syracuse community. The mayor then approves or disapproves of the technology based on the report. Thus far, the group has reviewed automated license plate readers, body-worn cameras, street cameras, and more.

This working group provides the public an opportunity to conduct their own research on the proposed technologies and voice their opinions in a public forum. With many local government agencies wanting to explore the use of technologies like facial recognition, this could give activists a chance to have their opinions heard on these issues before they are implemented. This working group concept could be incorporated in other cities and provide some oversight and input into surveillance technologies that local agencies are utilizing on their residents.

Share protest safety procedures

There are a few measures organizations can recommend to help individuals protect their privacy while they are at a protest. The Surveillance Technology Oversight Project has created a safety guide for protesters who wish to protect their digital privacy while organizing. The guide provides information on protecting location data, DNA, and cell phone data. Some of the tips include turning one’s cell phone on airplane mode so that location cannot be tracked, considering what information one posts and shares on social media since it can be observed, and considering what transportation one takes to the protest, as vehicles could be tracked via automated license plate readers. This information could be shared by organizations within their communities to ensure activists are doing what they can to protect their own information as well as that of their fellow activists. Following these recommendations could prevent activists from being unjustly targeted by law enforcement, such as in the case of Shanai Matteson, the climate activist and artist in Minnesota referenced earlier.

FOIA requests

Another avenue organizations may want to explore is FOIA requests, which can help an organization and the public understand what kind of surveillance their community is being subjected to. There is a cloud of secrecy surrounding which tools government agencies use to surveil people, largely because agencies refuse to share this information with the public without legal force. As stated above, the NYPD was recently ordered to turn over records that would reveal how they used FRT against BLM protesters. It is essential to have this kind of information as it will help organizations discover how law enforcement utilized this tool and help organizations fight against future use. Almost all of the stories featured above were derived from an organization submitting a FOIA request and obtaining internal documents that revealed how communities were being harmed by a government agency.

Sometimes, a party may refuse to comply with a FOIA request and the situation will escalate to legal action. As an example, in 2024, Just Futures Law, Mijente Support Group, and the Samuelson Clinic filed a lawsuit to force ICE to comply with a 2021 FOIA request that ICE failed to respond to. Because these situations can turn contentious, it’s important to have legal support when pursuing a FOIA request, which can come from an attorney, a law firm, or a law school clinic.

Spread the word

In order to combat these issues, people have to be informed about the mass surveillance that they are subjected to on a daily basis. Many people have expressed the sentiment “If you’re not doing anything wrong, you have nothing to hide”; however, they may not be fully versed on the implications of surveillance on vulnerable communities who have done nothing to warrant this invasion into their privacy. Some ways to spread the word can include holding public meetings on various surveillance topics with speakers, organizing against local surveillance tactics and publicizing the action, speaking with community members to see if they’ve noticed any surveillance tactics in their neighborhood, and working with other social justice, tech, or legal organizations. As stated above, a legal organization or clinic can help social justice organizations litigate FOIA requests that are not complied with as well as provide assistance with other kinds of litigation as needed. Social justice organizations can also work with think tank organizations to produce reports on civil rights violations and inform the public of rising issues. After the report is released, organizations can sign onto a letter calling on the government to stop an action or support an action.

Conclusion

There is much work to be done in the digital privacy space as the law has failed to keep up with the advancement of technology and rising surveillance concerns. However, everyone is capable of becoming well-versed in these issues, pushing for change on the state and federal level, and spreading the word throughout their communities. Privacy can be nearly impossible to achieve on an individual level, but together we can fight against efforts that degrade, dehumanize, and obstruct freedoms in our society.

Appendix 1. Legal Organizations to Know

Legal organizations in the privacy space are well versed on surveillance issues and could help social justice organizations know where to turn when individuals they serve come under surveillance. The following organizations are prominent in the privacy rights space and are performing groundbreaking work to combat government overreach.

ACLU

The ACLU’s Project on Speech, Privacy, and Technology focuses on the right to privacy, ensuring individuals have control of their personal information, and protecting individuals’ civil liberties as new advances are made in science and technology. The project focuses on consumer privacy, internet privacy, location tracking, medical and genetic privacy, national ID, privacy at borders and checkpoints, surveillance technologies, and workplace privacy.

S.T.O.P.

The Surveillance Technology Oversight Project fights government surveillance through advocacy and litigation and hopes to transform New York into a pro-privacy state. S.T.O.P. organized over 100 organizations to get the POST Act approved in the state, has sued city and state agencies for records pertaining to a variety of issues such as NYPD’s use of FRT, and also publishes research papers on different surveillance technologies.

Brennan Center for Justice

The Brennan Center for Justice is a nonpartisan law and policy organization that works to defend democracy and justice. One of their initiatives is privacy and free expression. Through that project, the Brennan Center works to challenge mass surveillance policies that are overreaching and works to inform the public about these issues. The Brennan Center has been a leader in challenging the structure of the Foreign Intelligence Surveillance Court and the fact that the court only hears from the government when government agencies seek to obtain people’s data.

Center on Privacy and Technology

The Center on Privacy and Technology is a think tank focused on privacy and surveillance law and policy. The Center fights back against surveillance by conducting long-term investigations and research and publishing reports with its findings. Its most recent report, Raiding the Genome: How the United States is abusing its immigration powers to amass DNA for Future Policing, discusses migrants having their privacy rights invaded by the Department of Homeland Security (DHS). DHS is taking DNA samples from detainees, which are later stored in the FBI’s database, CODIS.

Just Futures Law

Just Futures Law works alongside activists, organizers, and community groups to dismantle mass surveillance, incarceration, and deportation via advocacy and legal support. They have worked on ending ICE digital prisons, stopping data brokers from selling people’s data to ICE which could lead to deportation, ending the digital border wall, protecting driver data from being turned over to ICE, amongst many other projects.

What do we mean by ‘trust and fairness’ in AI?

On fairness

On public trust

How can existing law be used as a tool for trust and fairness?

The challenge of implementation: why good intentions fail to produce trusted outcomes

Policy Levers to Advance AI Fairness and Build Public Trust

Guardrails in Government Use of AI

Public Engagement

Sector-Specific Interventions

Redress and Remedies

From Ideas to Action

Challenge and Opportunity

Harms to Communities from Rapid Expansion of AI Infrastructure

Centering Community Needs in AI Infrastructure Development

Why Community Benefit Agreements?

Why now?

Plan of Action

Recommendation 1. Policymakers (and CBOs and community leaders negotiating on behalf of communities) should utilize specific provisions to address harms and provide mitigations, to increase transparency, and to steward ongoing governance and accountability.

Harm Remediation

Transparency, Governance & Accountability

Recommendation 2. Policymakers and CBOs negotiating on behalf of communities should require investment in communities as a baseline condition for any equitable agreement.

Recommendation 3. Policymakers (and/or community negotiators) should proactively identify and put the supporting mechanisms in place for meaningful representation, negotiation, enforcement, and accountability.

Conclusion

Summary Table 1. Key Provisions of Data Center CBAs

Challenge and Opportunity

Inert Fines, Federal Constraints, and State Action

Successful Lawsuits Against Defective Algorithms, Addictive Product Design

Plan of Action

Putting Settlement Dollars to Work

Table 1. Settlement Funds, Examples from Finance to Public Health

Recommendation 1. Establish a Digital Resilience Fund

Conclusion

Acknowledgements

Contents

Introduction

Face Recognition Technology Has Caused Significant Harms

Face Recognition Technology is Increasingly Widely Used

Face Recognition Difficulty Varies Significantly

What Do We Mean by Bias in Face Recognition?

Outline of the Rest of the Report

1. Improvements in accuracy have not eliminated bias.

2. Bias is difficult to measure and difficult to fully eliminate.

3. Harms arise from both technical errors and how systems are used.

4. Face recognition should be understood as a sociotechnical system.

5. Policy interventions can reduce harms even without perfect technical solutions.

6. Governance of use is as important as improving the technology.

Glossary

How Face Recognition Works

A Brief History of Face Recognition

How Face Recognition Models Are Trained

Face Recognition in Use Today

Face Recognition Performance Across Different Conditions

Challenges in Real-World Face Recognition

Defining and Measuring Bias in Face Recognition

Absolute vs. Relative Error

NIST Experiments on Demographic Variation

Challenges in Measuring Bias in Face Recognition

Sources of Bias in Face Recognition Systems

The Contribution of Dataset Bias

Sources of Bias Beyond the Data

Reductions in Bias Over Time

The Human Factor: Face Recognition Systems as Part of a Sociotechnical System

Limitations of Human Oversight

User Errors

Policy Interventions to Address Bias in Face Recognition Systems

Research

Measure and Reduce Bias

Regulate Sociotechnical Use of Face Recognition

Conclusions

Appendix: Variations in Bias Over Time

A Current History of Digital Surveillance

Immigration

Criminal Legal System

Environmental Justice

Housing

Workplace Surveillance

Digital Surveillance: Where We Are Now

Federal Legislation

Digital Surveillance in New York State

How to Take Action Against Surveillance