Move Algorithm-Driven Pay and Scheduling Systems From Surveillance Pay to Fair Wages

Employers increasingly rely on scheduling, timekeeping, and payroll software to determine hours, eligibility, and pay. When monitoring data and optimization rules feed these systems (what this memo refers to as “algorithmic wage-setting”), the practice rarely appears as a standalone tool. It shows up as configured rules and thresholds, time edits, automatic deductions, and eligibility flags that can quietly change compensable time and earnings. A 2025 Equitable Growth brief describes this dynamic as “surveillance pay”: the use of granular monitoring data integrated into pay systems to set compensation and calculate wages in ways that can disconnect time from pay and make outcomes harder to predict, audit, and challenge for discrimination.

States are already moving to regulate surveillance/algorithmic wage-setting, but proposals focus on prohibition and basic notice rights. This memo complements those efforts by centering the enforcement reality: payroll and timekeeping are the system of record and the regulatory choke point. It pairs guardrails on non-job-related data use with an enforcement operating model (audit-ready decision trails, integration/egress mapping, standardized audits and complaints triage, and minimum operational standards) so agencies can prove violations, correct errors quickly, and prevent repeat harm using preexisting wage-and-hour, civil rights, consumer protection, and procurement authority.

Challenge and Opportunity

Core labor protections such as minimum wage, overtime, predictable scheduling, and anti-discrimination regulations increasingly run through proprietary workplace systems that employers and vendors configure but that workers and regulators often cannot see or challenge. As these tools spread across white- and blue-collar industries (including healthcare, retail, logistics, food service, manufacturing, construction, and public services), they can normalize hidden wage loss, income volatility, and unequal treatment, especially when employers use surveillance-derived metrics to change pay tiers, incentives, benefits eligibility, or hours without clear notice or a workable way to challenge errors.

Why payroll and timekeeping are the focus.

In most workplaces, pay and schedules do not come from a single “algorithmic wage tool.” Instead, they come from connected systems that track hours, assign shifts, and apply workplace rules, and that then feed into HR and payroll systems, which serve as the official record for compensation.

This memo focuses on payroll and timekeeping/scheduling because that’s where data turns into earnings: wages, hours paid, premiums, bonuses, and benefits eligibility. It is also where states can most realistically require auditable records, set clear limits on what data can influence pay decisions, and enforce worker rights.

Worker data typically flows through a simple data chain:

  1. Capture: timekeeping, scheduling, attendance, and productivity/monitoring tools record events (clock-ins, breaks, shift changes, performance flags). 
  2. Integrate: HR and payroll systems (and their vendors/subcontractors) pull those inputs together and link them to pay rules. 
  3. Decide: configured rules, thresholds, or models trigger pay-affecting actions—time edits, automatic deductions, eligibility flags for premiums/bonuses, schedule adjustments, and pay calculations. 
  4. Pay out: the results appear in payroll as wages, hours paid, premiums, bonuses, and take-home pay.

Because payroll and timekeeping are the official record, regulators cannot rely on the paycheck alone. Instead, regulators need to see and audit the system’s decision trail, which includes the data sources that were used, the rules or thresholds that were applied, what changed (e.g., edits, deductions, eligibility flags), and who made (or approved) any changes.
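
As a rough illustration of what one entry in such a decision trail might contain, the Python sketch below models a single pay-affecting event. The class name `DecisionTrailEntry`, its field names, and the sample values are all hypothetical, chosen only to mirror the elements described above (data sources, rule applied, what changed, who made or approved the change). It does not describe any vendor's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class DecisionTrailEntry:
    """One pay-affecting action as an auditor might want it recorded (hypothetical schema)."""
    worker_id: str
    occurred_at: datetime
    action: str                 # what changed, e.g. "time_edit", "auto_deduction"
    data_sources: List[str]     # which systems or feeds supplied the inputs
    rule_id: str                # which configured rule or threshold was applied
    before_value: str
    after_value: str
    made_by: str                # a person, or "system" for automated actions
    approved_by: Optional[str] = None


def missing_elements(entry: DecisionTrailEntry) -> List[str]:
    """Return the decision-trail elements that cannot be reconstructed from this entry."""
    gaps = []
    if not entry.data_sources:
        gaps.append("data_sources")
    if not entry.rule_id:
        gaps.append("rule_id")
    if entry.before_value == entry.after_value:
        gaps.append("what_changed")
    if not entry.made_by:
        gaps.append("made_by")
    return gaps


# Illustrative entry: an automatic deduction whose triggering rule was never logged.
entry = DecisionTrailEntry(
    worker_id="W-001",
    occurred_at=datetime(2025, 3, 4, 14, 30),
    action="auto_deduction",
    data_sources=["timeclock"],
    rule_id="",                 # rule not recorded: an audit gap
    before_value="8.00 hours",
    after_value="7.50 hours",
    made_by="system",
)
print(missing_elements(entry))
```

A completeness check of this kind is one way an agency or employer could spot entries that show an outcome without the mechanism behind it; which fields count as mandatory would be a policy choice, not something this sketch settles.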

Added risk pathway: third-party intermediaries.

Worker data does not always stay inside a single employer system. In some instances, third parties such as verification services, analytics intermediaries, and sometimes data brokers/resellers collect and commercialize worker-related data and feed it back into workplace tools in the form of aggregated scores, flags, or “risk/reliability” signals that can affect scheduling, wages, and/or compensation.

Without clear limits on, and disclosure of, this type of third-party data sourcing and onward sharing for decisions that affect pay and timekeeping (including brokered data and broker-derived scores), non-job-related data can also shape pay and scheduling indirectly while obscuring provenance (who supplied it), purpose (why it was used), and accountability (who is responsible).

From a regulatory standpoint, the risks typically concentrate in four areas:

  1. Implementation/configuration failures: rollouts, integrations, default settings, or rule changes that trigger underpayment or missing premiums.
  2. Improper inputs/uses in pay or timekeeping decisions: non-job-related personal data (including surveillance-derived metrics and brokered inferences) used to set or modify wages, hours, eligibility, or incentives.
  3. Secondary use and onward sharing (data governance risk): worker pay/HR data repurposed, shared, or sold beyond payroll/service delivery, potentially re-entering decision systems as scores, flags, or eligibility signals.
  4. Black-box accountability gaps: systems that prevent workers, unions, and regulators from seeing which inputs and rules produced pay outcomes.

Understanding these risk areas is key from a regulatory standpoint because the question becomes not only what the paycheck says, but also which rules and inputs influenced any changes in wage or compensation calculations and whether those inputs are legitimate and traceable.

The recommendations that follow do three things: (1) cut off high-risk data inputs, (2) require audit-ready decision trails, and (3) give workers enforceable rights to notice, explanation, and correction.

Why states should act now.

We already have evidence that algorithmic pay is common in some sectors of the labor market, and that payroll “modernization” rollouts can cause widespread pay errors when software becomes the system of record. Even if “surveillance wages” are not yet widespread beyond the gig economy, that is the point: states can act upstream, before these tools harden into default infrastructure. In parallel, states are introducing surveillance-pricing prohibitions, signaling growing legislative appetite to regulate data-driven personalization and discrimination before it becomes entrenched.

Below are examples of the ways this trend is taking shape:

These examples show how payroll and timekeeping systems are often the choke point because they encode pay rules, execute pay-affecting actions (like time edits and eligibility flags), and generate, or withhold, the audit trail regulators need to verify compliance. 

Harms this proposal targets (and what we know about scope)

This memo targets a specific set of harms that arise when employers route compensation decisions through timekeeping, scheduling, and payroll systems (often with third-party inputs). 

These harms fall into five buckets:

  1. Hidden wage loss and underpayment.
    Examples include time edits and reclassifications, automatic deductions (e.g., meal breaks), missing premiums/differentials, or misapplied overtime triggers that reduce pay without a clear explanation or easy correction path.
    What we know: wage-and-hour complaints and litigation regularly surface these mechanisms, especially when payroll/timekeeping becomes the system of record.
  2. Income volatility and scheduling instability.
    Automated scheduling and rule-based eligibility can drive unpredictable hours, unstable earnings, and difficulty budgeting, especially when rules change inside proprietary systems.
    What we know: volatility is well-documented in app and gig-based labor markets and is a growing concern as similar logic moves into traditional workplaces.
  3. Discrimination and disparate impact at scale.
    Surveillance-derived metrics, proxy variables, and eligibility flags can embed unequal treatment in pay, hours allocation, or access to premiums/bonuses, especially when workers cannot see or contest the underlying rule or data input.
    What we know: civil rights risk is structural when decisioning relies on opaque metrics and limited contestability; disability advocates flag heightened vulnerability due to higher fixed costs and budgeting constraints.
  4. Accountability failures (“black box” enforcement gaps).
    When the system’s decision trail is unavailable, employers can’t explain pay outcomes, workers can’t self-advocate, and agencies can’t prove violations, turning basic labor protections into an after-the-fact guessing game.
    What we know: this is a recurring barrier in investigations and disputes involving payroll/timekeeping platforms and integrated tools.
  5. Data governance harms (secondary use and third-party re-entry).
    Worker pay/HR data may be repurposed, shared onward, or reintroduced via third-party scores/flags (e.g., verification, analytics intermediaries, brokers), shaping pay and scheduling indirectly while obscuring provenance and accountability.
    What we know: third-party ecosystems exist and can influence eligibility/access decisions; the risk increases when data egress and sourcing aren’t disclosed.

Given these harms, this memo seeks to reduce wage loss, volatility, and discrimination by (1) limiting high-risk inputs and secondary use, (2) requiring audit-ready decision trails and integration/egress visibility, and (3) giving workers practical rights to notice, explanation, and correction.

Plan of Action

Recommendation 1. Establish a clear guardrail on compensation data use.

Adopt legislation to create the bright-line ban, scope, and remedies, then reinforce it through existing wage/civil rights/UDAP enforcement and procurement requirements for public employers and contractors.

States should adopt a bright-line rule that bars employers and vendors from using non-job-related personal data (including brokered data and broker-derived scores or classifications) to set or change wages, hours, bonuses, differentials, benefits, or pay eligibility. “Non-job-related personal data” means any data or inference not reasonably necessary and proportionate to determine hours worked, pay owed, or job-related compensation factors, which are limited to seniority, job classification, documented skills/credentials, objective shift attributes (e.g., nights/weekends/hazard pay), location-based cost adjustments, and transparent performance metrics tied to job duties (not biometrics, health inferences, parenthood status, home address, or off-duty behavior). This targets the core risk: opaque, individualized wage manipulation.

To prevent loopholes and misclassification incentives, the guardrail should:

Recommendation 2. Make enforcement practical: require audit-ready records for algorithmic pay and scheduling systems.

Use rulemaking/guidance and enforcement to require decision-trail records and standardized audits, reinforced through procurement requirements for public employers and contractors, and use targeted legislation only if agencies lack clear authority to compel retention/production or to cover vendors directly.

This recommendation targets two recurring failure modes: (1) rollout/configuration errors (especially during integrations) and (2) black-box systems that prevent regulators from showing what the software did and why. Guardrails only work if agencies can access the decision trail behind pay outcomes. 

Agencies already use payroll records/paystubs, time and attendance data, schedules, job classifications and rate tables, and worker complaints. But those records often show only the outcome, not the mechanism; they rarely reveal which rules, inputs, or system changes produced a pay result. To enforce wage and civil rights protections when software mediates pay and scheduling, agencies must also require retention and production of:

These missing records are not “nice to have”; they are the minimum evidence needed to audit pay outcomes when software is the system of record. To close this enforcement gap, states should do two things at once: (1) require retention and production of decision-trail records, and (2) standardize how agencies request, analyze, and enforce them.

Actions states can take now include:

  1. Modernize payroll recordkeeping. Require employers (and covered vendors where appropriate) to retain and produce audit trails, rule/configuration history, and integration/egress maps as standard payroll records.
  2. Standardize an audit protocol (Labor and State Attorneys General). Use a shared checklist and data request template to compare system outputs to hours worked/pay owed and identify repeat patterns (missing premiums, unexplained deductions, volatility, disparate impact). A small interagency working group should maintain templates, secure intake, and a vendor/system map.
    • Rapid supply-chain mapping: for each investigation, map (1) payroll/HRIS, timekeeping, scheduling, and monitoring systems; (2) each vendor/subcontractor processing worker data; (3) third-party sources supplying scores/flags; (4) which fields feed which pay/eligibility rules; and (5) any onward sharing/sale of worker data.
    • Audit templates should include both case-level review (individual decision trails) and pattern tests (aggregate metrics that reveal systematic underpayment, volatility, or disparities after rollouts or rule changes).
  3. Use procurement as leverage. For public employers and contractors, require auditability, data retention, worker notice, and cooperation with investigations as contract conditions. Contracts should also prohibit undisclosed sale/sharing of workforce and pay data and prohibit using worker pay/HR data for analytics, benchmarking, or model training unrelated to the contracted service, with audit rights and penalties for noncompliance.
  4. Set minimum standards for pay-affecting vendor practices (rule-setting and procurement). States do not need to regulate every feature of payroll and scheduling software to reduce harm. A practical approach is to set a small number of baseline, enforcement-ready standards through State Attorney General labor enforcement guidance, settlement terms, and public procurement that target the most common ways software drives wage loss and blocks accountability.
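
To make the pattern tests in the audit protocol (item 2 above) more tangible, here is a minimal Python sketch that compares hours recorded by timekeeping with hours that reached payroll, before and after a system rollout. The record layout, the function names, the 0.01-hour tolerance, and the sample data are all assumptions made for illustration; a real audit template would define these from the agency's own data request.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class PayPeriodRecord:
    worker_id: str
    period_end: date
    hours_recorded: float   # hours captured by timekeeping before any edits
    hours_paid: float       # hours that reached payroll


def shortfall_rate(records: List[PayPeriodRecord], tolerance: float = 0.01) -> float:
    """Share of records where paid hours fall short of recorded hours by more than the tolerance."""
    if not records:
        return 0.0
    short = sum(1 for r in records if r.hours_recorded - r.hours_paid > tolerance)
    return short / len(records)


def compare_around_rollout(records: List[PayPeriodRecord], rollout: date) -> Dict[str, float]:
    """Pattern test: compare shortfall rates before and after a rollout or rule change."""
    before = [r for r in records if r.period_end < rollout]
    after = [r for r in records if r.period_end >= rollout]
    return {"before": shortfall_rate(before), "after": shortfall_rate(after)}


# Illustrative data only: one worker loses 1.5 paid hours after the rollout date.
records = [
    PayPeriodRecord("W-001", date(2025, 1, 15), 80.0, 80.0),
    PayPeriodRecord("W-002", date(2025, 1, 15), 76.0, 76.0),
    PayPeriodRecord("W-001", date(2025, 2, 15), 80.0, 78.5),
    PayPeriodRecord("W-002", date(2025, 2, 15), 76.0, 76.0),
]
result = compare_around_rollout(records, rollout=date(2025, 2, 1))
print(result)
```

A jump in the shortfall rate after a rollout does not prove a violation on its own; it is a signal that the case-level decision trails for the affected pay periods deserve review.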

To make this action (#4) more concrete, states can start with a brief list of “minimum operational standards” that directly targets the most common ways payroll and timekeeping systems reduce pay and block accountability.

States can pursue four minimum operational standards:

When to act. Agencies should open an investigation when complaints jump right after a new system rollout, when time edits or auto-deductions show up unusually often, when workers can’t get a plain-English explanation or timely correction, or when it looks like third-party/non-work data is affecting pay, hours, or eligibility. To do this consistently, agencies should use a simple, standardized intake and escalation process that logs the employer, the vendor/system (when known), and the issue type and flags patterns that should be reviewed by a designated triage team.
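
The intake and escalation step described above could be sketched roughly as follows. This Python example logs the employer, the vendor or system when known, and the issue type for each complaint, then surfaces employer and issue combinations that recur. The `Complaint` fields, the issue labels, and the escalation threshold of three complaints are hypothetical placeholders, not recommended values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Complaint:
    employer: str
    vendor: Optional[str]   # vendor/system, when known
    issue_type: str         # e.g. "time_edit", "auto_deduction", "no_explanation"


def patterns_for_triage(complaints: List[Complaint], threshold: int = 3) -> List[Tuple[str, str, int]]:
    """Return (employer, issue_type, count) groups that meet the escalation threshold."""
    counts = Counter((c.employer, c.issue_type) for c in complaints)
    return sorted(
        (employer, issue, n) for (employer, issue), n in counts.items() if n >= threshold
    )


# Illustrative intake log with made-up employers and vendors.
intake = [
    Complaint("Acme Retail", "PayrollCo", "auto_deduction"),
    Complaint("Acme Retail", "PayrollCo", "auto_deduction"),
    Complaint("Acme Retail", None, "auto_deduction"),
    Complaint("Acme Retail", "PayrollCo", "time_edit"),
    Complaint("Beta Logistics", None, "no_explanation"),
]
flagged = patterns_for_triage(intake)
print(flagged)
```

Grouping by employer and issue type is only one possible cut; an agency might equally group by vendor to see whether the same system generates similar complaints across several employers.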

Recommendation 3. Guarantee worker-facing transparency and contestability: a right to know, a right to an explanation, and a right to correct.

Use agency guidance/rules and procurement to require notice, explanations, and fast corrections where agencies already have authority; use legislation to create new worker rights (access, deadlines, anti-retaliation) where needed; and use enforcement to hold employers and vendors accountable when notices or records are missing, false, or misleading.

Enforcement alone often leaves workers waiting months for relief. States should therefore require worker-facing transparency for any automated system that sets pay or materially shapes earnings through time classification, scheduling, differentials, bonuses, or pay eligibility so workers can spot problems early, document patterns, and seek timely correction. Aggregated reporting can help identify systemic issues, but it does not replace a worker’s right to see and contest the records that determine their individual pay.

Privacy and data-broker rules (e.g., CCPA/CPRA-style disclosure and Delete Act-style broker mechanisms) provide useful templates for disclosure and access rights in the worker-pay context.

A worker rights package focused on this issue would include: 

Worker-facing transparency also strengthens enforcement: it creates documentation, reduces information asymmetry, and helps agencies identify employers and vendors that warrant priority investigation.

Conclusion

Fair and trustworthy workplace technology starts with something workers understand: a paycheck they can trust and a schedule they can plan around. The evidence is clear: algorithmic pay-setting is established in app-based work, and payroll/timekeeping failures show how software can produce systemic wage harm at scale. States can act now using existing labor, civil rights, consumer protection, and procurement authority, strengthened by a prohibition on surveillance wage-setting, enforcement-ready decision trails, and worker rights to notice, explanation, and correction, so that “efficiency” doesn’t come at the expense of fairness, dignity, accessibility, or basic economic security.

Frequently Asked Questions
Does this trend require new legislation?

Not necessarily, but targeted legislation is often the cleanest way to close emerging gaps. Policymakers can approach AI-mediated pay and scheduling in three lanes:


1. Enforce existing laws now. A large share of the harms described in this memo can already be investigated and remedied under preexisting wage-and-hour enforcement, recordkeeping requirements, civil rights/equal pay law, consumer protection (UDAP), and procurement authority.


2. Use rulemaking and guidance to modernize existing authority. Even where statutes are strong, enforcement can fail if agencies cannot access the documentation that explains how software produced pay outcomes. States can often use rulemaking, guidance, and standardized audit protocols to clarify that payroll records and compliance obligations include automated decision records (audit logs), pay-rule/configuration history, and basic documentation of upstream data sources/integrations when software is the system of record.


3. Use new legislation as a targeted backstop. Where current law does not clearly reach upstream practices, especially the use of surveillance-derived or non-job-related personal data to set or modify compensation, targeted legislation can establish bright-line prohibitions (e.g., banning surveillance wage-setting), extend coverage to contractor/platform arrangements where algorithms determine pay, and ensure vendor accountability, cooperation, and meaningful remedies. Examples include Colorado’s HB26-1210 and New York’s proposed prohibition on algorithmic wage-setting (S8872 and Assembly companion A09641), as well as bills that explicitly address surveillance-based wage setting or wage discrimination (e.g., Maryland HB0148; Minnesota HF4131).


Policymakers should also expect to see broader bills that create baseline rights and duties for automated tools across a wider range of employment decisions (not only wages and scheduling, but also hiring, promotion, discipline, and termination). In that context, the guardrails in this memo, especially a prohibition on surveillance wage-setting, can be adopted as a compensation-focused module within a broader worker-tech protections package.

Two examples that may be useful to consider.

Colorado. Colorado’s HB26-1210, Prohibit Surveillance Price & Wage Setting, would prohibit individualized wage setting (and individualized pricing) when a “price or wage setting algorithm” uses surveillance data and the algorithm’s output is a substantial factor in determining the wage offered to a worker. The bill also takes an enforcement-ready approach: it treats violations as a deceptive trade practice under the Colorado Consumer Protection Act, authorizes the Attorney General to adopt rules, and requires entities using these systems to publish procedures that promote data accuracy and allow workers to request information about the data used to set wages and to correct or challenge that data.


New York. New York lawmakers are considering a direct prohibition on algorithmic wage-setting (S8872), including penalties and a private right of action. New York also has proposals in the broader worker-tech rights direction, such as measures focused on disclosure and inventories of automated employment decision-making tools in the public sector and related employment contexts. This illustrates a practical model: enforce now under existing wage, recordkeeping, and civil rights authority; use rulemaking to make records and audits enforcement-ready; and codify new guardrails where emerging tech creates gaps.

Will this slow innovation or burden employers?
The framework in this memo does not prohibit AI tools; it requires transparency, recordkeeping, and accountability, all standards already expected in other regulated contexts. In practice, these guardrails enable responsible innovation by preventing payroll and wage-setting systems from becoming error-prone black boxes that generate disputes, litigation, and backlash. Broken or opaque deployments undermine workers and public trust and make it harder for employers and vendors to deploy genuinely beneficial algorithm-driven systems at scale.
Why focus on states instead of federal agencies?
With federal enforcement capacity constrained, states are the most viable actors to act quickly, pilot solutions, and set de facto national standards. States can serve as testing grounds for practical implementation (for example, by helping to determine which records to retain, how audits work, and which worker notices are effective) and then share what works across jurisdictions. While a patchwork of state rules will prompt pushback, a core goal of this memo is to promote harmonizable baselines (common definitions, recordkeeping standards, and audit protocols) that reduce compliance friction and encourage vendors and large employers to standardize upward rather than race to the bottom.
How does this help workers directly?
Workers gain clearer pay explanations, the ability to contest errors, and stronger enforcement when AI systems undercut wages or stability.
What are “automated decision records” (sometimes called “algorithmic logs”)?
They are audit trails: the digital records showing when and how software affected pay or scheduling, such as time edits, automated deductions, rule/configuration changes, eligibility flags, calculation outputs, timestamps, and the data source that triggered each change.
What is an “integration map”?
A list (or diagram) of which systems and data sources feed into HR/payroll (such as timekeeping, scheduling, attendance systems, productivity tools, GPS/location data, performance dashboards, or customer ratings) and which of their fields can affect pay.
What does “contestability” mean in practice?
A clear path for workers to see what changed, request correction, and get timely human review without retaliation, plus the ability for unions to incorporate these rights into collective bargaining agreements.

How State Leaders Can Put People First in AI Decision-Making

How State Leaders Can Put People First in AI Decision-Making is a framework to ask and answer the right foundational questions about artificial intelligence (AI) from the beginning. The public wants the government to take action to ensure the power of AI technology is used for good. In the current political climate, the work of state leaders is critical. The recommendations in this memo are focused on helping state leaders across the country ground decision-making about AI use in fairness, accountability, evidence-based inquiry, and inclusive governance so that AI can work for people.

Many state agencies have already deployed or are considering using AI in consequential decisions related to healthcare, housing, education, policing, finance, and other highly sensitive areas. While a few states have taken steps to implement decision-making mechanisms for certain AI systems, too many leaders are simply accepting narratives about AI’s purported public benefit at face value – jumping to the “how” of AI implementation before thoroughly vetting potential systems and deciding whether they are appropriate to use at all.

State officials may be eager, and even feel pressure, to tap into the potential benefits of AI in the hopes of better serving their constituents. But the personal, political, and operational risks of AI use should not be underestimated. People across the political spectrum are deeply concerned about the impact of AI on their lives and these concerns are well-founded. There have already been numerous examples where the failure to center people in AI decision-making and use has resulted in government systems that range from inefficient and wasteful to disruptive and downright dangerous, causing significant harm to, and backlash from, community members.

For AI’s potential benefits to be realized, state leaders need to implement consistent, inclusive people-first AI decision-making structures. Crucially, this process should ask the foundational question of whether to use AI in the first place. This policy memo provides timely guidance on:

Rather than offering a one-size-fits-all approach, this memo provides a suite of mechanisms for engaging thoughtful AI decision-making with examples of how different state governments have tackled emerging AI issues. We give recommendations for how state leaders can implement the AI decision-making process for whichever path they choose, including methods to promote accountability so that the decision-making process is followed and can truly work to put people first.

Challenge and Opportunity

The use of AI by state agencies is growing. By 2024, 59% of state and local government employees reported that their agency had already made an AI application available for use and a majority of public sector employees reported using AI applications either several times a week or daily.

Generative AI (GenAI) systems and agentic AI systems are now joining machine learning and automated decision-making systems (ADS) that have been in use for many years – with the lines between the types of systems blurring as AI products become increasingly integrated.

AI is also being applied in many high-stakes situations where mistakes or bias can have life-altering ramifications. AI systems now make decisions that can affect the lives of tens of millions of low-income people in the United States, from determination of SNAP benefits, to Medicaid enrollment, to Social Security disability payments. Sixty percent of people in the United States live in a jurisdiction that employs some sort of pretrial risk assessment tool that uses AI. According to one AI surveillance vendor, thousands of police departments in the United States are using face surveillance.

While many policymakers may be enticed by the promise of AI, people across the country and political spectrum have deep concerns. As of 2025, only 17% of the general public believes AI will positively impact the United States. Americans broadly oppose AI being used in high-stakes decision-making, like health insurance, loan applications, and job screening. A 2025 poll of U.S. voters found that 82% said they do not trust technology leaders to tackle regulation independently. A supermajority – 69% – of the U.S. public does not think the government is doing enough to regulate AI.

How does the public feel about AI?

More than 50% of people in the U.S., and 65% of low-income people, fear being left behind by AI. Only 4 in 10 people ages 18-34 in the U.S. say that they “trust” AI and only 23% of people in the U.S. over age 55 trust AI systems. As AI advances, public anxiety grows. Polling reveals that 77% of people in the U.S. want companies to “take AI creation slowly to get it right the first time.”

Public concerns with AI are well-founded. Former high-profile staffers at several AI companies have warned that companies are moving too fast and minimizing AI’s deficiencies, with new AI systems “generating more errors, not fewer.” While the technology industry is pushing the pedal on AI, the public would like to hit the brakes and for leaders to “do something before it goes too far.”

In the rush to adopt AI, some government officials have been making mistakes. The most impacted communities, including low-income communities and communities of color, often end up excluded from public deliberation about government use of technology. There are already numerous examples of how these same communities bear the brunt when there is a lack of people-centered AI decision-making:

There are high costs for improper AI use – for the people whose lives are impacted, in the state dollars that are invested, and in how these actions can further undermine trust in government.

At their best, AI systems can help improve government functions. They have the potential to be used to triage community feedback, provide translation services that make government more accessible, facilitate emergency preparedness, or aid scientific research, among other uses. For example, Maryland’s Department of Labor is partnering with academic researchers to help test how AI can train staff and assist caseworkers with compliance regulations and other complex paperwork.

People want government leaders to take action to ensure AI technology is used for the public good. As the current administration has undermined safeguards at the federal level and issued executive orders attempting to stifle state action on AI, the continuing work of state leaders to safeguard rights and center people in AI decision-making has become even more critical.

A few states have already taken some steps to implement process mechanisms for AI decision-making and potential use. These include: Connecticut’s Act Concerning Artificial Intelligence, Automated Decision-Making and Personal Data Privacy and AI Responsible Use Framework; California’s State Guidelines for Evaluating Impacts of Generative AI on Vulnerable and Marginalized Communities; Maryland’s Responsible AI Policy; New York State’s 2024 LOADinG Act; and Texas’ Responsible Artificial Intelligence Governance Act.

While these steps are an important start, more needs to be done given what is at stake with AI use and its potential impact on people’s rights, livelihoods, and personal safety. For the potential benefits of AI to actually be realized for community members, strong state leadership in this moment is needed to pierce through the hype. This memo lays out a plan of action for state leaders to implement consistent, inclusive people-first AI decision-making structures that do not skip over the foundational questions of why and whether to use AI in the first place.

Plan of Action

State leaders should establish a people-centered decision-making process that consistently and thoughtfully considers why and whether to use AI before jumping to use policies or other safeguards. This process should be followed whenever a state is considering the acquisition or use of an artificial intelligence system, whether through formal procurement, partnerships, in-kind donations, or other means. This decision-making process should be utilized when considering any AI system that has the potential to impact people’s rights, opportunity, well-being, safety, and security.

In the following section we provide:

The Four Key Steps for People-First AI Decision-Making

Step 1. Articulate a specific and inclusive “why” for AI use that centers the interests and voices of diverse community members to identify problems and needs.

State leaders should ensure that the first step in decision-making about any existing or potential use of an AI system is for an agency to articulate a specific and inclusive “why” that centers the interests and voices of a wide range of community members. Particular attention should be paid to historically marginalized communities. This community engagement should happen before the procurement or use of any AI system.

Key considerations for centering diverse community members include, but are not limited to:

Inclusivity and representation: Use multiple strategies to support participation from diverse stakeholders, including funding and support for state agency outreach. Develop potential partnerships with trusted local organizations such as community groups, faith-based organizations, schools, and neighborhood associations who can help spread the word, organize meetings, and share information and surveys with diverse community members.

Accessibility: Make it possible for diverse community members to be actively engaged through a combination of in-person and remote engagement mechanisms. Also provide asynchronous paper and online surveys distributed in multiple languages in easy-to-understand formats. Information about any proposed AI systems should describe how a system would work and what it would do in ways that the general public can understand. Schedule any in-person meetings in places and times when diverse community members will be able to attend and provide necessary support for participation, like childcare and transportation. Remote meetings should also be scheduled at a time in the day when working people and people with families can attend.

Power sharing: Centering diverse voices means meaningful collaboration, not token consultation. Community members should have genuine influence on determining what are the most important issues facing them and how they should be addressed. You should listen to community members about any non-AI solutions that they would prefer and why.

Transparency and Accountability: Be clear about the engagement process and ensure it allows for serial feedback. Make sure materials are publicly published and easily accessible on a government website in a timely manner to allow public engagement with the process. Articulate how community input will be incorporated and have a mechanism to report back to the community on how their input influenced the ultimate decision.

California took important steps to promote effective community consultation when it issued the State Guidelines for Evaluating Impacts of Generative AI on Vulnerable and Marginalized Communities. Authored by the state Government Operations Agency, Office of Data and Innovation, and California Department of Technology, the guidelines recognize the need for a systematic approach that leads with meaningful engagement with diverse communities and how critical it is to specifically consider potential impacts on vulnerable and marginalized communities. Appendix B of California’s guidelines provides some additional helpful guidance on key principles, structures, activities, and focus questions for community consultation.

Step 2. Conduct an AI Impact Assessment that evaluates public benefits and risks, including how the AI system would use people’s information, its impact on rights, and risks of discrimination and bias.

Technology vendors often tout the benefits and downplay costs and risks. It is crucial that amidst the hype state leaders create the structures and processes to support evidence-based decisions about a potential system’s public benefit and risks and avoid AI “snake oil” that wastes state resources and does more harm than good.

State leaders should ensure that there is an AI Impact Assessment (AIIA) to evaluate and explain how the proposed AI system will work, the evidence for its effectiveness and potential public benefit, and its potential for harm (for implementation advice, see the section “Mechanisms to Operationalize People-Centered AI Decision-Making” below). The process should include a public comment period for engagement with the AIIA so people can bring up additional information and concerns. Leaders should also ensure that any company they potentially contract with provides them with the necessary information to conduct an AIIA. Don’t let vendor claims, including claims about potential trade secrets, prevent meaningful review of the vendor’s products and services.

An AI Impact Assessment (AIIA) should include:

Step 3. Use a decision-making standard that is based on diverse community considerations and an evidence-based inquiry that the public benefit justifies the proposed use and substantially outweighs the potential harms.

Decisions about why and how to deploy AI should be driven by the real needs and interests of impacted communities. Using the AI Impact Assessment and the input and preferences of potentially impacted communities, the agency or department should apply a public benefit standard, assessing whether such a purpose for the AI has been demonstrated and whether the evidence-based benefits of the particular use of AI substantially outweigh the potential harms.

This decision-making standard should give strong weight to the opinions of those who will be impacted by the technology, especially historically marginalized communities. Steps to accomplish this include: 

Decisions should clearly articulate what quantitative and qualitative evidence was relied on for the decision. These considerations should be memorialized in a publicly accessible document.

Step 4. Conduct timely, ongoing evaluation of AI systems to determine whether they should continue to be used.

If a state entity moves forward with use of a particular AI system, state leaders should require timely review that centers impacted communities in the qualitative and quantitative evaluation of whether the system is achieving the intended public benefit. This review should also identify any harms arising from the AI use. If public benefits of the particular use of AI do not continue to substantially outweigh the harms, the AI use should end.

The review and evaluation processes should ensure:

Recommendation 1. Some uses of AI are simply too dangerous. Get ahead by taking them off the table.

Putting people first in AI also means proactively prohibiting uses of AI systems and applications that are simply incompatible with democratic, civil, and human rights. Numerous evaluations from government leaders, academics, technologists, civil rights organizations, and groups representing vulnerable and marginalized communities have found that the threats stemming from the below applications of AI significantly outweigh the benefits. Your AI decision-making process should preclude the following:

Many prudent city and state government officials have already preemptively taken some dangerous AI uses off the table. Maryland’s AI policy prohibits AI that violates fundamental rights, such as social scoring and emotion recognition. Montana’s AI law bans using AI for cognitive behavioral manipulation and sets hard limits on dragnet mass surveillance. And many cities have prohibited government use of face surveillance.

Table 1. Examples of dangerous artificial intelligence use cases that should be subjected to the decision-making process

Government Service or Benefits-related decisions, including access, eligibility, revocation and use
Education-related decisions, including access to educational resources and programs, admissions decisions, student progress or outcomes, recommending disciplinary interventions; determining eligibility for student aid or education; or facilitating surveillance (whether online or in person)
Housing-related decisions, including screening or monitoring people in the context of public housing; providing valuations for homes; underwriting mortgages; or determining access to or terms of home insurance
Employment-related decisions, including terms and conditions of pre-employment and employment screening, reasonable accommodation, pay or promotion, performance management, hiring or termination, recommending disciplinary action; performing time-on-task tracking; or conducting workplace surveillance or automated personnel management
Healthcare-related decisions, including medical diagnoses, determining medical treatments; providing medical or insurance health-risk assessments; determining access to medication or interventions or benefits
Financial-related decisions, including allocating loans; credit scoring; financial audits; insurance determinations and risk assessments; determining interest rates; or determining financial penalties such as garnishing wages or withholding tax returns
Language services, including translating between languages for official communication to an individual or for an interaction that directly informs an agency decision or action
Personal Information and Protected Categories, including collecting, retaining, or using personal information, children’s information, and information pertaining to a protected classification, such as race, sex, gender, ethnicity, religion, immigration status, and national origin

Recommendation 2. Mechanisms to Operationalize People-Centered AI Decision-Making

How to best implement the AI decision-making framework depends on the particular needs, opportunities, and structure of each state government. States that have taken steps to create a consistent process for AI evaluation and adoption have done so through different legal and legislative mechanisms. Which option to pursue – executive action, legislation, agency guidelines, or a combination of the three – is a decision that should be made by those most familiar with the contours of their particular state.

Executive Action – A Governor can issue an executive order requiring all executive agencies to follow a people-centered AI decision-making process. This executive order can identify an agency, or a subset of existing agencies, to develop the process itself and coordinate among different department leaders and staff to provide expertise and oversight that ensures compliance. If relying on an existing agency or state department, state leaders may find that an agency or department already focused on technology, information services, operations, or administrative services is best suited to this role. Alternatively, an executive order can create a new entity to provide support.

Legislation – State lawmakers can enact legislation to require state entities to create and follow an AI decision-making process, either through direct statutory language or by tasking a state agency to develop policy and implementation guidelines.

Recommendation 3. Provide Support Structures for State Agencies

State leaders should ensure that there are structures to support state agencies to operationalize the people-centered decision-making process, including conducting diverse community outreach, evidence-based AI Impact Assessment, and quantitative and qualitative evaluation.

This support can come from a variety of sources. State leaders should provide funding to designate existing staff or agencies as point people, create a diverse AI board, partner with academic institutions to provide expertise, or pursue a combination of these strategies.

Recommendation 4. Ensure the Process is Followed Through with Transparency, Accountability, and Oversight

It is also essential for state leaders to make sure the decision-making process does not just work on paper, but truly translates into people-centered transparency, accountability, and oversight of AI systems.

Any legislation, executive order, or agency guidelines should provide for public and private enforcement mechanisms so people can take action if rules are not followed. State leaders should also require a public inventory, updated at least annually, of all AI systems so the public knows what is in use. As discussed earlier, all assessment materials need to be publicly published in a timely manner during the process.
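The inventory and publication requirements above lend themselves to a simple automated check. The following is a minimal sketch in Python, assuming a hypothetical inventory record format; the field names, system names, and agencies are illustrative and are not drawn from any state's actual schema. It flags entries that have not been updated within a year and entries whose assessment materials have not been published.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical record for one entry in a public AI inventory.
# Field names are illustrative assumptions, not an actual state schema.
@dataclass
class InventoryEntry:
    system_name: str
    agency: str
    last_updated: date
    assessment_published: bool

# "Updated at least annually" is read here as: no entry older than 365 days.
MAX_AGE = timedelta(days=365)

def audit_inventory(entries, today):
    """Return (system_name, problem) pairs for entries needing attention."""
    findings = []
    for entry in entries:
        if today - entry.last_updated > MAX_AGE:
            findings.append((entry.system_name, "inventory entry older than one year"))
        if not entry.assessment_published:
            findings.append((entry.system_name, "assessment materials not published"))
    return findings

# Made-up sample data for illustration only.
sample_entries = [
    InventoryEntry("Benefits triage model", "Agency A", date(2024, 12, 1), True),
    InventoryEntry("Chat assistant", "Agency B", date(2025, 6, 1), False),
    InventoryEntry("Document classifier", "Agency C", date(2025, 11, 20), True),
]
findings = audit_inventory(sample_entries, today=date(2026, 1, 15))
```

A check like this does not replace oversight or enforcement. It only shows that the annual-update and publication requirements can be expressed as testable conditions that an oversight body or member of the public could run against a published inventory.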

After the decision-making process is completed, state leaders should ensure that any agency that moves forward with an AI system is required to establish a robust use policy that will help protect people from abuse, misuse, and mistakes, with ongoing evaluation of the benefits and harms of the AI system. Developing a robust use policy is outside the scope of this memo, but please see the FAQ section for some resources.

Conclusion

State leaders can make AI work for people.

The future of government use of AI is still being written, and state governments have a powerful role to play. What we do now will help determine whether the power of AI will work for or against people’s rights and dignity.

If AI is to serve rights, justice, and democracy, leaders at the state level must act to implement a people-first process that centers diverse community members and asks and answers foundational questions about “why” and “whether” to use AI before skipping to the “how” of AI implementation. The recommendations in this memo help state leaders meet this moment and ground decision-making about AI use in fairness, accountability, evidence-based inquiry, and inclusive governance.

The views and opinions expressed herein are solely those of the author and do not necessarily reflect the views, positions, or policies of any organization, employer, board, institution, client, or other entity with which the author is affiliated.

Frequently Asked Questions
What are state-level examples of executive orders, laws, and policies for AI decision-making?

  • Connecticut’s 2023 Act Concerning Artificial Intelligence, Automated Decision-Making and Personal Data Privacy required each state agency to inventory all uses of AI systems and mandated a process for evaluation. The state developed an AI Responsible Use Framework that requires each agency to conduct an AI impact assessment before implementing an AI system. It also created an Advisory Board that evaluates agency adoption of AI systems.

  • California issued State Guidelines for Evaluating Impacts of Generative AI on Vulnerable and Marginalized Communities in December 2024 and directs state agencies to use these guidelines early in the AI consideration process, when assessing readiness and prior to initiating a procurement process. The guidelines provide an equity evaluation checklist where state agencies identify the communities potentially impacted by the AI system, conduct community outreach, and identify the potential forms of bias, mechanisms of oversight, and a process for transparency. These guidelines currently only apply to Generative AI systems, not all AI systems, and many of the provisions are recommendations, not requirements. On March 30, 2026, California Governor Newsom issued Executive Order N-5-26 that provides stipulations for AI procurement and contracting to prevent discrimination and harm to civil rights, among other issues.

  • Maryland issued a Responsible AI Policy in 2025 that creates a governance framework for all AI systems, which includes an intake process, impact assessment, and other processes. It also prohibits real-time biometric surveillance, social scoring, emotion analysis, fully automated decision-making procedures, and behavioral manipulation.

  • New York State’s 2024 LOADinG Act requires that all existing AI systems be disclosed and prohibits the future or ongoing use of any AI system that has not been evaluated using an impact assessment and found to be safe and free from discrimination.

  • Colorado’s Consumer Protections for Artificial Intelligence took effect on February 1, 2026, and requires both developers and deployers of artificial intelligence to disclose and preempt potentially dangerous use of the system in question through a variety of stipulations, including the completion of impact assessments.

  • The Texas Responsible Artificial Intelligence Governance Act limits dangerous AI practices like social scoring, behavioral manipulation, discrimination, and biometric identification.
Why should a consistent AI decision-making process be used instead of just focusing on “high risk” systems?

There have already been marked gaps in how “high risk” is interpreted. California enacted a law mandating annual inventory reports on all high-risk automated decision systems (ADS) in use by the state. The report that the California Department of Technology issued identified no high-risk systems in use, despite publicly available examples of potentially worrisome ADS employed by different California agencies.

Empowering Communities through Community Benefit Agreements in AI-Fueled Data Center Development

The United States is experiencing an unprecedented surge in data center construction driven by AI infrastructure demand. Over 5,000 facilities are operating today, with investments of $400 billion in 2025 and an estimated $1.8 trillion between 2024 and 2030. This capital is arriving faster than environmental review processes, utility planning cycles, and community engagement frameworks were designed to accommodate. The consequences for communities are serious and well-documented: rising electricity bills, massive water consumption, e-waste, noise and light pollution, and billions in tax subsidies to some of the world’s most profitable corporations, often without meaningful public disclosure. These harms do not fall evenly, with communities of color and low-income neighborhoods already carrying disproportionate burdens.

Community Benefit Agreements (CBAs) are a legally binding, enforceable tool that allows communities to secure real commitments from data center developers before development proceeds. When properly structured — with specific numeric targets, secured financial obligations, independent monitoring, and meaningful enforcement — CBAs transform data center deals into durable community partnerships. Drawing on practitioner expertise from dozens of negotiations across sectors, emerging AI data center agreements, and new research on community harm and regulatory gaps, this memo makes the case for CBAs and provides a practical policy playbook for using them effectively, including potential provisions and considerations like enforceable harm mitigations, meaningful community investment, and lasting accountability mechanisms, to surface broad community needs while remaining adaptable to local contexts. 

Challenge and Opportunity

Harms to Communities from Rapid Expansion of AI Infrastructure 

U.S. data centers consumed 183 TWh of electricity in 2024 – more than 4% of total national consumption and roughly equivalent to the annual electricity demand of Pakistan – and that figure is projected to keep growing, to roughly 17% more by 2030. A typical AI-focused hyperscale facility consumes as much electricity as 100,000 households; the largest under construction are expected to use 20 times as much. The scale is such that AI data center demand in Virginia alone contributed to an 833% increase in regional capacity market auction prices for 2025–2026; these prices are what electricity utilities and grid operators pay to ensure that enough power generation will be available during peak demand periods. These pressures do not just translate directly into costs for ordinary ratepayers; because they are structural costs baked into the grid, they also make it harder for communities to see, contest, or hold anyone accountable for the surge. Electricity prices in some data center-heavy regions have surged over 250% in five years, with estimates predicting data center electricity demand could double, or even triple, by 2028.

The scale of harm to nearby communities extends beyond electricity prices: increased water usage, e-waste, air and noise pollution, and adverse health effects. A single large data center can use up to 5 million gallons of water a day (with about a quarter of the usage from direct cooling), equivalent to the daily use of a city of 50,000 people. Additionally, hardware disposal is projected to generate 1.2–5 million metric tons of e-waste from generative AI alone between 2020 and 2030. Diesel backup generators, used at almost every facility, emit particulate matter classified by the EPA as a likely human carcinogen. They also emit 200–600 times more harmful nitrogen oxides than natural gas plants per unit of electricity produced. Researchers estimate that data center backup generators in Virginia, operating at just 10% of permitted levels, could already cause 14,000 asthma symptom cases and 13–19 deaths annually, with public health costs of $220–$300 million per year spreading across multiple states, and with communities of color, low-income communities, and rural communities paying the bulk of that price.

But perhaps the most underappreciated community harm from the data center boom is fiscal: the extraordinary scale of tax subsidies that state and local governments have extended to some of the world’s most profitable companies, frequently without meaningful public disclosure or community input. Good Jobs First, which tracks corporate subsidies nationally, found that in 10 of the 20 states disclosing data center subsidy costs, programs cost over $100 million per year. Further, the opacity of these arrangements is striking: of 36 states with data center subsidy programs, only 11 publicly disclose which companies receive benefits. Virginia, the world’s largest data center market, for example, forgoes nearly $1 billion annually in state and local revenue without telling the public which companies receive the money or how much each receives. Moreover, data centers, once fully built and operational, employ on average only 157 permanent workers, an extraordinarily low jobs return on billions in public subsidy, with subsidies averaging $1.4 million to $2.1 million per permanent job. Additionally, companies frequently hide behind non-disclosure agreements (NDAs) to avoid public input and scrutiny, especially on critical details about energy use, water consumption, and sometimes even the identity of the data center operator.

Centering Community Needs in AI Infrastructure Development 

As data centers have proliferated and these harms have begun to be documented, the backlash against new developments has grown. Data Center Watch, which tracks grassroots opposition to large-scale projects across 28 U.S. states, found that between May 2024 and March 2025, $64 billion worth of data center projects were blocked or delayed by local opposition. In Q2 2025 alone, more project disruptions occurred than in the previous two years combined. Opposition is bipartisan and geographically broad. Recent nationwide polling found that 70% of Americans oppose data center construction nearby, with nearly half “strongly” opposed – a far lower acceptance rate than for gas plants, wind farms, or nuclear facilities.

This issue is an urgent priority now because while public concern over rising energy rates, water usage, and unchecked development is growing, no comprehensive mechanism currently exists to align the interests of communities, developers, and local governments. 

As AI companies promise us the large-scale and incredible societal benefits to come from AI, they can show they are serious by starting with making sure the data centers they are building to power the AI future benefit the communities they’re in.

Why Community Benefit Agreements?

CBAs are legally binding agreements, negotiated between developers and community stakeholders, that secure enforceable commitments before development proceeds. Adapted from their successful use in bank merger oversight (under the Community Reinvestment Act) and clean energy project approvals, CBAs can:

In the absence of broader legislative and regulatory protections, CBAs offer a promising, underutilized, and legally binding tool to ensure adequate harm mitigation and to give communities the potential to share in the opportunities, and not just the costs, of AI infrastructure, with the additional benefit that they can be tailored to a community’s specific needs.

For instance, in late 2025, the city of Lancaster negotiated a legally binding CBA with the developers of the Lancaster AI Hub before construction was finalized, securing $20 million in community contributions. Key wins include a hard cap of 20,000 gallons per day of municipal water use per campus, a 100% clean energy requirement backed by tiered financial penalties of up to $10 million per building, strict noise limits tied to pre-construction ambient levels, and full public records transparency. 

The agreement also commits developers to a local hiring plan, free first-responder training, and ongoing community engagement, demonstrating that municipalities can extract meaningful, enforceable protections from data center developers when they engage before key approvals are locked in. Of note, the city negotiated the CBA in this case, but communities themselves can win the same provisions in a legally binding CBA by working with community leaders, community-based organizations, and local policymakers, with enforcement mechanisms woven in for effectiveness.
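Numeric commitments of the kind found in the Lancaster agreement are what make independent monitoring possible. As a minimal sketch, assuming a hypothetical format for daily metered water readings, the Python below checks readings against the 20,000 gallons per day municipal water cap described above. The function name, the reading format, and the sample values are illustrative assumptions; only the cap itself comes from the agreement as summarized here.

```python
from datetime import date

# Cap per campus, as described for the Lancaster agreement above.
# Everything else here (names, data format, sample values) is hypothetical.
DAILY_WATER_CAP_GALLONS = 20_000

def days_over_cap(daily_readings, cap=DAILY_WATER_CAP_GALLONS):
    """Return (day, gallons, excess) tuples, in date order, for days above the cap."""
    return [
        (day, gallons, gallons - cap)
        for day, gallons in sorted(daily_readings.items())
        if gallons > cap
    ]

# Made-up readings for one campus, keyed by date.
sample_readings = {
    date(2026, 7, 1): 18_500,
    date(2026, 7, 2): 20_000,  # exactly at the cap, so not counted as an exceedance
    date(2026, 7, 3): 23_250,
}
violations = days_over_cap(sample_readings)
```

The sketch only identifies exceedances. What follows from an exceedance (notice, cure periods, penalties) would depend on the terms of the specific agreement, which is why specific numeric targets and public reporting matter: without them there is nothing for a monitor to check.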

Importantly, CBAs do not require communities to support a project. They are negotiated exchanges. If a developer will not make commitments adequate to the community’s concerns, opposition — including calls for moratoriums — remains a legitimate and more appropriate response. The credibility of that alternative is precisely what gives CBA negotiations their teeth.

Because policymaking, legislation, and other broader reforms take time, CBAs can serve as a particularly useful interim governance mechanism to meet the urgency of this moment.

Why now?

Hyperscalers are urgently racing to secure sites, power contracts, and permits to meet AI demand. Because time to power is crucial for data center companies, communities and municipalities have genuine leverage right now, alongside the need, urgency, and tools and resources to engage. Data center developments face political opposition that is delaying billions of dollars in projects. They need community support, or at minimum community acquiescence, to move through permitting processes that require public hearings, board votes, and environmental reviews.

With projected and current investments in the billions of dollars, effects on communities already being felt and more to come, and slower-moving broader reforms not yet in place, CBAs are a useful interim governance tool that can fill this urgent need, and now is the time of maximum policy leverage.

Plan of Action

States should not rely on voluntary developer promises. They should create a statutory and regulatory framework that makes robust CBAs a condition for approval or subsidy in high-impact data center projects.

We recommend that CBAs be used, under defined conditions, as a policy tool for facilitation and solutions-building that meets the objectives of all three parties: communities, developers, and local governments. Local policymakers should treat CBAs as a lever that enables communities to provide direct input, occupy an established space to negotiate impacts and mitigations, and secure reinvestment in ways that benefit the community.

Local governments can require CBAs (working alongside community-based organizations and other community leaders) if developers apply for permits, zoning, or other approvals to build out data centers – such that planning departments, zoning boards, or city councils can condition approval on compliance and can then impose penalties, delay permits, or revoke approvals if terms aren’t met.

The following recommendations highlight specific approaches and provisions that local policymakers (like the City of Lancaster for the Lancaster data center CBA) and community-based organizations advocating and negotiating on behalf of communities can use to protect communities from harm and establish fairness, transparency, and accountability in the data center development process. As others like the Brookings Institution and the National Association for the Advancement of Colored People (NAACP) have substantially outlined and advocated for, they represent emerging best practices at this juncture. Key provisions, along with their priority levels, are also summarized in Summary Table 1 at the end of this proposal.

Recommendation 1. Policymakers (and CBOs and community leaders negotiating on behalf of communities) should utilize specific provisions to address harms and provide mitigations, to increase transparency, and to steward ongoing governance and accountability.

Harm Remediation

Transparency, Governance & Accountability

Recommendation 2. Policymakers and CBOs negotiating on behalf of communities should require investment in communities as a baseline condition for any equitable agreement.

Beneath the gold rush of data centers and AI lie real places, real people, and real resources being quietly consumed in service of extraordinary profits. The companies cashing in are among the wealthiest in history, and that wealth is being built, quite literally, on the foundations of local communities: their land, their water, their power grids, their roads, their first responders, and their environment. The economic rewards generated need to reflect that. Communities supplying these resources and shouldering associated burdens cannot be sidelined as the immense profits generated flow elsewhere.

Aside from harm remediation, a CBA, through its preparation and negotiation processes, can serve as a platform to uncover, understand, and elevate broad community needs. Specific provisions should address these needs, moving towards a more balanced and equitable distribution of the costs and benefits associated with AI development, given the wide ramifications of data center developments in host communities.

Recommendation 3. Policymakers (and/or community negotiators) should proactively identify and put the supporting mechanisms in place for meaningful representation, negotiation, enforcement, and accountability.

The most common CBA failures are not in the provisions communities demand – they are in process and enforcement structure. When CBAs are poorly structured, or negotiated after key approvals are in hand, they can give the appearance of community benefit while delivering very little.

For CBAs to be effective, certain necessary conditions, dependencies, and actionable sub-recommendations apply: investing in and strengthening community-level organizing and coalition-building; providing training and workshops on provisions and negotiations; and, critically, providing thoughtful representation to prevent takeover and building robust enforcement mechanisms so that benefits are delivered in practice. Looking back at the legal history and use of CBAs in the bank merger approval process and in California’s CEQA “Opt-In” process, which requires a CBA, we have gleaned important lessons about levers, enforceability, and accountability, as well as recommendations on the negotiation and power-building process, listed below.

Conclusion

The extraordinary wealth generated by the AI data center boom is being built on community land, water, electricity, and environmental capacity. Yet, the communities bearing these burdens are seeing little of the benefit. The hyperscalers behind this buildout are among the most valuable companies in human history, and the AI services running on this infrastructure will generate billions in revenue. None of this wealth is created in a vacuum: it is created in specific places, using specific community resources, and the communities providing those resources deserve a meaningful share of the value they help create.
The current pattern, in which vulnerable communities absorb the largest burdens, profitable companies receive the largest subsidies, and benefits flow primarily to shareholders, is neither inevitable nor acceptable. It reflects choices being made right now, as the buildout accelerates and the patterns of harm and benefit are being set. CBAs are a tool to make different choices: to insist that the communities hosting AI infrastructure share genuinely in its benefits, and that the costs of that infrastructure – to air quality, water systems, grid reliability, and community character – are borne by those who profit from it, not by those who simply happen to live nearby. The time to act is now.

Summary Table 1. Key Provisions of Data Center CBAs

| Provision area | Key community protections & commitments | Priority |
| --- | --- | --- |
| Environmental protections | Binding diesel generator emission limits beyond permit minimums; noise limits tied to pre-construction ambient (day and night); independent real-time air and water monitoring with public data; cumulative impact analysis for clustered facilities; proximity assessment for environmental justice communities | Critical |
| Clean energy | 100% clean sourcing commitment; tiered financial penalties backed by Letter of Credit; third-party REC verification; prohibition on ratepayer cost pass-through for grid upgrades; annual public consumption reporting; energy ratcheting milestones where full compliance is not immediate | Critical |
| Water usage | Specific daily cap on municipal water use; closed-loop cooling requirement; wastewater capacity compliance; quarterly public reporting on consumption; renegotiation trigger if facility scope expands materially | High |
| Fiscal contributions & transparency | Dollar-specific community investment fund with milestone-triggered payments; secured by Letter of Credit or corporate guarantee; full public disclosure of all tax incentives and PILOTs; no NDAs on public finance data; fund governance (committee composition, voting rules, permitted uses) specified in the agreement | Critical |
| Workforce development | Local hire percentage targets for construction and operations; prevailing wage standards; apprenticeship and training pathways; targeted outreach to underserved zip codes; explicit FLSA anti-misclassification clause | High |
| Governance & enforcement | Community Advisory Board with independent monitoring authority and seats for community residents; escalating financial penalties; grievance mechanism with binding arbitration; right to seek injunctive relief; annual public reporting to governing body; decommissioning plan, bonding requirements, and remediation escrow; regular equity impact assessments | Critical |
Priority ratings reflect the degree to which a provision is foundational to meaningful community protection. All provisions should be adapted to local context and available negotiating leverage.
Frequently Asked Questions
What are the limitations of CBAs? When are they potentially not the ideal tool?
CBAs are a powerful tool but are not a substitute for strong state and federal environmental permitting, transparent subsidy disclosure laws, or robust utility regulation protecting ratepayers. Their enforceability depends on clear terms, specific metrics, secured financial obligations, and parties with the legal standing and resources to enforce them. When permits are already in place, transparency has been denied, or a developer-backed document is being presented as a community agreement, opposition or a moratorium may be more appropriate than a CBA negotiation. However, especially as broader reforms can take time, CBAs are useful as an interim governance mechanism.
Are CBAs legally enforceable?
Yes. CBAs are legally binding contracts enforceable in court. Provisions backed by Letters of Credit can be enforced by drawing on the letter without costly litigation. Injunctive relief and specific performance are also available remedies in most jurisdictions.
Do CBAs require communities to support the project?
No. CBAs are negotiated exchanges. The community provides a path through the permitting process; the developer provides binding commitments. If commitments are inadequate, communities retain the right to oppose the project. The credibility of that option is what gives negotiations their leverage.
What if the developer won’t negotiate?
Community leverage mechanisms include direct lobbying of elected officials, media engagement, social media amplification, community organizing and protests, and formal procedural interventions such as CEQA comment periods. Coalitions should be prepared to escalate. In some cases, formal opposition or a moratorium is the appropriate response.
How are CBA funds governed?
Fund governance must be specified in the CBA itself — committee composition, voting rules, permitted uses, and annual reporting requirements. Ambiguous governance renders financial commitments meaningless in practice. The Lancaster CBA’s joint committee model is one approach; stronger versions include community representatives with independent authority and the ability to commission audits.
How does a CBA interact with tax abatement or PILOT agreements?
CBAs and payment-in-lieu-of-tax agreements must be negotiated together, with a clear understanding of total community obligations, ensuring community investment funds supplement rather than substitute for expected tax revenue. Communities should resist any framing in which CBA contributions are treated as the price for subsidies.
What are some successful examples of CBAs being used effectively?

Lancaster, PA, 2025

  • The City of Lancaster negotiated a legally binding CBA with the developers of the Lancaster AI Hub before construction was finalized, securing $20 million in community contributions. Key wins include a hard cap of 20,000 gallons per day of municipal water use per campus, a 100% clean energy requirement backed by tiered financial penalties of up to $10 million per building, strict noise limits tied to pre-construction ambient levels, and full public records transparency. The agreement also commits developers to a local hiring plan, free first-responder training, and ongoing community engagement, demonstrating that municipalities can extract meaningful, enforceable protections from data center developers when they engage before key approvals are locked in.


Nashville MLS Soccer, Nashville, TN, 2018

  • A coalition called Stand Up Nashville successfully advocated for this CBA in connection with a soccer stadium development project. The CBA includes, among other things, commitments on jobs that pay a living wage, hiring priorities, affordable housing, and a childcare center. As part of this CBA, Stand Up Nashville committed to support rezoning legislation for the stadium, which was widely opposed before the CBA. Nashville’s Mayor eventually supported the stadium project in large part due to the CBA.


Facebook Campus Expansion CBA, Menlo Park, CA, 2016

  • This CBA, associated with an office expansion, is between Facebook and a coalition of community groups. In this agreement, Facebook made an almost $20 million commitment to affordable housing in the area, which led to an additional $60 million in other donor commitments.

What is the typical CBA process like?

From NAACP’s CBA Guide


In practice, this can mean:

1. The initial agreement pays for legal counsel and technical support, selected by and managed by the community coalition.
2. The next phase is either: (1) an agreement to establish binding requirements for transparency, impact studies, labor standards, and equity protections, which is contained in Article 3 of the template; OR (2) a due diligence phase, which requests information provided in Article 3.
3. An amendment is negotiated after the community has access to impact information on electric, environmental, housing, and infrastructure demands, which could be an amendment specifying the exact dollar amounts and project-specific mitigation measures.

This approach allows communities to understand the scale and type of impacts before finalizing the financial structure of the Community Benefits Agreement, while maintaining leverage and ensuring that non-opposition is tied to a complete, enforceable package of commitments.


From PolicyLink CBA Toolkit:


Unless developers face significant public pressure and/or legal leverage that jeopardizes public approval, developers are unlikely to compromise. A coalition may exert leverage to bring the developer to the table in a variety of ways: direct lobbying of elected officials and city staff, notifying any reporters covering the issue that the community has significant concerns, using social media to amplify the community’s voice and raise support, protests at the worksite or at City Hall, or artist-led community responses, like chalk art at the site or near City Hall.


Stakeholders & Roles:


A community coalition can include stakeholders such as individual residents, neighborhood councils, faith groups, local non-profits, local businesses, PTAs, and housing advocates. City administration staff and elected leaders can demonstrate inclusive leadership by (1) providing transparency around the project; (2) insisting on broad community support for project approval; and (3) encouraging CBA negotiations, without trying to influence them. Two to four coalition representatives should contact the elected officials (or city council staff) most involved in the proposed project and brief them on the coalition, its priorities, and any engagement it has had or plans to have with the developer. The coalition representatives should ask that the officials condition a vote in favor of the project upon the developer’s support for the coalition’s priorities.


Elected officials can be important allies in a CBA negotiation because they can persuade their colleagues on council to delay a vote on the project to allow more time for the coalition to negotiate with the developer. They can also apply pressure on the developer to reach an agreement with the coalition. The coalition should assess whether it can count on commitments of support from a majority of the committee and/or council members. Particularly if a coalition is new, support from key elected officials will help bring developers to the table. It may be necessary to take legal action against objectionable aspects of the development to inspire a willingness to negotiate.

How to Safely Bring AI into Law Enforcement: The Case of AI-Generated Police Reports

Commercial artificial intelligence tools that can produce police reports have recently emerged. Some police departments have already adopted this technology, and some individual officers are using publicly available AI tools. If AI could greatly reduce the time spent producing police reports, this could either substantially reduce the cost of policing or free up police officers for other work. However, if the resulting reports are inaccurate, incomplete, or biased, or if the process leaks confidential information, this could undermine the criminal justice system and harm citizens, perhaps causing an innocent person to be charged with a crime while the actual criminal is overlooked. At this time, both the benefits and the risks are poorly understood.

Yet, despite the uncertainty, each of the more than 18 thousand law enforcement agencies in the U.S. must make its own decision about the use of AI. These agencies do not have the expertise or resources to assess whether any of the AI-based products on the market are right for them, and if so, what training, departmental policies and deployment strategies are needed to use the technology both safely and effectively.

This memo proposes fostering innovation in AI for policing without sacrificing safety through a combination of centralized actions by the U.S. Department of Justice and independent actions by state and local law enforcement agencies. The Department of Justice, through its National Institute of Justice, should establish a new research and evaluation program that will give state and local government agencies the information they need to make the best decisions about use of AI for police reports given their own needs and resources, and keep Congress and the Department of Justice abreast of AI use in policing nationwide as well. Each state and local agency should use this information to devise its own strategy, addressing issues such as whether to adopt AI, officer training, technology choice, budget, transparency, and other policies and procedures to use the technology where it is safe and effective.

While this memo focuses on use of AI for police reports, the recommended solution serves as a model for other AI use cases as well. Similar problems occur every time a large number of local government agencies are contemplating the use of AI in scenarios where the pros and cons are poorly understood, and there is potential for significant harm.  

Challenge and Opportunity     

Why Police Departments are Considering AI for Police Reports

Police reports are a cornerstone of law enforcement. These reports serve as the official record and generally the only written record of significant interactions between police officers and individuals, including arrests, crimes reported, and car crashes observed. The contents of police reports can influence important decisions, such as whether an individual is charged with a crime. When police officers testify in court about an incident that occurred months or years earlier, they typically rely on the police reports that they wrote soon after the incident to get the details right. When insurance companies want to assess liability, their decisions often depend on police reports. When police officers are accused of misconduct, investigators study the relevant police reports. When compiling crime statistics on which policy decisions will be made, critical data comes from police reports. It is therefore important for police reports to be accurate, complete, and unbiased. 

Given the importance, it is no surprise that many police officers spend hours per day producing these reports. This comes at a cost. If the time spent on police reports could be reduced, then police departments could reduce the number of officers employed and thereby greatly reduce expenses, or reallocate officer time to other productive tasks, or some combination of the two. Many police departments in the U.S. are especially motivated now to free up their officers’ time, because there is a national shortage of qualified officers, and many departments have unfilled positions.

A number of companies have announced products that integrate AI into the writing of police reports. Some vendors such as Truleo and Axon have claimed that AI assistance can reduce the total time spent on police reports by 80% to 90%, which would yield tremendous cost savings if true.  In response to such promises, some police departments have already adopted this technology. Given financial and staffing pressures, more departments are likely to follow.  

But are the cost savings real? Are the reports produced when using AI reliable enough for their intended purpose? And what strategies for adoption will maximize both cost savings and report quality? Most police departments do not have the AI expertise on staff to answer those questions. Indeed, roughly three-fourths of law enforcement agencies in the U.S. have fewer than 25 police officers, and thus very few IT professionals.

How AI Would Be Used

The general idea is that information about the incident is fed into an AI-based system, which produces a draft report of what a particular police officer did and observed, and which that officer must review. The details vary from one AI-based product to another. In some cases, police officers feed this information into the system by typing relevant facts on a computer. In others, officers participate in an interactive oral interview with the system. In the most ambitious systems, information about an incident is supplied by uploading recordings from a body-worn camera, with no direct involvement from the officer. These systems transcribe the audio and use the resulting text; some analyze video as well. In all of these cases, once the AI-based system produces an initial draft, the officer inspects the draft, makes any changes he or she wishes, and signs off on the result.

The Risks of Using AI for Police Reports are Poorly Understood

AI-based products for police reports use generative AI, where an AI system is trained from a set of prior examples to understand which words and phrases are frequently used together. The system can then generate entirely new text for new circumstances by using the relationships observed in its training in combination with some new input data and some elements that are entirely random to avoid repetition and unnaturally formulaic text. Regardless of the domain, producing text using generative AI can be problematic. 

First, generative AI can randomly produce “hallucinations,” i.e. information that is roughly consistent with the training data but incorrect in the current circumstance.

Second, when an AI model is trained on biased data, it produces biased results. For example, if reckless driving citations in the training data are more likely to involve alcohol with young drivers than with old drivers, then hallucinations involving alcohol may be more likely with young drivers.  Companies are rarely transparent about their training data sources, but some sources from law enforcement could easily be biased with respect to factors such as race, age and gender. 

Third, some generative AI models leak information in unexpected and often unseen ways.  For example, if the system uses new inputs from users to improve (or “train”) the model, then a new input may later be revealed to other users. This happens with the widely-used generative AI services that are offered for free to the public, and some officers already use those free tools. Even if new inputs are not used in this way, those new inputs could be transferred to a provider of AI-based services with weak defenses. If a police department allows its officers to use a system with inadequate protections, this would risk citizens’ privacy and possibly compromise future court cases. It is technically possible to design systems with better protection against leakage, but police departments typically have no way to tell which services have done so effectively. Given all of these risks, it is no surprise that some localities have sought to prohibit use of AI for police reports.

Of the various methods of putting information into the system described above, using recordings from body-worn cameras could save the most officer time, but it also brings additional risks that must be assessed. For example, when an officer in Utah uploaded the recording of an incident that occurred while a movie was playing in the background, the AI reportedly produced a police report claiming that the officer transformed into a frog. An error like that does no harm because it is easy to detect, but a different movie might have produced a far more dangerous error. Also, audio transcription is less reliable when people speak with accents or in African-American Vernacular English. Using AI to accurately turn video into text can be even more challenging. Finally, with this approach there is no opportunity to record an officer’s subjective experience before the officer is influenced by AI-generated text, which some people have argued is important. Testing is required to understand the seriousness of these potential risks and to evaluate any mitigation strategies.

In 2025, I organized a research project at Carnegie Mellon University (CMU) to investigate use of generative AI for police reports. We produced police reports using three different kinds of generative AI technology, and observed that material inaccuracies do occur. For example, in one assault case, an input to the AI indicated, without providing a reason, that the victim was not transported to a medical facility, but the resulting report inaccurately claimed that the victim refused transport to a medical facility. We also observed that error rates varied from one AI product to another, as well as from one type of police report to another, perhaps because some types of reports are more complex than others. Thus, it matters which AI technology a police department chooses and under what circumstances it directs its officers to use that technology.

As long as AI is only used to produce the first draft of a report, problematic text does not compromise report quality if the police officer finds this text and rewrites it before submitting the final report. That may or may not be sufficient. As explained by MIT professor David Autor and Alphabet Senior Vice President James Manyika, AI systems that augment humans without replacing them can fail if the AI is not designed to collaborate with humans, such as when human pilots could not prevent an Air France flight from crashing after the autopilot failed because the tool gave the pilots limited situational awareness. Less obviously, the converse is also true: problems can occur if humans are not explicitly trained to collaborate with AI.

The CMU researchers conducted experiments in which experienced police officers were asked to make corrections to prewritten police reports which contained hallucinations, omissions, and “event swaps” in which things occur in the wrong chronological order. We observed that officers missed many problems, including those that might matter in legal proceedings, such as when a report incorrectly indicated that a suspect was holding a knife when encountered. It is important to note that this occurred in a university research exercise rather than a professional setting, and that the officers had never been explicitly trained to edit AI-generated text, i.e. to collaborate with AI. Better results might be possible in real police departments that have adopted the right kind of training, but this requires more investigation.

Even an error that is not directly material to the case can do harm.  A memo from the King County Prosecuting Attorney’s Office reports that, thanks to AI, “an otherwise excellent report included a reference to an officer who was not even at the scene. … And when an officer on the stand alleges that their report is accurate — they will be proven wrong…we do not want your officers certifying false police reports. The consequences will be devastating for the case, the community and the officer.” Defense attorneys can bring up this error every time that officer testifies for many years to come. 

The Benefits of Using AI for Police Reports are Poorly Understood

On the positive side, many departments would save money if AI reduced the amount of time that each officer spends on police reports by just tens of minutes per week. This reduction could be within reach. One prominent survey found that 62% of officers spend more than two hours per day on police reports and 14% spend more than four, and there have been news articles quoting police officers who said that time savings from AI were substantial, although this is anecdotal. Yet the most rigorous study to date did not find any reduction in time spent when AI was introduced. This issue also deserves more investigation. Moreover, the impact of AI on time spent and police budgets will vary greatly between departments, so a single one-size-fits-all conclusion is inadequate. Savings depend on factors like the number of police incidents per week, the types of incidents that are most common, and how pervasive technology already is in the department.

The benefits and risks associated with AI also depend on the deployment strategy. For example, police departments may choose to use AI in cases where time savings are great and risks are low, or when time savings are insignificant and risks are high. Departments may choose to use AI in a transparent manner in which problems are easily observed and quickly corrected, or in an opaque manner. Research could provide guidance to police departments on whether and how to adopt this technology while minimizing risks.

Unfortunately, this research will rarely occur under current policies. Individual police departments are unlikely to invest their limited resources into testing commercial AI software products, developing new officer training programs, measuring whether AI saves time or money, or collecting best practices for adoption. If the federal government fails to act, some states or cities may fund useful work. However, even the state and local agencies with the largest budgets, such as the New York City Police Department and the California Highway Patrol, have little incentive to bear the full cost of making new discoveries and then informing the nation’s 18 thousand law enforcement agencies, most of which are small and have needs and resources that are quite different. There are university researchers doing this kind of work, but very few, and most police do not read academic journals.  Informed decisions will only happen if the federal government takes action.

Plan of Action

Most of the actual decisions about whether police should use AI technologies at all, which specific AI technologies to acquire, and how those AI technologies should be used will be made by local officials. The specific decision-maker varies from locality to locality.  For most of these decisions, police chiefs are critical. They can weigh in directly on issues such as officer training and department policies governing technology use, or can delegate that role. In some jurisdictions, police departments make independent decisions about procuring technology such as AI, whereas in others municipal Chief Information Officers may play a more decisive role. It should be the responsibility of the federal government to inform these decisions, regardless of which state or local official has the final say in any locality. Thus, this memo will make actionable recommendations to two audiences: the federal Department of Justice, and those who make decisions for state and local law enforcement agencies.

Recommendation 1. The Department of Justice, through the National Institute of Justice (NIJ) and in consultation with the National Institute of Standards and Technology (NIST), should create ongoing projects whose goal is to provide information to state and local agencies that helps these agencies make better decisions regarding use of generative AI for police reports.

The introduction of AI for police reports raises technical and operational questions that individual law enforcement agencies are poorly positioned to answer on their own. Addressing these questions falls within the mission of the National Institute of Justice (NIJ), the Department of Justice’s research and evaluation arm. NIJ is well positioned to generate and disseminate this evidence at a national scale, reducing duplication across thousands of agencies and enabling more consistent, evidence-based adoption decisions.

The NIJ should draw on expertise from multiple institutions to address these important questions. Universities should play a central role, because the best academic researchers are accustomed to inventing entirely new methods that address novel challenges and emerging technologies. NIJ should therefore establish a funding program to support external research. Other researchers already work for NIJ, where understanding of the problem domain is deep, so important work can also be done internally. There are also experienced AI researchers at NIST’s Center for AI Standards and Innovation; although they typically lack law enforcement expertise, consultation with that center could help. Below are some examples of research that is needed.

Research on Evaluation Methodology for AI Products and Services      

A new methodology must be created that can assess AI-based products and services for police reports, and quantitatively determine their ability to produce reports that are both accurate and complete under a wide variety of scenarios. This methodology should also assess the risk of leaking confidential information.

Research on How to Train Police to Edit AI-Generated Reports

Even when reports are generated by AI, it is the responsibility of a police officer to ensure quality through editing. Simply having a human involved does not mean that the report will be anywhere near as accurate or complete as if a human wrote it. Detecting and correcting subtle mistakes in text that someone else wrote is challenging, and few police officers have experience with the task. Extensive training may prove critical.  For example, officers might first learn enough about how AI-based tools work to dispel any illusions that they are infallible. Then officers might learn the types of mistakes that AI tends to make, which are different from the types of mistakes that humans tend to make. Research is needed to develop training strategies, and determine their effectiveness.

Research on Benefits and Costs of AI

The primary motivation for adopting AI is to save time and money. Do AI tools really reduce the time spent on police reports, and if so, by how much?  What are the lifecycle costs, including software, storage, IT support, and officer training? How do expected cost savings depend on factors that vary by police department, such as number of officers, the types of police report that are most common in the department, and existing IT infrastructure? How do they depend on technology choices, such as whether officers feed the AI by typing in information, participating in an audio interview, or uploading recordings from a body-worn camera?

Research on How Departments Can Perform Quality Control

Any organization that introduces a technology with unknown impact should have a way of measuring quality in context on an ongoing basis, and not just before deployment.  How does a police department know if the reports generated with AI assistance are good enough, or if its officers are well-trained? One possibility might be to routinely assess the completed reports, such as by comparing AI-generated reports with video footage in a monthly audit as the Boulder Police Department tried or with officer-written reports as the Oklahoma City Police Department tried. Doing this as efficiently and effectively as possible may require a new method. Another might be to artificially inject errors of the kind that AI is likely to produce, and monitor whether injected errors are corrected. (One existing product from Axon already injects errors. Effectiveness may be limited because the injected errors are unlike those that AI is likely to produce, but this requires testing.)  If a few officers consistently submit reports with injected errors or other problems, this may indicate that those officers need further training.  If many officers consistently do so, then this may indicate a more systemic problem.  
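
The injected-error check described above can be illustrated with a minimal sketch. It assumes a department keeps a simple log of each injected error and whether the officer corrected it before submitting the report. The function name, the record format, and both thresholds are hypothetical choices made for illustration; they are not drawn from any existing product, and real thresholds would need to be set through the kind of research this memo recommends.

```python
from collections import defaultdict

# Hypothetical thresholds for illustration only; real values would need calibration.
OFFICER_MISS_RATE_THRESHOLD = 0.5  # flag an officer who misses more than half of injected errors
SYSTEMIC_SHARE_THRESHOLD = 0.5     # flag a systemic problem if more than half of officers are flagged


def summarize_injection_audit(records):
    """Summarize an audit log of injected errors.

    records: iterable of (officer_id, was_corrected) pairs, one per injected error.
    Returns per-officer miss rates, the officers flagged for possible further
    training, and whether the share of flagged officers suggests a systemic problem.
    """
    injected = defaultdict(int)
    missed = defaultdict(int)
    for officer_id, was_corrected in records:
        injected[officer_id] += 1
        if not was_corrected:
            missed[officer_id] += 1

    miss_rates = {officer: missed[officer] / injected[officer] for officer in injected}
    flagged = sorted(o for o, rate in miss_rates.items() if rate > OFFICER_MISS_RATE_THRESHOLD)
    systemic = bool(miss_rates) and len(flagged) / len(miss_rates) > SYSTEMIC_SHARE_THRESHOLD
    return {"miss_rates": miss_rates, "flagged_officers": flagged, "systemic": systemic}


# Made-up sample log: officer B left both injected errors uncorrected.
sample_log = [
    ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False),
    ("C", True), ("C", True),
]
result = summarize_injection_audit(sample_log)
print(result["flagged_officers"], result["systemic"])
```

In this made-up log, only one of three officers exceeds the miss-rate threshold, which points toward individual training rather than a systemic problem. A real audit would also need enough injected errors per officer for the rates to be meaningful.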

Other types of research and analysis are perennial and therefore should generally be led by staff within NIJ, although outside researchers could play a smaller role. Outside researchers tend to be less effective when success requires the trust of law enforcement agencies, or when being consistently accurate is more important than inventing something new. Examples include:

All results and recommendations from this program should be made available directly to all of the 18 thousand law enforcement agencies in the U.S. The program should disseminate results to organizations that train police officers, including future police chiefs. This includes the FBI National Academy and state organizations like the California Commission on Peace Officer Standards and Training. It should also disseminate results through national organizations that serve state and local decision-makers, such as the National Association of Chiefs of Police, the Association of Public-Safety Communications Officials International, the U.S. Conference of Mayors, and the National Association of State Chief Information Officers.

The program should also provide annual summaries of use of AI for police reports in the U.S. to Congress, the Department of Justice, and the general public, so it is possible to track trends over time and detect potential concerns before they become problematic.

Recommendation 2. Any state or local law enforcement agency that is seriously considering adoption of AI for police reports should first produce a strategic plan using information provided by NIJ, knowledge of local needs and resources, and other available information.

Without an appropriate strategy in place, the use of AI for police reports is likely to produce reports that fail to meet the needs of the criminal justice system, potentially putting innocent people at risk, and wasting taxpayer money. An effective strategic plan can mitigate these risks. This plan should address the following.

Conclusion

In recent years, the capabilities of generative AI have advanced at an astonishing rate, leaving our understanding of how to make use of those capabilities far behind. This is particularly challenging for those who would like to use the potentially transformative capabilities of generative AI for producing police reports, and for other AI applications that share two qualities. First, there are dire consequences if use of the technology goes badly, such as the possibility that a flawed police report could lead authorities to charge the wrong person with a crime. Second, most of the decisions with significant impact are made by 18 thousand independent local government agencies with different needs and limited resources and AI expertise. It is hard to imagine how all of these agencies could make informed decisions regarding use of an emerging technology that is still poorly understood by tech-savvy institutions. 

Some agencies will avoid the risk by never even considering AI for a purpose like this.  However, they forgo any possibility of reaping potential benefits, such as a significant reduction in costs, or a reallocation of police time from paperwork to other productive activities. Other agencies will adopt AI, but in a way that does more harm than good, perhaps because they chose the wrong product or because they used it poorly. This paper proposes a two-pronged strategy that will give state and local decision-makers both the information they need to make good decisions, and the confidence that their decisions are right for their respective agencies.

The U.S. Department of Justice, through its National Institute of Justice, should establish a set of programs that all have the goal of providing actionable information to law enforcement agencies about use of AI for police reports. This includes the pros and cons of adopting the technology and how both vary from agency to agency, the strengths and weaknesses of AI products on the market, how to train officers in use of AI for police reports, how to perform continual quality control, and other best practices.   

Each state or local law enforcement agency that is considering AI for police reports should produce a strategic plan that makes use of information provided by NIJ. Topics in the strategic plan would likely include the types of AI that should and should not be used, a phased approach to adoption, a transparency strategy that makes it easier to identify issues before they become highly problematic, and other policies and procedures.

My thanks to my CMU colleagues who worked on a 2025 research project on AI and police reports: Dr. Aleecia McDonald, Dylan Bonanno, Kai Collins, Ayana Curto, Katie Eisenman, Madeline Falk, Jane Fleischman, Harrison Green, En Hung, Wendy Jiang, Lily Klucinec, Isabella Krisky, Skylar Lukic, Tzen-Chuen Ng, Nicholas Ortiz, Miguel Rivera-Lanas, Christopher Rodas Ochoa, Keya Sharma, Autumn Swartz, Morgan van der Linde, Maximilian Vieweg, Sophie Vincens, Kemp Winkler, Avi Wong.

Frequently Asked Questions
Are AI-based tools capable of producing police reports already available to police, and what do we know about them?

Yes. General-purpose generative AI tools have been available to the public for several years, including OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and Microsoft’s Copilot. Police departments did not officially embrace these tools, but individual officers have. For example, it was discovered that an ICE agent used ChatGPT to produce reports, which led a judge to respond that this “may explain the inaccuracy of these reports.” Such use is inevitable unless police departments adopt policies that prohibit use of these tools and actively inform officers about those policies.


Since then, companies have built tools intended for law enforcement by adopting a general-purpose AI-based tool, and adding features specific to police reports, such as additional training data and police-friendly interfaces. Relevant companies include Axon, Caseify, Central Square, Code Four, Police1, PoliceNarratives.ai, Policereports.ai and Truleo.


Building on general-purpose models gives companies the opportunity to outperform general-purpose models, perhaps by improving accuracy or reducing risk of information leakage. However, since the technical details underlying commercial products developed for law enforcement are typically opaque and proprietary, many potential buyers cannot know whether improvements are present. Evaluation by a trusted organization could address this problem, by testing the product directly and demanding technical details about product design.

What are the potential advantages of using AI for police reports?
The most heavily touted potential advantage is that AI assistance could reduce the total time spent producing police reports. A number of departments that have adopted AI report that they have saved time, but they typically offer little hard data, and this is contradicted by the most rigorous study to date. Further research is required. If officers do spend significantly less time writing reports, then fewer officers may be needed, which would reduce costs, and reduce the need to hire in the midst of a nationwide shortage of qualified police officers. Alternatively, departments could maintain current staffing levels and direct officers to use the time saved for other important activities. It is also conceivable that AI could yield some improvements in report quality. For example, an AI tool may detect that a critical element is missing from a police report, and prompt a police officer to correct the deficiency. Research is needed to determine if quality improves in practice.
What are the potential risks of using AI for police reports?

The greatest risk is that AI tools will produce police reports with flaws that are not corrected in editing. Generative AI is inherently vulnerable to hallucinations that produce inaccurate information. AI tools can also omit critical facts, or put events in the wrong chronological order. AI can produce biased text, that is, text that depends on characteristics of individuals in the report such as race, gender or age when those characteristics should be irrelevant. When an AI system is trained on biased data, the system is likely to perpetuate those biases.


Inaccuracies, omissions, event swaps and biased text can all be material in important decisions. Seemingly minor inaccuracies or omissions can have serious consequences, such as making an innocent bystander look deceptively guilty, or making it appear that police did not comply with applicable laws when they did. Inaccuracies can undermine legal proceedings. Even errors that are not material to the case can become problematic if a police officer later testifies that the police report is entirely correct, as this could put the officer’s entire testimony and reputation in doubt. Research is needed to understand these risks.

Most law enforcement agencies are a part of municipal governments. Why is a program needed within the federal government?
While most decision-makers on these issues are in municipal government, those decision-makers do not currently have access to the information they need to make the right decisions for their agencies. State and local law enforcement agencies do not have the resources for tasks like assessing the latest AI-based products, determining whether AI would increase or decrease the time their officers spend on police reports, or developing new training programs for use of AI. Moreover, it makes no sense for the 18 thousand local law enforcement agencies in the U.S. to each duplicate this same basic work. There should be one trusted organization with the expertise and the resources to create and disseminate the information needed by decision-makers in state and local government, and that responsibility should fall to a federal agency such as the Department of Justice.
Could these recommendations apply for other uses of AI?

Yes, these recommendations are intended both for use of AI to produce police reports, and as a model for advancing safe, impactful and innovative adoption of AI and other technologies in similar cases. The goal is to adopt AI where (and only where) it brings improvements. The issues are similar whenever the following characteristics are present.


First, the technology being considered offers significant potential for benefits and significant risks for harm, so that “move fast and break things” is not the best approach. Adoption can be accelerated by addressing the concerns of potential adopters and building confidence.


Second, much is not known about how to use the technology safely, perhaps because the technology is as new as generative AI. Thus, someone should produce and disseminate information that will enable good informed decisions.


Third, local government agencies are the primary decision-makers. Unlike federal agencies and large companies, local governments have limited resources to investigate new technologies. Most for-profit companies that would advise them simply want to make a sale.


When these three characteristics are present, the federal government can provide critical information to decision-makers. Also, local governments can benefit from phased deployments with assessments after every phase, and transparency provisions.

Are the Trump Administration’s Executive Orders on AI relevant to police reports, and are these recommendations consistent with those Orders?

The Trump Administration’s executive orders do not address AI for police reports specifically, but they seek ways to advance AI innovation and adoption using a strategy that is consistent with the recommendations in this memo.


President Trump issued an executive order calling for an AI action plan. America’s AI Action Plan has three pillars, the first of which is innovation. According to the Plan, “the United States needs to innovate faster and more comprehensively than our competitors in the development and distribution of new AI technology across every field, and dismantle unnecessary regulatory barriers that hinder the private sector in doing so.” Consistent with America’s AI Action Plan, this memo recommends creation of federal programs that foster innovation wherever that innovation benefits society without imposing barriers on state and local governments.


America’s AI Action Plan explicitly recommends evaluation, stating that “rigorous evaluations can be a critical tool in defining and measuring AI reliability and performance in regulated industries,” and directing the federal government to “support the development of the science of measuring and evaluating AI models, led by NIST at DOC, DOE, NSF, and other Federal science agencies.” This clearly includes NIJ assessments of AI for police reports.

Are there laws on use of AI for police reports, and are these recommendations consistent with those laws?

Congress has passed no laws that specifically address use of AI for police reports, but two states have: Utah and California. These laws are consistent with this memo’s recommendations.


Under Utah’s Law Enforcement Usage of Artificial Intelligence Law, agencies must have policies that indicate which generative AI technologies employees can use, and for what tasks. The law also mandates that any police report created with AI assistance should include a disclaimer describing the role of AI, and a certification that the author reviewed the report for accuracy.


California’s Law Enforcement Agencies: Artificial Intelligence Law similarly mandates that police reports created with AI assistance include a disclaimer, and that agencies retain the initial draft of the report which was created entirely by AI and an audit trail of subsequent changes. Finally, the law prohibits vendors of AI-based tools from selling information that they obtain in this process.


These policies are consistent with the recommendations of this memo, although this memo is not proposing mandates from the federal government. This memo recommends that NIJ collect data on the consequences of any state law, and use the lessons learned to recommend best practices to the other states.

Face Recognition Performance, Bias, and the Limits of Technical Fixes

Christopher Gatlin was arrested for a brutal assault he didn’t commit after AI face recognition technology (FRT) said he matched the suspect. He spent 17 months behind bars, and clearing his name took two years. As of March 2026, there were at least nine documented U.S. wrongful arrests tied to face recognition misidentification, mostly involving Black people. From 2012 to 2020, Rite Aid customers, disproportionately in non-white neighborhoods, were flagged by FRT as shoplifters, confronted, and sometimes expelled, and in one case an 11-year-old girl was searched, all on the basis of bad matches.

Errors made by FRT are one cause of these harms, and these systems are known to make more errors on certain populations, including Black people, women, East Asians, and older people. But the way humans use these systems is also a key contributor to these harms. Christopher Gatlin was identified based on a grainy photo of a hooded, partially obscured face, which could not be expected to lead to reliable identification. Moreover, police arrested him despite a lack of corroborating evidence. Harms caused by Rite Aid were due in part to a decision to mainly deploy face recognition in disproportionately non-white communities, as well as a lack of proper user training and the use of poor-quality photos.

At the same time, face recognition does provide real benefits. In controlled, cooperative settings such as unlocking phones, banking apps, or passport verification, modern systems can be highly accurate. NIST evaluations show dramatic improvement over time, with errors occurring about one time in 1,000, depending on conditions. Millions of Americans use face recognition daily for convenience and security. 

In tasks involving uncontrolled settings with uncooperative subjects, however, such as identifying people from surveillance images, accuracy is much lower and more difficult to measure. Law enforcement and child-protection organizations have still used face recognition to identify suspects, locate missing children, and support trafficking investigations, but the potential for harm from inaccurate results in high-stakes settings is much greater. Furthermore, the effect of biased performance is magnified in these uncontrolled settings, in which the number of errors seems to be much greater for some subpopulations. This report focuses on the causes of this bias, its potential harms, and possible steps to reduce these harms. The use of face recognition in mass surveillance obviously raises other serious potential concerns, but these are outside the scope of this report.

Harms from FRT result both from technical errors and from flaws in the ways humans use these systems. This suggests two parallel strategies for reducing the negative effects of biased face recognition. One approach is to reduce the bias in face recognition systems directly. Bias can arise when FRT are trained on biased datasets that do not accurately reflect the demographics of the overall population. This can be difficult to eliminate because the massive scale of data used to train FRT makes it difficult to control or even understand the demographics of the data. But further efforts can be made to reduce demographic bias in the data. Numerous other external factors that are more difficult to control may also create biased performance. Consequently, in the near term it may be practical to reduce, but not to completely eliminate, biased performance.

A complementary approach to reducing harms from biased face recognition is to ensure that FRT are used appropriately by human operators. This solution is much easier to implement in the near term than the previous technical solution. It is not sufficient, however, simply to ensure there is a human in the loop confirming the results of FRT, since often FRT are more accurate than humans, their errors occur on challenging cases, and people may be unable to correct these errors. Behavioral policy interventions range from research aimed at better measuring bias and understanding when FRT results are not trustworthy, to clear standards for how human operators use and interpret the results of FRT, to restrictions on the use of FRT when potential harms outweigh the benefits.

In this report we provide an overview of face recognition performance and differential performance between different demographic groups. We summarize results from the National Institute of Standards and Technology assessing performance of numerous commercial face recognition systems. And we provide an overview of potential policies to reduce harms from face recognition bias.

Acknowledgements

Our understanding of this topic has benefitted greatly from conversations with Kevin Bowyer, Leah Frazier, Patrick Grother, Anil Jain, Brendan Klare, Alice O’Toole, Jonathan Phillips, Jay Stanley, and Nathan Wessler. We also received insightful comments and suggestions from Clara Langevin and Caroline Siegal Singh. Any failure in understanding is due to the authors.


Introduction

Face Recognition Technology Has Caused Significant Harms

Improper development or use of face recognition technology (FRT) can lead to serious harms. One such example occurred in 2020 when Christopher Gatlin was arrested for a brutal assault he didn’t commit after a face recognition system proposed him as a possible match for the suspect. He spent 17 months behind bars, and clearing his name took two years. Porcha Woodruff, eight months pregnant, spent 11 hours in detention for a carjacking after another bad match, even though surveillance footage showed the suspect was not pregnant. As of March 2026, there are at least nine documented U.S. wrongful arrests tied to face recognition misidentification.

In another example of this dynamic, Rite Aid, a major pharmacy chain, deployed face recognition technology widely in stores to spot alleged serial shoplifters. Affected customers, disproportionately in non-white neighborhoods, were flagged, confronted, and sometimes banned from stores, and in one case an 11-year-old girl was searched, all on the basis of bad facial recognition matches. Federal regulators later banned the company from deploying facial recognition technology in stores for five years, noting higher false-positive rates in stores serving predominantly Black and Asian communities and inadequate pre-deployment safeguards.

These instances of incorrect matching and arrests have mostly involved non-white people. But while errors may be more prevalent among these populations, as FRT use grows it can increasingly affect all people. For example, police recently released a white Tennessee grandmother who had been wrongly jailed for nearly six months based on FRT results. She was arrested while babysitting four children, accused of committing bank fraud in North Dakota, although she had never been there. Unable to pay her bills, she lost her home.

Figure 1. On the left is a surveillance photo taken at a crime scene. On the right is the image of Robert Williams that was incorrectly matched to this photo by an automatic face recognition system.

The harms described above were instigated by flawed matches produced by FRT, the computational models that perform face recognition. However, these models always form part of a larger system in which humans apply FRT to some task. The failures were not just the product of a bad model, but of human failure to follow effective procedures. In many cases, face recognition searches are performed using low-resolution images, with faces partially obscured. Figure 1 shows the surveillance photo used to identify Robert Williams, who was wrongly arrested for theft on the basis of this image. He later stated, “My daughters can’t unsee me being handcuffed and put into a police car.” In some cases, police have violated accepted practice with suggestive remarks that prompt witnesses to confirm the results of automatic face recognition technology. In the Rite Aid case, poor employee training, the use of low-quality images, and many other deployment decisions contributed to a large number of mistaken identifications.

Face Recognition Technology is Increasingly Widely Used

Face recognition technology has become increasingly accurate and widely adopted. It is estimated that 131 million Americans use face recognition on a daily basis for applications such as unlocking their phones or banking apps, providing convenience and improving security. FRT usage is especially prevalent in applications in which the person being recognized cooperates with the system. In controlled, cooperative settings, face recognition systems have improved rapidly, with error rates roughly halving every two years in some evaluations. Under ideal conditions, top-performing systems may make a mistake only once in several hundred attempts.

Face recognition is also increasingly used by law enforcement agencies to identify uncooperative subjects, identify criminal suspects, and find missing children. Its use in surveillance is also growing. For example, Immigration and Customs Enforcement (ICE) is using FRT to identify people and determine their immigration status. In these applications, FRT often successfully identifies individuals, but its accuracy is not as high, and the potential for harmful errors increases. An incorrect match in this setting can potentially result in wrongful detention or deportation of American citizens. As face recognition use grows, so will its benefits and harms, making it an urgent matter to understand its properties, impact, and effective policy interventions.

Figure 2. Each column shows a pair of images of the same person. Experimental subjects find the images on the left easiest to match, while it is most difficult to determine that the images on the right come from the same individual.

Face Recognition Difficulty Varies Significantly

The difficulty of face recognition problems varies tremendously depending on the setting. Figure 1 has already shown a difficult operational setting, in which a poor quality surveillance image must be matched. A human examining these images has a hard time telling whether they are of the same person. Figure 2 shows that even when images are of good quality, it is not always easy to tell whether they come from the same person, due to changes in things like hairstyle. 

What Do We Mean by Bias in Face Recognition? 

Bias in face recognition has been the subject of significant public concern and extensive research over the past decade, particularly as these systems have been deployed in high-stakes settings such as law enforcement and surveillance. This report examines the nature, causes, and consequences of this bias, and in this introduction we begin with a brief discussion of what we mean by “bias.”

Face recognition is meant to solve a problem that has an objectively correct solution: do these two images come from the same person? We say the system displays bias against certain demographic groups if it makes more errors on these groups than on the general population. We will use the terms “bias” and “differential performance” interchangeably.
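The definition above can be made concrete with a small calculation. The sketch below is illustrative only: the group labels, the trial data, and the function name `error_rates_by_group` are hypothetical, and formal evaluations typically report false matches and false non-matches separately rather than as a single combined error rate.

```python
from collections import defaultdict

def error_rates_by_group(trials):
    """Compare each group's error rate with the overall error rate.

    trials: iterable of (group, same_person, predicted_same) tuples, where
    same_person is the ground truth and predicted_same is the system's answer.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total trials]
    total_errors = 0
    total = 0
    for group, same_person, predicted_same in trials:
        is_error = int(same_person != predicted_same)
        counts[group][0] += is_error
        counts[group][1] += 1
        total_errors += is_error
        total += 1

    overall = total_errors / total
    per_group = {g: errors / n for g, (errors, n) in counts.items()}
    # Groups whose error rate exceeds the overall rate show differential performance.
    worse_than_overall = sorted(g for g, rate in per_group.items() if rate > overall)
    return overall, per_group, worse_than_overall

# Hypothetical trials: group X has 1 error in 10, group Y has 3 errors in 10.
trials = (
    [("X", True, True)] * 9 + [("X", True, False)]
    + [("Y", False, False)] * 7 + [("Y", False, True)] * 3
)
overall, per_group, worse = error_rates_by_group(trials)
print(overall, per_group, worse)
```

With this made-up data, the overall error rate is 20 percent, while group Y's is 30 percent, so the system would be described as biased against group Y in the sense used in this report.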

FRT have consistently shown worse performance on women than men and worse performance on Black people than on white people, and many FRT display worse performance on East Asian people than white Americans. One way that bias can occur is through training FRT models using unbalanced data that better represents some groups. When this occurs, bias can be mitigated by augmenting the training set to represent different groups more equally.

However, defining demographic subgroups exactly can be difficult, making it hard to balance data. Studies that compare performance on men and women generally ignore subtleties of gender identity. Groups of Black or white people used in studies certainly contain many individuals of mixed race. In addition, Black people in the United States might have a different distribution of traits than Black people from East Africa. Different studies sample demographic subgroups in different ways, and therefore may not be evaluating exactly the same questions.

Moreover, it is unclear how best to define demographic subgroups. For example, is it more fruitful to measure differential performance between white and Black people, or between light-skinned and dark-skinned people? Black people can differ from white people not just in skin tone but also in structural properties of their faces. At this time, it is unclear which aspects of appearance account for differential performance and how this would align with all possible subgroups. Most studies have been limited to a few broad demographic categories, and it is not known, for example, whether performance would differ between specific nationality groups within a similar region, such as Vietnamese and Korean people.

Outline of the Rest of the Report

This report aims to provide the background necessary to assess the trajectory and risks of bias in face recognition technology. We do not address other important concerns about FRT, such as maintenance of privacy and the use of FRT in mass surveillance.

In the next section we will briefly describe how face recognition systems work. We will then discuss the worldwide scope of face recognition. Next we summarize the accuracy of FRT and how this has progressed. We then discuss the nature of bias in FRT, and consider the causes of this bias. Next we consider FRT as part of a socio-technical system, and the impact of human users on FRT harms. Finally, we suggest possible policy interventions to reduce these harms.

This report makes the following points:

1. Improvements in accuracy have not eliminated bias.

Face recognition systems have become significantly more accurate in recent years, but they continue to exhibit differential performance across demographic groups.

2. Bias is difficult to measure and difficult to fully eliminate.

In real-world, uncontrolled settings, bias is harder to quantify and may be larger than benchmark results suggest. While technical interventions can reduce disparities, there is no simple or complete solution.

3. Harms arise from both technical errors and how systems are used.

Errors in face recognition can lead to significant harms, including wrongful arrests and other adverse outcomes. These harms are often amplified by deployment decisions, such as where systems are used and how results are interpreted.

4. Face recognition should be understood as a sociotechnical system.

Bias and harm arise not only from the underlying models, but also from human judgment and organizational practices. Inappropriate use of face recognition results can be more significant than technical error. 

5. Policy interventions can reduce harms even without perfect technical solutions.

Effective policies include improving transparency and evaluation and supporting research on real-world performance. Furthermore, just having humans check the results of FRT is not sufficient to avoid errors; this requires establishing clear, detailed protocols governing when and how face recognition may be used.

6. Governance of use is as important as improving the technology.

Auditing data and system outputs, developing tools that signal when results are unreliable, and enforcing strict use protocols can significantly reduce the risk that errors lead to harmful outcomes.


Glossary


How Face Recognition Works

Face recognition is based on machine learning and is highly dependent on the use of large-scale data sets. This data is difficult to carefully control or characterize.

Face recognition refers to the process of automatically identifying a person from a photo. It is divided into two tasks. In verification (or one-to-one matching), two images of faces are compared to provide a yes/no answer to the question of whether they come from the same person. This is used, for example, in border control, when a live image of someone may be compared to their passport photo. In identification (or one-to-many matching), a single probe face image is compared to a potentially large gallery of images to determine which, if any, faces in the gallery match the probe image. The gallery might contain, for example, mug shot images of people who have been arrested, driver’s license photos, images of people who have been barred from access to casinos, or a large collection of images scraped from the internet. A system performing identification might declare that it finds no match, return a single match, or return a potentially large collection of images that might resemble the probe image. In the latter case it is expected that these potential matches will be assessed by the user to identify valid matches. FRT may also return a confidence level for each match, although these levels may not correspond to the true probability that the match is right.

A Brief History of Face Recognition

The first fully automatic face recognition system was developed 50 years ago as the subject of the PhD thesis of Takeo Kanade, who went on to become one of the pioneers in the field of computer vision.  It identified landmarks on the face, such as the corner of the mouth, and used their position to compare images. Early methods like this, based on face geometry, had limited effectiveness. Scientists began to develop more useful and accurate face recognition systems through the growing use of machine learning, beginning in the late 1990s. These methods are trained with numerous face images, called a training set, to automatically extract representations of faces that can be used to compare them more robustly. 

Progress accelerated rapidly as researchers began to appreciate the power of using an approach known as neural networks, which allowed them to leverage massive datasets of faces to “teach” the computer how to recognize new faces. While neural networks were already used in FRT by the late 1990s, their use became dominant in the mid-2010s after further breakthroughs in machine learning with large neural networks, a technique known as deep learning. Since the mid-2010s, improvements in model architectures, training methods, and data scale have driven substantial gains in measured accuracy, especially on standardized benchmarks. At the same time, these advances have enabled rapid adoption of face recognition across a range of applications, from smartphone authentication to large-scale identification systems used by governments and private firms, even as performance in real-world settings remains highly dependent on context.

How Face Recognition Models Are Trained

To perform accurately, an FRT must be able to determine that two images of the same person are similar, even if the images are taken at different times, from different viewpoints, or under different lighting conditions. This is done by training the machine learning model to extract a representation that captures facial properties that can distinguish one person from another, but that are not significantly affected by viewing conditions or even some aging. The similarity between two faces can then be given a numerical score that reflects how close the representations of the two faces are.

In its simplest form, training occurs by incrementally adjusting the parameters of a neural network. In most current publicly available systems these parameters consist of tens of millions of numbers that control the network’s behavior. If the network is shown two images of the same person, the parameters are adjusted to increase the similarity score. If the images are of two different people, the parameters are changed to lower the score. Once the model is trained, if two images produce a similarity score above a chosen number, known as the cutoff, the system declares the two images to be the same person; if the score falls below that cutoff, the system says they are different.

Once the model has been trained, it can perform identification using a gallery of faces by comparing a representation of the probe to representations of the gallery images. That is, it can verify or identify images of people who were not in the training set, because it has learned a general representation that should apply to any faces.
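
The verification and identification steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: each face is assumed to have already been reduced to a short list of numbers, similarity is assumed to be cosine similarity (the text does not name a specific measure), and the function names, toy three-number representations, and 0.8 cutoff are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two face representations (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(rep_a, rep_b, cutoff):
    """Declare 'same person' only if the similarity score is above the cutoff."""
    return cosine_similarity(rep_a, rep_b) > cutoff

def identify(probe, gallery, cutoff):
    """Compare the probe to every gallery entry.

    Returns the name of the best-scoring entry above the cutoff,
    or None if no entry scores above it.
    """
    best_name, best_score = None, cutoff
    for name, rep in gallery.items():
        score = cosine_similarity(probe, rep)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy, made-up representations; real systems use far longer ones.
gallery = {
    "person_a": [0.9, 0.1, 0.2],
    "person_b": [0.1, 0.8, 0.5],
}
probe = [0.85, 0.15, 0.25]
print(identify(probe, gallery, cutoff=0.8))
```

Returning None when nothing clears the cutoff matters in practice: a system that always returns its closest gallery entry will name someone even when the person in the probe image is not in the gallery at all.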

The large data sets used in training are typically scraped from the internet. For example, one influential early data set, Labeled Faces in the Wild, made use of face images detected in Yahoo! news stories, with identifying captions. A number of large-scale datasets containing millions of images have been developed using photos of celebrities available on the internet. Some companies, such as Meta and Google, have made use of internal data that users have uploaded and labeled; these training data sets may contain more than 100 million images. Clearview, a face recognition company, claims to use data sets of more than 70 billion face images scraped from the internet. Given the high cost and diminishing returns of training with so many images, it is unlikely that all of these images are used for training, and this large corpus is more likely to be used to form the gallery.

Academic FRT generally train on datasets of images of public figures, such as the MS-Celeb-1M dataset, which contains ten million images of about 100,000 individuals. These massive datasets capture how a person’s appearance can vary with age, lighting, viewpoint, expression, and other conditions, which helps improve accuracy of systems trained on the datasets. Commercial systems do not generally provide details of their training sets, but it is expected that they include similarly large sets of images scraped from the internet, or provided by users, as in the case of Google and Meta. However, because these data sets are assembled at enormous scale—often from uncontrolled sources—they are difficult to audit, regulate, or correct when they embed systematic biases.


Face Recognition in Use Today

Face recognition use is increasing rapidly, becoming more prevalent in numerous high-stakes applications.

The global face recognition market was almost nine billion dollars in 2025, with projected growth to over 30 billion by 2034. Over a third of this market is in the U.S., but there is wide adoption of FRT around the world. One of the primary applications of face recognition is to efficiently and reliably identify people. This can make access to financial systems more secure, potentially preventing identity theft. It can also make hospital admissions quicker and more accurate, and speed up passport verification. In these applications, a human subject opts in to using the FRT, cooperating to allow consistency in viewpoint, avoiding unusual facial expressions, and enabling controlled lighting. This leads to highly accurate systems. In many cases, such as using FRT to unlock cell phones, users opt in to the technology for added convenience and device security. When entering the country, U.S. citizens may opt in to face recognition systems, and their photos are deleted after 12 hours, while non-citizens are required to participate, with photos retained for 75 years.

Face recognition is also widely used in surveillance and law enforcement. Ten percent of U.S. police departments use FRT. The NYPD made 2,878 arrests resulting from FRT in the first five years of its use. The Metropolitan Police in London report 100 arrests using FRT in conjunction with mounted security cameras, including a suspect accused of kidnapping. Police in New Delhi used FRT to identify almost 3,000 missing children, and FRT has been used to identify refugee children who have been separated from their family. The National Center for Missing & Exploited Children (NCMEC) has used a tool called Spotlight, which makes use of FRT, to identify children who are victims of sex trafficking. In 2023, the FBI worked with NCMEC to identify or arrest 68 suspects of trafficking. A large number of retail stores use FRT to track customers to understand traffic patterns, and despite the Rite Aid case, retailers such as Wegmans still use FRT to spot accused shoplifters. Immigration and Customs Enforcement (ICE) is using FRT to identify people and determine their immigration status.

Face recognition has been widely used for surveillance of the Uyghur population by the Chinese government. FRT are used by the Israeli government to track and surveil Palestinians.

These applications of face recognition can solve crimes, enhance security and make access more convenient, but also raise troubling concerns about mass surveillance, repression of civil liberties, and high-stakes errors that materially harm people. In surveillance and criminal investigations, subjects are not cooperative, and probe images used are often of poor quality, as illustrated in Figure 1, which produces much higher error rates. An awareness of mass surveillance can also have a chilling effect on people’s ability and willingness to participate in constitutionally protected activities such as protest or dissent.

As face recognition has grown more practical, a large number of companies have developed and marketed FRT. This includes large tech companies such as Amazon, Microsoft, Toshiba, NEC and Apple, and smaller companies that focus more narrowly on face recognition, biometrics and security, such as Clearview, Idemia, and Rank One Computing. Clearview is one of the most widely used by federal and local law enforcement in the U.S. 

Early in the development of face recognition technology, the best performing systems were produced by academics and used openly available architectures and data. However, with the field’s rapid commercial growth, state-of-the-art FRT are now generally developed by companies that provide little transparency about how they work or what data they use. As we will discuss in more detail, the National Institute of Standards and Technology evaluates the performance of some of these systems, but this evaluation is voluntary and not all companies participate.


Face Recognition Performance Across Different Conditions

Face recognition performance has improved rapidly, but recognition can still be quite difficult in many settings.

Two types of errors can occur in face recognition. With false positives, an FRT incorrectly states that two images come from the same individual. With false negatives, the system incorrectly states that two images do not come from the same individual. The cutoff determines the balance between false positives and false negatives. Raising the cutoff makes the system more cautious about declaring a match (reducing false positives) but also more likely to miss legitimate matches (increasing false negatives).
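
The trade-off can be made concrete with a small sketch that counts both kinds of error at several cutoffs. The similarity scores below are made up for illustration, and the function name is hypothetical; the only assumption carried over from the text is that a pair is declared a match when its score is above the cutoff.

```python
def error_rates(genuine_scores, impostor_scores, cutoff):
    """Return (false_positive_rate, false_negative_rate) at a given cutoff.

    genuine_scores: similarity scores for pairs showing the same person.
    impostor_scores: similarity scores for pairs showing different people.
    A pair is declared a match when its score is above the cutoff.
    """
    false_pos = sum(1 for s in impostor_scores if s > cutoff)
    false_neg = sum(1 for s in genuine_scores if s <= cutoff)
    return false_pos / len(impostor_scores), false_neg / len(genuine_scores)

# Made-up scores, for illustration only.
genuine = [0.99, 0.97, 0.95, 0.91, 0.85]
impostor = [0.92, 0.83, 0.70, 0.55, 0.40]

for cutoff in (0.80, 0.90, 0.98):
    fpr, fnr = error_rates(genuine, impostor, cutoff)
    print(f"cutoff={cutoff:.2f}  false positives={fpr:.0%}  false negatives={fnr:.0%}")
```

With these toy scores, raising the cutoff from 0.80 to 0.98 drives false positives from 40% to 0% while false negatives climb from 0% to 80%. No cutoff removes both kinds of error unless the two sets of scores do not overlap.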

Figure 3. The ACLU found that Amazon’s face recognition system matched 28 members of Congress to mugshots of other people.

The significance of this cutoff is illustrated well by the American Civil Liberties Union’s (ACLU’s) evaluation of Amazon’s face recognition system, Rekognition, and the subsequent controversy. The ACLU reported that it had tested Rekognition, and that the system incorrectly matched 28 members of Congress to mugshots of other people (Figure 3). A significantly disproportionate number of these false matches were people of color. Amazon responded by arguing that although the ACLU had used the default cutoff, or confidence threshold, of 80% for Rekognition, this setting was more appropriate for finding celebrities on social media, and that its documentation recommended a much more stringent cutoff of 99% for use in high-stakes applications such as law enforcement. Amazon also pointed out that the bias in the results may have been due to bias in the gallery of images used by the ACLU. If the ACLU compared images to a gallery that disproportionately contained people of color, it would be more likely to produce false matches for people of color in Congress. The ACLU replied by stressing the dangers of a system that was inaccurate with default thresholds and a lack of guidance for the system’s use.

One lesson from the Amazon Rekognition controversy is that the potential harms of an FRT depend not just on its technical accuracy but also on how users apply these systems. It also provides some indication that Rekognition was more prone to false positive errors when applied to people of color, at least at one significant cutoff threshold.

Figure 4. Three images of a researcher at the National Institute of Standards and Technology. The left image simulates a passport or similar photo, the middle image simulates images that might be taken while going through immigration, the right image simulates an image taken by a kiosk.

Figure 5. Two pairs of images, each pair shows the same person under identical imaging conditions except for a change in lighting (images from the Multi-PIE dataset).

Challenges in Real-World Face Recognition

The most rigorous experiments measuring face recognition accuracy are conducted under tightly controlled conditions. As a result, reported performance often overstates how systems perform in real-world settings, where error rates can be much higher.

The difficulty of face recognition tasks can vary widely. Frequently, identification is carried out by running verification between the probe image and every gallery image. Identification becomes more difficult as the gallery size grows and the number of opportunities for false positive matches increases. The difficulty of face recognition tasks also depends very much on the conditions under which images were taken. For example, in border control, the subject can be required to face the camera with their face fully visible, lighting can be controlled, and camera quality can be ensured.

Figure 4 shows that even images taken at a kiosk can be much harder to match, due, for example, to changes in viewpoint. Figure 5 illustrates the effect that a change of lighting can have on the difficulty of matching faces. As previously shown in Figure 1, when images come from surveillance cameras, the subject may not be facing the camera, they may not be close to the camera, so image resolution can be low, and their hair or hand or another object may obscure part of the face. Identification with poor imaging conditions may have many orders of magnitude more errors than verification under tightly controlled conditions. 
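
The effect of gallery size noted above can be illustrated with a simple probability calculation. The sketch below assumes that each probe-to-gallery comparison is an independent trial with the same per-comparison false match rate, which is a simplification (real comparisons are correlated), and it uses a hypothetical rate of one false match per million comparisons.

```python
def prob_any_false_match(per_comparison_fmr, gallery_size):
    """Chance that at least one gallery image falsely matches a probe
    whose true identity is NOT in the gallery, assuming every comparison
    is an independent trial (a simplifying assumption)."""
    return 1.0 - (1.0 - per_comparison_fmr) ** gallery_size

fmr = 1e-6  # hypothetical: one false match per million comparisons
for size in (1_000, 100_000, 10_000_000):
    print(f"gallery of {size:>10,}: {prob_any_false_match(fmr, size):.1%}")
```

Under these assumptions, the same matcher that almost never produces a false match against a thousand images produces one for roughly one probe in ten against a hundred thousand images, and for nearly every probe against ten million. This is one reason operators of large galleries need much stricter cutoffs than verification settings do.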

Across available metrics, face recognition accuracy has been improving rapidly. The National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT) evaluations illustrate this improvement. NIST evaluates verification performance on two high quality images of frontal facing individuals. From 2020 to 2025 the error rate fell by a factor of three. (NIST sets a threshold for matching to achieve a false positive rate of 0.003%, or about one false match in 33,000 attempted matches, and then measures the false negative rate, the share of correct matches missed. The best performing system as of January 2025 achieved a false negative rate of 0.13%, a little more than one correct match missed in 800.) Similarly, the error rate on an identification task that matched a mugshot probe image to a large gallery of mugshots fell by a factor of five during the same period. (The best performing method, when using a threshold to produce a false positive identification rate of 0.3%, had a false negative error rate of 0.05%. This means that the system would falsely identify a probe image in the gallery of 1,600,000 mugshots one time in about 300, while missing a correct match about one time in 2,000.) Some results are shown in Figure 6, as of March 2025. Over a period of decades, NIST has found that errors have generally fallen by about a factor of two every two years. Under controlled conditions, FRT are now much more accurate. For example, for the best performer as of March 30, 2026, when performing verification on two mugshots with a cutoff set to produce a false positive match one time in a million, a false negative failure to find a match will occur one time in 500. This sharp increase in accuracy in a short period has happened alongside widespread adoption in applications like border control or unlocking a phone.
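
The percentages quoted above are easier to compare once converted to "one in N" form. The helper below is a hypothetical convenience function that only performs that arithmetic on the four rates given in the text; it is not part of any NIST tool.

```python
def one_in(rate_percent):
    """Convert an error rate given in percent to 'one error in N attempts'."""
    return round(100 / rate_percent)

# Rates as quoted in the text, in percent.
rates = {
    "verification false positive rate": 0.003,
    "verification false negative rate": 0.13,
    "identification false positive rate": 0.3,
    "identification false negative rate": 0.05,
}
for label, pct in rates.items():
    print(f"{label}: {pct}% is about one in {one_in(pct):,}")
```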

These experiments represent relatively ideal conditions. FRT in the real world may face much higher failure rates. This can occur due to more challenging imaging conditions, such as using a surveillance image as a probe, instead of a mugshot, or other factors such as changes in the subject’s appearance. For example, when the best performing system at mugshot identification is applied in a scenario in which the gallery contains visa images and the probe is taken from a kiosk, the error rate increases by a factor of about 18 with a false negative error about one time in 30 instead of one time in 500. This is a fairly typical increase, and still represents relatively idealized conditions compared to the most challenging ones.


Defining and Measuring Bias in Face Recognition

Face recognition performs with different levels of accuracy on different demographic groups. As face recognition becomes more accurate, this may limit the effects of this disparity in some applications, but it can still be quite significant in high-stakes applications.

Going back more than 30 years, researchers have observed different rates of accuracy in face recognition systems depending on demographic properties of the subject, including race, gender and age. For example, in 2011 a study showed that Western face recognition algorithms performed better on Caucasian faces than East Asian faces, while East Asian face recognition systems performed better on East Asian faces than Caucasian ones. In 2018, the influential Gender Shades paper examined differential performance not in face recognition, but in a related facial analysis problem of determining gender from a face, showing much poorer performance on images of dark-skinned females than light-skinned males.

Absolute vs. Relative Error

In considering differential performance, it is important to distinguish between absolute and relative differences in performance. We define the absolute difference in two error rates as the difference between the larger and smaller error. For example, if an FRT produces 2% error on male faces and 4% error on female faces, we would say that the absolute difference is 4% – 2% = 2 percentage points. We describe the relative error as the ratio between the larger and smaller value, which in this case would be 4%/2% = 2. As overall performance improves, the absolute error tends to decrease, while the relative error might or might not decrease. For example, if a new generation of FRT reduces error on male faces to 1% and reduces error on female faces to 2%, absolute error decreases from 2 percentage points to 1, while relative error remains constant at 2.

Whether absolute or relative error is more important depends on the operational considerations and use of the system. When performance is very high, absolute error will tend to shrink. If this translates into operational settings, then relative error may become unimportant. For example, if an FRT makes a mistake once in a billion queries on one population, and twice in a billion on another, errors for either population may be so rare that they are insignificant. In practice, the impact of absolute error also depends on how widely deployed a system is. As systems become more accurate, they may become more widely deployed, which can paradoxically result in more accurate systems producing more errors. 
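
A short calculation shows how these quantities interact with deployment scale. The sketch below reuses the 2%/4% and 1%/2% error rates from the example above; the query volumes and the function name are hypothetical and chosen only to illustrate the point.

```python
def compare_groups(error_a, error_b, queries_per_group):
    """Summarize the gap between two groups' error rates.

    Returns the absolute difference, the relative ratio (larger / smaller),
    and the expected number of errors for each group at the given volume.
    Assumes both error rates are positive.
    """
    low, high = sorted((error_a, error_b))
    return {
        "absolute_difference": high - low,
        "relative_ratio": high / low,
        "expected_errors": (error_a * queries_per_group, error_b * queries_per_group),
    }

# Hypothetical deployment volumes (queries per group).
scenarios = {
    "older system, small deployment": compare_groups(0.02, 0.04, 10_000),
    "newer system, same deployment": compare_groups(0.01, 0.02, 10_000),
    "newer system, 100x deployment": compare_groups(0.01, 0.02, 1_000_000),
}
for label, summary in scenarios.items():
    print(label, summary)
```

In this illustration the newer system halves both error rates, yet at one hundred times the query volume it produces fifty times as many errors in each group, and the gap between the groups in affected people grows by the same factor.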

Even though current FRT achieve quite low error rates under ideal conditions, these error rates tend to grow much higher under more challenging conditions, and errors can be quite common. Although it is difficult to study error rates accurately under the most challenging conditions, high relative error under ideal conditions may predict relative error that is just as high or higher under challenging conditions that also have high absolute error. That is, while absolute error in operational contexts is of greatest importance, relative error in highly controlled conditions may predict high absolute error in less controlled conditions. Consequently, it is premature to think that FRT are so accurate that relative error is no longer important. A more nuanced view would hold that persistently high relative error rates may be less important for some applications, such as unlocking phones, and still be quite important in other applications, such as criminal investigations.

NIST Experiments on Demographic Variation

Since 2019 NIST has performed extensive evaluations of demographic variations in performance on hundreds of face recognition systems. It has access to large collections of non-public images that it uses to evaluate FRT submitted by companies. The large size and private nature of the dataset make it especially unlikely that models are overfit to the data by, for example, selecting parameters that boost their performance on this particular data. NIST computes false negative rates using over a million pairs of images, comparing one high quality image of an individual to a medium quality image of the same person. False positive rates are computed using over a billion pairs of high quality images from different individuals. Image quality reflects applications such as passport checks at airports, but does not include more challenging problems such as police investigations using surveillance footage. All images come with demographic information, including the age, gender and country of origin of the subject. Country of origin is used as a proxy for race, focusing on countries that are less racially diverse, but this is not a perfect proxy.

NIST finds a relatively small demographic variation in false negative rates, in which a correct match is missed, and a much larger variation in false positive rates, in which an incorrect match is accepted. For example, the top performing FRT as of March 2025 produced 358 times as many false positives for West African females over 65 as for Eastern European males aged 35-50, with the false match rate increasing from about one in 15,000 to about one in 50. Among the top ten performing systems, the false positive rate for all West Africans was about 23 times higher, on average, than the rate for Eastern Europeans. The false positive rate for these performers on average is about 4.6 times higher for females than males, and about 2.9 times higher for people over 65 compared to people aged 20-35. The evaluations also show poorer performance on people from South or East Asia, relative to Eastern Europeans. Many additional studies have also found that FRT generally perform better on white people than people from other racial groups, and on males compared to females.  

These studies do have important limitations. More narrowly defined groups (e.g., West African women over 65) will have less data, leading to noisy estimates, and taking the ratio of two noisy estimates amplifies the noise. Images taken in different countries may also differ in ways beyond the race of the subject, such as in the types of cameras or lighting used. In addition, incorrect labels may have a significant effect on measured accuracy. If a visa photo is associated with the wrong name, this can lead to a false match, and these incorrect labels may be more prevalent in some countries than others. Finally, measures of bias may vary depending on the specific ways in which performance is measured. The chief scientist of a leading face recognition company has stated that in practice they find differential performance between racial groups of a factor of approximately 1.5, rather than the higher numbers found in NIST studies. (Brendan Klare, personal communication.)
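
The point about ratios of noisy estimates can be illustrated with a standard approximation for the uncertainty of a ratio of two rates (the delta method applied to the logarithm of the ratio). The sketch treats every comparison as an independent trial, which does not hold exactly when the same images are reused across many pairs, and the rates and sample sizes are hypothetical rather than taken from NIST.

```python
import math

def log_ratio_standard_error(rate_a, n_a, rate_b, n_b):
    """Approximate standard error of log(rate_a / rate_b), treating each
    comparison as an independent trial (a simplifying assumption)."""
    return math.sqrt((1 - rate_a) / (n_a * rate_a) + (1 - rate_b) / (n_b * rate_b))

def ratio_interval(rate_a, n_a, rate_b, n_b, z=1.96):
    """Approximate 95% interval for the ratio rate_a / rate_b."""
    ratio = rate_a / rate_b
    spread = math.exp(z * log_ratio_standard_error(rate_a, n_a, rate_b, n_b))
    return ratio / spread, ratio * spread

# Hypothetical: group A has a 2% false match rate measured on 5,000 comparisons
# (about 100 false matches); group B has a 0.01% rate measured on 100,000
# comparisons (about 10 false matches). The point estimate of the ratio is 200.
low, high = ratio_interval(0.02, 5_000, 0.0001, 100_000)
print(f"ratio estimate 200, approximate 95% interval {low:.0f} to {high:.0f}")
```

With only about ten observed false matches in the lower-error group, the interval in this illustration runs from roughly 100 to 380. The uncertainty is dominated by the group with the fewest observed errors, which is typically the best-performing group.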

Challenges in Measuring Bias in Face Recognition

There are decades of evidence of differential performance of face recognition between demographic groups, particularly affecting non-white people and females. However, these studies generally make use of relatively high quality images, and may not accurately reflect the degree of differential performance in challenging operational cases, such as the use of surveillance footage in criminal investigations or in identifying people on a watch list. This is because it is quite difficult to accurately characterize and sample images from challenging environments. And while large-scale photo collections with known identities and some demographic information exist, such as passport photos, we do not have large-scale collections of photos taken in challenging conditions that have this information. While this problem is hard to measure, there is some evidence that differential performance increases with the difficulty of the recognition task.

Another limitation occurs because races are not well-defined biological categories but social constructs. It is not clear how to systematically divide a population into different races, especially in the case of multi-racial individuals. This is particularly challenging when images are scraped from the internet, and need to be labeled by race. Some studies have focused on skin darkness rather than race, but this is also difficult to determine accurately from photos due to the effect of unknown lighting conditions on apparent skin color. In spite of these limitations, there is a clear consensus among researchers that differences in FRT performance exist between racial groups. 

An important question is how differential performance in face recognition is evolving over time. Is this a problem that was initially ignored, but is now being effectively addressed, or one that is recalcitrant? While there is no question that absolute differences in accuracy are shrinking over time, as FRT become more accurate, the behavior of relative differences is less clear. This is difficult to judge, since new test sets come out frequently, and experimental performance is generally measured over an ever-changing landscape of conditions. Perhaps the most stable evaluation framework is NIST’s, which has consistently evaluated new FRT under the same conditions, including systems developed from 2018 to 2026. Some of the top performing FRT have evolved, with multiple versions being released over this time period. When we examine these, we see that some have significantly reduced the amount of bias over time, while others have not, and some have even shown increased bias. This suggests that it may be possible to reduce systematic bias through model design. More details can be found in the appendix.


Sources of Bias in Face Recognition Systems

Bias in face recognition systems arises from a combination of imbalanced training data, differences in image quality and gallery composition, and other technical and operational factors that are difficult to fully control or eliminate.

False negatives often arise when image quality is poor or facial features are obscured, while false positives are more likely when different individuals appear similar to the system, which can be exacerbated by limitations in training data or representation.  For example, if we compare two images of the same person, and one of these images is blurry or has bad lighting or low resolution, the images may appear dissimilar due to these effects. FRT are trained to be somewhat robust to changes in viewing condition, but they are still likely to make errors when these changes are large. On the other hand, if a system is trained using few images of one demographic group, the system may not learn representations that distinguish between a wide range of appearances within that group. For example, if one trained an FRT using images of only one Black person, the system would likely learn to associate dark skin with that individual, and would not learn features that effectively distinguish between different Black people. This is an extreme example, but it is generally found that deep neural networks become more effective as the amount of relevant data increases. 

We focus on false positive errors, as these show the greatest differences across demographic groups and are most closely associated with documented harms, such as wrongful arrests. In this section, we will discuss two key points. First, while it may be straightforward to improve demographic balance in datasets, completely eliminating demographic bias is complex and difficult. Second, while demographic bias in the data may be responsible for some bias in false positives, it is not necessarily the only source of these differences. Various research results present conflicting evidence of the importance of dataset bias in practice. 

The Contribution of Dataset Bias

Face datasets collected in the last 15-20 years have generally consisted of images scraped from the internet. This enables the creation of large scale datasets that capture a wide range of variations in viewing conditions. These datasets often used well-known people with many online photos, without specific regard to accurately representing the distribution of people of different races or genders in the population as a whole. For example, an early and very influential dataset, Labeled Faces in the Wild (LFW), consisted of 77.5% images of men and 22.5% images of women. LFW was based on people who had appeared in Yahoo! news stories that were identified in captions, making it easier to build a large dataset of known people. However, these people were obviously not representative of the overall population.

Some more recent datasets pay closer attention to capturing the true distribution of people in the world. However, creating unbiased datasets can sometimes be a subtle and difficult problem. For example, the BUPT-Balancedface (BUPT) dataset was constructed to have equal numbers of images of Caucasian, Indian, Asian and African faces. However, subsequent analysis revealed that the Asian and Indian faces consistently appeared at a larger size in the images. So although the number of images was balanced, the viewing conditions of the images could still vary significantly. This discrepancy might, for example, lead to biased performance at test time.

The reason for systematic biases in datasets is often not well understood, but it is plausible that when scraping images from the internet, photos from different countries might follow different conventions, use different cameras, or differ in myriad other ways. Therefore, to judge whether a dataset is biased is not as simple as counting the number of images from each population. 

A deeper difficulty is even defining what it means to have an unbiased dataset. BUPT represented four demographics equally. But it is unclear what should count as a racial category. For example, should Asian faces be counted as one category? Should Chinese and Japanese people be considered two separate racial categories?  What about multiracial individuals? The concept of race is not biological, but a social construct that is not well defined.  It is also problematic to correctly label the racial origins of large scale datasets, which may contain images of millions of people. It seems clear that paying attention to demographic diversity will produce less biased datasets than building datasets based on arbitrary selection of celebrities. However, it is also clear that creating completely unbiased datasets is an ill-defined problem. Even with a given definition of “unbiased” it remains very challenging and beyond current technology.

There is certainly strong evidence that dataset bias can produce differential performance, and bias can be reduced through improving the training data balance.  It has been found that while Western face recognition algorithms perform better on Caucasian faces than on East Asian faces, algorithms developed in East Asia perform better on East Asian faces, a result that is likely due to dataset bias.  After the Gender Shades paper demonstrated that Microsoft’s gender identification algorithm performed much more poorly on Black women than white men, Microsoft quickly improved performance dramatically on Black women by balancing its datasets.

Differential performance can also occur because of biases in the gallery data or probe data. When the gallery is formed from images scraped from the internet, the properties and number of these images may vary drastically from individual to individual, or even from group to group. It has been shown, for example, that if one group is more highly represented in the gallery, this will lead to more false positives among that group because there is greater potential for the gallery to contain faces similar to the probe. As another example, if members of one group, such as women, frequently have longer hair that covers more of their face in the probe image, this can also lead to higher error rates. Also, if a gallery image is of low quality, not showing a clear image of the face, it may be matched to a similarly low quality probe image of a different person. Rite Aid’s use of low-quality images in its gallery is believed to have contributed to the large number of false matches it produced, which in turn led to customers, disproportionately in non-white neighborhoods, being wrongly flagged, confronted, and sometimes expelled from stores. When companies such as Clearview make use of billions of images scraped from the internet, it is extremely challenging to balance these datasets or ensure uniformity in their quality.
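
The gallery-representation effect described above can be sketched with a simple expected-value calculation. The sketch assumes that a probe is more likely to be falsely matched to a gallery image from the same demographic group than to one from a different group, and that the matcher is otherwise equally accurate for both groups. The group labels, gallery counts, and both false match rates are hypothetical.

```python
def expected_false_matches(probe_group, gallery_counts, fmr_same_group, fmr_cross_group):
    """Expected number of gallery images that falsely match a probe whose
    true identity is not in the gallery.

    gallery_counts maps each group label to its number of gallery images.
    """
    total = 0.0
    for group, count in gallery_counts.items():
        fmr = fmr_same_group if group == probe_group else fmr_cross_group
        total += count * fmr
    return total

# Hypothetical gallery in which group_x is heavily overrepresented.
gallery_counts = {"group_x": 900_000, "group_y": 100_000}
for group in gallery_counts:
    count = expected_false_matches(group, gallery_counts,
                                   fmr_same_group=1e-5, fmr_cross_group=1e-7)
    print(f"probe from {group}: about {count:.2f} expected false matches")
```

In this illustration the per-comparison rates are identical for both groups, yet a probe from the overrepresented group draws about eight times as many expected false matches (roughly 9 versus roughly 1). Gallery composition alone can therefore produce a disparity even when the underlying matcher treats the groups identically.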

Assessing dataset bias in commercial systems is complicated further by the fact that companies generally do not make their datasets publicly available or disclose many details about them. Moreover, NIST experiments on dataset bias do not make use of the galleries used by commercial systems. Therefore any bias due to galleries would not be detected. 

Sources of Bias Beyond the Data 

Other factors besides data may also significantly influence differential performance. Some experiments have shown that even balanced datasets do not produce equal performance on men and women, or between races, and that sometimes more biased datasets produce less biased and better results. Furthermore, demographic groups may have properties that make them easier or harder to recognize. For example, there may be greater variation in hairstyle in one gender than another, and males in different countries may have different trends in facial hair. If someone has an unusual beard, for example, this may make him easier to recognize, or harder to recognize if he shaves his beard. It is difficult to determine the effects on differential performance of social conventions affecting appearance. It has also been noted that darker skin may require different types of lighting to bring out the facial structure. This could result in more recognition errors for people with darker skin when lighting is not controlled.  

In summary, it is clear that extreme dataset bias produces biased results. It is quite challenging to produce perfectly unbiased datasets, and less clear to what extent the differential performance observed in modern face recognition systems may be due to dataset bias, especially since these systems are built with proprietary data that is not open to public examination. 

Reductions in Bias Over Time

From a policy perspective, perhaps the most important question is whether companies have the ability to produce less biased FRT. To address this question, we examined NIST measurements of the performance of models produced by leading companies. NIST has assessed the degree of bias in multiple models produced over time by some companies, allowing us to see how their performance has evolved. Based on NIST reports, we find that some companies have significantly reduced the absolute and relative bias in their systems within two or three years of initial evaluation, while other companies have not reduced relative bias, and in some cases it has increased, even while absolute bias has decreased due to improved overall accuracy. Details of this analysis may be found in the appendix.

These results suggest that companies are capable of reducing bias, although this is certainly not definitive. In a conversation with one of the authors, the chief scientist at a leading face recognition company confirmed that NIST evaluations have helped them identify certain variants of differential performance between racial groups, enabling them to take effective steps to proactively identify and reduce bias whenever the company becomes aware of it. (Brendan Klare, personal communication.)


The Human Factor: Face Recognition Systems as Part of a Sociotechnical System

Many errors in face recognition are due not just to mistakes by the technology, but to the way in which people make use of it.

The preceding sections focused on the technical properties of face recognition systems. However, these systems do not operate in isolation. They are embedded in what researchers call a sociotechnical system, in which the technology interacts with human judgment and organizational practices. The real-world effects of face recognition therefore depend not only on technical FRT performance, but also on how human users interpret and act on its results. In practice, this interaction can create distinctive failure modes. For example, users may rely too heavily on algorithmic matches without considering other evidence or fail to appreciate how image quality and threshold choices affect reliability.

Limitations of Human Oversight

Some authors argue that these human factors can be structured to correct for technical weaknesses in face recognition systems. One commentator contends that: “it is stunningly easy to build protocols around face recognition that largely wash out the risk of discriminatory impacts…. A simple policy requiring additional confirmation before relying on algorithmic face matches would probably do the trick… one has to wonder why so few researchers who identify bias in artificial intelligence ever go on to ask whether the bias they’ve found could be controlled with such measures.” 

However, empirical evidence suggests that this confidence in human oversight may be misplaced. First, FRT tends to make errors on difficult cases, in which humans also make errors. Studies show that humans are unable to identify many of the errors made by automatic systems. Furthermore, human face recognition shows differential performance similar to that of machine learning systems. It has long been known that humans are more accurate in recognizing faces of their own race than those of other races, a phenomenon dubbed the “other-race effect” (it has been posited that this also stems from dataset bias, in that people encounter more individuals of their own race than of others). Some work indicates that current automated systems recognize faces more accurately than the typical person, and that in some cases, combining a less effective human judgement with an automatic system may actually lead to lower accuracy than simply using the results of the automatic system. Human judgements can in some cases be used to improve algorithmic accuracy, but it may be difficult to determine when that is the case. In general, we cannot assume that human judgements will be accurate or that human oversight can be counted on to correct errors made by automatic systems.

Figure 7. Christopher Gatlin, right, was identified using the security photo on the left.

User Errors

Consistent with these findings, many of the known cases of false arrests due to FRT errors involved questionable practices by investigators. Christopher Gatlin was arrested for the brutal assault of a security guard after an FRT flagged him as a possible suspect based on a low-quality image (Figure 7). Police steered the security guard to identify Gatlin, in what they later admitted was improper behavior.

Robert Williams was arrested for burglary one year after the crime, based on applying FRT to a surveillance video. Lacking witnesses, police showed the surveillance video to an employee of the store’s insurance company, who identified Williams from a photo array, although the video was of poor quality and his face was obscured by a shadow (Figure 1). The police failed to take basic steps such as investigating Williams’ alibi. The police chief at the time, James Craig, said that “this was clearly sloppy, sloppy investigative work.” In other cases, police have shown a single suspect’s photo to a witness, violating best practices by being unduly suggestive. This led to an arrest despite the suspect’s convincing alibi.

In cases where FRT led to false arrests, it seems that police may in fact have given undue weight to the results of FRT rather than catching its errors, an example of “automation bias.” In another case in which recommended procedures were not followed, police were unable to obtain face recognition results due to the low quality of the surveillance image. A detective felt that the surveillance image resembled the actor Woody Harrelson, and used a picture of him to search for matches, rather than the suspect’s photo.

Failures in the use of FRT occur not only in police investigations. In the Rite Aid case mentioned in the introduction, the FTC’s complaint highlighted not just algorithmic errors but significant governance failures in how the system was operated by store employees. The commission found that Rite Aid did not take reasonable steps to train or oversee store employees who were responsible for acting on match alerts, including failing to teach staff how to interpret alerts or warn them that false positives could occur. The company also failed to test or monitor the technology’s accuracy once deployed, enforce image-quality standards, or implement any procedure for tracking false positive alerts and employee responses. As a result, employees in hundreds of stores routinely followed, confronted, searched, or even called police on customers based solely on system alerts—actions taken without meaningful training on the system’s limitations or appropriate safeguards. These shortcomings in training, oversight, and procedural controls were central to the FTC’s determination that Rite Aid had failed to prevent foreseeable consumer harm from the technology’s use.

In summary, it may be difficult for humans to correct mistakes made by algorithms, and in some cases they may place undue confidence in FRT results that are questionable and based on low-quality images. In many applications, such as drug stores that are looking for known shoplifters, the people making use of FRT may not be expert investigators or well trained in the appropriate use of these systems.


Policy Interventions to Address Bias in Face Recognition Systems

Many errors can be addressed by better understanding and regulation of the way in which the technology is used.

A wide variety of policy interventions are available to deal with potential harms caused by bias in FRT. These include research, transparency in documenting bias, voluntary or mandatory guidelines governing the use of face recognition, and outright bans on the use of face recognition in certain contexts. As noted above, FRT makes positive contributions in law enforcement and other applications, and these positives must be weighed against potential harms in crafting policy. Numerous institutions have suggested policy changes to address bias in FRT, including a comprehensive set of proposals in a recent report from the National Academies.

Research

Federal agencies already support substantial research on face recognition. NIST conducts ongoing evaluations of performance and demographic disparities, and agencies such as the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) have funded foundational research in face recognition systems. However, important gaps remain, particularly in understanding how these systems perform under operational conditions and how human users interact with their outputs. Additional federal funding could expand independent research in these areas, either by strengthening NIST’s evaluation programs or by supporting academic and nonprofit research focused specifically on bias mitigation and real-world deployment risks.

Two research priorities are especially important. First, evaluation frameworks should better reflect real-world conditions. Current large-scale benchmarks often rely on relatively high-quality images, whereas many high-stakes uses—such as criminal investigations—depend on low-resolution or poorly lit surveillance images. While efforts such as the IARPA Janus Surveillance Video Benchmark (IJB-S) dataset have begun to address this issue, broader and more systematic testing under operational conditions would provide policymakers with a clearer understanding of real-world risk. 

Second, research is needed to develop tools that help human operators interpret and appropriately limit their reliance on face recognition results. For example, systems could assess probe image quality, estimate the likelihood that a reliable match can be produced, and warn users when results are unlikely to be dependable. Such tools could reduce the risk that investigators or retail employees draw strong conclusions from low-quality, unreliable inputs.

Measure and Reduce Bias

A better understanding of the bias in FRT can inform the procurement decisions of potential customers and encourage companies to take steps to reduce bias. Transparency in bias can be promoted in a number of ways. NIST is already conducting regular and impactful evaluations of bias in FRT, which can be thought of as an application of the Common Task Method (such evaluations have long been common in the computer vision community). This can be continued and potentially expanded. Regulations or government procurement guidelines can be used to incentivize or require companies to participate in evaluations and make these results public. Since criminal investigations are conducted by the government, procurement guidelines are a strong potential lever in promoting transparency. In addition to transparency in performance, these approaches could also be used to promote transparency in the data used to train FR systems. Making training data public may raise significant privacy concerns, but the government could incentivize the release of information describing the data and the steps taken to enhance the demographic balance of these data sets.

Regulate Sociotechnical Use of Face Recognition

If we view FR as part of a sociotechnical system, it makes sense also to govern the way in which face recognition is applied, not just the technical performance of the underlying algorithm. In practice, “responsible use” protocols need to specify who can run searches, what minimum image-quality standards apply, what form results can take, and what documentation and oversight are required. They should also define the permissible purposes for which searches may be conducted, restrict access to trained and certified personnel, require supervisory approval for high-stakes uses, and mandate that face recognition results be treated only as investigative leads rather than as dispositive evidence. Protocols can require minimum similarity thresholds below which no candidate match is returned, prohibit the use of face recognition on images that fall below objective quality metrics, and require contemporaneous documentation explaining why a search was initiated and how results were interpreted.
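To make the protocol elements above concrete, the following is a minimal sketch of how a quality floor, a similarity floor, and contemporaneous documentation might be combined in software. It does not describe any agency's actual system. The names `run_gated_search` and `SearchRecord`, the outcome labels, and the threshold values 0.6 and 0.9 are all hypothetical, and the candidate scores are assumed to come from a face recognition system that is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SearchRecord:
    """One audit-trail entry; field names are illustrative."""
    operator_id: str
    reason: str
    probe_quality: float
    outcome: str
    leads: list
    timestamp: str


def run_gated_search(operator_id, reason, probe_quality, candidates, audit_log,
                     min_quality=0.6, min_similarity=0.9):
    """Apply quality and similarity gates, then log the search.

    `candidates` is a list of (candidate_id, similarity) pairs already
    produced by some face recognition system. Thresholds are placeholders.
    """
    if not reason.strip():
        raise ValueError("a documented reason is required before searching")

    if probe_quality < min_quality:
        # Probe image fails the quality floor: no candidates are returned.
        outcome, leads = "rejected_low_quality", []
    else:
        # Keep only candidates at or above the similarity floor, best first.
        leads = sorted(
            (c for c in candidates if c[1] >= min_similarity),
            key=lambda c: c[1],
            reverse=True,
        )
        outcome = "investigative_leads_only" if leads else "no_candidate_returned"

    # Every search is logged, including rejected and empty ones.
    audit_log.append(SearchRecord(
        operator_id=operator_id,
        reason=reason,
        probe_quality=probe_quality,
        outcome=outcome,
        leads=leads,
        timestamp=datetime.now(timezone.utc).isoformat(),
    ))
    return outcome, leads


audit_log = []
outcome, leads = run_gated_search(
    "examiner-01", "robbery investigation, case file on record", 0.8,
    [("A", 0.95), ("B", 0.72)], audit_log,
)
print(outcome, leads)
```

In this sketch the gates run before any candidate reaches the investigator, and a record is written for every search, so later review can count rejected searches as well as the ones that returned leads.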

Additional safeguards could include audit trails of all searches and outcomes, periodic independent audits of performance and demographic disparities, disclosure requirements when face recognition contributed to an arrest or charging decision, and exclusionary consequences if required procedures are not followed. Agencies could also be required to collect and publish aggregate statistics on the number of searches conducted, the rate at which matches lead to arrests, and the frequency of erroneous identifications. 

As an example of governance procedures, the FBI has established guidelines on the use of face recognition. These include limiting situations in which it can be used and the type of probe images used. They require that all face queries be evaluated by trained examiners and mandate that face recognition be used for investigative leads that must be corroborated. 

As another example, the New York City Police Department (N.Y.P.D.) has spelled out a detailed protocol for the use of FRT. This requires investigators to submit face images to a special facial identification section of the department (the Real Time Crime Center, Facial Identification Section) that will, for example, ensure that image quality is sufficient and that use of FRT is warranted. The section can reject unsuitable probe images and reviews matches. Critically, a “possible match candidate” is meant to be “treated as an investigative lead only” and does not establish probable cause to make an arrest. The unit also retains records of searches and results. It has been reported that in other localities, investigating officers have accessed FRT directly, without supervision. Specific requirements could be mandated, with legal consequences if they are not followed, such as disallowing evidence produced in subsequent investigation.

However, in spite of N.Y.P.D. guidelines, FRT did lead to the false arrest of Trevis Williams. After FRT identified him as a suspect in a crime, the victim identified him from a photo lineup, although he was eight inches taller and 70 pounds heavier than the suspect she had initially described, and despite other exculpatory evidence. This illustrates the difficulty of ensuring that guidelines effectively prevent errors and false arrests.

Regulation may be applied not only to government agencies, such as police departments, but also to private companies that are increasingly deploying face recognition systems in commercial settings. Rite Aid’s use of face recognition illustrates how governance failures can arise outside of law enforcement. According to the FTC complaint, “Rite Aid failed to consider or address foreseeable harms to consumers flowing from its use of facial recognition technology, failed to test or assess the technology’s accuracy before or after deployment, failed to enforce image quality standards that were necessary for the technology to function accurately, and failed to take reasonable steps to train and oversee the employees charged with operating the technology in Rite Aid stores.” These deficiencies were not primarily algorithmic; they reflected a lack of risk assessment, testing, training, oversight, and ongoing monitoring.

The FTC’s enforcement action demonstrates that existing consumer protection laws can be applied to address some forms of misuse. However, as commercial deployment expands, more explicit regulatory standards may be necessary to prevent similar failures. Such standards could require companies to conduct pre-deployment accuracy and bias testing, implement image-quality controls, establish employee training and supervision protocols, monitor and document false positive rates, and assess foreseeable risks before using face recognition in customer-facing environments. Clear statutory or regulatory requirements would provide ex ante guardrails rather than relying solely on ex post enforcement after harms have occurred. Regulations could also require clear disclosure when face recognition is used—both to affected individuals and in aggregate public reporting—so that its role in decision-making can be scrutinized, evaluated, and corrected where harms emerge. 

Policymakers should be willing to ask whether using facial recognition is appropriate at all in certain circumstances. In higher-risk contexts, policymakers could impose outright bans, limit use to specified categories of serious crimes, require a warrant, or mandate corroborating evidence before an individual identified through face recognition is included in a lineup or arrested.

As an example of use restrictions, the state of Maryland has limited the use of automatic face recognition to specific, serious crimes, and requires that defense attorneys be notified when it was used in a case. Montana and Utah require police to obtain warrants to use face recognition. In Detroit, police must obtain corroborating evidence before placing a suspect identified through face recognition in a lineup. Several cities have banned the police use of face recognition, including San Francisco and Boston, while Portland has banned the use of face recognition by private entities in all public places.

At the federal level, members of Congress have introduced legislation that would impose a nationwide moratorium on government uses of face recognition technology absent explicit congressional authorization. Together, these restrictions illustrate a broader policy approach: limiting deployment in high-risk settings until adequate safeguards, transparency, and accountability mechanisms are in place.


Conclusions

Face recognition systems have improved dramatically in accuracy over the past decade, and in tightly controlled environments they now perform at very high levels. At the same time, substantial differences in performance across demographic groups persist, particularly in the false positive errors most closely associated with wrongful arrests and other harms. As overall error rates decline, these disparities may matter less in low-risk settings, but increasing deployment in high-stakes and uncontrolled contexts may lead to continued harms. 

Technical improvements can reduce some sources of bias. Developers can improve dataset balance, adjust thresholds, and refine model design. However, eliminating differential performance entirely is beyond the current state of the art, particularly in operational environments involving low-quality images and large search databases. Policymakers should not assume that continued technical progress alone will resolve these disparities. 

Perhaps most importantly, policymakers should view the regulation of face recognition through a sociotechnical lens, considering the interaction between the technical system and the humans who use it.

We cannot wait for perfect sociotechnical systems, but must govern the deployment of imperfect ones. Policymakers must decide where face recognition is not legitimate. If face recognition is used in high-stakes applications, it should be subject to clear limitations, transparency requirements, and enforceable protocols designed to prevent errors from cascading into wrongful arrests or other serious harms.


Appendix: Variations in Bias Over Time

We examined the performance of face recognition systems evaluated by NIST on different demographic groups. All results are based on data on a verification task, updated on March 5, 2025. More recent data on somewhat different tasks shows similar levels of bias. False positive matches are measured when comparing two high-quality, visa-like images of two different people of the same sex, age group, and region of birth. Demographic disparities are computed by taking the ratio of the false positive rates for two different demographic groups. For example, the ratio of the false positive rate on faces of people born in Western Africa to the false positive rate for people born in Eastern Europe for the highest performing FRT was 17.42, meaning that a false positive match was 17.42 times as likely for someone from Western Africa.
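The disparity ratio defined above can be sketched in a few lines. The counts below are hypothetical and were chosen only so that the arithmetic reproduces the 17.42 figure quoted in the text; they are not NIST data, and the function names are illustrative.

```python
def false_positive_rate(false_matches: int, impostor_comparisons: int) -> float:
    """Share of different-person comparisons wrongly declared a match."""
    if impostor_comparisons <= 0:
        raise ValueError("impostor_comparisons must be positive")
    return false_matches / impostor_comparisons


def disparity_ratio(fpr_group_a: float, fpr_group_b: float) -> float:
    """How many times more likely a false positive is for group A than group B."""
    if fpr_group_b <= 0:
        raise ValueError("reference group false positive rate must be positive")
    return fpr_group_a / fpr_group_b


# Hypothetical counts, not taken from any NIST report.
fpr_a = false_positive_rate(871, 1_000_000)
fpr_b = false_positive_rate(50, 1_000_000)
print(round(disparity_ratio(fpr_a, fpr_b), 2))
```

Because the measure is a ratio, it can stay the same or even rise while both underlying false positive rates fall, which is why the main text distinguishes absolute from relative bias.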

NIST has evaluated differential performance of commercial systems for over five years. Many companies have submitted multiple versions of their FRT over time, as the systems have improved. This allows us to determine how the bias in these systems has changed. We considered the 20 systems with best overall performance, which originated from 12 different companies. Eight of these companies had submitted at least four different versions of their FRT for evaluation, and so we focused on these eight systems. 

Figure 8 shows the change in the ratio of differential performance for three pairs of demographic groups. For illustrative purposes, we show results from two different companies. The curves from Sensetime illustrate differential performance that has increased over time, while the curves from Rank One Computing (ROC) show differential performance that has decreased. Solid curves show the ratio of false positives for subjects of West African birth compared to Eastern Europeans. The dashed curves show performance on females compared to males. The dashed-dotted curves show an older age group (65+) compared to a younger cohort (20-35). 

Table 1 shows the correlation between the passage of time and the ratio of differential performance for all eight companies. A negative correlation indicates that bias has dropped over time, while a positive correlation shows an overall increase in bias. If the correlation is close to 1 or -1, this means that the change in performance over time is highly consistent, while a correlation close to 0 means that there is no clear trend in the increase or reduction in bias. We can see that Toshiba, Idemia, and ROC have reduced bias across all three ratios, while Sensetime has increased bias, with other companies showing mixed performance.
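As an illustration of how such a trend measure can be computed, the sketch below correlates submission order with the disparity ratio for a single vendor. The text does not say which correlation coefficient was used, so Pearson's correlation is an assumption here, and the ratio series are invented for illustration rather than taken from Table 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical disparity ratios across four successive submissions.
submission_index = [0, 1, 2, 3]
steadily_falling = [8, 6, 4, 2]   # bias drops with every version
no_clear_trend = [5, 7, 4, 6]     # bias moves up and down

print(pearson(submission_index, steadily_falling))
print(pearson(submission_index, no_clear_trend))
```

A series that falls with every submission yields a correlation near -1, while one that moves up and down yields a value near 0, matching the interpretation given for Table 1. With only a handful of submissions per company, such correlations indicate direction and consistency rather than statistical significance.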

Transforming the Carceral Experience: Leveraging Technology for Rehabilitation

Despite a $182 billion annual cost, the U.S. correctional system perpetuates itself: At least 95% of all state prisoners will be released from prison at some point, yet more than 50% of them reoffend within three years. 

A key driver of high recidivism is the systemic neglect of the carceral experience. While much attention is given to interventions post-release, rehabilitation inside correctional facilities is largely invisible to the public. This dynamic results in approximately 2 million incarcerated persons being locked in a “time capsule”: the world passes them by as they serve their sentences. This is a missed opportunity, as simple interventions like accessing educational resources and maintaining family contact during incarceration can cut recidivism by up to 56%. Reduced recidivism translates into a more robust workforce, safer communities, and higher political participation. The new administration should harness the momentous bipartisan interest in criminal justice reform, audit the condition and availability of rehabilitative resources in prisons and jails, invest in digital and technology infrastructure, and sustainably end mass incarceration by building meaningful digital citizenship behind bars.

Challenge and Opportunity

In the post-COVID-19 world, robust and reliable technology and digital infrastructure are prerequisites for any program and resource delivery. However, the vast majority of U.S. correctional facilities still lack adequate technology infrastructure, with cascading effects on the availability of in-prison programs, utilization of digital resources, and incarcerated people’s transition to the free world. 

As many other institutions quickly embrace new technology, prisons lag behind. In Massachusetts, prisons struggle to provide even basic rehabilitative, educational, and vocational training programs due to a shortage of hardware devices, such as tablets and Chromebooks, and insufficient staffing. Similarly, in Florida, internet access is constrained by legislation and exacerbated by a lack of funding. Many prisons are forced to limit or entirely cancel programs when in-person visits are inaccessible, due to either COVID-19 restrictions or simply insufficient transportation options for resource providers. Consequently, only 0.5% of incarcerated individuals are enrolled in educational courses. The situation is equally dire in juvenile detention centers from California to Louisiana, where poor access to educational opportunities contributes to low graduation rates, severely limiting future employment prospects for at-risk youths.

Despite these systemic challenges, there is a strong, bipartisan recognition of the need to improve conditions within the carceral system—and therefore a unique opportunity for reform. 

The Federal Communications Commission (FCC) has passed the most comprehensive regulations on incarcerated people’s communication services, setting rate caps for various means of virtual communications. Electronic devices, such as tablets and Chromebooks, are gradually being accepted in correctional facilities, and they carry education resources and entertainment. Foundationally, federal investments in broadband and digital equity present a generational opportunity for correctional facilities and incarcerated people. These investments will provide baseline assessment of the network conditions and digital landscape in prisons, and the learnings can lay the very foundation to enable incarcerated people to enter the digital age prepared, ready to contribute to their communities from the day they return home.

This is just the beginning. 

Plan of Action

Recommendation 1. Invest in technology infrastructure inside correctional facilities.

A significant investment in technology infrastructure within correctional facilities is the prerequisite to transforming corrections. 

The Infrastructure Investment and Jobs Act (IIJA), through the Broadband Equity, Access, and Deployment (BEAD) and Digital Equity (DE) programs, sets a good precedent. BEAD and DE funding enable digital infrastructure assessments and improvements inside correctional facilities. These are critical for delivering educational programs, maintaining family connections, and facilitating legal and medical communications. However, only a few corrections systems are able to utilize the funding, as BEAD and DE do not have a specific focus on improving the carceral system, and states tend to prioritize other vulnerable populations (e.g., the rural, aging, and veteran populations) over the incarcerated. Currently incarcerated individuals are difficult to reach, so they are routinely left out of the planning process for funding distribution across the country.

The new administration should recognize the urgent need to modernize digital infrastructure behind bars and allocate new and dedicated federal funding sources specifically for correctional institutions. The administration can ensure the implementation of best practices through grant guidelines. For example, it could stipulate that prior to accessing funding, states have to conduct a comprehensive network assessment, including speed and capacity tests, a security risk analysis, and a thorough audit of existing equipment and wiring. Further, it could mandate that all new networks built or consolidated using federal funding be vendor-neutral, ensuring robust competition among service providers down the road. 

Recommendation 2. Incentivize mission-driven technology solutions.

Expanding mandatory access to social benefits for incarcerated individuals will incentivize mission-driven technology innovation and adoption in this space.

There are precedents for doing so at both the federal and state levels. For example, Second Chance Pell restored educational opportunities for incarcerated individuals and inspired the emergence of mission-driven educational technologies. Another critical federal action was the Consolidated Appropriations Act of 2023 (specifically, Section 5121), which mandated Medicaid enrollment and assessment for juveniles, thereby expanding demand for health and telehealth solutions in correctional facilities.

The new administration should work with Congress to propose new legislation that mandates access to social benefits for those behind bars. Specifically, access to mental health assessment, screening, and treatment, as well as affordable access to communication with families and loved ones on the outside, will be critical to successful rehabilitation and reentry. Additionally, it should invest in robust research focusing on in-prison interventions. Such projects can be rare and more costly, given the complexity of doing research in a correctional environment and the dearth of in-prison interventions. But they will play a big part in establishing the basis for data-driven policies.

Recommendation 3. Remove procurement barriers for new solutions and encourage pilots.

Archaic procurement procedures pose significant barriers to competition in the correctional technology industry and block innovative solutions from being piloted. 

The prison telecommunications industry, for example, has been dominated by two private companies for decades. The effective duopoly has consolidated the market by entering into exclusive contracts with high-percentage kickbacks and so-called “premium services.” These procurement and contracting tactics minimize healthy competition from new entrants to the industry.

Some states and federal agencies are trying to change this. In July 2024, the FCC prohibited revenue-sharing between correctional agencies and for-profit providers, ending the arms race of ever-higher commissions for good. On a state level, California’s RFI initiative exemplifies how strategic procurement processes can encourage public-private partnerships to deliver cutting-edge technology solutions to government agencies.

The administration should take a strong stance by issuing an executive order asking all Federal Bureau of Prisons facilities, including ICE detention centers, to competitively procure innovative technology solutions and establish pilots across its institutions, setting an example and a playbook for state corrections to follow. 

Recommendation 4. Invest in needs assessments, topic-specific research, and development of best practices through the National Science Foundation and the Bureau of Justice Assistance.

Accurate needs assessments, topic-specific research, development of best practices, and technical assistance are all critical to smooth delivery and implementation. 

The Department of Justice, through the Bureau of Justice Assistance (BJA), offers a range of technical assistance (TA) programs that can support state and local correctional facilities in implementing these technology and educational initiatives. Programs such as the Community-based Reentry Program and the Encouraging Innovation: Field-Initiated Program have demonstrated success in providing the necessary resources and expertise to ensure these reforms are effectively integrated. 

However, these TA programs tend to disproportionately benefit correctional facilities where significant programs are already in place but are less useful for “first timers,” where taking that first step is hard enough.

The new administration should work with the National Science Foundation (NSF) and the BJA to systematically assess and understand challenges faced by correctional systems trying to take the first step of reform. Many first-timer agencies have a deep understanding of the issues they experience (“program providers complain that tablets are not online”) but limited knowledge of how to assess the root causes of those issues (multiple proprietary wireless networks in place).

The NSF can bring together subject matter experts to offer workshops to correctional workers on network assessments, program cataloging, and human-centered design on service delivery. These workshops can help grow capacity at correctional facilities. The NSF should also establish guidelines and standards for these assessments. In addition to the TA efforts, the BJA could offer informational sessions, seminars, and gatherings for practitioners, as many of them learn best from each other. In parallel to learning on the ground, the BJA should also organize centralized task forces to oversee and advise on implementation across jurisdictions, document best practice, and make recommendations. 

Conclusion

Investing in interventions behind the walls is not just a matter of improving conditions for incarcerated individuals—it is a public safety and economic imperative. By reducing recidivism through education and family contact, we can improve reentry outcomes and save billions in taxpayer dollars. A robust technology infrastructure and an innovative provider ecosystem are prerequisites to delivering outcomes. As 95% of incarcerated individuals will reenter society one day, it is vital to ensure that they can become contributing members of their communities. These investments will create a stronger workforce, more stable families, and safer communities. Now is the time for the new administration to act and ensure that the carceral system enables rehabilitation, not recidivism.

Reclaiming Privacy Rights: A Roadmap for Organizations Fighting Digital Surveillance

Federal and local law enforcement have surveilled civil rights activists, organizations, and protesters for decades. Past victims of government spying include Martin Luther King Jr., Angela Davis, Jane Fonda, the American Indian Movement, the United Farm Workers, and the National Lawyers Guild. These activists and organizations were subjected to traditional surveillance tactics such as wiretapping and infiltration.

Today, surveillance looks different as technological advances have made it increasingly easy to track someone’s whereabouts, communications, and inner thoughts based on browser history, all without leaving an office. This level of digital surveillance has a chilling effect on people’s First Amendment rights, because a person may choose to censor themselves online or be reluctant to engage in political expression, such as attending a protest, due to their fear of being watched and retaliated against.

This report is the result of research that tries to answer the fundamental question: what can civil society do to fight back against the growing trend of widespread digital surveillance, particularly in the state of New York? New York is the focus of this research project because of the state’s widespread use of surveillance technology, particularly in New York City, and the strong activism within the state that works to improve the lives of marginalized communities.

Social justice organizations play an instrumental role in society through their organizing and fighting for civil rights. This report provides these organizations with information on current surveillance practices and how these practices may impact the communities that they serve. The first section of the report provides a short roadmap of the recent history of digital surveillance in different contexts such as immigration, environmental justice, the criminal legal system, housing, and the workplace. The second section covers pending and enacted legislation that could help or hinder efforts to end surveillance. The third section describes strategies organizations can use to combat surveillance in their communities. Lastly, the report provides a list of legal organizations that are well versed in this arena and attuned to technological advances.

A Current History of Digital Surveillance

Before diving into action, it’s important to review the types of surveillance that many communities may be subjected to. This section demonstrates how widespread surveillance is and provides background stories on the surveillance that activist communities face in the immigration, environmental justice, criminal legal system, housing, and workplace contexts.

Immigration

A growing number of surveillance tactics have been used against activists, migrants, journalists, and attorneys in the immigration space. In 2019, NBC 7 San Diego reported that federal agencies were keeping and sharing a secret database on an attorney, journalists, organizers, and “instigators” who had previously worked at the U.S.-Mexico border. The database contained photos of each person, obtained from the person’s passport or social media accounts. It also included personal information such as the person’s work and travel history, names of their family members, and the kind of vehicle they drive. Some of these individuals reported that while traveling across the border, they were targeted for secondary screening. Border agents took their electronic devices, and some individuals believed that the agents performed warrantless searches of their devices, though they were unable to verify this. Journalists reported that these invasive actions affected their ability to protect their confidential sources. It’s easy to imagine that this unfounded suspicion and investigation could deter activists and journalists from continuing their work.

This isn’t the only incident of ICE keeping an eye on activists. In July 2021, The Intercept reported that U.S. Immigration and Customs Enforcement (ICE) had been surveilling activists and advocacy groups, such as Project South and Georgia Detention Watch, online and in person. This was done under the guise of safety and security as an ICE spokesperson stated “[l]ike all other law enforcement agencies, ICE follows planned protests to ensure the safety and security of its infrastructure, personnel, officers and all those involved.” Internal emails revealed that ICE officials were using Facebook to follow advocacy groups and ICE was tracking the attendees of their events.

Migrants have also been subjected to government surveillance. Over the last several years, ICE has increased its use of electronic monitoring as an alternative to holding migrants in detention centers. Since March 2024, 183,935 people have been subjected to electronic monitoring by ICE, with 18,518 of those required to wear GPS ankle monitors. In 2018, ICE launched SmartLINK, an app that allows the agency to track a migrant’s whereabouts. Since April 2024, ICE has monitored over 700,000 people through the app. The agency requires migrants to do periodic check-ins using SmartLINK to confirm the user’s identity through voice recognition, geolocation, and facial recognition technology (FRT). The app has access to the user’s phone camera and has the ability to record audio. If a migrant complies with their check-ins for around 14 to 18 months, ICE may remove the person from the app to make room for new migrants who have just arrived in the country. Users of the app have expressed concern about the app’s location tracking, as it may put their undocumented family members at risk. Users have also stated that the app feels just as restrictive and invasive as an ankle monitor. Thirteen immigrant rights organizations found that electronic monitoring is not only harmful to the user’s livelihood but also hampers their personal relationships and their ability to organize in their community.

Surveillance in the immigration space interferes with the ability of migrants to organize and affects journalistic reporting. It also has the tendency to make migrants afraid of being a part of a community or spending time with their undocumented family members because they are aware that they are being watched. This kind of surveillance puts everyone in their circle at risk.

Criminal Legal System

Surveillance has been used in the criminal legal system for decades, as police often use various spying tools to investigate suspects. However, whereas before police would use agents to track a suspect’s movements, today, law enforcement is able to track a suspect from their desk. Law enforcement has been able to use private companies to obtain a person’s personal data such as their cell phone records, location data, web browsing history, and more. This tracking is not limited to suspects, as law enforcement agencies have been reported to subject activists to this level of surveillance as well. 

In 2018, Memphis police were accused of spying on Black activists from 2016 until 2017. Memphis Police Department’s Office of Homeland Security (MPD) was accused of creating a Facebook profile to monitor activists in the area. There was one incident in which a community organizer posted a book on their page, and MPD collected the names of everyone who liked the post. With that list, they created a dossier of those individuals and called it “Blue Suede Shoes”. MPD is far from the only law enforcement agency that has collected a list of organizers, but it is unclear what happens with these lists after they’ve been created.

During the 2020 protests, the world experienced a new level of surveillance at the hands of local law enforcement and federal agencies. In 2021, it was reported that six federal agencies used FRT during the 2020 Black Lives Matter (BLM) protests across the United States. The agencies admitted that they did use this technology to identify individuals but they stated it was used to identify those who they suspected had violated laws. In one instance, police officers were able to arrest a protester after using FRT and receiving a match. NYPD has also been accused of using the technology to identify protesters after the event and charge them with crimes.

Environmental Justice

Surveillance has also been found in the environmental justice space, from both law enforcement and private companies. Shanai Matteson is an artist and climate activist based in Palisade, Minnesota. In 2021, Matteson spoke at the “Rally for the Rivers” event, which was organized to protest the construction of a pipeline. At the conclusion of the rally, 200 people left and went to the construction site to protest. At some point, the police arrested a number of protesters, although Matteson was not one of them. However, five months later, law enforcement officials found livestream videos of the event, identified who was at the rally, and charged Matteson with a misdemeanor, accusing her of conspiring to trespass.

During the Dakota Access Pipeline protests, a private company conducted mass surveillance on individuals in an unprecedented way. In 2016, the private security firm TigerSwan was hired by Energy Transfer Partners to surveil Dakota Access Pipeline protesters. TigerSwan monitored protesters’ social media posts, utilized aerial surveillance, employed informants, and used radio eavesdropping to spy on activists. TigerSwan used this information to make lists of “persons of interests” and pressure law enforcement to be more aggressive against the protesters. The firm also shared its intel with local law enforcement agencies and provided evidence to prosecutors to help them build cases against the protesters. After learning that Lee County, Iowa increased bail for protesters, TigerSwan stated in one of its documents that it needed to work more closely with other counties to make sure protesters would be fined or arrested in order to deter them. Because TigerSwan is a private company, it was able to conduct this level of mass surveillance on protesters without much government or judicial oversight.

Housing

Housing is one of the areas where people may least expect surveillance, yet those in private and public housing may have to deal with this issue in the near future. Between 2018 and 2019, residents of a Brooklyn apartment complex organized and resisted their landlord’s attempts to install facial recognition cameras within the building. In retaliation, the landlord threatened the organizers with loitering fines and told them, wrongly, that handing out flyers to fellow residents was unlawful. The apartment complex justified its actions by stating that this technology would provide safety and security for its residents.

Public housing facilities have also been accused of installing surveillance systems in their communities without the consent of residents. Some of these systems contain FRT or other forms of artificial intelligence. In Scott County, Virginia, cameras at a public housing facility have FRT that searches for people barred from the facility. In New Bedford, Massachusetts, a surveillance system searches through hours of recordings to locate movement near the doorways in order to identify residents who violate overnight guest rules. The footage has been used to punish and evict residents, who may have a difficult time securing housing in the future as a result of their eviction. While the cameras are only installed in public spaces within these facilities, they still violate people’s privacy rights as residents and their guests are tracked walking to and from their homes, a place that many people consider sacred. 

Workplace Surveillance

Workers have been subjected to increasing surveillance over the last few years and one of the most infamous infringers is Amazon. The company has been accused of deploying many tactics in order to stop union organizing such as monitoring employee message boards and private Facebook groups. Amazon has also been accused of posting a job for an intelligence analyst who would be in charge of monitoring labor organizing threats. 

Amazon has several resources within their facilities to monitor their employees such as employee ID badges which can be used to track an employee’s location and can allow the company to discover which of their employees are participating in organizing. Amazon facilities have surveillance cameras that are capable of allowing supervisors to track their workers and human monitors who walk around the facilities in order to keep an eye on the workers. Amazon has been accused of identifying union organizers and rotating them throughout the workplace, to prevent the organizers from having prolonged contact with the same employees. One source stated that workers were not allowed to socialize with each other as a manager would come and break them up. 

Whole Foods, which is owned by Amazon, has also been accused of using surveillance to track union organizing. It was reported in 2020 that Whole Foods was using a heat map to track stores that could be at risk of unionization based on the distance from the store to the closest union, diversity within the store, team member sentiment, and additional factors.

Digital Surveillance: Where we are now

There have been a few promising federal and state bills introduced in the last few years that would provide vast protections for activists and journalists. On the other hand, there are also recent bills that have been passed that would increase government surveillance and cause more harm to these communities. This section provides a brief overview on where things currently stand. 

Federal Legislation

In April 2021, U.S. Senators Ron Wyden (D-OR), Rand Paul (R-KY), and 18 additional senators introduced the Fourth Amendment Is Not For Sale Act. For years, data brokers have been able to sell people’s personal information, such as their location data, to law enforcement and intelligence agencies without judicial oversight. Federal law fails to protect people’s data from being sold in this manner, so this bill would close this legal loophole and require the government to obtain a court order in order to buy a person’s data. This bill would prohibit law enforcement agencies from purchasing a person’s information from a third party, prohibit government agencies from sharing a person’s records with law enforcement and intelligence agencies, and require the agencies to obtain a court order before obtaining someone’s records. The bill passed the House and was received by the Senate in April 2024, with little movement since then.

Another promising bill is the Protect Reporters from Exploitative State Spying (PRESS) Act, which was introduced in June 2023 by U.S. Senators Ron Wyden (D-OR), Mike Lee (R-UT), and Richard Durbin (D-IL). Law enforcement agencies have been secretly obtaining subpoenas for reporters’ emails and phone records in order to determine their confidential sources. The bill would protect a reporter’s data that is held by a third party from being secretly obtained by the government without the reporter having an opportunity to challenge the subpoena. As of now, this bill has passed the House and has been received in the Senate and referred to the Committee on the Judiciary.

On the opposite end of the spectrum, legislation has been passed that expands surveillance, such as the National Security Supplemental Appropriations Act, which was introduced in February 2024 and passed in April 2024. The bill provides $204 million to the FBI for DNA collection at the border. Another $170 million goes towards autonomous surveillance towers, mobile video surveillance systems, and drones at the border.

Digital Surveillance in New York State

Turning to New York specifically, there has been some positive movement towards obtaining information on the prevalence of government surveillance and curtailing the recent overreach as well. Recently, the NYPD was ordered by the New York Supreme Court to disclose 2,700 documents and emails related to its surveillance of the 2020 BLM protests between March and September 2020. This information can provide some clarity into the mystery around what surveillance tools were used during this time period, since much of the information known about this time period has come from FOIA requests instead of the NYPD voluntarily disclosing their surveillance practices.

In 2020, the Public Oversight of Surveillance Technology (POST) Act passed. This act required the NYPD to disclose the surveillance tools it uses and publish the impact of those technologies. NYPD is required to publish reports on these surveillance tools, informing the public about how it plans to use these tools and the potential impacts on New Yorkers’ civil liberties and rights. The Brennan Center has written about the shortcomings of the law, largely due to the NYPD failing to adhere to the provisions. In February 2024, a bill adding provisions to the POST Act was introduced to the New York City Council. The provisions would require NYPD to provide the Department of Investigation a list of all surveillance technologies currently in use and provide their retention policies for the information they collected from the technologies. This bill was referred to the Committee on Public Safety in February 2024.

How to Take Action Against Surveillance

There are numerous ways organizations can take action to combat the use of mass surveillance in their communities. This section provides a few examples of actions that organizations can take to protect their communities right now, such as legislative action, forming working groups, sharing protest safety procedures, conducting Freedom of Information Act (FOIA) requests, and spreading the word.

Legislation

As demonstrated above, legislation can provide a promising avenue towards ending the overreach of widespread government surveillance of vulnerable communities. It’s important for organizations to have journalists who are willing to report on the issues their community may be facing, such as in the immigration space. The PRESS Act can help journalists who travel to the U.S.-Mexico border to report on issues affecting migrants and humanitarian organizations. Unfortunately, these journalists have been subjected to intimidation tactics while working on their stories which may prevent them from continuing their work. The PRESS Act would prevent government agencies from secretly obtaining subpoenas for reporters’ sources, but there is additional legislation needed to prevent law enforcement agencies from targeting journalists, activists, and attorneys who are providing assistance to migrants. Law enforcement should be prevented from performing warrantless searches, interrogating these individuals about their work without just cause, and creating dossiers of these individuals with illegitimately obtained personal information.

Legislation would also immensely benefit future protesters exercising their rights to free speech and assembly, and could have prevented many harms that occurred during the BLM protests. Since those protests, a few states and around 18 cities, such as Boston and Portland, have passed legislation banning government agencies from using FRT or laid out restrictions on how the technology can be used. But years later, some of these governments, such as New Orleans and Virginia, which initially banned local police from using the tool, rolled back this legislation and allowed law enforcement to use the technology to investigate crimes. Vermont, a state that previously had a near-complete ban on police use of FRT, passed legislation that allows the police to use it for investigations in certain instances. Advocates in New York and elsewhere can push legislators to care about privacy concerns as much as they care about crime.

Legislation can also be pushed to prevent government agencies from surveilling residents in public housing while they are at their homes. Additionally, legislation can prevent law enforcement agencies from making dossiers of individuals based on the content the person follows or likes on social media. There is a lot of room for growth in this arena since the law has failed to keep up with technological advances. Advocacy organizations can propose or draft bill text with other organizations, meet with legislators, or sign onto letters in support of or opposition to pending bills related to digital surveillance and data privacy rights.

Form a local working group to review proposed technology

In 2020, Syracuse mayor Ben Walsh formed the Syracuse Surveillance Technology Working Group, which provides residents an opportunity to comment on proposed uses of surveillance technology by city departments. The group is composed of 12-15 individuals from different community groups in Syracuse, as well as some City of Syracuse employees that are selected by the mayor.

When a city department is interested in using a technology, it submits the request to the working group for review. The group advertises to the public through social media and local news channels to get widespread input. The group collects the public’s opinions and concerns about the technology and conducts its own research as well. The group then produces a report for the city that offers recommendations and explains how the technology may affect the Syracuse community. The mayor then approves or rejects the technology based on the report. Thus far, the group has reviewed automated license plate readers, body-worn cameras, street cameras, and more.

This working group provides the public an opportunity to conduct their own research on the proposed technologies and voice their opinions in a public forum. With many local government agencies wanting to explore the use of technologies like facial recognition, this could give activists a chance to have their opinions heard on these issues before they are implemented. This working group concept could be incorporated in other cities and provide some oversight and input into surveillance technologies that local agencies are utilizing on their residents.

Share protest safety procedures  

There are a few measures organizations can recommend to help individuals protect their privacy while they are at a protest. The Surveillance Technology Oversight Project has done a wonderful job creating a safety guide for protesters who wish to protect their digital privacy while organizing. The guide provides information on protecting location data, DNA, and cell phone data. Some of the tips include turning one’s cell phone on airplane mode so that location cannot be tracked, considering what information one posts and shares on social media since it can be observed, and considering what transportation one takes to the protest, as vehicles could be tracked via automated license plate readers. Organizations can share this information within their communities to ensure activists are doing what they can to protect their own information as well as that of their fellow activists. Following these recommendations could prevent activists from being unjustly targeted by law enforcement, as in the case of Shanai Matteson, the climate activist and artist in Minnesota referenced earlier.

FOIA requests

Another avenue organizations may want to explore is FOIA requests, which can help an organization and the public understand what kind of surveillance their community is being subjected to. There is a cloud of secrecy surrounding which tools government agencies use to surveil people, largely because agencies refuse to share this information with the public without legal force. As stated above, the NYPD was recently ordered to turn over records that would reveal how they used FRT against BLM protesters. It is essential to have this kind of information as it will help organizations discover how law enforcement utilized this tool and help organizations fight against future use. Almost all of the stories featured above were derived from an organization submitting a FOIA request and obtaining internal documents that revealed how communities were being harmed by a government agency. 

Sometimes, a party may refuse to comply with a FOIA request and the situation will escalate to legal action. As an example, in 2024, Just Futures Law, Mijente Support Group, and the Samuelson Clinic filed a lawsuit to force ICE to comply with a 2021 FOIA request that ICE failed to respond to. Because these situations can turn contentious, it’s important to have legal support when pursuing a FOIA request, which can come from an attorney, a law firm, or a law school clinic.

Spread the word

In order to combat these issues, people have to be informed about the mass surveillance that they are subjected to on a daily basis. Many people have expressed the sentiment “If you’re not doing anything wrong, you have nothing to hide”; however, they may not be fully versed on the implications of surveillance on vulnerable communities who have done nothing to warrant this invasion into their privacy. Some ways to spread the word can include holding public meetings on various surveillance topics with speakers, organizing against local surveillance tactics and publicizing the action, speaking with community members to see if they’ve noticed any surveillance tactics in their neighborhood, and working with other social justice, tech, or legal organizations. As stated above, a legal organization or clinic can help social justice organizations litigate FOIA requests that are not complied with as well as provide assistance with other kinds of litigation as needed. Social justice organizations can also work with think tank organizations to produce reports on civil rights violations and inform the public of rising issues. After the report is released, organizations can sign onto a letter calling on the government to stop an action or support an action. 

Conclusion

There is much work to be done in the digital privacy space as the law has failed to keep up with the advancement of technology and rising surveillance concerns. However, everyone is capable of becoming well-versed in these issues, pushing for change on the state and federal level, and spreading the word throughout their communities. Privacy can be nearly impossible to achieve on an individual level, but together we can fight against efforts that degrade, dehumanize, and obstruct freedoms in our society.


Appendix 1. Legal Organizations to Know

Legal organizations in the privacy space are well versed on surveillance issues and could help social justice organizations know where to turn when individuals they serve come under surveillance. The following organizations are prominent in the privacy rights space and are performing groundbreaking work to combat government overreach.

ACLU

The ACLU’s Project on Speech, Privacy, and Technology focuses on the right to privacy, ensuring individuals have control of their personal information, and protecting individuals’ civil liberties as new advances are made in science and technology. The project focuses on consumer privacy, internet privacy, location tracking, medical and genetic privacy, national ID, privacy at borders and checkpoints, surveillance technologies, and workplace privacy.

S.T.O.P.

The Surveillance Technology Oversight Project fights government surveillance through advocacy and litigation and hopes to transform New York into a pro-privacy state. S.T.O.P. organized over 100 organizations to get the POST Act approved in the state, has sued city and state agencies for records pertaining to a variety of issues such as NYPD’s use of FRT, and also publishes research papers on different surveillance technologies.

Brennan Center for Justice

The Brennan Center for Justice is a nonpartisan law and policy organization that works to defend democracy and justice. One of their initiatives is privacy and free expression. Through that project, the Brennan Center works to challenge mass surveillance policies that are overreaching and works to inform the public about these issues. The Brennan Center has been a leader in challenging the structure of the Foreign Intelligence Surveillance Court and the fact that the court only hears from the government when government agencies seek to obtain people’s data. 

Center on Privacy and Technology

The Center on Privacy and Technology is a think tank focused on privacy and surveillance law and policy. The Center fights back against surveillance by conducting long-term investigations and research and publishing reports on its findings. Its most recent report, Raiding the Genome: How the United States is abusing its immigration powers to amass DNA for Future Policing, discusses how the Department of Homeland Security (DHS) invades migrants’ privacy rights. DHS is taking DNA samples from detainees, which are later stored in the FBI’s database, CODIS.

Just Futures Law

Just Futures Law works alongside activists, organizers, and community groups to dismantle mass surveillance, incarceration, and deportation via advocacy and legal support. They have worked on ending ICE digital prisons, stopping data brokers from selling people’s data to ICE, which could lead to deportation, ending the digital border wall, and protecting driver data from being turned over to ICE, among many other projects.