Behavioral Economics Megastudies are Necessary to Make America Healthy

Through partnership with the Doris Duke Foundation, FAS is advancing a vision for healthcare innovation that centers safety, equity, and effectiveness in artificial intelligence. Inspired by work from the Social Science Research Council (SSRC) and Arizona State University (ASU) symposiums, this memo explores new research models such as large-scale behavioral “megastudies” and how they can transform our understanding of what drives healthier choices for longer lives. Through policy entrepreneurship, FAS engages with key actors in government, research, academia, and industry. These recommendations align with ongoing efforts to integrate human-centered design, data interoperability, and evidence-based decision-making into health innovation.

By shifting funding from small, underpowered randomized controlled trials to large field experiments in which many different treatments are tested synchronously in a large population using the same objective measure of success, so-called megastudies can start to drive people toward healthier lifestyles. Megastudies will allow us to more quickly determine what works, in whom, and when for health-related behavioral interventions, saving tremendous sums over traditional randomized controlled trial (RCT) approaches because of their scalability. But doing so requires the government to back the establishment of a research platform that sits on top of a large, diverse cohort of people with deep demographic data.

Challenge and Opportunity

According to the National Research Council, almost half of premature deaths (< 86 years of age) are caused by behavioral factors. Poor diet, high blood pressure, sedentary lifestyle, obesity, and tobacco use are the primary causes of early death for most of these people. Yet, despite decades of studying these factors, we know surprisingly little about what can be done to turn these unhealthy behaviors into healthier ones. This has not been due to a lack of effort. Thousands of randomized controlled trials intended to uncover messaging and incentives that steer people towards healthier behaviors have failed to yield impactful steps that can be broadly deployed to drive behavioral change across our diverse population. To be sure, changing human behavior through such mechanisms is controversial and difficult. Nonetheless, studying how to bend behavior should be a national imperative if we are to extend healthspan and address the declining lifespan of Americans at scale.

Limitations of RCTs

Traditional randomized controlled trials (RCTs), which usually test a single intervention, are often underpowered, expensive, and short-lived, limiting their utility even though RCTs remain the gold standard for determining the validity of behavioral economics studies. In addition, because the biological and cultural diversity of our population severely constrains study design, RCTs are often conducted on narrow, well-defined populations. What works for a 24-year-old female African American attorney in Los Angeles may not be effective for a 68-year-old male white fisherman living in Mississippi. Overcoming such noise in the system means either limiting the population under study through demographics, or deploying the raw power of large participant numbers, which allows post-study stratification and hypothesis development. It also means that health data alone is not enough. Such studies require deep personal demographic data to be combined with health data and wearable data. In essence, we need a very clear picture of participants' lives to properly identify interventions that work and apply them appropriately post-study to broader populations. Similarly, testing a single intervention means that you cannot be sure it is the most cost-effective or impactful intervention for a desired outcome. This further limits the ability to deploy RCTs at scale. Finally, the data sometimes imply spurious associations. Therefore, preregistration of endpoints, interventions, and analyses will make for solid evidence development, even if the most tantalizing outcomes come from sifting through the data later to develop new hypotheses that can be further tested.
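To make the power problem concrete, the short sketch below (a minimal illustration in Python using statsmodels; the effect sizes and subgroup count are assumptions for illustration, not figures from any particular trial) shows how quickly the required sample size grows as behavioral effect sizes shrink, and why post-study stratification demands a very large cohort.

```python
# A minimal power calculation illustrating why single-intervention RCTs
# are often underpowered for the small effects typical of behavioral
# interventions. Effect sizes here are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for d in (0.5, 0.2, 0.1):  # large, small, and very small standardized effects
    n_per_arm = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"effect size d={d}: ~{n_per_arm:.0f} participants per arm")

# If post-study stratification across, say, 20 demographic subgroups is the
# goal, each subgroup needs roughly this many participants on its own,
# which quickly pushes the total cohort into the tens of thousands.
```

Running this yields roughly 64, 394, and 1,571 participants per arm for the three effect sizes, which is why a single-intervention RCT on a modest budget can rarely detect the small effects typical of behavioral nudges, let alone support subgroup analysis.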

Value of Megastudies

Professors Angela Duckworth and Katherine Milkman of the University of Pennsylvania have proposed expanding the use of megastudies to gain deeper behavioral insights from larger populations. In essence, megastudies are “massive field experiments in which many different treatments are tested synchronously in a large sample using a common objective outcome.” This paradigm allows independent research teams to develop interventions and test them in parallel against one another. Participants are randomly assigned across a large cohort to determine the most impactful and cost-effective interventions. In essence, the teams are competing against each other to develop the most effective and practical interventions on the same population for the same measurable outcome.
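To illustrate the design, here is a schematic sketch (in Python, with hypothetical arm names, cohort size, and simulated outcomes) of a megastudy's core loop: one cohort, random assignment across many arms, and a single shared outcome against which every arm is compared.

```python
# A schematic sketch of the megastudy design described above: many
# intervention arms tested synchronously on one cohort against a single
# shared outcome. Arm names, cohort size, and effects are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

arms = ["control", "text_nudge", "cash_incentive", "social_comparison",
        "planning_prompt", "loss_framing"]  # hypothetical intervention arms
n_participants = 60_000

cohort = pd.DataFrame({
    "participant_id": np.arange(n_participants),
    "arm": rng.choice(arms, size=n_participants),  # random assignment
})

# Common objective outcome (e.g., weekly gym visits), simulated here.
base_rate = 1.0
assumed_effects = {"control": 0.0, "text_nudge": 0.05, "cash_incentive": 0.15,
                   "social_comparison": 0.08, "planning_prompt": 0.04,
                   "loss_framing": 0.10}
cohort["gym_visits"] = rng.poisson(
    base_rate + cohort["arm"].map(assumed_effects))

# Every arm is compared against the same control on the same outcome,
# which is what makes the comparisons "apples to apples".
summary = cohort.groupby("arm")["gym_visits"].agg(["mean", "count"])
summary["lift_vs_control"] = summary["mean"] - summary.loc["control", "mean"]
print(summary.sort_values("lift_vs_control", ascending=False))
```

The key property is visible in the final table: because every intervention is measured against the same control group on the same outcome, the ranking of arms is directly interpretable, which is much harder to achieve when each intervention is tested in its own separately designed RCT.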

Using this paradigm, we can rapidly assess interventions and accelerate scientific progress by saving time and money, all while making more appropriate comparisons to bend behavior towards healthier lifestyles. Due to the large sample sizes involved and deep knowledge of participants' demographics, megastudies can absorb the noise inherent in a broad population, noise that normally forces researchers to narrow participant demographics. Further, post-study analysis allows rich hypothesis generation about which interventions are likely to work in narrower populations, enabling messaging and incentives tailored to the individual. A centralized entity managing the population data reduces costs and makes it easier to try a more diverse set of risk-tolerant interventions. A centralized entity also opens the door for smaller labs to participate in studies. Finally, the participants in these megastudies are normally part of ongoing health interactions through a large cohort study or directly through care providers. Thus, they benefit directly from participation and from tailored messages and incentives. Additionally, dataset scale allows for longer-term study designs because of the reduction in overall costs. This enables study designers to determine whether their interventions work well over a longer period of time or whether their impact wanes and needs to be adjusted.

Funding and Operational Challenges

But this kind of “apples to apples” comparison has serious operational hurdles that have prevented megastudies from being used routinely in science despite their inherent advantages. First, megastudies require access to a large standing cohort of study participants who will remain in the cohort long term. Ideally, the organizer of such studies should be vested in achieving positive outcomes; here, large insurance companies are poor candidates to organize. Such studies also have to be efficient, so government-run cohorts, which tend to be highly bureaucratic, expensive, and inefficient, are not ideal. Not everything need go through a committee. (Looking at you, All of Us at NIH and the Million Veteran Program at the VA.)

Companies like third-party administrators of healthcare plans might be an ideal organizing body, but so might companies that aim to lower healthcare costs as a means of generating revenue through cost savings. These companies tend to have access to much deeper data than traditional cohorts run by government and academic institutions and could leverage that data to better stratify participants and results. However, if the goal of government and philanthropic research efforts is to improve outcomes, then they should open the aperture on available funds to stand up a persistent cohort that can be used by many researchers rather than continuing the one-off paradigm, which in the end is far more expensive and inefficient. Finally, we do not imply that all intervention types should be run through megastudies. They are an essential, albeit underutilized, tool in the arsenal, but not a silver bullet for testing behavioral interventions.

Fear of Unauthorized Data Access or Misuse 

There is substantial risk in bringing together such deep personal data on a large population of people. While companies compile deep data all the time, it is unusual to do so for research purposes, and doing so will certainly raise some eyebrows, as has been the case for large studies like the aforementioned All of Us and the Million Veteran Program.

Patients fear misuse of their data, inaccurate recommendations, and biased algorithms, especially among historically marginalized populations. Patients must trust that their data is being used for good, not for marketing or for setting their insurance rates.


Need for Data Interoperability

Many healthcare and community systems operate in data silos, and data integration is a perennial challenge in healthcare. Patient-generated data from wearables, apps, or remote sensors often do not integrate with electronic health record data or demographic data gathered elsewhere, limiting the precision and personalization of behavior-change interventions. This lack of interoperability undermines both provider engagement and user benefit. Addressing data fragmentation and poor usability requires designing cloud-based data connectors and integration, creating shared feedback dashboards linking self-generated data to provider workflows, and creating and promoting policies that move towards interoperability. In short, given the constantly evolving data integration challenge and the lack of real standards for data formats and integration requirements, a dedicated and persistent effort will have to be made to ensure that data can be seamlessly integrated if we are to draw value from combining data from many sources for each patient.
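As a concrete, if simplified, picture of what such a data connector does, the sketch below (field names and sources are assumptions for illustration) joins wearable, EHR, and demographic records on a shared patient identifier, with outer joins making the gaps between systems explicit rather than silently dropping patients.

```python
# A minimal sketch of the kind of data connector described above: joining
# wearable, EHR, and demographic records on a shared patient identifier.
# Field names and values are assumptions for illustration.
import pandas as pd

wearables = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "avg_daily_steps": [8200, 3100, 11050],
})
ehr = pd.DataFrame({
    "patient_id": [101, 102, 104],
    "systolic_bp": [118, 141, 127],
})
demographics = pd.DataFrame({
    "patient_id": [101, 102, 103, 104],
    "age": [34, 67, 51, 45],
    "zip_code": ["90012", "39530", "60614", "73301"],
})

# Outer joins preserve patients who appear in only some systems, making
# the gaps (missing values) visible rather than silently dropping them.
merged = (demographics
          .merge(wearables, on="patient_id", how="outer")
          .merge(ehr, on="patient_id", how="outer"))
print(merged)
```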

Additional Barriers

One of the largest barriers to using behavioral economics is that many rural, tribal, low-income, and older adults face access barriers, including device affordability, broadband coverage, and other usability and digital literacy limitations. Megastudies are not generally designed to bridge this gap, which significantly limits their applicability to these populations. Complicating matters, these populations also happen to have significant and specific health challenges unique to their cohorts. As behavioral economic levers are developed, these communities are in danger of being left behind, further exacerbating health disparities. Nonetheless, insight into how to reach these populations can be gained from individuals within them who do have access to technology platforms. Communications will have to be tailored accordingly.

External motivators have been consistently shown to be essential drivers of behavioral change. But motivation to sustain a behavior change and continue using technology often wanes over time. Embedding intrinsic-value rewards and workplace incentives may not be enough. Therefore, external motivations will likely have to be adjusted over time in a dynamic system so that adjustments to an individual's program are rooted in evidence. Indeed, studying the dynamic nature of driving behavioral change will be necessary because the influence of static messaging is likely to wane. By designing reward systems that tie into personal values and workplace wellness programs, and by sustaining engagement through social incentives and tailored nudges, designers may keep users engaged.
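One way such dynamic adjustment could be operationalized, offered here only as a sketch with hypothetical incentive names and response rates, is an adaptive assignment scheme (a simple epsilon-greedy bandit) that keeps exploring alternatives while steering most participants toward whichever incentive is currently producing the strongest response:

```python
# A sketch of one way to adjust external motivators dynamically: an
# epsilon-greedy bandit that shifts assignment toward the incentive
# currently producing the best response. Incentive names and response
# rates are hypothetical.
import random

incentives = ["cash_reward", "social_recognition", "charity_donation"]
counts = {i: 0 for i in incentives}     # times each incentive was offered
successes = {i: 0 for i in incentives}  # times it produced the behavior
epsilon = 0.1                           # exploration rate

def choose_incentive():
    """Mostly exploit the best-performing incentive, sometimes explore."""
    if random.random() < epsilon or all(c == 0 for c in counts.values()):
        return random.choice(incentives)
    return max(incentives, key=lambda i: successes[i] / max(counts[i], 1))

def record_outcome(incentive, responded):
    counts[incentive] += 1
    successes[incentive] += int(responded)

# Simulated rollout: the true (unknown) response rates drift over time,
# standing in for the waning influence of a static motivator.
true_rates = {"cash_reward": 0.30, "social_recognition": 0.20,
              "charity_donation": 0.10}
for step in range(5_000):
    if step == 2_500:                    # cash rewards lose their pull
        true_rates["cash_reward"] = 0.05
    chosen = choose_incentive()
    record_outcome(chosen, random.random() < true_rates[chosen])

print({i: counts[i] for i in incentives})
```

A production system would discount older observations so the estimates track drift more quickly, but even this toy version illustrates the point: the assignment rule itself adapts as the effectiveness of a motivator wanes.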

Plan of Action 

By enabling a private-sector entity to build a research platform that combines patient data with deep demographic data under an ethical framework for access and use, we can create a platform for megastudies. This would allow the rapid testing of behavioral interventions that steer people towards healthier lifestyles, saving money, accelerating progress, and improving our understanding of what works, in whom, and when for changing human behavior.

This could have been done through the All of Us program, the Million Veteran Program, or a different large cohort study, but neither of the first two has the deep demographic and lifestyle data required to stratify their populations, and both are mired in the bureaucratic lethargy common to large-scale government programs. Health insurance companies and third-party administrators of health insurance can gather such data, be nimbler, create a platform for communicating directly with patients, and coordinate with their clinical providers. But one could argue that neither entity has a real incentive to bend behavior and encourage healthy lifestyles. Simply put, that is not their business.

Recommendation 1. Issue a directive to agencies to invest in the development of a megastudy platform for health behavioral economics studies.

The White House or HHS Secretary should direct the NIH or ARPA-H to develop a plan for funding the creation of a behavioral economics megastudy platform. The directive should include details on the ethical and technical framework requirements as well as directions for developing oversight of the platform once it is created. Applicants for the contract to create the megastudy platform should be required to include a sustainability plan in their applications.

Recommendation 2. Government should fund the establishment of a megastudy platform.

ARPA-H and/or DARPA should develop a program to establish a broad research platform in the private sector that will allow megastudies to be conducted. Research teams can then, in parallel, test dozens of behavioral interventions on populations and access patient data. This platform should have required ethical rules and be grounded in data sovereignty that allows patients to opt out of participation and of having their data shared.

Data sovereignty is one solution to the trust challenge. Simply put, data sovereignty means that patients have access to the data on themselves (without having to pay a fee that physicians’ offices now routinely charge for access) and control over who sees and keeps that data. So, if at any time, a participant changes their mind, they can get their data and force anyone in possession of that data to delete it (with notable exceptions, like their healthcare providers). Patients would have ultimate control of their data in a ‘trust-less’ way that they never need to surrender, going well past the rather weak privacy provisions of HIPAA, so there is no question that they are in charge.

We suggest that blockchain and token systems would be appropriate for data transfer. Holding the data in a federated network would also limit the danger of a breach.
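To make the revocation mechanic concrete, here is a minimal sketch (all names hypothetical; a real system would sit behind the federated network and token layer described above) of a consent registry in which revoking consent produces the list of data holders obligated to delete, with exemptions for entities such as the participant's own care providers:

```python
# A minimal sketch of the data-sovereignty model described above: a
# consent registry where a participant can revoke access at any time,
# triggering deletion obligations for every data holder except exempt
# ones (e.g., their own healthcare providers). Names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    participant_id: str
    holders: set[str] = field(default_factory=set)   # who currently has data
    exempt: set[str] = field(default_factory=set)    # e.g., treating clinicians
    active: bool = True

    def grant(self, holder: str) -> None:
        if not self.active:
            raise PermissionError("consent has been revoked")
        self.holders.add(holder)

    def revoke(self) -> list[str]:
        """Revoke consent; return the holders obligated to delete the data."""
        self.active = False
        must_delete = sorted(self.holders - self.exempt)
        self.holders = self.exempt.copy()
        return must_delete

record = ConsentRecord("patient-0001", exempt={"primary_care_clinic"})
record.grant("primary_care_clinic")
record.grant("megastudy_platform")
record.grant("research_team_a")
print(record.revoke())  # ['megastudy_platform', 'research_team_a']
```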

Recommendation 3. The NIH should fund behavioral economics megastudies using the platform. 

Once the megastudy platform(s) are established, the NIH should make dedicated funds available for researchers to test behavioral interventions using the platform, decreasing costs, increasing study longevity, and improving speed and efficiency for behavioral economics studies of health interventions.

Conclusion

Randomized controlled trials have been the gold standard for behavioral research but are not well suited for health behavioral interventions on a broad and diverse population because of the required number of participants, typically narrow populations, recruiting challenges, and cost. Yet, there is an urgent need to encourage and incentivize improved health-related behaviors to make Americans healthier. Simply put, we cannot begin to extend healthspan and lifespan unless we change behaviors towards healthier choices and habits. When the U.S. government funds the establishment of a platform for testing hundreds of behavioral interventions on a large, diverse population, we will start to better understand which interventions will have an efficient and lasting impact on health behavior. Doing so requires private sector cooperation and strict ethical rules to ensure public trust.

This memo was produced as part of Strengthening Pathways to Disease Prevention and Improved Health Outcomes.

Making Healthcare AI Human-Centered through the Requirement of Clinician Input

Through partnership with the Doris Duke Foundation, FAS is advancing a vision for healthcare innovation that centers safety, equity, and effectiveness in artificial intelligence. Informed by the NYU Langone Health symposium on transforming health systems into learning health systems, FAS seeks to ensure that AI tools are developed, deployed, and evaluated in ways that reflect real-world clinical practice. FAS is leveraging its role in policy entrepreneurship to promote responsible innovation by engaging with key actors in government, research, and software development. These recommendations align with emerging efforts across health systems to integrate human-centered AI and evidence-based decision-making into digital transformation. By shaping AI grant requirements and post-market evaluation standards, these ideas aim to accelerate safe, equitable implementation while supporting ongoing learning and improvement.   

The United States must ensure AI improves healthcare while safeguarding patient safety and clinical expertise. There are three priority needs: (1) embedding clinician feedback and representative data into federal grant requirements for AI development, (2) updating tools regularly as clinical practices and patient populations evolve, and (3) conducting continuous post-market surveillance to keep these tools accurate and equitable.

This memo examines the challenges and opportunities related to integrating AI tools into healthcare. It emphasizes how human-centered design must ensure these technologies are tailored to real-world clinical environments. As AI adoption grows in healthcare, it is essential that clinician feedback is embedded into federal grant requirements for AI development so these systems are effective and aligned with real-world needs. Embedding clinician feedback into grant requirements for healthcare AI development and ensuring the use of representative data will promote safety, accuracy, and equity in healthcare tools. In addition, regular updates to these tools based on evolving clinical practices and patient populations must be part of the development lifecycle to maintain long-term reliability. Continuous post-market surveillance is necessary to ensure these tools remain both accurate and equitable. By taking these steps, healthcare systems can harness the full potential of AI while safeguarding patient safety and clinician expertise. Federal agencies such as the Office of the National Coordinator for Health Information Technology (ONC) and the Food and Drug Administration (FDA) can incentivize clinician involvement through outcomes-based contracting approaches that link funding to measurable improvements in patient care. This strategy ensures that grant recipients embed clinician expertise at key stages of development and testing, ultimately aligning incentives with real-world health outcomes.

Challenge and Opportunity

The use of AI tools such as predictive triage classifiers and large language models (LLMs) has the potential to improve care delivery. However, there are significant challenges to integrating these tools effectively into daily clinical workflows without meaningful clinician involvement. As just one example, AI tools used in chronic illness triage can be particularly useful in helping to prioritize patients based on the severity of their condition, which can lead to timely care delivery. However, without direct involvement from clinicians in validating, interpreting, and guiding AI recommendations, these tools can suffer from poor usability and limited real-world effectiveness. Even highly accurate tools become irrelevant if clinicians do not adopt and engage with them, reducing the positive impact they can have on patient outcomes.

Mysterious Inner Workings

The black-box nature of AI has fueled skepticism among healthcare providers and undermined trust among patients. Moreover, when AI systems lack clear and interpretable explanations, clinicians are more likely to avoid or distrust them. This response is attributed to what is known as algorithm aversion: clinicians lose trust in a tool after seeing it make errors, making future use less likely, even if the tool is usually accurate. Designing AI with human-centered principles, particularly offering clinicians a role in validating, interpreting, and guiding AI recommendations, will help build trust and ensure decisions remain grounded in clinical expertise. A key approach to increasing trust and usability would be institutionalizing clinician engagement in the early stages of the development process. By involving clinicians during the development and testing phases, AI developers can ensure the tools fit seamlessly into clinical workflows. This will also help to mitigate concerns about a tool's real-world effectiveness, as clinicians will be more likely to adopt tools they feel confident in. Without this collaborative approach, AI tools risk being sidelined or misused, preventing health systems from becoming genuinely adaptive and learning-oriented.

Lack of Interoperability

A significant challenge in deploying AI tools across healthcare systems is interoperability. Most patients receive care across multiple providers and healthcare settings, making it essential for AI tools to integrate seamlessly with electronic health records (EHRs) and other clinical systems. Without this integration, tools can lose their clinical relevance, effectiveness, and ability to be adopted at scale, leading to inefficiencies, duplicate testing, and other harmful errors. One way to address this is through outcomes-based contracting (OBC), discussed shortly.

Trust in AI and Skill Erosion

Beyond trust and usability, there are broader risks associated with sidelining clinicians during AI integration. The use of AI tools without clinician input also presents the risk of clinician deskilling, in which clinicians' skills and decision-making abilities erode over time due to reliance on AI tools. This skill erosion leads to a decline in judgement in situations where AI may not be readily available or suitable. Recent evidence from the ACCEPT trial shows that endoscopists' performance dropped in non-AI settings after months of AI-assisted procedures, a troubling phenomenon that we should aim to prevent. AI-induced skill erosion also raises ethical concerns, particularly in complex environments where over-reliance on AI could erode clinical judgement and autonomy. If clinicians become too dependent on automated outputs, their ability to make critical decisions may be compromised, potentially impacting patient safety.

Embedded Biases

In addition to the erosion of human skills, AI systems risk embedding biases if trained on unrepresentative data, leading to unfair or inaccurate outcomes across different patient groups. AI tools may also produce errors that appear plausible, such as generating nonexistent terms, which pose serious safety concerns, especially when clinicians do not catch those mistakes. A systematic review of AI tools found that only 22% of studies involved clinicians throughout the development phase. This lack of early clinician involvement has contributed to usability and integration issues across AI healthcare tools.

All of these issues underscore how critical clinician involvement is in the development of AI tools to ensure they are usable, effective, and safe. Clinician involvement should include defining relevant clinical tasks, evaluating the interpretability of the system, validating performance across diverse patient groups, and setting standards for handoff between AI and clinician decision-making. Therefore, funding agencies should require AI developers to incorporate representative data and meaningful clinician involvement in order to mitigate these risks. Recognizing these challenges, it is crucial to understand that implementing and maintaining AI requires continual human oversight and substantial infrastructure, which many health systems find too resource-intensive to sustain. Given the complexity of these challenges, without adequate governance, transparency, clinician training, and ethical safeguards, AI may hinder rather than help the transition to an enhanced learning health system.

Outcomes-Based Contracting (OBC)

To ensure that AI tools deliver real clinical value, the federal contracting process should reinforce clinician involvement through measurable incentives. Outcomes-based contracting (OBC), a model in which payments or grants are tied to demonstrated improvements in patient outcomes, can be a powerful tool. This model is not only a financing mechanism but also a lever to institutionalize clinician engagement. Tying funding to real-world clinical impact compels developers to design tools that clinicians will use and find value in, ultimately increasing usability, trust, and adoption. This model provides a clear reward for impact rather than just for building tools or producing novel methods.

Leveraging outcomes-based models could also help institutionalize clinician engagement in the funding lifecycle by requiring developers to demonstrate explicit plans for clinician participation, through staff integration or formal consultation, as a prerequisite for funding. Although AI tools may be safe and effective at the onset of their use, performance can change over time due to shifts in patient populations, changes in clinical practice, and updates to software, a phenomenon known as model degradation. Therefore, a crucial component of using these AI tools is regular surveillance to ensure the tools remain accurate, responsive to real-world use by clinicians and patients, and equitable. However, while clinician involvement is essential, it is important to acknowledge that including clinicians in all stages of AI tool development, testing, deployment, and evaluation may not be realistic given the significant time cost for clinicians, their competing clinical responsibilities, and their limited familiarity with AI technology. Despite these factors, there are ways to engage clinicians effectively at key decision points during the AI development and testing process without requiring their presence at every stage.

Urgency and Federal Momentum

Major challenges in integrating AI into clinical workflows, including poor usability, algorithm aversion, clinician skepticism, and the potential for embedded biases, highlight the need for thoughtful deployment of these tools. These challenges have taken on a sense of urgency in light of recent healthcare shifts, particularly the rapid acceleration of AI adoption after the COVID-19 pandemic, which drove breakthroughs in telemedicine, diagnostics, and pharmaceutical innovation that simply were not possible before. However, the rapid pace of integration also brings the risk of unregulated deployment, potentially embedding safety vulnerabilities. Federal momentum supports this growth, with directives placing emphasis on AI safety, transparency, and responsible deployment, including the authorization of over 1,200 AI-powered medical devices, primarily used in radiology, cardiology, and pathology, areas that are complex in nature. However, without clinician involvement and the use of representative data for training, the algorithms for such devices may remain biased and fail to integrate smoothly into care delivery. This disconnect could delay adoption, reduce clinical impact, and increase the risk of patient harm. Therefore, it is imperative that we set standards, embed clinician expertise in AI design, and ensure safe, effective deployment for the specific use of care delivery.

Furthermore, this moment of federal momentum aligns with broader policy shifts. As highlighted by a recent CMS announcement, the White House and national health agencies are working with technology leaders to create a patient-centric healthcare ecosystem. This includes a push for interoperability, clinical collaboration, and outcomes-driven innovation, all of which bolster the case for weaving clinician engagement into the very fabric of AI development. AI can potentially improve patient outcomes dramatically and increase cost-efficiency in healthcare. Yet, without structured safeguards, these tools may deepen existing health inequities. With proper input from clinicians, however, these tools can reduce diagnostic errors, improve accuracy in high-stakes cases such as cancer detection, and streamline workflows, ultimately saving lives and reducing unnecessary costs.

As AI systems become further embedded into clinical practice, they will help to shape standards of care, influencing clinical guidelines and decision-making pathways. Furthermore, interoperability is essential when using these tools because most patients receive care from multiple providers across systems. Therefore, AI tools must be designed to communicate and integrate data from various sources, including electronic health records (EHR), lab databases, imaging systems, and more. Enabling shared access can enhance the coordination of care and reduce redundant testing or conflicting diagnoses. To ensure this functionality, clinicians must help design AI tools that account for real-world care delivery across what is currently a fragmented system.

Reshaping Healthcare AI 

These challenges and risks culminate in a moment of opportunity to reshape and revolutionize the way AI supports healthcare delivery, ensuring that its design is trustworthy and focused on outcomes. To fully realize this opportunity, clinicians must be embedded in the various stages of AI development to improve safety, usability, and adoption in healthcare settings. While some developers do involve clinicians during development, this practice is not the standard, and bridging the gap requires targeted action to ensure clinical expertise is consistently incorporated from the start. One way to achieve this is for federal agencies to require AI developers to integrate representative data and clinician feedback into their AI tools as a condition of funding eligibility. This approach would improve the usability of the tools and enhance their contextual relevance to diverse patient populations and practice environments. Further, it would address current shortcomings: evidence has shown that some AI tools are poorly integrated into clinical workflows, which not only reduces their impact but also undermines broader adoption and clinician confidence in the systems. Moreover, creating a clinician feedback loop for these systems will reduce the clerical burden that many clinicians experience and allow them to spend more dedicated time with their patients. Through the incorporation of human-centered design, we can use clinician expertise during the development and testing process to mitigate issues that would otherwise arise. This approach would build trust amongst clinicians and improve patient safety, which matters most when aiming to reduce errors and misdiagnoses. With strong requirements and funding standards in place as safeguards, AI can transform health systems into adaptable learning environments that produce evidence and deliver equitable, higher-quality care. This is a pivotal opportunity to showcase how innovation can support human expertise and strengthen trust in healthcare.

AI has the potential to dramatically improve patient outcomes and healthcare cost-efficiency, particularly in high-stakes diagnostic and treatment decisions such as oncology and critical care. In these areas, AI can analyze imaging, lab, and genomic data to uncover patterns that may not be immediately apparent to clinicians. For example, AI tools have shown promise in improving diagnostic accuracy in cancer detection and in reducing the time clinicians spend on tasks like charting, allowing for more face-to-face time with patients.

However, these tools must be designed with clinician input at key stages, especially for higher-risk conditions, or they may be prone to errors or fail to integrate into clinical workflows. By embedding outcomes-based contracting (OBC) into federal funding and aligning financial incentives with clinical effectiveness, we encourage the development and use of AI tools that improve patient outcomes. This supports a broader shift toward value-based care, where outcomes, not just outputs, define success.

The connection between OBC and clinician involvement is straightforward. When clinicians are involved in the design and testing of AI tools, the tools are more likely to be effective in real-world settings, improving outcomes and justifying the financial incentives tied to OBC. AI tools can provide significant value in high-stakes diagnostic and treatment decisions (oncology, cardiology, and critical care), where errors have large consequences for patient outcomes. In those settings, AI can assist by analyzing imaging, lab, and genomic data to uncover patterns that may not be immediately apparent to clinicians. However, these tools should not function autonomously, and input from clinicians is critical to validate AI outputs, especially where mortality or morbidity is high. In contrast, for lower-risk or routine care, such as common colds or minor dermatologic conditions, AI may be useful as a time-saving tool that does not require the same depth of clinician oversight.

Plan of Action

These actionable recommendations aim to help federal agencies and health systems embed clinician involvement, representative data, and continuous oversight into the lifecycle of healthcare AI.

Recommendation 1. Federal Agencies Should Require Clinician Involvement in the Development and Testing of AI Tools used in Clinical Settings.

Federal agencies should require meaningful clinician involvement in the development and testing of AI healthcare tools. This mechanism could be enforced through a combination of agency guidance and tying funding eligibility to specific roles and checkpoints for clinicians. Specifically, agencies like the Office of the National Coordinator for Health Information Technology (ONC) and the Food and Drug Administration (FDA) can issue guidance mandating clinician participation and can tie AI tool development funding to the inclusion of clinicians in the design and testing phases. Guidance can mandate clinician involvement at critical stages: (1) defining clinical tasks and user interface requirements, (2) validating interpretability and performance for diverse populations, (3) piloting in real workflows, and (4) reviewing safety and bias metrics. This would ensure AI tools used in clinical settings are human-centered, effective, and safe.

Key stakeholders who may wish to be consulted in this process include offices under the Department of Health and Human Services (HHS), such as the Office of the National Coordinator for Health Information Technology (ONC), the Food and Drug Administration (FDA), and the Agency for Healthcare Research and Quality (AHRQ). ONC and FDA should work to issue guidance encouraging clinician engagement during premarket review. This would allow experts to thoroughly review scientific data and real-world evidence to ensure that the tools are human-centered and can improve the quality of care.

Recommendation 2. Incentivize Clinician Involvement Through Outcomes-Based Contracting

Federal agencies such as the Department of Health and Human Services (HHS), the Centers for Medicare and Medicaid Services (CMS), and the Agency for Healthcare Research and Quality (AHRQ) should incorporate outcomes-based contracting requirements into AI-related healthcare grant programs. Funding should be awarded to grantees who: (1) include clinicians as part of their AI design teams or advisory boards, (2) develop formal clinician feedback loops, and (3) demonstrate measurable outcomes such as improved diagnostic accuracy or workflow efficiency. These outcomes are essential measures of how clinician engagement improves the usability and clinical impact of AI tools.

Key stakeholders include HHS, CMS, ONC, and AHRQ, as well as clinicians, AI developers, and potentially patient advocacy organizations. These requirements should prioritize funding for entities that demonstrate clear clinician involvement at key development and testing phases, with metrics tied to improvements in patient outcomes and clinician satisfaction. This model would align with CMS's ongoing efforts to foster a patient-centered, data-driven healthcare ecosystem that uses tools designed with clinical needs in mind, as recently emphasized during the health tech ecosystem initiative meeting. Embedding outcomes-based contracting into the federal grant process will link funding to clinical effectiveness and incentivize developers to work alongside clinicians through the lifecycle of their AI tools.

Recommendation 3. Develop Standards for AI Interoperability

ONC should develop interoperability guidelines that enable AI systems to share information across platforms while protecting patient privacy. As the challenge of healthcare data fragmentation has become evident, AI tools must seamlessly integrate with diverse electronic health records (EHRs) and other clinical platforms to ensure their effectiveness.

An example of a successful interoperability framework is the Trusted Exchange Framework and Common Agreement (TEFCA), which aims to establish a nationwide infrastructure for the exchange of health information. A model such as this can enable seamless integration across different healthcare settings and EHR systems, ultimately promoting efficient and accurate patient care. This effort would involve consulting clinicians, electronic health record vendors, patients, and AI developers. These guidelines will help ensure that AI tools can be used safely and effectively across clinical settings.
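For a sense of what standards-based exchange looks like at the code level, the sketch below queries a FHIR R4 server for a patient's heart-rate observations; the base URL and patient ID are placeholders, and a real deployment would add OAuth2 authorization (e.g., SMART on FHIR):

```python
# A brief sketch of standards-based retrieval using HL7 FHIR, the kind of
# common interface that TEFCA-style exchange builds on. The base URL and
# patient ID are placeholders, not a real endpoint.
import requests

FHIR_BASE = "https://ehr.example.org/fhir"    # hypothetical FHIR R4 endpoint
LOINC_HEART_RATE = "http://loinc.org|8867-4"  # standard LOINC code for heart rate

resp = requests.get(
    f"{FHIR_BASE}/Observation",
    params={"patient": "example-patient-id", "code": LOINC_HEART_RATE},
    headers={"Accept": "application/fhir+json"},
    timeout=10,
)
resp.raise_for_status()
bundle = resp.json()  # FHIR searches return a Bundle resource

for entry in bundle.get("entry", []):
    obs = entry["resource"]
    value = obs.get("valueQuantity", {})
    print(obs.get("effectiveDateTime"), value.get("value"), value.get("unit"))
```

TEFCA operates at the network level above interfaces like this one; the point of the sketch is that once systems speak a common resource format, an AI tool can consume data from any conformant source.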

Recommendation 4. Establish Post-Market Surveillance and Evaluation of Healthcare AI Tools to Enhance Performance and Reliability

Federal agencies such as the FDA and AHRQ should establish frameworks for the continuous monitoring of AI tools in clinical settings. These frameworks for privacy-protected data collection should incorporate feedback loops that allow real-world data from clinicians and patients to inform ongoing updates and improvements to the systems, ensuring the effectiveness and accuracy of the tools over time. Special emphasis should be placed on bias audits that can detect disparities in a system's performance across different patient groups. Bias audits will be key to identifying whether AI tools inadvertently disadvantage specific populations based on the data they were trained on. Agencies should require that these audits be conducted routinely as part of the post-market surveillance process. The surveillance data collected can then feed future development cycles in which AI tools are updated or retrained to address shortcomings.
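As an illustration of what a routine bias audit might compute, the sketch below (simulated data; the column names and subgroup labels are illustrative assumptions) compares a model's discrimination performance across patient subgroups and surfaces the kind of gap an audit should flag:

```python
# A minimal sketch of the kind of routine bias audit described above:
# comparing a model's diagnostic performance across patient subgroups.
# The data is simulated; column names and groups are illustrative.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(seed=7)
n = 5_000
audit = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=n),  # demographic subgroup
    "label": rng.integers(0, 2, size=n),           # true condition
})
# Simulated model scores that are deliberately noisier for group C,
# standing in for a model trained on unrepresentative data.
noise = np.where(audit["group"] == "C", 0.9, 0.3)
audit["score"] = audit["label"] + rng.normal(0, noise, size=n)

for group, subset in audit.groupby("group"):
    auc = roc_auc_score(subset["label"], subset["score"])
    print(f"subgroup {group}: AUC = {auc:.3f}")

# A large AUC gap between subgroups is the kind of disparity a post-market
# bias audit should flag for retraining or restricted deployment.
```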

Evaluation methods should track clinician satisfaction, error rates, diagnostic accuracy, and reportability of failures. During this ongoing evaluation process, incorporating routine bias audits into post-market surveillance will ensure that these tools remain equitable and effective over time. Funding for this initiative could potentially be provided through a zero-cost, fee-based structure or federally appropriated grants. Key stakeholders in this process could include clinicians, AI developers, and patients, all of whom would be responsible for providing oversight. 

Conclusion

Integrating AI tools into healthcare has immense potential to improve patient outcomes, streamline clinical workflows, and reduce errors and bias. However, without clinician involvement in the development and testing of these tools, we risk continual system degradation and patient harm. Requiring that all AI systems used in healthcare are human-centered through clinician input will ensure these systems are effective, safe, and aligned with real-world clinical needs. This human-centered approach is critical not only for usability, but also for building trust among clinicians and patients, fostering the adoption of AI tools, and ensuring they function properly in real-world clinical settings.

In addition, aligning funding with clinical outcomes through outcomes-based contracting adds a mechanism that enforces accountability and ensures lasting impact. When developers are rewarded for improving safety, usability, and equity through clinician involvement, we can transform AI tools into instruments of safer care. There is urgency in addressing these challenges given the rapid adoption of AI tools, which will require safeguards and ethical oversight. By embedding these recommendations into funding opportunities, we will move America toward trustworthy healthcare systems that enhance patient safety, preserve clinician expertise, and remain adaptive while maximizing AI's potential for improving patient outcomes. Clinician engagement, both in the development process and through ongoing feedback loops, will be the foundation of this transformation. With the right structures in place, we can ensure AI becomes a trusted partner in healthcare and not a risk to it.

This memo was produced as part of Strengthening Pathways to Disease Prevention and Improved Health Outcomes.