Accelerating Materials Science with AI and Robotics
Advances in materials science enable innumerable downstream innovations: steel enabled skyscrapers, and novel configurations of silicon enabled microelectronics. Yet progress in materials science has slowed in recent years. Fundamentally, this is because there is a vast universe of potential materials, and the only way to discover which among them are most useful is to experiment. Today, those experiments are largely conducted by hand. Innovations in artificial intelligence and robotics will allow us to accelerate the search process using foundation AI models for science research and to automate much of the experimentation with robotic, self-driving labs. This policy memo recommends that the Department of Energy (DOE) lead this effort because of its unique expertise in supercomputing and AI and its large network of National Labs.
Challenge and Opportunity
Take a look at your smartphone. How long does its battery last? How durable is its frame? How tough is its screen? How fast and efficient are the chips inside it?
Each of these questions implicates materials science in fundamental ways. The limits of our technological capabilities are defined by the limits of what we can build, and what we can build is defined by what materials we have at our disposal. The early eras of human history are named for materials: the Stone Age, the Bronze Age, the Iron Age. Even today, the cradle of American innovation is Silicon Valley, a reminder that even our digital era is enabled by finding innovative ways to assemble matter to accomplish novel things.
Materials science has been a driver of economic growth and innovation for decades. Improvements to silicon purification and processing—painstakingly worked on in labs for decades—fundamentally enabled silicon-based semiconductors, a $600 billion industry today that McKinsey recently projected would double in size by 2030. The entire digital economy, conservatively estimated by the Bureau of Economic Analysis (BEA) at $3.7 trillion in the U.S. alone, in turn, rests on semiconductors. Plastics, another profound materials science innovation, are estimated to have generated more than $500 billion in economic value in the U.S. last year. The quantitative benefits are staggering, but even qualitatively, it is impossible to imagine modern life without these materials.
However, present-day materials are beginning to show their age. We need better batteries to accelerate the transition to clean energy. We may be approaching the limits of traditional methods of manufacturing semiconductors in the next decade. We require exotic new forms of magnets to bring technologies like nuclear fusion to life. We need materials with better thermal properties to improve spacecraft.
Yet materials science and engineering—the disciplines of discovering and learning to use new materials—have slowed down in recent decades. The low-hanging fruit has been plucked, and the easy discoveries are old news. We’re approaching the limits of what our materials can do because we are also approaching the limits of what the traditional practice of materials science can do.
Today, materials science proceeds much as it did half a century ago: manually, with small academic labs and graduate students formulating potential new combinations of elements, synthesizing those combinations, and studying their characteristics. Because there are more ways to configure matter than there are atoms in the universe, manually searching the space of possible materials is an impossible task.
Fortunately, AI and robotics present an opportunity to automate that process. AI foundation models for physics and chemistry can simulate potential materials with unprecedented speed and at far lower cost than traditional ab initio methods. Robotic labs (also known as "self-driving labs") can automate the manual work of performing experiments, allowing scientists to synthesize, validate, and characterize new materials twenty-four hours a day at dramatically lower cost. Those experiments, in turn, generate valuable data for further refining the foundation models, creating a positive feedback loop. AI language models like OpenAI's GPT-4 can write summaries of experimental results and even help ideate new experiments. The scientists and their graduate students, freed from this manual and often tedious labor, can do what humans do best: think creatively and imaginatively.
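To make this feedback loop concrete, the sketch below shows one minimal way to structure it in Python. The `run_experiment` function is a hypothetical stand-in for a self-driving lab, and a simple Gaussian-process surrogate stands in for a far more capable foundation model; this is an illustration of the loop, not a production design.

```python
# Minimal sketch of the model-lab feedback loop: a surrogate model proposes
# the most informative candidate, a (simulated) robotic lab measures it, and
# the measurement flows back to refine the model.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def run_experiment(composition: float) -> float:
    """Hypothetical stand-in for a self-driving lab run. Here a synthetic
    function plus noise plays the role of physical reality."""
    return float(np.sin(5 * composition) * (1 - composition)
                 + np.random.normal(0, 0.02))

# Candidate materials, encoded here as a single composition parameter in [0, 1].
candidates = np.linspace(0, 1, 200).reshape(-1, 1)

# Seed the loop with a few initial experiments.
X = np.array([[0.1], [0.5], [0.9]])
y = np.array([run_experiment(x[0]) for x in X])

model = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-3)

for step in range(10):
    model.fit(X, y)                                  # refine the model
    mean, std = model.predict(candidates, return_std=True)
    next_x = candidates[np.argmax(std)]              # most uncertain candidate
    next_y = run_experiment(next_x[0])               # the lab runs the experiment
    X = np.vstack([X, [next_x]])                     # the result flows back,
    y = np.append(y, next_y)                         # closing the loop

best = candidates[np.argmax(model.predict(candidates))]
print(f"Most promising composition after 10 automated experiments: {best[0]:.3f}")
```

Each pass through the loop is one experiment the lab runs without human labor. The acquisition rule here (pick the candidate the model is least sure about) is the simplest possible choice; real systems would use more sophisticated strategies, but the structure of the loop is the same.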
Achieving this goal will require a coordinated effort, significant investment, and expertise at the frontiers of science and engineering. Because much of materials science is basic R&D—too far from commercialization to attract private investment—there is a unique opportunity for the federal government to lead the way. As with much scientific R&D, the economic benefits of new materials science discoveries may take time to emerge. One literature review estimated that it can take roughly 20 years for basic research to translate to economic growth. Research indicates that the returns—once they materialize—are significant. A study from the Federal Reserve Bank of Dallas suggests a return of 150-300% on federal R&D spending.
The best-positioned department within the federal government to coordinate this effort is the DOE, which has many of the key ingredients in place: a demonstrated track record of building and maintaining the supercomputing facilities required to train physics-based AI models; unparalleled scientific datasets, collected over decades of work by National Labs and other DOE facilities, with which to train those models; and a skilled scientific and engineering workforce capable of bringing challenging projects to fruition.
Plan of Action
Achieving the goal of using AI and robotics to discover new materials with unprecedented speed and at low cost, and of capturing the benefits of those discoveries, rests on five key pillars:
- Creating large physics and chemistry datasets for foundation model training (estimated cost: $100 million);
- Developing foundation AI models for materials science discovery, either independently or in collaboration with the private sector (estimated cost: $10-100 million, depending on the nature of the collaboration);
- Building 1-2 pilot self-driving labs (SDLs) aimed at establishing best practices, building a supply chain for robotics and other equipment, and validating the scientific merit of SDLs (estimated cost: $20-40 million);
- Making self-driving labs an official priority of the DOE’s preexisting FASST initiative (described below);
- Directing the DOE’s new Foundation for Energy Security and Innovation (FESI) to prioritize establishing fellowships and public-private partnerships to support items (1) and (2), both financially and with human capital.
The total cost of the proposal, then, is estimated at $130-240 million. The potential return on this investment, though, is far higher. Moderate improvements to battery materials could drive tens or hundreds of billions of dollars in value. Discovery of a "holy grail" material, such as a room-temperature, ambient-pressure superconductor, could create trillions of dollars in value.
Creating Materials Science Foundation Model Datasets
Before a large materials science foundation model can be trained, vast datasets must be assembled. DOE, through its large network of scientific facilities, including particle colliders, observatories, supercomputers, and other experimental sites, collects enormous quantities of data. Unfortunately, collecting the data is only the beginning. DOE's data infrastructure is out-of-date and fragmented across user facilities, and data access and retention policies make sharing and combining different datasets difficult or impossible.
All of these policy and infrastructural decisions were made long before training large-scale foundation models was a priority, and they will have to be revisited to capitalize on the opportunity AI presents. Existing DOE data will have to be reorganized into formats, and housed within technical infrastructure, suited to training foundation models. In some cases, data access and retention policies will need to be relaxed or otherwise modified.
In other cases, however, highly sensitive data will need to be integrated in more sophisticated ways. A 2023 DOE report, recognizing the problems with DOE data infrastructure, suggests developing federated learning capabilities, an active area of research in the broader machine learning community, which would allow data to be used for training without being shared. This would, the report argues, "allow access and connections to the information through access control processes that are developed explicitly for multilevel privacy."
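As a rough illustration of the federated approach the report describes, the sketch below implements federated averaging (FedAvg), the canonical federated learning algorithm, with synthetic data standing in for each facility's sensitive records. The function names and the toy linear model are illustrative assumptions, not DOE's actual design.

```python
# Sketch of federated averaging (FedAvg): each facility trains on its own
# data locally, and only model parameters -- never the raw data -- are
# shared with and averaged by a coordinator.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # ground truth used only to generate toy data

def make_private_dataset(n: int):
    """Synthetic stand-in for one facility's sensitive experimental data."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

def local_update(w, X, y, lr=0.1, epochs=5):
    """Gradient descent on a linear model, run entirely inside one facility."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

facilities = [make_private_dataset(n) for n in (50, 80, 120)]
sizes = np.array([len(y) for _, y in facilities])
w_global = np.zeros(2)

for _ in range(20):
    # Each facility refines the current global model on its private data...
    local_weights = [local_update(w_global.copy(), X, y) for X, y in facilities]
    # ...and the coordinator averages the parameters, weighted by dataset size.
    w_global = np.average(local_weights, axis=0, weights=sizes)

print("Federated estimate:", w_global)  # approaches [2.0, -1.0] without pooling data
```

The key property is visible in the loop: raw measurements never leave a facility, yet the shared model converges much as if it had been trained on the pooled data.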
This work will require deep collaboration among data scientists, machine learning scientists and engineers, and domain scientists. It is by far the least glamorous part of the process, yet it is the necessary groundwork for everything that follows.
Building AI Foundation Models for Science
Fundamentally, AI is a sophisticated form of statistics. Deep learning, the broad approach that has undergirded all advances in AI over the past decade, allows AI models to uncover deep patterns in extremely complex datasets, such as all the content on the internet, the genomes of millions of organisms, or the structures of thousands of proteins and other biomolecules. Models of this kind are sometimes loosely referred to as “foundation models.”
Foundation models for materials science can take many different forms, incorporating various aspects of physics, chemistry, and even—for the emerging field of biomaterials—biology. Broadly speaking, foundation models can help materials science in two ways: inverse design and property prediction. Inverse design allows scientists to input a desired set of characteristics (toughness, brittleness, heat resistance, electrical conductivity, etc.) and receive a prediction of what material might achieve those properties. Property prediction runs in the opposite direction: scientists input a candidate material and receive a prediction of the properties it will exhibit in the real world.
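The toy sketch below contrasts the two directions. A fixed linear map stands in for a trained foundation model, and the two-number "material" encoding is a deliberate oversimplification; real models use far richer representations, and real inverse design typically relies on generative models rather than brute-force search.

```python
# Toy contrast of the two directions a materials foundation model can run.
import numpy as np

# Pretend this matrix is a trained forward model mapping a material encoding
# (e.g., two composition fractions) to two properties (e.g., hardness and
# electrical conductivity).
W = np.array([[3.0, -1.0],
              [0.5,  2.0]])

def predict_properties(material: np.ndarray) -> np.ndarray:
    """Property prediction: material in, predicted properties out."""
    return W @ material

def inverse_design(target: np.ndarray, n_candidates: int = 10_000) -> np.ndarray:
    """Inverse design: desired properties in, best-matching material out.
    Implemented here as brute-force search over random candidates."""
    rng = np.random.default_rng(1)
    candidates = rng.uniform(0, 1, size=(n_candidates, 2))
    errors = np.linalg.norm(candidates @ W.T - target, axis=1)
    return candidates[np.argmin(errors)]

material = np.array([0.4, 0.6])
print("Predicted properties:", predict_properties(material))

desired = np.array([1.0, 1.5])  # e.g., target hardness and conductivity
print("Suggested material encoding:", inverse_design(desired))
```

Property prediction is a single forward pass; inverse design is a search over the model's inputs, which is why it benefits so much from fast, cheap surrogate models.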
DOE has already proposed creating AI foundation models for materials science as part of its Frontiers in Artificial Intelligence for Science, Security and Technology (FASST) initiative. While this initiative contains numerous other AI-related science and technology objectives, supporting it would enable the creation of new foundation models, which can in turn be used to support the broader materials science work.
DOE’s long history of stewarding America’s national labs makes it the best-suited home for this proposal. DOE labs and other DOE sub-agencies have decades of data from particle accelerators, nuclear fusion reactors, and other specialized equipment rarely seen in other facilities. These labs have performed hundreds of thousands of experiments in physics and chemistry over their lifetimes, and over time, DOE has created standardized data collection practices. AI models are defined by the data that they are trained with, and DOE has some of the most comprehensive physics and chemistry datasets in the country—if not the world.
The foundation models created by DOE should be made available to scientists. The extent of that availability should be determined by the sensitivity of the data used to train the model and other potential risks associated with broad availability. If, for example, a model was created using purely internal or otherwise sensitive DOE datasets, it might have to be made available only to select audiences, with usage monitored; otherwise, there is a risk that sensitive training data could be extracted from the model. If there are no such data security concerns, DOE could choose to fully open-source the models, meaning their weights and code would be available to the general public. Regardless of how the models themselves are distributed, the fruits of all research enabled by both DOE foundation models and self-driving labs should be made available to the academic community and the broader public.
Scaling Self-Driving Labs
Self-driving labs are largely automated facilities in which robotic equipment conducts scientific experiments autonomously, under human supervision. They are well-suited to relatively simple, routine experiments—the exact kind involved in much of materials science. Recent advances in robotics have been driven by a combination of cheaper hardware and better AI models. While fully autonomous humanoid robots capable of automating arbitrary manual labor are likely years away, it is now possible to configure facilities to automate a broad range of scripted tasks.
Many experiments in materials science involve making iterative tweaks to variables within the same broad experimental design. For example, a graduate student might tweak the ratios of the elements that constitute a material, or change the temperature at which the elements are combined. These are highly automatable tasks. Furthermore, by allowing multiple experiments to run in parallel, self-driving labs let scientists rapidly accelerate the pace of their work.
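A minimal sketch of that kind of scripted sweep appears below. The `synthesize_and_measure` function is a hypothetical placeholder for a robotic synthesis-and-characterization workcell, and the thread pool illustrates several robotic stations running concurrently.

```python
# Sketch of an automatable parameter sweep: iterate over element ratios and
# synthesis temperatures, with several robotic stations working in parallel.
import itertools
import random
import time
from concurrent.futures import ThreadPoolExecutor

def synthesize_and_measure(ratio: float, temperature: float) -> dict:
    """Hypothetical stand-in for one robotic synthesis + characterization run."""
    time.sleep(0.1)  # simulate instrument time
    conductivity = random.gauss(ratio * temperature / 100, 0.5)  # fake measurement
    return {"ratio": ratio, "temperature": temperature,
            "conductivity": conductivity}

ratios = [0.2, 0.4, 0.6, 0.8]   # fraction of element A relative to element B
temperatures = [400, 500, 600]  # synthesis temperature in kelvin
conditions = list(itertools.product(ratios, temperatures))

# Four robotic stations, each running one experiment at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda c: synthesize_and_measure(*c), conditions))

best = max(results, key=lambda r: r["conductivity"])
print(f"Best condition so far: ratio={best['ratio']}, T={best['temperature']} K")
```

Every condition in the grid is the same experiment with different knob settings, which is precisely what makes this class of work so amenable to automation.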
Creating a successful large-scale self-driving lab will require collaboration with private sector partners, particularly robot manufacturers and the creators of AI models for robotics. Fortunately, the United States has many such firms. Therefore, DOE should initiate a competitive bidding process for the robotic equipment that will be housed within its self-driving labs. Because DOE has experience in building lab facilities, it should directly oversee the construction of the self-driving lab itself.
The United States already has several small-scale self-driving labs, primarily led by investments at DOE National Labs. The small size of these projects, however, makes it difficult to achieve the economies of scale that are necessary for self-driving labs to become an enduring part of America’s scientific ecosystem.
AI creates additional opportunities to expand automated materials science. Frontier language and multimodal models, such as OpenAI's GPT-4o, Anthropic's Claude 3.5, and Google's Gemini family, have already been used to ideate scientific experiments, and in one case to direct a robotic lab in the fully autonomous synthesis of a known chemical compound. In the workflow proposed here, however, these models would not operate with full autonomy. Instead, scientists would direct the inquiry and the design of the experiment, with the models suggesting variables to tweak.
Modern frontier models have substantial knowledge across the sciences and can hold all of the academic literature relevant to a specific niche of materials science within their context windows. This combination means that they have—when paired with a trained human—the scientific intuition to iteratively tweak an experimental design. They can also write the code necessary to direct the robots in the self-driving lab. Finally, they can write summaries of the experimental results, including the failures. This is crucial: given the constraints on their time, scientists today often report only their successes in published writing, yet failures are just as important to document publicly so that other scientists do not duplicate effort.
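One way such a human-directed, model-assisted loop might be orchestrated is sketched below. The `ask_model` function is a placeholder for a call to a frontier language model API (the real interface would be vendor-specific), stubbed here so the sketch runs as written; nothing about it reflects an actual deployed workflow.

```python
# Sketch of a scientist-directed loop in which a language model suggests the
# next variable to tweak and every run, including failures, is logged.
import json

def ask_model(prompt: str) -> str:
    """Placeholder for a frontier language-model API call; returns a canned
    suggestion so this sketch is self-contained."""
    return json.dumps({"variable": "temperature", "new_value": 550,
                       "rationale": "History suggests yield peaks near 550 K."})

def run_experiment(params: dict) -> dict:
    """Placeholder for dispatching one run to the self-driving lab."""
    return {"params": dict(params), "yield": 0.42, "succeeded": True}

params = {"ratio": 0.5, "temperature": 500}
log = []  # successes and failures alike are recorded

for iteration in range(3):
    result = run_experiment(params)
    log.append(result)
    suggestion = json.loads(ask_model(
        f"Experiment history: {json.dumps(log)}. "
        "Suggest one variable to tweak next, as JSON."))
    # The scientist reviews the suggestion before it is applied.
    print(f"Run {iteration}: model suggests {suggestion['variable']} -> "
          f"{suggestion['new_value']} ({suggestion['rationale']})")
    params[suggestion["variable"]] = suggestion["new_value"]

print("Summary for publication, failures included:", json.dumps(log, indent=2))
```

The scientist sets the experimental design and retains veto power at each step; the model's role is confined to proposing tweaks, writing glue code, and drafting summaries.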
Once constructed, this self-driving lab infrastructure can be a resource made available as another DOE user facility to materials scientists across the country, much as DOE supercomputers are today. DOE already has a robust process and infrastructure in place to share in-demand resources among different scientists, again underscoring why the Department is well-positioned to lead this endeavor.
Conclusion
Materials science faces a grand challenge, and an even grander opportunity. Room-temperature, ambient-pressure superconductors—permitted by the laws of physics but as yet undiscovered—could transform consumer electronics, clean energy, transportation, and even space travel. New forms of magnets could enable a wide range of cutting-edge technologies, such as nuclear fusion reactors. High-performance ceramics could improve reusable rockets and hypersonic aircraft. The opportunities are limitless.
With a coordinated effort led by DOE, the federal government can demonstrate to Americans that scientific innovation and technological progress can still deliver profound improvements to daily life. It can pave the way for a new approach to science firmly rooted in modern technology, creating an example for other areas of science to follow. Perhaps most importantly, it can make Americans excited about the future—something that has been sorely lacking in American society in recent decades.
AI is a radically transformative technology. Contemplating that transformation in the abstract almost inevitably leads to anxiety and fear. There are legislative proposals, white papers, speeches, blog posts, and tweets about using AI to positive ends. Yet merely talking about positive uses of AI is insufficient: the technology is ready, and the opportunities are there. Now is the time to act.
This action-ready policy memo is part of Day One 2025 — our effort to bring forward bold policy ideas, grounded in science and evidence, that can tackle the country’s biggest challenges and bring us closer to the prosperous, equitable, and safe future that we all hope for, whoever takes office in 2025 and beyond.
Frequently Asked Questions
What risks do self-driving labs pose?
Compared to “cloud labs” for biology and chemistry, the risks associated with self-driving labs for materials science are low. In a cloud lab equipped with nucleic acid synthesis machines, for example, genetic sequences must be screened carefully to ensure that they do not encode dangerous pathogens—a nontrivial task. There are no analogous risks for most materials science applications.
However, given the dual-use nature of many novel materials, any self-driving lab would need strong cybersecurity and intellectual property protections. Scientists using self-driving lab facilities would need to be carefully screened by DOE—fortunately, DOE already has screening infrastructure in place for determining access to its supercomputing facilities.
Can every materials science experiment be automated?
Not all materials involve easily repeatable, and hence automatable, experiments for synthesis and characterization. But many important classes of materials do, including:
- Thin films and coatings
- Photonic and optoelectronic materials such as perovskites (used for solar panels)
- Polymers and monomers
- Battery and energy storage materials
Over time, additional classes of materials can be added.
How can DOE fund this work beyond direct appropriations?
DOE can and should be creative and resourceful in finding additional resources beyond public funding for this project. Collaborations on both foundation AI models and scaling self-driving labs between DOE and private-sector AI firms can be uniquely facilitated by DOE’s new Foundation for Energy Security and Innovation (FESI), a private foundation created by DOE to support scientific fellowships, public-private partnerships, and other key mission-related initiatives.
Has AI already shown promise for materials discovery?
Yes. Some private firms have recently demonstrated the promise. In late 2023, Google DeepMind unveiled GNoME, a materials science model that identified thousands of new potential materials (though they still need to be experimentally validated). Microsoft’s MatterGen model pushed in a similar direction. Both models were developed in collaboration with DOE National Labs (Lawrence Berkeley in the case of DeepMind, and Pacific Northwest in the case of Microsoft).