Interview with Erika DeBenedictis
2022 Bioautomation Challenge: Investing in Automating Protein Engineering
Thomas Kalil, Chief Innovation Officer of Schmidt Futures, interviews biomedical engineer Erika DeBenedictis
Schmidt Futures is supporting an initiative – the 2022 Bioautomation Challenge – to accelerate the adoption of automation by leading researchers in protein engineering. The Federation of American Scientists will act as the fiscal sponsor for this challenge.
This initiative was designed by Erika DeBenedictis, who will also serve as the program director. Erika holds a PhD in biological engineering from MIT, and has also worked in biochemist David Baker’s lab on machine learning for protein design at the University of Washington in Seattle.
Recently, I caught up with Erika to understand why she’s excited about the opportunity to automate protein engineering.
Why is it important to encourage widespread use of automation in life science research?
Automation improves the reproducibility and scalability of life science research. Today, it is difficult to transfer experiments between labs. This slows progress in the entire field, both among academics and from academia to industry. Automation allows new techniques to be shared frictionlessly, which makes them broadly available sooner. It also allows us to make better use of our scientific workforce. Widespread automation in life science would shift the time spent away from repetitive experiments and toward more creative, conceptual work, including designing experiments and carefully selecting the most important problems.
How did you get interested in the role that automation can play in the life sciences?
I started graduate school in biological engineering directly after working as a software engineer at Dropbox. I was shocked to learn that people use a drag-and-drop GUI to control laboratory automation rather than an actual programming language. It was clear to me that automation has the potential to massively accelerate life science research, and there’s a lot of low-hanging fruit.
Why is this the right time to encourage the adoption of automation?
The industrial revolution was 200 years ago, and yet people are still using hand pipettes. It’s insane! The hardware for doing life science robotically is quite mature at this point, and there are quite a few groups (Ginkgo, Strateos, Emerald Cloud Lab, Arctoris) that have automated robotic setups. Two barriers to widespread automation remain: the development of robust protocols that are well adapted to robotic execution and overcoming cultural and institutional inertia.
What role could automation play in generating the data we need for machine learning? What are the limitations of today’s publicly available data sets?
There are plenty of life science datasets available online, but unfortunately most of them are unusable for machine learning purposes. Datasets collected by individual labs are usually too small, and combining datasets between labs, or even among different experimentalists, is often a nightmare. Today, when two different people run the ‘same’ experiment they will often get subtly different results. That’s a problem we need to systematically fix before we can collect big datasets. Automating and standardizing measurements is one promising strategy to address this challenge.
Why protein engineering?
The success of AlphaFold has highlighted to everyone the value of using machine learning to understand molecular biology. Methods for machine-learning-guided closed-loop protein engineering are increasingly well developed, and automation makes it that much easier for scientists to benefit from these techniques. Protein engineering also benefits from “robotic brute force.” When you engineer any protein, it is always valuable to test more variants, so this discipline benefits uniquely from automation.
If it’s such a good idea, why haven’t academics done it in the past?
Cost and risk are the main barriers. What sort of methods are valuable to automate and run remotely? Will automation be as valuable as expected? It’s a totally different research paradigm; what will it be like? Even assuming that an academic wants to go ahead and spend $300k for a year of access to a cloud laboratory, it is difficult to find a funding source. Very few labs have enough discretionary funds to cover this cost, equipment grants are unlikely to pay for cloud lab access, and it is not obvious whether the NIH or other traditional funders would look favorably on this sort of expense in the budget for an R01 or equivalent. Additionally, it is difficult to seek out funding without already having data demonstrating the utility of automation for a particular application. Altogether, there are just a lot of barriers to entry.
You’re starting this new program called the 2022 Bioautomation Challenge. How does the program eliminate those barriers?
This program is designed to allow academic labs to test out automation with little risk and at no cost. Groups are invited to submit proposals for methods they would like to automate. Selected proposals will be granted three months of cloud lab development time, plus a generous reagent budget. Groups that successfully automate their method will also be given transition funding so that they can continue to use their cloud lab method while applying for grants with their brand-new preliminary data. This way, labs don’t need to put in any money up-front, and are able to decide whether they like the workflow and results of automation before finding long-term funding.
Historically, some investments that have been made in automation have been disappointing, like GM in the 1980s, or Tesla in the 2010s. What can we learn from the experiences of other industries? Are there any risks?
For sure. I would say even “life science in the 2010s” is an example of disappointing automation: academic labs started buying automation robots, but it didn’t end up being the right paradigm to see the benefits. I see the 2022 Bioautomation Challenge as an experiment itself: we’re going to empower labs across the country to test out many different use cases for cloud labs to see what works and what doesn’t.
Where will funding for cloud lab access come from in the future?
Currently there’s a question as to whether traditional funding sources like the NIH would look favorably on cloud lab access in a budget. One of the goals of this program is to demonstrate the benefits of cloud science, which I hope will encourage traditional funders to support this research paradigm. In addition, the natural place to house cloud lab access in the academic ecosystem is at the university level. I expect that many universities may create cloud lab access programs, or upgrade their existing core facilities into cloud labs. In fact, it’s already happening: Carnegie Mellon recently announced they’re opening a local robotic facility that runs Emerald Cloud Lab’s software.
What role will biofabs and core facilities play?
In 10 years, I think the terms “biofab,” “core facility,” and “cloud lab” will all be synonymous. Today the only important difference is how experiments are specified: many core facilities still take orders through bespoke Google forms, whereas Emerald Cloud Lab has figured out how to expose a single programming interface for all their instruments. We’re implementing this program at Emerald because it’s important that all the labs that participate can talk to one another and share protocols, rather than each developing methods that can only run in their local biofab. Eventually, I think we’ll see standardization, and all the facilities will be capable of running any protocol for which they have the necessary instruments.
In addition to protein engineering, are there other areas in the life sciences that would benefit from cloud labs and large-scale, reliable data collection for machine learning?
I think there are many areas that would benefit. Areas that struggle with reproducibility, are manually repetitive and time intensive, or that benefit from closely integrating computational analysis with data are all good targets for automation. Microscopy and mammalian tissue culture might be two more candidates. But there’s a lot of intellectual work for the community to do in order to articulate problems that can be solved with machine learning approaches, if given the opportunity to collect the data.