Expected Utility Forecasting for Science Funding

The typical science grantmaker seeks to maximize their (positive) impact with a limited amount of money. The decision-making process for how to allocate that funding requires them to consider the different dimensions of risk and uncertainty involved in science proposals, as described in foundational work by economists Chiara Franzoni and Paula Stephan. The Von Neumann-Morgenstern utility theorem implies that there exists for the grantmaker — or the peer reviewer(s) assessing proposals on their behalf — a utility function whose expected value they will seek to maximize. 

Common frameworks for evaluating proposals leave this utility function implicit, often evaluating aspects of risk, uncertainty, and potential value independently and qualitatively. Empirical work has suggested that such an approach may lead to biases, resulting in funding decisions that deviate from grantmakers’ ultimate goals. An expected utility approach to reviewing science proposals aims to make that implicit decision-making process explicit, and thus reduce biases, by asking reviewers to directly predict the probability and value of different potential outcomes occurring. Implementing this approach through forecasting brings the added benefits of providing (1) a resolution and scoring process that could help incentivize reviewers to make better, more accurate predictions over time and (2) empirical estimates of reviewers’ accuracy and tendency to over or underestimate the value and probability of success of proposals.

At the Federation of American Scientists, we are currently piloting this approach on a series of proposals in the life sciences that we have collected for Focused Research Organizations (FROs), a new type of non-profit research organization designed to tackle challenges that neither academia nor industry is incentivized to work on. The pilot study was developed in collaboration with Metaculus, a forecasting platform and aggregator, and is hosted on their website. In this paper, we provide the detailed methodology for the approach that we have developed, which builds upon Franzoni and Stephan’s work, so that interested grantmakers may adapt it for their own purposes. The motivation for developing this approach, and how we believe it may help address biases against risk in traditional peer review processes, is discussed in our article “Risk and Reward in Peer Review”.

Defining Outcomes

To illustrate how an expected utility forecasting approach could be applied to scientific proposal evaluation, let us first imagine a research project consisting of multiple possible outcomes or milestones. In the most straightforward case, the outcomes that could arise are mutually exclusive (i.e., only a single one will be observed). Indexing each outcome with the letter 𝑖, we can define the expected value of each as the product of its value (or utility; 𝓊𝑖) and the probability of it occurring, 𝑃(𝑚𝑖). Because the outcomes in this example are mutually exclusive, the total expected utility (TEU) of the proposed project is the sum of the expected value of each outcome1:

𝑇𝐸𝑈 = 𝛴𝑖𝓊𝑖𝑃(𝑚𝑖).

However, in most cases, it is easier and more accurate to define the range of outcomes of a research project as a set of primary and secondary outcomes or research milestones that are not mutually exclusive, and can instead occur in various combinations.

For instance, science proposals usually highlight the primary outcome(s) that they aim to achieve, but may also involve important secondary outcome(s) that can be achieved in addition to or instead of the primary goals. Secondary outcomes can be a research method, tool, or dataset produced for the purpose of achieving the primary outcome; a discovery made in the process of pursuing the primary outcome; or an outcome that researchers pivot to pursuing as they obtain new information from the research process. As such, primary and secondary outcomes are not necessarily mutually exclusive. In the simplest scenario with just two outcomes (either two primary or one primary and one secondary), the total expected utility becomes

𝑇𝐸𝑈 = 𝓊1𝑃(𝑚1⋂ not 𝑚2) + 𝓊2𝑃(𝑚2⋂ not 𝑚1) + (𝓊1 + 𝓊2)𝑃(𝑚1⋂𝑚2),

𝑇𝐸𝑈 = 𝓊1(𝑃(𝑚1) – 𝑃(𝑚1⋂𝑚2)) + 𝓊2(𝑃(𝑚2) – 𝑃(𝑚1⋂𝑚2)) + (𝓊1 + 𝓊2)𝑃(𝑚1⋂𝑚2),

𝑇𝐸𝑈 = 𝓊1𝑃(𝑚1) + 𝓊2𝑃(𝑚2).

Note that because the value of achieving both outcomes is assumed above to be simply the sum of their individual values, the joint probability terms cancel. When the combined value of multiple outcomes differs from the sum of their individual values, joint probability terms remain, and their number grows as the number of outcomes increases. Assuming the outcomes are independent, though, each joint probability can be reduced to the product of the probabilities of the individual outcomes. For example,

𝑃(𝑚1⋂𝑚2) = 𝑃(𝑚1) * 𝑃(𝑚2)

On the other hand, milestones are typically designed to build upon one another, such that achieving later milestones necessitates the achievement of prior milestones. In these cases, the value of later milestones typically includes the value of prior milestones: for example, the value of demonstrating a complete pilot of a technology is inclusive of the value of demonstrating individual components of that technology. The total expected utility can thus be defined as the sum of the product of the marginal utility of each additional milestone and its probability of success:

𝑇𝐸𝑈 = 𝛴𝑖(𝓊𝑖 – 𝓊𝑖-1)𝑃(𝑚𝑖),
where 𝓊0 = 0.

Depending on the science proposal, either of these approaches — or a combination — may make the most sense for determining the set of outcomes to evaluate.
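To make the arithmetic concrete, here is a minimal Python sketch of the two TEU calculations above. The function names, utilities, and probabilities are our own illustrative assumptions, not values from the FRO Forecasting pilot.

```python
# Illustrative sketch of the two total expected utility (TEU) calculations above.
# All utilities and probabilities are made-up example values.

def teu_mutually_exclusive(utilities, probabilities):
    """TEU = sum_i u_i * P(m_i), for outcomes where only one can occur."""
    return sum(u * p for u, p in zip(utilities, probabilities))

def teu_sequential_milestones(cumulative_utilities, probabilities):
    """TEU = sum_i (u_i - u_{i-1}) * P(m_i), with u_0 = 0, for milestones whose
    values are cumulative and where reaching milestone i implies reaching i-1."""
    teu, previous_u = 0.0, 0.0
    for u, p in zip(cumulative_utilities, probabilities):
        teu += (u - previous_u) * p
        previous_u = u
    return teu

# Three mutually exclusive outcomes:
print(teu_mutually_exclusive([2.0, 5.0, 9.0], [0.5, 0.3, 0.1]))     # ≈ 3.4
# Three sequential milestones with cumulative values:
print(teu_sequential_milestones([2.0, 5.0, 9.0], [0.8, 0.5, 0.2]))  # ≈ 3.9
```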

In our FRO Forecasting pilot, we worked with proposal authors to define two outcomes for each of their proposals. Depending on what made the most sense for each proposal, the two outcomes reflected either relatively independent primary and secondary goals, or sequential milestone outcomes that directly built upon one another (though for simplicity, we called all of the outcomes milestones).

Defining Probability of Success

Once the set of potential outcomes has been defined, the next step is to determine the probability of success, between 0% and 100%, for each outcome if the proposal is funded. A prediction of 50% would indicate the highest level of uncertainty about the outcome, whereas the closer the predicted probability of success is to 0% or 100%, the more certain the reviewer is about whether or not the outcome will occur.
Furthermore, Franzoni and Stephan decompose probability of success into two components: the probability that the outcome can actually occur in nature or reality and the probability that the proposed methodology will succeed in obtaining the outcome (conditional on it being possible in nature). The total probability is then the product of these two components:

𝑃(𝑚𝑖) = 𝑃nature(𝑚𝑖) * 𝑃proposal(𝑚𝑖)

Depending on the nature of the proposal (e.g., more technology-driven, or more theoretical/discovery driven), each component may be more or less relevant. For example, our forecasting pilot includes a proposal to perform knockout validation of renewable antibodies for 10,000 to 15,000 human proteins; for this project, 𝑃nature(𝑚𝑖) approaches 1 and 𝑃proposal(𝑚𝑖) drives the overall probability of success.
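As a purely illustrative worked example (the numbers are assumed, not taken from the pilot): if reviewers judged there to be a 90% chance that an outcome is possible in nature and a 60% chance that the proposed methodology would achieve it given that it is possible, the overall probability of success would be 𝑃(𝑚𝑖) = 0.9 × 0.6 = 0.54.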

Defining Utility

Similarly, the value of an outcome can be separated into its impact on the scientific field and its impact on society at large. Scientific impact aims to capture the extent to which a project advances the frontiers of knowledge, enables new discoveries or innovations, or enhances scientific capabilities or methods. Social impact aims to capture the extent to which a project contributes to solving important societal problems, improving well-being, or advancing social goals. 

In both of these cases, determining the value of an outcome entails some subjective preferences, so there is no “correct” choice, at least mathematically speaking. However, proxy metrics may be helpful in considering impact. Though each is imperfect, one could consider citations of papers, patents on tools or methods, or users of methods, tools, and datasets as proxies of scientific impact. For social impact, some proxy metrics that one might consider are the value of lives saved, the cost of illness prevented, the number of job-years of employment generated, economic output in terms of GDP, or the social return on investment.

The approach outlined by Franzoni and Stephan asks reviewers to assess scientific and social impact on a linear scale (0-100), after which the values can be averaged to determine the overall impact of an outcome. However, we believe that an exponential scale better captures the tendency in science for a small number of research projects to have an outsized impact and provides more room at the top end of the scale for reviewers to increase the rating of the proposals that they believe will have an exceptional impact.

[Figures: exponential relationship between the impact score and actual impact; citation distribution of journal articles]

As such, for our FRO Forecasting pilot, we chose to use a framework in which a simple 1–10 score corresponds to real-world impact via a base 2 exponential scale. In this case, the overall impact score of an outcome can be calculated according to

𝓊𝑖 = log2[2^(science impact of 𝑖) + 2^(social impact of 𝑖)] – 1.

For an exponential scale with a different base, one would substitute that base for two in the above equation. Depending on each funder’s specific understanding of impact and the type(s) of proposals they are evaluating, different relationships between scores and utility could be more appropriate.
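As a minimal sketch of how this base-2 combination could be computed (the function name and example scores are our own illustrative assumptions, not part of the pilot's tooling):

```python
import math

def combined_impact_score(science_score: float, social_score: float) -> float:
    """Combine science and social impact scores on a base-2 exponential scale:
    u_i = log2(2**science_score + 2**social_score) - 1."""
    return math.log2(2**science_score + 2**social_score) - 1

print(combined_impact_score(6, 6))  # 6.0: two equal scores are preserved
print(combined_impact_score(9, 3))  # ≈ 8.02: an exceptional score dominates
```

Unlike a linear average, which would also give 6 for the second pair of scores, the exponential combination lets the exceptional score dominate the overall rating.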

In order to capture reviewers’ assessment of uncertainty in their evaluations, we asked them to provide median, 25th percentile, and 75th percentile predictions for impact instead of a single prediction. High uncertainty would be indicated by a wide interval between the 25th and 75th percentile predictions, while low uncertainty would be indicated by a narrow interval.

Determining the “But For” Effect of Funding

The above approach aims to identify the highest impact proposals. However, a grantmaker may not want to simply fund the highest impact proposals; rather, they may be most interested in understanding where their funding would make the highest impact — i.e., their “but for” effect. In this case, the grantmaker would want to fund proposals with the maximum difference between the total expected utility of the research proposal if they chose to fund it versus if they chose not to:

“But For” Impact = 𝑇𝐸𝑈(funding) – 𝑇𝐸𝑈(no funding).

For TEU(funding), the probability of the outcome occurring with this specific grantmaker’s funding using the proposed approach would still be defined as above

𝑃(𝑚𝑖 | funding) = 𝑃nature(𝑚𝑖) * 𝑃proposal(𝑚𝑖),

but for 𝑇𝐸𝑈(no funding),  reviewers would need to consider the likelihood of the outcome being achieved through other means. This could involve the outcome being realized by other sources of funding, other researchers, other approaches, etc. Here, the probability of success without this specific grantmaker’s funding could be described as

𝑃(𝑚𝑖 | no funding) = 𝑃nature(𝑚𝑖) * 𝑃other mechanism(𝑚𝑖).

In our FRO Forecasting pilot, we assumed that 𝑃other mechanism(𝑚𝑖) ≈ 0. The theory of change for FROs is that there exists a set of research problems at the boundary of scientific research and engineering that are not adequately supported by traditional research and development models and are unlikely to be pursued by academia or industry. Thus, in these cases it is plausible to assume that,

𝑃(𝑚𝑖 | no funding) ≈ 0
𝑇𝐸𝑈(no funding) ≈ 0
“But For” Impact ≈ 𝑇𝐸𝑈(funding).

This assumption, while not generalizable to all contexts, can help reduce the number of questions that reviewers have to consider — a dynamic which we explore further in the next section.
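The following minimal sketch strings these pieces together for a single outcome, using the probability decompositions above. The variable names and numbers are our own illustrative assumptions, not pilot values.

```python
# Illustrative "but for" calculation for a single outcome.

def expected_value(utility, p_nature, p_mechanism):
    """Expected value of one outcome: u_i * P_nature(m_i) * P_mechanism(m_i)."""
    return utility * p_nature * p_mechanism

u = 6.0           # combined impact score of the outcome (assumed)
p_nature = 0.9    # probability the outcome is possible in nature
p_proposal = 0.6  # probability the funded approach achieves it, given it is possible
p_other = 0.05    # probability another mechanism achieves it without this funding

teu_funding = expected_value(u, p_nature, p_proposal)   # ≈ 3.24
teu_no_funding = expected_value(u, p_nature, p_other)   # ≈ 0.27
but_for_impact = teu_funding - teu_no_funding           # ≈ 2.97

# Under the FRO pilot's assumption that p_other ≈ 0, teu_no_funding ≈ 0 and the
# "but for" impact reduces to teu_funding.
print(but_for_impact)
```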

Designing Forecasting Questions

Once one has determined the total expected utility equation(s) relevant for the proposal(s) that they are trying to evaluate, the parameters of the equation(s) must be translated into forecasting questions for reviewers to respond to. In general, for each outcome, reviewers will need to answer the following four questions:

  1. If this proposal is funded, what is the probability that this outcome will occur?
  2. If this proposal is not funded, what is the probability that this outcome will still occur? 
  3. What will be the scientific impact of this outcome occurring?
  4. What will be the social impact of this outcome occurring?

For the probability questions, one could alternatively ask reviewers about the different probability components (𝑃nature(𝑚𝑖), 𝑃proposal(𝑚𝑖), 𝑃other mechanism(𝑚𝑖), etc.), but in most cases it will be sufficient — and simpler for the reviewer — to focus on the top-level probabilities that feed into the TEU calculation.

In order for the above questions to tap into the benefits of the forecasting framework, they must be resolvable. Resolving the forecasting questions means that at a set time in the future, reviewers’ predictions will be compared to a ground truth based on the actual events that have occurred (i.e., was the outcome actually achieved and, if so, what was its actual impact?). Consequently, reviewers will need to be provided with the resolution date and the resolution criteria for their forecasts. 

Resolution of the probability-based questions hinges mostly on a careful and objective definition of the potential outcomes, and is otherwise straightforward — though note that only one of the two probability questions will actually be resolved, since the proposal will either be funded or not funded, so only one funding scenario can occur. The optimal resolution of the scientific and social impact questions may depend on the context of the project and the chosen approach to defining utility. A widely applicable approach is to resolve the utility forecasts by having either program managers or subject matter experts evaluate the results of the completed project and score its impact at the resolution date.
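For the probability questions, one common way to score forecasts at resolution (not prescribed here, and shown purely as an illustrative sketch with made-up numbers) is the Brier score:

```python
def brier_score(forecast_probability: float, outcome_occurred: bool) -> float:
    """Squared error between a probability forecast and the realized outcome;
    0 is a perfect forecast and 1 is the worst possible."""
    outcome = 1.0 if outcome_occurred else 0.0
    return (forecast_probability - outcome) ** 2

print(brier_score(0.70, True))   # ≈ 0.09: 70% forecast, outcome achieved
print(brier_score(0.70, False))  # ≈ 0.49: 70% forecast, outcome not achieved
```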

For our pilot, we asked forecasting questions only about the probability of success given funding (question 1 above) and the scientific and social impact of each outcome (questions 3 and 4); since we assumed that the probability of success without funding was zero, we did not ask question 2. Because outcomes for the FRO proposals were designed to be either independent or sequential, we did not have to ask additional questions on the joint probability of multiple outcomes being achieved. We chose to resolve our impact questions with a post-project panel of subject matter experts.

Additional Considerations

In general, there is a tradeoff in implementing this approach between simplicity and thoroughness, efficiency and accuracy. Here are some additional considerations on that tradeoff for those looking to use this approach:

  1. The responsibility of determining the range of potential outcomes for a proposal could be assigned to three different parties: the proposal author, the proposal reviewers, or the program manager. First, grantmakers could ask proposal authors to comprehensively define within their proposal the potential primary and secondary outcomes and/or project milestones. Alternatively, reviewers could be allowed to individually — or collectively — determine what they see as the full range of potential outcomes. The third option would be for program managers to define the potential outcomes based on each proposal, with or without input from proposal authors. In our pilot, we chose to use the third approach with input from proposal authors, since it simplified the process for reviewers and allowed us to limit the number of outcomes under consideration to a manageable amount.
  2. In many cases, a “failed” or null outcome may still provide meaningful value by informing other scientists that the research method doesn’t work or that the hypothesis is unlikely to be true. Considering the replication crises in multiple fields, this could be an important and unaddressed aspect of peer review. Grantmakers could choose to ask reviewers to consider the value of these null outcomes alongside other outcomes to obtain a more complete picture of the project’s utility. We chose not to address this consideration in our pilot for the sake of limiting the evaluation burden on reviewers.
  3. If grant recipients are permitted greater flexibility in their research agendas, this expected value approach could become more difficult to implement, since reviewers would have to consider a wider and more uncertain range of potential outcomes. This was not the case for our FRO Forecasting pilot, since FROs are designed to have specific and well-defined research goals.

Other Similar Efforts

Currently, forecasting is an approach rarely used in grantmaking. Open Philanthropy is the only grantmaking organization we know of that has publicized their use of internal forecasts about grant-related outcomes, though their forecasts do not directly influence funding decisions and are not specifically of expected value. Franzoni and Stephan are also currently piloting their Subjective Expected Utility approach with Novo Nordisk.

Conclusion

Our goal in publishing this methodology is for interested grantmakers to freely adapt it to their own needs and iterate upon our approach. We hope that this paper will help start a conversation in the science research and funding communities that leads to further experimentation. A follow-up report sharing the results and learnings from the project will be published at the end of the FRO Forecasting pilot.

Acknowledgements

We’d like to thank Peter Mühlbacher, former research scientist at Metaculus, for his meticulous feedback as we developed this approach and for his guidance in designing resolvable forecasting questions. We’d also like to thank the rest of the Metaculus team for being open to our ideas and working with us on piloting this approach, the process of which has helped refine our ideas to their current state. Any mistakes here are of course our own.

Culture Blast at the Kurt Vonnegut Museum

Charlotte Yeung is a Purdue student and New Voices in Nuclear Weapons fellow at FAS. Her multimedia show, Culture Blast, opens this week at the Kurt Vonnegut Museum in Indianapolis.

Best known for his anti-war novel Slaughterhouse-Five (1969), Kurt Vonnegut drew on his experience serving in World War II throughout his work and his life. He acted as a powerful spokesman for the preservation of our Constitutional freedoms, for nuclear arms control, and for the protection of the earth’s fragile biosphere throughout the 1980s and 1990s. He remained engaged in these issues throughout his life.

FAS: Tell us about this project and its goals…

Charlotte: My exhibit, Culture Blast, weaves Kurt Vonnegut’s stance on nuclear weapons with current issues we face today. It serves as a space for preservation and linkage, asking the viewer to connect Vonnegut’s concerns to the modern day. It is also a place of artistic protest and education, meant to inform the viewer of the often ignored complexity of nuclear weapons and how they affect many different parts of society. I worked with Lovely Umayam and the team at the Federation of American Scientists (FAS) to ideate and research what underpins this exhibit. 

Why this medium?

I explore nuclear history and literary protest through blackout poetry, free verse, and digital illustration – contemporary forms of written and visual art that carry on Vonnegut’s artistic protest into the modern day.

In particular, I wanted to create art that can be accessible to everyone and not just people who go to the museum. Seeing a photo of a pencil drawing or a statue online is experiencing just the surface-level nature of a work of art. It lacks the other sensory components, like seeing how the art fits with the wider space and how others react to it. A similar issue would come up if my poems were spoken word. That art form lives off of a crowd. My digital art and poetry are meant to be seen in both a museum setting and an online setting. Viewers can watch a time lapse of my art and see the process of creating this work. The poetry is meant to be read rather than performed.

Can you share some of the backstory on a few of the multimedia pieces? How did the concept evolve as you worked on them?

My favorite pair in the collection is Sketch 1 (the typewriter) and the blackout poem Protection in the name of public interest. I drew Vonnegut on his typewriter because I felt this to be the most symbolic image of his artistic protest. Though he wrote and sketched in countless personal notebooks, his work commenting on nuclear weapons can be found in some of his published stories. Vonnegut turned to his literary creativity to reflect on nuclear weapons, in particular scientists’ indifference to the suffering caused by the atom bomb, as seen in works such as Cat’s Cradle and Report on The Barnhouse Effect.

The uncensored version of my poem (Protection in the name of public interest) meditates on the bomb’s harm and also the long association between science and violence. I chose to create a blackout poem because in the immediate aftermath of World War II, the American government censored many of the photos and stories about what happened to the people in Hiroshima and Nagasaki, in effect erasing the lived experience of these people. It wasn’t until writer John Hersey’s account of the U.S. nuclear attacks, titled Hiroshima, was published in The New Yorker in 1946 that the American public had an uncensored understanding of what had happened in Japan.

In terms of the process, I began this thinking that I would draw a hand and a pencil but Vonnegut is strongly associated with his typewriter so I felt that it wouldn’t be as accurate to draw something else. I knew I would write a blackout poem and link it to censorship but I wasn’t sure what it would be called or what it would say. I ultimately wrote how I felt about the exhibit and what I’ve learned so far in the FAS fellowship and what I wanted to convey. I started blacking out the poem and left the main themes I wanted viewers to have from this exhibit.

Still from Typewritertimerelapse, Charlotte Yeung, 2023

I am very fond of Sketch 2 (the cat’s cradle with the mushroom cloud inside) and the poem Labels. Labels was inspired by a lunch I had last year with a hibakusha, Yoshiko Kajimoto. She was 14 when the bomb fell and she spoke of the following days in vivid detail. I believe it was her testimony in particular that started me on this path towards researching the societal implications of nuclear weapons. 

I drew an artistic interpretation of a cat’s cradle seeking to capture a mushroom cloud to communicate the concept of hyperobjects. According to Timothy Morton, a hyperobject is a real event or phenomenon so vast that it is beyond human comprehension. Nuclear weapons are an example of a hyperobject – their existence and use have had devastating ramifications touching different aspects of life that may be hard to fully comprehend all at once.

Nuclear weapons are both deeply present and hidden. On one hand, nuclear weapons are a constant security threat and have left deep scars on cities and bodies ranging from people in Japan to Utah downwinders. On the other hand, some of the information around them is clouded in mystery, and censored by different governments. Nuclear and fallout shelters built for nuclear warfare are considered to be Cold War relics; in the United States, they are largely abandoned or hidden, tucked in the  basements of homes and schools, or located far out in the countryside away from cities. Some nuclear bunkers are parking lots in New York City and rumored rooms in DC. Secrecy and hidden locality are reasons why there isn’t as much widespread public knowledge or understanding of nuclear weapons.

Still from Wintertimerelapse, Charlotte Yeung, 2023

Why is nuclear weapon risk a conversation we’re still having today? Where can people new to this issue learn more? 

As a young person, I encounter many people my age who ask me why I care about nuclear weapons and why they matter in this day and age. It is, in a sense, meaningless to them. The violence of the bomb cannot be completely understood without hearing about the pain it wrought on people who experienced it firsthand. From burning skin and bodies to radiation poisoning, nuclear weapons have left permanent physical and psychological scars that are rarely spoken about, but have affected generations of families and communities. 

I suggest reading more about what happened to individual A-bomb survivors to truly understand the effects of nuclear weapons. Humanizing those affected by war combats the practice of minimizing lives and experience to death counts. Great works to look at include Barefoot Gen (a manga on the Hiroshima bombings inspired by the hibakusha author’s experience), Grave of the Fireflies (a film about children grappling with the effects of the bombing), and the poem Bringing Forth New Life (生ましめんかな) by hibakusha poet Sadako Kurihara (which is about a woman giving birth in the ruins while the midwife dies from injuries in the middle of the process). 

I knew the design from the beginning. It had to be a cat’s cradle because Vonnegut likens scientific indifference to the bomb’s humanitarian effects to a cat’s cradle (an essentially useless game of moving strings with your fingers). The poem centers around hyperobjects, a term I grappled with in a university seminar with Dr. Brite from Purdue’s Honors College. I had never heard of the term before that class but I thought it was fitting for something like nuclear weapons. My research is interdisciplinary in nature and it seeks to analyze the cultural aspects of nuclear weapons that aren’t traditionally examined by the political science community.

Still from Cat’s Cradle, Charlotte Yeung, 2023

The third piece I’ll discuss is Sketch 5 and Rebuilding. This sketch of a rose is reminiscent of the Duftwolke roses sent from Germany to Hiroshima after the war as a symbol of rebuilding. It is also symbolic of Vonnegut’s experiences in World War II. He was caught in the firestorm that engulfed Dresden and he sheltered in a slaughterhouse. As one of the remaining survivors, he was forced to burn dead bodies in the aftermath. Vonnegut became an anti-war activist as a result of this experience. He also wrote Slaughterhouse-Five, a book that grapples with trauma and PTSD after war. His writing was his way of finding closure and rebuilding, hence the title of the poem.

I wanted to end this collection on an optimistic note because, as dark and grim as war and nuclear weapons can be, there is great resilience in humanity. It takes monumental courage and hope to rebuild a city or mind or soul after facing the devastation of an all-consuming weapon.

Still from Broken Arrow, Charlotte Yeung, 2023

I actually drew this while talking with FAS fellows and FAS advisors. I often draw to pay attention to important conversations (so all of my notebooks are filled with drawings). I thought the deep, complex observations about nuclear weapons and ethics and misinformation and other fields were so fascinating, and I think as a result, I created my favorite drawing. I felt very hopeful during the conversation, because I saw so many people who were invested in this topic and were actively researching and discussing the implications of nuclear weapons.

Where can people see your show/contact you?

The showcase can be seen at the Kurt Vonnegut Museum and Library in Indianapolis, Indiana. If someone wants to speak about this topic with me, they can reach me at X or Instagram at @cmyeungg.

FAS Forum: Envisioning the Future of Wildland Fire Policy

In this critical year for reimagining wildland fire policy, the Federation of American Scientists (FAS) hosted a convening that provided stakeholders from the science, technology, and policy communities with an opportunity to exchange forward-looking ideas with the shared goal of improving the federal government’s approach to managing wildland fire.

A total of 43 participants attended the event. Attendee affiliations included universities, federal agencies, state and local agencies, nonprofit organizations, and philanthropies.

This event was designed as an additive opportunity for co-learning and deep dives on topics relevant to the Wildland Fire Mitigation and Management Commission (the Commission) with leading experts in relevant fields (the convening was independent from any formal Commission activities).

In particular, the Forum highlighted, and encouraged iteration on, ideas emerging from leading experts who participated in the Wildland Fire Policy Accelerator. Coordinated by FAS in partnership with COMPASS, the California Council on Science and Technology (CCST), and Conservation X Labs, this accelerator has served as a pathway to source and develop actionable policy recommendations to inform the work of the Commission.

A full list of recommendations from the Accelerator is available on the FAS website.

A PDF summarizing discussions and key takeaways from the event is available for participant reference. We look forward to building on the connections made during this event.

AI for science: creating a virtuous circle of discovery and innovation

In this interview, Tom Kalil discusses the opportunities for science agencies and the research community to use AI/ML to accelerate the pace of scientific discovery and technological advancement.

Q.  Why do you think that science agencies and the research community should be paying more attention to the intersection between AI/ML and science?

Recently, researchers have used DeepMind’s AlphaFold to predict the structures of more than 200 million proteins from roughly 1 million species, covering almost every known protein on the planet! Although not all of these predictions will be accurate, this is a massive step forward for the field of protein structure prediction.

The question that science agencies and different research communities should be actively exploring is – what were the pre-conditions for this result, and are there steps we can take to create those circumstances in other fields?   

Photo by DeepMind on Unsplash

One partial answer to that question is that the protein structure community benefited from a large open database (the Protein Data Bank) and what linguist Mark Liberman calls the “Common Task Method.”

Q.  What is the Common Task Method (CTM), and why is it so important for AI/ML?

In a CTM, competitors share the common task of training a model on a challenging, standardized dataset with the goal of receiving a better score.  One paper noted that common tasks typically have four elements:

  1. Tasks are formally defined with a clear mathematical interpretation
  2. Easily accessible gold-standard datasets are publicly available in a ready-to-go standardized format
  3. One or more quantitative metrics are defined for each task to judge success
  4. State-of-the-art methods are ranked in a continuously updated leaderboard

Computational physicist and synthetic biologist Erika DeBenedictis has proposed adding a fifth component, which is that “new data can be generated on demand.”  Erika, who runs Schmidt Futures-supported competitions such as the 2022 BioAutomation Challenge,  argues that creating extensible living datasets has a few advantages.  This approach can detect and help prevent overfitting; active learning can be used to improve performance per new datapoint; and datasets can grow organically to a useful size.

Common Task Methods have been critical to progress in AI/ML, as David Donoho noted in 50 Years of Data Science.

Q.  Why do you think that we may be under-investing in the CTM approach?

U.S. agencies have already started to invest in AI for Science.  Examples include NSF’s AI Institutes, DARPA’s Accelerated Molecular Discovery, NIH’s Bridge2AI, and DOE’s investments in scientific machine learning.  The NeurIPS conference (one of the largest scientific conferences on machine learning and computational neuroscience) now has an entire track devoted to datasets and benchmarks.

However, there are a number of reasons why we are likely to be under-investing in this approach.

  1. These open datasets, benchmarks and competitions are what economists call “public goods.”  They benefit the field as a whole, and often do not disproportionately benefit the team that created the dataset.  Also, the CTM requires some level of community buy-in.  No one researcher can unilaterally define the metrics that a community will use to measure progress. 
  2. Researchers don’t spend a lot of time coming up with ideas if they don’t see a clear and reliable path to getting them funded.  Researchers ask themselves, “what datasets already exist, or what dataset could I create with a $500,000 – $1 million grant?”  They don’t ask the question, “what dataset + CTM would have a transformational impact on a given scientific or technological challenge, regardless of the resources that would be required to create it?”  If we want more researchers to generate concrete, high-impact ideas, we have to make it worth the time and effort to do so.
  3. Many key datasets (e.g., in fields such as chemistry) are proprietary, and were designed prior to the era of modern machine learning.  Although researchers are supposed to include Data Management Plans in their grant applications, these requirements are not enforced, data is often not shared in a way that is useful, and data can be of variable quality and reliability. In addition, large dataset creation may sometimes not be considered academically novel enough to garner high impact publications for researchers. 
  4. Creation of sufficiently large datasets may be prohibitively expensive.  For example, experts estimate that the cost of recreating the Protein Data Bank would be $15 billion!   Science agencies may need to also explore the role that innovation in hardware or new techniques can play in reducing the cost and increasing the uniformity of the data, using, for example, automation, massive parallelism, miniaturization, and multiplexing.  A good example of this was NIH’s $1,000 Genome project, led by Jeffrey Schloss.

Q.  Why is close collaboration between experimental and computational teams necessary to take advantage of the role that AI can play in accelerating science?

According to Michael Frumkin with Google Accelerated Science, what is even more valuable than a static dataset is a data generation capability, with a good balance of latency, throughput, and flexibility.  That’s because researchers may not immediately identify the right “objective function” that will result in a useful model with real-world applications, or the most important problem to solve.  This requires iteration between experimental and computational teams.

Q.  What do you think is the broader opportunity to enable the digital transformation of science?

I think there are different tools and techniques that can be mixed and matched in a variety of ways that will collectively enable the digital transformation of science and engineering. Some examples include:

There are many opportunities at the intersection of these different scientific and technical building blocks.  For example, use of prior knowledge can sometimes reduce the amount of data that is needed to train a ML model.  Innovation in hardware could lower the time and cost of generating training data.  ML can predict the answer that a more computationally-intensive simulation might generate.  So there are undoubtedly opportunities to create a virtuous circle of innovation.

Q.  Are there any risks of the common task method?

Some researchers are pointing to negative sociological impacts associated with “SOTA-chasing” – e.g. a single-minded focus on generating a state-of-the-art result.  These include reducing the breadth of the type of research that is regarded as legitimate, too much competition and not enough cooperation, and overhyping AI/ML results with claims of “super-human” levels of performance.  Also, a researcher who makes a contribution to increasing the size and usefulness of the dataset may not get the same recognition as the researcher who gets a state-of-the-art result.

Some fields that have become overly dominated by incremental improvements in a metric have had to introduce Wild and Crazy Ideas as a separate track in their conferences to create a space for more speculative research directions.

Q.  Which types of science and engineering problems should be prioritized?

One benefit to the digital transformation of science and engineering is that it will accelerate the pace of discovery and technological advances.  This argues for picking problems where time is of the essence, including:

Obviously, it also has to be a problem where AI and ML can make a difference, e.g. ML’s ability to approximate a function that maps between an input and an output, or to lower the cost of making a prediction.

Q.  Why should economic policy-makers care about this as well?

One of the key drivers of the long-run increases in our standard of living is productivity (output per worker), and one source of productivity is what economists call general purpose technologies (GPTs).  These are technologies that have a pervasive impact on our economy and our society, such as interchangeable parts, the electric grid, the transistor, and the Internet.  

Historically, GPTs have required other complementary changes (e.g. organizational changes, changes in production processes and the nature of work) before their economic and societal benefits can be realized. The introduction of electricity eventually led to massive increases in manufacturing productivity, but not until factories and production lines were reorganized to take advantage of small electric motors. There are similar challenges for fostering the role that AI/ML and complementary technologies will play in accelerating the pace of scientific and technological advances:

Q.  Why is this an area where it might make sense to “unbundle” idea generation from execution?

Traditional funding mechanisms assume that the same individual or team who has an idea should always be the person who implements the idea.  I don’t think this is necessarily the case for datasets and CTMs.  A researcher may have a brilliant idea for a dataset, but may not be in a position to liberate the data (if it already exists), rally the community, and raise the funds needed to create the dataset.  There is still a value in getting researchers to submit and publish their ideas, because their proposal could be catalytic of a larger-scale effort.

Agencies could sponsor white paper competitions with a cash prize for the best ideas. [A good example of a white paper competition is MIT’s Climate Grand Challenge, which had a number of features which made it catalytic.]  Competitions could motivate researchers to answer questions such as:

The views and opinions expressed in this blog are the author’s own and do not necessarily reflect the view of Schmidt Futures.

The Magic Laptop Thought Experiment

One of the main goals of Kalil’s Corner is to share some of the things I’ve learned over the course of my career about policy entrepreneurship. Below is an FAQ on a thought experiment that I think is useful for policy entrepreneurs, and how the thought experiment is related to a concept I call “shared agency.”

Q.  What is your favorite thought experiment?

Imagine that you have a magic laptop. The power of the laptop is that any press release that you write will come true.

You have to write a headline (goal statement), several paragraphs to provide context, and 1-2 paragraph descriptions of who is agreeing to do what (in the form organization A takes action B to achieve goal C). The individuals or organizations could be federal agencies, the Congress, companies, philanthropists, investors, research universities, non-profits, skilled volunteers, etc. The constraint, however, is that it has to be plausible that the organizations would be both willing and able to take the action. For example, a for-profit company is not going to take actions that are directly contrary to the interests of their shareholders. 

What press release would you write, and why? What makes this a compelling idea?  

Q.  What was the variant of this that you used to ask people when you worked in the White House for President Obama?

You have a 15-minute meeting in the Oval Office with the President, and he asks:

“If you give me a good idea, I will call anyone on the planet.  It can be a conference call, so there can be more than one person on the line.  What’s your idea, and why are you excited about it?  In order to make your idea happen, who would I need to call and what would I need to ask them to do in order to make it happen?”

Q.  What was your motivation for posing this thought experiment to people?

I’ve been in roles where I can occasionally serve as a “force multiplier” for other people’s ideas. The best way to have a good idea is to be exposed to many ideas.

When I was in the White House, I would meet with a lot of people who would tell me that what they worked on was very important, and deserved greater attention from policy-makers.

But when I asked them what they wanted the Administration to consider doing, they didn’t always have a specific response.  Sometimes people would have the kernel of a good idea, but I would need to play “20 questions” with them to refine it. This thought experiment would occasionally help me elicit answers to basic questions like who, what, how and why.

Q.  Why does this thought experiment relate to the Hamming question?

Richard Hamming was a researcher at Bell Labs who used to ask his colleagues, “What are the most important problems in your field?  And what are you working on?” This would annoy some of his colleagues, because it forced them to confront the fact that they were working on something that they didn’t think was that important.

If you really did have a magic laptop or a meeting with the President, you would presumably use it to help solve a problem that you thought was important!

Q.  How does this thought experiment highlight the importance of coalition-building?

There are many instances where we have a goal that requires building a coalition of individuals and organizations.

It’s hard to do that if you can’t identify (1) the potential members of the coalition; and (2) the mutually reinforcing actions you would like them to consider taking.

Once you have a hypothesis about the members of your coalition of the willing and able, you can begin to ask and answer other key questions as well, such as:

Q.  Is this thought experiment only relevant to policy-makers?

Not at all. I think it is relevant for any goal that you are pursuing — especially ones that require concerted action by multiple individuals and organizations to accomplish.

Q.  What’s the relationship between this thought experiment and Bucky Fuller’s concept of a “trim tab?”

Fuller observed that a tiny device called a trim tab is designed to move a rudder, which in turn can move a giant ship like the Queen Elizabeth.

So, it’s incredibly useful to identify these leverage points that can help solve important problems.

For example, some environmental advocates have focused on the supply chains of large multinationals. If these companies source products that are more sustainable (e.g. cooking oils that are produced without requiring deforestation) – that can have a big impact on the environment.

Q.  What steps can people take to generate better answers to this thought experiment?

There are many things – like having a deep understanding of a particular problem, being exposed to both successful and unsuccessful efforts to solve important problems in many different domains, or understanding how particular organizations that you are trying to influence make decisions.

One that I’ve been interested in is the creation of a “toolkit” for solving problems. If, as opposed to having a hammer and looking for nails to hit, you also have a saw, a screwdriver, and a tape measure, you are more likely to have the right tool or combination of tools for the right job.

For example, during my tenure in the Obama Administration, my team and other people in the White House encouraged awareness and adoption of dozens of approaches to solving problems, such as:

Of course, ideally one would be familiar with the problem-solving tactics of different types of actors (companies, research universities, foundations, investors, civil society organization) and individuals with different functional or disciplinary expertise. No one is going to master all of these tools, but you might aspire to (1) know that they exist; (2) have some heuristics about when and under what circumstances you might use them; and (3) know how to learn more about a particular approach to solving problems that might be relevant. For example, I’ve identified a number of tactics that I’ve seen foundations and nonprofits use.

Q.  How does this thought experiment relate to the concept that psychologists call “agency?”

Agency is defined by psychologists like Albert Bandura as “the human capability to influence …the course of events by one’s actions.”

The particular dimension of agency that I have experienced is a sense that there are more aspects of the status quo that are potentially changeable as opposed to being fixed. These are the elements of the status quo that are attributable to human action or inaction, as opposed to the laws of physics.

Obviously, this sense of agency didn’t extend to every problem under the sun. It was limited to those areas where progress could be made by getting identifiable individuals and organizations to take some action – like the President signing an Executive Order or proposing a new budget initiative, the G20 agreeing to increase investment in a global public good, Congress passing a law, or a coalition of organizations like companies, foundations, nonprofits and universities working together to achieve a shared goal.

Q.  How did you develop a strong sense of agency over the course of your career?

I had the privilege of working at the White House for both Presidents Clinton and Obama.

As a White House staffer, I had the ability to send the President a decision memo. If he checked the box that said “yes” – and the idea actually happened and was well-implemented, this reinforced my sense of agency.

But it wasn’t just the experience of being successful. It was also the knowledge that one acquires by repeatedly trying to move from an idea to something happening in the world, such as:

Q.  What does it mean for you to have a shared sense of agency with another individual, a team, or a community?

Obviously, most people have not had 16 years of their professional life in which they could send a decision memo to the President, get a line in the President’s State of the Union address, work with Congress to pass legislation, create a new institution, shape the federal budget, and build large coalitions with hundreds of organizations that are taking mutually reinforcing actions in the pursuit of a shared goal.

So sometimes when I am talking to an individual, a team or a community, it will become clear to me that there is some aspect of the status quo that they view as fixed, and I view as potentially changeable. It might make sense for me to explain why I believe the status quo is changeable, and what are the steps we could take together in the service of achieving a shared goal.

Q.  Why is shared agency important?

Changing the status quo is hard. If I don’t know how to do it, or believe that I would be tilting at windmills – it’s unlikely that I would devote a lot of time and energy to trying to do so.

It may be the case that pushing for change will require a fair amount of work, such as:

So if I want people to devote time and energy to fleshing out an idea or doing some of the work needed to make it happen, I need to convince them that something constructive could plausibly happen. And one way to do that is to describe what success might look like, and discuss the actions that we would take in order to achieve our shared goal. As an economist might put it, I am trying to increase their “expected return” of pursuing a shared goal by increasing the likelihood that my collaborators attach to our success.

Q.  Are there risks associated with having this strong sense of agency, and how might one mitigate against those risks?

Yes, absolutely. One is a lack of appropriate epistemic humility, by pushing a proposed solution in the absence of reasonable evidence that it will work, or failing to identify unintended consequences. It’s useful to read books like James Scott’s Seeing Like a State.

I also like the idea of evidence-based policy. For example, governments should provide modest amounts of funding for new ideas, medium-sized grants to evaluate promising approaches, and large grants to scale interventions that have been rigorously evaluated and have a high benefit to cost ratio.

The views and opinions expressed in this blog are the author’s own and do not necessarily reflect the view of Schmidt Futures.

2022 Bioautomation Challenge: Investing in Automating Protein Engineering

Thomas Kalil, Chief Innovation Officer of Schmidt Futures, interviews biomedical engineer Erika DeBenedictis

Schmidt Futures is supporting an initiative – the 2022 Bioautomation Challenge – to accelerate the adoption of automation by leading researchers in protein engineering. The Federation of American Scientists will act as the fiscal sponsor for this challenge.

​This initiative was designed by Erika DeBenedictis, who will also serve as the program director. Erika holds a PhD in biological engineering from MIT, and has also worked in biochemist David Baker’s lab on machine learning for protein design ​​at the University of Washington in Seattle.  

​Recently, I caught up with Erika to understand why she’s excited about the opportunity to automate protein engineering.

Why is it important to encourage widespread use of automation in life science research?

Automation improves reproducibility and scalability of life science. Today, it is difficult to transfer experiments between labs. This slows progress in the entire field, both amongst academics and also from academia to industry. Automation allows new techniques to be shared frictionlessly, accelerating broader availability of new techniques. It also allows us to make better use of our scientific workforce. Widespread automation in life science would shift the time spent away from repetitive experiments and toward more creative, conceptual work, including designing experiments and carefully selecting the most important problems. 

How did you get interested in the role that automation can play in the life sciences?

​I started graduate school in biological engineering directly after working as a software engineer at Dropbox. I was shocked to learn that people use a drag-and-drop GUI to control laboratory automation rather than an actual programming language. It was clear to me that automation has the potential to massively accelerate life science research, and there’s a lot of low-hanging fruit. 

Why is this the right time to encourage the adoption of automation?

​The industrial revolution was 200 years ago, and yet people are still using hand pipettes. It’s insane! The hardware for doing life science robotically is quite mature at this point, and there are quite a few groups (Ginkgo, Strateos, Emerald Cloud Lab, Arctoris) that have automated robotic setups. Two barriers to widespread automation remain: the development of robust protocols that are well adapted to robotic execution and overcoming cultural and institutional inertia.

What role could automation play in generating the data we need for machine learning?  What are the limitations of today’s publicly available data sets?

​There’s plenty of life science datasets available online, but unfortunately most of it is unusable for machine learning purposes. Datasets collected by individual labs are usually too small, and combining datasets between labs, or even amongst different experimentalists, is often a nightmare. Today, when two different people run the ‘same’ experiment they will often get subtly different results. That’s a problem we need to systematically fix before we can collect big datasets. Automating and standardizing measurements is one promising strategy to address this challenge.

Why protein engineering?

​The success of AlphaFold has highlighted to everyone the value of using machine learning to understand molecular biology. Methods for machine-learning guided closed-loop protein engineering are increasingly well developed, and automation makes it that much easier for scientists to benefit from these techniques. Protein engineering also benefits from “robotic brute force.” When you engineer any protein, it is always valuable to test more variants, making this discipline uniquely benefit from automation. 

If it’s such a good idea, why haven’t academics done it in the past?

Growing Innovative Companies to Scale: A Listening Session with Startups in Critical Industries

On September 16th, 2021, the Day One Project convened a closed-door listening session for interagency government leaders to hear from co-founders and supply-chain leaders of 10 startups in critical industries — bioeconomy, cleantech, semiconductor — about challenges and opportunities to scale their operations and improve resilience in the United States. The panel was moderated by Elisabeth Reynolds, Special Assistant to the President for Manufacturing and Economic Development. The overarching theme is that for innovative companies in critical industries, the path of least resistance for scaling production is not in the United States — but it could be.

Unlike many startups that are purely software-based and can scale quickly with little capital expenditure, these companies produce physical products that require manufacturing expertise and more time and capital to grow to scale. Capital markets and government programs are often not well aligned with the needs of these companies, leaving the country at risk that many of the most cutting-edge technologies are invented here, but made elsewhere. Because there is a tight relationship between the learning-by-building phase of scale-up and innovation capacity, outsourcing production poses a threat to U.S. competitiveness. The country also risks losing the downstream quality manufacturing jobs that could stimulate economic growth in regions across the country.

Key Takeaways:

Challenges

There are significant challenges to taking advanced technology from earlier R&D phases to manufacturing products that demonstrate viability at scale. Available financing opportunities do not adequately support longer time horizons or larger capital requirements. A lack of manufacturing and engineering skills poses another barrier to scaling a product from prototype to pilot to commercial production. After many decades of disinvestment in the country’s manufacturing base, overcoming these challenges will be difficult but essential if we are to grow and benefit from our most innovative, emerging companies. As two of the bioeconomy startups stated:

“The USG knows how to fund research and purchase finished products. There is not enough money, and far more problematically, not nearly enough skilled Sherpas to fill the gap in between.”

“Manufacturing … has been considered as a ‘cost center,’ … reducing cost of manufacturing (e.g., moving manufacturing sites offshore) is one of the major themes … Rarely there are investments or financing opportunities coming to the sector to develop new technologies that can drive innovation … the types of investment are usually very large (e.g., capex for building a manufacturing plant). As a result, it has been very hard for startups which dedicate themselves to novel, next generation manufacturing technologies to raise or secure sufficient funding.”

During the conversation, three specific challenges were identified that speak to key factors that contribute to this manufacturing gap in the United States:

1) Overseas Government Incentives and Manufacturing Ecosystems

The startups largely agreed that overseas governments provide more incentives to manufacture than the United States. Often, these countries have developed “manufacturing-led” ecosystems of private companies and other institutions that can reliably deliver critical inputs, whether as part of their value chain, or in terms of their broader development needs. Some examples from the companies include:

2) Shortcomings with Existing Federal Programs and Funding

The U.S. government has a wide range of programs that focus on supporting innovation and manufacturing. However, these programs are either targeted at earlier stages of R&D rather than at manufacturing scale-up, are relatively small in scope, or involve time-consuming and complicated processes to access.

3) Supply Chain Gaps and Opportunities for Sustainable Manufacturing in the U.S.

A few specific instances were described where the United States lacks access to critical inputs for bioeconomy and quantum development, as key suppliers are located abroad. However, as these emerging fields develop, critical inputs will change and present an opportunity to course correct. Therefore, improving our domestic manufacturing base now is vital for driving demand and establishing innovation ecosystems for industries of the future.

Solutions

Startups commented on the importance of expanding funding opportunities, such as co-investment and tax credit solutions, as well as key process and regulatory changes. Most importantly, startups highlighted the importance of demand-pull mechanisms to help commercialize new technologies and create new markets.

1) Additional Government Financing Mechanisms

Several companies commented on the need to provide additional financing to support manufacturers, as equipment is often too expensive for venture funding and other forms of capital are not readily available. These solutions include expanding government co-investment and leveraging tax credits.

2) Improving Government Processes and Regulations

A few of the startups identified specific government processes or regulations that could be improved upon, such as application times for funding in energy sectors or restrictions in procurement or foreign acquisitions.

3) Government Demand-Pull Incentives

Most, if not all, startups felt that the best role for the government is in creating demand-pull incentives to support the development of technology from basic science to commercialization and help create new markets for leading-edge products. This can range from procurement contracts to new regulatory standards and requirements that can incentivize higher-quality, domestic production.

Conclusion

These anecdotes provide a small window into some of the challenges startups face scaling their innovative technologies in the United States. Fixing our scale-up ecosystem to support more investment in the later-stage manufacturing and growth of these companies is essential for U.S. leadership in emerging technologies and industries. The fixes are many — large and small, financial and regulatory, product- and process-oriented — but now is a moment of opportunity to change course from the past several decades. By addressing these challenges, the United States can build the next generation of U.S.-based advanced manufacturing companies that create good-quality, middle-skill jobs in regions across the country. The Biden-Harris Administration has outlined a new industrial strategy that seeks to realize this vision and ensure U.S. global technological and economic leadership, but its success will require informing policy efforts with on-the-ground perspectives from small- and medium-sized private enterprises.

Session Readout: Rebuilding American Manufacturing

Our roundtable brought together senior leadership from the White House National Economic Council and the U.S. Department of Health and Human Services, along with a diversity of viewpoints across political ideologies from Breakthrough Energy, American Compass, MIT’s The Engine, and Employ America. Participants discussed competing with China in advanced manufacturing sectors (bioeconomy, semiconductors, quantum, etc.), supply chain resilience, and new visions for industrial policy that can stimulate regional development. This document contains a summary of the event.

Topic Introduction: Advanced Manufacturing & U.S. Competitiveness

The session began with an introduction by Bill Bonvillian (MIT), who shared a series of reflections, challenges, and solutions to rebuilding American manufacturing:

Advanced manufacturing and supply chain resilience are two sides of the same coin. The pandemic awoke us to our overdependence on foreign supply chains. Unless we build a robust domestic manufacturing system, our supply chains will crumble. American competitiveness therefore depends on how well we can apply our innovation capabilities to the historically underfunded advanced manufacturing ecosystem. Other nations are pouring tremendous amounts of resources into manufacturing because they recognize its importance for the overall innovation cycle. To rebuild American manufacturing, an ecosystem is needed—private sector, educational institutions, and government—to create an effective regional workforce and advanced manufacturing technology pipeline.

Panel 1: Framing the Challenge and Identifying Key Misconceptions

Our first panel hosted Arnab Datta (Employ America), Chris Griswold (American Compass), and Abigail Regitsky (Breakthrough Energy). The questions and responses are summarized below:

What would you say are some misconceptions that have posed obstacles to finding consensus on an industrial policy for advanced manufacturing?

Chris Griswold: The largest misconception is the ideological view that industrial policy is central planning, with the government picking winners and losers—that it is un-American. That’s simply not true. From Alexander Hamilton, to Henry Clay and Abraham Lincoln, and through the post-war period and America’s technological rise, the American way has involved a rich public-private sector innovation ecosystem. Recent decades of libertarian economics have weakened supply chains and permitted the flight of industry from American communities.

Arnab Datta: People like to say that market choices have forced manufacturing overseas, but really it’s been about policy. Germany has maintained a world-class manufacturing base with high wages and regulations. We have underrated two important factors in sustaining a high-quality American manufacturing ecosystem: financing and aggregate demand. Manufacturing financing is cash-flow intensive, making asset-light strategies difficult. And when you see scarce aggregate demand, you see a cost-cutting mentality that leads to things like consolidation and offshoring. We only need to look back to the once-booming semiconductor industry that lost its edge. Our competitors are making the policy choices necessary to grow and develop strategically; we should do the same.

Abigail Regitsky: For climate and clean energy, startups see the benefit of developing and manufacturing in the United States—so a large misconception is that startups do not want to produce domestically. The larger issue is that they do not have the financing support to develop domestic supply chains. We need to ensure there is a market for these technologies and that there is financing available to access them.

With the recently introduced bill for an Industrial Finance Corporation from Senator Coons’ Office, what would you say are the unique benefits of using government corporations and why should the general public care? And how might something like this stimulate job and economic growth regionally?

Arnab Datta: The unique benefits of a government corporation are two-fold: flexibility in affordability and in financing. In some of our most difficult times, government entities were empowered with a range of abilities to accomplish important goals. During the Great Depression and World War II, the Reconstruction Finance Corporation was necessary to ramp up wartime investment through loans, purchase guarantees, and other methods. America has faced difficult challenges, but government corporations have been a bright spot in overcoming these challenges. We face big challenges now. The Industrial Finance Corporation (IFC) bill arrives at a similar moment, granting the government the authority to tackle big problems related to industrial competition—national security, climate change, etc. We need a flexible entity, and the public should care because they are taking risks in this competition with their tax dollars. They should be able to have a stake in the product, and the IFC’s equity investments and other methods provide that. It will also help with job growth across regions. Currently, we are witnessing rising capital expenditures to a degree not seen for a very long time. We were told manufacturing jobs would never come back, but the demand is there. Creating an institution that establishes permanence for job growth in manufacturing should not be an exception but a norm.

Abigail Regitsky: We need a political coalition to get the policies in place to support the clean energy agenda. An IFC could support a factory that leverages hydrogen in a green way, or something even more nascent. These moves require a lot of capital, but we can create a lot of economic returns and jobs if we see the long-term linkage and support it.

What would you say might be the conservative case for industrial policy for advanced manufacturing? And in what specific aspects of the advanced manufacturing ecosystem do you see opportunities and needs?

Chris Griswold: It’s the same as the general case—it’s a common sense, good idea. Fortunately, there is much more consensus on this now than there was just a few years ago. Some specific arguments that should appeal to both sides include:

  1. The national security imperative to bolster our currently vulnerable supply chain and industrial base.
  2. Having national economic resiliency to keep up with competitors. It’s almost unanimous at this point that it will be difficult to compete without an effective advanced manufacturing sector and resilient supply chain. Offshoring all of our capacity has diminished our know-how and degraded our ability to innovate ourselves back out of this situation. We can’t just flip the innovation switch back on—it takes time to get our manufacturing ecosystem up to speed with the pace of technological progress.
  3. Deindustrialization has hurt working communities and created regional inequality. It has made not just our country weaker in general, but it has harmed many specific working-class local communities. Working class people without a college degree have been hit the hardest. Working class communities of color have been harmed in unique ways. At the heart of these large discussions is a moral imperative about workers and their families. They matter. We must do more to support local economies, which means caring about the composition of those economies.

Abigail Regitsky: It’s the idea of losing the “know-how” or “learning-by-building” phase of innovation. This is crucial for developing solutions to solve climate change. With climate, time is of the essence; when you are able to tie manufacturing to the innovation process, it fosters a faster scale up of new technology. We need the manufacturing know-how to scale up emerging technologies and reduce emissions to zero by mid-century.

Panel 2: Ideas for Action

Our second panel hosted Dr. Elisabeth Reynolds (WHNEC), Joseph Hamel (ASPR), and Katie Rae (MIT’s The Engine). The questions and responses are summarized below:

In the last panel, we heard from a variety of perspectives on this deep and comprehensive issue. What are a few priorities you have for improving the manufacturing base?

Elisabeth Reynolds: The last panel presented the imperative and opportunity of today’s moment perfectly. The administration is working to reframe the nation’s thoughts on industrial policy. All of those problems outlined existed before the pandemic. What we’re addressing now is a new commitment and understanding that this is not just about national security—it’s about economic security. We don’t need to build and make everything here, but we need to build and make a lot here, from commodities to next-gen technology. We have to support small and medium-sized businesses. The administration’s plans complement the Industrial Finance Corporation bill and the initiatives included in it. There is a real effort to include and support communities, schools, and people who have not been included. We’re focusing on the regional level—we are aiming to have workforce training at the regional level to build a pipeline for the next generation of workers in manufacturing. Another critical component is the climate agenda, which manufacturing facilities should be able to leverage through demonstration funding, tax credits, and procurement, especially the latter, with the government acting as a buyer. Finally, each of these issues must be approached through an equity lens, in terms of geographic, racial, small vs. big business, and more. We need to create a level playing field; that is where America will thrive.

President Biden recently issued Executive Order 14017, directing the U.S. government to undertake a comprehensive review of six critical U.S. industrial base sectors. ASPR is the lead for the public health and biological preparedness industrial base review. What can you tell us about these efforts to date?

Joseph Hamel: These efforts are focused on furthering the relationships and leveraging the partnerships that were formed during the pandemic response, from the Food and Drug Administration to the Defense Advanced Research Projects Agency and the National Institute of Standards and Technology; it is important to explore the right level of coordination. We are conducting a review of essential medicines to identify the most critical and relevant, then exploring potential threats and ways to invest in and improve the supply chain for these drugs. We’re bringing in clinicians, manufacturers, and distributor partners to ask questions like “What is the most vulnerable item in our global supply chain, and how can we act on it?” We’re also establishing an innovation laboratory with FDA to evaluate a wide array of products that are subject to shortage and geographic production dependencies. We are also investigating overlooked capacities for the assembly of these products and leveraging opportunities inside and outside of government so manufacturers can realize new capabilities at scale. We need a more resilient global supply chain, as the pandemic demonstrated. And we have to think about doing this at lower cost and with a smaller environmental footprint so that we can become competitive inside a larger ecosystem.

A few weeks ago, the Day One Project held a listening session with several startups in the cleantech, semiconductor, and bioeconomy industries, where we heard that governments overseas provide more incentives to manufacture there than in the United States, from subsidies to more readily available tools. What is the most important way to make it easier for small and medium-sized companies to manufacture in the United States?

Katie Rae: The Engine was founded to go after the world’s biggest problems. Advanced manufacturing is one of them—ensuring foundational industries are built here. This collides with everything, including our supply chains. The impact is not theoretical—how do we get vaccines to everyone? There’s been a lot of innovation, but our current system didn’t want to support the ideas because they were out of favor for investment. We had the ideas, but we didn’t have the financing; this was a market failure. We need funding to bring these ideas to life. When startups begin scaling, they need capital to do so. It is not inherently provided by the private market, so governments are not picking winners and losers but rather ensuring that money goes to a list of potential winners.

Elisabeth Reynolds: The comments about the financing gap are exactly right. We have less support for the scale-up of cutting-edge technologies at their later stage of development. We need more time and capital to get these ideas there. Katie’s team is focused on finding this capital and supporting commercialization into government. We also have a growing shift in the mindset of the country—the first thought has been to take manufacturing offshore, but the equalization of costs is bringing some of this production back to our shores.

If you were to ask the audience to work on a specific domain, what would you challenge them to do?

Elisabeth Reynolds: We should build on the positive programs we have; Joe’s is a great example. We also can’t forget about the examples of work outside of government. We innovate well across a wide range of places, and the government needs to be a partner in supporting this.

Katie Rae: Loan guarantee programs in procurement are a must-have. Other governments will do it, and our companies will relocate their headquarters there.

Joseph Hamel: Furthering investments in platform technology development. We need to leverage what is growing as a bioeconomy initiative and use these applications to create end products that we never thought were achievable. We should explore material science applications and innovation in quality by design, up front.

Interview with Erika DeBenedictis

2022 Bioautomation Challenge: Investing in Automating Protein Engineering
Thomas Kalil, Chief Innovation Officer of Schmidt Futures, interviews biomedical engineer Erika DeBenedictis

Schmidt Futures is supporting an initiative – the 2022 Bioautomation Challenge – to accelerate the adoption of automation by leading researchers in protein engineering. The Federation of American Scientists will act as the fiscal sponsor for this challenge.

This initiative was designed by Erika DeBenedictis, who will also serve as the program director. Erika holds a PhD in biological engineering from MIT and has also worked in biochemist David Baker’s lab on machine learning for protein design at the University of Washington in Seattle.

​Recently, I caught up with Erika to understand why she’s excited about the opportunity to automate protein engineering.

Why is it important to encourage widespread use of automation in life science research?

Automation improves the reproducibility and scalability of life science research. Today, it is difficult to transfer experiments between labs. This slows progress in the entire field, both amongst academics and from academia to industry. Automation allows new techniques to be shared frictionlessly, accelerating their broader availability. It also allows us to make better use of our scientific workforce. Widespread automation in life science would shift time away from repetitive experiments and toward more creative, conceptual work, including designing experiments and carefully selecting the most important problems.

How did you get interested in the role that automation can play in the life sciences?

​I started graduate school in biological engineering directly after working as a software engineer at Dropbox. I was shocked to learn that people use a drag-and-drop GUI to control laboratory automation rather than an actual programming language. It was clear to me that automation has the potential to massively accelerate life science research, and there’s a lot of low-hanging fruit. 

Why is this the right time to encourage the adoption of automation?

​The industrial revolution was 200 years ago, and yet people are still using hand pipettes. It’s insane! The hardware for doing life science robotically is quite mature at this point, and there are quite a few groups (Ginkgo, Strateos, Emerald Cloud Lab, Arctoris) that have automated robotic setups. Two barriers to widespread automation remain: the development of robust protocols that are well adapted to robotic execution and overcoming cultural and institutional inertia.

What role could automation play in generating the data we need for machine learning?  What are the limitations of today’s publicly available data sets?

There are plenty of life science datasets available online, but unfortunately most of them are unusable for machine learning purposes. Datasets collected by individual labs are usually too small, and combining datasets between labs, or even amongst different experimentalists, is often a nightmare. Today, when two different people run the ‘same’ experiment, they will often get subtly different results. That’s a problem we need to systematically fix before we can collect big datasets. Automating and standardizing measurements is one promising strategy to address this challenge.

Why protein engineering?

​The success of AlphaFold has highlighted to everyone the value of using machine learning to understand molecular biology. Methods for machine-learning guided closed-loop protein engineering are increasingly well developed, and automation makes it that much easier for scientists to benefit from these techniques. Protein engineering also benefits from “robotic brute force.” When you engineer any protein, it is always valuable to test more variants, making this discipline uniquely benefit from automation. 

If it’s such a good idea, why haven’t academics done it in the past?

Cost and risk are the main barriers. What sort of methods are valuable to automate and run remotely? Will automation be as valuable as expected? It’s a totally different research paradigm; what will it be like? Even assuming that an academic wants to go ahead and spend $300k for a year of access to a cloud laboratory, it is difficult to find a funding source. Very few labs have enough discretionary funds to cover this cost, equipment grants are unlikely to pay for cloud lab access, and it is not obvious whether the NIH or other traditional funders would look favorably on this sort of expense in the budget for an R01 or equivalent. Additionally, it is difficult to seek out funding without already having data demonstrating the utility of automation for a particular application. Altogether, there are just a lot of barriers to entry.

You’re starting this new program called the 2022 Bioautomation Challenge. How does the program eliminate those barriers?

​This program is designed to allow academic labs to test out automation with little risk and at no cost. Groups are invited to submit proposals for methods they would like to automate. Selected proposals will be granted three months of cloud lab development time, plus a generous reagent budget. Groups that successfully automate their method will also be given transition funding so that they can continue to use their cloud lab method while applying for grants with their brand-new preliminary data. This way, labs don’t need to put in any money up-front, and are able to decide whether they like the workflow and results of automation before finding long-term funding.

Historically, some investments that have been made in automation have been disappointing, like GM in the 1980s, or Tesla in the 2010s. What can we learn from the experiences of other industries? Are there any risks?

​For sure. I would say even “life science in the 2010s” is an example of disappointing automation: academic labs started buying automation robots, but it didn’t end up being the right paradigm to see the benefits. I see the 2022 Bioautomation Challenge as an experiment itself: we’re going to empower labs across the country to test out many different use cases for cloud labs to see what works and what doesn’t.

Where will funding for cloud lab access come from in the future?

​Currently there’s a question as to whether traditional funding sources like the NIH would look favorably on cloud lab access in a budget. One of the goals of this program is to demonstrate the benefits of cloud science, which I hope will encourage traditional funders to support this research paradigm. In addition, the natural place to house cloud lab access in the academic ecosystem is at the university level. I expect that many universities may create cloud lab access programs, or upgrade their existing core facilities into cloud labs. In fact, it’s already happening: Carnegie Mellon recently announced they’re opening a local robotic facility that runs Emerald Cloud Lab’s software.

What role will biofabs and core facilities play?

​In 10 years, I think the terms “biofab,” “core facility,” and “cloud lab” will all be synonymous. Today the only important difference is how experiments are specified: many core facilities still take orders through bespoke Google forms, whereas Emerald Cloud Lab has figured out how to expose a single programming interface for all their instruments. We’re implementing this program at Emerald because it’s important that all the labs that participate can talk to one another and share protocols, rather than each developing methods that can only run in their local biofab. Eventually, I think we’ll see standardization, and all the facilities will be capable of running any protocol for which they have the necessary instruments.

In addition to protein engineering, are there other areas in the life sciences that would benefit from cloud labs and large-scale, reliable data collection for machine learning?

I think there are many areas that would benefit. Areas that struggle with reproducibility, are manually repetitive and time-intensive, or that benefit from closely integrating computational analysis with data are all good targets for automation. Microscopy and mammalian tissue culture might be two other candidates. But there’s a lot of intellectual work for the community to do in order to articulate problems that can be solved with machine learning approaches, if given the opportunity to collect the data.

An interview with Martin Borch Jensen, Co-founder of Gordian Biotechnology

Recently, I caught up with Martin Borch Jensen, the Chief Science Officer of the biotech company Gordian Biotechnology.  Gordian is a therapeutics company focused on the diseases of aging.

Martin did his Ph.D. in the biology of aging and received a prestigious NIH award to jumpstart an academic career, but decided to return the grant to launch Gordian. Recently, he designed and launched a $26 million competition called Longevity Impetus Grants. This program has already funded 98 grants to help scientists address what they consider to be the most important problems in aging biology (also known as geroscience). There is a growing body of research that suggests there are underlying biological mechanisms of aging, and that it may be possible to delay the onset of multiple chronic diseases of aging, allowing people to live longer, healthier lives.

​I interviewed Martin not only because I think that the field of geroscience is important, but also because I think the role that Martin is playing has significant benefits for science and society, and should be replicated in other fields.  With this work, essentially, you could say that Martin is serving as a strategist for the field of geroscience as a whole, and designing a process for the competitive, merit-based allocation of funding that philanthropists such as Juan Benet, James Fickel, Jed McCaleb, Karl Pfleger, Fred Ehrsam, and Vitalik Buterin have confidence in, and have been willing to support. Martin’s role has a number of potential benefits:

Below is a copy of the Q&A conducted over email between me and Martin Borch Jensen.​

Tom Kalil:  What motivated you to launch Impetus grants?

​Martin Borch Jensen: Hearing Patrick Collison describe the outcomes of the COVID-19 Fast Grants. Coming from the world of NIH funding, it seemed to me that the results of this super-fast program were very similar to the year-ish cycle of applying for and receiving a grant from the NIH. If the paperwork and delays could be greatly reduced, while supporting an underfunded field, that seemed unambiguously good.

My time in academia had also taught me that a number of ideas with great potential impact fall outside of the most common topics or viewpoints and thus have trouble getting funding. And within aging biology, several ‘unfundable’ ideas turned out to shape the field (for example, DNA methylation ‘clocks’, rejuvenating factors in young blood, and the recent focus on partial epigenetic reprogramming). So what if we focused funding on ideas with the potential to shape thinking in the field, even if there’s a big risk that the idea is wrong? Averaged across a lot of projects, it seemed like that could result in more progress overall.

​TK:  What enabled you to do this, given that you also have a full-time job as CSO of Gordian? 

​MBJ: I was lucky (or prescient?) in having recently started a mentoring program for talented individuals who want to enter the field of aging biology. This Longevity Apprenticeship program is centered on contributing to real-life projects, so Impetus was a perfect fit. The first apprentices, mainly Lada Nuzhna and Kush Sharma, with some help from Edmar Ferreira and Tara Mei, helped set up a non-profit to host the program, designed the website and user interface for reviewers, communicated with universities, and did a ton of operational work. 

​TK:  What are some of the most important design decisions you made with respect to the competition, and how did it shape the outcome of the competition?

​MBJ: A big one was to remain blind to the applicant while evaluating the impact of the idea. The reviewer discussion was very much focused on ‘will this change things, if true’. We don’t have a counterfactual, but based on the number of awards that went to grad students and postdocs (almost a quarter) I think we made decisions differently than most funders. 

Another innovation was to team up with one of the top geroscience journals to organize a special issue where Impetus awardees would be able to publish negative results – the experiments showing that their hypothesis is incorrect. In doing so, we both wanted to empower researchers to take risks and go for their boldest ideas (since you’re expected to publish steadily, risky projects are disincentivized for career reasons), and at the same time take a step towards more sharing of negative results so that the whole field can learn from every project. 

​TK:  What are some possible future directions for Impetus?  What advice do you have for philanthropists that are interested in supporting geroscience?

​MBJ: I’m excited that Lada (one of the Apprentices) will be taking over to run the Impetus Grants as a recurring funding source. She’s already started fundraising, and we have a lot of ideas for focused topics to support (for example, biomarkers of aging that could be used in clinical trials). We’re also planning a symposium where the awardees can meet, to foster a community of people with bold ideas and different areas of expertise.

One thing that I think could greatly benefit the geroscience field is to fund more tools and methods development, including and especially by people who aren’t pureblooded ‘aging biologists’. Our field is very limited in what we’re able to measure within aging organisms, as well as in measuring the relationships between different areas of aging biology. Determining causal relationships between two mechanisms, e.g. DNA damage and senescence, requires an extensive study when we can’t simultaneously measure both with high time resolution. And tool-building is not a common focus within geroscience. So I think there’d be great benefit to steering talented researchers who are focused on that towards applications in geroscience. If done early in their careers, this could also serve to pull people into a long-term focus on geroscience, which would be a terrific return on investment. The main challenges to this approach are to make sure the people are sincerely interested in aging biology (or at least properly incentivized to solve important problems there), and that they’re solving real problems for the field. The latter might be accomplished by pairing them up with geroscience labs.

​TK:  If you were going to try to find other people who could play a similar role for another scientific field, what would you look for?

​MBJ: I think the hardest part of making Impetus go well was finding the right reviewers. You want people who are knowledgeable, but open to embracing new ideas. Optimistic, but also critical. And not biased towards their own, or their friends’, research topics. So first, look for a ringleader who possesses these traits, and who has spent a long time in the field so that they know the tendencies and reputations of other researchers. In my case, I spent a long time in academia but have now jumped to startups, so I no longer have a dog in the fight. I think this might well be a benefit for avoiding bias.

​TK:  What have you learned from the process that you think is important for both philanthropists considering this model and scientists that might want to lead an initiative in their field?

MBJ: One thing is that there’s room to improve the basic user interface of how reviews are done. We designed a UI based on what I would have wanted while reviewing papers and grants. Multiple reviewers wrote to us unprompted that this was the smoothest experience they’d had. And we only spent a few weeks building this. So I’d say, it’s worth putting a bit of effort into making things work smoothly at each step.

​As noted above, getting the right reviewers is key. Our process ran smoothly in large part because the reviewers were all aligned on wanting projects that move the needle, and not biased towards specific topics.

​But the most important thing we learned, or validated, is that this rapid model works just fine. We’ll see how things work out, but I think that it is highly likely that Impetus will support more breakthroughs than the same amount of money distributed through a traditional mechanism, although there may be more failures.  I think that’s a tradeoff that philanthropists should be willing to embrace.

​TK:  What other novel approaches to funding and organizing research should we be considering?

​MBJ: Hmmm, that’s a tough one. So many interesting experiments are happening already.

​One idea we’ve been throwing around in the Longevity Apprenticeship is ‘Impetus for clinical trials’. Fast Grants funded several trials of off-patent drugs, and at least one (fluvoxamine) now looks very promising. Impetus funded some trials as well, but within geroscience in particular, there are several compounds with enough evidence that human trials are warranted, but which are off-patent and thus unlikely to be pursued by biopharma.

​One challenge for ‘alternative funding sources’ is that most work is still funded by the NIH. So there has to be a possibility of continuity of research funded by the two mechanisms. Given the amount of funding we had for Impetus (4-7% of the NIA’s budget for basic aging biology), what we had in mind was funding bold ideas to the point where sufficient proof of concept data could be collected so that the NIH would be willing to provide additional funding. Whatever you do, keeping in mind how the projects will garner continued support is important.

Biden, You Should Be Aware That Your Submarine Deal Has Costs

For more than a decade, Washington has struggled to prioritize what it calls great power competition with China — a contest for military and political dominance. President Biden has been working hard to make the pivot to Asia that his two predecessors never quite managed.

The landmark defense pact with Australia and Britain, AUKUS, that Mr. Biden announced this month is a major step to making that pivot a reality. Under the agreement, Australia will explore hosting U.S. bombers on its territory, gain access to advanced missiles and receive nuclear propulsion technology to power a new fleet of submarines.

Read the full op-ed at the New York Times. 

Why and How Faculty Should Participate in U.S. Policy Making

If the U.S. Congress is to produce sound policies that benefit the public good, science and technology faculty members must become active participants in the American policy-making process. One key element of that process is congressional hearings: public forums where members of Congress question witnesses, learn about pressing issues, develop policy initiatives and conduct oversight of both the executive branch and corporate practices.

Faculty in science and technology should contribute to congressional hearings because: 1) legislators should use data and scientifically derived knowledge to guide policy development, 2) deep expertise is needed to support effective oversight of complex issues like the spread of misinformation on internet platforms or pandemic response, and 3) members of Congress are decision makers on major issues that impact the science and technology community, such as research funding priorities or the role of foreign nationals in the research enterprise. A compelling moment during a hearing can have a profound impact on public policy, and faculty members can help make those moments happen.

Read the full article at Inside Higher Ed.