
DNA for Data Storage and Retrieval

07.21.21 | Text by Tricia White

Improving DNA data storage could be the key to reducing the burden on traditional data centers

Data – such as photos, videos, and social media posts – are being generated at a rapidly accelerating pace that exceeds the scaling limits of traditional silicon-based data storage technologies, and DNA could be deployed to help meet this challenge. As an indication of how much data storage may be required, one model predicts that by 2030, electricity use by data centers could approach eight percent of total global electricity demand. New paradigms for data storage, such as the use of DNA for preserving information, are necessary.

DNA is the genetic material that carries the instructions for building living things, but it can also be used to store data created by living things. DNA is an attractive material for data storage: it is stable, writable, readable, and information dense. In theory, all of the world’s data could be stored in a coffee-mug-sized portion of DNA.

So how does storing a video, for example, in DNA work? (See Figure 1.) First, an algorithm is used to encode the video into the As, Ts, Cs, and Gs that make up DNA molecules. The DNA molecules are then synthesized and stored. To access the data, the DNA molecules are sequenced, and the resulting sequences are translated back using the same algorithm, reproducing the video.
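The encode-and-decode round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not the scheme used by any particular research group: the two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names encode and decode are assumptions made here for clarity. Practical encoders typically also avoid long runs of a single base and add redundancy for error correction, which this sketch omits.

```python
# Hypothetical mapping chosen for illustration: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate raw bytes (1s and 0s) into a string of As, Cs, Gs, and Ts."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a sequenced strand back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # the sequence that would be synthesized
    print(decode(strand))  # the data recovered after sequencing
```

Because each base carries two bits under this assumed mapping, one byte of data becomes four bases, so a two-byte message yields an eight-base strand.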

Figure 1.

Data storage and retrieval in DNA. First, data – like those stored on a computer hard drive – are processed by an algorithm that translates 1s and 0s into DNA sequences made up of As, Ts, Cs, and Gs. DNA strands with those sequences are then synthesized – or written – and stored either in living cells (in vivo) or in the test tube (in vitro). Data can be retrieved from storage in part by using PCR – the same technology deployed to test for the coronavirus that causes COVID-19 – to selectively target specific data packages. The PCR products can be read with DNA sequencing instruments, providing the original DNA sequences, and reproducing the data. Figure adapted from Ceze, Nivala, and Strauss 2019, Nature Reviews Genetics.
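The selective retrieval step in Figure 1 can be pictured as address-based lookup. The Python sketch below is a loose software analogy under stated assumptions: each stored strand is assumed to begin with a short primer sequence that identifies the data package it belongs to, and the names build_pool and retrieve, along with the example primer sequences, are hypothetical. Real PCR relies on a pair of primers binding to the ends of a strand and on chemical amplification, none of which is modeled here.

```python
from typing import Dict, List


def build_pool(packages: Dict[str, List[str]]) -> List[str]:
    """Tag every payload strand with its package's primer and mix them into one pool."""
    return [
        primer + payload
        for primer, payloads in packages.items()
        for payload in payloads
    ]


def retrieve(pool: List[str], primer: str) -> List[str]:
    """Return the payloads of only those strands that carry the requested primer."""
    return [
        strand[len(primer):]
        for strand in pool
        if strand.startswith(primer)
    ]


if __name__ == "__main__":
    # Hypothetical primers acting as addresses for two data packages.
    pool = build_pool({
        "ACGTAC": ["CAGA", "CGGC"],
        "TGCATG": ["TTTT"],
    })
    print(retrieve(pool, "ACGTAC"))  # payloads from the first package only
```

The point of the analogy is that the whole pool does not need to be read to recover one package; only strands matching the chosen primer are passed on to sequencing.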

DNA is a polymer – a substance consisting of a large number of similar building blocks that are linked together – and other polymers can be used to store information, too. For example, plastic polymers are being explored for information-storage applications; one group synthesized a plastic polymer that, when read out, reproduced a quote by Jane Austen. Two lines of effort could help the U.S. science and technology enterprise devise a polymer-based method for rapid data storage and retrieval and meet the data storage challenge. The first is expanded experimental development aimed at (i) increasing the rates at which DNA can be synthesized and sequenced and (ii) detecting and correcting errors in DNA synthesis. The second is fundamental research into data storage across a variety of polymers.

This CSPI Science and Technology Policy Snapshot expands upon a scientific exchange between Congressman Bill Foster (D, IL-11) and his new FAS-organized Science Council.