Scientists have discovered how to fit the maximum amount of data in a single nucleotide. Humans create a lot of digital data. And figuring out the best way to store it is a challenge. Well, researchers think they may have started to solve that problem, by figuring out an efficient way to store digital data: on DNA. Researchers report that they’ve come up with a…way to encode digital data in DNA to create the highest-density large-scale data storage scheme ever invented…the system could…store every bit of datum ever recorded by humans in a container about the size and weight of a couple of pickup trucks.
There are several advantages to using DNA. It’s a lot smaller than traditional media: a single gram can fit 215,000 times more data than a one-terabyte hard drive, The Atlantic notes. It’s also incredibly durable. Scientists are using DNA thousands of years old to de-extinct woolly mammoths, for example. But, until now, they’ve only unlocked a fraction of its storage capacity. Study coauthors Yaniv Erlich and Dina Zielinski were able to fit the theoretical maximum amount of information per nucleotide using a new method inspired by how movies stream across the internet.
Humanity has a data storage problem: More data were created in the past 2 years than in all of preceding history. And that torrent of information may soon outstrip the ability of hard drives to capture it. Now, researchers report that they’ve come up with a new way to encode digital data in DNA to create the highest-density large-scale data storage scheme ever invented. Capable of storing 215 petabytes (215 million gigabytes) in a single gram of DNA, the system could, in principle, store every bit of datum ever recorded by humans in a container about the size and weight of a couple of pickup trucks. But whether the technology takes off may depend on its cost.
Scientists have been storing digital data in DNA since 2012. That was when Harvard University geneticists George Church, Sri Kosuri, and colleagues encoded a 52,000-word book in thousands of snippets of DNA, using strands of DNA’s four-letter alphabet of A, G, T, and C to encode the 0s and 1s of the digitized file. Their particular encoding scheme was relatively inefficient, however, and could store only 1.28 petabytes per gram of DNA. Other approaches have done better. But none has been able to store more than half of what researchers think DNA can actually handle, about 1.8 bits of data per nucleotide of DNA. (The number isn’t 2 bits because of rare, but inevitable, DNA writing and reading errors.)
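To make the idea of encoding 0s and 1s in DNA's four-letter alphabet concrete, here is a minimal Python sketch. The mapping of two bits to each base is an assumption chosen for illustration; it is not the scheme used by the Harvard team or by later approaches, and it ignores the writing and reading errors that keep practical capacity below 2 bits per nucleotide.

```python
# Hypothetical mapping: each pair of bits becomes one nucleotide.
# Real encoding schemes differ and add error handling.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotides, 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"DNA")
print(strand)          # 12 bases for 3 bytes
print(decode(strand))  # round-trips to the original bytes
```

Under this toy mapping every byte becomes four bases, so a file's strand length is simply four times its size in bytes.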
DNA is made of strands of molecules known as nucleotides: adenine, thymine, cytosine and guanine, abbreviated A, T, C and G. Just as patterns of ink can represent letters of the alphabet, sequences of nucleotides can be used to encode data. As genetic analyses of woolly mammoth and Neanderthal fossils have revealed, DNA can remain stable for millennia, unlike, say, magnetic tape, which can degrade within a decade. DNA is also compact and does not require any power for storage, so keeping and shipping it could prove relatively easy. Previous attempts at encoding data in strands of DNA only reached about half of DNA storage’s theoretical maximum capacity. In addition, prior work often experienced small gaps in retrieved data because of errors introduced during DNA synthesis. But Erlich took a cue from the entertainment section of the newspaper in developing DNA Fountain.
Dr. Spike Narayan, the Director of Science and Technology at IBM Research, said, “The technology to read and write DNA is already available today but it’s not necessarily accessible. DNA data storage and access is already possible from a technological standpoint, but not necessarily from an economical one. For example, the cost of reading genetic data, or identifying the components of genetic material, is getting dramatically cheaper. For example, you can have the 3 billion bases in your own DNA sequenced for as little as $1,000.
However, the cost of writing that data – or chemically synthesizing the sequence of nucleotides that represent your data – is a different story. Specifically, researchers in the UK estimated recently that it would cost more than $12,000 per MB to encode DNA data, but only around $200 per MB to read that data back. The hope is that the techniques for writing DNA will catch up with the amazing progress that is happening in technology to sequence or read DNA. Until there is greater demand, it will be many years until we see greater technological adoption due to cost factors.”
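The gap between writing and reading costs is easier to appreciate with a quick calculation. The Python sketch below uses only the two per-megabyte figures quoted above; the function name and the example file size are assumptions for illustration, and because the quoted writing cost is "more than" $12,000 per MB, the write figure should be read as a lower bound.

```python
# Per-megabyte figures taken from the estimate quoted above.
WRITE_COST_PER_MB = 12_000  # US dollars, lower bound ("more than $12,000")
READ_COST_PER_MB = 200      # US dollars, approximate


def storage_cost(size_mb: float) -> dict:
    """Rough cost in dollars to write and read back a file of size_mb megabytes."""
    return {
        "write": size_mb * WRITE_COST_PER_MB,
        "read": size_mb * READ_COST_PER_MB,
    }


# Hypothetical example: a 2 MB file.
cost = storage_cost(2)
print(f"write: ${cost['write']:,}  read: ${cost['read']:,}")
```

At these rates, writing is about 60 times more expensive than reading, whatever the file size.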
Incorporating data into a living organism, which continually modifies its genetic code, can allow for writing and rewriting of digital code. The way to accomplish this is to make DNA mutate itself on purpose. This could be done by enzymes programmed to react to certain digital signals. Alternatively, mutation could be induced using bacteriophages. Just as computer viruses alter the code of programs, these viruses of bacteria can be used to alter the genetic code, effectively rewriting it. Essentially, when a person is ready to save, a virus is created and then used to infect the organism so that it changes the data.
Yaniv Erlich, an Associate Professor of Computer Science at Columbia University, and his team used the new technique to encode six files into DNA:
- A complete computer operating system known as Kolibri.
- A kind of computer virus known as a zip bomb.
- The 1895 French film “Arrival of a train at La Ciotat,” which according to urban legends terrified audiences with the moving image of a life-sized train.
- A Pioneer plaque, a copy of the metal plates placed onboard the Pioneer spacecraft meant to deliver a message to any extraterrestrial intelligences that might pick them up.
- The 1948 study “A Mathematical Theory of Communication” by information theory founder Claude Shannon, which helped shape virtually all systems that store, process or transmit digital information.
- A $50 Amazon gift card.
Of course, all of this potential has yet to be transformed into consumer products. It’ll still be at least five years before we see any movement towards silica-DNA storage reading devices and at least another five before writable living media becomes available. But we can be sure that when the technology does become available, we’ll be able to preserve our digital files for thousands of years and possibly show future generations all those selfies we’re taking now. In essence, DNA works just like your hard drive, but instead of using binary ones and zeros to store digital data, it uses a quaternary base to store information about a living organism’s genes. DNA is an ideal storage medium because it is ultra-compact and can last hundreds of thousands of years if kept in a cool, dry place, as demonstrated by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.