DNA Data Storage using Hamming and Reed-Solomon Codes

DNA Data Opslag met Hamming en Reed-Solomon Codes

Abstract

Enormous amounts of data are produced every day. Whether it is a cute family
picture, a funny cat video, or a scientific paper, all of it is stored. The current challenge in data
storage is finding a way to store large amounts of data so that it stays preserved for many
years without much maintenance and can still be retrieved if errors occur. DNA is a strong
candidate for this: DNA from extinct species that lived 10,000 years ago can still be retrieved,
it needs no maintenance, and it is estimated to store 5 PB per gram [2]. However, substitution,
insertion and deletion errors occur when reading and writing DNA, so the data needs to be
protected against them. To this end, several coding methods have already been invented and
researched. Takahashi et al. [1] designed a fully automated system for writing, storing and
reading data in DNA, in which the stored data consisted only of the word "hello".
This thesis focuses on the coding method used in [1], namely a Hamming code, and compares
it to an implementation of a DNA-based Reed-Solomon code applied to the same data. The
comparison is based on the net information density, GC-weight, homopolymer runs, and
the error detection and correction properties. As expected, there is a trade-off between the
net information density and the error detection and correction properties. Although the net
information density of the Reed-Solomon code is lower, it can correct more errors and it has the
potential to be applied to a larger data set.
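
To make the comparison metrics concrete, the following is a minimal Python sketch of how they could be computed for a single DNA strand. It is an illustration only, not the implementation used in the thesis or in [1]. The function names, the example strand and the example numbers are hypothetical. GC-weight is taken here as the fraction of G and C bases, which is an assumption (it may equally be defined as a count), and net information density is assumed to mean payload bits per nucleotide.

```python
from itertools import groupby


def gc_content(strand: str) -> float:
    """Fraction of bases that are G or C (assumed reading of 'GC-weight')."""
    if not strand:
        raise ValueError("strand must not be empty")
    return sum(base in "GC" for base in strand) / len(strand)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated base, e.g. 3 for 'GGG'."""
    if not strand:
        raise ValueError("strand must not be empty")
    # groupby yields one group per maximal run of identical characters.
    return max(len(list(run)) for _, run in groupby(strand))


def net_information_density(payload_bits: int, total_nucleotides: int) -> float:
    """Payload bits per nucleotide, counting all redundancy in the total."""
    if total_nucleotides <= 0:
        raise ValueError("total_nucleotides must be positive")
    return payload_bits / total_nucleotides


if __name__ == "__main__":
    strand = "AACGGGTC"  # hypothetical strand, not taken from the thesis
    print(gc_content(strand))           # 5 of 8 bases are G or C
    print(longest_homopolymer(strand))  # the run 'GGG'
    # "hello" is 5 bytes = 40 bits; 80 nucleotides is a made-up total.
    print(net_information_density(40, 80))
```

Under these assumptions, a code that adds more redundancy increases the nucleotide total for the same payload, so its net information density drops while its error correction capability can grow. This is the trade-off described above.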