DNA Data Storage using Hamming and Reed-Solomon Codes

DNA Data Opslag met Hamming en Reed-Solomon Codes

Bachelor Thesis (2019)
Author(s)

E. Slingerland (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.H. Weber – Mentor (TU Delft - Discrete Mathematics and Optimization)

B. van den Dries – Graduation committee member (TU Delft - Analysis)

Dion Gijswijt – Graduation committee member (TU Delft - Discrete Mathematics and Optimization)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2019 Eva Slingerland
Publication Year
2019
Language
English
Graduation Date
01-07-2019
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Enormous amounts of data are produced every day. Whether it is a cute family
picture, a funny cat video, or a scientific paper, all of it is stored. The challenge in data
storage today is finding a way to store large amounts of data so that it stays
preserved for many years without much maintenance and can still be retrieved if
errors occur. DNA is a great choice for this: DNA from extinct species that
lived 10,000 years ago can still be retrieved, it needs no maintenance, and it is estimated
to store 5 PB per gram [2]. However, substitution, insertion and deletion errors occur
when reading and writing DNA, so the data needs to be protected against these errors. To this
end, several coding methods have already been devised and studied. Takahashi et al. [1] designed
a fully automated system for writing, storing and reading data in DNA; the stored data
consisted only of the word "hello".
This thesis focuses on the coding method used by [1], namely a Hamming code, and compares
it to an implementation of a DNA-based Reed-Solomon code applied to the same data. The
analysis is based on the net information density, GC-weight, homopolymer runs, and
the error detection and correction properties. As expected, there is a trade-off between the
net information density and the error detection and correction properties. Although the net
information density of the Reed-Solomon code is lower, it can correct more errors and
has the potential to be applied to a larger data set.
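
The strand-level criteria named above can be illustrated with a minimal Python sketch. It is not taken from the thesis. It assumes that GC-weight means the number of G and C bases in a strand, that a homopolymer run is a maximal stretch of identical consecutive bases, and that net information density is the number of payload bits per nucleotide. The function names and the example strand are hypothetical.

```python
def gc_weight(strand: str) -> int:
    """Count the G and C bases in a strand (assumed meaning of GC-weight)."""
    return sum(base in "GC" for base in strand)


def longest_homopolymer_run(strand: str) -> int:
    """Length of the longest stretch of identical consecutive bases."""
    longest = current = 0
    previous = None
    for base in strand:
        # Extend the current run if the base repeats, otherwise start a new run.
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def net_information_density(payload_bits: int, nucleotides: int) -> float:
    """Payload bits stored per nucleotide (assumed definition)."""
    if nucleotides <= 0:
        raise ValueError("nucleotides must be positive")
    return payload_bits / nucleotides


strand = "ACGGGTTA"  # hypothetical example strand, not from the thesis
print(gc_weight(strand))                # 4
print(longest_homopolymer_run(strand))  # 3
```

Under these assumptions, a code that adds more redundancy lengthens the strand for the same payload, which lowers the value returned by `net_information_density`.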
