DNA Data Storage using Hamming and Reed-Solomon Codes

DNA Data Opslag met Hamming en Reed-Solomon Codes

Bachelor thesis (2019)

Authors

E. Slingerland (Electrical Engineering, Mathematics and Computer Science)

Contributors

J.H. Weber (Discrete Mathematics and Optimization, supervisor 1)

B. van den Dries (Analysis, supervisor 2)

Dion Gijswijt (Discrete Mathematics and Optimization, supervisor 2)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Reed-Solomon, DNA, Hamming, Data storage

To reference this document use:

http://resolver.tudelft.nl/uuid:01b4c335-f112-4f32-b589-f6d51893b302

Published Date

01-07-2019

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Enormous amounts of data are produced every day. Whether it is a cute family
picture, a funny cat video, or a scientific paper, all of this data is stored. The current
challenge in data storage is to find a way to store large amounts of data such that it stays
preserved for many years without much maintenance and can still be retrieved if errors
occur. DNA is a strong candidate: DNA from extinct species that lived 10,000 years ago
can still be retrieved, it needs no maintenance, and it is estimated to store
5 PB per gram [2]. However, substitution, insertion and deletion errors occur when reading
and writing DNA, so the data needs to be protected against these errors. To this end,
several coding methods have already been developed and studied. Takahashi et al. [1] designed
a fully automated system for writing, storing and reading data in DNA; the stored data
consisted only of the word "hello".
This thesis focuses on the coding method used in [1], a Hamming code, and compares
it to an implementation of a DNA-based Reed-Solomon code applied to the same data. The
comparison is based on net information density, GC-weight, homopolymer runs, and
error detection and correction properties. As expected, there is a trade-off between
net information density and the error detection and correction properties. Although the net
information density of the Reed-Solomon code is lower, it can correct more errors and has the
potential to be applied to larger data sets.
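
The strand-level metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis. The function names and the example strand are hypothetical. GC-weight is taken here as the fraction of G and C bases, and net information density as data bits per nucleotide; both are assumptions about definitions the abstract does not spell out.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (assumed reading of GC-weight)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases (0 if empty)."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


def net_information_density(data_bits: int, nucleotides: int) -> float:
    """Data bits per nucleotide, counting redundancy in the nucleotide total.

    This definition is an assumption made for illustration.
    """
    return data_bits / nucleotides


strand = "ACGTTTGCCA"  # hypothetical strand, not taken from the thesis
print(gc_content(strand))               # 5 of 10 bases are G or C
print(longest_homopolymer_run(strand))  # the run "TTT" has length 3
```

A code with more redundancy lowers the value returned by `net_information_density`, which is the trade-off against error correction capability that the abstract describes.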
