Constrained Codes for DNA-Based Storage Systems

None, None

Constrained Codes for DNA-Based Storage Systems

Bachelor Thesis (2020)

Author(s)

C.J. van Leeuwen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.H. Weber – Mentor (TU Delft - Discrete Mathematics and Optimization)

J.A.M. de Groot – Graduation committee member (TU Delft - Mathematical Physics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Copyright

Constant-weight codes DNA-based data storage Runlength-limited sequences.

To reference this document use:

https://resolver.tudelft.nl/uuid:2262b42f-4b77-4862-a820-60cf970bd166

More Info

expand_more

Publication Year

2020

Language

English

Copyright

Graduation Date

27-05-2020

Awarding Institution

Delft University of Technology

Programme

['Applied Mathematics']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Every day we produce an extremely high amount of data and a significant portion of this data is archival data. It is an important challenge to save this data in a cheap and environmentally friendly way. The current methods to save archival data are sufficient, but improvements can be made. Synthetic DNA based data storage is a great choice to do so. DNA is (roughly) made out of four nucleotides, Adine (A), Cytosine (C), Guanine (G) and Thymine (T). To save data using DNA the binary data is encoded to quaternary data and later DNA strands are created using this quaternary data. During the process of saving the data errors can occur. Previous research [3] has found that some errors can be prevented by taking two constraints into account.The no run-length constraint, which states that no DNA word can have two repeated symbolsand GC-weight constraint, which states that every DNA word must have a fixed number of G and C nucleotides. These constraints reduce the number of quaternary data sequences that can be used to save data. The aim of this thesis is to improve the lower bound on the maximum number of quaternary data sequences that satisfy the no run-length constraint, the fixed GC-weight and a minimum distance. Limbachiya et. al. [1] gave an algorithm to compute a set with a given minimum distance of quaternary data sequences that satisfies the constraints. The size of this set is the lower bound that we aim to improve. We do so by introducing two other algorithms that also compute a quaternary data sequences that satisfy the constraints and a minimum distance. The size of each computed set gives a lower bound on the maximum size. The lower bound computed with these two algorithms is always better or equal to the lower bound computed by Limbachiya et. al. for certain parameters. When the minimum distance of a code is 2 we know this maximum size. The formula for this maximum size is given in this thesis together with a proof for this formula.

Files

Finalversion_Bachelor_Eind_Pro... (pdf)

(pdf | 0.379 Mb)

License info not available