DATA COMPRESSION FOR DNA SEQUENCE

Sumoom Daghal, Asaad

Abstract

The DNA sequences that make up any organism comprise the basic blueprint of that organism, so understanding and analyzing the different genes within these sequences has become an extremely important task. Biologists produce huge volumes of DNA sequence data every day, which makes genome sequence databases grow exponentially. Databases such as GenBank hold millions of DNA sequences that fill many thousands of gigabytes of computer storage. Hence, an efficient algorithm for compressing DNA sequences is required. In this paper, the compression algorithm known as the “Huffman code tree” is used to encode and compress DNA sequences. Following this algorithm, we assign the binary bits 0 and 1 to the branches of a tree whose leaves are the four bases (A, T, C, and G). The code for each base is then determined by tracing the path from the root of the tree to the leaf that represents that base.
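The tree construction and root-to-leaf tracing described above can be sketched in Python. This is a minimal illustration, not the author's implementation: it assumes base frequencies are simply counted from the input sequence, and the function name `huffman_codes` and the convention of labeling left branches 0 and right branches 1 are illustrative choices. Ties between equal frequencies are broken by insertion order, so other implementations may produce different (but equally short) codes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(sequence: str) -> dict[str, str]:
    """Build a Huffman tree over the bases in `sequence`; return base -> bit string.

    Hypothetical helper for illustration; frequencies are counted from the input.
    """
    freqs = Counter(sequence)
    if not freqs:
        return {}
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    # Heap entry: (frequency, tiebreaker, node); a node is a base (leaf) or a (left, right) pair.
    heap = [(f, next(tiebreak), base) for base, f in sorted(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single distinct base still needs a 1-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    root = heap[0][2]

    codes: dict[str, str] = {}

    def trace(node, path: str) -> None:
        # Walk from the root to each leaf: append 0 for the left branch, 1 for the right.
        if isinstance(node, str):
            codes[node] = path
        else:
            trace(node[0], path + "0")
            trace(node[1], path + "1")

    trace(root, "")
    return codes


# Toy sequence with frequencies A:4, C:2, G:1, T:1.
print(huffman_codes("AAAACCGT"))  # {'A': '0', 'C': '10', 'G': '110', 'T': '111'}
```

Because every base sits at a leaf, no code is a prefix of another, which is what lets a bit stream be decoded unambiguously.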
Huffman coding produces variable-length codes: characters with a higher frequency of occurrence receive shorter codes than characters with a lower frequency. This algorithm therefore compresses DNA sequences better than the older fixed-length method, which assigns 2 bits per base. Analysis of the results shows that an average code length of 1.62 bits/base can be achieved with this algorithm. For a higher compression ratio, it is advisable to combine the proposed method with another compression method, such as learning automata.
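The comparison with the fixed-length scheme can be made concrete with a short Python sketch. It assumes the usual definition of average code length, the frequency-weighted mean of the code lengths (sum over bases of relative frequency times code length), and compares it with a fixed 2 bits per base. The code table is a hypothetical example: it is the Huffman table for the toy sequence AAAACCGT (A:4, C:2, G:1, T:1) under a left-0/right-1 convention, and it does not reproduce the paper's 1.62 bits/base figure, which depends on the author's data. The names `average_code_length`, `encode`, `decode` and `FIXED_BITS_PER_BASE` are illustrative.

```python
from collections import Counter

FIXED_BITS_PER_BASE = 2  # fixed-length scheme: A, C, G and T each get 2 bits


def average_code_length(sequence: str, codes: dict[str, str]) -> float:
    """Bits per base: sum over bases of (relative frequency x code length)."""
    freqs = Counter(sequence)
    total = sum(freqs.values())
    return sum(freqs[b] * len(codes[b]) for b in freqs) / total


def encode(sequence: str, codes: dict[str, str]) -> str:
    """Concatenate the variable-length code of every base."""
    return "".join(codes[b] for b in sequence)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Decode greedily; this works because Huffman codes are prefix-free."""
    lookup = {c: b for b, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


# Hypothetical example table (Huffman codes for the toy sequence below).
example_codes = {"A": "0", "C": "10", "G": "110", "T": "111"}
seq = "AAAACCGT"
bits = encode(seq, example_codes)
print(average_code_length(seq, example_codes))    # 1.75
print(len(bits), FIXED_BITS_PER_BASE * len(seq))  # 14 16
assert decode(bits, example_codes) == seq
```

On this toy sequence the variable-length encoding needs 14 bits instead of 16, i.e. 1.75 bits/base versus 2; the saving grows as the base frequencies become more skewed.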

Keywords

DNA

Huffman code

Compression

Al-Qadisiyah Journal for Engineering Sciences

Volume 6, Issue 1
Winter 2013
Pages 26-34

Received 01 March 2013
Revised 20 March 2013
Accepted 25 March 2013
