Description of Algorithms

This page describes various algorithms used in GenomeABC.

Create Hypothetical or Random Genome

We created an array of nucleotides with a length equal to the genome length provided by the user.
A random number was then generated in each pass of a 'for loop (limit equal to the length of the given genome)', and the nucleotide corresponding to that random number was picked.
In the last step, we built a string of nucleotides by appending the randomly picked nucleotides one after another.
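
The steps above can be sketched in Python as follows. This is a minimal illustration rather than the GenomeABC source: it assumes each position is drawn uniformly from the four bases A, T, G and C, and the function name `random_genome` and the `seed` parameter are hypothetical.

```python
import random

BASES = "ATGC"  # assumed alphabet: the four nucleotides

def random_genome(length, seed=None):
    """Return a random nucleotide string of the requested length."""
    rng = random.Random(seed)  # 'seed' is a hypothetical parameter for reproducibility
    picked = []
    for _ in range(length):  # loop limit equals the requested genome length
        index = rng.randrange(len(BASES))  # random number that selects one base
        picked.append(BASES[index])
    return "".join(picked)  # append the picked nucleotides into one string

genome = random_genome(60, seed=1)
print(len(genome))  # prints 60
```

Collecting the picked bases in a list and joining once avoids rebuilding the string on every iteration, which matters for genome-sized inputs.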

We created two arrays: one holding the four bases (A, T, G and C) and the other holding the genome provided by the user.
The number of mutation steps, 'mutvalue', is equal to (Genome length * Percentage mutation value (%)) / 100.
One random number is generated to select a position in the genome array, and another random number is generated to select a base from the array of four nucleotides.
The nucleotides corresponding to both random numbers are then picked.
In a 'for loop (limit <= mutvalue)', the base picked from the genome array is replaced by the base picked from the four-base array.
In this way, a genome can be mutated.
> Limitation: The same random number can be generated in more than one step, so the base at a particular position may be changed again and again. As a result, a genome might not be 100% mutated.
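
A short Python sketch may make the mutation loop and its limitation concrete. It is an illustration under stated assumptions, not the GenomeABC code: the names `mutate_genome` and `seed` are hypothetical, 'mutvalue' is truncated to an integer, and the replacement base is drawn uniformly from all four bases, so it can also coincide with the original base.

```python
import random

BASES = "ATGC"

def mutate_genome(genome, percent, seed=None):
    """Return (mutated genome, number of distinct positions touched)."""
    rng = random.Random(seed)  # 'seed' is a hypothetical parameter
    sequence = list(genome)
    # mutvalue = (genome length * percentage mutation) / 100; truncation is an assumption
    mutvalue = int(len(sequence) * percent / 100)
    touched = set()
    for _ in range(mutvalue):
        position = rng.randrange(len(sequence))  # random position in the genome
        sequence[position] = rng.choice(BASES)   # random base from the four-base array
        touched.add(position)  # the same position can be drawn more than once
    return "".join(sequence), len(touched)

mutated, distinct = mutate_genome("A" * 1000, 100, seed=5)
print(len(mutated), distinct < 1000)  # prints: 1000 True
```

Because positions are drawn with replacement, the count of distinct positions stays below 'mutvalue' even when 100% mutation is requested, which is the limitation noted above.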

In a 'while loop', the genome file is opened and the whole genome is treated as a single string.
A variable 'numread' is calculated, where 'numread' = (Genome length * Coverage) / Read size.
In a 'for loop (limit <= numread)', we generate a random number and cut a substring starting at the position equal to that random number. This is how a fragment is made, as in Solexa technology.
For single-end reads, a substring of length equal to the read length provided by the user is cut from that fragment within the same loop.
For paired-end reads, a substring of length equal to the read length is cut from the opposite end of that fragment. Its nucleotides are then replaced by their complementary nucleotides, and the read is reversed.
In this way, we generate both single-end and paired-end Solexa reads.
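
The read-generation procedure can be sketched in Python as below. Several details are assumptions because the text does not specify them: the fragment length (`fragment_size`), the restriction of start positions so that every fragment is full length, and the choice to emit one forward read and one mate per loop iteration. The names `simulate_reads`, `reverse_complement` and `seed` are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ATGC", "TACG")

def reverse_complement(read):
    """Replace each base by its complement, then reverse the read."""
    return read.translate(COMPLEMENT)[::-1]

def simulate_reads(genome, coverage, read_size, fragment_size, seed=None):
    """Return a list of (forward read, mate read) pairs."""
    rng = random.Random(seed)
    # numread = (genome length * coverage) / read size
    numread = int(len(genome) * coverage / read_size)
    pairs = []
    for _ in range(numread):
        # assumption: only start positions that leave room for a full fragment
        start = rng.randrange(len(genome) - fragment_size + 1)
        fragment = genome[start:start + fragment_size]
        forward = fragment[:read_size]  # single-end read from the fragment start
        mate = reverse_complement(fragment[-read_size:])  # read from the opposite end
        pairs.append((forward, mate))
    return pairs

genome = "ATGC" * 50  # toy 200-base genome
pairs = simulate_reads(genome, coverage=2, read_size=10, fragment_size=30, seed=1)
print(len(pairs))  # prints 40, i.e. 200 * 2 / 10
```

For single-end output only the first element of each pair would be used. The sketch expects `read_size <= fragment_size <= len(genome)`.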

N50 contig length = The contig length such that 50% of the de novo assembled genome lies in blocks of this size or larger.
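
One common way to compute this value is to sort the contigs from longest to shortest and accumulate their lengths until half of the total assembled length is reached. The Python sketch below follows that approach; the function name `n50` and the example lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(contig_lengths, reverse=True)  # longest contig first
    half = sum(ordered) / 2  # 50% of the assembled length
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # contigs of this size or larger cover at least 50%
            return length
    return 0

# Total is 290, half is 145; 80 + 70 = 150 reaches it at the 70-base contig.
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```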

Genome covered (%) = Total genome covered(Nucleotides) * 100 / Total reference genome size

Contig matches (%) = Total contig nucleotides matching the reference genome * 100 / Total contig size sum

Error rate of contigs (%) = (Total mismatches + Total query gaps + Total hit gaps + Total N's) * 100 / Total contig size sum

Error rate of total assembly (%) = (Total unalignable base count + Total unalignable contig base count + Total mismatches) * 100 / Total contig size sum
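
The four percentage formulas above can be collected into one small Python helper. This is only a sketch: the function and parameter names are hypothetical, and the input counts in the example are invented to show the arithmetic, not results from any real assembly.

```python
def percent(numerator, denominator):
    """Express numerator as a percentage of denominator."""
    return numerator * 100 / denominator

def assembly_metrics(covered, reference_size, matched, contig_size_sum,
                     mismatches, query_gaps, hit_gaps, n_count,
                     unaligned_bases, unaligned_contig_bases):
    """Apply the four formulas; all arguments are nucleotide counts."""
    return {
        "genome_covered": percent(covered, reference_size),
        "contig_matches": percent(matched, contig_size_sum),
        "error_rate_contig": percent(
            mismatches + query_gaps + hit_gaps + n_count, contig_size_sum),
        "error_rate_assembly": percent(
            unaligned_bases + unaligned_contig_bases + mismatches, contig_size_sum),
    }

# Invented counts, for illustration only.
metrics = assembly_metrics(
    covered=900, reference_size=1000, matched=950, contig_size_sum=1000,
    mismatches=10, query_gaps=5, hit_gaps=5, n_count=0,
    unaligned_bases=20, unaligned_contig_bases=10)
print(metrics["genome_covered"], metrics["error_rate_contig"])  # prints 90.0 2.0
```

Note that the first formula is normalized by the reference genome size, while the other three are normalized by the total contig size sum.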