Dataset for Benchmarking
In this study, we evaluate the performance of existing assemblers on one real genome, Pseudomonas syringae, and four hypothetical genomes. These genomes and their short reads are available for download from here. Users can use this dataset to evaluate the performance of their own assemblers. A description of these genomes follows.
Real Genome (Pseudomonas syringae pv. syringae B728a)
Organism | Pseudomonas syringae pv. syringae B728a |
Sequencing type | Illumina Solexa, paired-end sequencing |
Total number of reads generated | 7102266 |
Total number of read pairs | 3551133 |
Genome size | 6 Mb |
Sequencing Coverage | 40X |
Length of each read | 36 nucleotides |
Insert length | 400 base pairs |
Download | Complete Genome and Short Reads |
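The coverage figure in the table can be checked against the read count, read length, and genome size. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes the usual definition of coverage as total sequenced bases divided by genome size, with the genome size taken as the rounded 6 Mb shown above.

```python
def sequencing_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_size


# Values from the table above; 6 Mb is the rounded genome size.
coverage = sequencing_coverage(num_reads=7_102_266, read_length=36, genome_size=6_000_000)
print(f"Estimated coverage: {coverage:.1f}X")
```

With these rounded inputs the estimate comes out slightly above 40X, which is consistent with the nominal coverage reported in the table.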
Hypothetical Dataset
In this study, we created four hypothetical genomes of 6 Mb each, together with their paired-end reads (i.e. Solexa type). Short reads were generated at coverages of 10X, 20X, 30X and 40X. Each short read is 36 base pairs long, and the insert length is 400 base pairs. We evaluated the performance of different assemblers on these hypothetical genomes. These genomes and their short reads are publicly available, so users can evaluate their own assemblers on this dataset.
Genome Name | Coverage | Download Genome | Download Short Reads |
GenomeA | 10X | GenomeA | Short Reads |
GenomeB | 20X | GenomeB | Short Reads |
GenomeC | 30X | GenomeC | Short Reads |
GenomeD | 40X | GenomeD | Short Reads |
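For readers who want to build a comparable test set, the sketch below shows one way paired-end reads could be simulated from a genome at a chosen coverage. It is not the procedure used to create the datasets above. It assumes error-free reads, uniformly sampled fragments, a fixed insert length interpreted as the full fragment length, and a second read taken from the opposite strand. All function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pairs_needed(genome_size: int, coverage: float, read_length: int) -> int:
    """Number of read pairs giving the requested average coverage."""
    return round(coverage * genome_size / (2 * read_length))


def simulate_pairs(genome, coverage, read_length=36, insert_length=400, seed=0):
    """Sample error-free read pairs from uniformly placed fragments.

    Assumption: insert_length is the full fragment length, and the
    second read is the reverse complement of the fragment's last bases.
    """
    rng = random.Random(seed)
    n_pairs = pairs_needed(len(genome), coverage, read_length)
    last_start = len(genome) - insert_length
    pairs = []
    for _ in range(n_pairs):
        start = rng.randint(0, last_start)  # inclusive on both ends
        fragment = genome[start:start + insert_length]
        forward = fragment[:read_length]
        reverse = reverse_complement(fragment[-read_length:])
        pairs.append((forward, reverse))
    return pairs


# Toy example: a random 10 kb genome sampled at 10X.
genome_rng = random.Random(42)
toy_genome = "".join(genome_rng.choice("ACGT") for _ in range(10_000))
toy_pairs = simulate_pairs(toy_genome, coverage=10)
print(len(toy_pairs), "read pairs generated")
```

Under these assumptions, a 6 Mb genome with 36-nucleotide reads needs roughly 0.83 million pairs at 10X and roughly 3.33 million pairs at 40X. A realistic simulator would also model sequencing errors and variation in insert length.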