Dataset for Benchmarking

In this study, we evaluate the performance of existing assemblers on one real genome, Pseudomonas syringae, and four hypothetical genomes. These genomes and their short reads are available for download, so users can evaluate the performance of their own assemblers on the same dataset. The genomes are described below.

Real Genome (Pseudomonas syringae pv. syringae B728a)

Organism	Pseudomonas syringae pv. syringae B728a
Sequencing type	Illumina Solexa, paired-end sequencing
Total number of reads generated	7102266
Total number of read pairs	3551133
Genome size	6 Mb
Sequencing Coverage	40X
Length of each read	36 nucleotides
Insert length	400 base pairs
Download	Complete Genome and Short Reads
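
The figures in the table can be cross-checked with the usual coverage relation, coverage = (number of reads × read length) / genome size. The sketch below is only an illustration: the function name is hypothetical, and the genome size is taken as the rounded 6 Mb from the table rather than the exact length of the reference sequence.

```python
def sequencing_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_size


# Values from the table above; 6_000_000 is the rounded genome size (assumption).
coverage = sequencing_coverage(num_reads=7102266, read_length=36, genome_size=6_000_000)
print(f"Approximate coverage: {coverage:.1f}X")
```

With the rounded genome size this comes out slightly above 40X, which is consistent with the nominal coverage reported in the table.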

Hypothetical Dataset
In this study, we created four hypothetical genomes of 6 Mb each, together with their paired-end reads (i.e. Solexa type). Short reads were generated at coverages of 10X, 20X, 30X and 40X, one coverage level per genome. Each read is 36 base pairs long and the insert length is 400 base pairs. We evaluated the performance of different assemblers on these hypothetical genomes. The genomes and their short reads are publicly available so that users can evaluate their own assemblers on this dataset.

Genome Name	Coverage	Download Genome	Download Short Reads
GenomeA	10X	GenomeA	Short Reads
GenomeB	20X	GenomeB	Short Reads
GenomeC	30X	GenomeC	Short Reads
GenomeD	40X	GenomeD	Short Reads
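
The source does not say how the hypothetical genomes or their reads were generated. The following is a minimal sketch of one way to produce comparable data, under stated assumptions: a uniformly random base composition, error-free reads, a fixed insert length interpreted as the full fragment length, and the second read taken from the reverse strand. All function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def random_genome(size: int, rng: random.Random) -> str:
    """Build a genome of `size` bases with uniform base composition (assumption)."""
    return "".join(rng.choices("ACGT", k=size))


def simulate_paired_reads(genome, coverage, read_length=36, insert_length=400, rng=None):
    """Sample error-free read pairs so that total bases ~= coverage * genome length."""
    if rng is None:
        rng = random.Random()
    # Each pair contributes 2 * read_length bases.
    num_pairs = round(coverage * len(genome) / (2 * read_length))
    pairs = []
    for _ in range(num_pairs):
        # Pick a fragment of fixed length (assumption: no insert-size variation).
        start = rng.randrange(len(genome) - insert_length + 1)
        fragment = genome[start:start + insert_length]
        forward = fragment[:read_length]
        reverse = reverse_complement(fragment[-read_length:])
        pairs.append((forward, reverse))
    return pairs


# Small demonstration on a 10 kb genome at 10X; the real datasets use 6 Mb genomes.
demo_genome = random_genome(10_000, random.Random(0))
demo_pairs = simulate_paired_reads(demo_genome, coverage=10, rng=random.Random(1))
```

Under these assumptions, a 6 Mb genome at 40X with 36-base reads would need about 3.3 million read pairs, the same order as the 3551133 pairs in the real dataset.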

GenomeABC: Benchmarking of Genome Assemblers
