1001 Genomes

A Catalog of Arabidopsis thaliana Genetic Variation.

The original 1001 Genomes Project was launched at the beginning of 2008 to discover detailed whole-genome sequence variation in at least 1001 strains (accessions) of the reference plant Arabidopsis thaliana. The first major phase of the project was completed in 2016, with publication of a detailed analysis of 1135 genomes. Because the second-generation sequencing methods did not actually produce complete genome sequences, only simple variants are reported, and the results are invariably biased with respect to what is present or missing in the reference genome. Large or complex structural variants, as well as simple variants inside complex variants, are generally missed completely.

The 1001 Genomes Plus (1001G+) Project is now remedying this problem, based on long-read sequencing and (near) complete assembly of genome sequences, with only the most repetitive portions of the genomes being incompletely represented. Through distributed efforts of labs across the world, genomes from a diverse collection of A. thaliana strains are being assembled. In the initial phase (1001G+ Phase 1), 27 genomes were assembled and compared to identify analytical challenges. As a next step (1001G+ Phase 2), hundreds of genomes from many different contributors are being curated and made available here.

1001G+ Phase 2

A curated collection of A. thaliana genome sequences assembled from long reads, including both previously published and so far unpublished assemblies.

Coming soon