1001 Genomes

A Catalog of Arabidopsis thaliana Genetic Variation.


Explore the variants. We maintain several tools for data download, visualization, and analysis.



Visit the Data Center and download whole sets of SNPs, indels, SVs, and genome sequences.


The 1001 Genomes Plus Vision

The 1001 Genomes Project was launched at the beginning of 2008 to discover detailed whole-genome sequence variation in at least 1001 strains (accessions) of the reference plant Arabidopsis thaliana. The first major phase of the project was completed in 2016, with publication of a detailed analysis of 1135 genomes.

Unfortunately, the second-generation sequencing methods that have made it economically feasible to screen large numbers of individuals do not actually produce complete genome sequences. Instead, they produce massive numbers of very short sequence fragments that must be aligned to a reference genome in order to identify variants. Because of this, only simple variants are reported, and the results are invariably biased by what is present in or missing from the reference genome. Large or complex structural variants, as well as simple variants inside complex variants, are generally missed completely.

To remedy this problem, we have recently begun the second major phase, the 1001G+ project. We have begun to assemble genomes from a diverse collection of A. thaliana strains, with the goals of annotating them with transcriptome and epigenome information and developing tools to make the results available to the community.




The main publications for the 1001 Genomes Project:

1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana
1001 Genomes Consortium
Cell (2016), 166(2) 481–491.

Epigenomic Diversity in a Global Collection of Arabidopsis thaliana Accessions
Kawakatsu T., Huang S.C., Jupe F., Sasaki E. et al., 1001 Genomes Consortium
Cell (2016), 166(2) 492–505.

Earlier papers describing subsets of the data:

Sequencing of natural strains of Arabidopsis thaliana with short reads
Ossowski S., Schneeberger K., Clark R. M., Lanz C., Warthmann N. and Weigel D.
Genome Res (2008), 18(12) 2024–2033.

Reference-guided assembly of four diverse Arabidopsis thaliana genomes
Schneeberger K., Ossowski S., Ott F., Klein J. D., Wang X., Lanz C., Smith L. M., Cao J., Fitz J., Warthmann N., Henz S. R., Huson D. H. and Weigel D.
Proc Natl Acad Sci U S A (2011), 108(25) 10249–10254.

Whole-genome sequencing of multiple Arabidopsis thaliana populations
Cao J., Schneeberger K., Ossowski S., Gunther T., Bender S., Fitz J., Koenig D., Lanz C., Stegle O., Lippert C., Wang X., Ott F., Muller J., Alonso-Blanco C., Borgwardt K., Schmid K. J. and Weigel D.
Nat Genet (2011), 43(10) 956–963.

Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden
Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A, Zhang Q, Vilhjálmsson BJ, Korte A, Nizhynska V, Voronin V, Korte P, Sedman L, Mandáková T, Lysak MA, Seren Ü, Hellmann I, Nordborg M.
Nat Genet (2013), 45 884–890.

Patterns of population epigenomic diversity
Schmitz RJ, Schultz MD, Urich MA, Nery JR, Pelizzola M, Libiger O, Alix A, McCosh RB, Chen H, Schork NJ, Ecker JR.
Nature (2013), 495 193–198.

Multiple reference genomes and transcriptomes for Arabidopsis thaliana
Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ, Osborne EJ, Sreedharan VT, Kahles A, Bohnert R, Jean G, Derwent P, Kersey P, Belfield EJ, Harberd NP, Kemen E, Toomajian C, Kover PX, Clark RM, Rätsch G, Mott R.
Nature (2011), 477 419–423.