Noah Rosenberg laboratory at the University of Michigan

HGDP-CEPH human genome diversity cell line panel

The diversity panel is a large and widely used collection of DNA samples from individuals distributed around the world. Several of our papers have used genotypes from the diversity panel. Here we provide microsatellite, indel, and SNP data exactly as used in these papers.

Note that slightly different versions of our microsatellite and indel data sets are located at the website of the Marshfield Clinic Research Foundation. In cases where it is of interest to compare new results on the diversity panel to what has been seen in our previous work, we recommend using the files downloadable from this site, rather than those available in Microsoft Excel from Marshfield.

Further information about the microsatellite markers, such as PCR primers and map positions, is available from Marshfield.


HGDP+India SNP data

New! (Posted June 27, 2008) HGDP+India SNP data are now available online for
TJ Pemberton*, M Jakobsson*, DF Conrad, G Coop, JD Wall, JK Pritchard, PI Patel, NA Rosenberg (2008) Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India. Annals of Human Genetics 72: 535-546. [Abstract]

HGDP high-resolution genome-wide SNP data

(Posted Feb 26, 2008) HGDP SNP data are now available online for
M Jakobsson*, SW Scholz*, P Scheet*, JR Gibbs, JM VanLiere, H-C Fung, ZA Szpiech, JH Degnan, K Wang, R Guerreiro, JM Bras, JC Schymick, DG Hernandez, BJ Traynor, J Simon-Sanchez, M Matarin, A Britton, J van de Leemput, I Rafferty, M Bucan, HM Cann, JA Hardy, NA Rosenberg, AB Singleton (2008) Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451: 998-1003. [Abstract]

HGDP SNP data

(Posted May 23, 2007) HGDP SNP data are now available online for
DF Conrad*, M Jakobsson*, G Coop*, X Wen, JD Wall, NA Rosenberg, JK Pritchard (2006) A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genetics 38: 1251-1260. [Abstract] [PDF]

Relatives

(Posted October 17, 2006) It is recommended that anyone working with the diversity panel read the following paper, which reports a variety of anomalies in the diversity panel individuals and recommends standard subsets for future use.

NA Rosenberg (2006) Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Annals of Human Genetics 70: 841-847. [Abstract] [PDF] [Supplement] [Spreadsheet with recommended subsets (txt format)] [Spreadsheet with recommended subsets (xls format)]


Data sets

If you are using any of the data files on this site and wish to be contacted in case of updates or modifications, please send an email to Noah Rosenberg.

377 autosomal microsatellites in 1056 individuals from 52 populations

The following data files, all in plain text format, are used in each of the papers listed below. The markers are drawn from Marshfield screening set 10. A description of how these data files differ from those on the Marshfield site is in the online supplement to our 2002 paper.

List of papers that use the above files:

  • NA Rosenberg, JK Pritchard, JL Weber, HM Cann, KK Kidd, LA Zhivotovsky, MW Feldman (2002) Genetic structure of human populations. Science 298: 2381-2385. [Abstract] [Full Text at Science website] [PDF] [Supplement] [Software for drawing figures] [Español]
  • LA Zhivotovsky, NA Rosenberg, MW Feldman (2003) Features of evolution and expansion of modern humans, inferred from genomewide microsatellite markers. American Journal of Human Genetics 72: 1171-1186. [Abstract] [PDF]
  • NA Rosenberg, JK Pritchard, JL Weber, HM Cann, KK Kidd, LA Zhivotovsky, MW Feldman (2003) Response to comment on "Genetic structure of human populations." Science 300: 1877. [Abstract] [PDF]
  • NA Rosenberg, LM Li, R Ward, JK Pritchard (2003) Informativeness of genetic markers for inference of ancestry. American Journal of Human Genetics 73: 1402-1422. [Abstract] [PDF] [Supplement] [SNP data] [SNP data readme] [Solution to Problem 11039 required in appendix of paper (American Mathematical Monthly 112: 572-573, 2005)]
  • S Ramachandran, NA Rosenberg, LA Zhivotovsky, MW Feldman (2004) Robustness of the inference of human population structure: a comparison of X-chromosomal and autosomal microsatellites. Human Genomics 1: 87-97. [Abstract] [PDF]
  • NA Rosenberg (2005) Algorithms for selecting informative marker panels for population assignment. Journal of Computational Biology 12: 1183-1201. [Abstract] [PDF]

List of papers that use slightly altered versions of the above files (the alterations are described in the papers):

  • NA Rosenberg, PP Calabrese (2004) Polyploid and multilocus extensions of the Wahlund inequality. Theoretical Population Biology 66: 381-391. [Abstract] [PDF]

  • NA Rosenberg, MGB Blum (2007) Sampling properties of homozygosity-based statistics for linkage disequilibrium. Mathematical Biosciences 208: 33-47. [Abstract]

783 autosomal microsatellite loci and 210 insertion/deletion polymorphisms in 1048 individuals from 53 populations

The following data files, all in plain text format, are used in each of the papers listed below. The microsatellite markers are drawn from Marshfield screening sets 10, 13, and 52, and the indels are drawn from Marshfield screening set 100. A description of how these data files differ from those on the Marshfield site is in the Ramachandran et al. (2005) and Rosenberg et al. (2005) papers.

In choosing data files for analysis, note that there are slight differences between the data used by Ramachandran et al. (2005) and those used by Rosenberg et al. (2005).

List of papers that use the above files:


2834 single-nucleotide polymorphisms in 927 individuals from 52 populations

Download SNP data (you will first be directed to a registration page, and we would very much appreciate it if you register)

List of papers that use the SNP data:

  • DF Conrad*, M Jakobsson*, G Coop*, X Wen, JD Wall, NA Rosenberg, JK Pritchard (2006) A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genetics 38: 1251-1260. [Abstract] [PDF]


525,910 single-nucleotide polymorphisms and 1428 copy-number variable loci in 485 individuals from 29 populations

Download SNP data

List of papers that use the SNP and copy number data:

  • M Jakobsson*, SW Scholz*, P Scheet*, JR Gibbs, JM VanLiere, H-C Fung, ZA Szpiech, JH Degnan, K Wang, R Guerreiro, JM Bras, JC Schymick, DG Hernandez, BJ Traynor, J Simon-Sanchez, M Matarin, A Britton, J van de Leemput, I Rafferty, M Bucan, HM Cann, JA Hardy, NA Rosenberg, AB Singleton (2008) Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451: 998-1003. [Abstract]

2810 single-nucleotide polymorphisms in 957 individuals from 55 populations

These data update the data of Conrad et al. (2006) described above.

Download SNP data (you will first be directed to a registration page, and we would very much appreciate it if you register)

List of papers that use the SNP data:

  • TJ Pemberton*, M Jakobsson*, DF Conrad, G Coop, JD Wall, JK Pritchard, PI Patel, NA Rosenberg (2008) Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India. Annals of Human Genetics 72: 535-546. [Abstract]


History

Created with 377 microsatellites, 22 November 2002
Addition of NEXUS file for 377 microsatellites, 28 December 2002
Minor modifications to site, 30 April 2004
Addition of data on 783 microsatellites and 210 indels, 1 November 2005
Addition of standardized subsets of individuals, 17 November 2006
Addition of SNP data from Conrad et al., 23 May 2007
Addition of genome-wide SNP and copy-number data, 26 February 2008
Addition of SNP data from Pemberton et al., 27 June 2008