Example Datasets

Each entry below lists the dataset, a short description, the relevant data files (save the link to download), and the reference.

Dataset: Gender
Description: Transcriptional profiles from male and female lymphoblastoid cell lines
Relevant data:
  Results of C1 GSEA analysis of this dataset
  Results of C2 GSEA analysis of this dataset
  Gender_hgu133a.gct
  Gender_collapsed.gct
  Gender.cls
Reference: Unpublished

Dataset: p53
Description: Transcriptional profiles from p53 wild-type (p53+) and p53 mutant cancer cell lines
Relevant data:
  Results of C2 GSEA analysis of this dataset
  P53_hgu95av2.gct
  P53_collapsed.gct
  P53.cls
Reference: Unpublished

Dataset: Diabetes
Description: Transcriptional profiles of skeletal muscle biopsies from diabetic and normal individuals
Relevant data:
  Results of C2 GSEA analysis of this dataset
  Diabetes_hgu133a.gct
  Diabetes_collapsed.gct
  Diabetes.cls
Reference: Mootha et al. (2003) Nat Genet 34(3): 267-73.

Dataset: Leukemia
Description: Transcriptional profiles from leukemias (ALL and AML)
Relevant data:
  Results of C1 GSEA analysis of this dataset
  Leukemia_hgu95av2.gct
  Leukemia_collapsed.gct
  Leukemia.cls
Reference: Armstrong et al. (2002) Nat Genet 30(1): 41-7.

Dataset: Lung cancer
Description: Transcriptional profiles from two independent lung cancer outcome datasets (Michigan and Boston)
Relevant data:
  Michigan:
    Lung_Michigan_hu6800.gct
    Lung_Michigan_collapsed.gct
    Lung_Mich_collapsed_common_Mich_Bost.gct
    Lung_Michigan.cls
  Boston:
    Lung_Boston_hgu95av2.gct
    Lung_Boston_collapsed.gct
    Lung_Bost_collapsed_common_Mich_Bost.gct
    Lung_Boston.cls
Reference:
  Beer et al. (2002) Nat Med 8(8): 816-24. (Michigan)
  Bhattacharjee et al. (2001) Proc Natl Acad Sci U S A 98(24): 13790-5. (Boston)

Dataset: Gene sets
Description: Archived gene sets from the GSEA PNAS 2005 publication.
  Note: This collection of gene sets is not the latest version. When beginning a new analysis, you may want to download the current collection of gene sets from the Downloads page.
Relevant data:
  C1.symbols.gmt (positional gene sets)
  C2.symbols.gmt (curated gene sets)
Reference: Subramanian, Tamayo et al. (2005) Proc Natl Acad Sci U S A 102(43): 15545-50.

'Collapsed' refers to datasets whose identifiers (i.e., Affymetrix probe set IDs) have been replaced with gene symbols. In this process, all probe sets that map to a particular gene are summarized into a single expression vector by taking, in each sample, the maximum expression value among those probe sets. A utility to do this is included in the GSEA Java software.
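The following is a minimal Python sketch of this max-collapse rule. It assumes the expression matrix is held as a dict from probe set ID to a list of per-sample values, and that a probe-to-symbol mapping is supplied separately. The function name `collapse_max`, the probe IDs, and the mapping are hypothetical and for illustration only. Unmapped probe sets are simply dropped. This is not the GSEA Java utility itself.

```python
def collapse_max(expression, probe_to_symbol):
    """Collapse probe-level rows to gene-level rows by per-sample maximum.

    expression: dict of probe set ID -> list of per-sample values
    probe_to_symbol: dict of probe set ID -> gene symbol (hypothetical
        annotation; probe sets with no entry are skipped in this sketch)
    """
    collapsed = {}
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:
            continue  # unmapped probe set: dropped
        if symbol not in collapsed:
            collapsed[symbol] = list(values)
        else:
            # keep the larger value independently in each sample
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
    return collapsed


# Hypothetical probe sets: two map to TP53, one to GAPDH.
expression = {
    "probe_1": [5.0, 2.0, 7.0],
    "probe_2": [3.0, 9.0, 1.0],
    "probe_3": [4.0, 4.0, 4.0],
}
probe_to_symbol = {"probe_1": "TP53", "probe_2": "TP53", "probe_3": "GAPDH"}
result = collapse_max(expression, probe_to_symbol)
print(result)  # {'TP53': [5.0, 9.0, 7.0], 'GAPDH': [4.0, 4.0, 4.0]}
```

In this example, TP53 takes 5.0 from its first probe set in sample 1 but 9.0 from its second probe set in sample 2. The collapsed vector can therefore mix values from different probe sets, because the maximum is taken per sample rather than by picking a single probe set.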

The example GSEA results above correspond to the results in the Subramanian, Tamayo et al. (PNAS 2005) GSEA paper and were generated with the Java GSEA program. Because the random number generator used for sample permutation differs, and different seeds were used, the numbers do not match the published values exactly. However, the gene sets found significant are identical to the published results. If you are implementing GSEA on your own, we recommend that you benchmark your code against these datasets and gene sets.
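The seed sensitivity described above can be illustrated with a minimal Python sketch of a sample-label permutation test. The sketch makes several assumptions. The statistic is a plain difference of class means rather than GSEA's enrichment score. The values and labels are made up. The function name `permutation_pvalue` is hypothetical. Two runs that differ only in seed generally give slightly different nominal p-values, while repeating a seed reproduces its result exactly.

```python
import random


def permutation_pvalue(values, labels, n_perm=1000, seed=0):
    """Nominal p-value for a difference in class means via label permutation.

    Simplified stand-in for GSEA's phenotype permutation: the statistic is a
    difference of means (class 1 minus class 0), not an enrichment score.
    """
    rng = random.Random(seed)  # dedicated generator: the seed fixes the result

    def diff_of_means(lbls):
        a = [v for v, l in zip(values, lbls) if l == 1]
        b = [v for v, l in zip(values, lbls) if l == 0]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = diff_of_means(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute sample labels, keeping class sizes
        if abs(diff_of_means(shuffled)) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


values = [2.1, 2.5, 1.9, 2.8, 3.9, 4.2, 3.5, 4.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p_a = permutation_pvalue(values, labels, seed=1)
p_b = permutation_pvalue(values, labels, seed=2)
print(p_a, p_b)
```

The two printed values typically differ in the later decimal places, yet both fall well below common significance thresholds. This mirrors the situation described above, where the exact numbers differ between implementations but the same sets come out significant. Passing a dedicated `random.Random` instance, rather than using the module-level generator, keeps each run reproducible from its seed alone.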