Gene Sets from Community Contributors

This page contains references to gene sets and collections from community contributors. These are not part of MSigDB but may be useful for certain analyses. The descriptions and all other information given below are courtesy of the contributors. Note that these contributions are under copyright and license terms as specified by the authors rather than the MSigDB license terms.

If you have gene sets to contribute that might benefit others, either for this page or for inclusion in MSigDB, feel free to contact us at genesets@broadinstitute.org.

SysMyo Muscle Gene Sets

SysMyo has contributed a collection of Muscle Gene Sets:

"More than ten thousand samples of muscle transcriptomic data have been uploaded to the public Gene Expression Omnibus in the past ten years, representing many millions of dollars of research expenditure and incalculable hours of research effort. These data ought to serve as a massive reference set for ongoing and future studies of neuromuscular disorders. One way to distil the data and render them more accessible to bench researchers is to extract from each study lists of genes ("gene sets") that were differentially expressed. With careful curation, each transcriptomic dataset may yield multiple comparisons, not only relating to the primary focus of that study, such as a pathology or an experimental treatment, but also more general comparisons not necessarily envisaged by the study's authors, but relating to factors such as age, sex, and muscle group."

See their website for more information.

PorSignDB

Nicolaas Van Renne et al. have contributed PorSignDB:

"The Porcine Signature Database (PorSignDB) is a collection of annotated gene sets for use with GSEA software. These gene sets were mostly derived from in vivo derived transcriptomic data, and describe a wide spectrum of (patho)physiological states of different tissue types. Only a minority of gene sets describe cell culture systems. Although the original data stems from pigs (Sus scrofa), gene identifiers were adapted to human orthologs in order to fit into the MSigDB collection and facilitate its application to data from any mammalian species..."

See their website for more information.

BrainCortex_CellTypeSpecificGenes

Megan Hastings Hagenauer et al. have contributed the BrainCortex_CellTypeSpecificGenes gene sets, described in https://www.biorxiv.org/content/early/2017/12/20/089391.full.pdf+html (preprint).
From the Abstract:

"Psychiatric illness is unlikely to arise from pathology occurring uniformly across all cell types in affected brain regions. Despite this, transcriptomic analyses of the human brain have typically been conducted using macro-dissected tissue due to the difficulty of performing single-cell type analyses with donated post-mortem brains. To address this issue statistically, we compiled a database of several thousand transcripts that were specifically-enriched in one of 10 primary cortical cell types, as identified in previous publications..."

See their website for more information.

DSigDB

Minjae Yoo et al. have contributed DSigDB, described in https://academic.oup.com/bioinformatics/article/31/18/3069/241009.
From the Abstract:

"We report the creation of Drug Signatures Database (DSigDB), a new gene set resource that relates drugs/compounds and their target genes, for gene set enrichment analysis (GSEA). DSigDB currently holds 22527 gene sets, consists of 17389 unique compounds covering 19531 genes. We also developed an online DSigDB resource that allows users to search, view and download drugs/compounds and gene sets. DSigDB gene sets provide seamless integration to GSEA software for linking gene expressions with drugs/compounds for drug repurposing and translational research."

See their website for more information.

Caenorhabditis elegans Co-Expression Cliques

Lukas Schmauder and Klaus Richter have contributed a clique map of the C. elegans transcriptome, described in https://www.nature.com/articles/s41598-021-91690-6.
From the Abstract:

"Nematode development is characterized by progression through several larval stages. Thousands of genes were found in large scale RNAi-experiments to block this development at certain steps, two of which target the molecular chaperone HSP-90 and its cofactor UNC-45. Aiming to define the cause of arrest, we here investigate the status of nematodes after treatment with RNAi against hsp-90 and unc-45 by employing an in-depth transcriptional analysis of the arrested larvae. To identify misregulated transcriptional units, we calculate and validate genome-wide coexpression cliques covering the entire nematode genome. We define 307 coexpression cliques and more than half of these can be related to organismal functions by GO-term enrichment, phenotype enrichment or tissue enrichment analysis... With most of the defined gene cliques showing concerted behaviour at some stage of development from embryo to late adult, the “clique map” together with the clique-specific GO-terms, tissue and phenotype assignments will be a valuable tool in understanding concerted responses on the genome-wide level in Caenorhabditis elegans."

See their GitHub repository for more information and to obtain the clique gene sets.

Saccharomyces cerevisiae Co-Expression Cliques

Siyuan Sima, Lukas Schmauder, and Klaus Richter have contributed a clique map of the S. cerevisiae transcriptome, described in http://microbialcell.com/researcharticles/2019a-sima-microbial-cell/.
From the Abstract:

"We generated a set of 72 co-regulation cliques using the information from S. cerevisiae 3196 microarray experiments. The obtained cliques performed highly significant in gene ontology and transcription factor enrichment analyses. We then tested the clique set on individual microarray experiments reporting on responses to pheromone, glycerol versus glucose based growth and the cellular response to heat. In all cases a highly significant determination of affected expression cliques was possible based on their average expression differences, the positions of their genes within hit rankings (UpRegScore) or the enrichment of the Top200 hits in certain cliques. The 72 cliques were finally used to compare experiments, which reported on the transcriptional response to polyglutamine proteins of different lengths. Using the predefined clique set it is possible to identify with high sensitivity and good significance sample and condition specific changes to gene expression. We thus conclude that an analysis, starting with these 72 preformed expression cliques, can complement traditional microarray analyses by visualizing the entire response on a static genome-wide gene set."

See their GitHub repository for more information and to obtain the clique gene sets.
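
As a rough illustration of two of the clique-level summaries mentioned in the abstract above (the average expression difference of a clique, and the number of top-ranked hits that fall into each clique), here is a minimal Python sketch. It is not the contributors' implementation: the function names, gene names, clique memberships, and log fold changes are all hypothetical, ranking hits by absolute change is an assumption, and the UpRegScore is not reproduced because the abstract does not define it.

```python
from statistics import mean


def clique_mean_difference(log_fold_changes, cliques):
    """Average expression difference over each clique's measured genes."""
    result = {}
    for name, genes in cliques.items():
        # Genes without a measurement are skipped rather than counted as zero.
        values = [log_fold_changes[g] for g in genes if g in log_fold_changes]
        if values:
            result[name] = mean(values)
    return result


def top_hits_per_clique(log_fold_changes, cliques, top_n=200):
    """Count how many of the top_n hits belong to each clique.

    Assumption: hits are ranked by absolute log fold change.
    """
    ranked = sorted(
        log_fold_changes,
        key=lambda g: abs(log_fold_changes[g]),
        reverse=True,
    )
    top = set(ranked[:top_n])
    return {name: len(top & set(genes)) for name, genes in cliques.items()}


# Hypothetical toy data, for illustration only.
toy_changes = {"geneA": 2.0, "geneB": 1.0, "geneC": -0.5, "geneD": 0.25}
toy_cliques = {
    "clique_1": ["geneA", "geneB"],
    "clique_2": ["geneC", "geneD", "geneE"],  # geneE has no measurement
}

means = clique_mean_difference(toy_changes, toy_cliques)
hits = top_hits_per_clique(toy_changes, toy_cliques, top_n=2)
print(means)
print(hits)
```

With real data, the clique definitions would come from the contributors' GitHub repository and the expression differences from the experiment being analyzed.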