Help with Investigating Gene Sets

Input Gene Identifiers

The tools on the Investigate Gene Sets page all take a list of genes as input. Enter a list of gene identifiers in the box provided and specify the appropriate species; human, mouse, and rat are supported. Ensembl Gene IDs and NCBI (Entrez) Gene IDs are accepted, as are HGNC (HUGO) IDs and Symbols, MGI IDs and Symbols, and RGD Symbols. Identifiers are case sensitive (e.g. egfr is not the same as EGFR). For Ensembl identifiers, remove any version suffixes (e.g. use ENSG00000141510 instead of ENSG00000141510.17); transcript-level IDs are not accepted. For identifiers from other species, we recommend using BioMart to convert them into HGNC Gene Symbols or Human/Mouse/Rat NCBI Gene IDs.

Beginning in MSigDB 7.0, we are using Ensembl as the platform annotation authority. Identifiers for genes are mapped to approved gene symbols and NCBI Gene IDs through annotations extracted from Ensembl's BioMart data service, and these mappings are updated at each MSigDB release with the latest available version of Ensembl. See the Release Notes for the current MSigDB release for a link to the version of Ensembl BioMart used in mapping.

Compute Overlaps

When gene sets share genes, examining how they overlap can highlight common processes, pathways, and underlying biological themes. This tool evaluates the overlap of a user-provided gene set with as many MSigDB collections as you choose, along with an estimate of its statistical significance. Note: this simple overlap evaluation is not the same as the full gene set enrichment analysis provided by the GSEA desktop application.

Enter a list of gene identifiers in the box provided and specify the appropriate species as described in Input Gene Identifiers above. Overlap results are presented using the gene symbols and NCBI Gene IDs specific to the species of the target gene set database; any required conversion (i.e. orthology mapping) is done automatically by the tool.
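The identifier rules and the overlap significance estimate can be illustrated together with a minimal sketch. It assumes a one-sided hypergeometric test (the probability of seeing at least the observed number of shared genes by chance), which is the distribution this tool's significance estimates rely on. The background universe size is a parameter you must supply, because the exact background MSigDB uses is not stated here. The names `normalize_ensembl_gene_id`, `overlap_p_value`, and `MAX_QUERY_GENES` are hypothetical, and this is not the tool's actual implementation.

```python
import math
import re

# Hypothetical constant mirroring the tool's documented 500-gene input limit.
MAX_QUERY_GENES = 500

# Human (ENSG), mouse (ENSMUSG) and rat (ENSRNOG) Ensembl gene IDs, with an
# optional version suffix such as ".17". Transcript IDs (ENST...) do not match.
_ENSEMBL_GENE_ID = re.compile(r"^(ENS(?:MUS|RNO)?G\d{11})(?:\.\d+)?$")


def normalize_ensembl_gene_id(identifier: str) -> str:
    """Drop any version suffix from an Ensembl gene ID; reject other IDs."""
    match = _ENSEMBL_GENE_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an Ensembl gene ID: {identifier!r}")
    return match.group(1)


def overlap_p_value(query, gene_set, universe_size):
    """Return (overlap size k, P(X >= k)) under a hypergeometric model.

    query:         genes from the user's list (matched exactly, so case matters)
    gene_set:      genes in one MSigDB gene set
    universe_size: assumed number of genes in the background universe
    """
    query, gene_set = set(query), set(gene_set)
    if len(query) > MAX_QUERY_GENES:
        raise ValueError(f"at most {MAX_QUERY_GENES} genes are allowed")
    if universe_size < len(query | gene_set):
        raise ValueError("universe must contain every query and gene set gene")

    n_query, n_set, n_universe = len(query), len(gene_set), universe_size
    k = len(query & gene_set)
    # Sum the probabilities of observing k or more shared genes by chance.
    tail = sum(
        math.comb(n_set, i) * math.comb(n_universe - n_set, n_query - i)
        for i in range(k, min(n_query, n_set) + 1)
    )
    return k, tail / math.comb(n_universe, n_query)
```

For example, a 3-gene list that shares 2 genes with a 4-gene set in a 10-gene universe gives k = 2 and p = (6·6 + 4·1)/120 = 1/3. Because matching is exact, `overlap_p_value(["egfr"], ["EGFR"], 10)` finds no overlap at all.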
Due to the characteristics of the hypergeometric distribution, there are limits to how large the user-provided gene set can be while still producing meaningful significance estimates. At most 500 genes are allowed; larger lists will be rejected. Click on the "compute overlaps" button to display the results, including
Our thanks to GATHER: Gene Annotation Tool to Help Explain Relationships (Chang & Nevins, Bioinformatics, 2006; https://changlab.uth.tmc.edu/gather/) for inspiring the output format of our gene set overlap tool.

Compendia Expression Profiles

There are tools for producing heat maps from an MSigDB gene set or user-provided gene list against the samples of several compendia of expression data. You can create a heat map as a static image or, as of July 2023, in an interactive form. Our interactive Compendia Expression Profiles tool uses Next-Generation Clustered Heat Maps (NG-CHM) from the Department of Bioinformatics and Computational Biology at The University of Texas MD Anderson Cancer Center to allow ad hoc exploration of the expression profile. As they describe it on their home page:

"The NG-CHM Heat Map Viewer is a dynamic, graphical environment for exploration of clustered or non-clustered heat map data in a web browser. It supports zooming, panning, searching, covariate bars, and link-outs that enable deep exploration of patterns and associations in heat maps."

Full instructions on the navigation and use of the NG-CHM viewer can be found on the project's website, along with links to video tutorials, citation information, and more.

You can display a heat map of the expression levels of the genes in your gene list in the samples of any one of these compendia of expression data:

For data in the human gene space:
For data in the mouse gene space:
Enter a list of gene identifiers in the box provided and specify the appropriate species as described in Input Gene Identifiers above. Choose one of the available compendia and click on "launch expression profile". Heat maps are presented using gene symbols; any required conversion is done automatically by the website. The resulting heat map includes dendrograms that cluster the expression data by gene and by sample.
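How row (gene) and column (sample) dendrograms reorder a heat map can be shown with a small sketch of agglomerative hierarchical clustering. The distance metric and linkage method used by the Compendia Expression Profiles tool are not stated here, so Euclidean distance with average linkage is an assumption, and the functions `average_linkage_order` and `cluster_heat_map` are hypothetical. The naive pairwise search is only suitable for tiny matrices; real tools typically use an optimized library such as SciPy's `scipy.cluster.hierarchy`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_order(vectors):
    """Leaf order of a dendrogram built by naive average-linkage clustering."""
    # Each cluster is a list of original indices, kept in dendrogram leaf order.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                dist = sum(euclidean(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so deleting b leaves index a unchanged
        clusters[a] = merged
    return clusters[0] if clusters else []


def cluster_heat_map(matrix):
    """Return (gene order, sample order) for a genes x samples matrix."""
    gene_order = average_linkage_order(matrix)
    samples = [list(column) for column in zip(*matrix)]
    sample_order = average_linkage_order(samples)
    return gene_order, sample_order
```

Displaying the rows in `gene_order` and the columns in `sample_order` places genes (and samples) with similar profiles next to each other, which is what the dendrogram branches summarize.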
Gene Families

Note that gene family information is only available for human genes in the human component of MSigDB. A gene family is any collection of proteins that share a common feature, such as homology or biochemical activity. Available categories and links to the relevant source publications in PubMed:
Enter a list of gene identifiers in the box provided and specify the appropriate species as described in Input Gene Identifiers above. Click on "show gene families" to categorize the input genes by gene family.

Filtered By Similarity

Beginning in MSigDB 7.3, gene sets in C5 and C2:CP:Reactome that have undergone redundancy filtering for inclusion in MSigDB have an additional field on the gene set page, "Filtered by similarity". This field contains the source database IDs of other candidate gene sets that clustered with the selected set by Jaccard similarity coefficient and had Jaccard coefficients >0.85 with it, but were filtered out of the collection on the basis of tree distance or set size. These database IDs link to the source resource's page for that identifier, as in the EXTERNAL_DETAILS_URL field. This redundancy filtering procedure also applies to the corresponding collections in the Mouse MSigDB (M5:GO and M2:CP:Reactome).

NDEx Biological Network Repository

You can further investigate the genes in your gene list through a query to NDEx, the Network Data Exchange (www.ndexbio.org), an online biological networks repository that is also integrated with Cytoscape, the network analysis and visualization environment (cytoscape.org). Networks are a powerful tool for expressing biological knowledge, including molecular interactions, biological relationships curated from the literature, and outputs from the analysis of big data.

Enter a list of gene identifiers in the box provided and specify the appropriate species as described in Input Gene Identifiers above. Click on "query NDEx" to send the list of genes to the NDEx IQuery tool (www.ndexbio.org/iquery), which finds pathways enriched for the query genes, networks representing the interactions between those genes and other proteins, and networks representing the associations between those genes and other biological or chemical entities. The NDEx query results page will allow you to:
See the IQuery help documentation for more details on using the NDEx query results page. See the NDEx home page for information on how to cite your use of NDEx.
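Returning to the Filtered By Similarity field described earlier, the Jaccard coefficient threshold it mentions can be illustrated with a minimal sketch. The Jaccard coefficient of two gene sets A and B is |A ∩ B| / |A ∪ B|. This sketch covers only the "coefficient >0.85" test; the clustering step and the tree-distance and set-size rules that decide which set is kept are not specified here and are omitted. The functions `jaccard` and `similar_candidates` and the candidate IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    # Convention chosen here: two empty sets are treated as having similarity 0.
    return len(a & b) / len(union) if union else 0.0


def similar_candidates(selected, candidates, threshold=0.85):
    """IDs of candidate gene sets whose Jaccard coefficient with `selected` exceeds threshold.

    candidates: mapping of (hypothetical) source database ID -> genes in that set
    """
    return sorted(
        set_id
        for set_id, genes in candidates.items()
        if jaccard(selected, genes) > threshold
    )
```

For instance, a candidate containing 19 of a selected 20-gene set's genes (and nothing else) has a coefficient of 19/20 = 0.95 and passes the threshold, while one containing only 10 of them scores 0.5 and does not.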