
Recapitulate cross-species analysis results to credential genetically engineered mouse models of breast cancer (PMID: 24220145)

Added by schen2 on 2016.03.07
Last updated almost 4 years ago.


This recipe recapitulates research by Pfefferle et al., published in Genome Biology (2013) as "Transcriptomic classification of genetically engineered mouse models of breast cancer identifies human subtype counterparts" and conducted by Charles M Perou's group.  The study encompasses the largest comprehensive genomic dataset to date for identifying human-to-mouse disease subtype counterparts, consisting of three independent human breast cancer datasets and 385 DNA gene expression microarrays from 27 genetically engineered mouse models (GEMMs) of mammary carcinoma (Gene Expression Omnibus accession numbers GSE3165, GSE8516, GSE9343, GSE14457, GSE15263, GSE17916, GSE27101, and GSE42640).  In the original study, the similarity between specific human and mouse subtypes was measured using gene set analysis (GSA; Table 2 in the publication).  To recapitulate this research, we use the Gene Set Enrichment Analysis (GSEA) module (v17) in GenePattern as an independent method to further validate the research findings.  This effort is supported by the NCI Oncology Models Forum (OMF), a collaborative effort to credential cancer mouse models for translational research.


To complete this recipe, we need a combined mouse gene expression data matrix and human breast cancer gene signatures.  Charles M Perou kindly provided the data matrices used in the original publication, which improves reproducibility by avoiding differences in data processing and normalization across microarray datasets and platforms.  The data matrices were reformatted into the GCT and CLS formats for Gene Set Enrichment Analysis.  The processed dataset (with annotation on the journal website) is accessible from GenomeSpace's Public folder.

In the original study, 17 distinct murine expression-based classes (with the Ex designation) were defined by SigClust analysis.  In this recipe, we specifically examine the similarity between the human Basal-like subtype and the murine C3TagEx class, which is mainly composed of two genetically engineered mouse models, C3 Tag and Wap Tag.  Other subtypes can be investigated in a similar way and are not covered in this recipe.

Provided file 1 (gene expression matrix): Perou_363Mouse_17SigClust.gct
A combined gene expression matrix from 27 murine models.  The 22 outlying samples that did not belong to any of the 17 expression-based classes were excluded from the final expression data matrix.
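
For readers who want to inspect the matrix outside GenePattern, the following is a minimal sketch of a GCT reader, assuming the file follows the GCT v1.2 layout (a "#1.2" line, a dimensions line, a header line, then one row per gene).  The function name read_gct, the treatment of empty or "NA" cells as missing values, and the toy data are our own assumptions, not part of the provided files.

```python
import io


def read_gct(handle):
    """Parse a GCT v1.2 stream into (sample_names, {gene: [values]})."""
    lines = (line.rstrip("\r\n") for line in handle)
    version = next(lines).strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version: {version!r}")
    # Line 2 declares the number of data rows and sample columns.
    n_rows, n_cols = (int(x) for x in next(lines).split("\t")[:2])
    # Line 3 is the header: Name, Description, then one column per sample.
    samples = next(lines).split("\t")[2:]
    rows = {}
    for line in lines:
        if not line:
            continue
        name, _description, *values = line.split("\t")
        # Assumption: empty or "NA" cells mark missing values.
        rows[name] = [float("nan") if v in ("", "NA") else float(v) for v in values]
    if len(rows) != n_rows or len(samples) != n_cols:
        raise ValueError("declared dimensions do not match the data")
    return samples, rows


# Toy input with made-up gene and sample names.
toy = "\n".join([
    "#1.2",
    "2\t3",
    "Name\tDescription\tS1\tS2\tS3",
    "GeneA\tna\t1.0\t2.0\t3.0",
    "GeneB\tna\t-0.5\t0.0\t0.5",
]) + "\n"
samples, rows = read_gct(io.StringIO(toy))
print(samples, rows["GeneB"])
```

For the provided file, one would expect 363 sample columns (385 arrays minus the 22 excluded outliers), which is a quick sanity check after loading.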

Provided file 2 (phenotype labels): Perou_SigClust15_C3TagEx_vs_Others.cls

Class labels associated with each sample in the expression data.  For the pairwise Gene Set Enrichment Analysis in this recipe, label "0" denotes samples belonging to the C3TagEx cluster and "1" denotes all other samples.
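
To investigate another class in the same way, a two-class CLS file can be generated from per-sample class assignments.  The sketch below assumes the categorical CLS layout (a counts line, a "#" line with class names, then one label per sample) and, to our understanding of the format, that class names on the second line are matched to labels in order of first appearance.  The function name make_cls and the placeholder class names ClassB and ClassC are hypothetical.

```python
def make_cls(sample_classes, target="C3TagEx", other="Others"):
    """Build the text of a two-class CLS file: target -> 0, everything else -> 1."""
    labels = [0 if c == target else 1 for c in sample_classes]
    if len(set(labels)) != 2:
        raise ValueError("need at least one target sample and one other sample")
    name_for = {0: target, 1: other}
    # List the class names in the order their labels first appear,
    # so the name-to-label assignment is unambiguous.
    ordered = []
    for label in labels:
        if label not in ordered:
            ordered.append(label)
    header = f"{len(labels)} 2 1"
    names = "# " + " ".join(name_for[label] for label in ordered)
    return "\n".join([header, names, " ".join(str(label) for label in labels)]) + "\n"


# Placeholder assignments for four samples; ClassB and ClassC are made up.
print(make_cls(["C3TagEx", "ClassB", "C3TagEx", "ClassC"]))
```

The sample order passed in must match the column order of the expression matrix, since the CLS file carries no sample names.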

Provided file 3 (Geneset database): c2.kegg.c7.all.v5.1_Perou_MouseGenesets_0513_2016.gmt

The customized geneset database consists of three sets.

a) CP:KEGG: KEGG gene sets (subset of C2: curated gene sets) from MSigDB.

b) C7: immunologic signatures (ImmuneSigDB, PMID: 26795250) from MSigDB.  These genesets are used to explore transcriptional programs in the human and mouse immune systems.

c) Customized genesets extracted from Supplemental Table 1 in the original manuscript.  These genesets (with prefix "Perou") are gene-expression signatures of human and mouse breast cancer subtypes.
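
Because the three collections share one file, it can help to pull out only the subtype signatures.  The following is a minimal sketch, assuming the standard GMT layout (set name, description, then gene symbols, all tab-separated).  The helper names and the toy set names are hypothetical, and the prefix match is case-insensitive only as a precaution.

```python
def read_gmt(lines):
    """Parse GMT lines into {set_name: [genes]}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def select_by_prefix(gene_sets, prefix="Perou"):
    """Keep only the sets whose name starts with the given prefix."""
    wanted = prefix.upper()
    return {n: g for n, g in gene_sets.items() if n.upper().startswith(wanted)}


# Toy lines with made-up set and gene names.
toy_gmt = [
    "PEROU_EXAMPLE_SIGNATURE\tna\tGeneA\tGeneB\tGeneC\n",
    "KEGG_EXAMPLE_PATHWAY\tna\tGeneB\tGeneD\n",
]
print(select_by_prefix(read_gmt(toy_gmt)))
```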

We used ortholog gene assignments from Mouse Genome Informatics, retrieved from the MGI website on Jan 28, 2016.
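
The ortholog step can be sketched as a lookup from human symbols to mouse symbols.  This is only an illustration under stated assumptions: the two-column pair list below is a toy stand-in, not the layout of the MGI report, and the helper names are hypothetical.  Genes without a listed ortholog are dropped, which is one possible policy and may differ from what was done for the provided file.

```python
from collections import defaultdict


def build_ortholog_map(pairs):
    """Map each human symbol to the set of mouse symbols paired with it."""
    mapping = defaultdict(set)
    for human, mouse in pairs:
        mapping[human].add(mouse)
    return mapping


def to_mouse(human_genes, mapping):
    """Translate a human gene list to mouse symbols, keeping order and removing duplicates."""
    mouse_genes, seen = [], set()
    for gene in human_genes:
        # .get avoids creating empty entries for unmapped genes.
        for symbol in sorted(mapping.get(gene, ())):
            if symbol not in seen:
                seen.add(symbol)
                mouse_genes.append(symbol)
    return mouse_genes


# Toy ortholog pairs for illustration only.
pairs = [("TP53", "Trp53"), ("KRT5", "Krt5"), ("ESR1", "Esr1")]
ortholog_map = build_ortholog_map(pairs)
print(to_mouse(["KRT5", "TP53", "NOT_A_GENE"], ortholog_map))
```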


Recipe steps

  • GenePattern
    1. Locate the data in the GenomeSpace Public folder
    2. Run the Gene Set Enrichment Analysis module in GenePattern
    3. Visualize the results with the GSEA Leading Edge Viewer

  • Locate the data in the GenomeSpace Public->omf->Perou_BreastCancerModel_GenomeBiology2013 folder.

  • Select the three provided files, and click "Launch on File" under the GenePattern icon to launch GenePattern from GenomeSpace.

  • Send the selected files to the GSEA module.
    • Click Submit for the GSEA module

  • Modify the following parameters in the GSEA module.
    • Basic parameters → collapse dataset → false
    • Basic parameters → output file name → (or your choice)
    • Advanced parameters → Algorithmic → random seed → 12345678 (for reproducibility)
    • Reporting → plot graphs for the top sets of each phenotype → 100 (or your choice; default 20)

  • Click Run to run the job.
  • Run the GSEALeadingEdgeViewer module.

  • Open the Leading Edge Visualizer to examine the top genesets.
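
To make the leading-edge output easier to read, the sketch below shows the idea behind the enrichment score and the leading-edge subset: a weighted running sum over a ranked gene list, with the leading edge taken as the set members that contribute up to the peak.  It is a simplified illustration of the published GSEA statistic, not the module's implementation; it omits permutation testing and normalization, and the function name, gene names, and scores are made up.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Return (ES, leading_edge) for `ranked`, a list of (gene, score)
    pairs sorted from highest to lowest score."""
    members = set(gene_set)
    hit_weights = [abs(s) ** weight if g in members else 0.0 for g, s in ranked]
    n_hits = sum(1 for g, _ in ranked if g in members)
    n_miss = len(ranked) - n_hits
    total = sum(hit_weights)
    if n_hits == 0 or n_miss == 0 or total == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running, best, peak = 0.0, 0.0, -1
    for i, ((gene, _score), w) in enumerate(zip(ranked, hit_weights)):
        # Step up on set members (weighted by score), step down otherwise.
        running += w / total if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best, peak = running, i
    if best >= 0:
        # Positive ES: members ranked at or before the peak.
        leading = [g for g, _ in ranked[: peak + 1] if g in members]
    else:
        # Negative ES: members ranked after the peak.
        leading = [g for g, _ in ranked[peak + 1:] if g in members]
    return best, leading


# Toy ranked list with made-up genes and scores.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
          ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
es, leading = enrichment_score(ranked, {"g1", "g2", "g5"})
print(round(es, 3), leading)
```

In this toy case the running sum peaks after the second gene, so g1 and g2 form the leading edge while g5, which appears far down the list, does not.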

Results Interpretation

In the original publication (Table 2), the murine C3TagEx class is strongly associated with the human Basal-like subtype (in the UNC, TCGA, and combined datasets) and slightly associated with the TCGA Luminal B subtype.  The present analysis faithfully recapitulates this finding, suggesting that the two mouse models, C3 Tag and Wap Tag, resemble the human Basal-like subtype at the gene-expression level.

Part of Table 2: Gene set analysis of murine classes and human subtypes


Acknowledgement

We thank Professor Charles M Perou for providing the data matrices used in the original publication.  If you need to use these data for your own research, please cite the original publication below:

Pfefferle AD, Herschkowitz JI, Usary J, Harrell JC, Spike BT, Adams JR et al. Transcriptomic classification of genetically engineered mouse models of breast cancer identifies human subtype counterparts. Genome Biol 2013;14:R125
