How do I create a custom-generated gene set? Are there any commonalities between custom-generated gene sets and MSigDB hallmark gene sets?

This recipe provides a method for identifying and visualizing similarities between diverse gene sets relevant to a study. An example use of this recipe is a case where an investigator may want to compare two phenotypes, such as two types of cancer, to determine which gene sets may be similar between these phenotypes.


Background information: What is Gene Set Enrichment Analysis, and why should I use it?

Gene sets are lists of genes that share similar functions, transcriptional regulation, chromosomal positions, pathways, or other biological processes. It is possible to identify gene sets that are enriched or over-represented in a particular phenotype, such as a specific disease. Gene Set Enrichment Analysis (GSEA) is a computational method which determines whether an a priori defined set of genes shows statistically significant, concordant differences between two phenotypes. GSEA can be used with a custom gene set generated by the user, or with the annotated, standardized gene sets which are available in the Molecular Signatures Database (MSigDB) collection. Completing GSEA on a gene expression dataset will identify those gene sets which are significantly enriched in a particular phenotype. Comparing similarities between the top gene sets following GSEA can yield unique insights into the mechanisms associated with a specific phenotype, which cannot be observed using a single-gene analysis.
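To make the idea of enrichment concrete, the following is a minimal sketch of a running-sum enrichment score. It is a simplified, unweighted illustration and not the GSEA software itself: the function name `enrichment_score` and the toy gene names are assumptions made for this example, the input list is assumed to contain each gene once, and the significance testing that GSEA performs is omitted.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified illustration).

    ranked_genes: genes ordered from most up-regulated in phenotype A
                  to most up-regulated in phenotype B (each gene once).
    gene_set:     the a priori defined set of genes to test.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up when the gene belongs to the set, step down otherwise.
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        # Keep the largest deviation from zero seen so far.
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list; gene names are placeholders.
ranked = ["A", "B", "C", "D", "E", "F"]
top_score = enrichment_score(ranked, {"A", "B"})     # set concentrated at the top
bottom_score = enrichment_score(ranked, {"E", "F"})  # set concentrated at the bottom
```

A gene set whose members cluster at the top of the ranked list gives a positive score, and one whose members cluster at the bottom gives a negative score.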


Use case: Targeting MYCN in Neuroblastoma by BET Bromodomain Inhibition (Puissant et al., Cancer Discov. 2013).

This study analyzed gene expression data generated from primary neuroblastoma tumors of two genetic classes: tumors harboring MYCN amplification (“MYCN amplified”) and tumors without MYCN amplification (“MYCN non-amplified”). MYCN amplified neuroblastoma is exquisitely dependent on the bromodomain and extra-terminal (BET) family of proteins. As such, treatment of MYCN amplified cell lines or tumors with JQ1, a small-molecule inhibitor of BET proteins, leads to dramatic transcriptional changes and induces cell death.

A training set of gene expression data was analyzed using GenePattern, and custom gene sets were generated representing the MYCN amplified and MYCN non-amplified datasets. The custom-generated gene sets were then concatenated with the Hallmark gene set from MSigDB using tools in Galaxy. Subsequently, a test gene expression dataset of neuroblastoma cell lines treated with JQ1 (treatment) or DMSO (control) was used to rank this collection of gene sets using single-sample Gene Set Enrichment Analysis (ssGSEA).
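The concatenation step can be pictured with a short sketch. It assumes the gene sets are stored in the tab-delimited GMT format (one gene set per line: name, description, then the genes); the function names and the gene set and gene names below are placeholders invented for this example, and the recipe itself performs this step with tools in Galaxy rather than with custom code.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {name: (description, [genes])}."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError("each line needs a name, a description, and at least one gene")
        name, description, genes = fields[0], fields[1], fields[2:]
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


def concatenate_gmt(*texts):
    """Merge several GMT collections, refusing duplicate gene set names."""
    merged = {}
    for text in texts:
        for name, entry in parse_gmt(text).items():
            if name in merged:
                raise ValueError("duplicate gene set name: " + name)
            merged[name] = entry
    return merged


def to_gmt(gene_sets):
    """Serialize {name: (description, [genes])} back to GMT text."""
    return "\n".join(
        "\t".join([name, description, *genes])
        for name, (description, genes) in gene_sets.items()
    )


# Placeholder collections; names and genes are hypothetical.
custom = "CUSTOM_MYCN_AMPLIFIED\tcustom set\tGENE1\tGENE2\n"
hallmark = "HALLMARK_PLACEHOLDER\tplaceholder\tGENE2\tGENE3\n"
combined = concatenate_gmt(custom, hallmark)
combined_text = to_gmt(combined)
```

Rejecting duplicate names is a design choice of this sketch: it prevents a custom set from silently overwriting a set of the same name in the reference collection.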

This analysis reveals that MYCN-associated gene sets are enriched in JQ1-associated datasets, and suggests that JQ1 functions to suppress transcriptional programs mediated by MYCN amplification. The resulting similarities of the top-ranked gene sets are visualized using ConstellationMap, a module available in GenePattern. This helps to highlight similarities and overlaps between gene sets.

Are there specific transcriptional regulators whose expression and copy number correlate with the expression of genes associated with a specific phenotype?

This recipe provides a method for identifying transcriptional regulators of a gene set associated with a specific phenotype. An example use of this recipe is a case where an investigator may want to determine which transcriptional regulators exhibit unique expression phenotypes (e.g., up-regulation or down-regulation). This recipe uses a procedure called "Stepwise Linkage Analysis of Microarray Signatures", first described by Adler et al. (Nat Genetics 2006). This recipe does not use the SLAMS software tool.

In particular, the phenotype is the embryonic stem cell (ESC) state, which is common to ESCs and induced pluripotent stem cells (iPSCs), and is also found in a compendium of human cancers, such as breast cancer. In this recipe, we are interested in determining which genes transcriptionally regulate this 'stemness signature' of gene expression. This recipe recapitulates research by Wong et al. in Cell Stem Cell (2008), "Module map of stem cell genes guides creation of epithelial cancer stem cells". To do so, we use a procedure called Stepwise Linkage Analysis of Microarray Signatures (SLAMS), which is described by Adler et al. in Nature Genetics (2006), "Genetic regulators of large-scale transcriptional signatures in cancer". A summary description of the SLAMS procedure is listed below, and more information about SLAMS can be found in the review paper, "A SLAMS dunk for cancer regulators", by Kumar-Sinha and Chinnaiyan.

We use a gene expression dataset of primary human breast cancer tumor samples, with a complementary dataset of copy number variation data in array comparative genomic hybridization (aCGH) format, as described in Chin, K. et al., Cancer Cell, 2006. We use a set of stemness signature genes to separate breast cancer tumor samples into those which exhibit the stemness signature and those that do not by creating a module map in Genomica. A module map characterizes expression patterns in the gene expression dataset, providing information about sets of genes within the dataset.

We use the classified samples (i.e., stemness signature present vs. stemness signature absent) to normalize the copy number variation data in GenePattern. Next, we identify transcriptional regulators that correlate with the changes in the copy number dataset using a gene set collection from MSigDB, in Genomica. Finally, we identify transcriptional regulators whose amplification or deletion is correlated with up- or downregulation of gene expression. We consider these genes to be 'stemness regulators', i.e., genes which regulate the genes associated with the stemness signature.

Description of the Stepwise Linkage Analysis of Microarray Signatures (SLAMS) procedure (Kumar-Sinha and Chinnaiyan):

  1. Sort tumor samples into groups based on whether the stemness signature is present (“ON”) or absent (“OFF”).
  2. Compare the DNA copy number changes between the groups of tumor samples. Calculate the association between stemness expression and CNV datasets to identify amplifications/deletions associated with the stemness signature.
  3. Select genes which are potential candidate regulators of the stemness signature, based on coordinate gene amplification/deletion and gene expression upregulation/downregulation.
  4. Validate the candidate regulators by assessing their predictive ability in independent sets of tumor samples.
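
Steps 1 through 3 above can be sketched in a few lines of code. This is an illustrative simplification, not the published SLAMS implementation and not what Genomica or GenePattern compute: it compares group means instead of applying a statistical test, the function name `candidate_regulators` and the threshold `min_cn_diff` are hypothetical, and the sample and gene names are toy data. Step 4 (validation in independent samples) is not covered.

```python
from statistics import mean


def candidate_regulators(signature_on, copy_number, expression, min_cn_diff=0.3):
    """Return genes whose copy number and expression change in the same direction.

    signature_on: {sample: True if the signature is ON, False if OFF}
    copy_number:  {gene: {sample: copy number value}}
    expression:   {gene: {sample: expression value}}
    min_cn_diff:  hypothetical minimum difference in mean copy number
    """
    # Step 1: sort samples into ON and OFF groups.
    on = [s for s, flag in signature_on.items() if flag]
    off = [s for s, flag in signature_on.items() if not flag]
    if not on or not off:
        raise ValueError("both ON and OFF groups need at least one sample")

    candidates = {}
    for gene, cn in copy_number.items():
        if gene not in expression:
            continue
        expr = expression[gene]
        # Step 2: compare copy number between the two groups.
        cn_diff = mean(cn[s] for s in on) - mean(cn[s] for s in off)
        expr_diff = mean(expr[s] for s in on) - mean(expr[s] for s in off)
        # Step 3: keep genes with coordinate copy number and expression change.
        if abs(cn_diff) >= min_cn_diff and cn_diff * expr_diff > 0:
            candidates[gene] = (cn_diff, expr_diff)
    return candidates


# Toy data: S1 and S2 carry the signature, S3 and S4 do not.
signature = {"S1": True, "S2": True, "S3": False, "S4": False}
cnv = {
    "GENE_A": {"S1": 1.0, "S2": 0.8, "S3": 0.0, "S4": 0.1},  # amplified in ON
    "GENE_B": {"S1": 0.0, "S2": 0.1, "S3": 0.0, "S4": 0.1},  # no change
    "GENE_C": {"S1": 1.0, "S2": 1.0, "S3": 0.0, "S4": 0.0},  # amplified in ON
}
expr = {
    "GENE_A": {"S1": 9.0, "S2": 8.0, "S3": 5.0, "S4": 6.0},  # up in ON
    "GENE_B": {"S1": 7.0, "S2": 7.0, "S3": 7.0, "S4": 7.0},
    "GENE_C": {"S1": 5.0, "S2": 5.0, "S3": 8.0, "S4": 8.0},  # down in ON
}
hits = candidate_regulators(signature, cnv, expr)
```

In this toy example only GENE_A qualifies: GENE_B shows no copy number change, and GENE_C is amplified while its expression goes down.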

Which genes are differentially expressed between my two phenotypes, based on my RNA-seq data?

This recipe provides one method to identify and visualize gene expression in different diseases and during cell differentiation and development. By collecting ChIP-seq data, we can obtain genome-wide maps of transcription factor occupancies or histone modifications in treatment and control samples. Having located these regions, we can integrate ChIP-seq and RNA-seq data to better understand how these binding events regulate the expression of nearby genes. An example use case of this recipe is the study by Laurent et al., which observed how the binding of the Prep1 transcription factor influences gene regulation in mouse embryonic stem cells. The integration of both RNA-seq and ChIP-seq data allows a user to identify target genes that are directly regulated by transcription factor binding or any other epigenetic occupancy in the genome.
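
One simple way to picture the integration of the two data types is to intersect genes that lie near a binding peak with genes that are differentially expressed. The sketch below is only an illustration under stated assumptions: the function names, the 5,000 bp window, and all coordinates and gene names are hypothetical, and it is not the procedure used by the tools in this recipe.

```python
def genes_near_peaks(peaks, tss_by_gene, window=5000):
    """Return genes whose transcription start site lies within `window` bp of a peak.

    peaks:       list of (chromosome, start, end) tuples
    tss_by_gene: {gene: (chromosome, transcription start site position)}
    window:      hypothetical distance cutoff in base pairs
    """
    hits = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        for peak_chrom, start, end in peaks:
            if peak_chrom == chrom and start - window <= tss <= end + window:
                hits.add(gene)
                break  # one nearby peak is enough
    return hits


def direct_targets(peaks, tss_by_gene, differential_genes, window=5000):
    """Genes that are both near a peak and differentially expressed."""
    return genes_near_peaks(peaks, tss_by_gene, window) & set(differential_genes)


# Toy inputs; coordinates and gene names are placeholders.
peaks = [("chr1", 1000, 1200)]
tss = {"G1": ("chr1", 3000), "G2": ("chr1", 20000), "G3": ("chr2", 1100)}
targets = direct_targets(peaks, tss, {"G1", "G2"})
```

Here G1 is the only candidate target: G2 is differentially expressed but too far from the peak, and G3 sits on a different chromosome.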

What is Model-based Analysis of ChIP-seq (MACS)?

Model-based Analysis of ChIP-seq (MACS) is a computational algorithm that identifies genome-wide locations of transcription/chromatin factor binding or histone modifications. It is often preferred over other peak calling algorithms due to its consistency in reporting fewer false positives and its finer spatial resolution. First, it removes redundant reads to account for possible over-amplification of ChIP-DNA, which may affect peak-calling downstream. Then it shifts read positions based on the fragment size distribution to better represent the original ChIP-DNA fragment positions. Once read positions are adjusted, peaks are called by identifying regions that are significantly enriched relative to the genomic background. For experiments with controls, MACS empirically estimates the FDR for each peak, which can be used as a cutoff to filter enriched peaks: the treatment and control samples are swapped, and any enriched peaks found in the control sample are regarded as false positives.
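
The sample-swap idea behind the empirical FDR can be sketched as follows. This is a minimal illustration, not MACS source code: the function name `empirical_fdr` and the peak scores are assumptions made for this example. Peaks called with treatment against control are compared with peaks called after swapping the two samples, and the ratio of the two counts at a given score cutoff estimates the FDR.

```python
def empirical_fdr(treatment_scores, control_scores, cutoff):
    """Estimate the FDR at a score cutoff from a sample-swap comparison.

    treatment_scores: scores of peaks called in treatment vs. control
    control_scores:   scores of peaks called after swapping the samples
                      (these peaks are regarded as false positives)
    Returns None when no treatment peak passes the cutoff.
    """
    n_treatment = sum(1 for score in treatment_scores if score >= cutoff)
    n_control = sum(1 for score in control_scores if score >= cutoff)
    if n_treatment == 0:
        return None
    # Cap at 1.0 in case the swapped run yields more peaks than the original.
    return min(1.0, n_control / n_treatment)


# Hypothetical peak scores.
treatment = [50, 40, 30, 20, 10]
swapped = [25, 5]
fdr_at_20 = empirical_fdr(treatment, swapped, cutoff=20)
```

Raising the cutoff generally lowers the estimated FDR at the cost of reporting fewer peaks.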

Why differential expression analysis?

We assume that most genes are not expressed all the time, but rather are expressed in specific tissues, stages of development, or under certain conditions. Genes whose expression levels differ between conditions, such as cancerous tissue compared to normal tissue, are said to be differentially expressed.
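
As a rough illustration of what "differentially expressed" means numerically, the sketch below computes a log2 fold change between two conditions. It is a simplification under stated assumptions: the function name and the pseudocount are choices made for this example, the expression values are invented, and a real analysis would also use replicates, a statistical test, and multiple-testing correction.

```python
import math
from statistics import mean


def log2_fold_change(condition_a, condition_b, pseudocount=1.0):
    """Log2 ratio of mean expression in condition B over condition A.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detected expression in one condition.
    """
    return math.log2((mean(condition_b) + pseudocount) / (mean(condition_a) + pseudocount))


# Hypothetical expression values for one gene in two conditions.
normal = [3.0, 3.0]
tumor = [15.0, 15.0]
change = log2_fold_change(normal, tumor)
```

A positive value means higher expression in the second condition, a negative value means lower expression, and a value near zero means little difference.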

Use Case: ChIP-Seq and RNA-Seq Analyses Identify Components of the Wnt and Fgf Signaling Pathways as Prep1 Target Genes in Mouse Embryonic Stem Cells (Laurent et al., PLoS ONE, 2015)

The sample dataset, Series GSE6328, used for this recipe is from NCBI's GEO. We identify the interplay between epigenetics and transcriptomics in mouse embryonic stem cells by observing how the binding of the transcription factor Prep1 influences gene expression. Prep1 is predominantly known for its contribution to embryonic development. By comparing genome-wide maps of mouse embryonic stem cells with Prep1 binding to those without, we can identify potential target genes that are differentially regulated by these binding events.
