What subnetworks of differentially expressed genes are enriched in my samples? What biological functions are they related to?

This recipe provides a method for identifying differentially expressed genes between two phenotypes, such as tumor and normal, to find subnetworks of interacting proteins and determine their functional annotations. An example use of this recipe is a case where an investigator may want to compare two phenotypes to determine which gene networks are similar between phenotypes, and to determine how functional annotation changes between phenotypes.

 

In particular, this recipe makes use of several GenePattern modules to identify differentially regulated genes, then uses several Cytoscape plugins to identify potential interactions between gene products, and to visualize the resulting network.

Why differential expression analysis? We assume that most genes are not expressed all the time, but rather are expressed in specific tissues, stages of development, or under certain conditions. Genes whose expression levels differ between one condition, such as cancer tissue, and normal conditions are said to be differentially expressed. To identify which genes change in response to specific conditions (e.g. cancer), we must filter or process the dataset to remove genes which are not informative.
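
As a rough illustration of the filtering idea, the sketch below scores each gene with a Welch t-statistic and a log2 fold change, then keeps genes that pass both thresholds. It is a minimal sketch and not the procedure used by any GenePattern module: the gene names, expression values, thresholds, and function names are all hypothetical, and the values are assumed to be log2-transformed already.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two lists of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def differential_genes(expr, tumor_idx, normal_idx,
                       min_abs_t=3.0, min_abs_log2fc=1.0):
    """Return (gene, log2 fold change, t) for genes passing both cutoffs.

    expr maps a gene name to a list of log2 expression values, one per sample.
    The cutoffs are illustrative assumptions, not recommended defaults.
    """
    hits = []
    for gene, values in expr.items():
        tumor = [values[i] for i in tumor_idx]
        normal = [values[i] for i in normal_idx]
        log2fc = mean(tumor) - mean(normal)  # difference of log2 means
        t = welch_t(tumor, normal)
        if abs(t) >= min_abs_t and abs(log2fc) >= min_abs_log2fc:
            hits.append((gene, round(log2fc, 2), round(t, 2)))
    # Strongest evidence first
    return sorted(hits, key=lambda h: -abs(h[2]))


# Hypothetical data: samples 0-2 are tumor, samples 3-5 are normal.
expr = {
    "GENE_A": [8.1, 8.3, 7.9, 5.0, 5.2, 4.9],
    "GENE_B": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],
    "GENE_C": [3.0, 3.2, 2.9, 6.1, 6.0, 6.3],
}
hits = differential_genes(expr, tumor_idx=[0, 1, 2], normal_idx=[3, 4, 5])
for gene, log2fc, t in hits:
    print(gene, log2fc, t)
```

In this toy dataset GENE_B is uninformative (its level is the same in both groups) and is filtered out, while GENE_A and GENE_C are retained. A real analysis would also correct for multiple hypothesis testing.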

Why protein interaction network analysis? Gene expression analysis results in a list of differentially expressed genes, but it does not explain whether these genes are connected biologically in a pathway or network. To better understand the underlying biology that drives changes in gene expression, we can perform network analysis to determine whether gene products (e.g. proteins) are reported to interact. To identify potential networks or pathways, we search for highly interconnected subnetworks within a large interaction network.
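
One simple way to picture "highly interconnected subnetworks" is a k-core: the part of the network that remains after repeatedly removing nodes with fewer than k interaction partners. The sketch below is an illustration under that assumption only. The Cytoscape plugins used in the recipe may apply different algorithms, and the protein names and edges are hypothetical.

```python
def k_core(edges, k):
    """Return the adjacency map of the k-core of an undirected network.

    edges is an iterable of (protein_a, protein_b) pairs. Nodes with fewer
    than k neighbours are removed repeatedly until none remain.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            # Detach the node from each remaining neighbour, then drop it
            for nbr in adj.pop(node):
                adj[nbr].discard(node)
            changed = True
    return adj


# Hypothetical interactions: P1-P4 form a dense cluster, P5 and P6 a tail.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P1", "P4"), ("P2", "P4"), ("P3", "P4"),
    ("P4", "P5"), ("P5", "P6"),
]
core = k_core(edges, k=3)
print(sorted(core))
```

Here the sparsely connected tail (P5, P6) is pruned and only the densely connected cluster survives.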

Which genes are differentially expressed in my microarray data? Are these genes enriched for certain biological pathways?

This recipe provides an outline of one method to identify known biological functions for genes that are differentially expressed between two conditions or phenotypes, using microarray data. An example use of this recipe is a case where an investigator may want to determine if a specific cancer phenotype is associated with expression of certain pathways.

 

Given a set of differentially expressed genes, the goal is to infer which biological functions (for example, Gene Ontology biological processes) are overrepresented in the set of reference genes found to be differentially expressed. In particular, this recipe uses a gene expression dataset which has two conditions: normal and mild hyperthermia. Then, GenePattern is used to identify differentially expressed genes, and finally MSigDB is used to identify biological functions and pathways that are enriched in the gene set.
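
Overrepresentation of a function in a gene list is commonly assessed with a hypergeometric test: given how many genes in the whole background carry an annotation, how surprising is the number seen in the list? The sketch below shows that calculation under stated assumptions. The function name and all counts are hypothetical, and it is not presented as the exact computation performed by MSigDB.

```python
from math import comb


def hypergeom_pvalue(universe, set_size, list_size, overlap):
    """Probability of observing at least `overlap` annotated genes by chance.

    universe  : number of genes in the background
    set_size  : background genes annotated with the function
    list_size : number of differentially expressed genes
    overlap   : annotated genes found in the differentially expressed list
    """
    total = comb(universe, list_size)
    upper = min(set_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 of 100 differentially expressed genes fall in a
# 200-gene set, against a 20,000-gene background (about 1 expected by chance).
p = hypergeom_pvalue(universe=20000, set_size=200, list_size=100, overlap=10)
print(f"p = {p:.3g}")
```

When many gene sets are tested, the resulting p-values need a multiple-testing correction before they are interpreted.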

Why differential expression analysis? We assume that most genes are not expressed all the time, but rather are expressed in specific tissues, stages of development, or under certain conditions. Genes whose expression levels differ between one condition, such as cancerous tissue, and normal conditions are said to be differentially expressed. To identify which genes change in response to specific conditions (e.g. cancer), we must filter or process the dataset to remove genes which are not informative.

Why perform functional annotation? Many analyses end with the retrieval of a gene list, e.g. gene expression analysis identifies a list of genes which are differentially expressed when comparing multiple conditions. However, a researcher often has additional questions about the function or relatedness of genes in a gene list: Are the genes a part of the same pathway? Do the gene products interact physically? Do the gene products localize to a specific part of the cell? Are the genes only expressed during a certain stage of development? These questions, and others like them, can be answered by performing functional annotation on gene lists, to better understand the underlying connections between genes.

Which genes lie in my copy number variation regions? Are there any sets of co-regulated genes in these aberrant regions?

This recipe provides a method for identifying and visualizing a network of co-regulated genes that are associated with aberrant regions identified by single nucleotide polymorphism (SNP) arrays. An example use of this recipe is a case where an investigator may want to find which genes are located in regions that exhibit significant changes (e.g. amplification or deletion) in cancer cells.

 

This recipe provides one method for identifying and visualizing aberrant regions in Diffuse Large B-Cell Lymphoma (DLBCL) cancer cells. This recipe uses copy-number variation (CNV) data from SNP arrays, and evaluates the expression of aberrant regions using a microarray dataset. Regions that are significantly changed (e.g., amplified or deleted) in cancer cells are defined by the GISTIC algorithm. In particular, this recipe makes use of several Galaxy tools to find the overlap between the aberrant regions and reference genes, and uses GenePattern to process the microarray dataset. Genomica is used to find module networks of co-regulated genes associated with these aberrant regions. A module network is a model which identifies regulatory modules from gene expression data, especially modules of co-regulated genes and their regulators. The module also identifies the conditions under which the regulation can occur.
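
Finding the overlap between aberrant regions and reference genes is an interval-intersection problem. The sketch below shows the core test in plain Python, as an illustration rather than a description of the Galaxy tools themselves. All coordinates, labels, and gene names are hypothetical, and intervals are assumed to be half-open (start inclusive, end exclusive) on the same genome build.

```python
def genes_in_regions(regions, genes):
    """Map each aberrant region label to the genes that overlap it.

    regions : list of (chrom, start, end, label)
    genes   : list of (name, chrom, start, end)
    Two half-open intervals overlap when each starts before the other ends.
    """
    hits = {}
    for name, chrom, start, end in genes:
        for r_chrom, r_start, r_end, label in regions:
            if chrom == r_chrom and start < r_end and r_start < end:
                hits.setdefault(label, []).append(name)
    return hits


# Hypothetical aberrant regions (e.g. one amplification, one deletion)
regions = [
    ("chr1", 1000, 5000, "amp_1"),
    ("chr2", 200, 900, "del_1"),
]
# Hypothetical reference gene coordinates
genes = [
    ("GENE_A", "chr1", 4500, 5500),  # straddles the end of amp_1
    ("GENE_B", "chr1", 5000, 6000),  # starts exactly where amp_1 ends
    ("GENE_C", "chr2", 100, 250),    # overlaps the start of del_1
    ("GENE_D", "chr3", 100, 200),    # on a chromosome with no regions
]
overlaps = genes_in_regions(regions, genes)
print(overlaps)
```

This nested loop is fine for small inputs; genome-scale data is usually handled with sorted intervals or an interval tree.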

Why analyze copy number variation regions? Copy number variations (CNVs) are large alterations to genomes, such as duplication or deletion of large segments of a chromosome. These variations in the genome have been associated with different conditions, such as cancer. In this recipe, we explore the scenario in which CNVs are elevated in a cancer cell line, and our goal is to determine the function of these duplicated genes.

Do phenotypically different expression datasets share a common signature? Can the signature distinguish phenotypes in an independent dataset?

This recipe provides one method for identifying a consensus gene signature from a training set of several phenotypically distinct gene expression datasets. The recipe then validates the ability of the consensus signature to accurately distinguish phenotypes by using an independent test gene expression dataset. An example use case of this recipe is when an investigator may want to develop a gene expression signature to predict a specific phenotype, such as cancer or another disease.

 

Background information: What is a consensus gene expression signature?

A gene expression signature is the pattern of expression in a specific group of genes, usually ones that are related by function, position or other biological process. A consensus gene signature is an expression pattern for a specific group of genes, which is shared among different samples or across different phenotypes. For example, a group of genes regulating immune response could be similarly up-regulated during many different, unrelated infections. There are several types of consensus signatures; those that can be derived from gene expression data are called transcriptional consensus signatures. Consensus signatures can be created by overlapping individual gene signatures derived from multiple datasets. Compared to individual gene expression signatures, consensus signatures may be more accurate at distinguishing different phenotypes, such as diseased vs. normal samples.
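
Overlapping individual signatures can be pictured as a set intersection. The sketch below keeps only genes that appear in every signature and change in the same direction in each. Requiring a matching direction is an assumption made for illustration, and the signature contents and gene names are hypothetical.

```python
def consensus_signature(*signatures):
    """Return genes shared by all signatures with a consistent direction.

    Each signature maps a gene name to "up" or "down".
    """
    if not signatures:
        return {}
    first = signatures[0]
    shared = set(first)
    for sig in signatures[1:]:
        shared &= set(sig)
    return {
        gene: first[gene]
        for gene in sorted(shared)
        if all(sig[gene] == first[gene] for sig in signatures)
    }


# Hypothetical signatures derived from two training datasets
signature_1 = {"GENE_A": "down", "GENE_B": "down", "GENE_C": "up", "GENE_D": "up"}
signature_2 = {"GENE_A": "down", "GENE_C": "down", "GENE_D": "up", "GENE_E": "up"}

consensus = consensus_signature(signature_1, signature_2)
print(consensus)
```

GENE_C is present in both signatures but changes in opposite directions, so it is excluded from the consensus.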

 

Use case: Targeting MYCN in Neuroblastoma by BET Bromodomain Inhibition (Puissant et al., Cancer Discov. 2013).

This study analyzed gene expression data generated from primary neuroblastoma tumors of two genetic classes: tumors harboring MYCN amplification (“MYCN amplified”) and tumors without MYCN amplification (“MYCN non-amplified”). MYCN amplified neuroblastoma is exquisitely dependent on the bromodomain and extra-terminal (BET) family of proteins. As such, treatment of MYCN amplified cell lines or tumors with JQ1, a small-molecule inhibitor of BET proteins, leads to dramatic transcriptional changes and induces cell death.

To identify a consensus signature to predict sensitivity to JQ1 treatment, two training datasets and one test dataset were used. The training datasets comprised acute myeloid leukemia (AML) and multiple myeloma (MM) cell lines, which had been treated with either DMSO (control) or JQ1 (treatment). The test dataset included MYCN amplified and MYCN non-amplified neuroblastoma primary tumor samples. GenePattern was used to analyze the AML and MM cell lines; for each dataset, a gene expression signature was derived to identify JQ1 response in the cell line. Using Galaxy, the two signatures were then overlapped to determine the consensus signature between the two phenotypes.

GenePattern was used to validate the ability of this JQ1-associated consensus signature to differentiate between phenotypes, by using the signature to hierarchically cluster the test dataset (neuroblastoma). Since the MYCN amplified and MYCN non-amplified neuroblastoma samples should have differing expression profiles, it was hypothesized that the consensus signature would be able to separate the samples by phenotype. Indeed, the consensus signature was able to cluster the MYCN-amplified and MYCN-nonamplified samples separately, revealing that the consensus signature accurately distinguishes the sensitivity-to-JQ1 phenotype.

How do I create a custom-generated gene set? Are there any commonalities between custom-generated gene sets, and MSigDB hallmark gene sets?

This recipe provides a method for identifying and visualizing similarities between diverse gene sets relevant to a study. An example use of this recipe is a case where an investigator may want to compare two phenotypes, such as two types of cancer, to determine which gene sets may be similar between these phenotypes.

 

Background information: What is Gene Set Enrichment Analysis, and why should I use it?

Gene sets are lists of genes that share similar functions, transcriptional regulation, chromosomal positions, pathways, or other biological processes. It is possible to identify gene sets that are enriched or over-represented in a particular phenotype, such as a specific disease. Gene Set Enrichment Analysis (GSEA) is a computational method which determines whether an a priori defined set of genes shows statistically significant, concordant differences between two phenotypes. GSEA can be used with a custom gene set generated by the user, or with the annotated, standardized gene sets which are available in the Molecular Signatures Database (MSigDB) collection. Completing GSEA on a gene expression dataset will identify those gene sets which are significantly enriched in a particular phenotype. Comparing similarities between the top gene sets following GSEA can yield unique insights into the mechanisms associated with a specific phenotype, which cannot be observed using a single-gene analysis.

 

Use case: Targeting MYCN in Neuroblastoma by BET Bromodomain Inhibition (Puissant et al., Cancer Discov. 2013).

This study analyzed gene expression data generated from primary neuroblastoma tumors of two genetic classes: tumors harboring MYCN amplification (“MYCN amplified”) and tumors without MYCN amplification (“MYCN non-amplified”). MYCN amplified neuroblastoma is exquisitely dependent on the bromodomain and extra-terminal (BET) family of proteins. As such, treatment of MYCN amplified cell lines or tumors with JQ1, a small-molecule inhibitor of BET proteins, leads to dramatic transcriptional changes and induces cell death.

A training set of gene expression data was analyzed using GenePattern, and custom gene sets were generated representing the MYCN amplified and MYCN non-amplified datasets. The custom-generated gene sets were then concatenated with the Hallmark gene set from MSigDB using tools in Galaxy. Subsequently, a test gene expression dataset of neuroblastoma cell lines treated with JQ1 (treatment) or DMSO (control) was used to rank this collection of gene sets using single-sample Gene Set Enrichment Analysis (ssGSEA).
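
Gene set collections are typically exchanged in the tab-delimited GMT format, where each line holds a set name, a description, and then the member genes. Assuming both the custom gene sets and the Hallmark collection are in that format, concatenation amounts to joining the lines while guarding against duplicate set names. The sketch below illustrates this with hypothetical set names and genes; it does not reproduce the Galaxy tools used in the recipe.

```python
def parse_gmt(text):
    """Parse GMT text into {set_name: (description, [genes])}."""
    sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"malformed GMT line: {line!r}")
        name, description, *genes = fields
        sets[name] = (description, genes)
    return sets


def concat_gmt(*texts):
    """Concatenate GMT collections, refusing duplicate set names."""
    merged = {}
    for text in texts:
        for name, entry in parse_gmt(text).items():
            if name in merged:
                raise ValueError(f"duplicate gene set name: {name}")
            merged[name] = entry
    return "\n".join(
        "\t".join([name, description, *genes])
        for name, (description, genes) in merged.items()
    )


# Hypothetical collections; real files would be read from disk.
custom = "MYCN_AMP_UP\tcustom set\tG1\tG2\n"
hallmark = "HALLMARK_EXAMPLE\tplaceholder\tG2\tG3\tG4\n"

merged = concat_gmt(custom, hallmark)
print(merged)
```

Rejecting duplicate names avoids silently overwriting a curated set with a custom one of the same name.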

This analysis reveals that MYCN-associated gene sets are enriched in JQ1-associated datasets, and suggests that JQ1 functions to suppress transcriptional programs mediated by MYCN amplification. The resulting similarities of the top-ranked gene sets are visualized using ConstellationMap, a module available in GenePattern. This helps to highlight similarities and overlaps between gene sets.
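
One common way to quantify the overlap between two gene sets is the Jaccard index: the number of shared genes divided by the number of genes in either set. The sketch below computes it for every pair of sets. It is offered as an illustration of gene set similarity in general; the exact measure used by ConstellationMap is not stated here, and the set names and genes are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlaps(gene_sets, min_similarity=0.0):
    """Return (similarity, set_x, set_y) for pairs above the threshold."""
    pairs = []
    for x, y in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[x], gene_sets[y])
        if score > min_similarity:
            pairs.append((score, x, y))
    return sorted(pairs, reverse=True)


# Hypothetical top-ranked gene sets
gene_sets = {
    "SET_A": ["G1", "G2", "G3", "G4"],
    "SET_B": ["G3", "G4", "G5"],
    "SET_C": ["G9"],
}
overlapping = pairwise_overlaps(gene_sets)
print(overlapping)
```

SET_A and SET_B share two of five distinct genes, while SET_C overlaps with neither and is therefore absent from the result.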

Does my gene expression dataset contain a module network of regulatory genes? Does the network have any special features?

This recipe provides one method for creating and visualizing a module network of regulatory genes. An example use of this recipe is a case where an investigator may want to evaluate an expression dataset to find regulatory genes such as transcription factors, and then determine if they are connected in a network.

 

In particular, the regulatory genes of interest are genes which regulate other genes associated with an embryonic stem cell (ESC) state. This 'stemness signature' is a feature common to ESCs and induced pluripotent stem cells (iPSCs), and is also found in a compendium of human cancers, such as breast cancer. This recipe recapitulates research by Wong et al., in Cell Stem Cell (2008), "Module map of stem cell genes guides creation of epithelial cancer stem cells."

We use a gene expression dataset of primary human breast cancer tumor samples (described in Chin, K. et al, Cancer Cell, 2006), and create a module network by projecting a set of stemness regulators onto the gene expression dataset, using Genomica. A module network is a model which identifies regulatory modules from gene expression data, especially modules of co-regulated genes and their regulators. The module also identifies the conditions under which the regulation can occur.

After obtaining the module network, we visualize it using Cytoscape. Since the network is very large, we then filter it to just a subnetwork of stemness regulators and their connections, again using Cytoscape. This provides us with a visual representation of the stemness regulators as they appear projected onto a breast cancer tumor dataset.
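
Conceptually, filtering a large network down to the regulators and their connections means keeping only the edges that touch at least one regulator. The sketch below shows that selection on an edge list. It is a plain-Python illustration rather than a Cytoscape procedure, and the node names are hypothetical.

```python
def regulator_subnetwork(edges, regulators):
    """Keep edges in which at least one endpoint is a regulator of interest."""
    regulators = set(regulators)
    return [(a, b) for a, b in edges if a in regulators or b in regulators]


# Hypothetical module network edges (regulator -> regulated gene)
edges = [
    ("REG_1", "GENE_A"),
    ("REG_1", "REG_2"),
    ("REG_2", "GENE_B"),
    ("GENE_C", "GENE_D"),  # no regulator involved, so it is dropped
]
subnetwork = regulator_subnetwork(edges, regulators=["REG_1", "REG_2"])
print(subnetwork)
```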

 

Are there specific transcriptional regulators, whose expression and copy number correlate with the expression of genes associated with a specific phenotype?

This recipe provides a method for identifying transcriptional regulators of a gene set associated with a specific phenotype. An example use of this recipe is a case where an investigator may want to determine which transcriptional regulators exhibit unique expression phenotypes (e.g. up-regulation or down-regulation). This recipe uses a procedure called "Stepwise Linkage Analysis of Microarray Signatures", first described by Adler et al. (Nat Genetics 2006). This recipe does not use the SLAMS software tool.

In particular, the phenotype is the embryonic stem cell (ESC) state, which is common to ESCs and induced pluripotent stem cells (iPSCs), and is also found in a compendium of human cancers, such as breast cancer. In this recipe, we are interested in determining which genes transcriptionally regulate this 'stemness signature' of gene expression. This recipe recapitulates research by Wong et al., in Cell Stem Cell (2008), "Module map of stem cell genes guides creation of epithelial cancer stem cells". To recapitulate this research, we will use a procedure called Stepwise Linkage Analysis of Microarray Signatures (SLAMS), which is described by Adler et al. in Nature Genetics (2006), "Genetic regulators of large-scale transcriptional signatures in cancer". A summary description of the SLAMS procedure is listed below, and more information about SLAMS can be found in the review paper, "A SLAMS dunk for cancer regulators", by Kumar-Sinha and Chinnaiyan.

We use a gene expression dataset of primary human breast cancer tumor samples, with a complementary dataset of copy number variation data in array comparative genomic hybridization (aCGH) format, as described in Chin, K. et al, Cancer Cell, 2006. We use a set of stemness signature genes to separate breast cancer tumor samples into those which exhibit the stemness signature and those that do not by creating a module map in Genomica. A module map characterizes expression patterns in the dataset, providing information about sets of genes within it.

We use the classified samples (e.g. stemness signature present vs. stemness signature absent) to normalize the copy number variation data in GenePattern. Next, we identify transcriptional regulators that correlate with the changes in the copy number dataset using a gene set collection from MSigDB, in Genomica. Finally, we identify transcriptional regulators whose amplification or deletion is correlated with up- or downregulation of gene expression. We consider these genes to be 'stemness regulators', i.e. genes which regulate the genes associated with the stemness signature.

Description of the Stepwise Linkage Analysis of Microarray Signatures (SLAMS) procedure (Kumar-Sinha and Chinnaiyan):

  1. Sort tumor samples into groups based on whether the stemness signature is present (“ON”) or absent (“OFF”).
  2. Compare the DNA copy number changes between the groups of tumor samples. Calculate the association between stemness expression and CNV datasets to identify amplifications/deletions associated with the stemness signature.
  3. Select genes which are potential candidate regulators of the stemness signature, based on coordinate gene amplification/deletion and gene expression upregulation/downregulation.
  4. Validate the candidate regulators by assessing their predictive ability in independent sets of tumor samples.
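
Steps 1 to 3 above can be sketched in a few lines, assuming the samples have already been labelled ON or OFF. This is a simplified illustration only: it compares group means instead of applying the statistical tests used in the published procedure, step 4 (independent validation) is not shown, and the gene names, values, thresholds, and function names are hypothetical.

```python
from statistics import mean


def group_difference(values, on_idx, off_idx):
    """Mean of the signature-ON samples minus mean of the OFF samples."""
    return mean(values[i] for i in on_idx) - mean(values[i] for i in off_idx)


def candidate_regulators(copy_number, expression, on_idx, off_idx,
                         min_cn_diff=0.3, min_expr_diff=1.0):
    """Select genes with coordinate copy number and expression changes.

    copy_number and expression map a gene name to per-sample values
    (assumed to be log-scale). Thresholds are illustrative assumptions.
    """
    candidates = []
    for gene in sorted(copy_number.keys() & expression.keys()):
        cn = group_difference(copy_number[gene], on_idx, off_idx)
        ex = group_difference(expression[gene], on_idx, off_idx)
        coordinate = cn * ex > 0  # same sign: amplified/up or deleted/down
        if coordinate and abs(cn) >= min_cn_diff and abs(ex) >= min_expr_diff:
            label = "amplified/up" if cn > 0 else "deleted/down"
            candidates.append((gene, label))
    return candidates


# Step 1 (assumed done): samples 0-2 are signature ON, samples 3-5 are OFF.
on_idx, off_idx = [0, 1, 2], [3, 4, 5]

# Hypothetical copy number and expression values per gene
copy_number = {
    "REG_1": [0.6, 0.5, 0.7, 0.0, 0.1, -0.1],
    "REG_2": [-0.5, -0.6, -0.4, 0.0, 0.1, -0.1],
    "REG_3": [0.5, 0.6, 0.4, 0.0, 0.0, 0.0],
}
expression = {
    "REG_1": [9.0, 9.2, 8.8, 7.0, 7.1, 6.9],
    "REG_2": [4.0, 4.1, 3.9, 6.0, 6.1, 5.9],
    "REG_3": [5.0, 5.1, 4.9, 5.0, 5.1, 4.9],
}

# Steps 2 and 3: compare groups and keep coordinately changed genes.
candidates = candidate_regulators(copy_number, expression, on_idx, off_idx)
print(candidates)
```

REG_3 is amplified in the ON group but its expression does not change, so it is not selected as a candidate regulator.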
