Identify and validate a consensus signature using gene expression data
Added by GenomeSpaceTeam on 2015.04.22
Last updated over 3 years ago.
This recipe provides one method for identifying a consensus gene signature from gene expression data, and validating the ability of the consensus signature to accurately distinguish phenotypes by using a test gene expression dataset. In particular, this recipe examines JQ1, a small-molecule inhibitor which binds and selectively inhibits the bromodomain and extra-terminal (BET) family of bromodomain proteins. This is thought to inhibit MYCN amplification in cancers such as neuroblastoma.
First, this recipe identifies a consensus gene signature of JQ1 by comparing the genes that are differentially expressed in JQ1-treated acute myeloid leukemia (AML) cell lines with those in JQ1-treated multiple myeloma (MM) cell lines. Then, this recipe validates the ability of this consensus signature to distinguish between MYCN-amplified and MYCN-nonamplified neuroblastoma by projecting the consensus signature onto a test gene expression dataset.
This recipe recapitulates some of the results found by Puissant et al. in "Targeting MYCN in Neuroblastoma by BET Bromodomain Inhibition" (Cancer Discov. 2013).
To complete this recipe, we will need several gene expression datasets, which will be obtained from InSilico DB; thus, we do not need any additional files from the GenomeSpace Public Folder.
In this step, we use InSilico DB to retrieve three gene expression datasets. GenomeSpace will automatically convert the expression dataset files to a form that is readable by GenePattern. If you are using your own data, make sure that your input includes a GCT file and a CLS file.
In this example, we are using datasets that have already been normalized by the original authors, who applied the Robust Multiarray Averaging (RMA) method of microarray normalization and summarization. We use the following datasets:
In the Public tab, search for "GSE29799".
Click the Analyze button for the entry with 'HuGene-1_0-st-v1' under Technology. Click the following buttons in the pop-up:
Normalization options: Original normalization
Gene/probe options: genes
Click Open in GenomeSpace to download the files to GenomeSpace.
The output files should appear in your GenomeSpace home directory.
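For readers bringing their own data, the following is a minimal sketch of what the GCT and CLS inputs look like and how they can be sanity-checked before upload. It assumes the GCT version 1.2 layout and the categorical CLS layout; the sample names, gene names, and values are made up for illustration and do not come from the datasets in this recipe.

```python
import io

def read_gct(handle):
    """Parse a GCT v1.2 file into (sample_names, {row_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]  # the first two columns are Name and Description
    rows = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows[fields[0]] = [float(v) for v in fields[2:]]
    if len(samples) != n_cols or len(rows) != n_rows:
        raise ValueError("dimensions do not match the GCT header")
    return samples, rows

def read_cls(handle):
    """Parse a categorical CLS file into (class_names, one label per sample)."""
    n_samples, n_classes, _ = (int(x) for x in handle.readline().split())
    class_names = handle.readline().lstrip("#").split()
    labels = handle.readline().split()
    if len(class_names) != n_classes or len(labels) != n_samples:
        raise ValueError("counts do not match the CLS header")
    return class_names, labels

# Hypothetical toy inputs (not real data from GSE29799).
GCT_TEXT = (
    "#1.2\n2\t4\n"
    "Name\tDescription\tctl_1\tctl_2\tjq1_1\tjq1_2\n"
    "GENE_A\tna\t7.1\t7.3\t5.0\t5.2\n"
    "GENE_B\tna\t4.0\t4.1\t4.2\t3.9\n"
)
CLS_TEXT = "4 2 1\n# control JQ1\n0 0 1 1\n"

samples, rows = read_gct(io.StringIO(GCT_TEXT))
class_names, labels = read_cls(io.StringIO(CLS_TEXT))
# Every sample column in the GCT file needs exactly one label in the CLS file.
assert len(samples) == len(labels)
```

The final check mirrors the main requirement when pairing the two files: the CLS file must carry one label for each sample column in the GCT file, in the same order.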
We will use the ComparativeMarkerSelection module to identify genes which are differentially expressed and can distinguish between two phenotypes (e.g. normal vs. JQ1-treated), separately for the acute myeloid leukemia (AML) and multiple myeloma (MM) datasets. This module uses the GCT file and the CLS file.
NOTE: If you have not yet associated your GenomeSpace account with your GenePattern account, you will be asked to do so. If you do not yet have a GenePattern account, you can automatically generate a new account that will be associated with your GenomeSpace account.
Go to the Modules tab, and search for "ComparativeMarkerSelection". Once the module is loaded, change the following parameters:
input file: load the AML GCT file, e.g., GSE29799GPL6244_RNA_ORIGINALGENE_30916.gct. To do this, use the GenomeSpace tab to navigate to the GSE29799_AML directory containing the AML GCT file, then drag the file to the input box.
cls file: load the AML treatment CLS file, e.g., treatment.cls, also located in the GSE29799_AML directory.
log transformed data: yes
Run ComparativeMarkerSelection on the AML dataset.
Save to GenomeSpace.
input file: load the MM GCT file, e.g., GSE31365GPL6244_RNA_ORIGINALGENE_30917.gct. To do this, use the GenomeSpace tab to navigate to the GSE31365_MM directory containing the MM GCT file, then drag the file to the input box.
cls file: load the MM treatment CLS file, e.g., treatment.cls, also located in the GSE31365_MM directory.
log transformed data: yes
Run ComparativeMarkerSelection on the MM dataset.
Save to GenomeSpace.
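The results of this step include a false discovery rate (FDR) for each gene, which is used as a cutoff later in the recipe. As a rough illustration of what such a value represents, here is a minimal sketch of the Benjamini-Hochberg adjustment applied to a list of per-gene p-values. It assumes that the FDR of interest is the Benjamini-Hochberg one; it is not the module's own implementation, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values from a two-class comparison.
p_values = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```

With these toy values only the first gene passes an FDR cutoff of 0.05, even though three of the raw p-values are below 0.05, which is why the adjusted value rather than the raw p-value is used for filtering.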
We will load the two sets of differentially expressed genes from the AML and MM datasets into Galaxy. Then, we will use a pre-built GenomeSpace workflow to process the datasets, filtering and removing features that do not pass certain cutoffs. Finally, we will create a consensus signature and send a list of gene symbols back to GenomeSpace for additional analysis.
NOTE: If you have not yet associated your GenomeSpace account with your Galaxy account, you will be asked to do so. If you do not yet have a Galaxy account, you can automatically generate a new account that will be associated with your GenomeSpace account.
Get Data > GenomeSpace import
Send to Galaxy.
New type: tabular
Save. Ignore any warnings which may pop up.
We will use a pre-built GenomeSpace workflow to identify the consensus gene signature. This pre-built GenomeSpace workflow uses several steps to determine the overlap between the AML and MM datasets. First, we filter the AML and MM datasets to the top genes using the following cutoffs: (1) >= 1.5 differential expression; and (2) FDR < 0.05 as calculated by
Step 1: Input Dataset:
Step 2: Input Dataset:
Choose Target Directory: choose a directory to save the file to, e.g. your home directory.
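The filtering and overlap logic described above can be sketched as follows. This is only an illustration under stated assumptions, not the Galaxy workflow itself: it treats the ">= 1.5 differential expression" cutoff as a fold-change threshold, ignores the direction of change, and uses hypothetical gene names and values.

```python
def top_genes(results, min_fold_change=1.5, max_fdr=0.05):
    """Keep genes that pass both cutoffs.

    `results` maps gene symbol -> (fold_change, fdr). Both the mapping
    layout and the reading of 1.5 as a fold change are assumptions.
    """
    return {
        gene
        for gene, (fold_change, fdr) in results.items()
        if fold_change >= min_fold_change and fdr < max_fdr
    }

# Hypothetical marker-selection results for the two datasets.
aml = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (1.8, 0.20),   # fails the FDR cutoff
    "GENE_C": (1.6, 0.01),
    "GENE_D": (1.2, 0.001),  # fails the fold-change cutoff
}
mm = {
    "GENE_A": (1.9, 0.02),
    "GENE_B": (2.5, 0.001),
    "GENE_C": (1.5, 0.04),
    "GENE_D": (3.0, 0.001),
}

# The consensus signature is the set of genes that pass in both datasets.
consensus = sorted(top_genes(aml) & top_genes(mm))
```

Taking the intersection means a gene must be a strong marker in both AML and MM to enter the signature, which is what makes the signature a consensus rather than a union of dataset-specific effects.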
We will use several GenePattern modules to extract the relevant information from our test dataset, which is the MYCN-amplified and MYCN-nonamplified neuroblastoma dataset. Then, we will project the consensus signature onto the neuroblastoma dataset and evaluate its ability to distinguish the two phenotypes (MYCN-amplified and MYCN-nonamplified) by clustering the resulting dataset.
We will use SelectFeaturesColumns to filter the neuroblastoma dataset to only those samples that are MYCN-amplified or MYCN-nonamplified. There is a third group of samples (called 'NILL'), in which MYCN amplification status was not determined; therefore, we filter these samples out and work only with the annotated data.
Go to the Modules tab, and search for "SelectFeaturesColumns". Once the module is loaded, change the following parameters:
input filename: load the neuroblastoma GCT file, e.g., GSE12460GPL750_RNA_ORIGINALGENE_31813.gct. To do this, use the GenomeSpace tab to navigate to the GSE12460_MYCN directory containing the neuroblastoma GCT file, then drag the file to the input box.
columns: 0-2, 4-6, 8-16, 18-19, 21-25, 28, 30-36, 38-54
Go to the Jobs tab, and reload the SelectFeaturesColumns module by clicking on the job and choosing
input filename: load the neuroblastoma CLS file, e.g., Myc.Expression.cls. To do this, click the icon next to the input filename parameter to remove the GCT file from the module. Then, use the GenomeSpace tab to navigate to the GSE12460_MYCN directory containing the neuroblastoma CLS file, and drag the file to the input box.
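To see which samples the columns parameter keeps, the range list can be expanded into individual indices. The sketch below assumes the indices are zero-based and the ranges inclusive; the helper name is hypothetical and is not part of GenePattern.

```python
def parse_columns(spec):
    """Expand a spec such as "0-2, 4-6, 28" into a list of column indices.

    Assumes zero-based indices and inclusive ranges.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(part))
    return indices

COLUMNS = "0-2, 4-6, 8-16, 18-19, 21-25, 28, 30-36, 38-54"
kept = parse_columns(COLUMNS)

# Indices skipped below the highest kept index; these would correspond to
# the samples that are filtered out.
dropped = sorted(set(range(max(kept) + 1)) - set(kept))
```

Because the same column specification has to be applied to both the GCT file and the CLS file, expanding it once and checking the kept and dropped indices is a cheap way to catch an off-by-one error before running the module twice.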
We will use SelectFeaturesRows to filter the neuroblastoma dataset to only those gene symbols which are in the consensus signature.
Go to the Modules tab, and search for "SelectFeaturesRows". Once the module is loaded, change the following parameters:
MYCN.gene.exp.gct, the previously filtered neuroblastoma GCT file. To do this, use the Jobs tab to find the previous job results, then drag the file to the input box.
consensus.genelist.txt, the consensus signature gene list. To do this, use the GenomeSpace tab to navigate to the directory containing the consensus signature gene list, then drag the file to the input box.
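Conceptually, this step keeps only the rows of the expression matrix whose gene symbol appears in the gene list. A minimal sketch, assuming the matrix is held as a mapping from gene symbol to per-sample values (the data and gene names are made up, and this is not the module's own code):

```python
def select_rows(rows, gene_list):
    """Keep only rows whose gene symbol is in gene_list.

    Returns the filtered rows and the signature genes that were not
    found in the dataset.
    """
    wanted = set(gene_list)
    selected = {gene: values for gene, values in rows.items() if gene in wanted}
    missing = sorted(wanted - set(rows))
    return selected, missing

# Hypothetical expression rows (gene symbol -> values per sample).
expression = {
    "GENE_A": [7.1, 7.3, 5.0],
    "GENE_B": [4.0, 4.1, 4.2],
    "GENE_C": [6.5, 6.4, 8.0],
}
signature = ["GENE_A", "GENE_C", "GENE_X"]

selected, missing = select_rows(expression, signature)
```

Reporting the missing genes is a design choice worth keeping: the signature was derived on one microarray platform and projected onto another, so some gene symbols may not be present in the test dataset.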
We will use GENE-E to view the filtered neuroblastoma dataset, and to cluster the data by phenotype (MYCN-amplified vs. MYCN-nonamplified), to determine how well the consensus signature can distinguish between phenotypes.
Go to the Modules tab, and search for "GENE_E".
Jobs tab, then change the following parameters:
sample information or class file:
Click the Launch button to prompt a download of GENE-E. You may have to enter your GenePattern or GenomeSpace login credentials.
Once GENE-E has been loaded, we will perform hierarchical clustering on the filtered neuroblastoma dataset:
Click the Hierarchical Clustering icon, or navigate to the following menu:
Tools > Clustering > Hierarchical Clustering...
Column distance metric: Euclidean distance
Linkage method: Complete Linkage
Click OK to run the clustering algorithm.
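The clustering settings chosen above (Euclidean distance, complete linkage) can be illustrated with a small pure-Python sketch. This is not GENE-E's implementation: GENE-E builds a full dendrogram, whereas this sketch simply stops merging once a requested number of clusters remains. The sample profiles are made up.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose farthest pair of members
    is closest, until n_clusters remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the maximum pairwise distance.
            d = max(
                euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters

# Hypothetical samples, each described by two signature-gene values.
profiles = [
    [1.0, 1.1],
    [1.2, 0.9],
    [5.0, 5.2],
    [5.1, 4.9],
    [9.0, 0.5],
]
clusters = complete_linkage(profiles, n_clusters=3)
```

Complete linkage tends to produce compact clusters because two groups are merged only if even their most dissimilar members are close, which suits the goal of checking whether samples of the same phenotype group tightly together.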
This is an example interpretation of the results from this recipe. First, we identified a consensus gene signature of JQ1 activity by finding genes that became differentially expressed due to JQ1 treatment in both acute myeloid leukemia (AML) and multiple myeloma (MM). Then, we projected this consensus signature onto a test dataset of neuroblastoma cells which were not treated with JQ1, but were either MYCN-amplified or MYCN-nonamplified. Since MYCN amplification is associated with an increased sensitivity to BET bromodomain inhibitors, such as JQ1, we expected that a signature of JQ1 activity would be able to separate MYCN-amplified and MYCN-nonamplified phenotypes.
These results suggest that the JQ1 consensus signature is capable of differentiating between MYCN-amplified and MYCN-nonamplified neuroblastoma samples. In particular, hierarchical clustering yields three distinct groups of samples: (1) the majority of the MYCN-amplified samples (left cluster, light blue); (2) MYCN-nonamplified samples that are similar to MYCN-amplified samples (middle cluster, dark blue); and (3) MYCN-nonamplified samples which are distinct from MYCN-amplified samples (right cluster, dark blue). The significance of this result would need further confirmation.