Previous Recipe Version: 0

Saved about 2 years ago on 11/09/2016 20:00:44 UTC by sgaramsz
This version's status was: Published

Identify and validate a consensus signature using gene expression data

Added by GenomeSpaceTeam on 2015.04.22
Last updated over 1 year ago.


Summary

 

 

This recipe provides one method for identifying a consensus gene signature from gene expression data, and validating the ability of the consensus signature to accurately distinguish phenotypes by using a test gene expression dataset. In particular, this recipe examines JQ1, a small-molecule inhibitor which binds and selectively inhibits the bromodomain and extra-terminal (BET) family of bromodomain proteins. This is thought to inhibit MYCN amplification in cancers such as neuroblastoma.

First, this recipe identifies a consensus gene signature of JQ1 by comparing the expression changes in JQ1-treated acute myeloid leukemia (AML) cell lines with those in JQ1-treated multiple myeloma (MM) cell lines. Then, this recipe validates the ability of this consensus signature to distinguish between MYCN-amplified and MYCN-nonamplified neuroblastoma by projecting the consensus signature onto a test gene expression dataset.

This recipe recapitulates some of the results found by Puissant et al. in "Targeting MYCN in Neuroblastoma by BET Bromodomain Inhibition" (Cancer Discov. 2013).
 

 

 

Inputs

To complete this recipe, we will need several gene expression datasets, which will be obtained from InSilico DB; thus, we do not need any additional files from the GenomeSpace Public Folder.

Outputs

Recipe steps

  • InSilicoDB
    1. Getting gene expression datasets
  • GenePattern
    1. Identifying differentially expressed genes in the JQ1-treated datasets
  • Galaxy
    1. Loading data into Galaxy
    2. Identifying a consensus gene signature by comparing the AML and MM datasets
  • GenePattern
    1. Processing the neuroblastoma dataset
    2. Projecting the consensus gene list onto the neuroblastoma dataset
    3. Validating the consensus signature using clustering

In this example, we use datasets that have already been normalized by the original authors, using the Robust Multi-array Average (RMA) method of microarray normalization and summarization. We use the following datasets:

GSE29799
a JQ1-treated acute myeloid leukemia (AML) expression dataset (6 samples)
GSE31365
a JQ1-treated multiple myeloma (MM) expression dataset (12 samples)
GSE12460
a neuroblastoma expression dataset which has MYCN-amplified and MYCN-nonamplified samples (64 samples)

 

  1. Launch InSilico DB from GenomeSpace by clicking on the icon.
  2. Download the gene expression dataset from InSilico DB. For the first dataset, under the Public tab, search for "GSE29799".
  3. Click the Analyze button for the entry with 'HuGene-1_0-st-v1' under Technology. Click the following buttons in the pop-up:
    1. Expression file
    2. Normalization options: Original normalization
    3. Gene/probe options: genes
    4. GenomeSpace
  4. Click Open in GenomeSpace to download the files to GenomeSpace.
  5. Repeat these steps to download GSE31365 and GSE12460.
  6. Once the files are downloaded, return to GenomeSpace. Optionally, rename the following folders:
    1. Rename folder 'GSE29799GPL6244...' to 'GSE29799_AML'.
    2. Rename folder 'GSE31365GPL6244...' to 'GSE31365_MM'.
    3. Rename folder 'GSE12460GPL750...' to 'GSE12460_MYCN'.
  7. Optional: close InSilicoDB.

The output files should appear in your GenomeSpace home directory.
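
For readers who want to inspect the downloaded files outside the GenomeSpace tools, the following is a minimal sketch of reading a GCT expression file and a CLS class file in Python. It assumes the standard GCT v1.2 layout (a `#1.2` version line, a dimensions line, a header row of `Name`, `Description` and sample names, then one row per gene) and the three-line categorical CLS layout. The function names, gene names, class names, and values are hypothetical and are not taken from the recipe's datasets.

```python
import io

def read_gct(handle):
    """Parse a GCT v1.2 file into (sample_names, {gene: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]  # the first two columns are Name and Description
    data = {}
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        data[fields[0]] = [float(v) for v in fields[2:]]
    if len(data) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions line does not match file contents")
    return samples, data

def read_cls(handle):
    """Parse a categorical CLS file into (class_names, per-sample labels)."""
    n_samples, n_classes = (int(x) for x in handle.readline().split()[:2])
    class_names = handle.readline().lstrip("#").split()
    labels = handle.readline().split()
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("CLS header does not match labels")
    return class_names, labels

# Hypothetical toy files, standing in for a downloaded GCT/CLS pair.
EXAMPLE_GCT = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "GENE_A\tna\t8.1\t7.9\t5.2\n"
    "GENE_B\tna\t4.0\t4.2\t6.8\n"
)
EXAMPLE_CLS = "3 2 1\n# vehicle JQ1\n0 0 1\n"

samples, expr = read_gct(io.StringIO(EXAMPLE_GCT))
class_names, labels = read_cls(io.StringIO(EXAMPLE_CLS))
```

To read a real file, pass an open file object instead of `io.StringIO`, e.g. `read_gct(open(path))`, where `path` points to a GCT file you have copied out of GenomeSpace.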

NOTE: If you have not yet associated your GenomeSpace account with your GenePattern account, you will be asked to do so. If you do not yet have a GenePattern account, you can automatically generate a new account that will be associated with your GenomeSpace account.

  1. Click on the GenePattern icon to launch the tool.
  2. Change to the Modules tab, and search for "ComparativeMarkerSelection". Once the module is loaded, change the following parameters:
    1. input file: load the AML GCT file, e.g., GSE29799GPL6244_RNA_ORIGINALGENE_30916.gct. To do this, use the GenomeSpace tab to navigate to the GSE29799_AML directory containing the AML GCT file, then drag the file to the input box.
    2. cls file: load the AML treatment CLS file, e.g., treatment.cls, also located in the GSE29799_AML directory.
    3. log transformed data: yes
    4. output filename: AML_genes.comp.marker.odf
  3. Click Run to run ComparativeMarkerSelection on the AML dataset.
  4. Once the job has finished running, save the resulting file back to GenomeSpace:
    1. Click on the file, and choose Save to GenomeSpace.
    2. Navigate to a directory of your choice and choose Save.
  5. Repeat these steps to identify differentially expressed genes for the MM dataset, GSE31365. Change the following parameters:
    1. input file: load the MM GCT file, e.g., GSE31365GPL6244_RNA_ORIGINALGENE_30917.gct. To do this, use the GenomeSpace tab to navigate to the GSE31365_MM directory containing the MM GCT file, then drag the file to the input box.
    2. cls file: load the MM treatment CLS file, e.g., treatment.cls, also located in the GSE31365_MM directory.
    3. log transformed data: yes
    4. output filename: MM_genes.comp.marker.odf
  6. Click Run to run ComparativeMarkerSelection on the MM dataset.
  7. Once the job has finished running, save the resulting file back to GenomeSpace, as before:
    1. Click on the file, and choose Save to GenomeSpace.
    2. Navigate to a directory of your choice and choose Save.
  8. Optional: close GenePattern.
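
To give a feel for what "identifying differentially expressed genes" involves, here is a minimal sketch that scores each gene with a two-class Welch t-statistic and ranks genes by the size of that score. This is only an illustration of the general idea, not the ComparativeMarkerSelection implementation, which also estimates significance and supports other statistics. The function names and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two lists of (log-scale) expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_genes(expr, labels, class_a="1", class_b="0"):
    """Return (gene, t) pairs sorted by decreasing absolute t-statistic."""
    scores = []
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores.append((gene, welch_t(a, b)))
    return sorted(scores, key=lambda gs: abs(gs[1]), reverse=True)

# Hypothetical toy data: three control samples ("0"), then three treated ("1").
toy_expr = {
    "GENE_A": [8.0, 8.2, 7.9, 5.1, 5.0, 5.3],  # lower after treatment
    "GENE_B": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],  # essentially unchanged
}
toy_labels = ["0", "0", "0", "1", "1", "1"]

ranking = rank_genes(toy_expr, toy_labels)
```

In this toy example, `GENE_A` ranks first with a negative score (lower in the treated class), while `GENE_B` scores close to zero.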

NOTE: If you have not yet associated your GenomeSpace account with your Galaxy account, you will be asked to do so. If you do not yet have a Galaxy account, you can automatically generate a new account that will be associated with your GenomeSpace account.

  1. Click on the Galaxy icon to launch the tool.
  2. Navigate to the following menu: Get Data > GenomeSpace import
  3. Select the AML_genes.comp.marker.odf and MM_genes.comp.marker.odf files.
  4. Click Send to Galaxy.
  5. Once the files have been loaded, change the attributes for each file, by clicking the pencil icon and changing the following parameters:
    1. Switch to the Datatype tab.
    2. New type: tabular
    3. Click Save. Ignore any warnings which may pop up.
  1. Click on the following link: Official GenomeSpace Galaxy Workflow: Identify and Validate a Consensus Signature Using Gene Expression Data.
  2. Click the icon in the upper right corner to import the workflow.
  3. Click start using this workflow.
  4. Click on the workflow drop-down menu (e.g., imported: Identify and Validate a Consensus Signature Using Gene Expression Data), then choose Run.
  5. Load the files into the correct fields. The input fields should have annotation indicating which file should be loaded:
    1. Step 1: Input Dataset: AML_genes.comp.marker.odf
    2. Step 2: Input Dataset: MM_genes.comp.marker.odf
    3. Choose Target Directory: choose a directory to save the file to, e.g. your home directory.
  6. Click Run workflow.
  7. Once the workflow has finished running, the files will be automatically saved to the GenomeSpace folder chosen in step 5 (Choose Target Directory).
  8. Optional: close Galaxy.
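
The internal steps of the Galaxy workflow are not spelled out in this recipe. As a rough illustration of what a "consensus" between the AML and MM results could mean, the sketch below keeps genes that pass a significance cutoff in both result tables and change in the same direction. The cutoff of 0.05, the field names (`score`, `fdr`), and the toy values are all assumptions made for this example and may differ from what the workflow actually does.

```python
def significant(results, max_fdr):
    """Map gene -> direction of change (+1 or -1) for genes passing the cutoff."""
    return {
        gene: (1 if row["score"] > 0 else -1)
        for gene, row in results.items()
        if row["fdr"] <= max_fdr
    }

def consensus_signature(results_a, results_b, max_fdr=0.05):
    """Genes significant in both datasets with the same direction of change."""
    sig_a = significant(results_a, max_fdr)
    sig_b = significant(results_b, max_fdr)
    shared = sig_a.keys() & sig_b.keys()
    return sorted(g for g in shared if sig_a[g] == sig_b[g])

# Hypothetical per-gene results for the two JQ1-treated datasets.
aml_results = {
    "GENE_A": {"score": -9.1, "fdr": 0.01},
    "GENE_B": {"score": 4.0, "fdr": 0.02},
    "GENE_C": {"score": 0.3, "fdr": 0.80},
}
mm_results = {
    "GENE_A": {"score": -6.5, "fdr": 0.03},
    "GENE_B": {"score": -3.2, "fdr": 0.04},
    "GENE_C": {"score": 5.0, "fdr": 0.01},
}

consensus = consensus_signature(aml_results, mm_results)
```

Here only `GENE_A` survives: `GENE_B` changes in opposite directions in the two datasets, and `GENE_C` is not significant in the first one.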

  1. Click on the GenePattern icon to launch the tool.
  2. Change to the Modules tab, and search for "SelectFeaturesColumns". Once the module is loaded, change the following parameters:
    1. input filename: load the neuroblastoma GCT file, e.g., GSE12460GPL750_RNA_ORIGINALGENE_31813.gct. To do this, use the GenomeSpace tab to navigate to the GSE12460_MYCN directory containing the neuroblastoma GCT file, then drag the file to the input box.
    2. columns: 0-2, 4-6, 8-16, 18-19, 21-25, 28, 30-36, 38-54
    3. output: MYCN.gene.exp.gct
  3. Click Run.
  4. Change to the Jobs tab, and reload the SelectFeaturesColumns module by clicking on the job and choosing Reload Job.
  5. Once the module is loaded, change the following parameters:
    1. input filename: load the neuroblastoma CLS file, e.g., Myc.Expression.cls. To do this, click the icon next to the input filename parameter to remove the GCT file from the module. Then, use the GenomeSpace tab to navigate to the GSE12460_MYCN directory containing the neuroblastoma CLS file, and drag the file to the input box.
    2. output: MYCN.gene.exp.cls
  6. Click Run.
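
The column specification used above can be read as a list of ranges. The following sketch expands such a specification into individual column indices and applies it to a list of values, so that the same selection can be applied consistently to the sample columns and to the class labels. It assumes the ranges are zero-based and inclusive; the function names are hypothetical.

```python
def parse_columns(spec):
    """Expand a spec such as '0-2, 4-6, 28' into a list of column indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            indices.extend(range(start, end + 1))  # ranges treated as inclusive
        else:
            indices.append(int(part))
    return indices

def select_columns(values, indices):
    """Keep only the entries of values at the given positions, in order."""
    return [values[i] for i in indices]

# The specification from the SelectFeaturesColumns step above.
RECIPE_SPEC = "0-2, 4-6, 8-16, 18-19, 21-25, 28, 30-36, 38-54"
kept = parse_columns(RECIPE_SPEC)

# Small hypothetical example of applying a spec to a list of sample names.
subset = select_columns(["S0", "S1", "S2", "S3", "S4"], parse_columns("0-1, 4"))
```

Under these assumptions, the recipe's specification expands to 47 column indices, and the small example keeps `S0`, `S1`, and `S4`.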

  1. Change to the Modules tab, and search for "SelectFeaturesRows". Once the module is loaded, change the following parameters:
    1. input filename: MYCN.gene.exp.gct, the previously filtered neuroblastoma GCT file. To do this, use the Jobs tab to find the previous job results, then drag the file to the input box.
    2. list filename: consensus.genelist.txt, the consensus signature gene list. To do this, use the GenomeSpace tab to navigate to the directory containing the consensus signature gene list, then drag the file to the input box.
    3. output: MYCN.consensus.gct
  2. Click Run.
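
Conceptually, this step restricts the expression matrix to the genes in the consensus list. A minimal sketch of that filtering is shown below; it keeps rows in their original order and also reports list entries that are absent from the matrix. Whether SelectFeaturesRows orders or reports rows this way is an assumption, and the function name and toy values are hypothetical.

```python
def select_rows(expr, gene_list):
    """Return (filtered matrix, genes from the list that were not found)."""
    wanted = set(gene_list)
    kept = {gene: values for gene, values in expr.items() if gene in wanted}
    missing = [gene for gene in gene_list if gene not in expr]
    return kept, missing

# Hypothetical expression matrix (gene -> per-sample values) and gene list.
matrix = {
    "GENE_A": [8.1, 7.9, 5.2],
    "GENE_B": [4.0, 4.2, 6.8],
    "GENE_C": [6.0, 6.1, 5.9],
}
gene_list = ["GENE_C", "GENE_A", "GENE_X"]

filtered, not_found = select_rows(matrix, gene_list)
```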
  1. Change to the Modules tab, and search for "GENE_E".
  2. Once the module is loaded, change to the Jobs tab, then change the following parameters:
    1. input file: MYCN.consensus.gct (output from SelectFeaturesRows).
    2. sample information or class file: MYCN.gene.exp.cls (output from SelectFeaturesColumns).
  3. Click Run.
  4. Once the job has finished running, click the Launch button to download the GENE-E .jnlp file.
    NOTE: Mac users should be sure to use ctrl + click on the downloaded .jnlp file to overcome the Mac Gatekeeper security function.
  5. Click the .jnlp file to launch GENE-E. You may have to enter your GenePattern or GenomeSpace login credentials.
  6. Once GENE-E has been loaded, we will perform hierarchical clustering on the filtered neuroblastoma dataset:
    1. Click on the Hierarchical Clustering icon, or navigate to the following menu: Tools > Clustering > Hierarchical Clustering...
    2. Check the Cluster columns box.
    3. Column distance metric: Euclidean distance
    4. Linkage method: Complete Linkage
  7. Click OK to run the clustering algorithm.
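
For intuition about the settings chosen above, here is a small pure-Python sketch of agglomerative clustering with Euclidean distance and complete linkage: at each step, the two clusters whose farthest members are closest together are merged. GENE-E produces a full dendrogram, so stopping at a fixed number of clusters is a simplification for this example. The function name and the toy sample vectors are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+

def complete_linkage(points, n_clusters):
    """Merge clusters until n_clusters remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest pairwise distance.
            d = max(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected by the deletion
    return [sorted(c) for c in clusters]

# Hypothetical samples, each described by two signature-gene values.
toy_samples = [
    (1.0, 1.1),
    (0.9, 1.0),
    (5.0, 5.2),
    (5.1, 4.9),
    (9.0, 0.5),
]

groups = complete_linkage(toy_samples, n_clusters=3)
```

With these toy values, the first two samples form one group, the next two form a second group, and the last sample remains on its own.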

Results Interpretation

This is an example interpretation of the results from this recipe. First, we identified a consensus gene signature of JQ1 activity by finding genes that became differentially expressed due to JQ1 treatment in both acute myeloid leukemia (AML) and multiple myeloma (MM). Then, we projected this consensus signature on a test dataset of neuroblastoma cells which were not treated with JQ1, but were either MYCN-amplified or MYCN-nonamplified. Since MYCN amplification is associated with an increased sensitivity to BET bromodomain inhibitors, such as JQ1, we expected that a signature of JQ1 activity would be able to separate MYCN-amplified and MYCN-nonamplified phenotypes.

These results suggest that the JQ1 consensus signature is capable of differentiating between MYCN-amplified and MYCN-nonamplified neuroblastoma samples. In particular, hierarchical clustering separates the samples into three distinct groups: (1) the majority of the MYCN-amplified samples (left cluster, light blue); (2) MYCN-nonamplified samples that are similar to MYCN-amplified samples (middle cluster, dark blue); and (3) MYCN-nonamplified samples which are distinct from MYCN-amplified samples (right cluster, dark blue). The significance of this result would need further confirmation.


Posted by xiaojuw on March 01, 2016 03:47
