GenomeSpace Recipe: Identify copy number variation (CNV)-associated transcriptional regulators with unique gene expression profiles by stepwise linkage analysis

Added by GenomeSpaceTeam on 2015.05.28
Last updated over 3 years ago.

Tags: microarray, copy number variation, gene sets, module map

Summary

Are there specific transcriptional regulators, whose expression and copy number correlate with the expression of genes associated with a specific phenotype?

This recipe provides a method for identifying transcriptional regulators of a gene set associated with a specific phenotype. For example, an investigator may want to determine which transcriptional regulators exhibit unique expression phenotypes (e.g. up-regulation or down-regulation). This recipe uses a procedure called "Stepwise Linkage Analysis of Microarray Signatures" (SLAMS), first described by Adler et al. (Nature Genetics, 2006). The recipe follows the SLAMS procedure but does not use the SLAMS software tool.

In this example, the phenotype is the embryonic stem cell (ESC) state, which is shared by ESCs and induced pluripotent stem cells (iPSCs) and is also found in a compendium of human cancers, such as breast cancer. In this recipe, we are interested in determining which genes transcriptionally regulate this 'stemness signature' of gene expression. This recipe recapitulates research by Wong et al., in Cell Stem Cell (2008), "Module map of stem cell genes guides creation of epithelial cancer stem cells". To recapitulate this research, we follow the SLAMS procedure described by Adler et al. in Nature Genetics (2006), "Genetic regulators of large-scale transcriptional signatures in cancer". A summary of the SLAMS procedure is given below, and more information about SLAMS can be found in the review paper, "A SLAMS dunk for cancer regulators", by Kumar-Sinha and Chinnaiyan.

We use a gene expression dataset of primary human breast cancer tumor samples, with a complementary dataset of copy number variation data in array comparative genomic hybridization (aCGH) format, as described in Chin, K. et al, Cancer Cell, 2006. We use a set of stemness signature genes to separate breast cancer tumor samples into those which exhibit the stemness signature and those that do not by creating a module map in Genomica. A module map characterizes the expression of the gene expression dataset, providing information about sets of genes within the dataset.

We use the classified samples (e.g. stemness signature present vs. stemness signature absent) to normalize the copy number variation data in GenePattern. Next, we identify transcriptional regulators that correlate with the changes in the copy number dataset using a gene set collection from MSigDB, in Genomica. Finally, we identify transcriptional regulators whose amplification or deletion is correlated with up- or downregulation of gene expression. We consider these genes to be 'stemness regulators', i.e. genes which regulate the genes associated with the stemness signature.

Description of the Stepwise Linkage Analysis of Microarray Signatures (SLAMS) procedure (Kumar-Sinha and Chinnaiyan):

Sort tumor samples into groups based on whether the stemness signature is present (“ON”) or absent (“OFF”).
Compare the DNA copy number changes between the groups of tumor samples. Calculate the association between stemness expression and CNV datasets to identify amplifications/deletions associated with the stemness signature.
Select genes which are potential candidate regulators of the stemness signature, based on coordinate gene amplification/deletion and gene expression upregulation/downregulation.
Validate the candidate regulators by assessing their predictive ability in independent sets of tumor samples.
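
The comparison in the second SLAMS step can be sketched in a few lines of Python. This is only an illustration, not the method used by Genomica or by Adler et al.: it assumes a Welch t-statistic as the measure of association, and the data layout, the gene names, and the functions `welch_t` and `rank_cnv_association` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(on_values, off_values):
    """Welch t-statistic for the difference in mean copy number (ON minus OFF)."""
    se = sqrt(variance(on_values) / len(on_values)
              + variance(off_values) / len(off_values))
    return (mean(on_values) - mean(off_values)) / se if se > 0 else 0.0


def rank_cnv_association(cnv, on_samples, off_samples):
    """Rank genes by how strongly copy number differs between ON and OFF tumors.

    cnv maps gene -> {sample: log2 copy number ratio} (assumed layout).
    """
    scores = {}
    for gene, by_sample in cnv.items():
        on = [by_sample[s] for s in on_samples]
        off = [by_sample[s] for s in off_samples]
        scores[gene] = welch_t(on, off)
    # Largest absolute statistic first: strongest amplification or deletion.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy values invented for illustration; they are not from the recipe dataset.
cnv = {
    "GENE_A": {"s1": 0.9, "s2": 1.1, "s3": 0.0, "s4": 0.1},
    "GENE_B": {"s1": 0.1, "s2": 0.0, "s3": 0.1, "s4": 0.0},
}
ranked = rank_cnv_association(cnv, on_samples=["s1", "s2"], off_samples=["s3", "s4"])
print(ranked[0][0])  # gene with the strongest ON-vs-OFF copy number difference
```

A positive statistic suggests amplification in "ON" tumors and a negative one suggests deletion. Each group needs at least two samples for the variance to be defined.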

Inputs

To complete this recipe, we will need a gene expression dataset, an accompanying set of copy number variation (CNV) data, and a custom-built gene set of interest. In this example, we use data from primary human breast cancer tumor samples: gene expression data and array comparative genomic hybridization (aCGH) copy number variation data, both fully described in Chin, K. et al, Cancer Cell (2006). We also use a set of genes whose transcriptional signature is consistent with an embryonic stem cell state ("stemness signature"). The datasets listed below can be downloaded from the GenomeSpace Public folder:

Public > RecipeData > ExpressionData > breasttumor.preprocessed.collapsed.tab: This file contains the gene expression profile of primary human breast cancer tumor samples. The original dataset has been log-transformed, row-centered on the mean, and has had the probe IDs collapsed to HUGO Gene Symbols.

Public > RecipeData > VariantData > breasttumor.acgh.txt: This file contains accompanying copy number variation data in array comparative genomic hybridization (aCGH) format.

Public > RecipeData > GeneSets > stemness.geneset.tab: This file contains a list of genes which are associated with the embryonic stem cell state.
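
The expression file is already preprocessed, so no transformation is needed to run the recipe. For readers preparing their own data, the log-transform and row-centering described above might look like the following sketch. The logarithm base is not stated in the recipe, so base 2 is an assumption here, and the function name and data layout are hypothetical.

```python
from math import log2
from statistics import mean


def log_transform_and_center(rows):
    """Log-transform each gene's values, then subtract the gene's mean.

    rows maps gene -> list of positive raw intensities, one per sample
    (assumed layout; base-2 logarithm is an assumption).
    """
    centered = {}
    for gene, values in rows.items():
        logged = [log2(v) for v in values]
        row_mean = mean(logged)
        centered[gene] = [x - row_mean for x in logged]
    return centered


raw = {"GENE_A": [2.0, 4.0, 8.0]}  # toy intensities, invented for illustration
result = log_transform_and_center(raw)
print(result["GENE_A"])  # [-1.0, 0.0, 1.0]
```

After centering, a value of 0.5 means the gene is 0.5 log units above its own mean across samples, which is the scale on which the up- and down-regulation thresholds in step 1 are applied.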

Outputs

This recipe identifies a set of transcription factors which regulate genes associated with an embryonic stem cell state. The recipe provides a list of stemness regulators, and also a heatmap of the stemness regulator expression in breast cancer tumor samples.

Recipe steps

Genomica

Sort breast cancer tumor samples based on the presence or absence of the stemness signature

GenePattern

Normalize breast cancer copy number variation profiles

MSigDB

Download a collection of chromosome cytoband gene sets

Genomica

Compare copy number variation data to identify regulators of expression
Obtain a list of candidate stemness regulators

1: Sort breast cancer tumor samples based on the presence or absence of the stemness signature

First, we will create a module map of co-regulated genes with similar expression profiles, to identify breast cancer tumor samples in which the stemness signature is present (“Stemness ON”) or absent (“Stemness OFF”).

Launch Genomica on the gene expression dataset file (Public > RecipeData > ExpressionData > breasttumor.preprocessed.collapsed.tab) by clicking the file and dragging it to the Genomica icon. This will download a JNLP file to launch Genomica. Double-click the file to start the Genomica program. You should see a heatmap of gene expression when the program first loads.
Navigate to the following menu: Sets > Load Gene Sets from GenomeSpace File…
Choose the stemness gene set file (Public > RecipeData > GeneSets > stemness.geneset.tab), then click Select.
Navigate to the following menu: Algorithms > Create a Module Map…
Once the tool has loaded, change the following parameters:
1. Under Gene Sets, click the box next to ...stemness.geneset
2. Under Experiments, set Expression levels >= ___ are considered up-regulated to 0.5.
3. Under Experiments, set Expression levels <= ___ are considered down-regulated to -0.5.
Click Run to create a module map.
NOTE: It may take several minutes to learn and create the module map.
Export the results to GenomeSpace using the following steps:
1. In the window Exp.An, highlight the column INGENESET by clicking on the label.
2. Under Print Analysis, click Regulation.
3. Choose a folder on your local computer, and click Select.
  NOTE: Genomica will automatically create and save the following files to this folder: INGENESET.present.txt, INGENESET.absent.txt, and INGENESET.array.tab.
Close Genomica and return to GenomeSpace.
Save the files to GenomeSpace by clicking and dragging them into your GenomeSpace directory from your local directory.
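
To give an intuition for what the module map decides in this step, the sketch below calls a single sample "ON" or "OFF" with the same 0.5 and -0.5 thresholds. It assumes a hypergeometric enrichment test and a 0.05 significance cutoff; the recipe does not document Genomica's exact statistic, so treat the test, the cutoff, and the functions `hypergeom_sf` and `call_sample` as assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are in the gene set."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def call_sample(expr, gene_set, up=0.5, down=-0.5, alpha=0.05):
    """Label one sample by enrichment of the gene set among up or down genes.

    expr maps gene -> centered log expression for a single sample.
    """
    N = len(expr)
    members = [g for g in gene_set if g in expr]
    K = len(members)
    up_genes = {g for g, v in expr.items() if v >= up}
    down_genes = {g for g, v in expr.items() if v <= down}
    k_up = sum(g in up_genes for g in members)
    k_down = sum(g in down_genes for g in members)
    if hypergeom_sf(k_up, N, K, len(up_genes)) < alpha:
        return "ON"
    if hypergeom_sf(k_down, N, K, len(down_genes)) < alpha:
        return "OFF"
    return "unclassified"


# Toy data invented for illustration: 20 genes, a 4-gene signature.
genes = [f"G{i}" for i in range(20)]
stem_set = {"G0", "G1", "G2", "G3"}
sample_on = {g: (1.0 if i < 5 else 0.0) for i, g in enumerate(genes)}
sample_off = {g: (-1.0 if i < 5 else 0.0) for i, g in enumerate(genes)}
sample_flat = {g: 0.0 for g in genes}

for name, sample in [("on", sample_on), ("off", sample_off), ("flat", sample_flat)]:
    print(name, call_sample(sample, stem_set))
```

Samples that are neither enriched for up-regulated nor for down-regulated signature genes stay unclassified, which mirrors the fact that only "present" and "absent" samples are carried into the later steps.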

2: Normalize breast cancer copy number variation profiles

Next, we will normalize the raw breast cancer CNV profiles (aCGH data) using the files we just created in Genomica’s module map. In particular, INGENESET.present is a list of breast cancer tumor samples which have the stemness signature upregulated, and INGENESET.absent is a list of breast cancer tumor samples which do not have the stemness signature upregulated. These files will be used to normalize the aCGH data.

Launch GenePattern from GenomeSpace.
Under the Modules tab, search for the "Acgh2Tab" module.
Once the module has loaded, change the following parameters:
1. acgh input file: breasttumor.acgh.txt (found in the following directory: Public > RecipeData > VariantData)
2. presentlist file: INGENESET.present.txt
3. absentlist file: INGENESET.absent.txt
4. genelocs file: hg18_refseq_genes.txt (this is the default setting)
Click Run to submit the job.
Once the job has finished running, save the converted aCGH file (acgh.avergene.tab) to GenomeSpace by clicking on the file and selecting Save to GenomeSpace, then choosing a directory and clicking Save.
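
The recipe does not describe the internals of Acgh2Tab. The output name (acgh.avergene.tab) suggests that probe-level aCGH values are averaged per gene, so the sketch below illustrates that idea only: it averages the log ratios of probes falling inside each gene interval and keeps just the samples from the present and absent lists. The data layout, the coordinates, and the function `average_per_gene` are hypothetical and are not the module's actual implementation.

```python
from statistics import mean


def average_per_gene(probes, gene_locs, keep_samples):
    """Average probe-level log ratios over each gene's interval.

    probes: list of (chrom, position, {sample: log2 ratio})   (assumed layout)
    gene_locs: gene -> (chrom, start, end)                     (assumed layout)
    keep_samples: samples named in the present and absent lists
    """
    table = {}
    for gene, (chrom, start, end) in gene_locs.items():
        hits = [vals for c, pos, vals in probes if c == chrom and start <= pos <= end]
        if not hits:
            continue  # no probe covers this gene, so it is left out
        table[gene] = {s: mean(h[s] for h in hits) for s in keep_samples}
    return table


# Toy coordinates and values invented for illustration.
probes = [
    ("chr8", 100, {"s1": 0.4, "s2": -0.2, "s3": 9.9}),
    ("chr8", 150, {"s1": 0.6, "s2": 0.0, "s3": 9.9}),
    ("chr1", 120, {"s1": 0.0, "s2": 0.0, "s3": 9.9}),
]
gene_locs = {"GENE_X": ("chr8", 90, 160), "GENE_Y": ("chr2", 1, 10)}
per_gene = average_per_gene(probes, gene_locs, keep_samples=["s1", "s2"])
print(per_gene)
```

In this toy run, sample s3 is dropped because it is in neither list, and GENE_Y is omitted because no probe overlaps it.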

3: Download a collection of chromosome cytoband gene sets

To identify the genes which regulate the stemness signature, we use a collection of gene sets which correspond to each human chromosome and each cytogenetic band. This collection of gene sets (C1) is obtained from MSigDB and is useful for identifying effects related to chromosomal deletions or amplifications.

Launch MSigDB from GenomeSpace.
Navigate to the Downloads tab on the MSigDB website (also located at http://www.broadinstitute.org/gsea/downloads.jsp).
Scroll down until you find the file, c1.all.vX.X.symbols.gmt. Click the file to download it to your local directory.
NOTE: The version number in the file name (X.X) changes as MSigDB is updated; the later steps of this recipe refer to v5.0. Always select the most recent file version.
Save the file to GenomeSpace by clicking and dragging the file into your GenomeSpace directory.
Convert the file to Genomica TAB format using the following method:
1. Right-click on the file, then choose Convert
2. Convert to: geneset.tab
3. Click Convert on Server
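
The GenomeSpace converter handles this step, so no code is required. To show what the conversion involves, here is a minimal sketch that reads the GMT layout (set name, description, then member genes, all tab-separated) and builds a gene-by-set membership table. The exact layout of Genomica's geneset TAB format is not given in this recipe, so the 0/1 matrix produced here is an assumption, and the function names are hypothetical.

```python
import io


def read_gmt(handle):
    """Parse GMT lines: set name, description, then one gene per column."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def to_membership_rows(gene_sets):
    """Build a header plus one row per gene with 1/0 membership flags.

    The matrix layout is an assumption, not the documented Genomica format.
    """
    names = sorted(gene_sets)
    all_genes = sorted({g for genes in gene_sets.values() for g in genes})
    header = ["Gene"] + names
    rows = [[g] + ["1" if g in gene_sets[n] else "0" for n in names] for g in all_genes]
    return [header] + rows


# Toy GMT content invented for illustration.
gmt_text = (
    "SET_ONE\tdescription one\tGENE_A\tGENE_B\n"
    "SET_TWO\tdescription two\tGENE_B\tGENE_C\n"
)
gene_sets = read_gmt(io.StringIO(gmt_text))
table = to_membership_rows(gene_sets)
for row in table:
    print("\t".join(row))
```

With a real file, `read_gmt` can be given an open file handle in place of the `io.StringIO` object.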

4: Compare copy number variation data to identify regulators of expression

We will create a module map to identify which genes in cytogenetic bands regulate the aberrations found in breast cancer tumor copy number variation data.

Launch Genomica on the normalized breast cancer CNV profile (acgh.avergene.tab) by clicking the Genomica icon. This will download a JNLP file to launch Genomica. Double-click the file to start the Genomica program. Next, navigate to GenomeSpace > Open from GenomeSpace File… and find the file in your directory. Click the file and choose Select to load it. You should see a heatmap of copy number values.
Navigate to the following menu: Sets > Load Gene Sets from GenomeSpace File…
Choose the C1 chromosome cytoband gene set file from GenomeSpace (c1.all.v5.0.symbols.geneset.tab). Click Select.
Navigate to the following menu: Sets > Load Experiment Sets from GenomeSpace File…
Choose the experiment array set of samples that are present or absent for stemness genes (INGENESET.array.tab). Click Select.
Navigate to the following menu: Algorithms > Create a Module Map…
Once the tool has loaded, change the following parameters:
1. Under Gene Set, click the box next to …c1.all.v5.0.symbols.geneset
2. Under Experiment attributes, check the box next to …INGENESET.array
Click Run to create a module map.
In the window Exp.An, highlight the CHR8Q24 and CHR8Q22 columns (the two most enriched cytobands).
Click the View Gene Hits button.
Check the box next to …INGENESET.array.
Click Analyze.
Once the new heatmap is loaded, under Print Analysis, click Columns. Give the resulting file of 145 candidate regulators a name, e.g. candidate_regulators.lst. Click Save.
Upload candidate_regulators.lst to GenomeSpace by clicking and dragging the file into your GenomeSpace directory.
In GenomeSpace, convert the LST file to a Genomica geneset TAB format:
1. Right-click on the file, then choose Convert
2. Convert to: geneset.tab
3. Click Convert on Server

5: Obtain a list of candidate stemness regulators

Candidate stemness regulators can be identified based on the coordinate profiles of the gene expression data and copy number variation data. Coordinated profiles occur when the gene is amplified in the CNV dataset and also upregulated in the gene expression dataset, or when the gene is deleted in the CNV dataset and also downregulated in the gene expression dataset.
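
The coordinate-profile rule can be written down compactly. The sketch below assumes each gene has already been summarized as an ON-minus-OFF difference in copy number and in expression; those summaries, the zero cutoffs, and the function `coordinate_regulators` are assumptions for illustration, since Genomica applies its own module map statistics.

```python
def coordinate_regulators(cnv_diff, expr_diff, cnv_cut=0.0, expr_cut=0.0):
    """Keep genes whose copy number and expression change in the same direction.

    cnv_diff, expr_diff: gene -> mean difference (stemness ON minus OFF).
    Returns gene -> 'amplified_up' or 'deleted_down'.
    """
    selected = {}
    for gene, cnv in cnv_diff.items():
        if gene not in expr_diff:
            continue  # gene is missing from the expression dataset
        expr = expr_diff[gene]
        if cnv > cnv_cut and expr > expr_cut:
            selected[gene] = "amplified_up"
        elif cnv < -cnv_cut and expr < -expr_cut:
            selected[gene] = "deleted_down"
    return selected


# Toy differences invented for illustration.
cnv_diff = {"GENE_A": 0.8, "GENE_B": -0.6, "GENE_C": 0.7, "GENE_D": -0.5}
expr_diff = {"GENE_A": 1.2, "GENE_B": -0.9, "GENE_C": -0.4}
print(coordinate_regulators(cnv_diff, expr_diff))
```

GENE_C is dropped because it is amplified but down-regulated (a discordant profile), and GENE_D is dropped because it has no expression measurement.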

Launch Genomica on the normalized breast cancer gene expression dataset (Public > RecipeData > ExpressionData > breasttumor.preprocessed.collapsed.tab) by clicking and dragging the file to the Genomica icon. You should see a heatmap of expression values.
Navigate to the following menu: Sets > Load Gene Sets from GenomeSpace File…
Choose the candidate regulators file (candidate_regulators.geneset.tab). Click Select.
Navigate to the following menu: Sets > Load Experiment Sets from GenomeSpace File…
Choose the experiment array set of samples that are present or absent for stemness genes (INGENESET.array.tab).
Navigate to the following menu: Algorithms > Create a Module Map…
Once the tool has loaded, change the following parameters:
1. Under Gene Sets, click the box next to …candidate_regulators.geneset.
2. Under Experiment attributes, check the box next to …INGENESET.array
Click Run to create a module map.
In the window Exp.An, highlight the INGENESET column.
Click the View Gene Hits button.
Check the box next to …INGENESET.array.
Click Analyze.

Results Interpretation

This is an example interpretation of the results from this recipe. First, we identified which breast cancer tumor samples exhibit the stemness signature and which do not, and used this classification to normalize the copy number variation data. Next, we identified transcriptional regulators that correlate with the copy number changes by overlapping the copy number dataset with a gene set collection from MSigDB. Finally, we identified regulators of the stemness signature by identifying genes with concordant profiles in the breast cancer gene expression dataset and in the copy number variation dataset, i.e. genes which exhibited copy number amplification and gene expression upregulation, or genes which exhibited copy number deletion and gene expression downregulation. Using this recipe, we have identified 48 genes matching this description, which are the stemness regulators.

The heatmap below illustrates the expression of the stemness regulators in breast cancer tumor samples. Red indicates upregulation of expression, and green indicates downregulation of expression. Gene names are listed to the right of the heatmap, and the names of breast cancer tumor samples are listed above the heatmap. Note that only breast cancer tumor samples with the presence or absence of the stemness signature are included. To the left, a single column indicates which genes are in the ‘INGENESET’ category, which is the label for regulators of the copy number variation data. In this example, all genes fall into that category, i.e. they are associated with both copy number variation and gene expression. The rows at the bottom of the heatmap indicate which category each breast cancer tumor sample falls into. In the first row, red indicates ‘presence’ and green indicates ‘absence’; in the second and third rows, red indicates ‘presence’. These colors are not related to gene expression changes.

Notice that these stemness regulatory genes tend to have the same overall expression patterns within one sample, e.g., both TAF2 and MYC are downregulated in sample b0404 (right-most green sample). Note that the pattern shows concordance between the stemness signature and the stemness regulators. Sample b0404 lacks the stemness signature and appears to have downregulation of stemness regulatory genes. In contrast, sample b0668 has the stemness signature present and appears to have upregulation of stemness regulatory genes.

To save the list of stemness regulators from Genomica:
1. Under Print Analysis, click Columns.
2. Save the resulting file of 48 stemness regulators as, e.g., stemness_regulators.lst.
3. Click Save.
4. You can save this file to GenomeSpace by clicking and dragging it into your GenomeSpace directory from your local directory.
To convert the LST format to a Genomica geneset TAB format:
1. Right-click on the file
2. Choose Convert
3. Convert to: geneset.tab
4. Click Convert on Server