Identify copy number variation (CNV)-associated transcriptional regulators with unique gene expression profiles by stepwise linkage analysis
Added by GenomeSpaceTeam on 2015.05.28
Last updated over 3 years ago.
Are there specific transcriptional regulators, whose expression and copy number correlate with the expression of genes associated with a specific phenotype?
This recipe provides a method for identifying transcriptional regulators of a gene set associated with a specific phenotype. For example, an investigator may want to determine which transcriptional regulators exhibit unique expression phenotypes (e.g. up-regulation or down-regulation). This recipe uses a procedure called "Stepwise Linkage Analysis of Microarray Signatures", first described by Adler et al. (Nature Genetics, 2006). This recipe does not use the SLAMS software tool.
In particular, the phenotype is the embryonic stem cell (ESC) state, which is shared by ESCs and induced pluripotent stem cells (iPSCs) and is also found in a compendium of human cancers, such as breast cancer. In this recipe, we are interested in determining which genes transcriptionally regulate this 'stemness signature' of gene expression. This recipe recapitulates research by Wong et al. in Cell Stem Cell (2008), "Module map of stem cell genes guides creation of epithelial cancer stem cells". To recapitulate this research, we will use Stepwise Linkage Analysis of Microarray Signatures (SLAMS), described by Adler et al. in Nature Genetics (2006), "Genetic regulators of large-scale transcriptional signatures in cancer". A summary description of the SLAMS procedure is listed below, and more information about SLAMS can be found in the review paper, "A SLAMS dunk for cancer regulators", by Kumar-Sinha and Chinnaiyan.
We use a gene expression dataset of primary human breast cancer tumor samples, with a complementary dataset of copy number variation data in array comparative genomic hybridization (aCGH) format, as described in Chin, K. et al, Cancer Cell, 2006. We use a set of stemness signature genes to separate breast cancer tumor samples into those which exhibit the stemness signature and those that do not by creating a module map in Genomica. A module map characterizes the expression of the gene expression dataset, providing information about sets of genes within the dataset.
We use the classified samples (e.g. stemness signature present vs. stemness signature absent) to normalize the copy number variation data in GenePattern. Next, we identify transcriptional regulators that correlate with the changes in the copy number dataset using a gene set collection from MSigDB, in Genomica. Finally, we identify transcriptional regulators whose amplification or deletion is correlated with up- or downregulation of gene expression. We consider these genes to be 'stemness regulators', i.e. genes which regulate the genes associated with the stemness signature.
Description of the Stepwise Linkage Analysis of Microarray Signatures (SLAMS) procedure (Kumar-Sinha and Chinnaiyan):
To complete this recipe, we will need a gene expression dataset, an accompanying set of copy number variation (CNV) data, and a custom-built gene set of interest. In this example, we use data from primary human breast cancer tumor samples: gene expression data, and array comparative genomic hybridization (aCGH) copy number variation data, fully described in Chin, K. et al, Cancer Cell (2006). We also use a set of genes whose transcriptional signature is consistent with an embryonic stem cell state ("stemness signature"). We will need the following datasets, which can be downloaded from the following GenomeSpace Public folder:
Public > RecipeData > ExpressionData > breasttumor.preprocessed.collapsed.tab: This file contains the gene expression profile of primary human breast cancer tumor samples. The original dataset has been log-transformed, row-centered on the mean, and has had the probe IDs collapsed to HUGO Gene Symbols.
Public > RecipeData > VariantData > breasttumor.acgh.txt: This file contains accompanying copy number variation data in array comparative genomic hybridization (aCGH) format.
Public > RecipeData > GeneSets > stemness.geneset.tab: This file contains a list of genes which are associated with the embryonic stem cell state.
This recipe identifies a set of transcription factors which regulate genes associated with an embryonic stem cell state. The recipe provides a list of stemness regulators, and also a heatmap of the stemness regulator expression in breast cancer tumor samples.
First, we will create a module map of co-regulated genes with similar expression profiles, to identify breast cancer tumor samples in which the stemness signature is present (“Stemness ON”) or absent (“Stemness OFF”).
1. Load the expression dataset (Public > RecipeData > ExpressionData > breasttumor.preprocessed.collapsed.tab) into Genomica by clicking the file and dragging it to the Genomica icon. This will download a JNLP file to launch Genomica. Double-click the file to start the Genomica program. You should see a heatmap of gene expression when the program first loads.
2. Click Sets > Load Gene Sets from GenomeSpace File…
3. Choose the stemness gene set (Public > RecipeData > GeneSets > stemness.geneset.tab), then click Select.
4. Click Algorithms > Create a Module Map…
5. Under Gene Sets, click the box next to ...stemness.geneset.
6. Under Experiments, set "Expression levels >= ___ are considered up-regulated" to 0.5.
7. Under Experiments, set "Expression levels <= ___ are considered down-regulated" to -0.5.
8. Click Run to create a module map.
9. In Exp.An, highlight the column INGENESET by clicking on the label.
10. Under Print Analysis, click Regulation.
11. Click Select. The resulting files are INGENESET.present.txt, INGENESET.absent.txt, and INGENESET.array.tab.

Next, we will normalize the raw breast cancer CNV profiles (aCGH data) using the files we just created in Genomica's module map. In particular, INGENESET.present is a list of breast cancer tumor samples in which the stemness signature is upregulated, and INGENESET.absent is a list of breast cancer tumor samples in which it is not. These files will be used to normalize the aCGH data.
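The classification performed above can be sketched outside Genomica. The following is a minimal illustration, not Genomica's implementation: it assumes that a sample is called "present" when the stemness genes are over-represented among its up-regulated genes (expression >= 0.5) and "absent" when they are over-represented among its down-regulated genes (expression <= -0.5), using a hypergeometric test. The function names, the input layout (a nested dictionary of gene to sample to value), and the significance cutoff `alpha` are all hypothetical.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the gene set."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def classify_samples(expr, gene_set, up=0.5, down=-0.5, alpha=0.05):
    """expr: {gene: {sample: value}}. Returns {sample: 'present' | 'absent' | 'neither'}.

    The thresholds mirror the module map settings in the recipe; alpha is an assumption.
    """
    genes = list(expr)
    in_set = [g for g in genes if g in gene_set]
    samples = sorted({s for g in genes for s in expr[g]})
    calls = {}
    for s in samples:
        # Count up-/down-regulated genes overall and within the gene set.
        up_all = sum(1 for g in genes if expr[g].get(s, 0.0) >= up)
        up_set = sum(1 for g in in_set if expr[g].get(s, 0.0) >= up)
        down_all = sum(1 for g in genes if expr[g].get(s, 0.0) <= down)
        down_set = sum(1 for g in in_set if expr[g].get(s, 0.0) <= down)
        if up_set and hypergeom_tail(up_set, len(in_set), up_all, len(genes)) <= alpha:
            calls[s] = "present"
        elif down_set and hypergeom_tail(down_set, len(in_set), down_all, len(genes)) <= alpha:
            calls[s] = "absent"
        else:
            calls[s] = "neither"
    return calls
```

Samples called "neither" in this sketch would appear in neither the present list nor the absent list.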
1. In GenePattern, open the Modules tab and search for the "Acgh2Tab" module.
2. Set the following parameters:
   acgh input file: breasttumor.acgh.txt (found in the following directory: Public > RecipeData > VariantData)
   presentlist file: INGENESET.present.txt
   absentlist file: INGENESET.absent.txt
   genelocs file: hg18_refseq_genes.txt (this is the default setting)
3. Click Run to submit the job.
4. Save the result file (acgh.avergene.tab) to GenomeSpace by clicking on the file and selecting Save to GenomeSpace, then choosing a directory and clicking Save.

To identify the genes which regulate the stemness signature, we use a collection of gene sets which correspond to each human chromosome and each cytogenetic band. This collection of gene sets (C1) is obtained from MSigDB and is useful for identifying effects related to chromosomal deletions or amplifications.
1. Go to the Downloads tab on the MSigDB website (also located at http://www.broadinstitute.org/gsea/downloads.jsp).
2. Find the file c1.all.vX.X.symbols.gmt. Click the file to download it to your local directory.
3. Convert the file to geneset.tab format: choose Convert, set Convert to: geneset.tab, and click Convert on Server.
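For readers who want to inspect the C1 collection outside GenomeSpace, the sketch below parses a GMT file, in which each line holds a gene set name, a description, and then the member genes, all tab-separated. The membership table it builds (genes as rows, gene sets as columns, 1/0 entries) is only an assumed layout for illustration and is not guaranteed to match the geneset.tab file produced by the GenomeSpace converter. The function names are hypothetical.

```python
def parse_gmt(lines):
    """Parse GMT lines into {set_name: [genes]}.

    Each GMT line is: set name, description, then member genes (tab-separated).
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def membership_table(gene_sets):
    """Return rows of a gene-by-set 1/0 table (assumed layout, header row first)."""
    names = sorted(gene_sets)
    members = {n: set(gene_sets[n]) for n in names}
    all_genes = sorted(set().union(*members.values())) if members else []
    rows = [["GENE"] + names]
    for gene in all_genes:
        rows.append([gene] + ["1" if gene in members[n] else "0" for n in names])
    return rows
```

A typical use would be `parse_gmt(open(path))`, where `path` points to the downloaded .gmt file.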
We will create a module map to identify which genes in cytogenetic bands regulate the aberrations found in breast cancer tumor copy number variation data.
1. Load the normalized copy number file (acgh.avergene.tab) into Genomica by clicking the Genomica icon. This will download a JNLP file to launch Genomica. Double-click the file to start the Genomica program. Next, navigate to GenomeSpace > Open from GenomeSpace File… and find the file in your directory. Click the file and choose Select to load the file. You should see a heatmap of expression values.
2. Click Sets > Load Gene Sets from GenomeSpace File…
3. Choose the converted C1 gene set file (c1.all.v5.0.symbols.geneset.tab). Click Select.
4. Click Sets > Load Experiment Sets from GenomeSpace File…
5. Choose the experiment set file (INGENESET.array.tab). Click Select.
6. Click Algorithms > Create a Module Map…
7. Under Gene Sets, click the box next to …c1.all.v5.0.symbols.geneset.
8. Under Experiment attributes, check the box next to …INGENESET.array.
9. Click Run to create a module map.
10. In Exp.An, highlight the CHR8Q24 and CHR8Q22 columns (the top 2 most enriched cytobands).
11. Click the View Gene Hits button.
12. Select …INGENESET.array.
13. Click Analyze.
14. Under Print Analysis, click Columns. Give the resulting file of 145 candidate regulators a name, e.g. candidate_regulators.lst. Click Save.
15. Upload candidate_regulators.lst to GenomeSpace by clicking and dragging the file into your GenomeSpace directory.
16. Convert the file to geneset.tab format: choose Convert, set Convert to: geneset.tab, and click Convert on Server.
Candidate stemness regulators can be identified based on coordinated profiles of the gene expression data and copy number variation data. Coordinated profiles occur when the gene is amplified in the CNV dataset and also upregulated in the gene expression dataset, or when the gene is deleted in the CNV dataset and also downregulated in the gene expression dataset.
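The coordination rule can be written as a short filter. This is a minimal sketch under stated assumptions, not the procedure Genomica runs: it assumes each gene has already been reduced to one summary copy number value and one summary expression value (for example, a difference between signature-present and signature-absent samples). The function names and the copy number cutoff of 0.3 are hypothetical; the expression cutoff of 0.5 follows the module map setting used earlier in this recipe.

```python
def direction(value, threshold):
    """Return +1 if value >= threshold, -1 if value <= -threshold, else 0."""
    if value >= threshold:
        return 1
    if value <= -threshold:
        return -1
    return 0


def concordant_genes(cnv, expr, cnv_threshold=0.3, expr_threshold=0.5):
    """cnv, expr: {gene: summary value}. Keep genes whose CNV and expression agree.

    cnv_threshold is an illustrative assumption, not a value from the recipe.
    """
    hits = {}
    for gene in sorted(cnv.keys() & expr.keys()):
        c = direction(cnv[gene], cnv_threshold)
        e = direction(expr[gene], expr_threshold)
        if c != 0 and c == e:
            hits[gene] = "amplified/up" if c > 0 else "deleted/down"
    return hits
```

Genes that change in only one dataset, or that change in opposite directions, are dropped by this filter.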
1. Load the expression dataset (Public > RecipeData > ExpressionData > breasttumor.preprocessed.collapsed.tab) into Genomica by clicking and dragging the file to the Genomica icon. You should see a heatmap of expression values.
2. Click Sets > Load Gene Sets from GenomeSpace File…
3. Choose the candidate regulator gene set (candidate_regulators.geneset.tab). Click Select.
4. Click Sets > Load Experiment Sets from GenomeSpace File…
5. Choose the experiment set file (INGENESET.array.tab).
6. Click Algorithms > Create a Module Map…
7. Under Gene Sets, click the box next to …candidate_regulators.geneset.
8. Under Experiment attributes, check the box next to …INGENESET.array.
9. Click Run to create a module map.
10. In Exp.An, highlight the INGENESET column.
11. Click the View Gene Hits button.
12. Select …INGENESET.array.
13. Click Analyze.

This is an example interpretation of the results from this recipe. First, we identified which breast cancer tumor samples exhibited the stemness signature and which did not, and used this classification to normalize copy number variation data. Next, we identified transcriptional regulators of the copy number variations by overlapping the copy number dataset with a gene set collection from MSigDB. Finally, we identified regulators of the stemness signature by identifying genes with concordant profiles in the breast cancer tumor sample gene expression dataset and in the copy number variation dataset, i.e. genes which exhibited copy number amplification and gene expression upregulation, or genes which exhibited copy number deletion and gene expression downregulation. Using this recipe we have identified 48 genes matching this description, which are the stemness regulators.
The heatmap below illustrates the expression of the stemness regulators in breast cancer tumor samples. Red indicates upregulation of expression, and green indicates downregulation of expression. Gene names are listed to the right of the heatmap, and the names of breast cancer tumor samples are listed above the heatmap. Note that only breast cancer tumor samples with the presence or absence of the stemness signature are included. To the left, a single column indicates which genes are in the ‘INGENESET’ category, which is the label for regulators of the copy number variation data. In this example, all genes fall into that category, i.e. they are associated with both copy number variation and gene expression. The rows at the bottom of the heatmap indicate which breast cancer tumor sample falls into which category. In the first row, red indicates ‘presence’ and green indicates ‘absence’; in the second and third rows, red indicates ‘presence’. These colors are not related to gene expression changes.
Notice that these stemness regulatory genes tend to have the same overall expression patterns within one sample, e.g. both TAF2 and MYC are downregulated in sample b0404 (right-most green sample). Note that the pattern exhibits concordance between the stemness signature and the stemness regulators. Sample b0404 lacks the stemness signature and appears to have downregulation of stemness regulatory genes. In contrast, sample b0668 has the stemness signature present and appears to have upregulation of stemness regulatory genes.
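The concordance described here can also be checked numerically instead of by eye. The sketch below is an illustration under assumptions, not part of the recipe: it averages the expression of the regulator genes within each sample and then averages those values within the signature-present and signature-absent groups. The function name and input layout (a nested dictionary of gene to sample to value, plus a dictionary of sample to call) are hypothetical.

```python
from statistics import mean


def mean_regulator_expression(expr, regulators, calls):
    """expr: {gene: {sample: value}}; calls: {sample: 'present' | 'absent' | other}.

    Returns the mean per-sample regulator expression within each group.
    """
    per_group = {"present": [], "absent": []}
    for sample, call in calls.items():
        if call not in per_group:
            continue  # ignore samples with no signature call
        values = [expr[g][sample] for g in regulators
                  if g in expr and sample in expr[g]]
        if values:
            per_group[call].append(mean(values))
    return {group: mean(vals) for group, vals in per_group.items() if vals}
```

If the regulators track the signature, the "present" mean would be expected to be higher than the "absent" mean.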
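1. Under Print Analysis, click Columns.
2. Give the resulting file a name, e.g. stemness_regulators.lst.
3. Click Save.
4. Convert the file to geneset.tab format: choose Convert, set Convert to: geneset.tab, and click Convert on Server.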