Find subnetworks of differentially expressed genes and identify associated biological functions
Added by GenomeSpaceTeam on 2015.03.16
Last updated over 3 years ago.
What subnetworks of differentially expressed genes are enriched in my samples? What biological functions are they related to?
This recipe provides a method for identifying genes that are differentially expressed between two phenotypes, such as tumor and normal, finding subnetworks of interacting proteins among their products, and determining the functional annotations of those subnetworks. An example use of this recipe is when an investigator wants to compare two phenotypes to determine which gene networks are similar between them, and how functional annotation changes between them.
In particular, this recipe makes use of several GenePattern modules to identify differentially regulated genes, then uses several Cytoscape plugins to identify potential interactions between gene products, and to visualize the resulting network.
Why differential expression analysis? We assume that most genes are not expressed all the time, but rather are expressed in specific tissues, at specific stages of development, or under certain conditions. Genes whose expression in one condition, such as cancer tissue, differs from their expression in another condition, such as normal tissue, are said to be differentially expressed. To identify which genes change in response to specific conditions (e.g. cancer), we filter or process the dataset to remove genes that are not informative.
Why protein interaction network analysis? Gene expression analysis results in a list of differentially expressed genes, but it does not explain whether these genes are connected biologically in a pathway or network. To better understand the biology that drives the observed changes in gene expression, we can perform network analysis to determine whether the gene products (e.g. proteins) are reported to interact. To identify potential networks or pathways, we search for highly interconnected subnetworks within a large interaction network.
To complete this recipe, we will need a gene expression dataset describing two conditions or phenotypes, such as cancer tissue vs. normal tissue. In this example, we will use gene expression data from a study in which committed granulocyte-macrophage progenitor cells (normal phenotype) were transformed into leukemia stem cells (leukemic phenotype) by introduction of the MLL-AF9 protein. These example data are derived from mouse (Mus musculus) cell lines. We will need the following datasets, which can be downloaded from the GenomeSpace Public folder:
Public > RecipeData > ExpressionData > Normal_Leu.gct
: This file contains gene expression data for two phenotypes: normal and leukemic. The file is in GenePattern's GCT format.
Public > RecipeData > ExpressionData > Normal_Leu.cls
: This file contains class assignments (normal or leukemic) for all the samples in the GCT file, in GenePattern's CLS format.
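Because every later module depends on the layout of these two files, a minimal Python sketch of how they are structured may help. It assumes a GCT 1.2 file (a `#1.2` version line, a line giving the number of gene rows and sample columns, a tab-delimited header of `Name`, `Description` and the sample names, then one row per gene) and a categorical CLS file (sample count, class count and `1` on the first line, `#` followed by the class names on the second, one label per sample on the third). The function names and toy values are hypothetical; this is not GenePattern's own parser.

```python
# Minimal sketch: parse GenePattern GCT (v1.2) and categorical CLS text.
# Assumes tab-delimited GCT rows and whitespace-delimited CLS lines.


def parse_gct(text):
    """Return (sample_names, rows), where rows is a list of
    (name, description, [float values]) tuples."""
    lines = text.strip("\n").split("\n")
    if not lines[0].startswith("#1.2"):
        raise ValueError("expected '#1.2' on the first line")
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    header = lines[2].split("\t")  # Name, Description, sample1, sample2, ...
    samples = header[2:]
    if len(samples) != n_cols:
        raise ValueError("header does not match the declared column count")
    rows = []
    for line in lines[3:3 + n_rows]:
        fields = line.split("\t")
        rows.append((fields[0], fields[1], [float(v) for v in fields[2:]]))
    if len(rows) != n_rows:
        raise ValueError("fewer data rows than declared")
    return samples, rows


def parse_cls(text):
    """Return (class_names, per-sample labels) from a categorical CLS file."""
    lines = text.strip("\n").split("\n")
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    class_names = lines[1].lstrip("#").split()
    labels = lines[2].split()
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("CLS header does not match its contents")
    return class_names, labels


if __name__ == "__main__":
    # Toy, made-up files: two genes measured in two normal and two leukemic samples.
    gct = ("#1.2\n2\t4\nName\tDescription\tN1\tN2\tL1\tL2\n"
           "probe1\tGeneA\t100\t110\t900\t950\n"
           "probe2\tGeneB\t50\t55\t52\t49\n")
    cls = "4 2 1\n# normal leukemic\n0 0 1 1\n"
    print(parse_gct(gct))
    print(parse_cls(cls))
```

Keeping the sample order from the GCT header and the label order from the CLS file aligned is what lets a later step split each gene's values into the two phenotype groups.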
Figure: The visual representation of a subnetwork of differentially expressed genes.
NOTE: If you have not yet associated your GenomeSpace account with your GenePattern account, you will be asked to do so. If you do not yet have a GenePattern account, you can automatically generate a new account that will be associated with your GenomeSpace account.
Open the GenomeSpace tab, then navigate to your personal directory. Tool: GenePattern
1. Click on the file (e.g., Normal_Leu.gct) in GenomeSpace, then use the GenePattern context menu and click Launch on File.
OR
2. Click on the file (e.g., Normal_Leu.gct) in GenomeSpace, then drag it to the GenePattern icon to launch.
We will use the PreprocessDataset module to filter out genes that are unlikely to be differentially expressed. In this recipe, we set the cut-off at a 3-fold change in expression (up- or down-regulation) across samples. This module uses the GCT file.
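The 3-fold cut-off can be pictured as a variation filter: a gene is kept only if its highest value is at least three times its lowest value across samples. Below is a minimal sketch under that reading; the `floor` value of 20 and the function name are illustrative assumptions, not PreprocessDataset's documented parameters or implementation.

```python
# Sketch of a fold-change variation filter in the spirit of the 3-fold
# cut-off used in this recipe. The floor value is an illustrative assumption.


def variation_filter(rows, min_fold=3.0, floor=20.0):
    """Return names of rows whose max/min ratio across samples is >= min_fold.

    Values below `floor` are raised to `floor` first, so the ratio stays
    defined for zero or near-zero measurements.
    """
    kept = []
    for name, values in rows:
        clipped = [max(v, floor) for v in values]
        if max(clipped) / min(clipped) >= min_fold:
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = [
        ("GeneA", [100, 110, 95, 105]),  # about 1.16-fold: dropped
        ("GeneB", [50, 60, 300, 320]),   # 6.4-fold: kept
        ("GeneC", [5, 10, 80, 90]),      # floored to 20, then 4.5-fold: kept
    ]
    print(variation_filter(toy))  # ['GeneB', 'GeneC']
```

Flooring low values before taking the ratio keeps noisy near-zero measurements from producing huge, meaningless fold changes.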
Click the Modules tab, and search for "PreprocessDataset".
input filename: load the GCT file, e.g., Normal_Leu.gct. To do this, navigate to the GenomeSpace tab, then to the folder containing the GCT file: Public > RecipeData > ExpressionData. Load the file into the input filename parameter by dragging the file to the input filename input box.
Click Run to run PreprocessDataset. This will generate a processed GCT file.
We will use the ComparativeMarkerSelection module to identify genes that are differentially expressed and can distinguish between the two phenotypes (e.g., normal vs. leukemic). This module uses the processed GCT file and the CLS file.
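To see what "distinguishing between two phenotypes" means per gene, here is a minimal sketch that scores each gene by comparing its values in the two classes and ranks genes by that score. It uses a Welch-style two-sample t statistic, with signal-to-noise as an alternative; both are common choices, but this is not ComparativeMarkerSelection's code, and the module's own significance estimates (e.g. permutation-based p-values) are not reproduced. Function names and toy data are hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic(a, b):
    """Welch-style two-sample t statistic (mean of a minus mean of b)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def signal_to_noise(a, b):
    """Signal-to-noise score: difference of means over sum of standard deviations."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def rank_genes(rows, labels, class_a, class_b, score=t_statistic):
    """Score each (name, values) row as class_a vs class_b, highest score first.

    Each class needs at least two samples (stdev is undefined otherwise), and a
    gene with zero variance in both classes raises ZeroDivisionError.
    """
    scored = []
    for name, values in rows:
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scored.append((name, score(a, b)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rows = [("GeneA", [1.0, 2.0, 10.0, 11.0]), ("GeneB", [5.0, 6.0, 5.0, 6.0])]
    labels = ["normal", "normal", "leukemic", "leukemic"]
    print(rank_genes(rows, labels, "leukemic", "normal"))
```

Genes at the top of the list are higher in the first class and genes at the bottom are higher in the second, so both ends of the ranking are potential markers.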
Click the Modules tab, and search for "ComparativeMarkerSelection".
input file: load the processed GCT file, e.g., Normal_Leu.preprocessed.gct. To do this, navigate to the Jobs tab, and find the preprocessed GCT file from the previous job. Drag the file to the input file input box.
cls file: load the CLS file, e.g., Normal_Leu.cls. To do this, navigate to the GenomeSpace tab, then to the folder containing the CLS file: Public > RecipeData > ExpressionData. Click on the file and choose Send to cls file, or drag the file to the cls file input box.
Click Run to submit your job. This will generate an ODF file.
We will use the ExtractComparativeMarkerResults module to select the top genes that distinguish between the phenotypes. In this recipe, we will extract the top 50 genes by rank.
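The statistic: Rank and max: 50 settings used below amount to a threshold filter on one column of the marker results. A minimal sketch, assuming each result row has already been read into a dictionary; the "Feature" and "Rank" keys and the helper name are hypothetical stand-ins, not the module's own interface.

```python
# Hypothetical records standing in for rows of the marker-results table.


def filter_by_statistic(records, statistic="Rank", min_value=None, max_value=None):
    """Keep records whose `statistic` lies within [min_value, max_value]
    (a bound of None is not applied), sorted by that statistic."""
    kept = [
        r for r in records
        if (min_value is None or r[statistic] >= min_value)
        and (max_value is None or r[statistic] <= max_value)
    ]
    return sorted(kept, key=lambda r: r[statistic])


if __name__ == "__main__":
    records = [{"Feature": f"gene{i}", "Rank": i} for i in range(60, 0, -1)]
    top = filter_by_statistic(records, "Rank", max_value=50)
    print(len(top), top[0]["Feature"], top[-1]["Feature"])
```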
Click the Modules tab, and search for "ExtractComparativeMarkerResults".
comparative marker selection filename: load the ODF file from the previous job, e.g., Normal_Leu.preprocessed.comp.marker.odf. To do this, navigate to the Jobs tab, and find the ODF file from the previous job. Drag the file to the comparative marker selection filename input box.
dataset filename: load the processed GCT file, e.g., Normal_Leu.preprocessed.gct. To do this, navigate to the Jobs tab, and find the preprocessed GCT file from the earlier PreprocessDataset job. Drag the file to the dataset filename input box.
statistic: Rank
max: 50
Click Run to submit your job. This will generate two filtered files: a filtered GCT file and a filtered TXT file.
We will use the SelectFileMatrix module to select the gene names from our list of top 50 genes, so that we can later import the file into Cytoscape. This module selects entries from a file based on the rows and columns specified by the user. In this recipe, we will extract only the gene names.
Click the Modules tab, and search for "SelectFileMatrix".
input file: load the filtered GCT file, e.g., Normal_Leu.preprocessed.comp.marker.filt.gct. To do this, navigate to the Jobs tab, and find the filtered GCT file from the previous job. Drag the file to the input file input box.
output file base name: set this parameter to a new output file name, e.g., Normal_Leu.genes.
start row: 3
end row: 53
start column: 2
end column: 2
(To select all the columns in the file instead, leave the start column and end column parameters blank.)
Click Run to submit your job. This will generate a new TXT file that contains only the gene names (column 2), with the first row as a header, e.g., Normal_Leu.genes.txt.
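Why rows 3 to 53 and column 2? In a GCT file, line 1 is the version line, line 2 gives the dimensions, line 3 is the header (`Name`, `Description`, samples), and lines 4 to 53 hold the 50 filtered genes; column 2 is the `Description` column, which here carries the gene names. The sketch below mimics a 1-based, inclusive row/column selection on a toy 50-gene file. How SelectFileMatrix itself treats blank bounds is assumed here (None means "no limit"), and the helper name is hypothetical.

```python
def select_matrix(lines, start_row, end_row, start_col=None, end_col=None, sep="\t"):
    """Return rows start_row..end_row (1-based, inclusive), each cut to columns
    start_col..end_col (1-based, inclusive). A column bound of None means
    'from the first' or 'to the last' column."""
    lo = 0 if start_col is None else start_col - 1
    out = []
    for line in lines[start_row - 1:end_row]:
        fields = line.split(sep)
        hi = len(fields) if end_col is None else end_col
        out.append(sep.join(fields[lo:hi]))
    return out


if __name__ == "__main__":
    # Toy GCT with 50 made-up genes and 2 samples.
    gct_lines = ["#1.2", "50\t2", "Name\tDescription\tS1\tS2"] + [
        f"probe{i}\tGene{i}\t1.0\t2.0" for i in range(1, 51)
    ]
    genes = select_matrix(gct_lines, 3, 53, 2, 2)
    print(len(genes), genes[:3])  # header line plus 50 gene names
```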
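Save the Normal_Leu.genes.txt file to GenomeSpace using one of the following methods.
1. Click on the output file in the job results (e.g., Normal_Leu.genes.txt), then choose Save to GenomeSpace. Save the file to your folder.
OR
2. From the Modules and Pipeline start page, navigate to Jobs. Click on the file, then choose Send to GenomeSpace. Save the file to your folder.
Launch Cytoscape from GenomeSpace, which will download a cytoscape.jnlp file. Double-click the file to launch Cytoscape.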
file. Double-click the file to launch Cytoscape.Start New Session
label, choose With Empty Network
.OK
.File > Import > Network > GenomeSpace
.Normals_Leu.genes.txt
. Choose Select
to load the file. This will load a new menu for importing a network from a table.Import Network From Table
box, click the arrow to expand the menu. If no arrow appears, continue to step 8.Meaning:
box, choose the green circle icon that designates "source interaction".OK
to import the file.Yes
to confirm that the network should be imported.Table Panel
, select all the node ID names by clicking on the top/first node name (e.g. "Trf"), then choosing shift and scrolling downward. Then use ctrl+c (Windows) or ctrl+click (Mac) to copy the list of names.We will use the GeneMANIA
plugin to find the network of interacting proteins associated with our gene list. GeneMANIA
can find genes related to our set of input genes by using a very large set of functional association data, which includes protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. GeneMANIA
can find new members of a pathway or complex, find additional genes that may have been missed in a screen, or find new genes with a specific function, such as protein kinases.
Choose Apps > App Manager.
In Install Apps, search for "GeneMANIA".
Click Install. This may take several minutes.
Click Close.
Choose Apps > GeneMANIA > Search.
If you have not used GeneMANIA before, you will be prompted to install a database. Use the following steps:
Click Install Data...
In the Download tab, choose Mus musculus (Mouse).
Click Install. This may take several minutes.
Click Close.
Once GeneMANIA is loaded, change the following parameters:
Organism: M. musculus (mouse). If you are using your own data, make sure to select the appropriate species.
Genes of Interest: Click the empty box, then press Ctrl+V (Windows) or Cmd+V (Mac) to paste your gene list into GeneMANIA.
If some of your genes are not found in the GeneMANIA database, you can click OK.
Click Start to begin identifying connections between genes. This may take several minutes. Once the job is complete, close the GeneMANIA pop-up.
We will use the MCODE plugin to find clusters (highly interconnected nodes) within the network. Clusters in a protein-protein interaction network may represent protein complexes or parts of a pathway, and therefore convey important biological information about the network.
Choose Apps > App Manager.
In Install Apps, search for "MCODE".
Click Install. This may take several minutes.
Click Close.
Open the MCODE plugin: Apps > MCODE > Open MCODE.
Once MCODE is loaded, it will create a new tab in the Control Panel. To run MCODE, change the following parameters:
Find Cluster(s): in Whole Network
Advanced Options > Cluster Finding: Check the Haircut parameter.
Click Analyze the Current Network to identify subnetworks in your network. The results from this search will appear in the Results Panel on the upper right.
Click on a cluster in the Results Panel to select its nodes in the network. This will also list the proteins in the Table Panel.
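The Haircut option checked above is, as we understand MCODE's description, a post-processing step that removes singly connected nodes from each cluster, which amounts to keeping the cluster's 2-core. The sketch below shows the general k-core pruning technique in plain Python; it is not MCODE's implementation, and the helper names and toy edges are hypothetical.

```python
def from_edges(edges):
    """Build a symmetric adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core(adj, k):
    """Return the k-core of an undirected graph: repeatedly delete nodes with
    fewer than k neighbors until none remain. `adj` is not modified."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in list(core):
            if len(core[node]) < k:
                for nbr in core[node]:
                    core[nbr].discard(node)  # keep the map symmetric
                del core[node]
                changed = True
    return core


def haircut(cluster_adj):
    """Drop singly connected nodes, repeatedly, i.e. keep the 2-core."""
    return k_core(cluster_adj, 2)


if __name__ == "__main__":
    # A triangle A-B-C with a two-node tail C-D-E.
    cluster = from_edges([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")])
    print(sorted(haircut(cluster)))  # ['A', 'B', 'C']
```

Pruning repeats until nothing changes because removing one leaf (E) can turn its neighbor (D) into a new leaf.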
To rearrange the network, choose a layout, for example:
Layout > Edge Weighted Spring Embedded Layout
Layout > Circular Layout
To adjust the spacing between nodes, use the Scale function. Select Layout > Scale, then use the sliding scale bar to increase or decrease network density.
To change the appearance of the network, open the Style tab in the Control Panel. You can choose preset styles using a drop-down menu. You can create your own styles by clicking on the Defaults pane, then adjusting the parameters.
For example, to change the edge color, click the Edge tab in the bottom part of the pane. Then click on EDGE_COLOR, choose a new color, and click OK. Some variables take numeric inputs, such as EDGE_LINE_WIDTH.
This is an example interpretation of the results from this recipe. First, we identified the top 50 genes that differentiated between two phenotypes, leukemic and normal. We then used the GeneMANIA tool in Cytoscape to identify connections between these genes, although some genes were not annotated and therefore only a subset was actually analyzed. We included all possible sources of interaction; that is, we are equally interested in connections between genes that arise from co-expression and in connections arising from physical interactions.
After running GeneMANIA, we created a network that connected our subset of genes. We can see from the GeneMANIA results that, e.g., 3 genes (out of 45) have the Gene Ontology (GO) annotation 'zinc ion homeostasis', which has a total of 16 genes associated with it. The significance of this enrichment is reported as a q-value, calculated from an FDR-corrected hypergeometric test for enrichment. The q-value is analogous to a p-value: a lower q-value is considered more significant.
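The enrichment test described above can be sketched in a few lines: the hypergeometric upper tail gives the probability of seeing at least k annotated genes among n drawn from a background of N genes, K of which carry the annotation, and a Benjamini-Hochberg adjustment turns a list of such p-values into q-values. This is a sketch of the general technique only; GeneMANIA's actual background set and correction procedure are not stated in this recipe, and the background size of 20,000 below is a hypothetical value.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes without replacement from N genes,
    of which K carry the annotation."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # 3 of 45 network genes carry a 16-gene annotation. The background of
    # 20,000 genes is a hypothetical value, not one reported by GeneMANIA.
    p = hypergeom_sf(3, 20000, 16, 45)
    print(p)
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In practice one p-value is computed per annotation term tested, and the q-values are taken over that whole list, which is why a term can have a small p-value but a larger q-value.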
Once we learn about the functional enrichment associated with the genes in our network, we want to determine whether we can find subnetworks, i.e. densely interconnected regions of the network. We use MCODE to identify subnetworks. For example, MCODE identifies a small subnetwork of 5 nodes and 6 edges. When we click on this subnetwork in the MCODE panel, it will highlight the nodes in the network.
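As a rough gauge of how interconnected this 5-node, 6-edge subnetwork is, one can compute its edge density under the standard simple-graph definition (MCODE's own cluster scoring may use a different normalization):

```latex
D = \frac{|E|}{\binom{|V|}{2}} = \frac{2|E|}{|V|\,(|V|-1)} = \frac{2 \cdot 6}{5 \cdot 4} = 0.6
```

That is, 6 of the 10 possible pairwise connections among the 5 genes are present.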
We can separate this set of genes into a subnetwork by clicking the Create sub-network button in MCODE. This generates a new view containing the subnetwork, arranged according to the MCODE subnetwork view. Looking at this subnetwork of genes, we can examine their annotations and see that 3 of these genes are associated with zinc ion binding.
These results suggest that there is a collection of zinc ion binding genes in our set of 50 genes which differentiated between leukemic and normal phenotypes. However, the results in this example are not necessarily significant and are only a simple representation of possible results.