Previous Recipe Version: 0

Saved about 2 years ago on 10/14/2016 16:11:06 UTC by sgaramsz
This version's status was: Published

Find subnetworks of differentially expressed genes and identify associated biological functions

Added by GenomeSpaceTeam on 2015.03.16
Last updated over 1 year ago.


Summary

This recipe provides one method of using genes that are differentially expressed between two phenotypes, such as normal and tumor, to find subnetworks of interacting proteins and determine their functional annotations using Gene Ontology. In particular, this recipe makes use of several GenePattern modules to identify differentially regulated genes, then uses several Cytoscape plugins to identify potential interactions between gene products, and to visualize the resulting network.

Why differential expression analysis? We assume that most genes are not expressed all the time, but rather are expressed in specific tissues, stages of development, or under certain conditions. Genes whose expression in one condition, such as cancer tissue, differs from their expression in normal conditions are said to be differentially expressed. To identify which genes change in response to specific conditions (e.g., cancer), we must filter or process the dataset to remove genes which are not informative.

Why protein interaction network analysis? Gene expression analysis results in a list of differentially expressed genes, but it does not explain whether these genes are connected biologically in a pathway or network. To better understand the underlying biology that drives changes in gene expression, we can perform network analysis to determine whether gene products (e.g., proteins) are reported to interact. To identify potential networks or pathways, we search for highly interconnected subnetworks within a large interaction network.

Inputs

To complete this recipe, we will need a gene expression dataset describing two conditions or phenotypes, such as cancer tissue vs. normal tissue. In this example, we will use gene expression data from a study in which committed granulocyte macrophage progenitor cells (normal phenotype) were transformed into leukemia stem cells (leukemic phenotype) by introduction of the MLL-AF9 protein. This example data is derived from mouse (Mus musculus) cell lines. We will need the following datasets, which can be downloaded from GenomeSpace's Public folder:

Normal_Leu.gct
This file contains gene expression data of two phenotypes: normal and leukemic. The file is available in GenePattern's GCT format.
Normal_Leu.cls
This file contains class assignments (normal or leukemic) for all the samples in the GCT file, as identified by the GenePattern CLS format.
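
The two input formats can be read with a few lines of code. The following is a minimal sketch, assuming the common GCT v1.2 layout (a "#1.2" line, a dimensions line, a header line, then one tab-delimited row per gene) and the three-line categorical CLS layout. The function names and the toy data are illustrative and are not part of GenePattern.

```python
import io

def read_gct(handle):
    """Parse GCT v1.2 text into (sample_names, rows).

    Each row is a tuple (name, description, [expression values]).
    """
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]  # columns 1 and 2 are Name and Description
    rows = []
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        rows.append((fields[0], fields[1], [float(v) for v in fields[2:]]))
    if len(rows) != n_rows or len(samples) != n_cols:
        raise ValueError("data dimensions do not match the GCT header")
    return samples, rows

def read_cls(handle):
    """Parse a categorical CLS file into (class_names, per-sample labels)."""
    n_samples, n_classes, _ = (int(x) for x in handle.readline().split())
    class_names = handle.readline().lstrip("#").split()
    labels = handle.readline().split()
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("CLS header does not match its contents")
    return class_names, labels

# Toy data standing in for Normal_Leu.gct and Normal_Leu.cls (values are made up).
gct_text = (
    "#1.2\n2\t4\n"
    "Name\tDescription\tN1\tN2\tL1\tL2\n"
    "p1\tGeneA\t10\t12\t90\t95\n"
    "p2\tGeneB\t50\t55\t52\t49\n"
)
cls_text = "4 2 1\n# normal leukemic\n0 0 1 1\n"

samples, rows = read_gct(io.StringIO(gct_text))
class_names, labels = read_cls(io.StringIO(cls_text))
```

The labels are kept as strings so that files using either numeric or named class labels are handled the same way.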

Getting Data

  1. In GenomeSpace, navigate to the following Public data folder: Public > RecipeData > ExpressionData
  2. The following files will be used in this recipe:
    1. Normal_Leu.gct
    2. Normal_Leu.cls

Outputs

The visual representation of a subnetwork of differentially expressed genes.

Recipe steps

  • GenePattern
    1. Loading data
    2. Filtering genes by expression value
    3. Identifying differentially expressed genes
    4. Selecting the top genes
    5. Save the files to GenomeSpace
  • Cytoscape
    1. Loading data into Cytoscape
    2. Identifying interacting proteins
    3. Finding differentially expressed subnetworks
    4. Exploring the subnetworks

Load the data into GenePattern using one of the following methods:

  1. Click on the file (e.g., Normal_Leu.gct) in GenomeSpace, then use the GenePattern context menu and click Launch on File.
  2. Click on the file (e.g., Normal_Leu.gct) in GenomeSpace, then drag it to the GenePattern icon to launch.
  3. Open GenePattern from GenomeSpace, navigate to the GenomeSpace tab, then navigate to your personal directory.

  1. Change to the Modules tab, and search for "PreprocessDataset".
  2. Once the module is loaded, change the following parameters:
    1. input filename: load the GCT file, e.g., Normal_Leu.gct. To do this, navigate to the GenomeSpace tab, then navigate to the folder containing the GCT file. Load the file into the input filename parameter by clicking and dragging the file to the input filename input box.
  3. Click Run to run PreprocessDataset. This will generate a processed GCT file.
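
To make the idea of filtering genes by expression value concrete, the following is a minimal sketch of a threshold-and-variation filter of the kind this step applies. It is not the PreprocessDataset implementation; the parameter names and default values (floor, ceiling, min_fold, min_delta) are assumptions chosen for illustration, so check the module documentation for the actual settings.

```python
def filter_by_expression(rows, floor=20.0, ceiling=20000.0,
                         min_fold=3.0, min_delta=100.0):
    """Clamp values to [floor, ceiling], then keep genes that vary enough.

    rows is a list of (gene_name, [values]). A gene is kept only if its
    max/min ratio is at least min_fold AND its max-min difference is at
    least min_delta. All thresholds here are illustrative assumptions.
    """
    kept = []
    for name, values in rows:
        clamped = [min(max(v, floor), ceiling) for v in values]
        lo, hi = min(clamped), max(clamped)
        # floor > 0 guarantees lo > 0, so the ratio is always defined
        if hi / lo >= min_fold and hi - lo >= min_delta:
            kept.append((name, clamped))
    return kept

# Made-up example values
example_rows = [
    ("GeneA", [10.0, 500.0, 30.0, 900.0]),     # varies strongly: kept
    ("GeneB", [100.0, 120.0, 110.0, 130.0]),   # nearly flat: removed
    ("GeneC", [25000.0, 50.0, 60.0, 40.0]),    # kept after clamping
]
filtered = filter_by_expression(example_rows)
kept_names = [name for name, _ in filtered]
```

Genes that are flat across all samples cannot distinguish the two phenotypes, which is why they are removed before marker selection.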

  1. Change to the Modules tab, and search for "ComparativeMarkerSelection".
  2. Once the module is loaded, change the following parameters:
    1. input file: load the processed GCT file, e.g., Normal_Leu.preprocessed.gct. To do this, navigate to the Jobs tab, and find the preprocessed GCT file from the previous job. Click and drag the file to the input file input box.
    2. cls file: load the CLS file, e.g., Normal_Leu.cls. To do this, navigate to the GenomeSpace tab, then navigate to the folder containing the file. Click on the file and choose Send to cls file, or drag the file to the cls file input box.
  3. Click Run to submit your job. This will generate an ODF file.
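
As a rough illustration of what marker selection computes, the sketch below scores each gene with a two-sample t statistic (unequal-variance form) comparing the two classes. This is an assumption about the kind of test statistic used, not the ComparativeMarkerSelection code itself, and it omits the permutation-based significance estimates such a module may report. The function names and data are hypothetical.

```python
from math import sqrt, inf, copysign
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample t statistic with unequal variances (Welch form)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-class variation: 0 if the means agree, else +/- infinity
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se

def score_genes(rows, labels, class_a, class_b):
    """Return {gene_name: t statistic} for rows of (gene_name, [values])."""
    scores = {}
    for name, values in rows:
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[name] = t_statistic(a, b)
    return scores

# Made-up data: three samples per class
sample_labels = ["0", "0", "0", "1", "1", "1"]
expression = [
    ("GeneA", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    ("GeneB", [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]),
]
gene_scores = score_genes(expression, sample_labels, "0", "1")
```

A large absolute score means the gene separates the two classes well; the sign indicates which class has the higher mean.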

  1. Change to the Modules tab, and search for "ExtractComparativeMarkerResults".
  2. Once the module is loaded, change the following parameters:
    1. comparative marker selection filename: load the ODF file from the previous job, e.g., Normal_Leu.preprocessed.comp.marker.odf. To do this, navigate to the Jobs tab, and find the processed ODF file from the previous job. Click and drag the file to the comparative marker selection filename input box.
    2. dataset filename: load the processed GCT file, e.g., Normal_Leu.preprocessed.gct. To do this, navigate to the Jobs tab, and find the preprocessed GCT file from a previous job. Click and drag the file to the dataset filename input box.
    3. statistic: Rank
    4. max: 50
  3. Click Run to submit your job. This will generate two filtered files: a filtered GCT file and a filtered TXT file.
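
The rank filter above (statistic: Rank, max: 50) can be pictured as follows. This is a sketch only: it assumes genes are ranked by the absolute value of their score, which may differ from how the module assigns ranks, and the function name is hypothetical.

```python
def top_by_rank(scores, max_rank=50):
    """Return [(rank, gene, score)] for genes with rank <= max_rank.

    Rank 1 is the gene with the largest absolute score (an assumption
    made for this illustration).
    """
    ordered = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        (rank, gene, score)
        for rank, (gene, score) in enumerate(ordered, start=1)
        if rank <= max_rank
    ]

# Made-up scores
toy_scores = {"g1": 2.0, "g2": -5.0, "g3": 0.5, "g4": 3.0}
top_two = top_by_rank(toy_scores, max_rank=2)
```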

  1. Change to the Modules tab, and search for "SelectFileMatrix".
  2. Once the module is loaded, change the following parameters:
    1. input file: load the filtered GCT file, e.g., Normal_Leu.preprocessed.comp.marker.filt.gct. To do this, navigate to the Jobs tab, and find the processed GCT file from the previous job. Click and drag the file to the input file input box.
    2. output file base name: set the output file base name parameter to a new output file name, e.g., Normal_Leu.genes.txt.
    3. start row: 3
    4. end row: 53
    5. start column: 2
    6. end column: 2
      NOTE: to keep more information from this gene list, e.g., the values in the remaining columns, leave the start column and end column parameters blank. This will select all the columns in the file.
  3. Click Run to submit your job. This will generate a new TXT file which contains only the gene names (column 2), with the first row as a header, e.g., Normal_Leu.genes.txt.
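
The row and column settings above make sense once the GCT layout is kept in mind: rows 1 and 2 hold the version and dimensions, row 3 is the column header, and the 50 genes follow in rows 4 to 53. Column 2 is the Description column, which holds the gene names in this dataset. The sketch below mimics that selection on tab-delimited text using 1-based, inclusive indices; it is an illustration, not the SelectFileMatrix implementation.

```python
def select_matrix(lines, start_row=None, end_row=None,
                  start_col=None, end_col=None):
    """Select a block of rows and columns from tab-delimited lines.

    Indices are 1-based and inclusive. A bound left as None (blank)
    means "from the first" or "through the last".
    """
    selected = []
    for line in lines[(start_row or 1) - 1:end_row]:
        fields = line.split("\t")
        selected.append("\t".join(fields[(start_col or 1) - 1:end_col]))
    return selected

# A tiny GCT-like file with two genes (values are made up)
gct_lines = [
    "#1.2",
    "2\t2",
    "Name\tDescription\tS1\tS2",
    "p1\tGeneA\t1\t2",
    "p2\tGeneB\t3\t4",
]
gene_column = select_matrix(gct_lines, start_row=3, end_row=5,
                            start_col=2, end_col=2)
all_columns = select_matrix(gct_lines, start_row=3, end_row=5)
```

With 50 genes, the equivalent call would use start_row=3 and end_row=53, giving one header line followed by 50 gene names.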

 

Save the Normal_Leu.genes.txt file to GenomeSpace using one of the following methods.

  1. From the job processing view, click the context menu (blue arrow) next to the dataset (e.g., Normal_Leu.genes.txt), then choose Save to GenomeSpace. Save the file to your folder.
  2. From the Modules and Pipeline start page, navigate to Jobs. Click on the file, then choose Send to GenomeSpace. Save the file to your folder.
  3. OPTIONAL: close GenePattern.

  1. Launch Cytoscape from GenomeSpace by clicking on the context menu and choosing Launch, prompting the download of a cytoscape.jnlp file. Double-click the file to launch Cytoscape.
  2. Once Cytoscape is launched, navigate to the GenomeSpace import menu: File > Import > GenomeSpace. Choose Load network from table.
  3. Navigate to the file which contains the gene names, e.g., Normal_Leu.genes.txt. Click Open or Select to load the file.
  4. Under Source Interaction, choose the column which contains your gene names. There may be only one column.
  5. If you have a file header, make sure the Show Text File Import Options box is checked. This will show advanced options.
  6. Check the Transfer first line as attribute names box to prevent the header from being imported as a 'gene name'.
  7. Click Import to load the file. You should see many nodes (gene names), but no interaction edges.
  8. Use your mouse to highlight all the nodes. Under Data Panel, select all the node IDs, then right-click (Windows) or ctrl+click (Mac) and select Copy.

  1. In Cytoscape, navigate to the GeneMANIA plugin: Plugins > GeneMANIA > Search.
    NOTE: if you have never run GeneMANIA before, you will be prompted to install a database. Use the following steps:
    1. Choose the most recent database. Select Core, then Download. Under the Download tab, choose Mus musculus Mouse. Click Install.
      NOTE: This process may take several minutes. You may use your own database files; however, the species should match that of your expression data; i.e., if you are using H. sapiens data, make sure the database is also from H. sapiens.
    2. Once the databases are installed, click Close.
  2. Once GeneMANIA is loaded, change the following parameters:
    1. Step 1a: Choose an Organism: M. musculus (mouse). If you are using your own database, make sure to select the appropriate species.
    2. Step 2: Choose Genes of Interest: Click the empty box, then right-click (Windows) or ctrl+click (Mac) and select Paste to load your gene list into GeneMANIA.
      NOTE: A dialog box may appear, telling you that some genes were not found in the GeneMANIA database; you can click OK.
  3. Click Start to begin identifying connections between genes. This may take several minutes. Once the job is complete, close the GeneMANIA pop-up.

  1. In Cytoscape, navigate to the MCODE plugin: Plugins > MCODE > Start MCODE
  2. Once MCODE is loaded, change the following parameters:
    1. Find Cluster(s): in Whole Network
    2. Advanced Options > Cluster Finding: Check the Haircut parameter.
  3. Click Analyze to identify subnetworks in your network. The results from this search will appear in the Results Panel on the upper right.
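
The Haircut option trims nodes that hang off a cluster by a single edge. A simple way to picture it is as repeated removal of nodes with fewer than two neighbors inside the cluster, which leaves the cluster's 2-core. The sketch below shows that idea on a toy graph; it is not MCODE's code, and the function name and graph are hypothetical.

```python
def haircut(adjacency):
    """Return the 2-core of an undirected graph.

    adjacency maps each node to the set of its neighbors (symmetric).
    Nodes with fewer than two neighbors are removed repeatedly until
    none remain.
    """
    adj = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in adj.items() if len(nbrs) < 2]:
            for nbr in adj.pop(node):
                adj[nbr].discard(node)
            changed = True
    return adj

# Triangle A-B-C with a dangling chain C-D-E
toy_cluster = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
trimmed = haircut(toy_cluster)
```

Here E is removed first, which leaves D with a single neighbor, so D is removed as well and only the triangle remains.
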
  • To highlight the proteins in a cluster, select the cluster in the Results Panel. This will also list the proteins in the Data Panel.
  • The Genes and Functions tab will provide more information on the genes and proteins found in the cluster.
  • Cytoscape provides many options for displaying a network. For example:
    • A force-directed layout finds an optimal way to display nodes and edges by simulating nodes as objects and edges as springs connecting objects together. Cytoscape provides several variations on a force-directed layout. To create a force-directed layout, navigate through the following menu:
      Layout > Cytoscape Layouts > Edge Weighted Force Directed (BioLayout) > All Nodes > unweighted
    • A circular layout arranges nodes in a circle. Layouts can use information about a node, such as the node's degree, to determine the order of nodes within the circle. To create a circular layout, navigate through the following menu:
      Layout > Cytoscape Layouts > Circular Layout > All Nodes
  • To control the density of the network (how far nodes are from one another), you can use the Scale function. Select Layout > Scale, then use the sliding scale bar to increase or decrease network density.
  • To change the visual style of the network, e.g., by coloring nodes or adding arrows to edges, navigate to the VizMapper™ tab in the Control Panel. You can choose preset styles using a drop-down menu. You can create your own styles by clicking on the Defaults pane, then adjusting the parameters.
    For example, to adjust edge color, first choose the Edge tab in the bottom part of the pane. Then click on EDGE_COLOR, choose a new color, and click OK. Some variables take numeric inputs, such as EDGE_LINE_WIDTH.
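
To illustrate how a circular layout can use node information, the sketch below places nodes evenly on a circle, ordered by degree (most connected first). It is a generic illustration and not Cytoscape's layout code; the function name, the radius, and the tie-breaking by node name are assumptions.

```python
import math

def circular_layout(adjacency, radius=100.0):
    """Map each node to (x, y) on a circle, ordered by descending degree."""
    order = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    count = len(order)
    return {
        node: (radius * math.cos(2 * math.pi * i / count),
               radius * math.sin(2 * math.pi * i / count))
        for i, node in enumerate(order)
    }

# Toy network: node C has the highest degree
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
positions = circular_layout(toy_network)
```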

Results Interpretation

This is an example interpretation of the results from this recipe. First, we identified the top 50 genes which differentiated between two phenotypes, leukemic and normal. We then used the GeneMANIA tool in Cytoscape to identify connections between these genes, although some genes were not annotated and therefore only a subset were actually analyzed. We included all possible sources of interaction, i.e. we are equally interested in connections between genes that arise from co-expression, as we are in connections arising from physical interactions.
After running GeneMANIA we created a network which connected our subset of genes. We can see from the GeneMANIA results that, for example, 3 genes (out of 45) have the Gene Ontology (GO) annotation 'zinc ion homeostasis', which has a total of 16 genes associated with it. The significance of this enrichment is reported as a q-value, calculated from an FDR-corrected hypergeometric test for enrichment. The q-value is analogous to a p-value, and therefore a lower q-value is considered more significant.
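
The enrichment test can be sketched as follows, using the counts from the example above (3 of 45 genes carrying an annotation shared by 16 genes in total). The background size of 20,000 genes is a hypothetical value chosen only for illustration, since the background GeneMANIA uses is not stated here, and the Benjamini-Hochberg procedure is shown as one common FDR correction rather than as GeneMANIA's confirmed method.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(p_values):
    """Return FDR-adjusted q-values in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Small check: both of 2 drawn genes annotated, from 4 genes with 2 annotated
p_toy = hypergeom_enrichment_p(2, 2, 2, 4)

# Counts from the text; N = 20000 is a hypothetical background size
p_example = hypergeom_enrichment_p(3, 45, 16, 20000)

# q-values for a made-up list of p-values, one per tested annotation
q_toy = benjamini_hochberg([0.01, 0.04, 0.03])
```

Because many GO annotations are tested at once, the raw p-values are adjusted together, which is why the result is reported as a q-value rather than a p-value.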

Once we learn about the functional enrichment associated with the genes in our network, we are interested in determining whether we can find subnetworks, i.e., areas of the network which are highly interconnected. We use MCODE to identify subnetworks. For example, MCODE identifies a small subnetwork of 5 nodes and 6 edges. When we click on this subnetwork in the MCODE panel, it will highlight the nodes in the network.
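
One way to gauge how tightly connected such a subnetwork is, is its graph density: the fraction of possible node pairs that are actually joined by an edge. The sketch below computes this generic measure for the 5-node, 6-edge example; it is not necessarily the score that MCODE reports.

```python
def graph_density(n_nodes, n_edges):
    """Density of a simple undirected graph: edges / possible node pairs."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# 5 nodes allow 10 possible pairs; 6 of them are connected
subnetwork_density = graph_density(5, 6)  # 0.6
```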

We can separate this set of genes into a subnetwork by clicking the Create sub-network button in MCODE. This generates a new view containing the subnetwork, arranged according to the MCODE subnetwork view. Looking at this subnetwork of genes, we can examine their annotation and see that 3 of these genes are associated with zinc ion binding.

These results suggest that there is a collection of zinc ion binding genes in our set of 50 genes which differentiated between leukemic and normal phenotypes. However, the results in this example are not necessarily significant and are only a simple representation of possible results.

