Identify essential genes and associated subnetworks from Genome-Scale CRISPR-Cas9 knockout screens

Added by forrestkim on 2017.05.25
Last updated about 2 years ago.


Summary

What genes are essential to a cell’s survival in a specific environment?

This recipe provides a way to process the results of genome-wide CRISPR-Cas9 knockout screens. In these screens, single guide RNAs (sgRNAs) are designed to bind to and inhibit specific target DNA sequences in genes. Multiple sgRNAs may target the same gene to increase knockout efficiency. In positive screens, essential genes are identified through the sequencing of surviving cells post-selection. The loss of these ‘winning’ genes creates cells that are resistant to the selective pressure. In negative screens, essential genes are identified by measuring which genes are lower in abundance post-selection. These screens require a non-selected control, which is used to find which genes are essential to survival under the given selective pressures (Miles et al., 2016). Since a large number of sgRNAs can be introduced in a single screen, many genes can be tested against a selection criterion. However, there are many factors to consider when processing sequenced reads: often multiple sgRNAs in a library target the same gene but with different specificities and efficiencies, and read count distributions vary depending on library and study designs. Additionally, positive selection screens often result in relatively few sgRNAs that dominate the total sequenced reads. The MAGeCK (Li et al., 2014) method was specifically developed for CRISPR screen analyses with these conditions in mind.

How can we find the molecular mechanism responsible for resistance?

By looking at how the hits in the screen aggregate on an interaction network, we can get an idea of the mechanisms that are essential for the organism to survive an environmental challenge.  The network neighborhood that contains a high concentration of essential genes is strongly implicated as the molecular mechanism by which an organism handles the challenge.

We can find the network neighborhood that is enriched for the screen hits through an algorithm called network propagation (Carlin et al., in press), which is implemented as a feature of the popular network analysis program Cytoscape. This algorithm will find the closely clustered hits and their network neighbors to build a network diagram of the resistance mechanism. We can then use the GeneMANIA plugin to find enriched terms that summarize the biology represented in the diagram.

What is Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK)?

Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) is an algorithm for identifying both positively and negatively selected sgRNAs and genes from genome-scale CRISPR/Cas9 knockout screens. The MAGeCK method can be summarized by the following steps:

1. sgRNA read counts are median-ratio normalized.

2. Mean-variance modeling is then used to model each replicate. The statistical significance of each sgRNA is calculated using the learned mean-variance model.

3. Essential genes are determined by looking for genes with consistently highly significant sgRNAs using robust rank aggregation.
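
Step 1 above can be illustrated with a minimal Python sketch of median-ratio normalization. It assumes the common formulation in which each sample's size factor is the median, across sgRNAs, of the ratio between that sample's count and the sgRNA's geometric mean across samples. The function names and the toy counts are hypothetical, and MAGeCK's own implementation may differ in details such as how zero counts are handled.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Return one size factor per sample.

    counts: dict mapping sample name -> list of raw sgRNA counts,
            with sgRNAs in the same order for every sample.
    """
    samples = list(counts)
    n_sgrnas = len(counts[samples[0]])

    # Geometric mean per sgRNA, computed in log space.
    # Assumption of this sketch: sgRNAs with a zero count in any sample are skipped.
    usable = []
    log_geo_means = []
    for i in range(n_sgrnas):
        row = [counts[s][i] for s in samples]
        if all(c > 0 for c in row):
            usable.append(i)
            log_geo_means.append(sum(math.log(c) for c in row) / len(row))
    if not usable:
        raise ValueError("no sgRNA has non-zero counts in every sample")

    factors = {}
    for s in samples:
        ratios = [counts[s][i] / math.exp(g) for i, g in zip(usable, log_geo_means)]
        factors[s] = median(ratios)
    return factors


def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = median_ratio_size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Toy data (hypothetical): esc1 was sequenced exactly twice as deeply as plasmid.
raw = {"plasmid": [100, 200, 50, 400], "esc1": [200, 400, 100, 800]}
normalized = normalize(raw)
```

In this toy case the two samples differ only in sequencing depth, so after normalization their counts coincide; real screens differ in both depth and sgRNA abundance, and the median makes the size factor robust to the minority of sgRNAs that change strongly.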

Use Case: MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens (Li et al. 2014).

Inputs

For this recipe, we will need two datasets, the experimental screen dataset and a control screen dataset, as well as an sgRNA library for sgRNA-to-gene relationship information. The sample CRISPR/Cas9 knockout screen treatment, control, and sgRNA library used in this recipe are from the paper "Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library" (Koike-Yusa et al. 2014). These datasets can be downloaded from the following GenomeSpace Public folders:

Treatment dataset:

Public > RecipeData > SequenceData > MAGeCK > ERR376999.fastq.gz: This file contains sequence data of the Cas9-expressing mouse ESCs after they were transfected with the targeted sgRNA expression vectors. The cells were then treated with alpha-toxin for 2 days for selection. The surviving cells were pooled and genomic DNA was extracted for PCR and sequencing.

Control dataset:

Public > RecipeData > SequenceData > MAGeCK > ERR376998.fastq.gz: This file contains sequence data of the Cas9-expressing mouse ESCs after they were transfected with the pBluescript (control) vector. The cells were grown and pooled, and genomic DNA was extracted for sequencing in the same way as for the treated cells.

sgRNA library:

Public > RecipeData > SequenceData > MAGeCK > yusa_library.csv: This file contains the sgRNAs and their corresponding target sequences.

NCI Pathway Interaction Database (in Cytoscape):

NDEx > NCI Pathway Interaction Database - Diffusion Demo Copy: This file contains a network derived from the latest BioPAX3 version of the Pathway Interaction Database (PID) curated by NCI/Nature.

Note: For help loading your own data into GenomeSpace, see: "Upload Data To GenomeSpace".

Outputs

Recipe steps

  • GenePattern
    1. Loading Data into GenePattern
    2. Determining and Preprocessing Read Counts
    3. Identifying Significant sgRNAs and Genes
  • Cytoscape 3
    1. Loading an Interaction Network
    2. Importing Essential Gene List to Network
    3. Discovering Interactions with Network Diffusion
    4. Functionally Annotating Subnetwork(s) in Most Related Genes

NOTE: If you have not yet associated your GenomeSpace account with your GenePattern account, you will be asked to do so. If you do not yet have a GenePattern account, you can automatically generate a new account that will be associated with your GenomeSpace account.

  1. Open GenePattern from GenomeSpace, and navigate to the GenomeSpace tab, then navigate to the folder containing the files (Public > RecipeData > SequenceData).

Click on the file (e.g. ERR376999.fastq.gz) in GenomeSpace, then use the GenePattern context menu and click Launch on File.

OR

Click on the file (e.g. ERR376999.fastq.gz) in GenomeSpace, then drag it to the GenePattern icon to launch.


MAGeCK.Count

  1. Launch GenePattern from GenomeSpace
  2. Change to the Modules tab, and search for MAGeCK.Count.
  3. Once the module has loaded, add the two FASTQ files (Click "Add Another Group name" to upload the second FASTQ file):
    1. Group name: esc1, fastq file: ERR376999.fastq.gz (found in Public > RecipeData > SequenceData)
    2. Group name: plasmid, fastq file: ERR376998.fastq.gz (found in Public > RecipeData > SequenceData)
  4. Change the following parameters:
    1. sgRNA list: yusa_library.csv (found in Public > RecipeData > SequenceData)
    2. output prefix: escneg
    3. trim 5 prime: 23
    4. normalization method: median
  5. Click Run to run MAGeCK.Count.
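
To make the role of the "trim 5 prime" parameter concrete, here is a minimal Python sketch of read counting. It is not the MAGeCK.Count implementation. It assumes a headerless library CSV with columns sgRNA id, sequence, gene; exact matching only; and sgRNA sequences of a single length. All function names, sequences, and gene names are hypothetical.

```python
import csv
import io
from collections import Counter


def load_library(handle):
    """Map each sgRNA sequence to its id (assumed columns: id, sequence, gene)."""
    seq_to_id = {}
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        sgrna_id, sequence = row[0], row[1].upper()
        seq_to_id[sequence] = sgrna_id
    return seq_to_id


def count_sgrnas(fastq_lines, seq_to_id, trim_5=23):
    """Count reads whose bases after the 5' trim exactly match a library sgRNA."""
    lengths = {len(seq) for seq in seq_to_id}
    if len(lengths) != 1:
        raise ValueError("sketch assumes all sgRNA sequences share one length")
    sg_len = lengths.pop()

    counts = Counter()
    for line_no, line in enumerate(fastq_lines):
        if line_no % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        candidate = line.strip().upper()[trim_5:trim_5 + sg_len]
        sgrna_id = seq_to_id.get(candidate)
        if sgrna_id is not None:
            counts[sgrna_id] += 1
    return counts


def fastq_record(name, sequence):
    """Build a 4-line FASTQ record with a dummy quality string."""
    return ["@" + name, sequence, "+", "I" * len(sequence)]


# Toy library and reads (hypothetical sequences and gene names).
library = load_library(io.StringIO(
    "sg1,ACGTACGTACGTACGTACG,GeneA\n"
    "sg2,TTTTGGGGCCCCAAAATTT,GeneB\n"
))
flank = "N" * 23  # stands in for the 23 bases removed by the 5' trim
reads = (
    fastq_record("r1", flank + "ACGTACGTACGTACGTACG" + "GTTT")
    + fastq_record("r2", flank + "TTTTGGGGCCCCAAAATTT" + "GTTT")
    + fastq_record("r3", flank + "ACGTACGTACGTACGTACG")
    + fastq_record("r4", flank + "GGGGGGGGGGGGGGGGGGG")  # matches no sgRNA
)
counts = count_sgrnas(reads, library)
```

For a real file, the lines could be read with `gzip.open(path, "rt")` from the standard library instead of the in-memory list used here.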

Save to GenomeSpace

  1. From the MAGeCK.Count outputs in the Jobs tab, click on the text file (e.g. escneg.count_normalized.txt), then choose Save to GenomeSpace and save the file to your desired directory.

MAGeCK.Test

  1. Change to the Modules tab, and search for MAGeCK.Test.
  2. Once the module has loaded, change the following parameters:
    1. count table: escneg.count_normalized.txt (from MAGeCK.Count job output)
    2. treatment id: esc1.1
    3. control id: plasmid.1
    4. normalization method: median
    5. output prefix: esccp
  3. Click Run to run MAGeCK.Test.

Save to GenomeSpace

  1. From the MAGeCK.Test outputs in the Jobs tab, click on the text file (e.g. esccp.gene_summary.txt), then choose Save to GenomeSpace and save the file to your desired directory.
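
The gene summary file can also be inspected outside GenePattern. The sketch below lists genes whose `neg|p-value` lies between 0 and 0.05 inclusive, the same cutoff applied later in Cytoscape. It assumes the file is tab-delimited and that gene names sit in a column called `id`; both are assumptions of this example, and the rows shown are made up.

```python
import csv
import io


def significant_genes(handle, column="neg|p-value", threshold=0.05, id_column="id"):
    """Return (gene, p-value) pairs with 0 <= p <= threshold, smallest p first."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        p_value = float(row[column])
        if 0.0 <= p_value <= threshold:
            hits.append((row[id_column], p_value))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical excerpt standing in for esccp.gene_summary.txt.
example = io.StringIO(
    "id\tnum\tneg|p-value\n"
    "GeneA\t5\t0.001\n"
    "GeneB\t5\t0.2\n"
    "GeneC\t4\t0.05\n"
)
hits = significant_genes(example)
```

With a saved file, `open("esccp.gene_summary.txt", newline="")` would replace the in-memory example.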

NOTE: The results of this module can also be used with the MAGeCK Pathways.Analysis to test whether a pathway is enriched in one particular gene ranking using RRA.


"Several existing algorithms, although not specifically designed for CRISPR/Cas9 knockout screens, can be also be used to identify significantly selected sgRNAs or genes. For example, edgeR, DESeq, baySeq and NBPSeq are commonly used algorithms for differential RNA-Seq expression analysis. These algorithms are able to evaluate the statistical significance of hits in CRISPR/Cas9 knockout screens, although only at the sgRNA level. Algorithms designed to rank genes in genome-scale short interfering RNA (siRNA) or short hairpin RNA (shRNA) screens can also be used for CRISPR/ Cas9 knockout screening data, including RNAi Gene Enrichment Ranking (RIGER) and Redundant siRNA Activity (RSA). However, these methods are designed to identify essential genes mostly from oligonucleotide barcode microarray data, and a new algorithm is needed to prioritize sgRNAs, as well as gene and pathway hits from high-throughput sequencing data" (Li et al., 2014).


NOTE: For Macintosh Users: JNLP files from the internet are labeled insecure. In order to open the JNLP, find the file in your Finder, right-click the file, and press open. This will open a window that will ask for permission to open the file. Press open to access the JNLP.

  1. Launch Cytoscape from GenomeSpace by clicking on the Cytoscape icon in the tool menu, prompting the download of a cytoscape.jnlp file. Double-click this file to launch Cytoscape.

  2. Once Cytoscape has launched, it will prompt a start menu. Close the start menu.
  3. Once the Cytoscape 3 software has loaded, it will prompt the user to name their network. Feel free to change the network name, or leave the parameters as default. Click OK.
  4. To load the network into Cytoscape, we will use the CyNDEx app. To install this, use the following steps:
    1. Navigate to Apps > App Manager
    2. Search for CyNDEx. Click on the app and click Install to install it.
  5. Navigate to Apps > NDEx > Import Networks from NDEx
  6. Search for Diffusion Demo. Make sure the full network title matches.
    1. Load the network into your workspace by pressing Load Network
    2. Then press Done Loading Network to view the loaded Network

If you are using Cytoscape version 3.6.0+, you will have to use the newer version of the NDEx App, CyNDEx-2, to import the NCI Pathway Interaction Database. Once you have installed CyNDEx-2, you can search for and download the network by selecting the icon and typing "final revision" in the search bar.

  1. Select the Import Table from File button to load our gene summary output (esccp.gene_summary.txt) from Step 3.

  2. Select To a Network Collection for the Where to Import Table Data parameter. Make sure the Network Collection indicated is the NCI Pathway Interaction Database - Diffusion Demo Copy. Use the default parameters for the remaining options.
  1. To perform network diffusion, we will use the Diffusion app (NOTE: for Cytoscape version 3.6.0, the Diffusion app is already installed). To install this, use the following steps:
    1. Navigate to Apps > App Manager

    2. Search for Diffusion. Select the app and click Install to install it.
  2. Select statistically significant essential genes. Go to the Select tab under the Control Panel.
    1. Press on the plus icon (+) and select Column Filter
    2. Choose the Node: neg|p-value
    3. Set neg|p-value: between 0 and 0.05 inclusive


  3. Navigate to Tools -> Diffuse -> Selected Nodes
  4. Select the 200 most related genes.
    1. Current Rank: 200. Press Set.
    2. Press Create to create a new network from the selection.
  5. Change the aesthetics to better visualize network interactions.
    1. Navigate to Layout -> yFiles Layouts -> Organic (Note: Cytoscape 3.6.0 uses a new yLayout App. To use these layouts, download the yFiles Layouts through the Cytoscape portal and select "yFiles Organic Layout" in the Layout dropdown menu)
    2. Apply a visual style to highlight the original input nodes. In the Map. column for Fill Color, set:
      1. Column: diffusion_input
      2. Mapping type: discrete mapping
      3. Color: yellow
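
The idea behind the diffusion and ranking steps above can be sketched in a few lines of Python. This example uses random walk with restart, one common form of network propagation. It is an illustrative stand-in and may not match the kernel used by the Cytoscape Diffusion app. The graph, the seed node, and the `restart` value are hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=100):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping node -> list of neighbor nodes (symmetric).
    seeds: set of nodes that receive the initial signal (the screen hits).
    Returns a dict mapping node -> propagated score; scores sum to 1.
    """
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # A fraction `restart` of the signal always returns to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                # An isolated node keeps its own share.
                updated[n] += (1.0 - restart) * scores[n]
                continue
            # The rest is split equally among the node's neighbors.
            share = (1.0 - restart) * scores[n] / len(neighbors)
            for m in neighbors:
                updated[m] += share
        scores = updated
    return scores


def top_ranked(scores, k):
    """Return the k highest-scoring nodes (ties broken by name)."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy network (hypothetical): a chain A-B-C-D plus a separate pair E-F.
network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["F"],
    "F": ["E"],
}
scores = propagate(network, seeds={"A"})
closest = top_ranked(scores, 3)
```

Nodes close to the seed accumulate more signal than distant ones, and nodes in a disconnected component receive none, which is why ranking by score and keeping the top entries (200 in the recipe) yields a neighborhood around the hits.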

  1. To annotate our networks, we will use the GeneMANIA app. To install this, use the following steps:
    1. Navigate to Apps > App Manager
    2. Search for GeneMANIA. Click on the app and click Install to install it.
  2. Annotate networks with GeneMANIA.
    1. Select a network of genes by holding down Shift while drag-selecting to highlight the network of interest.
    2. Highlight all rows in the Node Table. Copy the selected gene names (Windows: Ctrl + C, Mac: Command + C).
    3. Navigate to Apps -> GeneMANIA -> Search...
    4. Paste the selected genes from the Node Table by pressing the keyboard shortcut for paste (Windows: Ctrl + V, Mac: Command + V).
    5. Make sure M. musculus (mouse) is selected under Organism
    6. Start the analysis by pressing Start.

As a result, we have a network with associated interactions and functions that can be further explored in the Result Panel.

Results Interpretation

Given sgRNA read counts for our treatment and control, we use the MAGeCK algorithm on GenePattern to determine essential genes: negatively selected genes needed for ESC proliferation in the presence of alpha-toxin. Cytoscape then allows us to understand the network interactions between them. We use network diffusion to discover modules that the essential genes interact with. With the resulting subnetworks, GeneMANIA provides functional annotation by searching across a large catalogue of gene sets to understand the known processes that are enriched.
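
As a rough illustration of what "enriched" means for a gene set, the sketch below computes a one-sided hypergeometric p-value for the overlap between a selected subnetwork and an annotated gene set. This is a generic enrichment test; the recipe does not state which test GeneMANIA applies, so treat the choice of test as an assumption. The function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, `annotated` of which carry the term."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 genes in the background, 5 annotated with a
# term, 5 genes in the subnetwork, and all 5 of them carry the term.
p_value = hypergeom_enrichment_p(population=20, annotated=5, selected=5, overlap=5)
```

A small p-value means the overlap is larger than expected by chance. When many terms are tested at once, a multiple-testing correction is normally applied on top of the raw p-values.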

GeneMANIA provides a set of networks, genes, and functional annotations from its analysis that we can explore. We can see that many of the top “Functions” listed are essential biological processes and include many DNA repair genes, which are essential for ESC proliferation in the presence of alpha-toxin. These results are consistent with the results found in Koike-Yusa et al. 2014.

