
How do I obtain and analyze data from The Cancer Genome Atlas (TCGA)? Which TCGA datasets have specific mutations in my gene of interest?

This recipe provides a method for identifying and obtaining specific datasets of interest from The Cancer Genome Atlas (TCGA), through a web-based tool called FireBrowse. An example use of this recipe is a case where an investigator may have a gene they are interested in, such as ERCC2, and would like to know if there are mutations in this gene in specific datasets of interest, such as bladder cancer.


Tumors arise from mutational changes to healthy cells, and are frequently deficient in one or more DNA repair pathways. The accumulation of mutations in a tumor can be described by its “mutational signature”, a pattern of genetic mutations found in tumor DNA that reflects the different mutation events behind it. Mutational signatures can be specific to certain tissues or cancer types, and many of them are associated with DNA repair pathways.

An in-depth study of urothelial carcinoma, which causes ~150,000 deaths annually, by Kim et al. (Nature Genetics, 2016) identified a mutational signature in bladder cancer involving the nucleotide-excision repair (NER) pathway. The signature involves ERCC2, a gene encoding a DNA helicase that plays a critical role in the NER pathway. Somatic mutations in this gene may prevent proper functioning of the NER pathway, allowing mutations to accumulate. To date, urothelial cancer is the only known tumor type in which ERCC2 is significantly mutated.

Kim et al. used the collection of bladder carcinoma (BLCA) samples in The Cancer Genome Atlas (TCGA) to complete their analysis. Data were downloaded from the Broad Institute TCGA Genome Data Analysis Center, and samples were categorized based on mutational status. Tumors with somatic, missense mutations in ERCC2 were compared to non-mutated (wild-type) samples to identify the comprehensive mutational landscape of bladder cancer (also described in this TCGA paper).

This recipe provides a method for processing data from The Cancer Genome Atlas (TCGA) to identify samples that have mutations in specific genes. The purpose of this recipe is to categorize data by mutational status for further downstream analysis (e.g. comparing tumors of different mutational status). Data are collected from FireBrowse; Galaxy and GenePattern are used to categorize samples by mutational status and generate GCT and CLS files. The RNA-seq datasets are gene-level normalized RSEM expression estimates.
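As a rough illustration of the categorization step, the sketch below labels samples as mutated or wild-type and renders the labels as CLS text. It is not the Galaxy or GenePattern workflow the recipe uses. The function names and the `MUT`/`WT` class names are hypothetical, and the three-line layout (counts line, class-name line, label line) is an assumption based on the categorical CLS format described in the GenePattern documentation.

```python
def mutation_labels(sample_ids, mutated_ids):
    """Return 'MUT' or 'WT' for each sample, in the order given."""
    mutated = set(mutated_ids)
    return ["MUT" if sample in mutated else "WT" for sample in sample_ids]


def cls_text(labels):
    """Render class labels as three-line categorical CLS text (assumed layout).

    Class names are listed in order of first appearance, so the numeric
    label 0 always refers to the first class named on the second line.
    """
    class_names = []
    for label in labels:
        if label not in class_names:
            class_names.append(label)
    index = {name: i for i, name in enumerate(class_names)}

    counts_line = f"{len(labels)} {len(class_names)} 1"
    names_line = "# " + " ".join(class_names)
    labels_line = " ".join(str(index[label]) for label in labels)
    return "\n".join([counts_line, names_line, labels_line]) + "\n"
```

The sample order passed to `mutation_labels` should match the column order of the accompanying GCT file, since the CLS labels are positional.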


TCGA Barcodes

The Cancer Genome Atlas labels its datasets with the TCGA barcode, an identifier that describes the metadata associated with a sample. You can learn more about TCGA barcodes on the NIH National Cancer Institute Wiki page (see also: working with TCGA data).

TCGA barcodes adhere to a fixed format: TCGA-00-1111-22A-33B-4444-55. For this recipe, we are interested in the sample type, indicated by the two-digit code at the start of the 22A section of the barcode. Specifically, we want samples with designation 01 (solid tumor, or TP) or 11 (solid tissue normal, or NT), which together form paired tumor-normal samples.
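The barcode layout above can be unpacked with a few lines of code. This is a minimal sketch, not part of the recipe's tooling: the function and field names are hypothetical, and it assumes that the first three hyphen-separated fields identify the participant and that the fourth field holds a two-digit sample type followed by a vial letter.

```python
from typing import NamedTuple

# Only the two sample-type codes named in this recipe; the full table
# is on the NCI wiki page.
SAMPLE_TYPES = {"01": "TP", "11": "NT"}


class Barcode(NamedTuple):
    participant: str  # first three fields, e.g. "TCGA-00-1111"
    sample_type: str  # two-digit code, e.g. "01"
    vial: str         # remainder of the sample field, e.g. "A"


def parse_barcode(barcode: str) -> Barcode:
    """Split a TCGA barcode into participant, sample type, and vial."""
    fields = barcode.strip().split("-")
    if len(fields) < 4 or fields[0] != "TCGA" or len(fields[3]) < 2:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    sample_field = fields[3]
    return Barcode("-".join(fields[:3]), sample_field[:2], sample_field[2:])


def paired_participants(barcodes):
    """Return participants that have both a tumor (01) and a normal (11) sample."""
    seen = {}
    for barcode in barcodes:
        parsed = parse_barcode(barcode)
        seen.setdefault(parsed.participant, set()).add(parsed.sample_type)
    return sorted(p for p, types in seen.items() if {"01", "11"} <= types)
```

Grouping by participant first makes it easy to keep only the tumor samples that have a matched normal before building expression matrices.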

What genes are essential to a cell’s survival in a specific environment?

This recipe provides a way to process the results of genome-wide CRISPR-Cas9 knockout screens. In these screens, single guide RNAs (sgRNAs) are designed to bind to and inhibit specific target DNA sequences in genes. Multiple sgRNAs may target the same gene to increase knockout efficiency. In positive screens, essential genes are identified through the sequencing of surviving cells post-selection. The loss of these ‘winning’ genes creates cells that are resistant to the selective pressure. In negative screens, essential genes are identified by measuring which genes are lower in abundance post-selection. These screens require a non-selected control, which is used to find which genes are essential to survival under the given selective pressures (Miles et al., 2016).

Since a large number of sgRNAs can be introduced in a single screen, many genes can be tested against a selection criterion. However, there are many factors to consider when processing the sequenced reads: multiple sgRNAs in a library often target the same gene but with different specificities and efficiencies, and read count distributions vary depending on library and study designs. Additionally, positive selection screens often result in relatively few sgRNAs that dominate the total sequenced reads. The MAGeCK (Li et al., 2014) method was specifically developed for CRISPR screen analyses with these conditions in mind.

How can we find the molecular mechanism responsible for resistance?

By looking at how the hits in the screen aggregate on an interaction network, we can get an idea of the mechanisms that are essential for the organism to survive an environmental challenge. The network neighborhood that contains a high concentration of essential genes is strongly implicated as the molecular mechanism by which an organism handles the challenge.

We can find the network neighborhood that is enriched for the screen hits through an algorithm called network propagation (Carlin et al., in press), which is implemented as a feature of the popular network analysis program Cytoscape. This algorithm finds the closely clustered hits and their network neighbors to build a network diagram of the resistance mechanism. We can then use the GeneMANIA plugin to find enriched terms that summarize the biology represented in the diagram.
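To give a feel for how propagation spreads signal from hits to their neighbors, here is a small sketch using a random walk with restart, which is one common formulation of network propagation. It is not necessarily the algorithm used by the Cytoscape feature cited above. The function name, the `restart` value, and the fixed iteration count are all illustrative assumptions.

```python
def propagate(edges, hits, restart=0.5, iterations=50):
    """Score every node by its proximity to the hit genes.

    edges: iterable of (node_a, node_b) pairs, treated as undirected.
    hits:  screen hits used as seeds; hits absent from the network are ignored.
    """
    # Build undirected adjacency sets.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seeds = set(hits) & set(neighbors)
    if not seeds:
        raise ValueError("none of the hits are present in the network")

    # Initial score mass is split evenly among the seed nodes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in neighbors}
    scores = dict(start)

    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(scores[m] / len(neighbors[m]) for m in neighbors[node])
            updated[node] = restart * start[node] + (1 - restart) * incoming
        scores = updated
    return scores
```

Nodes that sit close to many hits end up with high scores even if they were not hits themselves, which is the intuition behind reading the high-scoring neighborhood as a candidate mechanism.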

What is Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK)?

Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) is an algorithm for identifying both positively and negatively selected sgRNAs and genes from genome-scale CRISPR/Cas9 knockout screens. The MAGeCK method can be summarized by the following steps:

1. sgRNA read counts are median-ratio normalized.

2. Mean-variance modeling is then used to model each replicate. The statistical significance of each sgRNA is calculated using the learned mean-variance model.

3. Essential genes are determined by looking for genes with consistently highly significant sgRNAs using robust rank aggregation.

Use Case: MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens (Li et al., 2014).
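Step 1 of the summary above, median-ratio normalization, can be sketched as follows. This is an illustration of the general technique rather than MAGeCK's own code: each sample's size factor is taken as the median, over sgRNAs, of the ratio between that sample's count and the sgRNA's geometric mean across samples. The function names are hypothetical, and skipping sgRNAs that have a zero count in any sample is an assumption made here to keep the geometric mean well defined.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict mapping sgRNA id -> list of read counts, one per sample."""
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # assumption: ignore sgRNAs with a zero count in any sample
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    # Raises statistics.StatisticsError if every sgRNA was skipped.
    return [median(sample_ratios) for sample_ratios in ratios]


def normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {
        sgrna: [c / f for c, f in zip(row, factors)]
        for sgrna, row in counts.items()
    }
```

After normalization, a sample that was simply sequenced twice as deeply as another ends up with the same counts, so remaining differences reflect changes in sgRNA abundance rather than sequencing depth.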
