Integrative analysis of TCGA pan-cancer data

Speaker Name: 
Chen-Hsiang Yeang
Speaker Organization: 
Institute of Statistical Science, Academia Sinica, Taiwan
Start Time: 
Friday, September 11, 2015 - 10:00am
End Time: 
Friday, September 11, 2015 - 11:00am
599 Engineering 2
Josh Stuart, UC Santa Cruz Genomics Institute
Cancer is a systemic disease in which molecular aberrations in DNA, RNA, and proteins alter the biological processes of cells and thereby drive tumor malignancy. Large consortia such as TCGA and ICGC have probed multiple facets of the molecular landscape across many cancer types. Most efforts to analyze multi-modal cancer genomic data focus on categorizing tumors by molecular signature and on identifying molecular alterations associated with clinical phenotypes. However, unraveling the statistical and mechanistic associations between upstream molecular alterations (sequence mutations, copy number variations, DNA methylation, etc.) and downstream gene expression responses has received less emphasis.
In this work, we propose a modeling and computational framework to infer the statistical and causal dependencies between molecular alterations and gene expression. Genes modulated by the same molecular alterations in the same direction are grouped into association modules. We applied this framework to the integrated TCGA data of 14 cancer types. Although distinct cancer types are highly heterogeneous, several molecular alterations recurrently modulate common sets of genes across multiple cancer types and are strongly associated with prognosis. Moreover, the target genes modulated by these recurrent alterations are predominantly enriched in three biological processes: cell cycle control, immune response, and matrisome maintenance. Network analysis of the inferred associations further reinforces this picture. These preliminary findings from the TCGA pan-cancer analysis provide a common ground for in-depth analysis to understand oncogenesis.
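
As a rough illustration of the grouping step described above, the following Python sketch collects genes into association modules keyed by the modulating alteration and the direction of the expression change. It is not the speaker's implementation: the input format (alteration, gene, direction triples), the function name build_association_modules, and all alteration and gene names are assumptions made for illustration. The inference of the associations themselves is outside the scope of this sketch.

```python
from collections import defaultdict

# Hypothetical input: (alteration, gene, direction) triples, where direction is
# +1 if the alteration is associated with higher expression and -1 if lower.
# The names below are placeholders, not results from the talk.
associations = [
    ("ALT_A_mutation", "GENE_1", -1),
    ("ALT_A_mutation", "GENE_2", -1),
    ("ALT_A_mutation", "GENE_3", +1),
    ("ALT_B_amplification", "GENE_4", +1),
    ("ALT_B_amplification", "GENE_5", +1),
]


def build_association_modules(triples):
    """Group genes that share both the modulating alteration and the direction."""
    modules = defaultdict(set)
    for alteration, gene, direction in triples:
        modules[(alteration, direction)].add(gene)
    # Return plain sorted lists so the result is deterministic.
    return {key: sorted(genes) for key, genes in modules.items()}


modules = build_association_modules(associations)
for (alteration, direction), genes in sorted(modules.items()):
    label = "up" if direction > 0 else "down"
    print(f"{alteration} ({label}): {', '.join(genes)}")
```

Keying each module on the (alteration, direction) pair keeps genes that respond in opposite directions to the same alteration in separate modules, which matches the "same direction" requirement in the abstract.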