Prerequisites: Statistics (advanced), Probability Theory, Statistical Models 2
Target skills and knowledge: Introduction to biological problems that can be explored with data from next-generation sequencing technologies.
Introduction to statistical models for transcriptomic and genomic data.
Ability to perform a complete data analysis: from raw data to the interpretation of the results.
Ability to write a short report on the analysis of a dataset assigned by the instructor.
Examination methods: Written exam and short report on the analysis of a dataset assigned by the instructor.
Assessment criteria: Clarity of the explanations in the written report, appropriate choice of statistical methods for the data analysis, and correctness and completeness of the answers in the written exam.
Additional criteria: critical analysis of the results and independence in carrying out the data analysis project.
Course unit contents: With the completion of the Human Genome Project and the systematic genome sequencing of several complex organisms, a massive amount of genomic, proteomic, and transcriptomic data is now publicly accessible. The wide availability of biological data is revolutionizing genetic research and our understanding of several biological mechanisms, such as gene regulation, protein interactions, and the activation and suppression of metabolic pathways. In this context, the volume and complexity of the data make statistical analysis challenging.
The course will cover the following topics:
- Introduction to genomics, transcriptomics, and epigenomics.
- Sequence alignment. Alignment algorithms, global and local alignments, application to the quantification of RNA expression.
- Analysis of gene expression data from RNA-seq experiments. Data normalization, global and local methods (lowess), variance-stabilizing transformations, discriminant and cluster analysis. Hypothesis testing for the identification of differentially expressed genes, moderated tests, permutation approaches. Multiple testing problems, control of the False Discovery Rate (FDR), classification methods, gene set analysis.
- Introduction to the analysis of other types of genomic data, such as DNA sequencing, chromatin accessibility, protein-RNA interactions (immunoprecipitation).
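As an illustration of the multiple testing topic listed above, the following is a minimal sketch of one common way to control the FDR, the Benjamini-Hochberg step-up procedure. The syllabus does not say which procedure is taught, so the choice of method, the function name `benjamini_hochberg`, and the example p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    the function name and signature are hypothetical.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Made-up p-values, one per gene, purely for demonstration.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_pvalues, alpha=0.05)
print(sum(flags), "genes flagged as differentially expressed")
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, provided a larger p-value further down the sorted list meets its threshold.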
Planned learning activities and teaching methods: Lectures and computer labs
Additional notes about suggested reading: Teaching material provided by the instructor
  • Irizarry, Rafael A.; Love, Michael I., Data Analysis for the Life Sciences with R. Boca Raton: Chapman and Hall/CRC, 2017.

