School of Science
DATA SCIENCE
Course unit
STATISTICAL METHODS FOR HIGH DIMENSIONAL DATA
SCP9087918, A.Y. 2019/20

Information concerning the students who enrolled in A.Y. 2018/19

Information on the course unit
Degree course Second cycle degree in DATA SCIENCE
SC2377, Degree course structure A.Y. 2017/18, A.Y. 2019/20
Number of ECTS credits allocated 6.0
Type of assessment Mark
Course unit English denomination STATISTICAL METHODS FOR HIGH DIMENSIONAL DATA
Website of the academic structure http://datascience.scienze.unipd.it/2019/laurea_magistrale
Department of reference Department of Mathematics
Mandatory attendance No
Language of instruction English
Branch PADOVA
Single Course unit The Course unit can be attended under the option Single Course unit attendance
Optional Course unit The Course unit can be chosen as Optional Course unit

Lecturers
Teacher in charge BRUNO SCARPA SECS-S/01

ECTS: details
Type Scientific-Disciplinary Sector Credits allocated
Core courses SECS-S/01 Statistics 6.0

Course unit organization
Period First semester
Year 2nd Year
Teaching method Frontal lectures (face-to-face)

Type of hours  Credits  Teaching hours  Hours of individual study  Shifts
Lecture        6.0      48              102.0                      No shifts

Calendar
Start of activities 30/09/2019
End of activities 18/01/2020

Examination board
Examination board not defined

Syllabus
Prerequisites: Statistical learning, Stochastic methods
Target skills and knowledge: This course aims to introduce students to the main statistical features and concepts underlying the analysis of high dimensional data, and to provide statistical solutions to problems that arise when analysing real data from many different fields (business, society, medicine, psychology, physics, etc.).
Examination methods: Practical and oral exams
Assessment criteria: Students will be evaluated according to their level of knowledge of the key concepts in analysing high dimensional data and their ability to apply them to real cases.
Course unit contents: Each year a selection of the following topics will be presented, also taking the students' preferences into account.

1. REGRESSION MODELS FOR HIGH-DIMENSIONAL DATA
1.1 Incremental algorithms with limited memory, stochastic gradient descent, inference
1.2 Sparsity, penalization inducing sparsity
1.3 Recall of Lasso and Elastic-Net for GLM
1.4 Extensions: adaptation, fusion, dealing with categorical variables
1.5 Group LASSO
1.6 Non-convex penalties
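
As a rough illustration of items 1.2 and 1.3, the minimal sketch below shows how an L1 penalty can set coefficients exactly to zero, using cyclic coordinate descent with soft-thresholding. The course laboratories use R; Python is used here only to keep the example self-contained. The function names, the toy design matrix and the penalty value are assumptions made for this illustration and are not taken from the course material.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows (no intercept column), y is a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            # Closed-form update for one coordinate under the L1 penalty.
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


# Toy data (made up): y = 2 * x1 + 0.1 * x2 on an orthogonal design.
X_toy = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y_toy = [2.1, 1.9, -1.9, -2.1]
beta_hat = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
print(beta_hat)  # the weak second coefficient is shrunk to exactly zero
```

With this orthogonal toy design each update reduces to soft-thresholding the least-squares coefficient, so the strong effect is shrunk by the penalty and the weak one is removed.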

2. STATISTICAL ANALYSIS OF NETWORK DATA
2.1 Introduction to network structures of data
2.2 Network and node indicators
2.3 Community detection
2.4 Basic statistical models and inference (Erdős-Rényi, p1, ERGM)
2.5 Bayesian models (Stochastic block models, Latent space models)
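
To give a flavour of items 2.2 and 2.4, the sketch below simulates an undirected Erdős-Rényi graph G(n, p), in which each pair of nodes is linked independently with probability p, and computes two simple indicators: node degrees and network density. Under this model the observed density is the maximum likelihood estimate of p. This is only an illustrative Python sketch (the course laboratories use R). The function names and the chosen values of n, p and the seed are assumptions.

```python
import random
from itertools import combinations


def sample_erdos_renyi(n, p, seed=None):
    """Return the edge list of an undirected G(n, p) graph on nodes 0..n-1."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def degrees(n, edges):
    """Number of edges incident to each node."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def density(n, edges):
    """Share of possible node pairs that are linked (the MLE of p under G(n, p))."""
    return len(edges) / (n * (n - 1) / 2)


# Hypothetical settings chosen only for illustration.
edges = sample_erdos_renyi(n=50, p=0.1, seed=42)
print(density(50, edges), max(degrees(50, edges)))
```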

3. STATISTICAL METHODS FOR TEXT MINING
3.1 Introduction
3.2 Data preparation and preprocessing (text scanning, stemming, tagging)
3.3 Dimensionality reduction and t-SNE
3.4 Topic models and Latent Dirichlet Allocation
3.5 Classification models
3.6 Sentiment analysis and iSA (integrated Sentiment Analysis)

4. CLUSTERING
4.1 Introduction to clustering and recall of basic algorithms (hierarchical and non-hierarchical)
4.2 Model-based clustering
4.3 Gaussian mixtures

5. TOPICS IN STATISTICAL LEARNING AND DATA MINING METHODS
5.1 Generalization of boosting: AdaBoost as additive logistic model, Gradient boosting and XGBoost
5.2 Association rules and Market basket analysis
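
For item 5.2, the standard measures used to rank an association rule A -> B are support (share of transactions containing both A and B), confidence (support of A and B divided by support of A) and lift (confidence divided by support of B). The Python sketch below computes them on a tiny basket data set. The transactions and function names are made up for illustration and do not come from the course material, whose laboratories use R.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Made-up baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(lift(baskets, {"bread"}, {"milk"}))
```

A lift below 1, as for bread -> milk in these toy baskets, means the two items occur together less often than would be expected if they were independent.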

6. COMPUTATIONAL ISSUES
Planned learning activities and teaching methods: Class lessons and laboratory sessions using R.
Additional notes about suggested reading: The teacher in charge will provide lecture notes, exercises and scientific papers.
Textbooks (and optional supplementary readings)

Innovative teaching methods: Teaching and learning strategies
  • Lecturing
  • Laboratory
  • Problem based learning
  • Case study
  • Interactive lecturing
  • Working in groups
  • Questioning
  • Action learning
  • Storytelling
  • Problem solving

Innovative teaching methods: Software or applications used
  • LaTeX

Sustainable Development Goals (SDGs)
Quality Education