School of Science
Course unit
SC01111799, A.Y. 2019/20

Information concerning the students who enrolled in A.Y. 2019/20

Information on the course unit
Degree course Second cycle degree in
SC1176, Degree course structure A.Y. 2014/15, A.Y. 2019/20
Number of ECTS credits allocated 6.0
Type of assessment Mark
Course unit English denomination DATA MINING
Department of reference Department of Mathematics
Mandatory attendance No
Language of instruction English
Single Course unit The Course unit can be attended under the option Single Course unit attendance
Optional Course unit The Course unit can be chosen as Optional Course unit

Teacher in charge ANNAMARIA GUOLO SECS-S/01

ECTS: details
Type Scientific-Disciplinary Sector Credits allocated
Educational activities in elective or integrative disciplines SECS-S/01 Statistics 6.0

Course unit organization
Period Second semester
Year 1st Year
Teaching method Face-to-face

Type of hours   Credits   Teaching hours   Hours of individual study
Laboratory      2.0       16               34.0                        No turn
Lecture         4.0       34               66.0                        No turn

Start of activities 02/03/2020
End of activities 12/06/2020

Examination board
Board From To Members of the board
8 A.Y. 2018/2019 01/10/2018 28/02/2020 CATTELAN MANUELA (Chair)
SCARPA BRUNO (Full member)
CRAFA SILVIA (Substitute)

Prerequisites: Basic knowledge of computer science and databases. Basic knowledge of probability and statistics is useful but not essential.
Target skills and knowledge: Students are expected to acquire the following skills and knowledge:

- understanding the principles of data mining for handling datasets, including those with high-dimensional features;
- constructing an appropriate model for data analysis and prediction according to the specific features of the data;
- data analysis with the R software, including graphical and modeling analyses;
- critical interpretation and evaluation of the results;
- the ability to communicate the data analysis performed and the related conclusions.
Examination methods: The examination consists of two parts.
1) The first part is a written examination (1 hour) on linear regression models, comprising multiple-choice questions and exercises. The exercises concern the analysis of a real dataset and include numerical evaluations, interpretation of R output, and comments on graphical output. The first part of the examination takes place after the middle of the course.
A pocket calculator may be used during the written examination.

2) The second part is a practical examination held in the laboratory (2 hours and 30 minutes), consisting of the analysis of a real dataset using R. The student is required to write a report describing the data analysis performed, including the most relevant graphical analyses, the model estimation, and an appropriate interpretation of the results.
During the practical examination, students may bring and consult a copy of the textbook, the course slides, and the laboratory notes.

The final evaluation will be the mean of the results from the two parts.

Students who do not take the first part in the middle of the course will sit the written examination immediately after the final practical examination.
Assessment criteria: The examination aims to evaluate
1) the knowledge acquired by the students about the construction and selection of a linear regression model and the critical interpretation of graphical and numerical results;
2) the knowledge acquired by the students about applying appropriate modeling techniques to the analysis of different real datasets and to prediction, especially for high-dimensional data;
3) the ability to use the R software to carry out a complete analysis of real data;
4) the ability to interpret and communicate the results of a real data analysis.
Course unit contents:
- Introduction to the course: data analysis as a tool for decision support. Motivations and context for data mining.
- Simple and multiple linear regression models: estimation, confidence intervals, hypothesis testing, p-values, prediction, model selection, residual analysis, spurious correlation, multicollinearity
- Classification methods: logistic regression, linear discriminant analysis and extensions
- Model selection criteria: cross-validation, adjusted R², AIC, BIC, automatic selection
- Regularisation: ridge regression and lasso
- Principal components regression
- Semiparametric regression: regression splines, smoothing splines, generalized additive models
Planned learning activities and teaching methods: The course consists of
1) lectures, in which the course contents are presented through slides covering theoretical aspects and analyses of real datasets, with the aim of promoting discussion and critical reflection;
2) laboratory classes, which introduce students to the analysis of real datasets using the R software.
Additional notes about suggested reading: Textbook. Additional material will be made available through the Moodle platform, including
1) slides of the course;
2) notes about data analysis in R;
3) papers and notes from statistical and data mining literature.
Textbooks (and optional supplementary readings)
  • James, G., Witten, D., Hastie, T., Tibshirani, R., An Introduction to Statistical Learning with Applications in R. Springer, 2013.