
Course unit
HUMAN DATA ANALYTICS
INP9087860, A.Y. 2019/20
Information concerning the students who enrolled in A.Y. 2019/20
ECTS: details
Type: Core courses
Scientific-Disciplinary Sector: ING-INF/03 Telecommunications
Credits allocated: 6.0
Course unit organization
Period: Second semester
Year: 1st Year
Teaching method: frontal
Type of hours: Lecture
Credits: 6.0
Teaching hours: 48
Hours of individual study: 102.0
Shifts: No turn
Examination board
Board 3, A.Y. 2019/2020 (from 01/10/2019 to 28/02/2021)
Members of the board:
 ROSSI MICHELE (Chair)
 ERSEGHE TOMASO (Full member)
 ZANUTTIGH PIETRO (Full member)
 BADIA LEONARDO (Substitute)
 CALVAGNO GIANCARLO (Substitute)
 CORVAJA ROBERTO (Substitute)
 LAURENTI NICOLA (Substitute)
 MILANI SIMONE (Substitute)
 TOMASIN STEFANO (Substitute)
 VANGELISTA LORENZO (Substitute)
 ZANELLA ANDREA (Substitute)
 ZORZI MICHELE (Substitute)

Board 2, A.Y. 2018/2019 (from 01/10/2018 to 31/12/2019)
Members of the board:
 ROSSI MICHELE (Chair)
 ZANUTTIGH PIETRO (Full member)
 BADIA LEONARDO (Substitute)
 LAURENTI NICOLA (Substitute)
 TOMASIN STEFANO (Substitute)
 ZANELLA ANDREA (Substitute)

Prerequisites:

Prior knowledge of Calculus and Linear Algebra (vector spaces, singular value decomposition, etc.), Probability Theory (random variables, conditional probability and Bayes' formula, probability distributions), and some basic computer programming (e.g., Matlab and some exposure to Python) is useful. Although not strictly required, basic knowledge of signal processing techniques (e.g., discrete Fourier transforms) is also helpful.
Note that the instructor will review basic concepts from the above fields whenever necessary, providing material and/or pointers to refresh the related theories. So, although such previous knowledge is very helpful to the student, the course is intended to be self-contained.
Although not mandatory, prospective students will benefit from prior attendance of the course "Machine Learning" from the Master Degree in ICT for Internet and Multimedia, course code: INP6075419.
Target skills and knowledge:

1. Become skilled in the main clustering algorithms for vector data, their pros and cons, and their evaluation metrics
2. Become skilled in the main unsupervised learning techniques for vector quantization, their performance, their advantages, and their use within real problems involving biosignals
3. Become skilled in the main modeling techniques for multivariate time series and their use within selected applications in the human data domain
4. Acquire the principles and main algorithms for supervised learning through neural networks (feed-forward and convolutional), their programming in Python, and their use to solve real-world problems
5. Get to know the main application domains for the techniques at points 1, 2, 3 and 4 and how they have been exploited to tackle relevant problems within the "human data" domain
6. Acquire the sensibility and set of abilities needed to comprehend, select and use the techniques at points 1, 2, 3 and 4
7. Be able to solve a real-world problem involving human data analysis and: 1) summarize its solution in a professional written report, 2) present the work done in a conference-style talk, also showcasing the software written for the project
8. Be able to implement and use the techniques at points 1, 2, 3 and 4 in Python
Examination methods:

This is a course on advanced and applied machine learning techniques, which are applied to real-world problems within the human data domain. Given this, the student will be examined through a project that involves the following phases of work:
1. The instructor will identify a problem to solve, using an open, rich, and freely accessible data set. The instructor will describe the problem during a dedicated lesson, which will also explain how the final exam is carried out. The exam consists of: 1) delivering a written report and 2) giving a conference-style talk
2. The students will split into groups of at most two students and will start working on the assigned project. As a first step, the students will choose, in full autonomy, the specific technique to use, the data preprocessing algorithm to obtain informative features, etc. The instructor will be available to steer the work and follow the students throughout all the work phases
3. Each group will solve the assigned problem using the selected technique and will: 1) present a final written report, 2) give a conference-style talk describing the problem, the selected models / techniques, the software written as part of the project development, and the obtained results. It is also recommended that the students showcase their software during the presentation
A final grade will be provided by the instructor upon a close inspection of the written report at point 1) and the assessment of the talk at point 2). 
Assessment criteria:

The evaluation criteria, upon which the instructor will verify the competences acquired by the students, will be:
1. Completeness of the acquired knowledge
2. Ability to analyze a real-world problem through the techniques treated in the course
3. Correctness in the technical terminology used, both written and oral
4. Originality and independence in the identification of the chosen solution to tackle the project assignment
5. Competence and coherence in the discussion and interpretation of the obtained results
6. Ability in the use of the software tools for the solution of the assigned problem
7. Quality of oral presentation
8. Quality of written report 
Course unit contents:

Part I – Introduction (2 hours)
 Intro: course outline, graduation rules, office hours, etc.
 Applications: health, activity-aware services, security and emergency management, authentication systems, analyzing human dynamics
Part II – Vector Quantization (12 hours)
 Vector quantization (VQ):
 Aims, quality metrics
 K-means, soft K-means, Expectation Maximization
 Unsupervised VQ algorithms:
 Self-Organizing Maps (SOM), Growing Neural Gas (GNG) networks
 Application to quasi-periodic biometric signals (ECG):
 Signal preprocessing, normalization, segmentation
 Dictionary learning: concepts, architectures
 Efficient representation of ECG signals: description of state-of-the-art algorithms
 Unsupervised dictionary designs for ECG via GNG-based dictionaries
 Final system design and numerical results
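As an illustration of the vector quantization topics listed above, the following is a minimal sketch of K-means (Lloyd's algorithm) used as a vector quantizer. It is not taken from the course material: the function names `kmeans` and `distortion`, the initialization from random data points, and the use of mean squared error as the quality metric are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm on the rows of X; returns (codebook, labels).

    Hypothetical helper for illustration, not the course implementation.
    """
    rng = np.random.default_rng(seed)
    # Assumption: initialise the codebook with k distinct data points.
    codebook = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest codeword (squared Euclidean).
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each codeword to the mean of its cell.
        new_codebook = codebook.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave empty cells where they are
                new_codebook[j] = members.mean(axis=0)
        if np.allclose(new_codebook, codebook):
            break
        codebook = new_codebook
    # Final assignment against the returned codebook.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return codebook, d2.argmin(axis=1)

def distortion(X, codebook, labels):
    """Mean squared quantization error (one possible quality metric)."""
    return float(((X - codebook[labels]) ** 2).sum(axis=1).mean())
```

Each input vector is represented by the index of its nearest codeword, so the codebook plays the role of the dictionary and the distortion measures how much information the quantizer loses.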
Part II – Sequential data analysis (10 hours)
 Hidden Markov Models (HMM):
 Maximum Likelihood for the HMM
 Forward-backward algorithm
 Sum-product algorithm, Viterbi algorithm
 Applications
 Authentication: user identification from keyboard keystroke dynamics
 Speech recognition: audio feature extraction, automatic speech recognition through HMM
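To make the Viterbi algorithm mentioned in this part more concrete, here is a minimal sketch for an HMM with discrete emissions, written in the log domain to avoid numerical underflow. The function name `viterbi` and the parameter layout (`pi` for the initial distribution, `A` for transitions, `B` for emissions) are assumptions chosen for this example, not notation taken from the course.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state path of a discrete-emission HMM.

    pi[i]   : probability of starting in state i
    A[i, j] : probability of moving from state i to state j
    B[i, o] : probability of emitting symbol o in state i
    obs     : sequence of observed symbol indices
    Returns (path, log-probability of that path).
    """
    pi, A, B = np.asarray(pi), np.asarray(A), np.asarray(B)
    with np.errstate(divide="ignore"):  # log(0) = -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, S = len(obs), len(pi)
    delta = np.empty((T, S))            # best log-score ending in each state
    psi = np.zeros((T, S), dtype=int)   # back-pointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, float(delta[-1].max())
```

The forward-backward algorithm has the same recursive structure, with the maximization over predecessor states replaced by a sum.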
Part III – Deep Neural Networks (10 hours)
 Gradient descent and general concepts (supervised learning, overfitting, cost models, etc.)
 Feed Forward Neural Networks: models, training, backpropagation
 Convolutional Neural Networks (CNN): structure, description of constituting blocks, training
 Applications: human activity learning
 Activities & sensors: definitions, classes of activities
 Features: sequence features, statistical features, spectral features, activity context features
 Activity recognition: activity segmentation, sliding windows, unsupervised segmentation, performance measures and results
 User authentication from motion signals: combination of CNN-SVM and sequential estimation theory
 Object / face recognition through CNN
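The sliding-window segmentation and the statistical and spectral features listed above can be sketched as follows for a multichannel sensor signal. This is only an illustrative example: the function names, the window parameters, and the choice of mean, standard deviation and dominant frequency as features are assumptions, not the feature set used in the course.

```python
import numpy as np

def sliding_windows(signal, width, step):
    """Split a (T, C) signal into overlapping windows of shape (width, C)."""
    if len(signal) < width:
        raise ValueError("signal is shorter than one window")
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])

def window_features(windows, fs):
    """Per-window, per-channel mean, standard deviation and dominant frequency.

    windows : array of shape (N, width, C)
    fs      : sampling frequency in Hz
    Returns an (N, 3 * C) feature matrix.
    """
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)
    # Remove the mean so that the DC bin does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(windows - mean[:, None, :], axis=1))
    freqs = np.fft.rfftfreq(windows.shape[1], d=1.0 / fs)
    dominant = freqs[spectrum.argmax(axis=1)]
    return np.concatenate([mean, std, dominant], axis=1)
```

The resulting feature matrix has one row per window and can be passed to any classifier, while the raw windows themselves are a natural input for a convolutional network.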
Part IV: Laboratory classes (12 hours)
In the laboratory classes, the students will be guided through the construction of Python code for neural networks, writing all the building blocks needed to create the network structure and to train it with several gradient descent-based algorithms. The students will be exposed to Python programming, including the use of the Keras and TensorFlow frameworks for the implementation and training of neural network structures. The software for the individual blocks of the presented architectures will be pre-written and checked for correctness, so that the students, after attempting their own implementation, will be able to combine the various blocks and complete the assigned task. Once the blocks are connected into the selected architecture, the resulting models will be trained using several gradient descent algorithms and tested against selected real datasets. The topics that will be covered are:
 Introduction to Python programming
 Solving a baseline inference problem
 Feed forward neural networks
 Convolutional neural networks 
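The kind of building blocks mentioned above (network structure, forward pass, backpropagation, gradient descent) can be sketched in plain NumPy as shown below. This is not the laboratory code and it does not use Keras or TensorFlow: the one-hidden-layer architecture, the tanh and sigmoid activations, the cross-entropy loss, and all function names are assumptions made for this example.

```python
import numpy as np

def init_params(n_in, n_hidden, seed=0):
    """Random weights for a network with one hidden layer and one output."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0, (n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(p, X):
    """Forward pass: tanh hidden layer followed by a sigmoid output unit."""
    H = np.tanh(X @ p["W1"] + p["b1"])
    y_hat = 1.0 / (1.0 + np.exp(-(H @ p["W2"] + p["b2"])))
    return H, y_hat[:, 0]

def loss(p, X, y):
    """Mean binary cross-entropy between labels y and network outputs."""
    _, y_hat = forward(p, X)
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(y * np.log(y_hat + eps)
                          + (1.0 - y) * np.log(1.0 - y_hat + eps)))

def gradients(p, X, y):
    """Backpropagation: gradient of the loss with respect to each parameter."""
    H, y_hat = forward(p, X)
    dz2 = (y_hat - y)[:, None] / len(y)   # gradient at the output pre-activation
    dz1 = (dz2 @ p["W2"].T) * (1.0 - H ** 2)  # chain rule through tanh
    return {
        "W1": X.T @ dz1, "b1": dz1.sum(axis=0),
        "W2": H.T @ dz2, "b2": dz2.sum(axis=0),
    }

def train(p, X, y, lr=0.1, n_steps=2000):
    """Full-batch gradient descent; updates the parameters in place."""
    for _ in range(n_steps):
        g = gradients(p, X, y)
        for name in p:
            p[name] -= lr * g[name]
    return p
```

A useful sanity check when writing such blocks by hand is to compare each backpropagated gradient with a finite-difference estimate of the loss before starting the training.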
Planned learning activities and teaching methods:

The course is structured as follows:
 Frontal lessons (36 hours): the instructor will present the topics in the syllabus mainly using slides. Some mathematical derivations and a few examples will be carried out at the blackboard.
 One lesson will be dedicated to the introduction of the course project, including the data set to be used, and the required steps to finalize the exam.
 Each technical section of the course (parts II to IV) is organized as follows: first, the theoretical models and the algorithms underpinning a certain domain are discussed; then, they are used within selected applications, presenting architectural choices and numerical performance.
 Laboratory classes (12 hours): six lessons are dedicated to laboratory activities, where the techniques presented during the theoretical lessons are implemented and experimentally characterized through the Python programming language.
The slides and the software that will be utilized for the course are all available through the course Website (username and password protected):
http://www.dei.unipd.it/~rossi/courses/HumanData/HDA.html
Access credentials will be given by the instructor during the first introductory lesson. 
Additional notes about suggested reading:

Technical reports, scientific papers, software and other material will be provided by the instructor whenever necessary. This material will be available through the course Website.
Further useful books are:
To review linear algebra concepts:
 J. R. Magnus and H. Neudecker, "Matrix Differential Calculus with Applications in Statistics and Econometrics," Wiley, 1999.
To study audio models and the use of HMM for automatic speech recognition:
 D. Jurafsky, J. H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,” Prentice Hall, 2nd Edition, 2008. 
Textbooks (and optional supplementary readings) 

Bishop, Christopher M., Pattern Recognition and Machine Learning. New York: Springer, 2006.

Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron, Deep Learning. Cambridge: MIT Press, 2016.

Watt, Jeremy; Borhani, Reza; Katsaggelos, Aggelos K., Machine Learning Refined. New York: Cambridge University Press, 2016.

Innovative teaching methods: Teaching and learning strategies
 Lecturing
 Laboratory
 Problem based learning
 Case study
 Working in group
 Questioning
 Problem solving
Innovative teaching methods: Software or applications used
 LaTeX
 Matlab
 Python, TensorFlow