School of Science
DATA SCIENCE
Course unit
HUMAN DATA ANALYTICS
SCP7079397, A.Y. 2018/19

Information concerning the students who enrolled in A.Y. 2018/19

Information on the course unit
Degree course Second cycle degree in DATA SCIENCE
SC2377, Degree course structure A.Y. 2017/18, A.Y. 2018/19
Number of ECTS credits allocated 6.0
Type of assessment Mark
Course unit English denomination HUMAN DATA ANALYTICS
Website of the academic structure http://datascience.scienze.unipd.it/2018/laurea_magistrale
Department of reference Department of Mathematics
Mandatory attendance No
Language of instruction English
Branch PADOVA
Single Course unit The Course unit can be attended under the option Single Course unit attendance
Optional Course unit The Course unit can be chosen as Optional Course unit

Lecturers
Teacher in charge MICHELE ROSSI ING-INF/03

Mutuated (course unit shared with another degree course)
Course unit code Course unit name Teacher in charge Degree course code
INP7080694 HUMAN DATA ANALYTICS MICHELE ROSSI IN2371

ECTS: details
Type Scientific-Disciplinary Sector Credits allocated
Core courses ING-INF/03 Telecommunications 6.0

Course unit organization
Period Second semester
Year 1st Year
Teaching method face-to-face

Type of hours Lecture
Credits 6.0
Teaching hours 48
Hours of individual study 102.0
Shifts No shifts

Calendar
Start of activities 25/02/2019
End of activities 14/06/2019

Examination board
Examination board not defined

Syllabus
Prerequisites: Prior knowledge of Calculus and Linear Algebra (vector spaces, singular value decomposition, etc.), Probability Theory (random variables, conditional probability and Bayes' formula, probability distributions), and some basic computer programming (e.g., Matlab and some exposure to Python) is useful. Although not strictly required, basic knowledge of signal processing techniques (e.g., discrete Fourier transforms) is also helpful.

Note that the instructor will review basic concepts from the above fields whenever necessary, providing material and/or pointers to refresh the related theories. So, although such prior knowledge is very helpful to the student, the course is intended to be self-contained.
Target skills and knowledge: 1. Become skilled in the main clustering algorithms for vector data, their pros and cons, and their evaluation metrics
2. Become skilled in the main unsupervised learning techniques for vector quantization, their performance, their advantages, and their use within real problems involving biosignals
3. Become skilled in the main modeling techniques for multivariate time series and their use within selected applications in the human data domain
4. Acquire the principles and main algorithms for supervised learning through neural networks (feed-forward and convolutional), their programming in Python, and their use to solve real-world problems
5. Get to know the main application domains for the techniques at points 1, 2, 3 and 4 and how they have been exploited to tackle relevant problems within the "human data" domain
6. Acquire the judgment and the skills needed to understand, select, and apply the techniques at points 1, 2, 3 and 4
7. Be able to solve a real-world problem involving human data analysis and: 1) summarize its solution in a professional written report, 2) present the work done in a conference-style talk, also showcasing the software written for the project
8. Be able to implement and use the techniques at points 1, 2, 3 and 4 in Python
Examination methods: This is a course on advanced and applied machine learning techniques, which are applied to real-world problems within the human data domain. The examination will therefore be carried out through a project involving the following phases of work:

1. The instructor will identify a problem to solve, using an open, rich, and freely accessible data set. The instructor will describe the problem during a dedicated lesson and will also explain how the final exam is carried out, which consists of: 1) delivering a written report and 2) giving a conference-style talk

2. The students will split into groups of at most two students each and will start working on the assigned project. As a first step, the students will choose, in full autonomy, the specific technique to use, the data pre-processing algorithm to obtain informative features, etc. The instructor will be available to steer the work and follow the students through all the work phases

3. Each group will solve the assigned problem using the selected technique and will: 1) present a final written report, 2) give a conference-style talk describing the problem, the selected models / techniques, the software written as part of the project development, and the obtained results. Students are also encouraged to showcase their software during the presentation

The instructor will assign a final grade after a close inspection of the written report at point 1) and an assessment of the talk at point 2).
Assessment criteria: The instructor will assess the competences acquired by the students according to the following criteria:

1. Completeness of the acquired knowledge
2. Ability to analyze a real world problem through the techniques that are treated in the course
3. Correctness in the technical terminology used, both written and oral
4. Originality and independence in the identification of the chosen solution to tackle the project assignment
5. Competence and coherence in the discussion and interpretation of the obtained results
6. Ability in the use of the software tools for the solution of the assigned problem
7. Quality of oral presentation
8. Quality of written report
Course unit contents: Part I – Introduction (2 hours)
- Intro: course outline, grading rules, office hours, etc.
- Applications: health, activity-aware services, security and emergency management, authentication systems, analyzing human dynamics

Part II – Vector Quantization (12 hours)
- Vector quantization (VQ):
--- Aims, quality metrics
--- K-means, soft K-means, Expectation Maximization
- Unsupervised VQ algorithms:
--- Self-Organizing Maps (SOM), Growing Neural Gas (GNG)
- Application to quasi-periodic biometric signals (ECG):
--- Signal pre-processing, normalization, segmentation
--- Dictionary learning: concepts, architectures
--- Efficient representation of ECG signals: description of state-of-the-art algorithms
--- Unsupervised dictionary designs for ECG via GNG-based dictionaries
--- Final system design and numerical results
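
As an illustration of the vector quantization topics listed above, the following is a minimal sketch of plain (hard) K-means in Python with NumPy. It is not taken from the course material: the function name `kmeans`, its parameters, and the synthetic two-cluster data are assumptions made for this example, and the soft K-means and Expectation Maximization variants are not covered.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Hard K-means (Lloyd's algorithm); returns codebook, labels, mean distortion."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook with k distinct data points.
    codebook = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance of each point to each codeword.
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each codeword to the mean of its assigned points.
        new_codebook = codebook.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                new_codebook[j] = members.mean(axis=0)
        if np.allclose(new_codebook, codebook):
            break
        codebook = new_codebook
    # Final assignment and quality metric (mean squared quantization error).
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    distortion = float(d2[np.arange(len(X)), labels].mean())
    return codebook, labels, distortion

# Synthetic example (hypothetical data): two well-separated 2-D clusters.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, (50, 2)),
               data_rng.normal(5.0, 0.1, (50, 2))])
codebook, labels, distortion = kmeans(X, k=2)
print(codebook.shape, distortion)
```

The mean distortion returned here is one possible quality metric for a codebook; other metrics may be more appropriate depending on the signal and the application.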

Part III – Sequential data analysis (10 hours)
- Hidden Markov Models (HMM):
--- Maximum Likelihood for the HMM
--- Forward-backward algorithm
--- Sum-product algorithm, Viterbi algorithm
- Applications
--- Authentication: user identification from keyboard keystroke dynamics
--- Speech recognition: audio feature extraction, automatic speech recognition through HMM

Part IV – Deep Neural Networks (10 hours)
- Gradient descent and general concepts (supervised learning, overfitting, cost models, etc.)
- Feed Forward Neural Networks: models, training, back-propagation
- Convolutional Neural Networks (CNN): structure, description of constituent blocks, training
- Applications: human activity learning
--- Activities & sensors: definitions, classes of activities
--- Features: sequence features, statistical features, spectral features, activity context features
--- Activity recognition: activity segmentation, sliding windows, unsupervised segmentation, performance measures and results
- User authentication from motion signals: combination of CNN-SVM and sequential estimation theory
- Object / face recognition through CNN

Part V – Laboratory classes (12 hours)
In the laboratory classes the students will be guided through the construction of Python code for neural networks, writing all the building blocks related to the creation of the neural network structure and its training using several gradient descent-based algorithms. The students will be exposed to Python programming, including the use of the Keras and TensorFlow frameworks for the implementation and training of neural network structures. The software composing the different blocks of the presented neural network architectures will be pre-written and checked for correctness, so that the students, after attempting to implement their own version of it, will be able to combine the various blocks and complete the assigned task. Once the blocks are connected into the selected neural network architecture, the resulting neural network models will be trained using several gradient descent algorithms and tested against selected real datasets. The topics that will be covered are:

- Introduction to Python programming
- Solving a baseline inference problem
- Feed forward neural networks
- Convolutional neural networks
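
To give a flavour of the building blocks mentioned above (network structure, back-propagation, gradient descent training), the following is a minimal sketch of a one-hidden-layer feed-forward network written with NumPy only. It is not the course's laboratory code and does not use Keras or TensorFlow: the function names (`init_params`, `forward`, `backward`, `train`), the layer sizes, the learning rate, and the XOR toy problem are all assumptions made for this example.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, seed=0):
    """Create the network structure: one tanh hidden layer, sigmoid output."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Forward pass; returns hidden activations and output probabilities."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    p = 1.0 / (1.0 + np.exp(-(h @ params["W2"] + params["b2"])))
    return h, p

def loss_fn(p, y):
    """Mean binary cross-entropy (small constant avoids log(0))."""
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def backward(params, X, y, h, p):
    """Back-propagation: gradients of the mean cross-entropy w.r.t. all parameters."""
    dz2 = (p - y) / len(X)               # gradient at the output pre-activation
    grads = {"W2": h.T @ dz2, "b2": dz2.sum(axis=0)}
    dz1 = (dz2 @ params["W2"].T) * (1.0 - h ** 2)  # chain rule through tanh
    grads["W1"] = X.T @ dz1
    grads["b1"] = dz1.sum(axis=0)
    return grads

def train(params, X, y, lr=0.5, n_steps=2000):
    """Full-batch gradient descent; returns the loss recorded at every step."""
    losses = []
    for _ in range(n_steps):
        h, p = forward(params, X)
        losses.append(loss_fn(p, y))
        grads = backward(params, X, y, h, p)
        for name in params:
            params[name] -= lr * grads[name]
    return losses

# Toy problem (hypothetical): learn XOR, which a linear model cannot represent.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
params = init_params(2, 8, 1)
losses = train(params, X, y)
_, p = forward(params, X)
print(losses[0], losses[-1])
```

In a framework such as Keras the gradient computation in `backward` is handled automatically; writing it by hand, as sketched here, is only meant to make the individual blocks visible.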
Planned learning activities and teaching methods: The course is structured as follows:

- Lectures (36 hours): the instructor will present the topics in the syllabus mainly using slides. Some mathematical derivations and a few examples will be carried out at the blackboard.

--- One lesson will be dedicated to the introduction of the course project, including the data set to be used, and the required steps to finalize the exam.
--- Each technical section of the course (parts II to IV) is organized as follows: first, the theoretical models and the algorithms underpinning a certain domain are discussed; then, they are applied within selected applications, presenting architectural choices and numerical performance.

- Laboratory classes (12 hours): six lessons are dedicated to laboratory activities, where the techniques presented during the theoretical lessons are implemented and experimentally characterized through the Python programming language.

The slides and the software that will be utilized for the course are all available through the course Website (username and password protected):

http://www.dei.unipd.it/~rossi/courses/HumanData/HDA.html

Access credentials will be given by the instructor during the first introductory lesson.
Additional notes about suggested reading: Technical reports, scientific papers, software and other material will be provided by the instructor whenever necessary. This material will be available through the course Website.

Further useful books are:

To review linear algebra concepts:
- J. R. Magnus and H. Neudecker, "Matrix Differential Calculus with Applications in Statistics and Econometrics," Wiley, 1999.

To study audio models and the use of HMM for automatic speech recognition:
- D. Jurafsky, J. H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,” Prentice Hall, 2nd Edition, 2008.
Textbooks (and optional supplementary readings)
  • Bishop, Christopher M., Pattern Recognition and Machine Learning. New York: Springer, --.
  • Bengio, Yoshua; Courville, Aaron; Goodfellow, Ian, Deep Learning. Cambridge: MIT Press, 2016.
  • Watt, Jeremy; Borhani, Reza; Katsaggelos, Aggelos, Machine Learning Refined: Foundations, Algorithms, and Applications (electronic resource). New York: Cambridge University Press, 2016.

Innovative teaching methods: Teaching and learning strategies
  • Lecturing
  • Laboratory
  • Problem based learning
  • Case study
  • Working in group
  • Problem solving
  • Loading of files and pages (web pages, Moodle, ...)

Innovative teaching methods: Software or applications used
  • Python, Keras, TensorFlow

Sustainable Development Goals (SDGs)
  • Good Health and Well-Being
  • Industry, Innovation and Infrastructure