School of Science
STATISTICS FOR TECHNOLOGY AND SCIENCE
Course unit
DATABASES 2
SCP4063825, A.A. 2019/20

Information concerning the students who enrolled in A.Y. 2017/18

Information on the course unit
Degree course First cycle degree in STATISTICS FOR TECHNOLOGY AND SCIENCE
SC2094, Degree course structure A.Y. 2014/15, A.Y. 2019/20
N0
Number of ECTS credits allocated 9.0
Type of assessment Mark
Course unit English denomination DATABASES 2
Website of the academic structure http://www.stat.unipd.it/studiare/ammissione-lauree-triennali
Department of reference Department of Statistical Sciences
E-Learning website https://elearning.unipd.it/stat/course/view.php?idnumber=2019-SC2094-000ZZ-2017-SCP4063825-N0
Mandatory attendance No
Language of instruction Italian
Branch PADOVA
Single Course unit The Course unit can be attended under the option Single Course unit attendance
Optional Course unit The Course unit can be chosen as Optional Course unit

Lecturers
Teacher in charge MASSIMO MELUCCI ING-INF/05

ECTS: details
Type | Scientific-Disciplinary Sector | Credits allocated
Educational activities in elective or integrative disciplines | ING-INF/05 Data Processing Systems | 9.0

Course unit organization
Period Second semester
Year 3rd Year
Teaching method In-person (frontal)

Type of hours | Credits | Teaching hours | Hours of individual study | Shifts
Laboratory | 2.0 | 12 | 38.0 | None
Lecture | 7.0 | 52 | 123.0 | None

Calendar
Start of activities 02/03/2020
End of activities 12/06/2020

Examination board
Examination board not defined

Syllabus
Prerequisites: * Computer Systems 1
* Data Structures and Programming
Target skills and knowledge: The course aims to train professionals able to describe, collect, organize, manage and analyze large amounts of heterogeneous data through rigorous IT methods. To this end, it promotes knowledge of the main methods and tools for the management, extraction and analysis of large databases.
Examination methods: The exam consists of a written report on a mini-project, supplemented by an oral exam on the course contents. The mini-project focuses on the methods for representing, indexing, retrieving and ranking unstructured data covered in the course.
The mini-project consists of building an IR service, also known as a "search engine". It is chosen and carried out by an autonomous group of one, two or three students. The aim of the project is to put into practice the contents illustrated during the lessons. The group must be able to explain the problems, methodologies, tools and results of its own mini-project.
An experimental collection will be distributed to all groups; the whole collection, and only that collection, must be used.
The content of the mini-project must include:
1. one or more indexing programs of the corpus of documents;
2. one or more programs that retrieve documents in response to each query in the corpus of queries; the retrieval programs include any retrieval functions written by the group as alternatives to those provided by the Python IR library;
3. a baseline run and at least one comparison run;
4. files containing the trec_eval output for each run;
5. precise and complete documentation that makes it possible to run the experiments from the command line;
6. a browser-based graphical interface, as shown during the laboratory lessons, for querying the collection interactively.
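A minimal sketch of items 1 to 3, assuming a toy in-memory corpus: it builds an inverted index, ranks documents with a hand-written BM25 function (the kind of retrieval function a group might write as an alternative to the library's own), and prints the ranking in the six-column TREC run format (`qid Q0 docno rank score tag`) that trec_eval reads. The stoplist, documents, query IDs, run tag and function names are hypothetical; the course's Python IR library is not used here, and the BM25 shown is the common variant with a non-negative idf.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy stoplist; a real project would use a fuller one.
STOPLIST = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is"}


def tokenize(text):
    """Lexical analysis: lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPLIST]


def build_index(docs):
    """Item 1: build an inverted index term -> {doc_id: tf} plus document lengths."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        doc_len[doc_id] = sum(counts.values())
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25(query, index, doc_len, k1=1.2, b=0.75):
    """Item 2: a hand-written BM25 ranking (non-negative idf variant).

    Returns (doc_id, score) pairs sorted by descending score.
    """
    n_docs = len(doc_len)
    avgdl = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue  # term absent from the collection
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            norm = k1 * (1.0 - b + b * doc_len[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1.0) / (tf + norm)
    # Ties are broken by doc id so that runs are reproducible.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def run_lines(queries, index, doc_len, tag, depth=1000):
    """Item 3: format rankings as TREC run lines 'qid Q0 docno rank score tag'."""
    lines = []
    for qid, text in queries.items():
        ranked = bm25(text, index, doc_len)[:depth]
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return lines


if __name__ == "__main__":
    # Hypothetical toy data standing in for the distributed experimental collection.
    docs = {
        "d1": "Information retrieval and search engines",
        "d2": "Indexing text documents with an inverted index",
        "d3": "Search engines rank documents by relevance",
    }
    queries = {"q1": "search engines", "q2": "inverted index"}
    index, doc_len = build_index(docs)
    print("\n".join(run_lines(queries, index, doc_len, tag="mygroup_bm25")))
```

Redirecting the printed lines to a file (for example `python bm25_run.py > mygroup_bm25.run`) gives a run that can be evaluated with `trec_eval qrels.txt mygroup_bm25.run`, where the script, run and qrels file names are hypothetical. A baseline run and a comparison run (item 3) could differ, for instance, only in the `k1` and `b` parameters or in the stoplist.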
The following requirements apply:
• the application software must be developed in Python; other tools are allowed only in an ancillary role, such as R for statistical analysis and graphics; programs and data must be delivered as compressed archives or folders named after the group;
• the code must be clean and commented in English or Italian; the names of objects and functions must be self-explanatory, and so must the file names of programs and data;
• the final application must be accompanied by a file named README.txt that briefly describes the files and how to use them.
The mini-project must be described in a report, whose versions are to be submitted via Moodle by the following deadlines; any deviation from this schedule must be agreed with the teacher:
• mid-May: a first draft of the report, possibly incomplete; the teacher will give advice on how to proceed;
• end of June: the final report; the teacher will then evaluate the mini-project;
• the report must not exceed 12 pages, must be written in Italian or English and submitted in PDF format, and must use the Springer LNCS style, for which LaTeX and Microsoft Word templates are available on Moodle.
Assessment criteria: First of all, the completeness and accuracy of the report will be evaluated, together with the ability to use computers and to produce results autonomously. In the oral examination, the general knowledge of the course contents, both theoretical and practical, will be evaluated. The mark of a test remains valid until the last exam session scheduled for the academic year in which the test was taken.
Course unit contents: * INTRODUCTION AND MOTIVATIONS: technology evolution, Information Retrieval, World Wide Web, search engines
* REPRESENTATION AND INDEXING: lexical analysis, stoplist, stemming, index statistics, terms and positions
* RETRIEVAL AND RANKING: logical operators, level of coordination
* PRINCIPLES AND MODELS: vector space model, probabilistic model
* MEASUREMENT AND EVALUATION: test collection, methods and measures
* EXPANSION AND FEEDBACK: query expansion, feedback, latent semantic analysis
* LEARNING AND MINING: classification, clustering, word embedding, learning to rank, optimization
* LABORATORY AND PROGRAMMING: Python, an Information Retrieval library, query management, indexing of text documents, ranking of documents, retrieval functions, measurement, presentation of results (snippets), management of large amounts of data, analyzers, WWW interface and search engine
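As a small worked illustration of two topics in the list above, "vector space model" and "methods and measures", the sketch below ranks documents by the cosine similarity between tf-idf vectors and scores the ranking with average precision, the per-query measure whose mean over all queries trec_eval reports as `map`. The (1 + log tf) · log(N/df) weighting is one common choice among several and is assumed here; the pre-tokenized documents, the query and the relevance judgements are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, df, n_docs):
    """Vector space model: weight each term by (1 + log tf) * log(N / df).

    Terms with no document frequency (unseen in the collection) are skipped.
    """
    counts = Counter(tokens)
    return {
        term: (1.0 + math.log(tf)) * math.log(n_docs / df[term])
        for term, tf in counts.items()
        if df.get(term, 0) > 0
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the number of relevant documents (unretrieved ones add 0)."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Hypothetical, already tokenized toy documents and relevance judgements.
    docs = {
        "d1": ["vector", "space", "model"],
        "d2": ["probabilistic", "model"],
        "d3": ["vector", "embedding", "space"],
    }
    n = len(docs)
    df = Counter(t for toks in docs.values() for t in set(toks))
    doc_vecs = {d: tfidf_vector(toks, df, n) for d, toks in docs.items()}
    query = tfidf_vector(["vector", "model"], df, n)
    ranking = sorted(doc_vecs, key=lambda d: cosine(query, doc_vecs[d]), reverse=True)
    print(ranking, average_precision(ranking, relevant={"d1", "d3"}))
```

Here d1 shares both query terms and ranks first; with d1 and d3 judged relevant and found at ranks 1 and 3, average precision is (1/1 + 2/3) / 2 = 5/6.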
Planned learning activities and teaching methods: The main learning activity consists of classroom lectures, held in Italian with the aid of the blackboard and a video projector.
Although attendance is optional, students are advised to attend the lessons, especially the laboratory sessions.
Individual study, in particular homework, is another important activity.
Textbooks (and optional supplementary readings)
  • Melucci, Massimo, Information Retrieval. Franco Angeli, 2013.
  • Leskovec, Jure; Rajaraman, Anand; Ullman, Jeffrey D., Mining of massive datasets. Cambridge, UK: Cambridge University Press, 2014. http://www.mmds.org/
  • Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich, Introduction to information retrieval. New York: Cambridge University Press, 2008.