Course: Information retrieval and natural language processing
ECTS credits: 3
Language: Croatian
Duration: 1 semester
Status: Compulsory, elective
Method of teaching: 1 hour of lectures, 1 hour of exercises
Prerequisites: None
Assessment: Completion of the full set of weekly writing tasks, final exam
Course description:
This course covers a series of natural language processing tasks for text retrieval. It begins by introducing basic concepts such as tokenization, indexing and weighting, along with NLP problems such as morphological normalization (stemming and lemmatization) and document similarity assessment. It then introduces multiple information retrieval paradigms, such as the vector space model and probabilistic information retrieval. The course ends with a practical task in which the supervised machine learning paradigm is applied to document classification.
Course objectives:
Students master basic IR-related NLP tasks such as tokenization, construction of an inverted index, TF-IDF weighting, document vectorization, cosine vector similarity, stemming and lemmatization. They become acquainted with two information retrieval paradigms: the vector space model and probabilistic information retrieval. Finally, they master the basics of supervised machine learning and its evaluation on a document classification task.
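As an illustration of the tasks listed in the objectives, the following is a minimal sketch of tokenization, inverted index construction, TF-IDF weighting and cosine similarity. It is not course material: the function names, the toy documents, the regex tokenizer and the unsmoothed idf = log(N / df) weighting are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenization (an assumption): lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    # Map each term to the set of ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors: raw term frequency times log(N / document frequency).
    index = build_inverted_index(docs)
    n_docs = len(docs)
    idf = {term: math.log(n_docs / len(ids)) for term, ids in index.items()}
    vectors = []
    for text in docs:
        tf = Counter(tokenize(text))
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection, used only for illustration.
docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
index = build_inverted_index(docs)
vectors = tfidf_vectors(docs)
similarity_0_1 = cosine(vectors[0], vectors[1])
similarity_0_2 = cosine(vectors[0], vectors[2])
```

In this toy collection the first and third documents share no tokens, because "cat" and "cats" are treated as different terms. This is the kind of mismatch that morphological normalization (stemming or lemmatization) is meant to reduce.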
Reading list:
C. D. Manning, P. Raghavan, H. Schütze (2008), Introduction to Information Retrieval, Cambridge University Press (selected chapters)