Subject: Natural language processing
Course: Data-driven language modelling
ECTS credits: 4
Language: Croatian, English
Duration: 1 semester
Status: elective
Method of teaching: 1 h lectures + 2 h exercises per week
Prerequisite: none
Assessment: written report, oral exam

Course description: Data-driven natural language processing. The role of language resources in language technology. Goals of language modeling. Deriving language models from language resources across the layers of linguistic processing. N-gram models. Algorithms for building and using language models. Statistical approaches to data sparsity. Hybrid language processing systems. Morphosyntactic tagging. Syntactic and semantic parsing. Information extraction.

Course objectives: Acquiring a basic theoretical understanding of, and practical skills in, language modeling for natural language processing using data-driven approaches.

Quality assurance and course evaluation: Internal evaluation by teachers and students; external evaluation as defined by the University.

Reading list:
Manning, Schütze: Foundations of statistical natural language processing, MIT Press, 1999.
Jurafsky, Martin: Speech and language processing, Prentice-Hall, 2008.
Manning, Raghavan, Schütze: Introduction to information retrieval, CUP, 2008.