Subject: Information technologies and applications
Course: Language databases
ECTS credits: 3
Language: Croatian
Duration: 1 semester
Status: obligatory for the DHI study programme, elective for all other study programmes
Method of teaching: 1 lecture hour and 1 hour of practical work every week
Prerequisites: Databases
Assessment: the student has to develop a language database that consists of data from a printed language resource and supply the required documentation

Course description:
Students are introduced to the basic concepts of the computational processing of language resources. The course discusses the data structures used to model language resources, such as the relational data model and the XML markup language. It also introduces basic terms such as corpora, dictionary and lexical databases, lexical and semantic relations, and semantic networks, as well as basic language tools such as spelling checkers, morphological generators, and morphological analyzers. Finally, it presents the process of digitizing printed language resources and the automated segmentation and structuring of the resulting data. Each course unit ends with a written assessment.

Course objectives:
Students should acquire basic knowledge of the principles and forms of constructing and using different kinds of language databases. They are expected to learn the techniques of designing language databases, from text digitization to the structuring of the database.

Quality check and success of the course: The quality and success of the course will be assessed by combining internal and external evaluation. Internal evaluation will be carried out by teachers and students by means of a survey at the end of the semester. External evaluation will be carried out by colleagues who attend the course to monitor and assess it.

Reading list:
1. Fellbaum, Christiane. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). Cambridge: Bradford Books, 1998.
2. Modeli znanja i obrada prirodnog jezika / edited by Miroslav Tuđman. Zagreb: Zavod za informacijske studije, 2003.
3. Jurafsky, Daniel; Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. New Jersey: Prentice Hall, 2000.
4. Tadić, Marko. Jezične tehnologije i hrvatski jezik. Zagreb: Ex libris, 2003.

Additional reading list:
1. Briscoe, Ted; Boguraev, Bran. Computational lexicography for natural language processing. New York: Longman Publishing Group, 1989.
2. Jurafsky, Daniel; Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. New Jersey: Prentice Hall, 2000.
3. Text Encoding Initiative. http://www.tei-c.org (accessed 12.01.2005)
4. Feddema, Helen. Microsoft Access version 2002 inside out. Redmond: Microsoft Press, 2002.