Subject: Text and language processing
Course: Text and language processing
ECTS credits: 6
Language: Croatian
Duration: 2 semesters
Status: obligatory for DHI study, elective for all other studies
Method of teaching: 2 lecture hours and 2 hours of practical work every week
Prerequisites: none
Assessment: the course ends with a written exam and a practical assignment in which students demonstrate advanced use of, and programming skills in, a text processing tool

Course description:
The course introduces the basic concepts of text processing in the broad sense. It explains text input, encoding, storage, editing and printing. Students will become familiar with the principles of optical character recognition and text encoding (code pages, the Unicode standard, Unicode transformation formats). They will also be introduced to different text formats such as TXT, HTML, XML, RTF, MS Word, PostScript and PDF, and to basic typographic terms such as glyphs, fonts and font types. In the practical part of the course, students are trained in the advanced use of a text processing tool such as MS Word, including programming in Visual Basic for Applications. Finally, students will become familiar with the basic principles of language tools for text processing. Each course unit ends with a written assessment.
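
As an illustration of the encoding topics named above (code pages, the Unicode standard, Unicode transformation formats), the following minimal sketch shows how one Croatian letter is stored under several encodings. It is written in Python purely for illustration; the sample letter, the variable names and the choice of encodings are assumptions and are not taken from the course materials.

```python
# Illustrative sketch only: the sample letter and the selected encodings
# are assumptions, not part of the course materials.
letter = "č"  # LATIN SMALL LETTER C WITH CARON

# The Unicode code point is independent of any byte-level encoding.
code_point = f"U+{ord(letter):04X}"

# The same character stored under two legacy code pages and three
# Unicode transformation formats (big-endian variants, no byte order mark).
encoded = {
    "cp1250": letter.encode("cp1250"),        # Windows Central European code page
    "iso8859_2": letter.encode("iso8859_2"),  # ISO Latin-2 code page
    "utf-8": letter.encode("utf-8"),
    "utf-16-be": letter.encode("utf-16-be"),
    "utf-32-be": letter.encode("utf-32-be"),
}

print(letter, code_point)
for name, raw in encoded.items():
    # Show the encoding name, the number of bytes and the bytes in hexadecimal.
    print(f"{name:10} {len(raw)} byte(s): {raw.hex()}")
```

For this letter the two code pages use a single byte, UTF-8 and UTF-16 use two bytes, and UTF-32 uses four, which is why a file must be read with the same encoding it was written in.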

Course objectives:
Students should gain an understanding of the advanced principles of formal text processing and develop skills in using standard text processing tools. They should also develop programming skills in Visual Basic for Applications and be able to write programs on their own.

Quality check and success of the course: The quality and success of the course will be monitored through a combination of internal and external evaluation. Internal evaluation will be carried out by teachers and students by means of a survey at the end of the semester. External evaluation will be carried out by colleagues who attend the course, through monitoring and assessment of the course.

Reading list:
1. Microsoft Typography. http://www.microsoft.com/typography/ (12.01.2005.)
2. Milijaš, Ljiljana. PC škola - Office XP. Varaždin: Pro-mil, 2002.
3. Willett, Edward C.; Cummings, Steve. Office XP Biblija. Beograd: Mikro knjiga, 2002.
4. Jurafsky, Daniel; Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. New Jersey: Prentice Hall, 2000. (selected chapters)
5. Microsoft Visual Basic for Applications Home Page. http://msdn.microsoft.com/vba/ (12.01.2005.)

Additional reading list:
1. Text Encoding Initiative. http://www.tei-c.org (12.01.2005.)
2. Unicode Home Page. http://www.unicode.org (12.01.2005.)
3. Sperberg-McQueen, C. M.; Burnard, Lou; Bauman, Syd. The TEI Consortium: Guidelines for Electronic Text Encoding and Interchange. Oxford: Humanities Computing Unit, University of Oxford, 2002.
4. Microsoft Corporation. Microsoft Office XP Developer's Guide. Portland: Microsoft Press, 2001.
5. Tadić, Marko. Jezične tehnologije i hrvatski jezik. Zagreb: Ex libris, 2003.