Department of Information and Communication Sciences
Faculty of Humanities and Social Sciences
Ivana Lučića 3, 10000 Zagreb

Workshop 14
NooJ – Linguistic Engineering Development Environment

Workshop leader:
Max Silberztein

Department of Information Sciences
Department of Linguistics

Friday, 13 March 2009
Faculty of Humanities and Social Sciences, Zagreb

The number of participants is limited to 30.

The workshop will be held in English.

NooJ is a freeware linguistic engineering development environment used to formalize various types of textual phenomena (orthography, lexical and productive morphology, local, structural and transformational syntax) using a wide range of computational devices (from Finite-State Automata to Augmented Recursive Transition Networks). NooJ includes tools to construct, test, debug, maintain and accumulate large sets of linguistic resources, and can apply them to large texts.

Modules for about a dozen languages are already available for free download: Arabic, Armenian, Bulgarian, Catalan, Chinese, English, French, Hebrew, Hungarian, Italian, Polish, Portuguese and Spanish. A dozen other modules are under construction, Croatian among them.

NooJ™'s most distinctive characteristics are:
- NooJ can process texts and corpora in more than 100 file formats, including HTML, PDF, MS Office, all variants of Unicode, ASCII, etc. It can import information from XML documents and export its annotations back to them.
- NooJ's linguistic engine uses an annotation system that allows all levels of grammars to be applied to texts without modifying them; this allows linguists to formalize various phenomena independently, and to apply the corresponding grammars in cascade. For instance, by combining inflection, derivation and syntactic data, NooJ can perform Harris-type transformations.

NooJ is used as a linguistic engineering development platform, a corpus processor, an information extraction system, a terminological extractor and a machine translation development tool, as well as for teaching linguistics and computational linguistics.

To learn more about NooJ, download the software, linguistic resources, manual, tutorials and reference papers:

Max Silberztein constructed the first package of finite-state tools for NLP, together with the French DELAC-DELACF dictionaries of compound words, during his PhD research from 1986 to 1989 at the LADL (University of Paris 7-CNRS) under the supervision of Prof. Maurice Gross. He has been working on NooJ since 2002. He teaches at the Université de Franche-Comté (Besançon) and at institutions in Paris:
1. Linguistic engineering at the University Paris 4 (Sorbonne Nouvelle, Paris)
2. The formalization of natural languages at the Institut National des Langues et Civilisations Orientales (INALCO, Paris)
3. Automatic Lexical Analysis at the Université de Franche-Comté (UFC, Besançon)
4. Text and Discourse Analysis at the Université de Franche-Comté (UFC, Besançon)

Workshop programme

9:15 - 9:45 Registration
9:45 - 10:00 Introduction to the workshop
10:00 - 10:45 Managing texts and corpora, queries, concordances, statistical reports
10:45 - 11:30 Describing a natural language, Linguistic Atomic Units, Lexicons, Inflectional and Derivational Morphology
11:30 - 11:45 Coffee/tea break
11:45 - 12:30 The graph editor, grammar structures, contracts and debugger
12:30 - 13:30 Lunch break
13:30 - 14:30 Transducers and Annotations; The text annotation structure, XML
14:30 - 14:45 Coffee/tea break
14:45 - 15:45 Syntactic Parsing, Transformations, Semantic Analysis, Automatic Translation
15:45 - 16:00 Workshop conclusion and discussion
16:00 Award of certificates

At the end, participants were awarded a certificate of attendance.