Research Institute of Linguistics, Hungarian Academy of Sciences

www.nytud.hu

Profile activities related to language and speech technology

 

The primary tasks of the Research Institute include research in Hungarian linguistics, in general, theoretical and applied linguistics, in Uralic linguistics, and in phonetics, as well as the preparation of a comprehensive dictionary of the Hungarian language and the maintenance of its archival materials. Other research projects investigate various aspects and variants of Hungarian, as well as minority languages inside and outside Hungary, including issues of language policy within the framework of European integration. Further tasks include assembling linguistic corpora and databases and laying the linguistic groundwork for computational software and applications. In addition, the Institute operates a public counseling service on language and linguistics and prepares expert reports on request.

 

In research

 

The Department of Language Technology has accumulated significant research experience and has made remarkable achievements in language technology, especially in the development of linguistic resources. It has participated in several successful international projects that aimed, on the one hand, to adapt to the analysis of Hungarian certain processes that were developed for western European languages and are now considered standard (Multext-East, Gramlex) and, on the other hand, to develop new standards for creating linguistic resources (electronic dictionary databases, CONCEDE). The researchers at the department have acquired significant knowledge of the computerized language processing systems and technologies developed or applied in these projects, and have played an active role in adapting them to the needs of Hungarian.

 

It was in the Department of Language Technology that the first version of the Hungarian National Corpus was created. This is the most recent reference corpus of present-day Hungarian; it reflects written usage and now consists of 181.3 million words. In November 2005, language variants from Slovakia, Subcarpathia, Transylvania and Vojvodina were also added to the Corpus, making it truly national. The processes and programs already applied successfully in processing the corpus (i.e., those used for tokenizing or for disambiguating on the basis of statistical data), together with the technologies used in international projects for building the lexical database (e.g., SGML/XML editors, validating programs and descriptive grammars), have given the researchers at the department an opportunity to test and develop important language processing applications for Hungarian.

 

The Department’s work also includes sociolinguistic research on variation and change in the dialects of Hungarian spoken within Hungary and in Budapest, although some investigations extend to the varieties of Hungarian spoken in the neighboring countries and overseas. Other research topics include language shift and the language use of bilingual minorities in Hungary.

 

The Department of Phonetics studies speech from articulatory, acoustic-phonetic and perceptual points of view, and investigates the processes that underlie human communication, from speech production to utterance comprehension. The Department’s projects match international academic trends, and its studies focus on several important problems of speech research, naturally taking into account the language-specific characteristics of Hungarian speech. The Department’s research areas fall partly in the domain of basic research and partly in that of applied research. In some cases it is difficult to separate the two, but fortunately this is not always strictly necessary, as in the identification of a speaker on the basis of his or her speech. One main topic is the acoustic-phonetic and perceptual analysis of spontaneous speech, since there are clear differences between so-called ‘laboratory speech’ and people’s natural verbal communication. The goal here is an ever clearer picture of the segmental and suprasegmental structure of speech: how the pronunciation of speech sounds is modified in continuous speech, what effect neighboring sounds have on each other, and what influences speech tempo or the features of melody and stress. The analyses, which start from theory and proceed through the processing of experimental data, target the dependencies between articulation, acoustic structure, and perception. In the course of research we try to model the articulatory operations and their acoustic consequences, and check their linguistic relevance with perception experiments.

 

The following research topics are also investigated at the Department of Phonetics: the role of the vocal cords in the production of speech; the analysis of speech sounds and of the effect they have on each other (database approaches); new approaches to coarticulation (nasality, hiatus filling, voicing); the development of a model predicting the length of Hungarian speech sounds; the description of the dependencies between speech melody and sentence type; disharmonic phenomena in spontaneous speech; continuous development of the ‘Slip-of-the-tongue Corpus’ (collection, data, labeling, analysis); the setting up of a Spontaneous Speech Database; phonetically based speaker identification; experimental studies of the development of speech perception; and articulatory, acoustic and perceptual analysis of atypical speech (in childhood and adulthood).

 

The studies of the Department of Theoretical Linguistics cover all subsystems of grammar (phonology, morphology, syntax, semantics, and the lexicon) and their interfaces from theoretical, descriptive, and computational linguistic perspectives. Beyond the borders of grammar, they restrict their attention to fields that promise substantial discoveries regarding the above sub-domains, such as pragmatics, cognitive approaches, or language typology. The object of the descriptive work and the source of theoretical innovations are mostly, but not exclusively, the linguistic phenomena of Hungarian.

 

In education

 

The staff of the Department of Theoretical Linguistics created and operate the theoretical linguistics programs at ELTE (Eötvös Loránd University): they serve as instructors for the regular university major, its doctoral program and the independent courses open to non-majoring students.

 

Non-market oriented activities

 

▪ Corpus of the Academic Dictionary of Hungarian / Hungarian Historical Corpus

 

▪ Hungarian National Corpus

 

▪ E-Vocabulary – education supplement

 

▪ Hunglish: parallel English-Hungarian Corpus

 

▪ Speech Archive (Beszédarchívum): dialect sound recordings from the collection of Lajos Hegedűs

 

▪ Database for a presentation of sound duration-maps of Hungarian words

 

▪ Consonant clusters in Hungarian speech – Acoustic representation in words

 

▪ Sound clusters in Hungarian speech – Acoustic representation in words

 

▪ Mazsola – Research Tool for Hungarian Verb Argument Structures

▪ Language counseling

 

Major tendered and contracted work since 2005

 

Ongoing projects:

 

▪ European Federation of National Institutions for Language – 2008–

 

▪ EFNILEX – 2008–

 

▪ Common Language Resources and Technology Infrastructure – 2007–2009

 

▪ Cross-Language Access to Catalogues and On-line Libraries – 2007–2009

 

▪ Examination of National and Ethnic Identity by Means of Computerised Content-analysis of Narratives pertaining to Historic Events – 2006–2008

 

Finished projects:

 

▪ The development of a Hungarian Ontology and its application in information extraction systems – 2005–2007

 

▪ Hungarian to English machine translation system – 2004–2007

 

▪ Hungarian Unified Ontology – 2004–2006

 

Number of staff: 120 people