Professor, National Taiwan University, Taiwan
Voice-Driven Multimedia Systems in the Network Era
Keynote speech abstract:
Multimedia content over the Internet is very attractive, and the spoken part of such content very often carries the core information. It is therefore possible to retrieve, browse or operate multimedia content primarily based on the spoken part. Systems with such functionalities are referred to here as Voice-Driven Multimedia Systems. Note that the multimedia content may include only audio/video signals and not necessarily text. Audio/video signals are not easily summarized on-screen, nor easily scanned and selected by the user. Some user-system interaction mechanisms for such scenarios are thus needed, for which more speech processing technologies may be helpful. In this talk, some research work along this direction and some experiences in constructing such systems will be reported, and a few such systems will be demonstrated.
Professor, Director of the Department of Corpus Studies and the Center for Corpus Development, National Institute for Japanese Language and Linguistics (NINJAL)
Linguistics-Oriented Language Resource Development at the National Institute for Japanese Language and Linguistics
Keynote speech abstract:
This talk introduces the language-resource-related activities of the National Institute for Japanese Language and Linguistics (NINJAL), into which most of the former National Institute for Japanese Language (NIJLA) was dissolved and absorbed in October 2009. From the second half of the 1990s, NIJLA played a central role in the development of Japanese language resources by constructing corpora such as the Corpus of Spontaneous Japanese (CSJ) and the Taiyo Corpus. In 2006, the language resource group of NIJLA started a Japanese corpus compilation initiative called KOTONOHA, and set about the construction of the 100-million-word Balanced Corpus of Contemporary Written Japanese (BCCWJ) with the support of a five-year governmental grant-in-aid for the scientific research priority-area program "Japanese Corpus", awarded to the present author. This activity was inherited by the newly established NINJAL Center for Corpus Development. With the construction of the BCCWJ completed successfully at the end of March 2011, the NINJAL center has set about two new projects of an exploratory nature: a historical Japanese corpus project and a 10-billion-word ultra-large-scale contemporary written Japanese corpus project. In addition to the presentation of the NIJLA-NINJAL activities, language resource development in Japanese institutions other than NINJAL will be introduced briefly.
Boyd Michailovsky, Alexis Michaud and Severine Guillaume
A simple architecture for the fine-grained documentation of endangered languages: the LACITO multimedia archive
The LACITO multimedia archive provides free access to documents of connected, spontaneous speech, mostly in "rare" or endangered languages, recorded in their cultural context and transcribed in consultation with native speakers. Its goal is to contribute to the documentation and study of a precious human heritage: the world's languages. It has a special strength in languages of Asia and the Pacific. The LACITO archive program was built with little personnel and less funding. It has been devised, developed and maintained over two decades by two researchers assisted by a single engineer. Its simple architecture is based on current standards: XML encoding of annotations, Unicode character coding, and OLAC standards for metadata. The talk will emphasize the technical simplicity of the tools developed at LACITO, which makes them suitable for the creation of similar databases at other institutions. (For instance, tools from the LACITO archive were successfully adapted in the creation of the Formosan Languages archive, http://formosan.sinica.edu.tw/ )
Human speech conveys the speaker’s emotional state along with linguistic information. The meaning of a speech sample changes when it is uttered with different emotions. The present paper describes the different types of studies conducted to analyze, perceive and recognize commonly occurring emotions in Hindi speech. These emotions have been classified as anger, happiness, fear, sadness and surprise, in addition to neutral.
Intonation patterns change with sentence type, depending on whether the sentence is affirmative, negative, interrogative, imperative, doubtful, desiderative, conditional or exclamatory. A relationship between the measured acoustic parameters and these patterns has been established to classify them. Experiments have been conducted to study how emotion changes the acoustic parameters (F0, F1, F2, F3) and the MFCC parameters of vowels segmented from the spoken sentences. These parameters were used for machine recognition of emotions using discriminant and neural net classifiers. Similarly, in another experiment, isolated words (Hindi digits) spoken in different emotional conditions were used to measure recognition performance with a neural net classifier based on both MFCC and prosodic parameters. Finally, experiments were conducted on a large number of emotionally spoken short sentences, and their recognition performance was evaluated using the above-mentioned techniques. Human perception experiments were conducted at every level, and the results were compared with machine recognition performance. In most cases, machine recognition was found to be better than human performance.
Co-workers: Jyoti Garg, Sunita Arora, Shweta Sinha