Comprehensive List of Researchers "Information Knowledge"
Department of Media Science
- Name
- MATSUBARA, Shigeki
- Group
- Speech and Image Science Group
- Title
- Professor
- Degree
- Dr. of Engineering
- Research Field
- Natural language processing / Information retrieval / Digital library
Current Research
Natural Language Processing Using Very-Large Corpora
OUTLINE
We are pursuing research on natural language processing with the aim of creating an intelligence infrastructure in which linguistic information is utilized as a knowledge resource. So far, we have developed basic technologies such as parsing, discourse analysis, and language generation, as well as applied technologies such as machine translation and spoken dialogue.
TOPICS
(1) Computational Linguistics
To provide the basic technologies for user-friendly and robust natural language systems, we are working to improve the speed of language processing. We have already devised an algorithm for incremental language parsing and implemented it using probabilistic context-free grammar, tree adjoining grammar, dependency grammar, finite-state automata, etc. This technology has also been applied to the development of advanced systems such as simultaneous machine interpretation. Because simultaneous interpretation is a highly advanced form of human language performance, we are also engaged in linguistic analysis to clarify its underlying structure.
(2) Spoken Language Processing
Spoken monologues, such as lectures, are valuable knowledge resources, and preparing an environment that can accumulate and reuse this type of speech contributes greatly to the advancement of our information society. To reuse speech information effectively, it is important not only to record and transcribe the speech but also to capture its semantic structure. We are developing a method to structure monologue speech based on spoken language parsing. Moreover, we are studying speech communication with the goal of smooth human-computer interaction. To realize a robust conversational system, we have developed corpus-based dialogue processing technologies: spoken language parsing, speech intention understanding, intelligent dialogue control, response generation, etc.
(3) Information Access
A huge number of digital documents now exist in the world, on the WWW and elsewhere, providing an environment in which target documents can be referred to easily. In this situation, we are further developing text processing technologies such as classification, summarization, and paraphrasing, and text access technologies such as information retrieval and information extraction. Moreover, we are refining these technologies through experiments on the digital documents produced at Nagoya University. By unifying such technologies, we will be able to construct a scholarly information infrastructure with a digital library function and an information dispatch environment for returning intellectual products to society.
(4) Very-Large Corpus
It is important to learn about human language behavior to develop language-processing technology. We are collecting many language corpora, such as simultaneous interpretation, in-car speech dialogue, commentary programs, technical papers, and judicial precedents, and then utilizing them as analytical, statistical, and experimental data. To use them as highly valuable language resources, we aim to improve both scalability and quality.
FUTURE WORK
Building on the above technologies, we will develop systems for the intelligent distribution of speech and documents. Specifically, we will advance large-scale language computing, speech mining, the campus web as a digital library, and language resource sharing, while making full use of field data. We will strive toward a ubiquitous knowledge society through the construction of a linguistic information infrastructure that effectively supports human intellectual activity.
Figure: Construction of language resources
Career
- Shigeki Matsubara received the Dr. of Engineering degree from Nagoya University in 1998.
- He was a Research Fellow of the JSPS from 1996 to 1998 and a Research Assoc. from 1998 to 2002.
- Since 2002, he has been an Assoc. Prof. of the Information Technology Center, Nagoya University.
Academic Societies
- IEEE
- ACM
- ACL
- ISCA
- IEE
- IEICE
- IPS
Publications
- Simultaneous English-Japanese Spoken Language Translation based on Incremental Dependency Parsing and Transfer, Proc. of COLING/ACL-2006, pp. 683-690 (2006).
- Robust Dependency Parsing of Spontaneous Japanese Spoken Language, IEICE Trans. Inf. & Syst., E88-D(3), pp. 545-552 (2005).
- Construction and Analysis of a Multi-layered In-car Spoken Dialogue Corpus, "DSP for In-Vehicle and Mobile Systems," pp. 1-17, Springer (2004).