
Corpus Linguistics and Linguistic Theory

Founded by Gries, Stefan Th. / Stefanowitsch, Anatol

Ed. by Wulff, Stefanie

2 Issues per year

IMPACT FACTOR 2017: 1.200
5-year IMPACT FACTOR: 1.386

CiteScore 2017: 0.80

SCImago Journal Rank (SJR) 2017: 0.288
Source Normalized Impact per Paper (SNIP) 2017: 0.930


C-ORAL-JAPON: Corpus of Spontaneous Spoken Japanese

Marta Garrote / Chieko Kimura / Kengo Matsui / Antonio Moreno Sandoval / Emi Takamori
Published Online: 2013-05-10 | DOI: https://doi.org/10.1515/cllt-2013-0004


Research on Language Technologies demands a wide range of linguistic resources. C-ORAL-JAPON is a corpus of spontaneous spoken Japanese. It contains more than 12 hours of recordings and 149,380 words, divided into three text types: monologues, dialogues and conversations. The corpus consists of 39 text files (transcriptions) aligned with the corresponding audio files. In total, 59 native speakers of Japanese of different ages took part in the recordings. These characteristics make the corpus an ideal resource for linguistic and sociolinguistic studies, as well as working material for teaching Japanese as a foreign language.

Keywords: corpus linguistics; Japanese; spontaneous spoken corpus; linguistic resources

About the article

Marta Garrote

Marta Garrote is an Associate Professor of EFL in the Philology and Didactics Department at UAM. She is currently a member of the research groups Discourse Analysis and Intercultural Communication and Technology Enhanced Language Learning. She previously worked at LLI-UAM and at the Grupo de Bases de Datos Avanzadas of the Universidad Carlos III de Madrid.

Chieko Kimura

Chieko Kimura is an Assistant Professor of Japanese at UAM. Her research focuses on spontaneous spoken Japanese, especially that of young native speakers and their use of honorific expressions and foreign vocabulary.

Kengo Matsui

Kengo Matsui is a Ph.D. student at Tokyo University of Foreign Studies. He is currently a lecturer of Spanish at Dokkyo University.

Antonio Moreno Sandoval

Antonio Moreno-Sandoval is Head of the Laboratorio de Lingüística Informática at UAM. His research focuses on the compilation of language resources and the development of taggers and parsers for corpus annotation. He has been the Principal Investigator of projects including the UAM Spanish Treebank and the Spanish C-ORAL-ROM corpus.

Emi Takamori

Emi Takamori studied Japanese Language and Linguistics at Sophia University and Tokyo University of Foreign Studies. She is currently an Assistant Professor in the Linguistics Department at UAM and is working on a learner corpus of spontaneous Japanese.

Published in Print: 2015-10-01

Citation Information: Corpus Linguistics and Linguistic Theory, Volume 11, Issue 2, Pages 373–392, ISSN (Online) 1613-7035, ISSN (Print) 1613-7027, DOI: https://doi.org/10.1515/cllt-2013-0004.


©2015 by De Gruyter Mouton.
