Marta Garrote, Chieko Kimura, Kengo Matsui, Antonio Moreno Sandoval, Emi Takamori
May 10, 2013
Research on Language Technologies requires a wide range of linguistic resources. C-ORAL-JAPON is a corpus of spontaneous spoken Japanese. It contains more than 12 hours of recordings and 149,380 words, divided into three text types: monologues, dialogues and conversations. The corpus consists of 39 transcription files, each aligned with its corresponding audio file. In total, 59 native speakers of Japanese of different ages took part in the recordings. These characteristics make the corpus an ideal resource for linguistic and sociolinguistic studies, as well as working material for teaching Japanese as a foreign language.