Journal of Artificial Intelligence and Soft Computing Research

The Journal of Polish Neural Network Society, the University of Social Sciences in Lodz & Czestochowa University of Technology

4 Issues per year

Open Access
Order Estimation of Japanese Paragraphs by Supervised Machine Learning and Various Textual Features

Masaki Murata
  • Department of Information and Electronics, Tottori University, 4-101 Koyama-Minami, Tottori 680-8552, Japan
Satoshi Ito
  • Department of Information and Electronics, Tottori University, 4-101 Koyama-Minami, Tottori 680-8552, Japan
Masato Tokuhisa
  • Department of Information and Electronics, Tottori University, 4-101 Koyama-Minami, Tottori 680-8552, Japan
Qing Ma
  • Department of Applied Mathematics and Informatics, Ryukoku University Seta, Otsu, Shiga 520-2194, Japan
Published Online: 2015-10-29 | DOI: https://doi.org/10.1515/jaiscr-2015-0033


In this paper, we propose a method for estimating the order of paragraphs using supervised machine learning, with a support vector machine (SVM) as the learner. Paragraph order estimation is useful for sentence generation and sentence correction. In experiments on ordering the first two paragraphs of an article, the proposed method achieved a high accuracy (0.84). It also achieved a higher accuracy than the baseline method in experiments using two paragraphs of an article. Feature analysis showed that adnominals, conjunctions, and dates were effective for ordering the first two paragraphs, and that the ratio of new words and the similarity between the preceding paragraphs and the paragraph being estimated were effective for ordering all pairs of paragraphs.
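
As a rough illustration of two of the features named in the abstract, the ratio of new words and the similarity to the preceding paragraphs, the sketch below builds pairwise training examples that a classifier such as an SVM could consume. The abstract does not give the exact feature definitions, so everything here is an assumption made for illustration: the function names, the whitespace tokenizer (Japanese text would need a morphological analyzer instead), the use of cosine similarity over word counts, and the choice to treat the paragraphs before the earlier member of a pair as the preceding context.

```python
from collections import Counter
import math


def tokenize(text):
    # Placeholder tokenizer (assumption): lowercase whitespace split.
    # Japanese text would need a morphological analyzer instead.
    return text.lower().split()


def new_word_ratio(preceding, candidate):
    """Fraction of candidate tokens that do not occur in the preceding text."""
    seen = set(tokenize(preceding))
    tokens = tokenize(candidate)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in seen) / len(tokens)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(preceding, first, second):
    """Feature vector for the hypothesis that `first` comes before `second`."""
    return [
        new_word_ratio(preceding, first),
        new_word_ratio(preceding, second),
        cosine_similarity(preceding, first),
        cosine_similarity(preceding, second),
    ]


def make_training_examples(paragraphs):
    """Build (features, label) pairs for every pair of paragraphs.

    Each pair is emitted twice: once in the original order (label +1)
    and once swapped (label -1).
    """
    examples = []
    for i in range(len(paragraphs)):
        # Assumed context: all paragraphs before the earlier one of the pair.
        preceding = " ".join(paragraphs[:i])
        for j in range(i + 1, len(paragraphs)):
            a, b = paragraphs[i], paragraphs[j]
            examples.append((pair_features(preceding, a, b), 1))
            examples.append((pair_features(preceding, b, a), -1))
    return examples
```

The resulting feature vectors and labels could then be passed to any binary classifier; the adnominal, conjunction, and date features mentioned in the abstract are not modeled in this sketch.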


About the article

Published in Print: 2015-10-01

Citation Information: Journal of Artificial Intelligence and Soft Computing Research, ISSN (Online) 2083-2567, DOI: https://doi.org/10.1515/jaiscr-2015-0033.

© 2015. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
