Information Technologies and Control

The Journal of Institute of Information and Communication Technologies of Bulgarian Academy of Sciences

4 Issues per year

Open Access
Online ISSN: 1312-2622

An Approach for Ontology Based Information Extraction

N. Borisova
  • South-West University “Neofit Rilski”, Department of Informatics, 66 Ivan Mihaylov Blvd., 2700 Blagoevgrad, Bulgaria
Published Online: 2015-12-04 | DOI: https://doi.org/10.1515/itc-2015-0007

Abstract

This paper presents an approach for Ontology-Based Information Extraction (OBIE) from unstructured text in the Bulgarian language. The proposed method and algorithm automatically extract data from text documents by exploiting ontologies. To this end, a dictionary-based lemmatizer for Bulgarian has been developed and integrated alongside the standard language-processing tools of an open-source, free software platform. The lemmatizer is distributed as free software and is publicly available to download and use under the GPL v3 license. Because of the specifics of inflection in Bulgarian, the lemmatization tools are expected to improve the results of the POS tagger. The approach also makes it possible to develop a dynamically created gazetteer that, in combination with a few other generic GATE resources, can produce ontology-based annotations over given content with respect to a given ontology. The algorithm can also be used in content creation and in the management of information and knowledge.

Keywords: Ontology-based Information Extraction; Semantic web; NLP; Bulgarian grammar; GATE
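
The idea summarized in the abstract, lemmatizing inflected Bulgarian word forms with a dictionary and matching the lemmas against a gazetteer built from ontology labels, can be illustrated with a minimal Python sketch. This is not the author's implementation and does not use the GATE API (GATE is a Java framework). The lemma dictionary, the two-class ontology, and all function names below are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lemma dictionary: inflected form -> lemma.
# A real dictionary-based lemmatizer would hold many thousands of entries.
LEMMAS = {
    "градове": "град",
    "градовете": "град",
    "градът": "град",
    "реки": "река",
    "реката": "река",
}

# Hypothetical toy ontology: class name -> lexical labels.
ONTOLOGY = {
    "City": ["град"],
    "River": ["река"],
}


def lemmatize(token):
    """Return the dictionary lemma of a token, or the lowercased token itself."""
    word = token.lower()
    return LEMMAS.get(word, word)


def build_gazetteer(ontology):
    """Create a lemma -> ontology class lookup from the ontology labels."""
    gazetteer = {}
    for cls, labels in ontology.items():
        for label in labels:
            gazetteer[lemmatize(label)] = cls
    return gazetteer


def annotate(text, gazetteer):
    """Return (start, end, surface form, class) for every token whose lemma is listed."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        cls = gazetteer.get(lemmatize(match.group()))
        if cls is not None:
            annotations.append((match.start(), match.end(), match.group(), cls))
    return annotations


gazetteer = build_gazetteer(ONTOLOGY)
result = annotate("Реката минава през градовете.", gazetteer)
for start, end, surface, cls in result:
    print(start, end, surface, cls)
```

Because the gazetteer is rebuilt from the ontology each time, adding a class or label to the ontology changes the annotations without touching the matching code. Lookup by lemma rather than by surface form is what lets a single label such as "град" match inflected forms such as "градовете".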

About the article

Received: 2015-08-06

Published Online: 2015-12-04

Published in Print: 2014-03-01


Citation Information: Information Technologies and Control, Volume 12, Issue 1, Pages 15–20, ISSN (Online) 1312-2622, DOI: https://doi.org/10.1515/itc-2015-0007.

© 2015. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
