
it - Information Technology

Methods and Applications of Informatics and Information Technology

Editor-in-Chief: Conrad, Stefan

Online ISSN: 2196-7032
Volume 58, Issue 2


Wikidition: Automatic lexiconization and linkification of text corpora

Alexander Mehler / Rüdiger Gleim / Tim vor der Brück / Wahed Hemati / Tolga Uslu / Steffen Eger
Published Online: 2016-04-09 | DOI: https://doi.org/10.1515/itit-2015-0035

Abstract

We introduce a new text technology, called Wikidition, which automatically generates large-scale editions of corpora of natural language texts. Wikidition combines a wide range of text mining tools for automatically linking lexical, sentential and textual units. This includes the extraction of corpus-specific lexica down to the level of syntactic words and their grammatical categories. To this end, we introduce a novel measure of text reuse and exemplify Wikidition by means of the capitularies, a corpus of Medieval Latin texts.

Keywords: Wikidition; linkification; lexiconization; digital edition; text mining

ACM CCS: Computing methodologies→Artificial intelligence→Natural language processing; Information systems→Information storage systems; Information systems→Information retrieval

About the article

Alexander Mehler

Alexander Mehler is Professor of Computational Humanities at Goethe University and head of the Text Technology Lab. He is a member of the executive committee of the Center for the Digital Foundation of Research in the Humanities, Social, and Educational Sciences (CEDIFOR). His research interests include computational models of linguistic networks.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Rüdiger Gleim

Rüdiger Gleim is a research assistant at Goethe University. He previously worked in the Special Research Center "Alignment in Communication" at Bielefeld University. His research interests include semantic databases and text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Tim vor der Brück

Dr. Tim vor der Brück studied Computer Science at Saarland University. He is currently a research associate at the Lucerne University of Applied Sciences and Arts. His research interests include text mining and multimodal computing.

Hochschule Luzern, Technikumstr. 21, 6048 Horw

Wahed Hemati

Wahed Hemati is a project member of CEDIFOR at Goethe University and works on machine reading and text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Tolga Uslu

Tolga Uslu is a project member of CEDIFOR at Goethe University and works on visualization methods in text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Steffen Eger

Dr. Steffen Eger is a project member of the CompHistSem project at Goethe University. His research interests include mathematical methods of computational linguistics and social network analysis.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main


Accepted: 2016-01-11

Received: 2015-09-19

Published Online: 2016-04-09

Published in Print: 2016-03-01


Citation Information: it - Information Technology, Volume 58, Issue 2, Pages 70–79, ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: https://doi.org/10.1515/itit-2015-0035.


©2016 Walter de Gruyter Berlin/Boston.
