Published by De Gruyter Oldenbourg, April 9, 2016

Wikidition: Automatic lexiconization and linkification of text corpora

Alexander Mehler, Rüdiger Gleim, Tim vor der Brück, Wahed Hemati, Tolga Uslu and Steffen Eger

Abstract

We introduce a new text technology, called Wikidition, which automatically generates large-scale editions of corpora of natural language texts. Wikidition combines a wide range of text mining tools for automatically linking lexical, sentential and textual units. This includes the extraction of corpus-specific lexica down to the level of syntactic words and their grammatical categories. To this end, we introduce a novel measure of text reuse and exemplify Wikidition by means of the capitularies, that is, a corpus of Medieval Latin texts.

About the authors

Alexander Mehler

Alexander Mehler is professor of Computational Humanities at Goethe University and head of the Text Technology Lab. He is a member of the executive committee of the Center for the Digital Foundation of Research in the Humanities, Social, and Educational Sciences (CEDIFOR). His research interests include computational models of linguistic networks.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Rüdiger Gleim

Rüdiger Gleim is a scientific assistant at Goethe University. He worked within the Special Research Center "Alignment in Communication" at Bielefeld University. His research interests include semantic databases and text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Tim vor der Brück

Dr. Tim vor der Brück studied Computer Science at Saarland University. He is currently a research associate at the Lucerne University of Applied Sciences and Arts. His research interests include text mining and multimodal computing.

Hochschule Luzern, Technikumstr. 21, 6048 Horw

Wahed Hemati

Wahed Hemati is a project member of CEDIFOR at Goethe University and works on machine reading and text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Tolga Uslu

Tolga Uslu is a project member of CEDIFOR at Goethe University and works on imaging methods in text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Steffen Eger

Dr. Steffen Eger is a project member of the CompHistSem project at Goethe University. His research interests include mathematical methods in computational linguistics and social network analysis.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Acknowledgement

This work has been funded by the German Federal Ministry of Education via the projects CompHistSem (www.comphistsem.org) and CEDIFOR (www.cedifor.de).

Received: 2015-9-19
Accepted: 2016-1-11
Published Online: 2016-4-9
Published in Print: 2016-3-1

©2016 Walter de Gruyter Berlin/Boston

Downloaded on 2.12.2022 from frontend.live.degruyter.dgbricks.com/document/doi/10.1515/itit-2015-0035/html