
it - Information Technology

Methods and Applications of Informatics and Information Technology

Editor-in-Chief: Conrad, Stefan / Molitor, Paul

6 Issues per year

Online ISSN: 2196-7032
Volume 56, Issue 1

Online horizontal partitioning of heterogeneous data

Kai Herrmann / Hannes Voigt / Wolfgang Lehner
Published Online: 2014-01-31 | DOI: https://doi.org/10.1515/itit-2014-1015

Abstract

In an increasing number of use cases, databases face the challenge of managing heterogeneous data. Heterogeneous data is characterized by a quickly evolving variety of entities without a common set of attributes. These entities do not show enough regularity to be captured in a traditional database schema. A common solution is to centralize the diverse entities in a universal table, which usually leads to a very sparse table. Although today's techniques allow efficient storage of sparse universal tables, query efficiency is still a problem: queries that address only a subset of attributes have to read the whole universal table, including many irrelevant entities. One solution is to partition the table so that partitions containing only irrelevant entities can be pruned before they are touched. Creating and maintaining such a partitioning manually is very laborious or even infeasible due to its enormous complexity. Thus, an autonomous solution is desirable.
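The pruning idea can be illustrated with a small Python sketch. It is not taken from the article: the example entities, the fixed two-way partitioning, and the names `build_synopsis` and `query` are assumptions made purely for illustration. Each partition keeps a synopsis (the union of attribute names of its entities), and a query skips every partition whose synopsis lacks a requested attribute.

```python
# Hypothetical sparse entities: each one stores only the attributes it has.
products = [
    {"id": 1, "title": "Camera", "resolution": "24MP"},
    {"id": 2, "title": "Lens", "focal_length": "50mm"},
]
people = [
    {"id": 3, "name": "Alice", "email": "alice@example.org"},
    {"id": 4, "name": "Bob", "phone": "555-0100"},
]


def build_synopsis(rows):
    """Union of all attribute names that occur in a partition."""
    attrs = set()
    for entity in rows:
        attrs.update(entity)  # iterating a dict yields its keys
    return attrs


# A manually chosen partitioning (assumption): one partition per entity kind.
partitions = [
    {"rows": rows, "synopsis": build_synopsis(rows)}
    for rows in (products, people)
]


def query(partitions, wanted):
    """Return (entities having all wanted attributes, number of entities read)."""
    wanted = set(wanted)
    hits, read = [], 0
    for part in partitions:
        # Prune: no entity here can match if the synopsis lacks an attribute.
        if not wanted.issubset(part["synopsis"]):
            continue
        for entity in part["rows"]:
            read += 1
            if wanted.issubset(entity):
                hits.append(entity)
    return hits, read


hits, read = query(partitions, ["email"])
print(len(hits), "match(es) after reading", read, "of 4 entities")
```

In this toy setting, a query on `email` reads only the second partition; without partitioning it would have to scan all four entities.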

In this article, we define the Online Partitioning Problem for heterogeneous data. We sketch how an optimal solution for this problem can be determined based on hypergraph partitioning. Although it leads to the optimal partitioning, the hypergraph approach is inappropriate for an implementation in a database system. We present Cinderella, an autonomous online algorithm for horizontal partitioning of heterogeneous entities in universal tables. Cinderella is designed to keep its overhead low by operating online; it incrementally assigns entities to partitions while they are touched anyway during modifications. This enables a reasonable physical database design at runtime instead of static modeling.
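To make the notion of incremental, online assignment concrete, the following Python sketch shows a simple greedy heuristic. It is explicitly not the Cinderella algorithm, whose details the abstract does not give: the class name `OnlinePartitioner`, the Jaccard similarity measure, and the `threshold` parameter are all assumptions chosen for illustration. Each entity is placed at insert time into the partition whose attribute set is most similar to its own; if none is similar enough, a new partition is opened.

```python
class OnlinePartitioner:
    """Greedy online assignment by attribute-set similarity (illustrative only)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical minimum Jaccard similarity
        self.partitions = []        # each: {"synopsis": set, "rows": list}

    @staticmethod
    def _jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    def insert(self, entity):
        """Assign one entity to a partition at the moment it is inserted."""
        attrs = set(entity)  # attribute names of the entity
        best, best_sim = None, 0.0
        for part in self.partitions:
            sim = self._jaccard(attrs, part["synopsis"])
            if sim > best_sim:
                best, best_sim = part, sim
        # Open a new partition if no existing one is similar enough.
        if best is None or best_sim < self.threshold:
            best = {"synopsis": set(), "rows": []}
            self.partitions.append(best)
        best["rows"].append(entity)
        best["synopsis"] |= attrs
        return best


# Hypothetical stream of heterogeneous entities.
stream = [
    {"id": 1, "title": "Camera", "resolution": "24MP"},
    {"id": 2, "title": "Lens", "focal_length": "50mm"},
    {"id": 3, "name": "Alice", "email": "alice@example.org"},
    {"id": 4, "name": "Bob", "phone": "555-0100"},
]

partitioner = OnlinePartitioner(threshold=0.5)
for entity in stream:
    partitioner.insert(entity)

for number, part in enumerate(partitioner.partitions):
    print(number, sorted(part["synopsis"]), len(part["rows"]))
```

Because the decision is made per entity during the insert itself, no separate reorganization pass over the table is needed in this sketch; the trade-off is that a greedy choice depends on arrival order and is generally not optimal.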

Keywords: ACM CCS→Information systems→Database design and models→Physical data models; ACM CCS→Information systems→Database design and models; ACM CCS→Information systems→Database administration→Autonomous database administration

About the article

Kai Herrmann

Kai Herrmann is a PhD student at the Database Technology Group at TU Dresden. He received his master's degree in Computer Science from TU Dresden in April 2013. For his thesis, he developed a configurable schema mapping layer that allows flexible management of irregular data sets. From 2009 to 2013, he was a student research assistant at the Database Technology Group, focusing on flexible data management.

Technische Universität Dresden, Database Technology Research Group, Nöthnitzer Straße 46, 01187 Germany, Tel.: +49-351-46337895

Hannes Voigt

Hannes Voigt received his Master in Computer Science from TU Dresden in 2008. Since graduation, he has pursued his research activities as a research assistant in the Database Technology Group at TU Dresden. In 2010/2011, he was a visiting scientist at SAP Labs, Palo Alto. His research interests are in flexible data management, graph databases, and physical design.

Technische Universität Dresden, Database Technology Research Group, Nöthnitzer Straße 46, 01187 Germany

Wolfgang Lehner

Wolfgang Lehner received his Master, Ph.D., and habilitation in Computer Science from the University of Erlangen-Nuremberg. Since 2002, Wolfgang Lehner has been a full professor and head of the Database Technology Group at TU Dresden. He was a visiting scientist at IBM Almaden, Microsoft Research Redmond, and SAP Walldorf. His major research focuses on the efficient processing of empirically collected mass data with advanced database technology.

Database Technology Group, Faculty of Computer Science, Technische Universität Dresden, 01062 Dresden, Germany, Tel.: +49-351-46338383


Accepted: 2013-10-02

Received: 2013-06-10

Published Online: 2014-01-31

Published in Print: 2014-02-28


Citation Information: it – Information Technology, Volume 56, Issue 1, Pages 4–12, ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: https://doi.org/10.1515/itit-2014-1015.

©2014 Walter de Gruyter Berlin/Boston.
