it - Information Technology

Methods and Applications of Informatics and Information Technology

Editor-in-Chief: Conrad, Stefan / Molitor, Paul

6 Issues per year

Online ISSN: 2196-7032

Volume 58, Issue 4

Real-time stream processing for Big Data

Wolfram Wingerath / Felix Gessert / Steffen Friedrich / Norbert Ritter
Published Online: 2016-06-24 | DOI: https://doi.org/10.1515/itit-2016-0002

Abstract

With the rise of Web 2.0 and the Internet of Things, it has become feasible to track all kinds of information over time, in particular fine-grained user activities and sensor data on their environment and even their biometrics. However, while efficiency remains mandatory for any application trying to cope with huge amounts of data, only part of the potential of today's Big Data repositories can be exploited using traditional batch-oriented approaches, as the value of data often decays quickly and high latency becomes unacceptable in some applications. In the last couple of years, several distributed data processing systems have emerged that deviate from the batch-oriented approach and process data items as they arrive, thus acknowledging the growing importance of timeliness and velocity in Big Data analytics.

In this article, we give an overview of the state of the art of stream processors for low-latency Big Data analytics and conduct a qualitative comparison of the most popular contenders, namely Storm and its abstraction layer Trident, Samza, and Spark Streaming. We describe their respective underlying rationales and the guarantees they provide, and we discuss the trade-offs that come with selecting one of them for a particular task.

Keywords: Distributed real-time stream processing; Big Data analytics

ACM CCS: General and reference →Document types →Surveys and overviews; Computer systems organization →Architectures →Distributed architectures →Cloud computing; Computer systems organization →Real-time systems →Real-time system architecture; Information systems →Data management systems →Database management system engines →Stream management; Computing methodologies →Distributed computing methodologies

About the article

Wolfram Wingerath

Wolfram Wingerath is a Ph.D. student under the supervision of Norbert Ritter, teaching and researching at the University of Hamburg. He was co-organiser of the BTW 2015 conference and has given workshop and conference talks on his published work on several occasions. Wolfram is part of the databases and information systems group, and his research interests revolve around scalable NoSQL database systems, cloud computing and Big Data analytics, but he also has a background in data quality and duplicate detection. His current work is related to real-time stream processing and explores the possibilities of providing always-up-to-date materialised views and continuous queries on top of existing non-streaming DBMSs.

Univ. of Hamburg, CS Dept., D-22527 Hamburg, Germany

Felix Gessert

Felix Gessert is a Ph.D. student in the databases and information systems group at the University of Hamburg. His main research fields are scalable database systems, transactions and web technologies for cloud data management. His thesis addresses caching and transaction processing for low-latency mobile and web applications. He is also founder and CEO of the startup Baqend, which implements these research results in a cloud-based backend-as-a-service platform. Since their product is based on a polyglot, NoSQL-centric storage model, he is very interested in both the research and practical challenges of leveraging and improving these systems. He frequently gives talks on different NoSQL topics.

Univ. of Hamburg, CS Dept., D-22527 Hamburg, Germany

Steffen Friedrich

Steffen Friedrich is a Ph.D. student working under the supervision of Norbert Ritter at the University of Hamburg. He has taken part in several workshops and conferences, both as presenter (e.g. DMC2014) and as co-organiser (BTW 2015). Being a member of the databases and information systems group, Steffen is interested in large-scale data management and data-intensive computing. In his Master's thesis, he also dealt with data quality issues, specifically with duplicate detection in probabilistic data. His research project is primarily concerned with benchmarking of non-functional characteristics (e.g. consistency and availability) in distributed NoSQL database systems.

Univ. of Hamburg, CS Dept., D-22527 Hamburg, Germany

Norbert Ritter

Prof. Dr.-Ing. Norbert Ritter is a full professor of computer science at the University of Hamburg, where he heads the databases and information systems group. He received his Ph.D. from the University of Kaiserslautern in 1997. His research interests include distributed and federated database systems, transaction processing, caching, cloud data management, information integration and autonomous database systems. He has been teaching NoSQL topics in various courses for several years. Seeing the many open challenges for NoSQL systems, he and Felix Gessert have been organizing the annual Scalable Cloud Data Management Workshop (www.scdm2015.com) for three years to promote research in this area.

Univ. of Hamburg, CS Dept., D-22527 Hamburg, Germany


Accepted: 2016-05-02

Received: 2016-01-15

Published Online: 2016-06-24

Published in Print: 2016-08-28


Citation Information: it - Information Technology, Volume 58, Issue 4, Pages 186–194, ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: https://doi.org/10.1515/itit-2016-0002.

©2016 Walter de Gruyter Berlin/Boston.