Abstract

Harmony Search Algorithm (HSA) is an evolutionary algorithm that mimics the process of musical improvisation to obtain a good harmony. The algorithm has been successfully applied to optimization problems in different domains. A significant shortcoming of the algorithm is inadequate exploitation when solving complex problems. The algorithm relies on three operators to perform improvisation: memory consideration, pitch adjustment, and random consideration. To improve the algorithm's efficiency, we use roulette wheel and tournament selection in memory consideration, replace pitch adjustment and random consideration with a modified polynomial mutation, and enhance the newly obtained harmony with a modified β-hill climbing algorithm. These modifications can help maintain diversity and increase the convergence speed of the modified HS algorithm. β-hill climbing is a recently introduced local search algorithm that can effectively solve different optimization problems; in the modified HS algorithm, it serves as a local search technique that improves the solutions generated by HS. Two algorithms are proposed: PHSβ–HC and Imp. PHSβ–HC. Both are evaluated on 13 classical global optimization benchmark functions with various ranges and complexities. The proposed algorithms are compared against five other HSA variants on the same test functions. According to the Friedman test, the two proposed algorithms ranked 2nd (Imp. PHSβ–HC) and 3rd (PHSβ–HC). Furthermore, the two proposed algorithms are compared against four versions of particle swarm optimization (PSO). The results show that PHSβ–HC produces the best results for three test functions, and Imp. PHSβ–HC outperforms the other algorithms for two test functions. Finally, the two proposed algorithms are compared with four variations of differential evolution (DE).
Against DE, PHSβ–HC produces the best results for three test functions, and Imp. PHSβ–HC outperforms the other algorithms for two test functions. In summary, the two modified HSAs can be considered efficient extensions of HSA that can be used to solve a range of optimization problems in the future.
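The improvisation loop described above can be illustrated with a minimal sketch. It assumes tournament selection (size 2) for memory consideration, the basic form of polynomial mutation in place of pitch adjustment and random consideration, and a plain β-hill climbing step (a small "N-operator" move on one variable plus a "β-operator" that resets each variable with probability β). These are the standard operators, not the authors' modified versions, and all names (`hs_sketch`, `beta_hill_climbing`, etc.) and parameter values are hypothetical.

```python
import random

def sphere(x):
    # Classical benchmark function: global minimum 0 at the origin.
    return sum(v * v for v in x)

def tournament(fitness, rng, k=2):
    # Draw k random harmonies and return the index of the fittest (lowest value).
    idx = rng.sample(range(len(fitness)), k)
    return min(idx, key=lambda i: fitness[i])

def polynomial_mutation(v, lo, hi, rng, eta=20.0):
    # Basic polynomial mutation: a perturbation in [-1, 1] scaled to the range.
    u = rng.random()
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
    return min(hi, max(lo, v + delta * (hi - lo)))

def beta_hill_climbing(x, f, lo, hi, rng, iters=10, bw=0.05, beta=0.05):
    # Greedy local search: N-operator (small move on one variable) followed by
    # the beta-operator (random reset of each variable with probability beta).
    best, best_f = list(x), f(x)
    for _ in range(iters):
        cand = list(best)
        j = rng.randrange(len(cand))
        cand[j] = min(hi, max(lo, cand[j] + rng.uniform(-bw, bw) * (hi - lo)))
        for d in range(len(cand)):
            if rng.random() < beta:
                cand[d] = rng.uniform(lo, hi)
        cand_f = f(cand)
        if cand_f <= best_f:
            best, best_f = cand, cand_f
    return best, best_f

def hs_sketch(f, dim, lo, hi, hms=10, iters=500, pm=0.1, seed=0):
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    fitness = [f(h) for h in memory]
    for _ in range(iters):
        # Memory consideration: each variable comes from a tournament winner.
        new = [memory[tournament(fitness, rng)][d] for d in range(dim)]
        # Polynomial mutation replaces pitch adjustment and random consideration.
        new = [polynomial_mutation(v, lo, hi, rng) if rng.random() < pm else v
               for v in new]
        # Refine the new harmony locally, then replace the worst if it is better.
        new, new_f = beta_hill_climbing(new, f, lo, hi, rng)
        worst = max(range(hms), key=lambda i: fitness[i])
        if new_f < fitness[worst]:
            memory[worst], fitness[worst] = new, new_f
    best = min(range(hms), key=lambda i: fitness[i])
    return memory[best], fitness[best]
```

For example, `hs_sketch(sphere, 5, -5.0, 5.0)` returns the best harmony found for the 5-dimensional sphere function together with its fitness. Roulette wheel selection could be swapped in for `tournament` without changing the rest of the loop.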

Abstract

Extracting information from large biological datasets is a challenging task due to the large data size, high dimensionality, noise, and errors in the data. Gene expression data contain information about which gene products have been formed by a cell, and thus about which genes have been read to activate a particular biological process. Understanding which of these gene products relate to which processes can, for example, give insights into how diseases evolve and might give hints about how to fight them.

Next-generation RNA sequencing emerged over a decade ago and is now the state of the art in gene expression analysis. However, analyzing these large, complex datasets is still challenging, and many existing methods do not take the underlying structure of the data into account.

In this paper, we present a new approach for RNA-sequencing data analysis based on dictionary learning. Dictionary learning is a sparsity-enforcing method that has been widely used in many fields, such as image processing, pattern classification, and signal denoising. We show how, for RNA-sequencing data, the atoms in the dictionary matrix can be interpreted as modules of genes that either capture patterns specific to different sample types or represent modules that are reused across different scenarios. We evaluate our approach on four large datasets with samples of multiple types. A Gene Ontology term analysis, a standard tool for understanding the functions of genes, shows that the gene sets found agree with the biological context of the sample types. Further, we find that the sparse representations of samples using the dictionary can be used to identify type-specific differences.
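To make the dictionary/atom/sparse-code terminology concrete, here is a minimal sketch of generic dictionary learning, assuming a genes × samples expression matrix `X`. It alternates ISTA sparse coding (an L1-penalized least-squares fit) with a least-squares dictionary update followed by atom normalisation (the "method of optimal directions"). This is a textbook scheme, not necessarily the paper's algorithm; the function names and parameters are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal operator of lam * ||z||_1: shrinks entries and zeroes small ones.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_code(X, D, lam, n_iter=100):
    # ISTA for min_A 0.5 * ||X - D A||_F^2 + lam * ||A||_1, starting from A = 0.
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def dictionary_learning(X, n_atoms, lam=0.1, n_outer=20, seed=0):
    # Columns of D are atoms (candidate gene modules); columns of A are the
    # sparse representations of the samples.
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        A = sparse_code(X, D, lam)
        D = X @ np.linalg.pinv(A)  # least-squares dictionary update
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.where(norms > 0, norms, 1.0)  # unit-norm atoms; unused atoms stay zero
    return D, sparse_code(X, D, lam)
```

In this reading, the genes with large weights in one column of `D` form a module, and comparing the rows of `A` across sample types is one way to look for type-specific differences.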

Abstract

There is increasing interest in fusing data from heterogeneous sources. Combining data sources increases the utility of existing datasets, generating new information and creating higher-quality services. A central issue in working with heterogeneous sources is data migration: to share and process data in different engines, complex and resource-intensive movements and transformations between computing engines, services, and stores are necessary.

Muses is a distributed, high-performance data migration engine that interconnects distributed data stores by forwarding, transforming, repartitioning, or broadcasting data among distributed engines' instances in a resource-, cost-, and performance-adaptive manner. As such, it enables seamless information sharing across all participating resources in a standard, modular manner. We show an overall improvement of 30% for jobs pipelined across multiple engines, even when the overhead of Muses is included in the execution time. This performance gain implies that Muses can be used to optimise large pipelines that leverage multiple engines.
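The four data movements named above (forwarding, transforming, repartitioning, broadcasting) can be pictured with a small in-memory sketch. It assumes records are Python tuples and that each target engine instance is represented by a list; the `migrate` dispatcher and its parameters are hypothetical and say nothing about Muses' actual interfaces or its adaptive cost model.

```python
import zlib

def repartition(records, key, n_targets):
    # Hash-partition records so that all records with equal keys reach the
    # same target instance (e.g. before a distributed join or aggregation).
    parts = [[] for _ in range(n_targets)]
    for rec in records:
        # crc32 is stable across processes, unlike Python's salted str hash().
        h = zlib.crc32(repr(key(rec)).encode("utf-8"))
        parts[h % n_targets].append(rec)
    return parts

def broadcast(records, n_targets):
    # Every target instance receives a full copy (e.g. a small lookup table).
    return [list(records) for _ in range(n_targets)]

def migrate(records, mode, n_targets=1, key=None, transform=lambda r: r):
    # Hypothetical dispatcher: transform each record, then route it.
    records = [transform(r) for r in records]
    if mode == "forward":
        return [records]  # one target, data passed through as-is
    if mode == "broadcast":
        return broadcast(records, n_targets)
    if mode == "repartition":
        return repartition(records, key, n_targets)
    raise ValueError(f"unknown mode: {mode}")
```

For instance, `migrate(rows, "repartition", 4, key=lambda r: r[0])` splits `rows` into four lists such that every key ends up in exactly one of them.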

Abstract

The Internet of Things (IoT) is sparking a revolution in time series forecasting. Traditional techniques forecast each time series individually, which becomes infeasible when the focus shifts to thousands of time series exhibiting anomalies such as noise and missing values. This work presents CSAR, a technique that forecasts a set of time series with a single model, and a feature-aware partitioning that applies CSAR to subsets of similar time series. These techniques provide accurate forecasts a hundred times faster than traditional techniques, preparing forecasting for the challenges arising in the IoT era.
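The idea of one model for many series, optionally fitted per group of similar series, can be sketched as follows. The abstract does not specify CSAR's model or features, so this sketch assumes a pooled AR(1) model x_t = c + φ·x_{t-1}, fitted by ordinary least squares over all series at once, and partitions series by a user-supplied feature (here, the mean). All function names are hypothetical.

```python
def fit_pooled_ar1(series_list):
    # Fit a single model x_t = c + phi * x_{t-1} to all series at once (OLS on
    # the pooled (x_{t-1}, x_t) pairs) instead of one model per series.
    xs, ys = [], []
    for s in series_list:
        xs.extend(s[:-1])
        ys.extend(s[1:])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    phi = sxy / sxx
    return my - phi * mx, phi

def forecast(model, last, horizon):
    # Iterate the shared model forward from a series' last observed value.
    c, phi = model
    out = []
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return out

def partition_by_feature(series_list, feature, n_groups):
    # Sort series by a feature and cut them into groups of (nearly) equal size.
    order = sorted(range(len(series_list)), key=lambda i: feature(series_list[i]))
    size = -(-len(order) // n_groups)  # ceiling division
    return [order[k:k + size] for k in range(0, len(order), size)]

def fit_partitioned(series_list, feature, n_groups):
    # One pooled model per group of similar series.
    return [(group, fit_pooled_ar1([series_list[i] for i in group]))
            for group in partition_by_feature(series_list, feature, n_groups)]
```

The number of fitted models then grows with the number of groups rather than with the number of series, which is where the speed-up of the single-model approach comes from.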

Abstract

The selection of input data is a crucial step in virtually every empirical study. Experimental campaigns in algorithm engineering, experimental algorithmics, network analysis, and many other fields often require suitable network data. In this context, synthetic graphs play an important role, as data sets of observed networks are typically scarce, biased, and not sufficiently understood, and may pose logistical and legal challenges. Just as processing huge graphs becomes challenging in the big data setting, generating such massive instances efficiently requires new algorithmic approaches. Here, we update our previous survey [] on results for large-scale graph generation obtained within the DFG priority programme SPP 1736 (Algorithms for Big Data); to this end, we broaden the scope and include recently published results.
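As a small illustration of why generation needs its own algorithms, the sketch below samples an Erdős–Rényi G(n, p) graph with the well-known geometric skipping method of Batagelj and Brandes: instead of flipping a coin for each of the n(n-1)/2 vertex pairs, it jumps directly to the next edge, so the running time is proportional to n plus the number of edges. This is a standard sequential technique chosen for illustration, not a result from the survey; the function name is hypothetical.

```python
import math
import random

def gnp_skip(n, p, seed=None):
    # Sample G(n, p) by skipping over non-edges: the gap to the next edge in
    # the sequence of pairs (w, v), w < v, is geometrically distributed.
    rng = random.Random(seed)
    if p <= 0.0:
        return []
    if p >= 1.0:
        return [(w, v) for v in range(n) for w in range(v)]
    log_q = math.log(1.0 - p)
    edges = []
    v, w = 1, -1
    while v < n:
        r = rng.random()  # in [0, 1), so log(1 - r) is finite and <= 0
        w += 1 + int(math.log(1.0 - r) / log_q)
        # Carry the overflow of w into the next rows of the adjacency triangle.
        while w >= v and v < n:
            w -= v
            v += 1
        if v < n:
            edges.append((w, v))  # edge between vertices w < v
    return edges
```

With `n = 10**6` and `p = 1e-5`, for example, this draws roughly five million random numbers instead of about 5·10^11 coin flips.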

Abstract

Mathematical optimization is at the algorithmic core of machine learning. Almost every known algorithm for solving mathematical optimization problems has been applied in machine learning, and the machine learning community itself is actively designing and implementing new algorithms for specific problems. These implementations have to be made available to machine learning practitioners, which is mostly accomplished by distributing them as standalone software. Successful, well-engineered implementations are collected in machine learning toolboxes that provide more uniform access to the different solvers. A disadvantage of the toolbox approach is a lack of flexibility, as toolboxes only provide access to a fixed set of machine learning models that cannot be modified. This can be a problem for the typical machine learning workflow, which iterates the process of modeling, solving, and validating. If a model does not perform well on validation data, it needs to be modified, and in most cases these modifications require a new solver for the entailed optimization problems. Optimization frameworks that combine a modeling language for specifying optimization problems with a solver are better suited to this iterative workflow, since they can address large problem classes. Here, we provide examples of the use of optimization frameworks in machine learning. We also illustrate the use of one such framework in a case study that follows the typical machine learning workflow.
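The contrast between a fixed toolbox and a framework that separates the model from the solver can be shown with a toy example. In this sketch, assuming smooth unconstrained objectives, the "modeling language" is simply a Python function and the "solver" is one generic routine (gradient descent with central finite differences). Modifying the model means writing a new objective, not a new solver. Real optimization frameworks are far more capable; `minimize`, `ridge`, and `ridge_with_intercept` are hypothetical names.

```python
def minimize(objective, x0, lr=0.01, steps=2000, h=1e-6):
    # Generic solver: gradient descent using central finite-difference gradients.
    x = list(x0)
    for _ in range(steps):
        grad = []
        for i in range(len(x)):
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            grad.append((objective(xp) - objective(xm)) / (2.0 * h))
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.5]

def ridge(params, lam=1.0):
    # Model 1: y ~ w * x with an L2 penalty on w.
    w = params[0]
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) + lam * w ** 2

def ridge_with_intercept(params, lam=1.0):
    # Model 2 (after "validation"): add an intercept b; the solver is reused.
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) + lam * w ** 2

w1 = minimize(ridge, [0.0])
w2 = minimize(ridge_with_intercept, [0.0, 0.0])
```

For the first model the result can be checked against the closed form w = Σxy / (Σx² + λ) = 29.5 / 15 ≈ 1.967.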

Abstract

Network science methodology is increasingly applied to a large variety of real-world phenomena, often leading to big network data sets. Thus, networks (or graphs) with millions or billions of edges are more and more common. To process and analyze these data, we need appropriate graph processing systems and fast algorithms. Yet many analysis algorithms were pioneered on small networks, when speed was not the highest concern. Developing an analysis toolkit for large-scale networks thus often requires faster variants, both from an algorithmic and an implementation perspective. In this paper, we focus on computational aspects of vertex centrality measures. Such measures indicate the (relative) importance of a vertex based on its position in the network. We describe several common (and some recent and thus less established) measures, the optimization problems arising in their context, and algorithms for solving these problems efficiently. Our focus is on (not necessarily exact) performance-oriented algorithmic techniques that enable significantly faster processing than the previous state of the art, often making it possible to process massive data sets quickly and without resorting to distributed graph processing systems.
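To make the exact-versus-approximate trade-off concrete, the sketch below uses harmonic centrality, H(v) = Σ_{u≠v} 1/d(v, u), on an unweighted, undirected graph given as an adjacency dict. The exact version needs one BFS per vertex; the sampled version runs BFS from only k random sources and rescales by n/k, which gives an unbiased estimate. This is a generic sampling technique chosen for illustration, not necessarily one of the algorithms covered in the paper; the function names are hypothetical.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    # Hop distances from source to every reachable vertex.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def harmonic_exact(adj):
    # One BFS per vertex: O(n * m) time on unweighted graphs.
    return {v: sum(1.0 / d for u, d in bfs_distances(adj, v).items() if u != v)
            for v in adj}

def harmonic_sampled(adj, k, seed=0):
    # BFS from k random sources only; since the graph is undirected,
    # d(s, v) = d(v, s), and scaling by n / k makes the estimate unbiased.
    nodes = list(adj)
    sources = random.Random(seed).sample(nodes, k)
    est = {v: 0.0 for v in nodes}
    for s in sources:
        for v, d in bfs_distances(adj, s).items():
            if v != s:
                est[v] += 1.0 / d
    scale = len(nodes) / k
    return {v: scale * h for v, h in est.items()}
```

On the path 0-1-2, `harmonic_exact` assigns 2.0 to the middle vertex and 1.5 to each endpoint; choosing k much smaller than n trades accuracy for a k/n fraction of the BFS work.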

Abstract

Big Data applications are characterized by processing amounts of data too large to be stored. Cryptographic protocols are, by construction, supposed to define data spaces so large that no attacker can handle them. Nevertheless, the task of protocol cryptanalysis is to properly select cryptographic parameter lengths that guarantee both efficiency and security. This requires breaking cryptographic protocols and their underlying hardness assumptions for mid-sized parameters. But even for mid-sized parameters, cryptographic search spaces are far too large to be stored. This calls for technical solutions that traverse the search space without storing its elements. As an appealingly simple example, we address the subset sum problem, which lies at the heart of many modern cryptographic protocols designed to offer security even against quantum computers. In the subset sum problem, one is given integers $a_1, \ldots, a_n$ and an integer target $t$, and has to find a subset of the $a_i$ that sums exactly to $t$. A trivial memory-less algorithm tests, for each of the $2^n$ subsets, whether its sum equals $t$. It may come as a surprise that there exist memory-less algorithms significantly faster than $2^n$. We give a survey of recent memory-less techniques that apply to, but are not limited to, the subset sum problem. We start by describing a general collision-finding technique introduced in 1994 in the seminal work of van Oorschot and Wiener. Applied to subset sum, the van Oorschot–Wiener technique leads to a $2^{0.75n}$-algorithm. This was improved in 2011 by Becker, Coron and Joux to $2^{0.72n}$ using the representation technique. Recently, Esser and May presented a memory-less algorithm achieving $2^{0.65n}$ using two-layered collision finding. These running times have to be compared to the optimal $2^{0.5n}$ lower bound for collision-finding algorithms.
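The two ingredients mentioned above can be sketched in a few lines. The first function is the trivial memory-less $2^n$ algorithm: it enumerates subsets as bitmasks and keeps only a counter and a running sum. The second is the basic memory-less collision finder (Floyd's cycle-finding, as in Pollard's rho). For a function f on a finite set, it returns two distinct inputs with the same image, using constant memory. The van Oorschot–Wiener technique builds on this kind of collision search (parallelised via distinguished points), and the faster subset sum algorithms additionally define suitable functions f over partial subset sums; neither refinement is shown here. The function names are hypothetical.

```python
def subset_sum_bruteforce(a, t):
    # Trivial memory-less algorithm: test all 2^n subsets, encoded as bitmasks.
    n = len(a)
    for mask in range(1 << n):
        s = 0
        for i in range(n):
            if mask >> i & 1:
                s += a[i]
        if s == t:
            return [i for i in range(n) if mask >> i & 1]
    return None

def floyd_collision(f, x0):
    # Walk x0, f(x0), f(f(x0)), ... on a finite set; the walk must enter a cycle.
    # Phase 1: tortoise (1 step) and hare (2 steps) meet somewhere on the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: restart the tortoise at x0; both now move one step at a time and
    # first meet at the cycle entry. The values just before that point are two
    # distinct inputs with the same image, i.e. a collision of f.
    tortoise = x0
    if tortoise == hare:
        return None  # x0 lies on the cycle itself: no collision from this start
    while True:
        t_next, h_next = f(tortoise), f(hare)
        if t_next == h_next:
            return tortoise, hare
        tortoise, hare = t_next, h_next
```

For example, `floyd_collision(lambda x: x // 2, 8)` walks 8, 4, 2, 1, 0, 0, ... and returns `(1, 0)`, since both 1 and 0 map to 0.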