Evolution of Group-Theoretic Cryptology Attacks using Hyper-heuristics

In previous work, we developed a single Evolutionary Algorithm (EA) to solve random instances of the Anshel-Anshel-Goldfeld (AAG) key exchange protocol over polycyclic groups. The EA consisted of six simple heuristics which manipulated strings. The present work extends this by exploring the use of hyper-heuristics in group-theoretic cryptology for the first time. Hyper-heuristics are a way to generate new algorithms from existing algorithm components (in this case the simple heuristics), with the EAs being one example of the type of algorithm which can be generated by our hyper-heuristic framework. We take as a starting point the above EA and allow hyper-heuristics to build on it by making small tweaks to it. This adaptation is through a process of taking the EA and injecting chains of heuristics built from the simple heuristics. We demonstrate we can create novel heuristic chains, which when placed in the EA create algorithms which out-perform the existing EA. The new algorithms solve a markedly greater number of random AAG instances than the EA for harder instances. This suggests the approach could be applied to many of the same kinds of problems, providing a framework for the solution of cryptology problems over groups. The contribution of this paper is thus a framework to automatically build algorithms to attack cryptology problems.


Introduction
On NP-hard problems, the time taken to produce an algorithm to solve such problems is often vast. In such cases, users may use an "off the shelf" algorithm to obtain approximate solutions within an appropriate time. In this paper, we take a different approach and attempt to design an algorithm in response to feedback from similar instances of the problem. Examples of such problems are those in group-theoretic cryptology (multiple conjugacy, Anshel-Anshel-Goldfeld (AAG, [1]) and word decomposition, for instance). These problems have been posed over varying types of groups serving as the base problems for key exchange protocols (KEPs) [2,1,32,15,30] and subsequently attacked [19,20,17,12,13,33,34,35,39]. The group structures used are often intended to provide an extra encryption layer through the scrambling induced by the group presentation.
In this work, a preliminary hyper-heuristic framework is detailed which takes as input a proposed cryptographic base problem and a group structure, and, via machine learning techniques, generates operations for a length attack algorithm which aims to solve an acceptable proportion of random instances of the problem. The framework is implemented in the GAP 4.8.7 [41] language (due to its compatibility with the ParGAP package [11], allowing use of MPI intra-core communication). This is tested on a case study of an AAG KEP [2,1] posed over polycyclic groups defined by a number field [18]. The aim is to generate mutation operators for algorithms which outperform the existing human-designed EA. These mutation operators are chains of simple heuristics, which are composed or learned. The generation of crossover, selection, and other heuristic components is outside the scope of this work.
Our contribution is an approach that, contrary to the above manual design of attacks, automatically builds attack mechanisms and attempts to break the above AAG KEP. This approach is trained on a small set of instances and then validated on a second, larger, independent set of instances, illustrating that it generalises. This paper does not propose a single attack algorithm, but rather a framework in which algorithms can automatically be generated and then tested; it is an example of the generate-and-test paradigm, which has many applications in science, engineering, mathematics, and daily life. One of the drawbacks of our method is the large amount of computation time required; it takes relatively little time to generate an algorithm but a relatively long time to test it. One of the benefits of our approach, however, is that we can take an existing approach (as we do in this paper with an EA [43,42]) and use it as a starting point from which we can improve.
In [13] we observed that a human-designed EA performs better than the length attack algorithm of [18]. In this paper, we observe that an automatically designed EA performs better than the human-designed EA. We also conjecture that a random search algorithm will perform poorly on this problem. This is a pattern of performance typically seen in the metaheuristics literature. The reason for this ordering of the four types of solver lies in the nature of the resulting search landscape. A human-designed EA is essentially a more sophisticated length attack algorithm, and a machine-designed EA is essentially slightly more sophisticated than a human-designed EA.
Typically, designing an algorithm requires an understanding of the problem. The algorithm thus captures our intuition about how to solve that problem (consider the problem of sorting and the large number of algorithms available, for instance). An algorithm is an explicit formalisation of our intuition: with cryptology, we have very little in the way of intuition to guide us. This is an opportunity for an automated method (which is largely unbiased) to invent new algorithms.
It is acknowledged that the detailed protocol has already been broken by [13,33] (the latter reference being a "field attack"), but the wish is to present this work as a preliminary study with a view towards application to other cryptanalytic problems. It is argued that this type of algorithm has a future in the disciplines of cryptology and possibly algorithmic questions in combinatorial group theory, and may be extended to other structures and problem types.
This work is organised as follows: In Section 2 we give an introduction to group-based cryptography, reviewing previously-proposed KEP problems, before turning to an overview of hyper-heuristics. This is followed by Section 3, which introduces the notation and formalisation. In Section 4 we describe the experimental approach and detail parameter settings, discussing the results of our approach in Section 5. In Section 6 we conclude the article, including a discussion of further work resulting from this study and raising future research directions.

Background
In this section we will first introduce group-based cryptography. We then give an introduction to hyper-heuristics.
2.1. Introduction to Group-Based Cryptography. Group-based cryptography uses groups in the construction of cryptosystems and KEPs and has been an active area of research since approximately the late 1990s. Proposed cryptosystems and their subsequent attacks (purported breaks) iterate one after the other with the aim of producing increasingly secure cryptography over time.
Group-based cryptography began in earnest in the late nineties, when the likes of [2,1,32] proposed KEPs based upon braid groups. As mentioned in the introduction, the braid groups were used due to the scrambling induced by the presentation of the group, and the consequent belief that the underlying problems (various guises of the conjugacy problem) were extremely difficult to solve. Solving the underlying problem would, in many cases, break the KEP and render any keys exchanged open to misuse by adversaries.
Both the KEPs and the underlying problems were attacked in the next few years. Examples of such attacks were super summit set attacks [17] and the more practical length-based attacks (LBAs) [29]. These latter algorithms (also known as hillclimbers) build up solutions to instances of the problem gradually, beginning with a short candidate solution and making random alterations to it. The altered solution is then compared to the old solution by some metric, mostly with regard to how "well" the candidate solves the instance (for example, how many symbols remain after all possible cancellations have been conducted). If the altered solution proves to be an improvement then the current solution is set equal to the altered solution and the process is repeated. If not, then the altered solution is discarded.
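The accept-if-better loop described above can be sketched as follows. This is only a minimal illustration in Python, not the code of any of the cited attacks: the names `hill_climb`, `toy_cost` and `toy_mutate` are hypothetical, and the bit-string problem is a stand-in for a group-theoretic instance.

```python
import random

def hill_climb(start, cost, mutate, max_steps=1000, rng=None):
    """Generic hillclimber: keep a random alteration only if it lowers the cost."""
    rng = rng or random.Random()
    current, current_cost = start, cost(start)
    for _ in range(max_steps):
        if current_cost == 0:               # nothing remains: instance solved
            break
        candidate = mutate(current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:   # improvement: accept the alteration
            current, current_cost = candidate, candidate_cost
        # otherwise the altered solution is discarded
    return current, current_cost

# Toy stand-in problem (an assumption, not an AAG instance):
# the cost is the number of positions that differ from a hidden target.
TARGET = (1, 0, 1, 1, 0, 1, 0, 0)

def toy_cost(bits):
    return sum(b != t for b, t in zip(bits, TARGET))

def toy_mutate(bits, rng):
    i = rng.randrange(len(bits))            # flip one randomly chosen bit
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

best, best_cost = hill_climb((0,) * 8, toy_cost, toy_mutate,
                             rng=random.Random(0))
```

In a length-based attack the cost would instead be a length of the reduced equations, and the alteration would act on a word in the group generators.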
Being practical and fast, LBAs became increasingly sophisticated through [20,19,34,39]. As LBAs also became increasingly capable of solving instances of the aforementioned KEPs, researchers began, in a search for more attack-resistant structures, to look for new groups and problems while keeping the general methodology. Examples of these platform groups are right-angled Artin groups [12] (a homomorphic pre-image of braid groups), small cancellation groups [40], matrix groups, Thompson's group and Grigorchuk's group, to name but a few.
Polycyclic groups were first proposed as a new platform group in 2004 [15] and were followed ten years later by the works of [30] and [18], applying two distinct types of polycyclic groups to the AAG [1] problem (multiple conjugacy). The systems introduced were, in turn, broken by the works of [5] (for generalised Heisenberg groups), [33] and [13] (via a parallelised EA). The latter work was demonstrated to be more efficient, and more successful, than previous LBA attacks. Although the approach on the proposed KEP was successful, we wish to take it further into the domain of hyper-heuristics and use the KEP as a test bed for our framework. An excellent summary of group-theoretic cryptology in general can be found in [35].

2.2. Introduction to Hyper-Heuristics. Informally, hyper-heuristics take a number of existing computational search techniques and combine them to make a new heuristic. This new heuristic is intended to have more of the strengths of each of the heuristics, and fewer of their weaknesses. The motive of a hyper-heuristic is not to out-perform a state-of-the-art algorithm on a single instance of a problem. Rather, the aim of hyper-heuristic approaches is to perform well across a range of problem instances. In other words, hyper-heuristics attempt to offer robust performance across a set of problems rather than specialised performance on a narrow set of specific instances. These problems could be problem instances from a given domain, such as the travelling salesman problem, or they could be drawn from different problem domains, for example exam timetabling and vehicle routing. In this paper we are developing a hyper-heuristic framework to solve problem instances from a single domain: cryptology.
We should also be careful about the distinction between optimisation and supervised machine learning. Optimisation typically has an objective function we wish to evaluate and a parameter value which is a global optimum. Often this is difficult to achieve, and it is also difficult to know when it has been achieved. In contrast, with supervised machine learning, we typically have a set of example cases which we use to train a model. We then have a second set of independent example cases which are used to determine if the model performs well in general on cases which were not included in the training phase. Optimisation has a single stage (optimising), while machine learning has two main stages (training and testing). Over-fitting is not an issue in optimisation, but it may arise in machine learning. In summary, in this paper we are using a machine learning approach (hyper-heuristics), with independent training and test sets, to build a heuristic which we use for optimisation, the objective being to minimise the length.
Hyper-heuristics can be viewed in the context of heuristics and metaheuristics. These three terms are often confused. Let us begin by looking first at heuristics, metaheuristics, and finally hyper-heuristics.
A heuristic is a domain-specific algorithm (often called a rule of thumb) which does not solve a problem to optimality (as such problems are often NP-hard or NP-complete), but rather aims to deliver suboptimal solutions in feasible time. That is, a heuristic is a strategy that aims to deliver an approximation to a solution to a given problem in a fast, rather than an overly elaborate, way. An example of a heuristic is the Lin-Kernighan algorithm, which is applied to the Travelling Salesman Problem (TSP). It does not make sense to apply the Lin-Kernighan algorithm to the knapsack problem, as it is specific to the TSP. The Lin-Kernighan algorithm could be applied to other graph-based problems with a representation similar to the TSP, but the algorithm may not perform well as this is not what it was intended for. A metaheuristic is a general search-based algorithm which can be applied to spaces consisting of bit strings or permutations, for example, depending on the representation of the problem instances. An example of a metaheuristic is a genetic algorithm which searches the space of bit strings of a given length.
Hyper-heuristics are different again. Typically a hyper-heuristic uses a metaheuristic to search the space of problem specific heuristics. That is, a hyper-heuristic is a "search methodolog[y] for choosing or generating (combining, adapting) heuristics [...], in order to solve a range of optimisation problems" [8, p. 2]. For example, see [3]. Hyper-heuristics have successfully been applied to a number of different problem domains.
As combinatorial optimisation problems are a subset of all NP-hard problems, it is not surprising that hyper-heuristics have been a popular approach. Applications include exam timetabling [4], bin packing [38] and employee rostering [10]. There have also been a number of well-referenced survey articles, including [37,7,9].
Hyper-heuristics typically do not generate complete algorithms; rather a component of an algorithm is targeted to be automatically designed by a generate and test approach. Hyper-heuristics have been used, for example, to generate components of evolutionary algorithms such as genetic algorithms and evolutionary programming (e.g., crossover operators [21], mutation operators [27]) and form a large part of the literature in the automated design of algorithms [26].
In the context of this paper, we are using hyper-heuristics in the following manner. We will take seven low-level heuristics, which are chained together randomly to effectively create new heuristics. These new chains of heuristics are then inserted into a standard EA (depicted in the work of [13]) which is used to tackle the problem. This work begins in the next section.

Notation and Formalisation
In this section, the AAG KEP over a certain type of polycyclic group is discussed. This is followed by the notation needed for the implementation of the hyper-heuristic. In this section, the notation broadly follows that of [13] which describes the aforementioned EA.
3.1. Setup of Problem. The AAG KEP [2,1] was posed over polycyclic groups in [18,30], and subsequently attacked in two distinct ways by the work of [13] and [33]. The main details of the protocol, following the exposition given in [13] for a group $G = \langle g_1, g_2, \ldots, g_n \mid R \rangle$, are as follows.
First, Alice chooses a subgroup $A = \langle a_1, a_2, \ldots, a_N \rangle \leq G$ generated by words $a_i$ in the generators of $G$ such that $L_1 \leq l_G(a_i) \leq L_2$. Bob then does similarly to produce a subgroup $B = \langle b_1, b_2, \ldots, b_N \rangle \leq G$. All of $A$, $B$ and $G$ are made public. Alice chooses her private key $\mathcal{A} \in A$, produces the conjugates $\mathcal{A}^{-1} b_i \mathcal{A}$ for $i = 1, \ldots, N$ and sends these to Bob. Bob does similarly, producing $\mathcal{B}^{-1} a_i \mathcal{B}$ for $i = 1, \ldots, N$, and sends these to Alice (his private key is $\mathcal{B} \in B$). From the information now exchanged, each party can produce the shared key (the commutator of the two private keys). If an adversary wishes to find the private key $\mathcal{A}$ (or, equivalently, $\mathcal{B}$), they may intercept the above conjugates either party sends to the other. Thus the problem to be solved may be expressed as a subgroup-restricted multiple conjugacy problem: given the public elements $b_i$ and the conjugates $\mathcal{A}^{-1} b_i \mathcal{A}$ for $i = 1, \ldots, N$, recover $\mathcal{A}$. Each instance of this problem is a set of $N$ (frequently twenty) conjugacy equations posed over a finitely presented platform group $G$. A solution to the problem means that all the above equations are satisfied. One function of the rewriting rules (relators) $R$ of $G$ is to serve cryptographically as word obfuscators and thus hide the secret word (private key) $\mathcal{A}$.
The problem is posed over polycyclic groups $O \rtimes U$, where, by [13], $O$ is the additive group of the ring of integers of a number field $K$ and $U$ is its group of units. The number field $K$ is defined by a polynomial $f$. To recap, the instance parameters associated to this setup are then the number of equations, $N$, the polynomial $f$, the length $L$ of the private key $\mathcal{A}$ in $A$, and $L_1$ and $L_2$ (the lower and upper bounds, respectively, on the lengths of the $a_i$ in $G$).
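The exchange described above can be illustrated on a toy platform. The sketch below is an assumption-laden illustration only: it uses small permutation groups rather than the polycyclic groups of this paper, takes the shared key to be the commutator in the convention $\mathcal{A}^{-1}\mathcal{B}^{-1}\mathcal{A}\mathcal{B}$, and all function names and sizes are hypothetical.

```python
import random

def compose(p, q):
    """Product p*q of permutations given as tuples (apply p first, then q)."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate(g, x):
    """Return x^-1 g x."""
    return compose(compose(inverse(x), g), x)

def evaluate(word, gens):
    """Evaluate a word, given as (generator index, +1 or -1) pairs, on gens."""
    result = tuple(range(len(gens[0])))       # identity permutation
    for index, sign in word:
        g = gens[index] if sign == 1 else inverse(gens[index])
        result = compose(result, g)
    return result

rng = random.Random(1)
DEGREE, N, L = 8, 4, 5                        # toy sizes, not those of the paper

def random_perm():
    p = list(range(DEGREE))
    rng.shuffle(p)
    return tuple(p)

def random_word():
    return [(rng.randrange(N), rng.choice((1, -1))) for _ in range(L)]

a = [random_perm() for _ in range(N)]         # public generators of Alice's subgroup
b = [random_perm() for _ in range(N)]         # public generators of Bob's subgroup
word_a, word_b = random_word(), random_word() # private words
key_a, key_b = evaluate(word_a, a), evaluate(word_b, b)

sent_to_bob = [conjugate(b_i, key_a) for b_i in b]     # A^-1 b_i A
sent_to_alice = [conjugate(a_i, key_b) for a_i in a]   # B^-1 a_i B

# Alice rebuilds B^-1 A B from Bob's conjugates using her own private word.
shared_alice = compose(inverse(key_a), evaluate(word_a, sent_to_alice))
# Bob rebuilds A^-1 B A from Alice's conjugates using his own private word.
shared_bob = compose(inverse(evaluate(word_b, sent_to_bob)), key_b)
```

Both parties arrive at the same element because conjugation by a fixed element is a homomorphism, so substituting the received conjugates into one's own private word yields the conjugate of one's own key.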
Note that, in this work, we refer to either an exact solution or a candidate solution as appropriate. However, most references will be to candidate solutions but for the sake of brevity will be named solutions. In the context of hyper-heuristics and cryptology, we are using hyper-heuristics to generate candidate solutions to find an exact solution to the cryptographic problem. In this spirit, there are several functions at work in this paper which we need to distinguish between.
3.2. Pertinent Functions. The following functions are recapped from [13]. Let a word $w$ be expressed in the form $w = f_{i_1}^{e_1} f_{i_2}^{e_2} \cdots f_{i_r}^{e_r}$ for non-zero $e_j \in \mathbb{Z}$, where $f_1, f_2, \ldots, f_n$ are the generators of the free group $F$. The length functions associated to the group $G$ are then given by $\sum_{j=1}^{r} |e_j|$ and $\sum_{j=1}^{r} \omega_{i_j} |e_j|$, where $\omega_j$ is the "sum of the lengths of the normal forms of the commutators $[g_j, g_k]$ in $G$ for $k = 1, \ldots, n$". That is, the length of $w$ is the sum of the absolute powers (respectively, the weighted absolute powers) of the individual generators $f_i$ that make up the word $w$.
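As a small worked illustration of the two length functions just described, the following Python sketch represents a word as a list of (generator index, exponent) pairs. The representation, the function names and the weight values are assumptions made for the example; in particular the weights are not computed from commutators in any actual group.

```python
def length(word):
    """Unweighted length: the sum of the absolute exponents."""
    return sum(abs(exponent) for _, exponent in word)

def weighted_length(word, weights):
    """Weighted length: each absolute exponent is scaled by its generator's weight."""
    return sum(weights[index] * abs(exponent) for index, exponent in word)

# The word f_1^2 f_3^-1 f_2^4 as (generator index, exponent) pairs.
w = [(1, 2), (3, -1), (2, 4)]

# Hypothetical weights omega_1, omega_2, omega_3.
omega = {1: 3, 2: 1, 3: 5}

unweighted = length(w)                 # 2 + 1 + 4
weighted = weighted_length(w, omega)   # 3*2 + 5*1 + 1*4
```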
The basic EA cost function, denoted $c$, measures the quality of the candidate solutions produced by the EA and is evaluated at the current EA solution $\alpha$ (i.e., the approximation of Alice's private key). Its output is the sum of the lengths of the (normal form) reduced equations $E_1, E_2, \ldots, E_N$. That is, summand $i$ (where $i \in \{1, \ldots, N\}$) of the cost function is equal to the reduced length of equation $E_i$ after substitution of $\alpha$. This function is used to drive search in the EA, since the population ranking is performed with respect to it. The cost used in the EA is broadly the cost vector produced by this basic function, involving the sum $c$, and the maximum and mean lengths of summands of $c$, for the weighted and non-weighted length functions, given in [13, p. 8-9]. The global optimum (minimum value) of $c$ is zero; at this value, no fragments of the equations remain and the instance is completely solved.
The heuristic objective function is the metric used to compare the current heuristic chain over the given set of instances (training or testing) and is a vector given in the following order, each element computed over the set of instances:
• The mean best cost $c$ over the unsuccessful EA runs;
• The negative of the success rate as a proportion of the total number of runs;
• From the successful runs, the mean number of generations used by the EA.
That is, this function tells the hyper-heuristic how good a given heuristic chain is. For the validation process the first and second elements of the above objective function are swapped, since we are now more concerned with the success rate. The hyper-heuristic attempts to minimise the above objective function, indicating a successful heuristic chain, as far as possible. Comparison of heuristic objective vectors, produced by two distinct heuristic chains, is performed lexicographically. Note that this function is often termed a fitness function in the evolutionary computation community.
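The lexicographic comparison of objective vectors, and the swap of the first two elements for validation, can be sketched in Python, where tuples already compare lexicographically. The function name `objective` and the numerical values below are hypothetical and do not come from the experiments.

```python
def objective(mean_fail_cost, success_rate, mean_generations, validation=False):
    """Build the objective vector; smaller is better in every position."""
    vector = (mean_fail_cost, -success_rate, mean_generations)
    if validation:
        # For validation the success rate is compared first.
        vector = (vector[1], vector[0], vector[2])
    return vector

# Training/testing order: equal mean cost, so the success rate decides.
chain_a = objective(12.5, 0.80, 40.0)
chain_b = objective(12.5, 0.86, 55.0)
better_on_training = min(chain_a, chain_b)

# The same pair of chains can be ranked differently under the two orders.
chain_c = (20.0, 0.90, 30.0)
chain_d = (5.0, 0.70, 10.0)
d_wins_training = objective(*chain_d) < objective(*chain_c)
c_wins_validation = (objective(*chain_c, validation=True)
                     < objective(*chain_d, validation=True))
```

The second pair shows why the order matters: a chain with lower residual cost but a lower success rate wins under the training order and loses under the validation order.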

3.3. Simple Heuristics on the Group. In previous work [13], six simple heuristics were used in an EA to break a proposed key exchange [18]. These are listed in Table 1 as $H_1$-$H_6$. In this paper, we are building new heuristic chains to inject into an EA. We have also added a seventh heuristic $H_7$ (swap) to this set of heuristics. Evolutionary operators may otherwise be thought of as heuristics on group elements $w$ [13].
Heuristic $H_7$ (swap) is designed to assist when symbols are in the 'wrong place' in a word $w$, swapping two symbols at random positions and potentially triggering subsequent cancellation of symbols (and, thus, an EA cost reduction). Essentially all heuristics in the above table are random, with operations performed with random words or generators at random positions. The above is not a list of minimal heuristics: it is noted, for example, that heuristic $H_1$ can be achieved through repeated application of $H_2$, as can $H_5$ and $H_6$ (which were specialised to the conjugacy problem).
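How a swap can trigger cancellation is easiest to see on a toy representation. In the following Python sketch a word is a list of non-zero integers, with `-k` standing for the inverse of generator `k`, and reduction is plain free cancellation. Both are simplifying assumptions: in the polycyclic setting words are reduced to a normal form by the group's relators, not by free cancellation alone. The function names are hypothetical.

```python
import random

def free_reduce(word):
    """Cancel adjacent pairs x, x^-1 until none remain."""
    reduced = []
    for symbol in word:
        if reduced and reduced[-1] == -symbol:
            reduced.pop()                # x followed by x^-1 cancels
        else:
            reduced.append(symbol)
    return reduced

def swap(word, rng):
    """Swap the symbols at two distinct random positions (sketch of a swap heuristic)."""
    if len(word) < 2:
        return list(word)
    i, j = rng.sample(range(len(word)), 2)
    swapped = list(word)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped

# g_1 g_2 g_1^-1 g_3 is freely reduced, but exchanging the middle two
# symbols gives g_1 g_1^-1 g_2 g_3, which cancels down to g_2 g_3.
original = [1, 2, -1, 3]
after_swap = [1, -1, 2, 3]
shortened = free_reduce(after_swap)
random_swap = swap(original, random.Random(0))
```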
3.4. EA Parameter Settings. The EA parameters are given in Table 2. The parameters were produced by copious experimentation, and scaling down the parameters in [13] to approximately one quarter of their original values to achieve an effective set of EA parameters. This increases the speed of the EA. The original population size was 100. We do not claim optimality for these parameter settings.
All heuristics are performed by firstly choosing a solution at random from the top 40% of the population (by cost). The selection operator is elitist; i.e., if $n_s$ solutions are to be selected then the first is the solution of minimum cost with the remaining $n_s - 1$ solutions selected from the top 40% (i.e., after ranking by minimum cost) of the population at random. All random choices are made uniformly (as in [13]).
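The elitist selection just described might be sketched as follows. This is a minimal Python illustration rather than the GAP implementation; the function name `select` is hypothetical, and it is assumed here that the random picks are made with replacement and may include the best solution again.

```python
import random

def select(population, cost, n_s, rng, top_fraction=0.4):
    """Elitist selection: the best solution, then n_s - 1 uniform picks from the top 40%."""
    ranked = sorted(population, key=cost)                    # lowest cost first
    top = ranked[:max(1, int(len(ranked) * top_fraction))]   # top 40% by cost
    chosen = [ranked[0]]                                     # the elite solution
    chosen += [rng.choice(top) for _ in range(n_s - 1)]      # uniform random picks
    return chosen

# Toy population of 25 'solutions' whose cost is their own value.
population = list(range(25))
picks = select(population, cost=lambda s: s, n_s=5, rng=random.Random(0))
```

With a population of 25, the top 40% consists of the ten lowest-cost solutions.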
It was chosen to have four solutions from each generation created by a heuristic chain. Testing this alongside the remaining nineteen solutions in each generation created by the same heuristic chain, it was found that this choice of four solutions turned out to be more [...]
[...] Table 2 as it does not operate in isolation (as part of the EA of [13]), only in the context of the other six heuristics. Crossover is performed by choosing two words $w_1$, $w_2$ (from the top 40% of the population). Choosing two random positive integers $r_1 \leq \ell(w_1)$, $r_2 \leq \ell(w_2)$, one of the two words [...] is output [13] (where $w[s \ldots t]$ is the subword between and including positions $s$ and $t$ of the word $w$). The next section details the operation of the hyper-heuristic and the experimental setup.

4.1. Hyper-heuristic Implementation. As above, our objective is to create a hyper-heuristic that, given the AAG problem (Section 3.1) and a polycyclic group as previously stated, generates an algorithm which solves an acceptable number of instances of the problem. The term "acceptable" in this instance is taken to mean a higher number of instances than the original EA of [13] with $H_2$ inserted (cf. the '$H_2$' column of Tables 3-5 given later). To recap, our hyper-heuristic controls the injection of heuristic chains into an EA in order to determine the best heuristic chain. The initial heuristic chain can be the best heuristic known (i.e., $H_2$) or a random chain. If the initial heuristic chain is random, then the heuristic generator is called. This random chain is set to a random length between 2 and 10. We now present the core algorithmic contribution and how these algorithms are related. Algorithm 1 tests heuristic chains; the parameters used are $H_{max} = 20$, $N_{train} = 15$, $N_{test} = 50$ and $N_{valid} = 50$.
The EA referred to in Algorithm 1 is the EA of [13] run on an input collection of instances. The EA parameter values are reduced as in Table 2. Note also that there is a probability, $p_h$, that the current chain will be accepted if it does not perform better than the best chain found (on the training instances) so far.

Algorithm 1 Heuristic generation and testing methodology
Input: Group $G$; parameters: number of training instances $N_{train}$; number of testing instances $N_{test}$; number of validation instances $N_{valid}$; initial heuristic chain; maximum number, $C_{max}$, of heuristics to generate.
Output: Runtime statistics; best heuristic chain found.
  [...]
  Call heuristic chain generator (Algorithm 2), giving chain $C_i$.
  [...]
  if $M_{i,train} < M^*_{train}$ then ⊲ Better chain found for training set; test chain on the testing set.
    Execute the EA, with injected chain $C_i$, on all testing instances. ⊲ Get metric $M_{i,test}$.
    If $M_{1,test}$ does not exist then execute the EA with injected chain $C_1$ on all testing instances. Let $M^*_{test} \leftarrow M_{1,test}$.
    if $M_{i,test} < M^*_{test}$ then
      $i^* \leftarrow i$ ⊲ A better chain has been found on the testing set.
    [...]
  Compare chain $C_p$ with chain $C_1$ on the validation set of instances via execution of the EA with injected chains (i) $C_p$ and (ii) $C_1$.
  [...]
    return timeout and $C_{i^*}$. End.
  end if
The group definition of $G$ is a piece of code which simply defines the group, its instance parameters over which the instance will be computed, and the cost functions. Next is the heuristic chain generator, Algorithm 2. If the initial heuristic chain is a random chain, then this random chain is created by appending a given number (here, a random number between two and ten) of simple heuristics randomly chosen from $H_1$-$H_7$. Otherwise, the heuristic generator (Algorithm 2) generates new chains of simple heuristics from the chain given by the current step of Algorithm 1 by a process of insertion, deletion or substitution at random positions in the heuristic chain. The heuristic is then returned in the form of a series of commands written into a file read by the EA when it is time to execute the chain. Chains not allowed include the set of all chains of the form $H_3^k$ for some $k > 0$ (i.e., a chain consisting solely of deletions) or chains that are identical to those already examined in the hyper-heuristic run. We let $p_i = p_s = 0.4$ and $p_d = 0.2$.

Algorithm 2 Heuristic chain generator
Input: Set of heuristic chains $C = \{C_1, \ldots, C_k\}$ already examined.
Output: New heuristic chain $C'$.
  $C' \leftarrow C_i$, the heuristic chain given by Algorithm 1.
  while [...] do
    Choose operation at random subject to probabilities $p_i$, $p_s$, $p_d$ (of insertion, substitution and deletion respectively).
    Perform chosen operation on $C'$ with a simple heuristic chosen at random from $H_1$-$H_7$ (if not deletion).
  end while
  return heuristic chain $C'$. End.
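A chain generator along the lines of Algorithm 2 can be sketched in Python as below. Several details are assumptions made for the sketch and are not confirmed by the text: chains are tuples of heuristic indices, at least one edit is always applied, editing continues until the chain is neither all-$H_3$ nor already examined, and the last remaining heuristic is never deleted. All function names are hypothetical.

```python
import random

HEURISTICS = (1, 2, 3, 4, 5, 6, 7)        # indices of H_1, ..., H_7
P_INSERT, P_SUBSTITUTE, P_DELETE = 0.4, 0.4, 0.2

def is_allowed(chain, examined):
    """Reject empty chains, chains made only of H_3, and chains already examined."""
    only_deletions = all(h == 3 for h in chain)
    return len(chain) > 0 and not only_deletions and tuple(chain) not in examined

def edit_chain(chain, rng):
    """Apply one random insertion, substitution or deletion to a non-empty chain."""
    chain = list(chain)
    operation = rng.choices(("insert", "substitute", "delete"),
                            weights=(P_INSERT, P_SUBSTITUTE, P_DELETE))[0]
    if operation == "insert":
        chain.insert(rng.randrange(len(chain) + 1), rng.choice(HEURISTICS))
    elif operation == "substitute":
        chain[rng.randrange(len(chain))] = rng.choice(HEURISTICS)
    elif len(chain) > 1:                  # assumption: keep at least one heuristic
        del chain[rng.randrange(len(chain))]
    return chain

def generate_chain(current, examined, rng):
    """Edit the current chain until an allowed, unexamined chain appears."""
    candidate = edit_chain(current, rng)  # assumption: always edit at least once
    while not is_allowed(candidate, examined):
        candidate = edit_chain(candidate, rng)
    return tuple(candidate)

def random_chain(rng):
    """Random initial chain of length 2 to 10 that is not made only of H_3."""
    while True:
        chain = tuple(rng.choice(HEURISTICS) for _ in range(rng.randint(2, 10)))
        if any(h != 3 for h in chain):
            return chain

rng = random.Random(0)
examined = {(2,)}                         # start from the single heuristic H_2
new_chain = generate_chain((2,), examined, rng)
initial = random_chain(rng)
```

Since an insertion is always possible, the loop in `generate_chain` terminates with probability one.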
An instance generator is also used. This creates instances at random, with the random number seed based upon the computer clock. Included are the instance parameters ($N$, $\ell$, $L_1$, $L_2$, $G$; see Section 3.1), a random word function, and the cost functions as in Section 3.1.

4.2. Details of Implementation. During early development of the hyper-heuristic, issues with speed were noted. A number of measures were put into place to increase processing speed. Firstly, an EA population size of 25 was used (with one slave processor being assigned to each population member). In addition, smaller EA iteration limits than [13] were set. On the training and testing instances, 'maxsteps' is set to 50 for degrees 1, 2 and 3 of the polynomial $f$ defining the number field $K$ (which, of course, defines $G$), and 100 for degrees 5 and 7. On the validation instances, 'maxsteps' is set to 1250 for degrees 1, 2 and 3, and 2500 for degrees 5 and 7; this had the effect of a small decrease in the success rate of the EA compared to that of [13] (and so the results are not directly comparable). The polynomials $f$ used for the above degrees were $x - 1$, [...], being consistent with those of [18,13].
All instances are run with an initial word length of 10 generators (in EA generation 1) to avoid bias towards the insertion operators, which would occur with an initial length of 1 (for example). No instances of degree 9 or above were attempted due to the time complexity of computation and reduction of words in the groups concerned (for more details the interested reader should consult [13]). The numbers of instances used in each phase of the hyper-heuristic were fifteen (training), fifty (testing) and fifty (validation). The number of heuristic chains run by the hyper-heuristic is $H_{max} = 20$.
All experiments were run on a high-performance cluster containing Intel Xeon E5620 CPU processors, each running at 2.40 GHz. The hyper-heuristic was implemented in the GAP language [41], and the Polycyclic [16] package for GAP was used for computation with polycyclic group elements. The ParGAP [11] package was also used to handle MPI communications between processors. Due to the domain, the popular hyper-heuristic packages such as Hyflex [36] are not suitable for use because we are using GAP, a specialist group theory language. As above, each experiment was run on 26 cores (1 'master' core to control, and 25 'slave' cores, one for each EA population member). The code referred to in this section is available from https://github.com/MJCraven/Hyperheuristic_group, with the instances available at [14].

Experimental Results
In this section, hyper-heuristic experiments are run, varying initial input and instance parameters. To recap, the EA with the heuristic chains injected is executed on the previously detailed fifteen training instances. If the performance improves over that of previous heuristic chains then it is run with the testing set (fifty random instances). If the performance over this set improves over that of previous heuristic chains then the current chain is assigned as the new best chain. This is continued until the end of the run, after which the chain is validated over the validation set (a distinct set of fifty random instances). For a single hyper-heuristic run of twenty heuristic chains, and assuming at least one better heuristic chain is found, around 500 problem instances are run in total.
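The train, test and validate procedure recapped above can be summarised by the following Python sketch. It is an interpretation under stated assumptions rather than a transcription of Algorithm 1: `propose` and `evaluate` are hypothetical callables standing in for the chain generator and for a full EA run over a set of instances, acceptance with probability `p_h` is taken to mean that a non-improving chain becomes the starting point for the next edit, and the toy metric at the end is invented purely to exercise the control flow.

```python
import random

def hyper_heuristic(initial_chain, propose, evaluate, c_max=20, p_h=0.0, rng=None):
    """Generate-and-test loop: train, then test improvements, then validate the best."""
    rng = rng or random.Random()
    examined = {initial_chain}
    current = best = initial_chain
    best_train = evaluate(initial_chain, "train")
    best_test = None                            # baseline computed only when needed
    for _ in range(c_max):
        chain = propose(current, examined)
        examined.add(chain)
        m_train = evaluate(chain, "train")
        if m_train < best_train:                # better on the training set
            best_train, current = m_train, chain
            if best_test is None:
                best_test = evaluate(initial_chain, "test")
            m_test = evaluate(chain, "test")
            if m_test < best_test:              # better on the testing set too
                best, best_test = chain, m_test
        elif rng.random() < p_h:                # assumption: occasional acceptance
            current = chain
    validated = evaluate(best, "valid") < evaluate(initial_chain, "valid")
    return best, validated

# Toy stand-ins (assumptions): chains of length three are 'ideal',
# and a proposal simply appends H_2 to the current chain.
def toy_evaluate(chain, phase):
    return (abs(len(chain) - 3), 0.0, 0.0)      # an objective vector to minimise

def toy_propose(current, examined):
    return current + (2,)

best_chain, validated = hyper_heuristic((2,), toy_propose, toy_evaluate, c_max=5)
```

Metrics are compared as tuples, so any lexicographically ordered objective vector can be returned by `evaluate`.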
5.1. The Best Simple Heuristic. An LBA (i.e., a hillclimber) was created for each simple heuristic. These attacks were run on a selection of random instances, with success rates of 1.7%, 51.7%, 0%, 0%, 1.7% and 1.7% respectively for $H_1$-$H_6$. This indicates that a heuristic on its own, unless it builds appropriate solutions, is unlikely to be successful for a large set of random instances. In this case, $H_2$ seems to be more successful since it builds solutions by gradually increasing solution length. Hence, the hyper-heuristic is initialised with the chain composed solely of a single execution of $H_2$.

5.2. Observations on the Evolution to Build Heuristic Chains. The following details are presented for each experiment. The first column of Tables 3-5 is the degree of the polynomial $f$, one of the main instance parameters. The second column is the validation set metric (success rate, mean cost from unsuccessful runs, mean number of generations from successful runs to solve the instance) from the EA with the best known heuristic chain (insertion, $H_2$). The third column is the validation set metric from the EA with the best injected heuristic chain found, followed by the iteration on which the best heuristic chain was found. The fifth column gives the chain, where $H_i^k$ refers to $k$ repeated executions of heuristic $H_i$. The last column is the number of hyper-heuristic runs it took to find the best heuristic; unsuccessful runs were those for which either no better chain than $H_2$ was found or, more commonly, a better chain (on the grounds of testing and training performance) was found but performed worse than $H_2$ on the validation set.
Table 3: Comparison of results on fifty validation instances. The parameters used were $N = 20$, $L_1 = 10$, $L_2 = 13$, $L = 5$, as in Section 4.1. Those instances used by the present work are taken from the same distributions as those used by [13].
From Tables 3-5, it is clear that the approach enables the creation of more successful heuristic chains than the EA of [13]. Since the hyper-heuristic relies on a stochastic algorithm (the EA), some runs are more successful than others. For example, some hyper-heuristic runs may uncover several chains proving more successful than the initial heuristic chain (e.g., Table 4 with $d = 7$). On the other hand, however, some hyper-heuristic runs may discover no chains at all that are more effective than the initial heuristic (recall this information is recorded in the final column of Tables 3-5).
This latter outcome seems to be more common for d = 1, where a high percentage of instances are solved by the EA with the initial heuristic H_2.
Note, in addition, that for many small d (e.g., d = 1 or d = 2), all problem instances are solved by the EA with the injected simple heuristic H_2. Thus, the only option to improve performance, in the sense it is measured in this work, is to solve those instances in a smaller mean number of generations. For example, the row for degree d = 1 in Table 3 shows that 100% of problem instances are solved by the initial heuristic in a mean of 7.88 generations. This is improved marginally by the later chain H_2 H_1 H_4, which solves all instances in a mean of 7.62 generations. This suggests that for larger degrees (d ≥ 5, for example) there is more 'room for improvement' available to the hyper-heuristic.
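The comparison just described can be sketched in code. The three components of the metric are those listed for Tables 3-5; the lexicographic ordering used here (success rate first, then cost on unsuccessful runs, then generations on successful runs) is an assumption made for illustration, as are the names `Metric`, `sort_key` and `is_better`.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    success_rate: float      # fraction of instances solved (higher is better)
    mean_cost_failed: float  # mean EA cost over unsuccessful runs (lower is better)
    mean_gens_solved: float  # mean generations over successful runs (lower is better)


def sort_key(m: Metric):
    # Assumed ordering: negate the success rate so that smaller keys are better.
    return (-m.success_rate, m.mean_cost_failed, m.mean_gens_solved)


def is_better(candidate: Metric, incumbent: Metric) -> bool:
    """True if the candidate chain strictly improves on the incumbent."""
    return sort_key(candidate) < sort_key(incumbent)


# d = 1 in Table 3: both chains solve every instance, so there are no
# unsuccessful runs; 0.0 is a placeholder for the undefined mean cost.
h2 = Metric(1.0, 0.0, 7.88)
h2_h1_h4 = Metric(1.0, 0.0, 7.62)
print(is_better(h2_h1_h4, h2))  # True
```

With equal success rates, the decision falls through to the mean number of generations, which is the only remaining way to improve at small d.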
As is often the case with EAs and hyper-heuristics, high-performance computing is an advantage due to the large amount of time required to solve a large number of instances. The parameter with the largest influence is the degree d (see [13] for further details). All the above results exhibit an improvement over the results of [13] (and so [18]). No pattern is apparent in the best heuristic chains found across degrees, so it is probable that no chain is specifically suited to one particular degree.

5.4. Characteristics of the Framework. Through experience, and by the above analysis, some characteristics of the framework (in the context of the AAG problem and polycyclic groups defined by a number field) are observed.
First, to the best of the authors' knowledge, random instances have not been classified in terms of difficulty. For example, an EA that solved instance A of a problem in an average of 3000 generations (over, say, ten repetitions) may well solve instance B, with the same instance parameters, in 100 generations. That is, for a given set of instance parameters (N, L, L_1 and L_2) there is a large variability in difficulty for randomly-generated problems. In the experience of the authors, this effect seems to worsen for higher degrees. Recall that L is the key length in the subgroup A ≤ G. Due to the lengths L_1 and L_2 (in G) of elements in A ≤ G, the length of the key may be rather large after mapping to its image in G. Combined with the relator lengths in the presentations of the groups, this makes problem hardness difficult to classify. This imposes a constraint on the hyper-heuristic, since a consistent measure of performance over a small number of instances is difficult to obtain. Hence, a relatively large number of instances is needed, at least for testing and validation.
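To give a rough sense of why many instances are needed, the following sketch computes the binomial standard error of an estimated success rate. It assumes that solving a randomly drawn instance can be treated as an independent Bernoulli trial; this is an idealisation introduced here, not an analysis from the experiments, and the function name is hypothetical.

```python
import math


def success_rate_std_error(p: float, n: int) -> float:
    """Binomial standard error of a success rate p estimated from n instances."""
    return math.sqrt(p * (1.0 - p) / n)


# Standard error at a true success rate of 50% for several instance counts.
for n in (10, 50, 200):
    print(n, round(success_rate_std_error(0.5, n), 3))
```

Under this assumption the uncertainty shrinks only with the square root of the number of instances, so halving it requires four times as many instances.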
Combinatorial optimisation problems typically have an objective function where, when a small change is made to the input, there is a correspondingly small change in the output value. This is reflected in the so-called "deep-valley hypothesis" [25]. This property is often assumed when metaheuristics are applied, as metaheuristics typically make a small change to the solution in order to bring about a small improvement in the objective values. However, the objective function in this paper is, because of the group presentations used, unlikely to display the deep-valley property and this means that the feedback provided by a more rugged "landscape" does not guide the search as efficiently. This is manifested by a heuristic chain having a low success rate on the training instances but also having a high success rate on the testing instances, or vice-versa.
Hyper-heuristics may be applied to continuous optimisation problems, where real-valued feedback from the objective function may guide the search process. The situation is more complex for the current optimiser since the objective value is discrete: that is, the optimiser has either solved a given instance or it has not. This work goes some way to ameliorate this issue by including the least EA cost reached as part of the performance metric. The hyper-heuristic is hill-climbing in the space of heuristic chains. In the next section, the paper is concluded.
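The hill-climbing over heuristic chains mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the move set (insert, delete or replace one heuristic), the function names, and in particular `toy_evaluate`, which stands in for the expensive step of running the EA with the injected chain on a set of instances, are all hypothetical.

```python
import random

HEURISTICS = ["H1", "H2", "H3", "H4", "H5", "H6"]


def mutate(chain, rng):
    """Return a neighbouring chain: insert, delete, or replace one heuristic."""
    new = list(chain)
    move = rng.choice(["insert", "delete", "replace"])
    # Never delete from a chain of length one, so chains stay non-empty.
    if move == "insert" or (move == "delete" and len(new) == 1):
        new.insert(rng.randrange(len(new) + 1), rng.choice(HEURISTICS))
    elif move == "delete":
        del new[rng.randrange(len(new))]
    else:
        new[rng.randrange(len(new))] = rng.choice(HEURISTICS)
    return new


def hill_climb(evaluate, initial_chain, iterations, seed=0):
    """Accept a mutated chain only if it scores strictly higher."""
    rng = random.Random(seed)
    best, best_score = list(initial_chain), evaluate(initial_chain)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_evaluate(chain):
    # Placeholder score (an assumption): rewards variety, penalises length.
    # In the real setting this would run the EA with the chain injected.
    return len(set(chain)) - 0.1 * len(chain)


best_chain, best_score = hill_climb(toy_evaluate, ["H2"], iterations=200)
print(best_chain, best_score)
```

Because only strict improvements are accepted, the returned chain never scores below the initial chain on the evaluation used for the search; whether it generalises to unseen instances must still be checked separately.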

Concluding remarks
This work exhibits the automatic generation of novel heuristic chains to improve an existing EA which has previously been demonstrated to effectively attack a given KEP. That is, this approach is a framework for learning (i.e., generating and testing in a hyper-heuristics setting) cryptanalytic attacks. We are not proposing a single algorithm to tackle this problem as many previous papers have done. Our stance is distinctly different: we propose a framework to automatically generate algorithms for the attack. One of the advantages of this approach is that it automates the rather mechanical task of generating new attack algorithms for which there are often few design principles to guide us. This is thus an ideal match for a generative hyper-heuristics approach where novel algorithms can be freely and easily generated. This avoids the task of manually generating algorithms for which we often have scant means of evaluating their effectiveness other than actually testing them out on problems of interest. An evaluation metric is all a hyper-heuristic needs to produce new heuristic chains. This article makes the following key contributions to the field: (1) the proposal that hyper-heuristics is a suitable framework in which to generate and test heuristic chains to break a KEP; (2) the implementation and application of a hyper-heuristic to automatically build chains of simple heuristics to break a given KEP; and (3) the demonstration that chains of simple heuristics trained on one set of problem instances can then generalise to solve a second independent set of problem instances.
As further work, we would like to generalise the hyper-heuristic framework to work towards proving or disproving security of proposed group-theoretic KEPs. The framework exhibited is expandable, enabling other groups and group-theoretic problems to be used. Possible other uses could be to show that some proposed KEPs prove resistant to LBA attacks (that is, KEPs for which the hyper-heuristic does not yield high-performing heuristic chains after many runs). For example, would the conjugacy search problem in finitely generated metabelian groups or generalised metabelian Baumslag-Solitar groups [24] be breakable with the approach (the authors in a preprint suggest that LBA algorithms are ineffective)? Similarly, would the n-root and subgroup membership search problems in polycyclic groups [23], or the conjugacy problem and hidden subgroup problem in Engel groups [31], be breakable? Further, by [22], there are open questions related to the complexity of some problems in polycyclic groups (power conjugacy problem, geodesic length problem, n-root problem, or the subgroup membership search problem) or other problems that may be used in KEPs. The complexity of the above problems may be analysed using the hyper-heuristic framework, potentially giving further information about the exact solutions of these problems (if they exist).
A final area is determining the effectiveness of using the hyper-heuristic for very high parameter settings across general group-theoretic problems. In this setting, the hyper-heuristic approach would prove very slow (since, in the experience of the authors, the vast majority of runtime tends to be spent evaluating the EA cost function) and thus research into surrogate cost function approaches would be of interest [6].
It is hoped this work may encourage machine learning and hyper-heuristic approaches in cryptology. This may have an impact upon post-quantum cryptography, with such problems as the hidden subgroup problem ripe for attack [28]. If the approach is effective, then it confirms that a given problem combined with a platform group is breakable, whereas if it is not effective then this may provide further evidence to validate proposed cryptographic structures as "quantum-safe".