Making a rapid unpredictable decision from choices of unequal value is a common control task. When the cost of predictability can be modelled as a penalty hidden under a single option by an intelligent adversary, an optimal strategy can be found efficiently, in $O(n \log n)$ steps, using an approach described by Sakaguchi for a zero-sum hide-search game. In this work, we extend this to two games with multiple parallel predictions, either coordinated or drawn independently from the optimal distribution, both of which can be solved with the same scaling. An open-source code is provided online at https://github.com/pec27/rams.
In scenarios that involve adversarial behaviour, it often pays to act in a manner that is not entirely predictable. Given a zero-sum competition with two adversaries and known payoffs (or expected payoffs) for every combination of their finite choices, minimax probabilities can be found via linear programming.
For $n$ potential choices and predictions of that choice, the general approach requires us to enumerate all $n^2$ outcomes in the normal form. Solving the corresponding linear program can be performed via interior point methods in almost as little time as matrix multiplication [2,3]. Parametric models for the payoffs only improve the computational scaling if the resultant linear program is amenable to a faster algorithm.
Real-time control applications frequently seek superior scalings. A common problem in artificial intelligence in video games (Game AI) is for simulated agents in an adversarial scenario to pick their next destination tactically from dynamically sampled locations. Choices are calculated from instantaneous measures of the local environment and positions of neighbouring agents, and must both emulate rational unpredictable behaviour and be computable for all agents within a few milliseconds [4,5].
One model that satisfies this requirement is the zero-sum hide-search game, following the approach of Sakaguchi. Here the choice is that of a Hider who is rewarded for choosing site $i \in \{1, \dots, n\}$ with payoff $r_i$, except when the Searcher (predictor) has also chosen $i$, in which case an additional strictly positive penalty $d_i$ is applied. This game has a normal form decomposable into a column-independent matrix minus a positive-definite diagonal matrix, i.e.

$$A_{ij} = r_i - d_i \delta_{ij}, \qquad (1)$$

where $A_{ij}$ is the payoff for the Hider choosing site $i$ against a prediction of site $j$. This imposes sufficient structure on the off-diagonal terms (the predictions are exact) that minimax strategies can be computed in $O(n \log n)$ steps, with bounds due to the ordering of the $r_i$.
For general extensions this scaling is sacrificed, for example in security games where the requirement of exact cover (prediction) is dropped (e.g. ), or where multiple non-interchangeable search resources are introduced (e.g. ), both of which require the more general linear program to be solved. Allowing additional stages, where sequential choices are restricted to neighbours, leads to search-evasion games [13,14], and whilst these have received considerable attention, even some of the most trivial search games remain unsolved.
The contribution of this article is to extend (1) to two games with multiple parallel searches where optimal strategies, and their sampling, can be computed with the same algorithmic scaling. In the first, the Searcher is allowed to coordinate $Y$ searches, with the penalty applied once if any of the searches succeed. The additional parameter $Y$ modifies the value of the game and its strategies in a way that is not a simple re-scaling of the $r_i$ and $d_i$. Drawing the $Y$ samples without replacement from the marginal distribution is performed via the splitting method of Deville and Tillé. In the second, "non-coordinated" variation, the Searcher draws $Y$ independent search sites (i.e. with replacement) from an identical optimised distribution. An interesting aspect of this variation is that it introduces non-linearity, and its solution follows a method resembling classical resource allocation problems.
The structure of this article is as follows. In Section 2, we provide some basic definitions, along with a proof of the single prediction case, and illustrate with a simple example. In Section 3, we extend this to multiple searches, where in Section 3.1, we treat the case of multiple coordinated predictions, and in Section 3.2 multiple predictions drawn independently from an identical distribution. We summarise in Section 4.
2 Basic definitions and the single search case
Let us follow the convention that the receiver of payoff $A_{ij}$ is named the Hider and the payer the Searcher, where the Hider and Searcher choose sites $i$ and $j$, respectively, their respective probabilities $y_i$ and $x_j$ following the usual conditions that they are non-negative and sum to unity. The expected payoff of the game for the Hider is

$$E(x, y) = \sum_i y_i \left( r_i - d_i x_i \right), \qquad (2)$$

and linearity w.r.t. $x$ and $y$ individually guarantees minimax, i.e. $\max_y \min_x E = \min_x \max_y E = V$.
Theorem 2.1. (Single prediction) The minimax strategy for this game has expected payoff

$$V = \max_{1 \le k \le n} \frac{\sum_{i=1}^{k} r_i / d_i \; - \; 1}{\sum_{i=1}^{k} 1 / d_i}, \qquad (3)$$

where the sites are indexed in order of decreasing reward, $r_1 \ge r_2 \ge \dots \ge r_n$. The minimax strategy for the Searcher is unique and has probability

$$x_i = \frac{\max(r_i - V, \; 0)}{d_i}, \qquad (4)$$

which is, in general, mixed.
The minimax strategy for the Hider is unique unless $r_i = V$ for some $i$. In either the unique or degenerate case, however, the minimax strategies are those, and only those, which satisfy

$$y_i \begin{cases} = K / d_i, & r_i > V, \\ \in \left[ 0, \; K / d_i \right], & r_i = V, \\ = 0, & r_i < V, \end{cases} \qquad (5)$$

with the normalisation fixed by summation of the $y_i$ to unity.
This is a slight generalisation of results described in [8, 8.1 "Scud Hunt"] and [9, 1.7.7], which describe this solution with a choice of zero for the $r_i = V$ case of (5). Special cases are treated in ref.  and [10, Part II 3.7.17].
A useful result for solving such problems, which will be extensively used here, is the Gibbs lemma. This is a necessary first-order condition for a maximum over the $y_i$ (or a minimum over the $x_i$), with the constraint that the probabilities must be non-negative. For zero-sum games this can be written as

$$\frac{\partial E}{\partial y_i} \begin{cases} = \lambda, & y_i > 0, \\ \le \lambda, & y_i = 0, \end{cases} \qquad (6)$$

$$\frac{\partial E}{\partial x_i} \begin{cases} = \mu, & x_i > 0, \\ \ge \mu, & x_i = 0, \end{cases} \qquad (7)$$

for the maximising (Hider) and minimising (Searcher) players, respectively. We now describe the solution to Theorem 2.1.
Lemma 2.2. The set of supported sites of the Searcher is contained in that of the Hider, i.e. $x_i > 0 \Rightarrow y_i > 0$.
From the Gibbs lemma w.r.t. $x_i$ we have

$$-y_i d_i \begin{cases} = \mu, & x_i > 0, \\ \ge \mu, & x_i = 0, \end{cases}$$

i.e. $-\mu$ is the maximum value of the $y_i d_i$. Since the $d_i$ are strictly positive, and at least some of the $y_i$ must be strictly positive (in order to sum to unity), this maximum must be strictly positive. The first case thus gives $y_i d_i = -\mu > 0$, i.e. $y_i$ strictly positive $\forall i$ s.t. $x_i > 0$. Conversely, $y_i = 0 \Rightarrow x_i = 0$.□
Lemma 2.3. The supported sites for the Searcher are contiguous over the largest $r_i$, i.e. the Searcher will only visit those sites with rewards above some threshold.
Assume this was not the case, i.e. there are some $j, k$ s.t. $r_j > r_k$, $x_j = 0$ and $x_k > 0$. From the Gibbs lemma w.r.t. $y_i$, we have

$$r_i - d_i x_i \begin{cases} = \lambda, & y_i > 0, \\ \le \lambda, & y_i = 0. \end{cases}$$

However, since $x_j = 0$ (our assumption) gives $r_j \le \lambda$, whilst $x_k > 0$ together with Lemma 2.2 and $d_k$ strictly positive gives $\lambda = r_k - d_k x_k < r_k < r_j$, we have a contradiction.□
For convenience let us define $S(v)$ as the set of indices of elements larger than or equal to $v$,

$$S(v) \equiv \{ i \mid r_i \ge v \}, \qquad (10)$$

and a measure of these defined as the sums of harmonic $d_i$,

$$H(v) \equiv \sum_{i \in S(v)} \frac{1}{d_i}. \qquad (11)$$
Lemma 2.4. The minimax strategy for $y$ must correspond to some $S(V)$, where

$$V = \frac{\sum_{i \in S(V)} r_i / d_i \; - \; 1}{H(V)} \qquad (12)$$

is the maximal such value over the thresholds $v \in \{ r_1, \dots, r_n \}$.

From Lemma 2.3 we know that the support of $x$ is contiguous over the largest $r_i$, and since at least some of the $x_i$ are non-zero, we can index the non-zero $x_i$ using the least supported reward

$$v = \min_{i : x_i > 0} r_i,$$

to re-write the index set for non-zero $x_i$ using (11) as $S(v)$.

For $x, y$ at minimax and any $i \in S(v)$, the Gibbs lemma w.r.t. $y_i$ (together with Lemma 2.2) gives

$$x_i = \frac{r_i - V}{d_i} \ge 0, \qquad (13)$$

and summation of the $x_i$ to unity yields

$$V = \frac{\sum_{i \in S(v)} r_i / d_i \; - \; 1}{H(v)}.$$

From (10), the Gibbs lemma also gives $r_i \le V$ for any $i \notin S(v)$, i.e. $V$ is an upper bound for the unsupported rewards.

Now consider enlarging the threshold set to some $S(u) \supset S(v)$, and decompose the corresponding candidate value into

$$\frac{\sum_{i \in S(u)} r_i / d_i \; - \; 1}{H(u)} = \rho V + (1 - \rho) \bar{r}, \qquad (17)$$

where we have defined $\rho$ as the ratio of measures

$$\rho = \frac{H(v)}{H(u)} \in (0, 1],$$

and $\bar{r}$ as the $1/d_i$-weighted mean of the $r_i$ over $S(u) \setminus S(v)$. Since each such $r_i \le V$, we have $\bar{r} \le V$, and so the candidate value cannot exceed $V$.

Using the same decomposition as (17) for a smaller set $S(w) \subset S(v)$,

$$V = \frac{H(w)}{H(v)} \cdot \frac{\sum_{i \in S(w)} r_i / d_i \; - \; 1}{H(w)} + \left( 1 - \frac{H(w)}{H(v)} \right) \bar{r}',$$

with $\bar{r}'$ the corresponding weighted mean over $S(v) \setminus S(w)$. Since all the probabilities must be non-negative, it follows from (13) that $r_i \ge V$ for every $i \in S(v)$, and so we have $\bar{r}' \ge V$, which gives

$$\frac{\sum_{i \in S(w)} r_i / d_i \; - \; 1}{H(w)} \le V.$$

Thus no threshold set achieves a larger value than $S(V)$, which gives (12).□
Lemma 2.5. The strategy for the Hider is at minimax iff it is of the form in (5), with the normalisation $K$ fixed by summation to unity. The expected payoff of any of these strategies is $V$.
For (5) to describe all the minimax strategies for the Hider we must show it is both necessary and sufficient.

First let us show that it is necessary, i.e. that given the Searcher's optimal strategy, any best response for the Hider must take this form.
If we fix the Searcher's probabilities to its optimal strategy (4), the payoff for the Hider can be written as

$$E = \sum_i y_i \left( r_i - d_i x_i \right) = \sum_{i : r_i > V} y_i V + \sum_{i : r_i \le V} y_i r_i,$$

and by inspection we see that, since the probabilities must sum to unity, the Hider can maximise its payoff only by transferring any probability from the sites with $r_i < V$ to those with $r_i \ge V$. In combination with (7) we see that these conditions are necessary. N.B. the payoff in this case is given by $E = V$.
We now need to check that these are sufficient, i.e. if we fix the Hider's strategy to (5), we must check that the Searcher cannot improve its payoff by deviating from its minimax strategy. To this end, let us consider deviations $x_i = x_i^{\ast} + \epsilon_i$ about the minimax strategy $x^{\ast}$ of (4),

with the condition that the $\epsilon_i$ must sum to zero and that the resulting probabilities must be non-negative. This latter condition implies $\epsilon_i \ge 0$ wherever $x_i^{\ast} = 0$.

The change in payoff for the Searcher of such a deviation can thus be written as

$$\Delta E = -\sum_i y_i d_i \epsilon_i = -K \sum_{i : y_i > 0} \epsilon_i.$$

It thus follows that increases in the Searcher's payoff correspond to positive $\epsilon_i$ summed over the sites with $y_i > 0$, and (by summation to zero) a corresponding negative sum over the sites with $y_i = 0$. Since every site in this latter set has $x_i^{\ast} = 0$, however, those $\epsilon_i$ must be non-negative and we have a contradiction. Thus, there is no better strategy for the Searcher, and the strategies in (5) are optimal.□
2.1 Remarks and example
The reason this game is soluble by hand is that the dependence of the payoff on the site choice of the Searcher is restricted entirely to whether it chooses the same location as the Hider. In terms of algorithm, we see from Theorem 2.1 that the solution is described by a maximum over cumulative sums. These can be performed in $O(n)$ operations, though we must first rank the $r_i$ in $O(n \log n)$ steps, making this the asymptotic scaling.
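To make this concrete, the following is a minimal NumPy sketch of the computation in Theorem 2.1. This is an illustration under our own naming, not the companion implementation:

```python
import numpy as np

def solve_single_search(r, d):
    """Sketch of the single-search solution of Theorem 2.1.

    r: rewards r_i; d: strictly positive penalties d_i.
    Returns (V, x, y): the value, and the Searcher/Hider minimax
    probabilities (taking zero Hider mass on any site with r_i == V).
    """
    r = np.asarray(r, dtype=float)
    d = np.asarray(d, dtype=float)
    order = np.argsort(-r)                 # rank sites by decreasing reward
    rs, ds = r[order], d[order]

    # (3): V = max_k (sum_{i<=k} r_i/d_i - 1) / (sum_{i<=k} 1/d_i)
    V = np.max((np.cumsum(rs / ds) - 1.0) / np.cumsum(1.0 / ds))

    x = np.maximum(r - V, 0.0) / d         # (4): Searcher probabilities
    y = np.where(r > V, 1.0 / d, 0.0)      # (5): Hider mass proportional to 1/d_i
    return V, x, y / y.sum()
```

For the Alice-and-Bob example below (rewards $1, \dots, 10$ with $d_i = r_i + 10$) this returns a value just under 4.5, with both players supported on $\{5, \dots, 10\}$.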
A special case that may be of interest is when the Hider suffers a fixed (still strictly positive) penalty for choosing the same site as the Searcher, i.e. $d_i = d$. In this case, the Hider has probabilities uniform over the sites with $r_i > V$, and zero for $r_i < V$. The Searcher, on the other hand, has a probability structure very closely resembling the $r_i$, since (4) gives $x_i = (r_i - V)/d$.
Example 2.10. (Alice and Bob) Alice and Bob play a game where they each pick a number between 1 and 10. If they choose different numbers, then Bob gives Alice the value of her number in dollars; however, if they choose the same, Alice must give Bob 10 dollars. This corresponds to $r_i = i$ and $d_i = i + 10$ for $i = 1, \dots, 10$. By (3) we have $V \approx 4.44$, and the smallest number that either player should pick is 5. The solution is for Alice to pick $i \ge 5$ with probability

$$y_i = \frac{K}{i + 10},$$

and Bob to pick $i \ge 5$ with probability

$$x_i = \frac{i - V}{i + 10},$$
which is plotted in Figure 1.
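As a self-contained numerical check (our own construction, not part of the paper), we can confirm that these strategies are in equilibrium by evaluating both players' best responses against them:

```python
import numpy as np

# Alice and Bob: r_i = i, d_i = i + 10 for i = 1..10.
# Alice (Hider) picks row i, Bob (Searcher) picks column j.
i = np.arange(1, 11)
A = i[:, None] - np.diag(i + 10.0)             # payoff A_ij = r_i - d_i delta_ij

# Strategies from Theorem 2.1, supported on {5,...,10}:
H = np.sum(1.0 / (i[4:] + 10.0))               # harmonic measure of the support
V = (np.sum(i[4:] / (i[4:] + 10.0)) - 1.0) / H # value from (3), ~ 4.44

y = np.where(i >= 5, 1.0 / (i + 10.0), 0.0)    # Alice: y_i proportional to 1/d_i
y /= y.sum()
x = np.maximum(i - V, 0.0) / (i + 10.0)        # Bob: x_i = (r_i - V)/d_i

# Against Bob's x no row of A does better than V, and against Alice's y
# no column does better (for Bob) than V:
assert np.max(A @ x) <= V + 1e-9
assert np.min(y @ A) >= V - 1e-9
```

The two assertions together certify the saddle point: Alice cannot exceed $V$ and Bob cannot reduce her payoff below $V$.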
3 Extensions to multiple searches
Let us now consider the case where the Searcher can pick multiple sites, applying a penalty if any of the predictions are exactly correct, the penalty being applied only once (the case where the penalty is linear in the number of correct predictions can be treated as a re-scaling of the problem in Section 2).
Let us assume there are $Y$ searches at sites $j_1, \dots, j_Y$, with $1 \le Y \le n$. With only a slight abuse of notation we write the payoff for the Hider at site $i$,

$$A_{i (j_1, \dots, j_Y)} = r_i - d_i \left[ i \in \{ j_1, \dots, j_Y \} \right], \qquad (28)$$

where $[\cdot]$ denotes the Iverson bracket, and we additionally define

$$\underline{V} \equiv \max_i \left( r_i - d_i \right), \qquad (29)$$

a lower bound on the value of the game, corresponding to the payoff if the Hider were always caught.
Two things are immediately apparent: that optimal choices for the Searcher are disjoint, and that the expectation depends only on the marginal probabilities $x_i$ of a search of site $i$ (i.e. it is independent of permutations of the searches). This “coordinated” problem is similar to combinatorial games such as [19, chapter 2] and we provide a solution in Section 3.1.
In Section 3.2, we study a second problem, where each search is i.i.d., with distribution set by the Searcher. This is a separable non-linear problem, similar to classical resource allocation problems.
Algorithms to solve these cases are composed of steps commonly described elsewhere (sampling a marginal distribution without replacement, solving piecewise differentiable monotonic functions, etc.); however, testing an implementation for computational efficiency and for errors in special cases is not always trivial, so we provide one at the following link: https://github.com/pec27/rams.
3.1 Y coordinated searches
For coordinated searches let us define $x_i$ as the marginal (or inclusion) probability of a search at site $i$. The expected payoff for the Hider can then be written as

$$E = \sum_i y_i \left( r_i - d_i x_i \right),$$

with the requirement that $0 \le x_i \le 1$ and $\sum_i x_i = Y$.
Theorem 3.1. The value of this game to the Hider is

$$V_Y = \max\left( \underline{V}, \; \bar{V} \right), \qquad (31)$$

where we have defined

$$\bar{V} \equiv \max_{1 \le k \le n} \frac{\sum_{i=1}^{k} r_i / d_i \; - \; Y}{\sum_{i=1}^{k} 1 / d_i}, \qquad (32)$$

with sites again indexed in order of decreasing reward. The optimal strategies for the Searcher are those, and only those, with inclusion probabilities for the searches that satisfy

$$x_i \ge \max\left( 0, \; \frac{r_i - V_Y}{d_i} \right), \qquad (33)$$

with the lower bound replaced by equality when $\underline{V} < \bar{V}$.

The probabilities for the Hider can be classified into two cases depending on which of $\bar{V}$ or $\underline{V}$ is greater, and these are described as follows:

$$y_i \begin{cases} = K / d_i \;\; (r_i > V_Y), \;\; 0 \text{ otherwise}, & \bar{V} \ge \underline{V}, \\ \ge 0 \text{ only where } r_i - d_i = \underline{V}, & \bar{V} < \underline{V}, \end{cases} \qquad (34)$$

requiring the sum to unity. In the case $\bar{V} \ge \underline{V}$ summation to unity is fixed via the normalisation factor $K$, analogous to (5). In the case $\bar{V} < \underline{V}$, note $y_i > 0$ only where $r_i - d_i = \underline{V}$, the maximal value.
The procedure of this proof is to first try to solve the problem with one constraint removed and test if this solution also satisfies the removed constraint. In the case that it does not, we guess the solution set and prove stability.
3.1.1 Solution for $\underline{V} \le \bar{V}$

Let us first consider the problem where we remove the constraint that $x_i \le 1$. The solution to this is analogous to the problem with only one Searcher, with no upper bound on the $x_i$ except that indirectly imposed by the normalisation $\sum_i x_i = Y$. We denote the optimal solution for the inclusion probabilities for the searches of this reduced problem as

$$\bar{x}_i = \max\left( 0, \; \frac{r_i - \bar{V}}{d_i} \right), \qquad (35)$$

with $\bar{V}$ defined as in (32) (taking note of the $Y$ in the numerator there to account for the normalisation).
We now check whether this satisfies the constraint $\bar{x}_i \le 1$. Since $d_i > 0$, rearrangement of (35) gives us

$$\bar{x}_i \le 1 \iff r_i - d_i \le \bar{V},$$

and this is true for all $i$ iff

$$\underline{V} \le \bar{V},$$

with $\underline{V}$ as defined in (29).
The corresponding probability for the Hider is

$$y_i = \begin{cases} K / d_i, & r_i > \bar{V}, \\ 0, & r_i < \bar{V}, \end{cases}$$

with $K$ chosen to fix summation to unity, which is the upper case of (34).
Let us now consider the solution when this is violated.
3.1.2 Solution for $\underline{V} > \bar{V}$
In this case, the Hider can guarantee a return of $\underline{V}$ by picking a site with maximal $r_i - d_i$. The strategy for the Searcher is to choose all of these sites with probability one, and it is degenerate over the choice of the remainder.
Formally, let us begin with an ansatz for the inclusion probability distribution for the searches. We consider

$$x_i^{\ast} \in \left[ \max\left( 0, \; \frac{r_i - \underline{V}}{d_i} \right), \; 1 \right], \qquad (39)$$

chosen s.t. $\sum_i x_i^{\ast} = Y$. Note this is always possible since $Y \le n$ and by substitution

$$\sum_i \max\left( 0, \; \frac{r_i - \underline{V}}{d_i} \right) \le \sum_i \max\left( 0, \; \frac{r_i - \bar{V}}{d_i} \right) = Y.$$
What is the best response to this probability distribution? Since the problem is linear in $y$, we should maximise the Hider's probability at the sites of maximal expected value $r_i - d_i x_i^{\ast}$.

Note $r_i - d_i x_i^{\ast} \le \underline{V}$, and the LHS achieves its maximum $\underline{V}$ where $x_i^{\ast}$ sits at its lower bound with $r_i - d_i$ maximal, which occurs at $x_i^{\ast} = 1$. A best response strategy to $x^{\ast}$ is thus to distribute all probability over these maximal sites, i.e.

$$y_i \ge 0 \text{ only where } r_i - d_i = \underline{V}, \qquad (40)$$

and the expected value of this strategy is $\underline{V}$ (since there is always a search with probability 1 at each site the Hider visits).
We now ask if there is a better strategy for the Searcher, and we can trivially see this is false, since it already catches the Hider every time (and there is no direct payoff dependency on the sites of the searches). Having shown these strategies satisfy minimax, let us proceed to verify that no other strategies do.
We check that for any other strategy for the Searcher, the Hider has a better response, and correspondingly for any other strategy for the Hider, the Searcher has a better response. Beginning with the Searcher, suppose some $i$ s.t.

$$x_i \notin \left[ \max\left( 0, \; \frac{r_i - \underline{V}}{d_i} \right), \; 1 \right],$$

i.e. some site where we choose a probability outside the interval in (39). Since $x_i \le 1$, we see this is only possible for $x_i$ below a strictly positive lower bound, i.e. at a site with $r_i > \underline{V}$. Then let us set $y_i = 1$, and the expected payoff for the Hider is $r_i - d_i x_i > \underline{V}$, i.e. an improvement, and we are done.
Correspondingly for the Hider, we try $y_j > 0$ for some $j$ with $r_j - d_j < \underline{V}$. Note, however, that since $\sum_i \max(0, (r_i - \underline{V})/d_i) < Y$, the Searcher has probability to ‘spare,’ i.e. every site can be made to have value at most $\underline{V}$ (for the Hider) without saturation of the probabilities (in the sense of (40)). The Searcher can trivially add further search probability to site $j$, making its expected value strictly less than $\underline{V}$, and the payoff for the Hider is reduced.□
3.1.3 Algorithmic solution and remarks
Given the explicit formulae for the value and probabilities in Theorem 3.1, the only operation that requires a non-trivial algorithm is that of choosing $Y$ sites from $n$ without replacement given the marginal probabilities $x_i$. Such an algorithm is given by the splitting method of Deville and Tillé, which recursively performs a (weighted) random decision between reducing the number of sites by 1, or performing all the remaining samples from a uniform distribution, dependent upon the largest and smallest $x_i$ (in the remaining sites). This process is guaranteed to complete in at most $n$ steps, and as such we are bounded by the ranking of the $r_i$, i.e. $O(n \log n)$ steps.
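For illustration, the following sketch (our own naming, not the companion implementation) computes the value and inclusion probabilities of Theorem 3.1 and then draws the $Y$ sites. In place of the splitting method we substitute systematic sampling, a simpler scheme that also realises the required inclusion probabilities whenever each $x_i \le 1$:

```python
import numpy as np

def coordinated_game(r, d, Y, seed=None):
    """Illustrative solver for the Y coordinated searches of Theorem 3.1.

    Returns the value V, inclusion probabilities x_i, and one draw of
    Y distinct search sites consistent with those marginals.
    """
    r = np.asarray(r, dtype=float)
    d = np.asarray(d, dtype=float)
    order = np.argsort(-r)                      # sort by decreasing reward
    rs, ds = r[order], d[order]

    # (32): unconstrained threshold value; (29): the always-caught bound
    V_bar = np.max((np.cumsum(rs / ds) - Y) / np.cumsum(1.0 / ds))
    V_low = np.max(r - d)

    if V_bar >= V_low:                          # Section 3.1.1: x_i <= 1 holds
        V = V_bar
        x = np.maximum(r - V, 0.0) / d
    else:                                       # Section 3.1.2: saturate maximal r_i - d_i
        V = V_low
        x = np.maximum(r - V, 0.0) / d          # lower bounds of (39); sums to < Y
        spare = 1.0 - x                         # remaining capacity at each site
        x += (Y - x.sum()) * spare / spare.sum()  # spread the spare probability

    # Draw Y distinct sites with inclusion probabilities x_i via systematic
    # sampling: unit spacing and x_i <= 1 guarantee no site is drawn twice.
    rng = np.random.default_rng(seed)
    c = np.cumsum(x)
    c[-1] = Y                                   # guard against rounding
    sites = np.searchsorted(c, rng.uniform() + np.arange(Y), side='right')
    return V, x, sites
```

The degenerate spreading of the spare probability in the second branch is one valid choice among many; any $x$ satisfying (33) and summing to $Y$ would do.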
3.2 Non-coordinated searches
In this section, we apply the additional restriction to the game in (28) that the Searcher chooses its $Y$ sites via independent draws (i.e. with replacement) from a distribution $x$. The expected payoff for the Hider is thus

$$E = \sum_i y_i \left( r_i - d_i \left[ 1 - (1 - x_i)^Y \right] \right), \qquad (43)$$

with constraints that the $x_i$ and $y_i$ be non-negative and sum to unity. The term $(1 - x_i)^Y$ appears as the probability that zero searches are performed on site $i$ (by independence).
Minimax still applies to this problem since it is a sum of concave–convex functions (for $Y \ge 1$). Some remarks on the algorithmic solution to the following theorem are given in Section 3.2.2.
Theorem 3.2. The optimal distribution for the searches is given by

$$x_i = 1 - \left( 1 - \frac{\max(r_i - V, \; 0)}{d_i} \right)^{1/Y} \qquad (44)$$

to choose site $i$, with $V$ being the value of this game for the Hider. $V$ is the single root of the equation

$$\sum_i \left[ 1 - \left( 1 - \frac{\max(r_i - V, \; 0)}{d_i} \right)^{1/Y} \right] = 1, \qquad V \in \left[ \underline{V}, \; \max_i r_i \right). \qquad (45)$$

The optimal strategy/strategies for the Hider depend on whether there is a pure solution for the Searcher, i.e. whether $\max_i x_i = 1$. These are explicitly

$$y_i = \begin{cases} \delta_{ij}, & \exists j \text{ s.t. } x_j = 1, \\ \dfrac{K}{d_i (1 - x_i)^{Y-1}} \;\; (r_i > V), \;\; 0 \text{ otherwise}, & \max_i x_i < 1, \end{cases} \qquad (46)$$

with $K$ (in the lower case) chosen to fix the sum of the $y_i$ to unity.
The Gibbs lemma w.r.t. $x_i$ gives

$$-y_i d_i Y (1 - x_i)^{Y-1} \begin{cases} = \mu, & x_i > 0, \\ \ge \mu, & x_i = 0. \end{cases} \qquad (47)$$
This gives us the following corollaries.
Combining the cases of (47) we have $-\mu \ge y_i d_i Y (1 - x_i)^{Y-1}$ for all $i$. Since all terms on the RHS are non-negative we have $-\mu \ge 0$.□
For $x_i = 0$ and $\mu = 0$, $y_i = 0$. This immediately follows from the lower case of (47) (we know $d_i$ and $Y$ are strictly positive).
First let us show that $\mu = 0$ implies a pure solution. If $\mu = 0$, then at any $i$ where $x_i = 0$ we have $y_i = 0$ from Corollary 3.4. For the upper case of (47), for the LHS to be zero we must have either $y_i = 0$ or $x_i = 1$. However, since $y_i$ must be non-zero for some $i$ (in order to sum to unity), there must be at least one $i$ s.t. $x_i = 1$.

Now let us show the converse. At such a site $x_i = 1 > 0$, and so the upper case of (47) must apply. Substitution of $x_i = 1$ (and knowing $Y > 1$) we have $\mu = 0$.□
(Pure solutions are equal).

By summation of the $x_i$ to unity, in a pure solution each $x_i$ is either 0 or 1, and there is only one $j$ s.t. $x_j = 1$. Since $x_j = 1$ gives $\mu = 0$, from Corollary 3.4 we have $y_i = 0$ at every $i \ne j$, and since the $y_i$ must also sum to unity we must have $y_j = 1$, and so the pure solutions of the Searcher and Hider coincide.□
(Searcher does not choose sites the Hider does not visit)
The Gibbs lemma w.r.t. $y_i$ gives

$$r_i - d_i \left[ 1 - (1 - x_i)^Y \right] \begin{cases} = V, & y_i > 0, \\ \le V, & y_i = 0, \end{cases} \qquad (48)$$

where the constant on the RHS has been noted as $V$ in order to fulfil the sum in (43).
Suppose otherwise, i.e. some $i$ with $x_i > 0$ and $y_i = 0$. Substitution into the upper case of (47) gives $\mu = 0$, i.e. a pure solution with $x_j = 1$ at some other site $j$, and by the summation of the $x_i$ to unity we have a contradiction.□
Corollary 3.9. $x_i$ can be written

$$x_i = 1 - \left( 1 - \frac{r_i - V}{d_i} \right)^{1/Y}$$

defined over $i \in S(V)$, with $x_i = 0$ otherwise (which by inspection is equivalent to (44)).
The lower case follows immediately from Corollary 3.9.
Lemma 3.11. Taking the $x_i$ from (44) to be functions of $V$, i.e. $x_i(V)$, they are monotonically decreasing, and strictly decreasing where $x_i > 0$.

By inspection they are monotonically decreasing, with $x_i = 0$ over the domain $V \ge r_i$. Over the open interval $r_i - d_i < V < r_i$ we have $x_i > 0$ and its derivative is

$$\frac{\partial x_i}{\partial V} = -\frac{1}{Y d_i} \left( 1 - \frac{r_i - V}{d_i} \right)^{1/Y - 1},$$

strictly negative for $r_i - d_i < V < r_i$ and $d_i > 0$, hence $x_i$ is strictly decreasing over this interval.□
Lemma 3.12. The root of (45) lies in the interval $[\underline{V}, \max_i r_i)$, i.e. the domain stated in (45).

By substitution of $V = \underline{V}$ into (44), at a site attaining the maximum of $r_i - d_i$ we have

$$x_i = 1 - \left( 1 - \frac{r_i - \underline{V}}{d_i} \right)^{1/Y} = 1,$$

so the sum in (45) is at least unity. By substitution of $V = \max_i r_i$ into (44) we see

$$x_i = 0 \quad \forall i,$$

and since the sum is a monotonically decreasing function of $V$, $\max_i r_i$ is a strict upper bound.□
$V$ is the unique root of (45).

By minimax we know some solution exists, within the interval of Lemma 3.12. The arguments of the sum are monotonically decreasing in $V$, and to match the RHS of unity there must be at least some $i$ for which $x_i > 0$. By Lemma 3.11, that $x_i$ is strictly decreasing there, and consequently the sum is also. Hence, the root is unique.□
For the case $\max_i x_i < 1$, the solutions for the Hider are those, and only those, which satisfy the lower case of (46), i.e.

$$y_i = \frac{K}{d_i (1 - x_i)^{Y-1}} \;\; (r_i > V), \qquad y_i = 0 \text{ otherwise}.$$
For sufficiency, we can substitute the explicit expression for $y$ back into (43) and find the optimal $x$. Since that is a convex problem, however, we need only verify that substitution of (44) satisfies the first-derivative conditions of (47) and (48) with value $V$. This completes the proof of Theorem 3.2.□
3.2.2 Algorithmic solution and remarks
In the left panel of Figure 2 we illustrate root-finding for the monotonic piecewise-analytic function in (45) as applied in the companion code. At the left-most point $V = \underline{V}$, marked by the open circle, the derivative is singular (though the sum itself is finite). On all intervals to the right of this we have a bounded derivative, and since the sum is convex and monotonically decreasing there, the Newton method started from the left point of an interval has guaranteed convergence. For the left-most interval a binary search is performed until we have a new left bound, after which the above can be applied.
In terms of algorithmic complexity, solving for the root of a convex, strictly monotonically decreasing function over a finite interval to fixed precision can be considered a constant number of evaluations of the function and its derivative (albeit typically a large constant). In our case, each evaluation is a sum of up to $n$ terms and thus we have an $O(n)$ bound. The combined algorithm (including sorting the $r_i$) is thus $O(n \log n)$.
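As a concrete (and deliberately simple) alternative to the guarded Newton iteration above, the following sketch brackets and bisects the monotone sum of (45); the names and structure are our own, not the companion code's:

```python
import numpy as np

def solve_non_coordinated(r, d, Y, tol=1e-12):
    """Sketch: value V and search distribution x for Y independent
    (with-replacement) searches, via bisection of the monotone sum (45)."""
    r = np.asarray(r, dtype=float)
    d = np.asarray(d, dtype=float)

    def x_of(V):
        # (44): x_i(V) = 1 - (1 - max(r_i - V, 0)/d_i)^(1/Y)
        a = np.minimum(np.maximum(r - V, 0.0) / d, 1.0)
        return 1.0 - (1.0 - a) ** (1.0 / Y)

    lo, hi = np.max(r - d), np.max(r)      # bracket from Lemma 3.12
    if x_of(lo).sum() <= 1.0:              # pure solution: always search argmax(r - d)
        V = lo
    else:                                  # sum > 1 at lo, sum = 0 at hi: bisect
        while hi - lo > tol * max(1.0, abs(hi)):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if x_of(mid).sum() > 1.0 else (lo, mid)
        V = 0.5 * (lo + hi)
    x = x_of(V)
    return V, x / x.sum()                  # renormalise away bisection error
```

For $Y = 1$ this reduces to the single-search solution of Theorem 2.1, which gives a quick consistency check; increasing $Y$ lowers the value, as discussed below.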
In the right panel of Figure 2, we illustrate the behaviour of the hider probabilities $y_i$ as a function of the reward $r_i$ and the number of searches $Y$. In all cases, the hider probability is non-zero only for rewards above the value of the game. As the number of searches increases (keeping the $d_i$ fixed), the value of the game falls as the hider is forced to pick less rewarding sites. For the single search case ($Y = 1$), the hider probability is independent of $r_i$ above the value of the game, whilst for $Y > 1$ a positive dependence on $r_i$ is acquired. This is a consequence of the non-coordination, since for the coordinated version a constant value is always possible (34).
A novel feature of this solution is that it has an analytic continuation to continuous $Y$, with $V$ and $x_i$ computed in the usual way. This has no meaningful interpretation (the Searcher can only perform an integral number of searches) but does provide some intuition for the behaviour.
In this work, we addressed the decision problem of making a rapid unpredictable choice from unequal options using a hide-search game approach. We extended the single search game to include multiple simultaneous searches, both with coordination and without. The game with coordinated searches is solved in terms of marginal probabilities, and we give explicit solutions in all cases. For the game with multiple non-coordinated searches, we describe the value implicitly as the single root of a monotonically decreasing piecewise-convex function. Unlike more general two-player zero-sum games, these games permit algorithms to compute and sample their mixed strategies in $O(n \log n)$ steps. We provide a complete open-source implementation of all three algorithms.
The author would like to thank Thomas S. Ferguson and Annika Lang for reading drafts of this article and their comments and support, and to thank Ali Khan and Graeme Leese for early discussions of the problem in Example 2.10. PEC is employed at Mercuna Developments, an AI middleware company registered in Scotland, number SC545088.
Conflict of interest: Author states no conflict of interest.
Data availability statement: Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
J. von Neumann, O. Morgenstern, and A. Rubinstein, Theory of Games and Economic Behavior (60th Anniversary Commemorative Edition), Princeton, NJ: Princeton University Press, 1944. ISBN 9780691130613.

P. M. Vaidya, “Speeding-up linear programming using fast matrix multiplication,” in: Proceedings of the 30th Annual Symposium on Foundations of Computer Science, SFCS ’89, IEEE Computer Society, USA, 1989, pp. 332–337. ISBN 0818619821. 10.1109/SFCS.1989.63499.

S. Jiang, Z. Song, O. Weinstein, and H. Zhang, Faster Dynamic Matrix Inverse for Faster LPs, arXiv e-prints, arXiv:2004.07470, April 2020.

E. Johnson, “Guide to effective auto-generated spatial queries,” in: Game AI Pro 3, chapter 26, S. Rabin, Ed., Boca Raton: CRC Press, 2017, pp. 309–325. 10.4324/9781315151700-26.

M. Sakaguchi, “Two-sided search games,” J. Operat. Res. Soc. Japan, vol. 16, no. 4, pp. 207–225, Dec 1973.

M. Dresher, Games of Strategy: Theory and Applications, Englewood Cliffs, NJ: Prentice-Hall, 1961.

T. S. Ferguson, Game Theory, 2nd ed., Hackensack, NJ: World Scientific, 2014.

C. Kiekintveld, M. Jain, J. Tsai, J. Pita, F. Ordóñez, and M. Tambe, “Computing optimal randomized resource allocations for massive security games,” in: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), vol. 1, pp. 689–696, 2009. ISBN 9780981738161. 10.5555/1558013.1558108.

J. Letchford and V. Conitzer, “Solving security games on graphs via marginal probabilities,” Proc. AAAI Conference on Artificial Intelligence, vol. 27, no. 1, pp. 591–597, June 2013. 10.1609/aaai.v27i1.8688.

S. Alpern, R. Fokkink, R. Lindelauf, and G.-J. Olsder, “The ‘Princess and Monster’ game on an interval,” SIAM J. Control Optim., vol. 47, no. 3, pp. 1178–1190, 2008. 10.1137/060672054.

J. C. Deville and Y. Tillé, “Unequal probability sampling without replacement through a splitting method,” Biometrika, vol. 85, pp. 89–101, March 1998. 10.1093/biomet/85.1.89.

J. Croucher, “Application of the fundamental theorem of games to an example concerning antiballistic missile defense,” Naval Res. Logistics Quarter., vol. 22, pp. 197–203, March 1975. 10.1002/NAV.3800220117.

V. J. Baston and A. Y. Garnaev, “A search game with a protector,” Naval Res. Logistics, vol. 47, no. 2, pp. 85–96, 2000. https://eprints.soton.ac.uk/29734/. 10.1002/(SICI)1520-6750(200003)47:2<85::AID-NAV1>3.0.CO;2-C.

W. H. Ruckle, Geometric Games and their Applications, Pitman, 1983.
© 2022 Peter E. Creasey, published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.