Eﬃcient Hierarchical Multi-Object Segmentation in Layered Graphs

We propose a novel eﬃcient seed-based method for the multi-object segmentation of images based on graphs, named Hierarchical Layered Oriented Image Foresting Transform (HLOIFT). It uses a tree of the relations between the image objects, represented by a node. Each tree node may contain diﬀerent individual high-level priors and deﬁnes a weighted digraph, named as layer . The layer graphs are then integrated into a hierarchical graph, considering the hierarchical relations of inclusion and exclusion. A single energy optimization is performed in the hierarchical layered weighted digraph leading to globally optimal results satisfying all the high-level priors. The experimental evaluations of HLOIFT, on medical, natural, and synthetic images, indicate promising results comparable to the state-of-the-art methods, but with lower computational complexity. Compared to the hierarchical segmentation by the min-cut/max-ﬂow algorithm, our approach is less restrictive, leading to globally optimal results in more general scenarios, and has a better running time.


Introduction
Image segmentation is one of the most fundamental and challenging problems in image processing and computer vision. It is the task of partitioning an input image into objects of interest by assigning distinct labels to their composing pixels. Hence, image segmentation can be interpreted as a classification problem 5 at the pixel level. Not surprisingly, machine learning-based methods are among the most prominent solutions to the segmentation problem, especially after the advent of deep learning techniques [1]. In this context, a state-of-the-art method in grabcut-style [2] for interactive segmentation with minimal user involvement is Deep Extreme Cut [3]. It considers extreme points (left-most, right-most, top 10 and bottom pixels at the object boundary) as 4 clicks for each object, and the CNN produces the segmented masks. This type of user input is usually a good choice for single-shot algorithms, since from the 4 indicated points we can infer object's bounding box, and consequently, all points outside the box belong to background. At first glance, this seems to be better than other types of user 15 inputs, since arbitrary points do not allow us to draw any further conclusions. However, it is not possible in general to get a segmentation result with ground truth quality using only a 4 points selection, since considerable residual errors all around the boundary are quite common (Figures 1a and c). Moreover, user input in grabcut-style is usually not appropriate to perform further corrections, such 20 that a different type of corrective action is usually employed in approaches like GrabCut [2], resulting in a less user-friendly interface due to the multiple types of input. On the other hand, deep learning techniques require a lot of annotated data for training the network and for many applications the availability of gold standard is quite limited. 25 In this work, we focus on the interpretation of the image segmentation as a graph partition problem subject to hard constraints, given by seed pixels selected in the image domain, such that a refined segmentation can be obtained with an arbitrary level of precision, leading to a high-quality ground-truth data. The multidimensional images, they allow several customizations, such as the usage of different image elements as its nodes (e.g., superpixels [5]), the learning of graph weights by means of machine learning techniques [4] (see Figures 1b and d), or even the incorporation of learning by deep learning techniques [6,7]. For the sake of simplicity and space constraints, here we focus only on the base 35 form of the proposed method at the pixel level, without the use of training images, and discuss the use of high-level priors that can be easily set based on the application knowledge.
Examples of seed-based methods are watershed [8], random walks [9], fuzzy connectedness [10], graph cuts (GC) [11], grow cut [12], minimum barrier dis- 40 tance [13], and image foresting transform (IFT) [14,15]. Some methods, including the min-cut/max-flow algorithm, can provide global optimal solutions according to a graph-cut measure in graphs and can be described in a unified manner according to a common framework, to which we refer as Generalized GC (GGC) [16]. Oriented Image Foresting Transform (OIFT) [17] and Oriented 45 Relative Fuzzy Connectedness (ORFC) [18] are the GGC methods designed for directed weighted graphs, which have low computational complexity as compared to the min-cut/max-flow algorithm [11].
In the context of multiple object segmentation, each object may present its own distinctive features, requiring different high-level priors. In order to obtain a good segmentation result, the segmentation method must attend all 55 individual object priors and capture the contextual or structural relations between them. However, many existing methods either do not include any form of structural information or include only the high-level priors for single object segmentation [19,22,20,17]. Consequently, they may be inappropriate in the context of multiple objects.

60
Most of the methods for multi-object segmentation that include structural information are based on graph-cut optimization and are performed by a mincut/max-flow algorithm [26,27,28]. These methods usually use priors, based on inclusion or exclusion interactions between objects. However, their globally optimal results are restricted only to some particular cases and they usually 65 have a high computational cost. The methods based in LOGISMOS [29,30] require an approximated pre-segmentation whenever the objects present complex shapes. Note that the fast segmentation obtained by our proposed method could also be used as a starting point for LOGISMOS.
In the context of segmentation by the Image Foresting Transform (IFT) 70 framework, in order to incorporate structural information among objects, Fuzzy Object Models (FOMs) [31, 32,33,34,35,36,37,38,39,40] are usually employed. However, these approaches are based on separate IFT executions per object, that do not incorporate contextual information and the high-level priors of all objects into a single energy optimization, limiting their potential.

75
In this work, we circumvent the aforementioned problem of IFT-based methods, by proposing a hierarchical layered graph-based approach for the multiple object segmentation by Oriented Image Foresting Transform (OIFT) [17], named as Hierarchical Layered OIFT (HLOIFT), using a single energy optimization. A graph used in HLOIFT for a segmentation of m objects is formed 80 with m layers, copies of image scene graph, each associated with an object.
The hierarchy considers inclusion and exclusion constraints between objects as in [26], but each object in the image can also have its own set of high-level priors. Further, we formulate the integration of individual object constraints and structural priors from layers, within a single energy optimization, over-85 coming the mentioned limitations from previous works and conserving the low computational cost of OIFT.
Specifically, our main contributions are as follows. Theoretical: We propose a new seed-based method for multi-object segmentation allowing high-level priors for image objects and the hierarchical constraints between them. Gen-90 erality: Our approach is less restrictive than most methods in use and leads to the globally optimal results in more general scenarios. Complexity: Our method has low computational complexity as compared to methods based on min-cut/max-flow.
HLOIFT was first presented in a conference paper [41] and later demon-95 strated for use in graphs of superpixels [5]. In this extended work, we provide more details in the presentation of the method, including a formal definition of the obtained objects, the complete algorithm, a formal mathematical proof of the correctness of the algorithm, and more experiments, quantitatively comparing with the hierarchical graph cut algorithm [26,27], one of the few existing 100 methods that can also use a full hierarchy prior as an input.
Our work is organized as follows: In Section 2, we introduce some required definitions and review the OIFT method. Our new algorithm HLOIFT is described in Section 3 and the proof of its correctness is given in Section 4. The experimental evaluation of HLOIFT is presented in Section 5, while our conclu-105 sions are stated in Section 6.

Background
An (n-dimensional) image is a pair (I, I), where I is a finite set of pixels identified with the n-tuples of integers (i.e., I ⊂ Z n ) and I is a mapping that assigns to each pixel t ∈ I its intensity I(t), that is, either a real number (in 110 case of monochromatic image) or an appropriate vector in R k . An image can be interpreted as a weighted digraph (i.e., directed graph) G = (N , A, ω) whose nodes (vertices) are the pixels in the image domain N = I, arcs/edges listed in A are the pairs (s, t) ∈ I 2 of pixels (usually, in 2D images, identified with either 4-or 8-neighborhoods), and the weight map ω associates to each arc (s, t) ∈ A 115 a value ω(s, t) ∈ [−∞, ∞] (usually defined as ω(s, t) = I(t) − I(s) ). 1 We use the notations t ∈ A(s) and (s, t) ∈ A to indicate that t is adjacent to s. A digraph G is symmetric if for all (s, t) ∈ A, the pair (t, s) is also an arc of G, that is, (t, s) ∈ A. All digraphs we consider below are symmetric. Notice, that in the symmetric weighted digraphs we may still have ω(s, t) = ω(t, s). In particular, the seeds give us a partial labeling, L(t) = 1 for all t ∈ S 1 and L(t) = 0 for all t ∈ S 0 , which is propagated to all unlabeled pixels during the 130 OIFT algorithm [17].
In the case of directed weighted graphs, there are two important classes of energy formulations within the Generalized GC framework, the Max-Min 2 and Min-Sum optimizers [16].
For the sake of simplicity, the OIFT algorithm is being presented here without the explicit forest calculation, from which its name comes from. We can do 140 this, since the label map L without the forest is sufficient for the segmentation task. The OIFT segmentation, indicated by L, can be computed by Algorithm 1, in a connected and symmetric digraph G, as described in [42].
2 Min-Max optimizer is a dual equivalent problem.

Output:
The
Remove s from Q such that V (s) is minimum;

10.
For each (s, t) ∈ A such that S(t) = 0 do

15.
If t / ∈ Q then insert t in Q.
Note that in line 12 of Algorithm 1, the arc weight ω(t, s) of the reversed arc 145 (t, s) is used (rather than that of chosen (s, t) ∈ A). That is why a symmetric digraph is required. The value V (t) assigned in line 14 constitutes a contribution, to the energy ε min (L) given by (1), that a consecutive change of labeling L(t) from L(s), already fixed, to 1 − L(s) would represent. For every node t considered in lines 10-15 the value V (t) represents the worst (minimum) weight 150 (w 0 or w 1 , as defined in Remark 1) of the arcs (v, t) ∈ A from already processed object nodes v (i.e., with S(v) = 1). The map V is related to the cost of paths in OIFT [42].

Remark 1.
Notice, that if we define the weight functions w 0 and w 1 on A as w 1 (s, t) = ω(s, t) and w 0 (s, t) = ω(t, s), then the execution of lines 11 and 12 in 155 Algorithm 1 is equivalent to the execution of a single line: "tmp ← w L(s) (s, t)." Also, if consecutive line 14 is executed, then V (t) becomes w L(s) (s, t). The use of different affinities for different objects has a long history, as described in [43].
However, so far they were mainly used for symmetric affinities and Fuzzy Con- In the case of OIFT algorithm, the appro- which is explicitly defined in [42], where it is denoted by symbols f i,ω and f o,ω . It is not difficult to see that Algorithm 1 is equivalent (in a sense defined in [44] and [45]) to a generalized Dijkstra algorithm from [15] 165 used with the path cost function ψ last .
To explore the boundary polarity, the arcs weights ω(s, t) are defined as a combination of an undirected dissimilarity measure ψ(s, t) between neighboring pixels s and t, multiplied by an orientation factor, as follows: where α ∈ [−1, 1] and we usually have ψ(s, t) = |I(t) − I(s)|. Other options for 170 ψ(s, t) are discussed in [4,44]. Note that, in general, we have ω(s, t) = ω(t, s) for α = 0. For α > 0, the segmentation by OIFT favors transitions from bright to dark pixels, and α < 0 favors the opposite orientation.
Finally, for multi-object segmentation we consider multiples α i values, each associated to a different object.

Hierarchical Layered OIFT
We propose the Hierarchical Layered Oriented Image Foresting Transform (HLOIFT) as a new seed-based method for multi-object segmentation.  Figure 2 shows an overview of our framework. For a given input image, seeds sets for some objects, and the tree of relations between objects, the HLOIFT 180 method has the following steps: (1) Each layer is constructed as a weighted digraph representing one object with its own priors (described in Section 3.2).
(2) HLOIFT defines a setup for the inter-layer connections representing the hierarchical constraints, such as inclusion and exclusion relations (described in Section 3.3). (3) HLOIFT uses an extension of the OIFT algorithm to compute 185 an optimal cut over the hierarchical layered digraph, giving as output a labeled image (described in Section 3.4).

Notations and definitions
Let L = {1, . . . , m} denote an index set, where each element in L is associated with an object we consider and m is the number of objects to be segmented.
The HLOIFT graph associated with L and image (I, I) will be defined on the set of nodes N = L × I. The HLOIFT resulted segmentation of the image will be identified with a binary variable X : N → {0, 1}, where, for i ∈ L, the ith object O i and the background O 0 are defined, respectively, as Each object/background object O i , i ∈ L ∪ {0}, will be identified with a corre- In addition, we will use the distance parameter ρ ≥ 0, which indicates the minimum distance between the boundaries of siblings and of the parent-offspring 205 pairs of objects. More specifically, The hierarchy h, with no parameter ρ, is understood as the consistency condition 210 (C) with ρ = 0. For ρ > 0, we require that the assumptions of h are, in a sense, strongly satisfied.

Construction of the layers of our digraph
The first step of HLOIFT is to create a set of m layers, where each layer in the case of OIFT. We will usually assume that, in the case of 2D images, A I is the 4-or 8-neighborhood adjacency. We define the intra-layer adjacency Similarly, if for an ith object we already have defined a weight function ω i on the image (depending on the image intensities and, in some cases, according to a given higher level prior), then the intra-layer weight function ω i is defined, . For HLOIFT to work properly 230 we need to assume that the values of these intra-layer weight functions are the finite real numbers (unless they represent hard constraints, see star convexity constraint next), as opposed to the inter-layer weight, defined below, which will have infinite values.
Of course, ω i should highlight the desired boundaries for O i as clearly as 235 possible and we would like to incorporate in its definition the higher level priors whenever it is appropriate. In particular, to utilize the object-contour orientations, that is, the boundary polarity priors, we use in our experiments the same scheme that was adopted by the regular OIFT method, defining e.g., [4,44].) In this setting, each object O i has its own constant α i ∈ [−1, 1] (used in the expression (3)), so that we can favor the segmentation of O i with transitions from bright to dark pixels with α i > 0, or the opposite orientation, with α i < 0. Note that α i = 0 can be used to indicate that O i has no boundary polarity prior.

245
If the ith layer is created using the Geodesic Star Convexity prior (GSC), then we will be prioritizing the segmentation of O i with more regular shape. A common shape prior is the star shape prior (SSP) [24], where a region has a star shape with respect to a given center c if for any point p inside the region, all points on the straight line segment between the center c and p also lie inside The set of all inter-layer arcs coming from the siblings is denoted by A s , while A p will stand for those associated with a parent/offspring pair. The weights of these inter-layer arcs s, t , with s ∈ N i and t ∈ N j (or, equivalently, with λ(s) = i and λ(t) = j), are defined as follows. Basically, the weight −∞ enforces that the vertex connected to one just processed will be examined with the highest possible priority, to ensure satisfaction of the consistency requirement (C).
In Figure 5, we give an example of the hierarchical layered digraph construction for a case that cannot be optimized under the graph cut framework [26]. that we have two mutually exclusive objects O i and O j , both contained within another object O k . Globally optimal segmentation in this case cannot be modeled with graph cuts, because it cannot be converted to a submodular energy, more details in [26]. The proposed HLOIFT method can compute globally opti-295 mal results in this case and also in other more sophisticated cases, as described below.

Energy optimization
In the final step, we execute the HLOIFT algorithm, which constitutes a modified OIFT algorithm from [42,17]. It is applied to the hierarchical layered graph H constructed above and its output maximizes a single energy ε h min , a modification of energy ε min (L) = min{ω(s, t) : (s, t) ∈ A & L(s) > L(t) = 0} defined in (2). This modification of ε min is designed to ensure that the output satisfies also the hierarchical constraints imposed by h and ρ.
Specifically, for a binary map X : N → {0, 1} described in Section 3.1 the energy ε h min of X is defined as where In other words, all arcs in the cut, except for the inter-layer arcs A s associated to siblings, are treated directionally, in the same way as in ε min . This part is expressed by the energy ε incl min . However, the inter-layer arcs from A s must be treated differently. First of all, there is not directional information 310 between them. Moreover, the siblings s and t influence one another, according to (C), only when they both belong to their respective objects, that is, when X(s) = X(t) = 1. This justifies the format of ε excl min . The formula for the combined energy ε h min is defined as a minimum of the costs of all relevant arcs in the cut, the same way as in ε min (only the meaning of the term "relevant" being

8.
Remove s from Q such that V (s) is minimum;

10.
For each (s, t) ∈ A such that S(t) = 0 do

15.
If t / ∈ Q then insert t in Q;

16.
If (s, t) ∈ As and X(s) = 1 then ) when H is a sparse graph, which is more than quadratic-time using a push-relabel based on the highest label node selection rule [47].
As a final note, observe that an interesting variation of Algorithm 2 could 345 be considered. Specifically, in the case of inclusion relation h(i) = j, we can additional require that the seeds in S j belong exclusively to the parent object (i.e., O i ∩ S j = ∅). In other words, S j should be inserted as background on its children's layers. Then, we can additionally impose that the seeds in S j do not belong to the offspring objects of j in a way similar that we ensure disjointedness 350 of siblings.

Proof of Theorem 1
Since the digraph H is built from an image, it is symmetric and connected.
So, it is easy to see that upon its termination HLOIFT returns X defined on the entire set N . Therefore, to finish the proof, we need only to show that such Notice also that if t in Q is not a seed, then there exists an s ∈ D such 365 that (s, t) ∈ A, X(t) = X(s), and V (t) = w X(s) (s, t), where we use notation w i from Remark 1. In addition, the value V (t) represents a "penalty" (i.e., a contribution to the final energy ε h min cost) of a potential change of the value X(t) from its current value X(s) to 1 − X(s).
Directly after the execution of the initialization loop 1-6, the queue Q will 370 consists precisely of the nodes from the set S = { s, i : i ∈ L & s ∈ S i }, that is, seed representations in the HLOIFT graph N . Let X 0 denote the map X defined on S at this time. Since, at this time, we also have X(t) = −∞ for every t ∈ S, these values cannot ever be changed, as the condition in line 13 is never satisfied. Also, for any X : N → {0, 1}, 375 X satisfies the seed constrains if, and only if, X extends X 0 .
In particular, any X : N → {0, 1} returned by HLOIFT has this property. Therefore, to finish the proof, it is enough to show that such X also satisfies the consistency requirements (C).
By consistency of S with (C), the family X of all X : N → {0, 1} satisfying 380 the seed constraints and the consistency requirement (C) is non-empty. In particular, any X ∈ X extends X 0 and has energy ε h min (X) > −∞ (as, by the choice of inter-layer weights, ε h min (X) = −∞ only when X fails the consistency requirement (C)). Let E be the maximum of ε h min (X) over all X ∈ X and let X 0 be the family of all X ∈ X with ε h min (X) = E. Notice that E > −∞.

385
The two key steps of our proof are the following closely related versions of preservation properties (P) and (Q). The above discussion shows that the assumptions of both of these properties are satisfied directly after the execution of the initialization loop.
The property (P) reads as follows.

390
(P) Assume that, directly before an execution of the line 8, the current X (defined on a D ⊂ N ) is a restriction of anX ∈ X 0 . If, during the following execution of line 8, we remove t from Q with V (t) < E, then the extension of X to D ∪ {t} also agrees withX, that is, Indeed, if t ∈ S, then, as we argued above,X(t) = X 0 (t) = X(t). Otherwise,

395
there is an s ∈ D such that (s, t) ∈ A and the current values of V (t) and X(t) were fixed during the execution of lines 8-17, when our node s was removed from Q. This means that we could not have simultaneously (s, t) ∈ A s and X(s) = 0.
To finish the argument for (P) we can assume that (s, t) / ∈ A s . Then, the executions of lines 8-17 ensures that X(t) = X(s). At the same time we must have thatX(s) =X(t), as otherwise impossibility.

405
The second preservation property is as follows.
(Q) Assume that, directly before an execution of the line 8, the current X (defined on a D ⊂ N ) can be extended to anX ∈ X 0 . If, during the following execution of line 8, we remove t from Q with V (t) ≥ E, then the extension X to D ∪ {t} has a further extension to anX ∈ X 0 . We have For the main part of the argument for (Q) assume that we also have V (t) < ∞.
As V (t) is finite, there must exist an s ∈ D such that (s, t) ∈ A is an intraarc and the current values of V (t) and X(t) were fixed during the execution of lines 8-17, when our node s was removed from Q. We need to show that the 415 extension X to D ∪ {t} has a further extension to anX ∈ X 0 . For i ∈ {0, 1} let If X(t) = 0, then the indicator (i.e., characteristic) function χ R1 is the desiredX ∈ X 0 extending X. Indeed, it certainly extends X (which, in turn, extends X 0 ) and has an energy ε h min (X) = min{ε incl min (X), ε excl where the equation is justified by ε excl min (X) = ε excl min (X) (which holds, as we have {s ∈ N :X(s) = 1} = {s ∈ N :X(s) = 1}) while the inequality ≥ by So, assume that X(t) = 1 and notice that (s, t) / ∈ A s : for X(s) = 0 this is ensured by line 13, while for X(s) = 1 by line 16. In particular, we also have X(s) = 1. A first impulse could be to defineX as the indicator χ N \R0 .
However, ε h min (χ N \R0 ) ≤ ε excl min (χ N \R0 ) and it is likely that ε excl min (χ N \R0 ) = −∞. So, in this case,X needs to be defined more carefully. Specifically, let i = λ(t) and let L i be the set of all ancestors of i: all j ∈ L such that, according to h, and putX = χ R . Then,X extends X which, in turn, extends X 0 . So,X 425 satisfies seeds constrains and, to show thatX ∈ X 0 , it is enough to prove that To see this, first notice that ε excl min (X) = ∞. Indeed, otherwise there would exist u, v ∈ R with (u, v) ∈ A s andX(u) =X(v) = 1. However, they cannot both belong to R 1 , since this would mean that E = ε h min (X) = −∞. They ever increased during execution of HLOIFT. With v ∈ Q, this contradicts the where the equation is justified by ε excl min (X) = ∞ = ε excl min (X), while the inequality ≥ by the facts that ε incl min (X) ≥ min{V (s) : s ∈ Q} = E and ε excl min (X) ≥ ε h min (X) = E. This completes the proof of (Q) in case V (t) < ∞.

440
To finish the proof of (Q) it is enough to show that V (t) = ∞ is impossible.
Indeed, the consistency of the seeds ensures that either λ(t) or some of its descendants, say j, contains a seed, say u. In particular, s 0 = j, u ∈ D and X(s 0 ) = X 0 (s 0 ) = 1. Also, H contains a path from s 0 to s = λ(t), u with each connecting arc having weight −∞. Therefore, s must belong to D, as otherwise 445 we would have E = −∞. Since s ∈ N λ(t) and N λ(t) is connected and symmetric, it contains adjacent u and v with u ∈ D and v ∈ Q. Hence, finishing the proof of (Q).

Experiments
In this section, an experimental evaluation of HLOIFT is presented. First, showing that HLOIFT is a versatile method, which allows us to customize the segmentation to some target objects according to their global properties (shape and polarity). Then, we assess the performance of HLOIFT against the previously existing method for multiple object segmentation by IFT [48] and the 455 segmentation of multiple objects by the hierarchical min-cut/max-flow algorithm [26], using medical and natural images. Our code and data are available upon request.

Setting the high-level priors
The proper configuration of the high-level priors for each object is essential 460 to obtain the desired segmentation of the target objects. Figure 6 shows a two-object segmentation problem for a synthetic image composed of four nested boundaries. We present the results using one internal seed for the object O 1 , an external seed for the background, the inclusion relation h(1) = 2, ρ = 1.5 and ω i defined via formula (3)

Qualitative comparison with the IFT method
In this section, we compare the results obtained by HLOIFT against the IFT method [48] for multi-object segmentation by seed competition, which encompasses the watershed transform from markers. OIFT is not included here since    It can be seen, the results obtained by HLOIFT are closer to the ground-truth 525 compared to the IFT results.

Comparison with multi-object segmentation by min-cut/max-flow algorithm
In this section, we compare HLOIFT with the multi-object segmentation by the min-cut/max-flow algorithm in layered graphs [26]. For the sake of simplicity, we considered only the inclusion relation, since many cases combining the inclusion and exclusion hierarchical constraints, such that when we have two mutually exclusive objects O i and O j , both contained within another object O k , cannot be converted to a submodular energy to perform the optimization by the graph-cut framework [26]. We used the max-flow/min-cut code from [47].   can lead to errors in some finer parts of the object, such as the petals (Fig-540 ure 12b). Besides being able to compute globally optimal results with arbitrary hierarchy constraints, the proposed HLOIFT method also has a better running time compared to the min-cut/max-flow algorithm. Table 1 shows the running times for the flower segmentation using different image resolutions in a laptop Intel Core i3-5005U CPU 2.00GHz ×4.

Quantitative accuracy experiments
For quantitative accuracy results, we compared HLOIFT, with the multiobject segmentation by IFT [48], its improved version by the relaxation procedure proposed by Malmberg et al. [50], and the hierarchical layered graph cut [26,27], denoted as HLGC, using two datasets of medical images.

550
For fair comparison, only methods with the same type of user input were considered here, in the form of a partial labeling. 3 In order to stress the methods to their limit, in a more challenging situation, we only consider simple arc weights with ψ(s, t) = |I(s) − I(t)|, so that no a priori information on the brightness (or color/texture) distribution was taken into account [4], nor enhancements based 555 on deep learning techniques [6,7].
For the first dataset, we used 40 slices from thoracic CT studies of size 512×512 to segment the liver and the abdomen as its parent object. The second dataset was composed of 40 real MR images to segment the talus and calcaneus bones as siblings, taking the foot region as their parent. The same seeds were 560 used for all the methods, which were progressively obtained by eroding the ground truth objects and background for different radius sizes (Figure 13a). Figure 13 illustrates an example of the segmentation of the liver and abdomen for different methods. For the bones we used α = −0.5 and ρ = 3 pixels, while for the liver we used α = 0.9 and ρ = 5 pixels. HLGC could not exploit the 565 exclusion relation for the bones, since it cannot be converted to a submodular energy for these sibling objects inside the foot.
The mean accuracy curves according to the Dice coefficient are shown in Figure 14, being the results in the left column obtained without shape constraints and the ones in the right with Geodesic Star Convexity (GSC). The 570 accuracy curves for the parent objects (foot and abdomen) are not shown, since 3 Note that drawing scribbles for a partial labeling with a large brush can be quite effective to quickly mark several object/background pixels, while accurately selecting extreme boundary points requires more user attention and caution. Therefore comparing these different types of constraints is quite subjective. they had almost perfect results (Dice > 99%). HLOIFT had the best results in most cases. Note also that it could benefit more from the usage of the shape constraint by GSC for the liver compared to HLGC, since this latter approach was already producing an object with regular shape and therefore was almost 575 not affected by the inclusion of GSC.

Conclusion
We proposed a new graph-based algorithm, named as HLOIFT, for multiregion segmentation, allowing the integration of the individual high-level priors of each object and the geometric constraints between them into a single energy optimization. The HLOIFT algorithm was described in details, including its proof of correctness. Besides the theoretical contribution in the context of the multi-region segmentation problem, our experiments show that good segmentation results can be obtained, even when considering a simple measure of intensity dissimilarity. Besides being faster than hierarchical min-cut/max-flow 585 based approaches, it is also less restrictive, allowing globally optimal results for arbitrary hierarchies.
As future works, we intend to combine some training information by deep learning approaches in HLOIFT, similar to what was done in [6,7], and to incorporate in HLOIFT the relaxation procedure for directed graphs from [51], in 590 order to circumvent the irregular boundaries from max-norm energy of HLOIFT.
At last, we would like also to handle relations of partial intersection by using a negative parameter ρ.