# Convergence rate of the modified Levenberg-Marquardt method under Hölderian local error bound

Lin Zheng, Liang Chen, and Yangxin Tang
From the journal Open Mathematics

## Abstract

In this article, we analyze the convergence rate of the modified Levenberg-Marquardt (MLM) method under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian, which are more general than the local error bound condition and the Lipschitz continuity of the Jacobian. In special cases, the convergence rate of the MLM method coincides with the results presented by Fan. A globally convergent MLM algorithm based on the trust region technique is also given.

MSC 2010: 65K05; 90C30

## 1 Introduction

We consider the system of nonlinear equations

(1.1) F ( x ) = 0 ,

where $F(x): \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable. Denote by $X^*$ and $\|\cdot\|$ the solution set of (1.1) and the 2-norm, respectively. Throughout the article, we assume that $X^*$ is nonempty, since equation (1.1) may have no solutions due to the nonlinearity of $F(x)$.

Nonlinear equations play an important role in many fields of science, and many numerical methods are developed to solve nonlinear equations [1,2,3]. Many efficient solution techniques such as the Newton method, quasi-Newton methods, the Gauss-Newton method, trust region methods, and the Levenberg-Marquardt method are available for this problem [1,2, 3,4,5, 6,7,8, 9,10,11, 12,13,14, 15,16,17, 18,19,20, 21,22].

The most common method to solve (1.1) is the Newton method. At every iteration, it computes the trial step

(1.2) $d_k^{N} = -J_k^{-1} F_k,$

where $F_k = F(x_k)$ and $J_k = F'(x_k)$ is the Jacobian. If $J(x)$ is Lipschitz continuous and nonsingular at the solution, then the convergence of the Newton method is quadratic. However, the Newton method has some disadvantages, especially when the Jacobian matrix $J_k$ is singular or near singular. To overcome the difficulties caused by the possible singularity of $J_k$, the Levenberg-Marquardt method [2,3] computes the trial step by

(1.3) $d_k^{LM} = -(J_k^T J_k + \lambda_k I)^{-1} J_k^T F_k,$

where λ k > 0 is the LM parameter that is updated in every iteration.
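Since $J_k^T J_k + \lambda_k I$ is symmetric positive definite whenever $\lambda_k > 0$, the step (1.3) is well defined even when $J_k$ is singular. A minimal NumPy sketch (the direct solver is an illustrative choice, not prescribed by the method):

```python
import numpy as np

def lm_step(F_k, J_k, lam):
    """Solve (J^T J + lam*I) d = -J^T F for the LM trial step (1.3)."""
    n = J_k.shape[1]
    A = J_k.T @ J_k + lam * np.eye(n)  # symmetric positive definite for lam > 0
    return np.linalg.solve(A, -J_k.T @ F_k)
```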

However, assuming that the Jacobian is nonsingular is too strong. The local error bound condition is weaker than the nonsingularity condition; it requires that

(1.4) $c\,\mathrm{dist}(x, X^*) \le \|F(x)\|, \quad \forall x \in N(x^*)$

holds for some constant $c > 0$, where $\mathrm{dist}(x, X^*)$ is the distance from $x$ to $X^*$ and $N(x^*)$ is some neighborhood of $x^* \in X^*$.

Under the local error bound condition, Yamashita and Fukushima [4] and Fan and Yuan [5] showed that the LM method has quadratic convergence if the LM parameter is chosen as $\lambda_k = \|F_k\|^2$ and $\lambda_k = \|F_k\|^\alpha$ with $\alpha \in [1, 2]$, respectively. Interested readers are referred to [6,7,8] for related work.

Inspired by the two-step Newton’s method, Fan [9] presented a modified Levenberg-Marquardt (MLM) method with an approximate LM step

(1.5) $d_k^{MLM} = -(J_k^T J_k + \lambda_k I)^{-1} J_k^T F(y_k),$

where $y_k = x_k + d_k^{LM}$, and the trial step is

(1.6) $s_k = d_k^{LM} + d_k^{MLM}.$

The MLM method has cubic convergence under the local error bound condition. For more general cases, Fan [19] gave an accelerated version of the MLM method. She also extended the LM parameter $\lambda_k = \|F(x_k)\|^\alpha$ from $\alpha \in [1, 2]$ to $\alpha \in [0, 2]$. The convergence order of the accelerated MLM method is $\min\{1 + 2\alpha, 3\}$, which is a continuous function of $\alpha$.
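One iteration of the scheme (1.5)-(1.6) can be sketched as follows. Both linear systems share the coefficient matrix $J_k^T J_k + \lambda_k I$, so the Jacobian is evaluated only once per iteration; the functions `F` and `J` and the parameter choice $\lambda_k = \mu_k \|F_k\|^\alpha$ passed in below are illustrative assumptions:

```python
import numpy as np

def mlm_iteration(F, J, x_k, mu_k, alpha):
    """One modified LM iteration (1.5)-(1.6): one Jacobian evaluation,
    two residual evaluations, two solves with the same matrix."""
    F_k, J_k = F(x_k), J(x_k)
    lam = mu_k * np.linalg.norm(F_k) ** alpha
    A = J_k.T @ J_k + lam * np.eye(x_k.size)
    d_lm = np.linalg.solve(A, -J_k.T @ F_k)      # LM step (1.3)
    y_k = x_k + d_lm
    d_mlm = np.linalg.solve(A, -J_k.T @ F(y_k))  # approximate step (1.5) reuses J_k
    return x_k + d_lm + d_mlm                    # x_k plus the trial step (1.6)
```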

To save more Jacobian calculations and achieve a fast convergence rate, Zhao and Fan [10] and Chen [11] presented a higher-order Levenberg-Marquardt method by computing the approximate step twice, and the method has biquadratic convergence under the local error bound condition.

The Levenberg-Marquardt method is now widely used: it is a classical method for solving nonlinear least-squares problems, and it has also been applied to problems in finance [23,24]. In real applications, some nonlinear equations may not satisfy the local error bound condition but satisfy the Hölderian local error bound condition defined as follows.

## Definition 1.1

We say that $F(x)$ provides a Hölderian local error bound of order $\gamma \in (0, 1]$ in some neighborhood of $x^* \in X^*$, if there exists a constant $c > 0$ such that

(1.7) $c\,\mathrm{dist}(x, X^*) \le \|F(x)\|^\gamma, \quad \forall x \in N(x^*).$

Comparing (1.4) and (1.7), we can see that the Hölderian local error bound condition is more general: when $\gamma = 1$, the local error bound condition is included as a special case. Hence, the local error bound condition is stronger. For example, the Powell singular function [25]

$$h(x_1, x_2, x_3, x_4) = \left(x_1 + 10x_2,\; \sqrt{5}(x_3 - x_4),\; (x_2 - 2x_3)^2,\; \sqrt{10}(x_1 - x_4)^2\right)^T$$

satisfies the Hölderian local error bound condition of order $1/2$ around the zero point but does not satisfy the local error bound condition [12]. In a biochemical reaction network, the problem of finding the moiety conserved steady state can be formulated as a system of nonlinear equations, which satisfies the Hölderian local error bound condition [12]. Recently, some scholars discussed the convergence results of the LM method under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian [12,13,14,22]. In this article, we will investigate the convergence rate of the MLM method under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian, which are more general than the local error bound condition and the Lipschitz continuity of the Jacobian.
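The behavior of the Powell singular function can be checked numerically. Along a direction $d$ in the null space of the Jacobian at the solution $x^* = 0$, $\|h(x)\|$ decays quadratically in $\|x\|$, so $\|h(x)\|/\|x\| \to 0$ (the order-1 bound fails) while $\|h(x)\|^{1/2}/\|x\|$ stays bounded away from zero, consistent with an error bound of order $1/2$. The particular direction below is one illustrative choice:

```python
import numpy as np

def powell(x):
    """Powell singular function; the unique zero is x = 0."""
    x1, x2, x3, x4 = x
    return np.array([x1 + 10 * x2,
                     np.sqrt(5) * (x3 - x4),
                     (x2 - 2 * x3) ** 2,
                     np.sqrt(10) * (x1 - x4) ** 2])

# A direction in the null space of the Jacobian at x* = 0
# (x1 + 10*x2 = 0 and x3 = x4), where the order-1 bound degenerates.
d = np.array([10.0, -1.0, 1.0, 1.0])
d /= np.linalg.norm(d)

for t in [1e-1, 1e-2, 1e-3]:
    r = np.linalg.norm(powell(t * d))
    # r / t -> 0 as t -> 0, while sqrt(r) / t stays (roughly) constant.
    print(t, r / t, np.sqrt(r) / t)
```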

This article is organized as follows. In Section 2, we propose an MLM algorithm and show that it converges globally under the Hölderian continuity of the Jacobian. In Section 3, we study the convergence rate of the algorithm under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian. We finish the work with some conclusions and references.

## 2 A globally convergent MLM algorithm

In this section, we propose an MLM algorithm by the trust region technique and then prove that it converges globally under the Hölderian continuity of the Jacobian.

We take

(2.1) $\Phi(x) = \|F(x)\|^2$

as the merit function for (1.1). We define the actual reduction of Φ ( x ) at the k th iteration as

(2.2) $\mathrm{Ared}_k = \|F_k\|^2 - \|F(x_k + d_k^{LM} + d_k^{MLM})\|^2.$

The predicted reduction needs to be nonnegative.

Note that the step d k LM in (1.3) is the minimizer of the convex minimization problem

(2.3) $\min_{d \in \mathbb{R}^n}\; \|F_k + J_k d\|^2 + \lambda_k \|d\|^2 =: \varphi_{k,1}(d).$

If we let

(2.4) $\Delta_{k,1} = \|d_k^{LM}\| = \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T F_k\|,$

then it can be verified that d k LM is also a solution of the trust region subproblem

(2.5) $\min_{d \in \mathbb{R}^n} \|F_k + J_k d\|^2 \quad \text{s.t.}\;\; \|d\| \le \Delta_{k,1}.$

We obtain from the result given by Powell [15] that

(2.6) $\|F_k\|^2 - \|F_k + J_k d_k^{LM}\|^2 \ge \|J_k^T F_k\| \min\left\{\|d_k^{LM}\|, \dfrac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\}.$

In the same way, the step d k MLM in (1.5) is not only the minimizer of the problem

(2.7) $\min_{d \in \mathbb{R}^n}\; \|F(y_k) + J_k d\|^2 + \lambda_k \|d\|^2 =: \varphi_{k,2}(d),$

but also the solution of the trust region problem

(2.8) $\min_{d \in \mathbb{R}^n} \|F(y_k) + J_k d\|^2 \quad \text{s.t.}\;\; \|d\| \le \Delta_{k,2},$

where

(2.9) $\Delta_{k,2} = \|d_k^{MLM}\| = \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T F(y_k)\|.$

Thus, we also obtain

(2.10) $\|F(y_k)\|^2 - \|F(y_k) + J_k d_k^{MLM}\|^2 \ge \|J_k^T F(y_k)\| \min\left\{\|d_k^{MLM}\|, \dfrac{\|J_k^T F(y_k)\|}{\|J_k^T J_k\|}\right\}.$

We define the newly predicted reduction from (2.6) and (2.10) as follows:

(2.11) $\mathrm{Pred}_k = \|F_k\|^2 - \|F_k + J_k d_k^{LM}\|^2 + \|F(y_k)\|^2 - \|F(y_k) + J_k d_k^{MLM}\|^2,$

which satisfies

(2.12) $\mathrm{Pred}_k \ge \|J_k^T F_k\| \min\left\{\|d_k^{LM}\|, \dfrac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\} + \|J_k^T F(y_k)\| \min\left\{\|d_k^{MLM}\|, \dfrac{\|J_k^T F(y_k)\|}{\|J_k^T J_k\|}\right\}.$

The ratio of the actual reduction to the predicted reduction

(2.13) $r_k = \dfrac{\mathrm{Ared}_k}{\mathrm{Pred}_k}$

is used to decide whether to accept the trial step and how to update the parameter $\mu_k$, and hence the MLM parameter $\lambda_k$.

The algorithm is presented as follows.

## Algorithm 2.1

Given $x_1 \in \mathbb{R}^n$, $0 < \alpha \le 2$, $\mu_1 > m > 0$, $0 < p_0 \le p_1 \le p_2 < 1$, $a_1 > 1 > a_2 > 0$. Set $k := 1$.

Step 1. If $J_k^T F_k = 0$, then stop. Solve

(2.14) $(J_k^T J_k + \lambda_k I)d = -J_k^T F_k \quad \text{with } \lambda_k = \mu_k \|F_k\|^\alpha$

to obtain $d_k^{LM}$ and set

$$y_k = x_k + d_k^{LM}.$$

Solve

(2.15) $(J_k^T J_k + \lambda_k I)d = -J_k^T F(y_k)$

to obtain $d_k^{MLM}$ and set

$$s_k = d_k^{LM} + d_k^{MLM}.$$

Step 2. Compute $r_k = \mathrm{Ared}_k/\mathrm{Pred}_k$. Set

(2.16) $x_{k+1} = \begin{cases} x_k + s_k, & \text{if } r_k \ge p_0, \\ x_k, & \text{otherwise}. \end{cases}$

Step 3. Choose $\mu_{k+1}$ as

(2.17) $\mu_{k+1} = \begin{cases} a_1 \mu_k, & \text{if } r_k < p_1, \\ \mu_k, & \text{if } r_k \in [p_1, p_2], \\ \max\{a_2 \mu_k, m\}, & \text{if } r_k > p_2. \end{cases}$

Set $k := k + 1$ and go to Step 1.
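The steps above can be sketched in NumPy as follows; the default parameter values, the stopping tolerance, and the test problem used later are illustrative assumptions, not part of the algorithm's statement:

```python
import numpy as np

def mlm_trust_region(F, J, x, mu=1.0, alpha=1.0, m=1e-8,
                     p0=1e-4, p1=0.25, p2=0.75, a1=4.0, a2=0.25,
                     tol=1e-10, max_iter=100):
    """Sketch of Algorithm 2.1: trial steps accepted via the ratio r_k of
    actual reduction (2.2) to predicted reduction (2.11); mu updated by (2.17)."""
    for _ in range(max_iter):
        F_k, J_k = F(x), J(x)
        g = J_k.T @ F_k
        if np.linalg.norm(g) <= tol:              # Step 1: stopping test
            break
        lam = mu * np.linalg.norm(F_k) ** alpha   # lambda_k in (2.14)
        A = J_k.T @ J_k + lam * np.eye(x.size)
        d_lm = np.linalg.solve(A, -g)
        y = x + d_lm
        F_y = F(y)
        d_mlm = np.linalg.solve(A, -J_k.T @ F_y)  # (2.15), same matrix A
        s = d_lm + d_mlm
        ared = np.linalg.norm(F_k)**2 - np.linalg.norm(F(x + s))**2
        pred = (np.linalg.norm(F_k)**2 - np.linalg.norm(F_k + J_k @ d_lm)**2
                + np.linalg.norm(F_y)**2 - np.linalg.norm(F_y + J_k @ d_mlm)**2)
        r = ared / pred if pred > 0 else -1.0
        if r >= p0:                               # Step 2: accept or reject
            x = x + s
        if r < p1:                                # Step 3: update mu
            mu *= a1
        elif r > p2:
            mu = max(a2 * mu, m)
    return x
```

For instance, on the decoupled system $F(x) = (x_1^3 - 1, x_2^3 - 8)^T$ the iterates converge rapidly to $(1, 2)^T$ from a nearby starting point.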

We give some assumptions before studying the global convergence of Algorithm 2.1.

## Assumption 2.2

1. The Jacobian $J(x)$ is Hölderian continuous of order $v \in (0, 1]$, i.e., there exists a positive constant $\kappa_{hj}$ such that

(2.18) $\|J(x) - J(y)\| \le \kappa_{hj} \|x - y\|^v, \quad \forall x, y \in \mathbb{R}^n.$

2. $J(x)$ is bounded above, i.e., there exists a positive constant $\kappa_{bj}$ such that

(2.19) $\|J(x)\| \le \kappa_{bj}, \quad \forall x \in \mathbb{R}^n.$

From (2.18), we can obtain

(2.20)
$$\begin{aligned}
\|F(y) - F(x) - J(x)(y - x)\| &= \left\| \int_0^1 J(x + t(y - x))(y - x)\,\mathrm{d}t - J(x)(y - x) \right\| \\
&\le \|y - x\| \int_0^1 \|J(x + t(y - x)) - J(x)\|\,\mathrm{d}t \\
&\le \kappa_{hj} \|y - x\|^{1+v} \int_0^1 t^v\,\mathrm{d}t = \frac{\kappa_{hj}}{1+v} \|y - x\|^{1+v}.
\end{aligned}$$

## Theorem 2.3

Let Assumption 2.2 hold. Then, Algorithm 2.1 either terminates in finitely many iterations or satisfies

(2.21) $\lim_{k \to \infty} \|J_k^T F_k\| = 0.$

## Proof

We prove the theorem by contradiction. Suppose that (2.21) is not true; then there exist a positive constant $\tau$ and infinitely many $k$ such that

(2.22) $\|J_k^T F_k\| \ge \tau.$

Let $S_1, S_2$ be the index sets

$$S_1 = \{k \mid \|J_k^T F_k\| \ge \tau\}, \qquad S_2 = \left\{k \,\middle|\, \|J_k^T F_k\| \ge \frac{\tau}{2} \text{ and } x_{k+1} \ne x_k\right\}.$$

Then, $S_1$ is an infinite set. In the following, we derive a contradiction in each of the two cases: $S_2$ finite and $S_2$ infinite.

Case I: $S_2$ is finite. Then, the set

$$S_3 = \{k \mid \|J_k^T F_k\| \ge \tau \text{ and } x_{k+1} \ne x_k\}$$

is also finite. Let $\tilde{k}$ be the largest index of $S_3$. Then, $x_{k+1} = x_k$ holds for all $k \in \{k > \tilde{k} \mid k \in S_1\}$. Define the index set

$$S_4 = \{k > \tilde{k} \mid \|J_k^T F_k\| \ge \tau \text{ and } x_{k+1} = x_k\}.$$

If $k \in S_4$, we can deduce that $\|J_{k+1}^T F_{k+1}\| \ge \tau$ and $x_{k+2} = x_{k+1}$; hence, $k + 1 \in S_4$. By induction, we know that $\|J_k^T F_k\| \ge \tau$ and $x_{k+1} = x_k$ hold for all $k > \tilde{k}$, which implies that $r_k < p_0$. Therefore, we have

(2.23) $\mu_k \to \infty \quad \text{and} \quad \lambda_k \to \infty$

due to (2.14). Hence, we obtain

(2.24) $\|d_k^{LM}\| \to 0.$

Moreover, it follows from (2.8), (2.20), and (2.23) that

(2.25)
$$\begin{aligned}
\|d_k^{MLM}\| &= \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T F(y_k)\| \\
&\le \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T F_k\| + \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T J_k\| \|d_k^{LM}\| + \frac{\kappa_{hj}}{1+v} \|d_k^{LM}\|^{1+v} \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T\| \\
&\le \|d_k^{LM}\| + \|d_k^{LM}\| + \frac{\kappa_{hj} \kappa_{bj}}{\lambda_k (1+v)} \|d_k^{LM}\|^{1+v} \le c_1 \|d_k^{LM}\|
\end{aligned}$$

holds for all sufficiently large $k$, where $c_1$ is a positive constant. Therefore, we have

(2.26) $\|s_k\| = \|d_k^{LM} + d_k^{MLM}\| \le (1 + c_1) \|d_k^{LM}\|.$

Furthermore, it follows from (2.12), (2.19), (2.22), (2.24), and (2.26) that

(2.27)
$$\begin{aligned}
|r_k - 1| &= \left| \frac{\mathrm{Ared}_k - \mathrm{Pred}_k}{\mathrm{Pred}_k} \right| \\
&\le \frac{\big| \|F(x_k + d_k^{LM} + d_k^{MLM})\|^2 - \|F_k + J_k d_k^{LM}\|^2 + \|F(y_k)\|^2 - \|F(y_k) + J_k d_k^{MLM}\|^2 \big|}{\|J_k^T F_k\| \min\left\{\|d_k^{LM}\|, \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\} + \|J_k^T F(y_k)\| \min\left\{\|d_k^{MLM}\|, \frac{\|J_k^T F(y_k)\|}{\|J_k^T J_k\|}\right\}} \\
&\le \frac{\|F_k + J_k s_k\|\, O(\|d_k^{LM}\|^{1+v}) + O(\|d_k^{LM}\|^{2+2v}) + \|F_k + J_k d_k^{LM}\|\, O(\|d_k^{LM}\|^{1+v})}{\|J_k^T F_k\| \min\left\{\|d_k^{LM}\|, \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\}} \\
&\le \frac{\|F_k + J_k d_k^{LM}\|\, O(\|d_k^{LM}\|^{1+v}) + O(\|d_k^{LM}\|^{2+v}) + O(\|d_k^{LM}\|^{2+2v})}{\|J_k^T F_k\| \min\left\{\|d_k^{LM}\|, \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\}} \to 0,
\end{aligned}$$

which implies that $r_k \to 1$. In view of the updating rule (2.17) of $\mu_k$, we know that there exists a positive constant $\tilde{m} > m$ such that $\mu_k < \tilde{m}$ holds for all sufficiently large $k$, which contradicts (2.23).

Case II: $S_2$ is infinite. It follows from (2.12) and (2.19) that

(2.28)
$$\begin{aligned}
\|F_1\|^2 &\ge \sum_{k \in S_2} (\|F_k\|^2 - \|F_{k+1}\|^2) \ge \sum_{k \in S_2} p_0\, \mathrm{Pred}_k \\
&\ge \sum_{k \in S_2} p_0 \left[ \|J_k^T F_k\| \min\left\{\|d_k^{LM}\|, \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\} + \|J_k^T F(y_k)\| \min\left\{\|d_k^{MLM}\|, \frac{\|J_k^T F(y_k)\|}{\|J_k^T J_k\|}\right\} \right] \\
&\ge \sum_{k \in S_2} p_0\, \frac{\tau}{2} \min\left\{\|d_k^{LM}\|, \frac{\tau}{2\kappa_{bj}^2}\right\},
\end{aligned}$$

which implies

(2.29) $\lim_{k \to \infty,\, k \in S_2} \|d_k^{LM}\| = 0.$

Then, from the definition of $d_k^{LM}$, we have

(2.30) $\lambda_k \to +\infty, \quad k \in S_2.$

Similarly to (2.25), there exists a positive constant $c_2$ such that

(2.31) $\|d_k^{MLM}\| \le c_2 \|d_k^{LM}\|$

holds for all sufficiently large $k \in S_2$. From (2.31), we obtain

(2.32) $\|s_k\| \le \|d_k^{LM}\| + \|d_k^{MLM}\| \le (1 + c_2) \|d_k^{LM}\|.$

So, we derive that

(2.33) $\sum_{k \in S_2} \|s_k\| \le \sum_{k \in S_2} \left(\|d_k^{LM}\| + \|d_k^{MLM}\|\right) < +\infty.$

Furthermore, it follows from (2.18) and (2.19) that

$$\sum_{k \in S_2} \|J_k^T F_k - J_{k+1}^T F_{k+1}\| < +\infty.$$

Since (2.22) holds for infinitely many $k$, there exists a large $\hat{k}$ such that $\|J_{\hat{k}}^T F_{\hat{k}}\| \ge \tau$ and

$$\sum_{k \in S_2,\, k \ge \hat{k}} \|J_k^T F_k - J_{k+1}^T F_{k+1}\| < \frac{\tau}{2}.$$

From (2.28) to (2.31), we can deduce that $\lim_{k \to \infty} x_k$ exists and

(2.34) $\|d_k^{LM}\| \to 0, \quad \|d_k^{MLM}\| \to 0.$

Therefore, we can obtain

(2.35) $\mu_k \to +\infty.$

In the same way as in Case I, we can also obtain

$$r_k \to 1.$$

Hence, there exists a positive constant $\bar{m} > m$ such that $\mu_k < \bar{m}$ holds for all sufficiently large $k$, which contradicts (2.35). The proof is completed.□

## 3 Convergence rate of Algorithm 2.1

In this section, we analyze the convergence rate of Algorithm 2.1 under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian. We assume that the sequence $\{x_k\}$ generated by the MLM method converges to the solution set $X^*$ of (1.1) and lies in some neighborhood of $x^* \in X^*$.

First, we will make the following assumption for studying the local convergence theory.

## Assumption 3.1

1. $F(x)$ provides a Hölderian local error bound of order $\gamma \in (0, 1]$ in some neighborhood of $x^* \in X^*$, i.e., there exist constants $c > 0$ and $0 < b < 1$ such that

(3.1) $c\,\mathrm{dist}(x, X^*) \le \|F(x)\|^\gamma, \quad \forall x \in N(x^*, b),$

where $N(x^*, b) = \{x \in \mathbb{R}^n \mid \|x - x^*\| \le b\}$.

2. $J(x)$ is Hölderian continuous of order $v \in (0, 1]$, i.e., there exists a positive constant $\kappa_{hj}$ such that

(3.2) $\|J(x) - J(y)\| \le \kappa_{hj} \|x - y\|^v, \quad \forall x, y \in N(x^*, b).$

Similar to (2.20), we have

(3.3) $\|F(y) - F(x) - J(x)(y - x)\| \le \dfrac{\kappa_{hj}}{1+v} \|y - x\|^{1+v}, \quad \forall x, y \in N(x^*, b).$

Moreover, there exists a constant κ b f > 0 such that

(3.4) $\|F(y) - F(x)\| \le \kappa_{bf} \|y - x\|, \quad \forall x, y \in N(x^*, b).$

In the following, we denote by $\bar{x}_k$ the vector in $X^*$ that satisfies

$$\|\bar{x}_k - x_k\| = \mathrm{dist}(x_k, X^*).$$

### 3.1 Properties of $d_k^{LM}$ and $d_k^{MLM}$

In this subsection, we investigate the relationships among $\|d_k^{LM}\|$, $\|d_k^{MLM}\|$, and $\mathrm{dist}(x_k, X^*)$.

Suppose the singular value decomposition (SVD) of $J(\bar{x}_k)$ is

$$\bar{J}_k = \bar{U}_k \bar{\Sigma}_k \bar{V}_k^T = (\bar{U}_{k,1}, \bar{U}_{k,2}) \begin{pmatrix} \bar{\Sigma}_{k,1} & \\ & 0 \end{pmatrix} \begin{pmatrix} \bar{V}_{k,1}^T \\ \bar{V}_{k,2}^T \end{pmatrix} = \bar{U}_{k,1} \bar{\Sigma}_{k,1} \bar{V}_{k,1}^T,$$

where $\bar{\Sigma}_{k,1} = \mathrm{diag}(\bar{\sigma}_{k,1}, \ldots, \bar{\sigma}_{k,r})$ with $\bar{\sigma}_{k,1} \ge \bar{\sigma}_{k,2} \ge \cdots \ge \bar{\sigma}_{k,r} > 0$. The corresponding SVD of $J_k$ is

$$J_k = U_k \Sigma_k V_k^T = (U_{k,1}, U_{k,2}, U_{k,3}) \begin{pmatrix} \Sigma_{k,1} & & \\ & \Sigma_{k,2} & \\ & & 0 \end{pmatrix} \begin{pmatrix} V_{k,1}^T \\ V_{k,2}^T \\ V_{k,3}^T \end{pmatrix} = U_{k,1} \Sigma_{k,1} V_{k,1}^T + U_{k,2} \Sigma_{k,2} V_{k,2}^T,$$

where $\Sigma_{k,1} = \mathrm{diag}(\sigma_{k,1}, \ldots, \sigma_{k,r})$ with $\sigma_{k,1} \ge \sigma_{k,2} \ge \cdots \ge \sigma_{k,r} > 0$, and $\Sigma_{k,2} = \mathrm{diag}(\sigma_{k,r+1}, \ldots, \sigma_{k,r+q})$ with $\sigma_{k,r} \ge \sigma_{k,r+1} \ge \cdots \ge \sigma_{k,r+q} > 0$. In the following, if the context is clear, we omit the subscript $k$ in $\Sigma_{k,i}$, $U_{k,i}$, and $V_{k,i}$ ($i = 1, 2, 3$) and write $J_k$ as

$$J_k = U_1 \Sigma_1 V_1^T + U_2 \Sigma_2 V_2^T.$$
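This block structure is what drives the later bound on $\|(J_k^T J_k + \lambda_k I)^{-1} J_k^T\|$: in the SVD basis this matrix has singular values $\sigma_i/(\sigma_i^2 + \lambda_k)$, so large singular values contribute at most $\|\Sigma_1^{-1}\|$ and small ones at most $\lambda_k^{-1} \|\Sigma_2\|$. A quick numerical check of this spectral identity (random test matrix, illustrative only):

```python
import numpy as np

# For J = U diag(sigma) V^T, the matrix (J^T J + lam*I)^{-1} J^T equals
# V diag(sigma_i / (sigma_i^2 + lam)) U^T, so its singular values are
# sigma_i / (sigma_i^2 + lam).
rng = np.random.default_rng(0)
J = rng.standard_normal((4, 4))
lam = 0.3

sigma = np.linalg.svd(J, compute_uv=False)
M = np.linalg.solve(J.T @ J + lam * np.eye(4), J.T)
lhs = np.linalg.svd(M, compute_uv=False)          # descending order
rhs = np.sort(sigma / (sigma**2 + lam))[::-1]     # sort: the map is not monotone
print(np.allclose(lhs, rhs))  # True
```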

### Lemma 3.2

Under the conditions of Assumption 3.1, if $x_k, y_k \in N(x^*, b/2)$, then there exists a constant $c_3 > 0$ such that

(3.5) $\|s_k\| \le c_3\, \mathrm{dist}(x_k, X^*)^{\min\left\{1,\; 1+v-\frac{\alpha}{2\gamma},\; (1+v)\left(1+v-\frac{\alpha}{2\gamma}\right)+v-\frac{\alpha}{\gamma}\right\}}$

holds for all sufficiently large k.

### Proof

Since $x_k \in N(x^*, b/2)$, we obtain

$$\|\bar{x}_k - x^*\| \le \|\bar{x}_k - x_k\| + \|x_k - x^*\| \le 2\|x_k - x^*\| \le b,$$

which implies that $\bar{x}_k \in N(x^*, b)$. From (3.1) and (2.17), we have

(3.6) $\lambda_k = \mu_k \|F_k\|^\alpha \ge m c^{\alpha/\gamma} \|\bar{x}_k - x_k\|^{\alpha/\gamma}.$

From (3.3), we can obtain

$$\|F_k + J_k(\bar{x}_k - x_k)\|^2 = \|F(\bar{x}_k) - F_k - J_k(\bar{x}_k - x_k)\|^2 \le \left(\frac{\kappa_{hj}}{1+v}\right)^2 \|\bar{x}_k - x_k\|^{2+2v}.$$

Since $d_k^{LM}$ is the minimizer of $\varphi_{k,1}(d)$, we have

(3.7)
$$\begin{aligned}
\|d_k^{LM}\|^2 &\le \frac{\varphi_{k,1}(d_k^{LM})}{\lambda_k} \le \frac{\varphi_{k,1}(\bar{x}_k - x_k)}{\lambda_k} = \frac{\|F_k + J_k(\bar{x}_k - x_k)\|^2 + \lambda_k \|\bar{x}_k - x_k\|^2}{\lambda_k} \\
&\le \frac{\kappa_{hj}^2}{m c^{\alpha/\gamma}(1+v)^2} \|\bar{x}_k - x_k\|^{2+2v-\alpha/\gamma} + \|\bar{x}_k - x_k\|^2 \le c_4^2 \|\bar{x}_k - x_k\|^{2\min\left\{1,\; 1+v-\frac{\alpha}{2\gamma}\right\}},
\end{aligned}$$

where $c_4 = \left(\dfrac{\kappa_{hj}^2}{m c^{\alpha/\gamma}(1+v)^2} + 1\right)^{1/2}$. Then

(3.8) $\|d_k^{LM}\| \le c_4 \|\bar{x}_k - x_k\|^{\min\left\{1,\; 1+v-\frac{\alpha}{2\gamma}\right\}}.$

It follows from (3.3) that

(3.9)
$$\begin{aligned}
\|d_k^{MLM}\| &= \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T F(y_k)\| \\
&\le \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T F_k\| + \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T J_k\| \|d_k^{LM}\| + \frac{\kappa_{hj}}{1+v} \|d_k^{LM}\|^{1+v} \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T\| \\
&\le 2\|d_k^{LM}\| + \frac{\kappa_{hj}}{1+v} \|d_k^{LM}\|^{1+v} \|(J_k^T J_k + \lambda_k I)^{-1} J_k^T\|.
\end{aligned}$$

Now, using the SVD of $J_k$, we can obtain

(3.10)
$$(J_k^T J_k + \lambda_k I)^{-1} J_k^T = (V_1, V_2, V_3) \begin{pmatrix} (\Sigma_1^2 + \lambda_k I)^{-1} \Sigma_1 & & \\ & (\Sigma_2^2 + \lambda_k I)^{-1} \Sigma_2 & \\ & & 0 \end{pmatrix} \begin{pmatrix} U_1^T \\ U_2^T \\ U_3^T \end{pmatrix},$$

so that

$$\|(\Sigma_1^2 + \lambda_k I)^{-1} \Sigma_1\| \le \|\Sigma_1^{-1}\|, \qquad \|(\Sigma_2^2 + \lambda_k I)^{-1} \Sigma_2\| \le \|\lambda_k^{-1} \Sigma_2\|.$$

By the theory of matrix perturbation [26], we have

$$\|\mathrm{diag}(\Sigma_1 - \bar{\Sigma}_1, \Sigma_2, 0)\| \le \|J_k - J(\bar{x}_k)\| \le \kappa_{hj} \|\bar{x}_k - x_k\|^v.$$

The above inequalities imply

(3.11) $\|\Sigma_1 - \bar{\Sigma}_1\| \le \kappa_{hj} \|\bar{x}_k - x_k\|^v, \qquad \|\Sigma_2\| \le \kappa_{hj} \|\bar{x}_k - x_k\|^v.$

Since $\{x_k\}$ converges to $x^*$, without loss of generality, we assume that $\kappa_{hj} \|\bar{x}_k - x_k\|^v \le \bar{\sigma}_r/2$ holds for all large $k$. From (3.11), we have

(3.12) $\|\Sigma_1^{-1}\| \le \dfrac{1}{\bar{\sigma}_r - \kappa_{hj} \|\bar{x}_k - x_k\|^v} \le \dfrac{2}{\bar{\sigma}_r}.$

From (3.6), we can derive

(3.13) $\|\lambda_k^{-1} \Sigma_2\| = \dfrac{\|\Sigma_2\|}{\mu_k \|F(x_k)\|^\alpha} \le \dfrac{\kappa_{hj}}{m c^{\alpha/\gamma}} \|\bar{x}_k - x_k\|^{v - \alpha/\gamma}.$

From (3.9), (3.10), (3.12), and (3.13), we have that there exist positive constants $c_5$ and $\bar{c}$ such that

(3.14) $\|d_k^{MLM}\| \le 2\|d_k^{LM}\| + c_5 \|d_k^{LM}\|^{1+v} \|\bar{x}_k - x_k\|^{v - \alpha/\gamma} \le \bar{c}\, \|\bar{x}_k - x_k\|^{\min\left\{1,\; 1+v-\frac{\alpha}{2\gamma},\; (1+v)\left(1+v-\frac{\alpha}{2\gamma}\right)+v-\frac{\alpha}{\gamma}\right\}}$

holds for all sufficiently large k . Therefore, we can obtain

(3.15) $\|s_k\| = \|d_k^{LM} + d_k^{MLM}\| \le \|d_k^{LM}\| + \|d_k^{MLM}\| \le c_3 \|\bar{x}_k - x_k\|^{\min\left\{1,\; 1+v-\frac{\alpha}{2\gamma},\; (1+v)\left(1+v-\frac{\alpha}{2\gamma}\right)+v-\frac{\alpha}{\gamma}\right\}},$

where c 3 is a positive constant. The proof is completed.□

The updating rule of $\mu_k$ indicates that $\mu_k$ is bounded below. Next, we show that $\mu_k$ is also bounded above.

### Lemma 3.3

Under the conditions of Assumption 3.1, if $x_k, y_k \in N(x^*, b/2)$ and

$$v > \max\left\{\frac{1}{\gamma} - 1,\; \frac{1}{\gamma(1+v) - \alpha/2} - 1,\; \frac{1-\gamma}{\gamma(1+v) - \alpha/2}\right\},$$

then there exists a constant M > m such that

(3.16) $\mu_k \le M$

holds for all sufficiently large k.

### Proof

First, we prove that, for all sufficiently large $k$,

(3.17) $\mathrm{Pred}_k \ge \breve{c}\, \|F_k\|\, \|d_k^{LM}\|^{\max\left\{\frac{1}{\gamma},\; \frac{1}{\gamma(1+v)-\alpha/2},\; \frac{1-\gamma}{\gamma(1+v)-\alpha/2}+1\right\}},$

where $\breve{c}$ is a positive constant.

We consider two cases:

Case 1: $\|\bar{x}_k - x_k\| \le \|d_k^{LM}\|$. It follows from (3.1), (3.3), (3.8), and $v > 1/\gamma - 1$ that

(3.18)
$$\begin{aligned}
\|F_k\| - \|F_k + J_k d_k^{LM}\| &\ge \|F_k\| - \|F_k + J_k(\bar{x}_k - x_k)\| \\
&\ge c^{1/\gamma} \|\bar{x}_k - x_k\|^{1/\gamma} - \frac{\kappa_{hj}}{1+v} \|\bar{x}_k - x_k\|^{1+v} \\
&\ge c_6 \|d_k^{LM}\|^{\max\left\{\frac{1}{\gamma},\; \frac{1}{\gamma(1+v)-\alpha/2}\right\}}
\end{aligned}$$

holds for some $c_6 > 0$.

Case 2: $\|\bar{x}_k - x_k\| > \|d_k^{LM}\|$. From (3.18), we can obtain

(3.19)
$$\begin{aligned}
\|F_k\| - \|F_k + J_k d_k^{LM}\| &\ge \|F_k\| - \left\| F_k + \frac{\|d_k^{LM}\|}{\|\bar{x}_k - x_k\|} J_k(\bar{x}_k - x_k) \right\| \\
&\ge \|F_k\| - \left(1 - \frac{\|d_k^{LM}\|}{\|\bar{x}_k - x_k\|}\right) \|F_k\| - \frac{\|d_k^{LM}\|}{\|\bar{x}_k - x_k\|} \|F_k + J_k(\bar{x}_k - x_k)\| \\
&= \frac{\|d_k^{LM}\|}{\|\bar{x}_k - x_k\|} \left( \|F_k\| - \|F_k + J_k(\bar{x}_k - x_k)\| \right) \\
&\ge c_7 \|d_k^{LM}\|\, \|\bar{x}_k - x_k\|^{1/\gamma - 1} \ge \breve{c}\, \|d_k^{LM}\|^{\max\left\{\frac{1}{\gamma},\; \frac{1-\gamma}{\gamma(1+v)-\alpha/2}+1\right\}}
\end{aligned}$$

holds for some $c_7, \breve{c} > 0$.

From (3.18) and (3.19), we have

(3.20)
$$\begin{aligned}
\|F_k\|^2 - \|F_k + J_k d_k^{LM}\|^2 &= \left( \|F_k\| + \|F_k + J_k d_k^{LM}\| \right) \left( \|F_k\| - \|F_k + J_k d_k^{LM}\| \right) \\
&\ge \|F_k\| \left( \|F_k\| - \|F_k + J_k d_k^{LM}\| \right) \ge \breve{c}\, \|F_k\|\, \|d_k^{LM}\|^{\max\left\{\frac{1}{\gamma},\; \frac{1}{\gamma(1+v)-\alpha/2},\; \frac{1-\gamma}{\gamma(1+v)-\alpha/2}+1\right\}}.
\end{aligned}$$

Since $d_k^{MLM}$ is a solution of (2.8), we know that $\|F(y_k)\|^2 - \|F(y_k) + J_k d_k^{MLM}\|^2 \ge 0$. Hence, we obtain

$$\begin{aligned}
\mathrm{Pred}_k &= \|F_k\|^2 - \|F_k + J_k d_k^{LM}\|^2 + \|F(y_k)\|^2 - \|F(y_k) + J_k d_k^{MLM}\|^2 \\
&\ge \|F_k\|^2 - \|F_k + J_k d_k^{LM}\|^2 \ge \breve{c}\, \|F_k\|\, \|d_k^{LM}\|^{\max\left\{\frac{1}{\gamma},\; \frac{1}{\gamma(1+v)-\alpha/2},\; \frac{1-\gamma}{\gamma(1+v)-\alpha/2}+1\right\}}.
\end{aligned}$$

It follows from (3.3), (3.8), and (3.17) that

$$\begin{aligned}
|r_k - 1| &= \left| \frac{\mathrm{Ared}_k - \mathrm{Pred}_k}{\mathrm{Pred}_k} \right| = \frac{\big| \|F(x_k + d_k^{LM} + d_k^{MLM})\|^2 - \|F_k + J_k d_k^{LM}\|^2 + \|F(y_k)\|^2 - \|F(y_k) + J_k d_k^{MLM}\|^2 \big|}{\mathrm{Pred}_k} \\
&\le \frac{\|F_k + J_k s_k\|\, O(\|d_k^{LM}\|^{1+v}) + O(\|d_k^{LM}\|^{2+2v}) + \|F_k + J_k d_k^{LM}\|\, O(\|d_k^{LM}\|^{1+v})}{\breve{c}\, \|F_k\|\, \|d_k^{LM}\|^{\max\left\{\frac{1}{\gamma},\; \frac{1}{\gamma(1+v)-\alpha/2},\; \frac{1-\gamma}{\gamma(1+v)-\alpha/2}+1\right\}}}.
\end{aligned}$$

In view of (3.4), (3.8), (3.9), and (3.14), we have

(3.21) $\|F_k + J_k d_k^{LM}\| \le \|F_k\|$

and

(3.22) $\|F_k + J_k s_k\| \le \|F_k + J_k d_k^{LM}\| + \|J_k d_k^{MLM}\| \le \|F_k\| + \kappa_{bf} \|d_k^{MLM}\| \le O\!\left(\|\bar{x}_k - x_k\|^{\min\left\{1,\; 1+v-\frac{\alpha}{2\gamma},\; (1+v)\left(1+v-\frac{\alpha}{2\gamma}\right)+v-\frac{\alpha}{\gamma}\right\}}\right).$

Since

$$v > \max\left\{\frac{1}{\gamma} - 1,\; \frac{1}{\gamma(1+v) - \alpha/2} - 1,\; \frac{1-\gamma}{\gamma(1+v) - \alpha/2}\right\},$$

we can obtain

$$r_k \to 1.$$

Therefore, there exists a positive constant $M > m$ such that $\mu_k \le M$ holds for all sufficiently large $k$. The proof is completed.□

Lemma 3.3 together with (3.4) indicates that the MLM parameter satisfies

(3.23) $\lambda_k = \mu_k \|F_k\|^\alpha \le M \kappa_{bf}^\alpha \|\bar{x}_k - x_k\|^\alpha.$

Hence, the MLM parameter is also bounded above.

### 3.2 Convergence rate of Algorithm 2.1

By the SVD of $J_k$