Accuracy of the typicality approach using Chebyshev polynomials

Trace estimators allow to approximate thermodynamic equilibrium observables with astonishing accuracy. A prominent representative is the finite-temperature Lanczos method (FTLM) which relies on a Krylov space expansion of the exponential describing the Boltzmann weights. Here we report investigations of an alternative approach which employs Chebyshev polynomials. This method turns out to be also very accurate in general, but shows systematic inaccuracies at low temperatures that can be traced back to an improper behavior of the approximated density of states with and without smoothing kernel. Applications to archetypical quantum spin systems are discussed as examples.


I. INTRODUCTION
The (numerically) exact evaluation of thermodynamic quantum equilibrium observables is restricted to small systems due to the exponential growth of the Hilbert space for systems with finite-size single-site Hilbert spaces such as Heisenberg or Hubbard models. For quantum systems with unrestricted single-site spaces the situation is even more severe. Only very few analytically solvable systems are known which creates a massive need for numerical (approximation) schemes. One rather successful means to approximate thermodynamic quantities rests on trace estimators which approximate a trace by an expectation value with respect to a random vector [1][2][3][4][5][6][7][8][9][10][11][12]. These schemes, sometimes also called typicality or (microcanonical) thermal pure quantum states [13][14][15][16], have been used very successfully in particular in the field of correlated electron systems, see e.g.  but also in quantum chemistry [38,39].
Despite this success, the authors of [8] suggest that an alternative approximation using an expansion of the density of states in terms of Chebyshev polynomials should be more accurate [8]. The major argument is that this expansion does not suffer from the loss of orthogonality during recursive state generation used in Krylov space methods. This property is certainly responsible for the high accuracy obtained in numerical unitary time evolution using a Chebyshev expansion, see e.g. [8,[46][47][48][49][50].
In the present paper, we therefore study several Heisenberg quantum spin systems and derive numerical as well as formal conclusions about the accuracy of the method. * jschnack@uni-bielefeld. de We can summarize that the approach via Chebyshev polynomials is indeed accurate, but not more accurate than FTLM [44]. On the contrary, under certain circumstances the employed kernel which smoothens (unphysical) oscillations of the approximated density of states introduces systematic inaccuracies. The same holds for the mapping of the energy spectrum onto the interval [−1 + ε/2, 1 − ε/2] to comply with the domain of definition of the polynomials.
The paper is organized as follows. In Section II we recapitulate the Chebyshev method. In Section III we present our numerical examples. The article closes with a discussion in Section IV.

II. METHOD
In this section, we briefly introduce the Chebyshev method and its parameters to be able to discuss the method's accuracy. For a more detailed description of the algorithm we recommend [8].
In a quantum mechanical system with a discrete energy spectrum, the microcanonical density of states is defined as The canonical partition function Z(β) is determined by the integral over the density of states weighted with the Boltzmann factor: with β = 1 k B T . Correspondingly, the heat capacity is evaluated as For the susceptibility we employ the S ∼ z -symmetry of Heisenberg systems and decompose the density into contributions from all orthogonal subspaces with total magnetic quantum number M , i.e.
We further calculate only contributions for M ≥ 0, since the respective contributions for negative M are degenerate and can be added accordingly.
The idea of the Chebyshev algorithm is to expand the microcanonical density of states ρ(E) in terms of Chebyshev polynomials and then approximate the integral (2) by Gauss-Chebyshev integration. We would like to state already at this stage that some accuracy problems shown later in this article arise if the approximated density of states does not behave like a proper density, e.g, if it becomes negative.
Since the Chebyshev polynomials are restricted to the interval [−1, 1], a variable transformation of the Chebyshev polynomials to arbitrary intervals must be introduced as in [51,Sec. 1.3.2]. The transformation of the Hamiltonian results in where, as suggested in [8], a parameter ε is introduced to prevent truncation of the approximated delta peaks corresponding to the extremal eigenvalues. The original energy interval is thus scaled to the interval [−1+ ε 2 , 1− ε 2 ]. The corresponding scaled density of states ρ(x) is then expanded in terms of Chebyshev polynomials C n (x): It can be shown that the coefficients of the expansion are given by the traces These traces are approximated using the typicality approach, i.e. with being a random vector with Gaussian distributed components r ν with respect to a chosen orthonormal basis { | ν }. The relative error of an estimate Θ n (R) is proportional to 1/ R dim(H) as shown in e.g. [8], where R is the number of random vectors and dim(H) the dimension of the Hilbert space. At this point it should be noted that these traces can also be evaluated to numerical accuracy using a complete basis. This possibility will be used to distinguish statistical and systematic deviations later on. Due to the finite order of the expansion, so-called Gibbs' oscillations can occur which cause the approximated density of the states to have negative values. If one wants to obtain a "physical" representation of the density, i.e. without negative values, one can modify the coefficients µ n by a kernel [8]. The kernel fixes this problem at the cost of introducing a systematic error which vanishes for N deg → ∞. In this paper, we restrict our discussion to the use of the Jackson kernel whose coefficients read In figure captions or legends we will write g n =JK when the kernel is applied, otherwise g n = 1.
For an arbitrary function f (x) the Gauss-Chebyshev integration gives rise to the approximation where the supporting points read This approximation is an exact identity if f (x) is a polynomial of order 2Ñ − 1 or smaller [51]. In the case at hand, f (x) has to be chosen as and is thus no polynomial of order smaller than 2Ñ − 1. However, the approximation through Gauss-Chebyshev integration is still a good choice for numerical purposes as it can be computed through a discrete cosine-transform (type III). Deploying the f (x) to Eq. (12) gives where the values of the weights γ k read If one choosesÑ ≥ N deg , the sum can be complemented to an upper limit ofÑ with additional g n = 0 terms. The γ k can then be computed through a discrete cosinetransform (type III) of the coefficients g n µ n which allows a faster computation of the sum. The time needed scales withÑ lnÑ instead ofÑ N deg [8].
If there are known symmetries the scheme can be performed for each orthogonal subspace H Γ separately. The approximated partition function can then be written as All of the following systems possess S ∼ z symmetry which implies orthonormal subspaces corresponding to the quantum number of the total magnetization Γ = M . In our numerical examples, subspaces with dimension D < 15, 000 are fully diagonalized for larger systems. For smaller systems, i.e. those containing only subspaces of dimension D < 15, 000, only subspaces with dimension D < 1, 000 are fully diagonalized. In a real application one would of course diagonalize all subspaces numerically exactly where this is possible.
To summarize, the Chebyshev method depends on several parameters that can have an effect on the accuracy of the results. These are the order of the expansion N deg , the scaling parameter ε, the number of random vectors R, the use of a kernel g n , and the number of supporting pointsÑ .

III. NUMERICAL RESULTS
The Chebyshev algorithm uses random vectors for trace estimation, compare Eq. (9), therefore the results are expected to exhibit a statistical distribution. To assess this statistical behavior, we perform two kinds of studies: (A) We investigate a thermodynamic observable as a function of the number R of random vectors used for the trace estimator (9), and (B) we study the variance among P realizations per fixed parameter set (for some of the cases presented below).
By considering each realization O i (β) as a random measurement, a mean O(β) and a variance δO(β) 2 can be defined: If additionally an exact result O E is known, the systematic deviation Figure 1. Systems investigated in sec. III A, from top to bottom: ladder, chain, sawtooth chain. Periodic boundary conditions will be applied.
As specific quantum spin systems we investigate three archetypical systems that show fundamentally different behavior at low temperatures, namely a spin ladder that is gapped in the thermodynamic limit [53], a spin chain that is gapless in the thermodynamic limit [54], and a sawtooth chain in the vicinity of a quantum-critical point [55][56][57], compare Fig. 1.

A. Heisenberg ladder
In this subsection, the accuracy of the Chebyshev algorithm is investigated using a Heisenberg ladder for various numbers of spins N with spin quantum number s = 1/2 and periodic boundary conditions. The Hamiltonian reads where the first subscript i ∈ {1, . . . , N/2} of the spin operators denotes the rung and the second subscript j ∈ {1, 2} denotes the leg of the spin. Thus, the exchange interaction J 1 connects nearest neighbor spins on rungs, J 2 does the same on legs. Both are chosen to be antiferromagnetic, J 1 = J 2 = 1. standard configuration of parameters used in the following. The order of expansion N deg and the number of random states R are chosen for low computation times and sufficiently accurate results. Their influences on the accuracy are discussed in Sec. III A 2 a and Sec. III A 1.
As the parameter ε is introduced as a corrective variable, its influence will be shown separately in Sec. III A 2 b and is omitted for the time being. The same argument holds for the kernel g n , shown in Sec. III A 2 d. To make use of the discrete cosine-transform (type III), the number of points of integrationÑ has to be greater than or equal to N deg . The equality is chosen as a starting point.

Statistical deviations
Since the approximation of the traces by using random vectors is the cause of the statistical variations in the result, the number of random vectors R is varied to investigate the latter. The values given in Tab. I are used as a standard configuration. In Fig. 2 and 3, the heat capacity and susceptibility for various values of R are plotted next to the result determined by exact diagonalization. In addition, a result determined by the Chebyshev algorithm is shown where the traces µ n are computed numerically exactly instead of approximating them using random vectors.
One can see that both the heat capacity (Fig. 2) as well as the differential susceptibility (Fig. 3) match the respective curve derived from exact diagonalization very well for T /|J 1 | > 10 −2 . However, for R = 10 a noticeable deviation in the maximum of the main peak of both observables can be seen. Additionally, all curves of the heat capacity show a "ghost dip" at low temperatures for T /|J i | ≈ 10 −3 − 10 −2 . Nevertheless, for most purposes the achieved accuracy for the standard parameter configuration and R > 100 is more than sufficient at higher temperatures.
It is noticeable that in the region of the "ghost dip", all approximate curves deviate from the exact solution independently of R, suggesting a small statistical but significant systematic error. This is confirmed by the fact that the curve determined with numerically exact traces shows this deviation as well.

Systematic deviations
Next, we discuss how tuning the parameters N deg , ε,Ñ , and g n affects systematic deviations. This is mostly done by observing the behavior of the"ghost dip" of the heat capacity under variation of each parameter.
a. The order of expansion can be increased to push the systematic deviations, i.e. the "ghost dip", to lower temperatures. This is demonstrated in Fig. 4. Computation time increases linearly with N deg .
b. The scaling parameter ε seems to have a negative rather than the intended positive effect on the heat capacity as shown in Fig. 5. The best result is achieved for ε = 0. Conversely, when considering the results for the density of states of the largest subspace with M = 0 (see Fig. 6), one can see that the parameter ε has the intended effect to prevent the peaks of the lowest and highest eigenvalues to be cut off. There are also situations where a ε ≈ 10 −6 seemingly decreases the depth of the dip compared to ε = 0. Anyhow, such small values of ε are not sufficient to prevent the cut-off of the density of states, and the improvement was not significant when compared to statistical deviations.
c. The number of supporting pointsÑ , is investigated in Fig. 7. It is difficult to give universal recommendations regarding this parameter. Our experience is   that it should be chosen equal to the order of expansion N deg . Other systems could show very different results, but it is important to note that a good choice ofÑ scales with N deg . d. The smoothing with the Jackson kernel causes the unphysical ghost dip to become a ghost peak, see     more easily identified as an error than a peak. A good approach could be to always compare both the smoothened and the native result. This can be done without great computational effort.  For larger systems a kernel can have an even stronger negative effect as is demonstrated in Fig. 9 for N = 24 and s = 1/2. The Jackson kernel is significantly setting back the convergence of the expansion. For an order of N deg = 100, which produces very accurate approximations without kernel, the application of the kernel renders the result to become unusable, compare top of Fig. 9. One needs to expand the polynomial to an order of N deg = 500 to counteract the inaccuracy introduced by the kernel, but even then the result with kernel is still not significantly better than the result without the kernel. The result without the kernel seems already sufficiently accurate for k B T /|J 1 | < 10 −2 and N deg = 100.

B. Heisenberg ring
Since the Heisenberg ladder is a gapped system, i.e a spin system with a non-zero excitation energy between the groundstate and the first excited state in the thermodynamic limit, we would also like to investigate a system that is gapless in the thermodynamic limit. The behavior of thermodynamic functions of the system at low temperatures highly depends on this excitation energy. One could argue that the deviation shown in the previous section are due to this dependency. Hence, in this section we will discuss an antiferromagnetic Heisenberg ring with s = 1/2 for which the excitation energy vanishes in the thermodynamic limit.
In Fig. 10 the results for the Heisenberg ring with N = 24 spins is displayed. One can see that the deviations here are very similar to the ones for the Heisenberg ladder with the same system size and choice of parameters, compare Fig. 9, even though here, they occur at slightly lower temperatures.
To further investigate the influence of the gap's size on the deviations in the results the curves for the heat capacity per site for various numbers of spins are displayed in Fig. 11. While the differences of the results between k B T /|J| = 2 · 10 −2 and 2 · 10 −1 are mostly due to finite-size effects, so even the exact curves would deviate from each other, the differences between k B T /|J| = 10 −3 and 10 −2 on the other hand are due to the method's inability to reproduce the low temperature behavior of the Heisenberg ring for various system sizes. However, there is no definite trend of better results for larger systems recognizable, see the lower graph in Fig. 11. Thus, the inaccuracies of the results do not directly depend on the gap size. The antiferromagnetic Heisenberg ring with N = 10, s = 5/2 and nearest neighbor interaction is an interesting example as well, as this system is realized as a magnetic molecule (abbreviated Fe 10 ) called the "ferric wheel" [44] that can be accurately described by this model. In Fig. 12 P = 100 estimates with the Chebyshev method using R = 1 random vectors and their mean are compared to an estimate using R = 100 random vectors. They are displayed with and without kernel.
One can see that without kernel the estimates with R = 1 are broadly scattered at low temperatures while their mean and the estimate with R = 100 random vectors are almost perfectly aligned with the exact result. When employing the kernel the estimates with R = 1 are distributed less broadly but their mean and the estimate with R = 100 deviate strongly from the result of the exact diagonalization. From the experience collected before, we assume that these deviations can be resolved by using higher orders of expansion N deg , see Fig. 13 with a linear temperature axis. Note that even for N deg = 200 the result without kernel outperforms the one with kernel. Further more, the result with N deg = 100 without kernel is more accurate than the result with N deg = 200 and kernel. However, because of the narrower distribution of the estimates when using the kernel we want to investigate the statistical behavior of the results obtained with a higher order of expansions, see Fig. 14 for the results for N deg = 200. One can see that the narrowing of the R = 1 estimates by using the kernel is less significant than in the N deg = 100 case but still non-negligible especially in the low temperature regime. This can be seen in the deviation of mean from the exact result which is greater in the case without kernel. But again the deviation is still there for the same and even slightly higher temperatures. The best result is obtained with the R = 100 estimates without kernel. In this case an order of expansion of N deg = 100 is sufficient.
Also for the differential susceptibility (not shown) we obtain that the mean of the R = 1 estimates deviates from the exact diagonalization result without kernel more strongly than with kernel. The R = 100 estimate on the other hand is almost accurate without kernel but deviates as strongly as the mean in the case with kernel.

C. Sawtooth chain
The sawtooth chain (also known as delta chain) is an example with a highly degenerate spectrum. The Hamiltonian reads with periodic boundary conditions, ferromagnetic nearest neighbor interaction J 1 < 0 and antiferromagnetic next-nearest neighbor interaction J 2 > 0. We select a case with |J 2 /J 1 | = 0.45 which is close to the quantum critical point (QCP) at |J 2 /J 1 | = 1/2 [55]. The typicality approach has shown to be very efficient for this systems in schemes such as FTLM [44]. This can be confirmed for the Chebyshev method as well. In Fig. 15 the R = 1 estimates of the heat capacity are distributed very narrowly around their mean which itself is perfectly aligned with R = 100. While the result without kernel is also aligned with the FTLM estimate, the result with kernel shows significant deviations from the FTLM result. We again show another result for a higher order of expansion N deg = 500, the lower graph in Fig. 16. The deviation due to the kernel can be minimized, but still does not fall below those of the results without kernel.

IV. DISCUSSION AND CONCLUSIONS
We have seen that the Chebyshev method achieves very accurate results when handled with care. There are many possible choices for the parameters introduced for this method. We tried to identify some "good" choices and some methods to optimize them.
In particular we found that the number of points of integrationÑ should be chosen closely to the order of expansion which itself has to be chosen according to the dimension of the problem. In the cases investigated, N deg = 100 − 200 is a sufficient choice as well as R ≥ 100. The parameter ε had no positive effect at least not when trying to approximate thermodynamic functions. So it seems advisable to set it equal to zero.
The most interesting "parameter" was whether to smooth the result with the Jackson kernel or not. We found here as well that for the investigated systems there was no positive effect. However, there might be a different use of the kernel. When the expansion of the density of states is completed the kernel can be employed without great computational effort to check the approximation with and without kernel for differences. For good results with a high order of the expansion the kernel changes the results only where they were already wrong, e.g. where heat capacity is negative. Therefore, if the kernel does not change the result too much one can be reasonably sure that the choice of the order of expansion is suffi- cient.
Finally, we can summarize that the approach via Chebyshev polynomials is accurate, but does not show any advantage compared to FTLM [44].