Fast maximum likelihood estimation of parameters for square root and Bessel processes

Abstract: Explicit formulae for maximum likelihood estimates of the parameters of square root processes and Bessel processes, together with first and second order approximate sufficient statistics, are supplied. Applications of the estimation formulae to simulated interest rate and index time series demonstrate the accuracy of the approximations and the extreme speed-up in estimation time. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as in hedging interest rate, index and volatility derivatives based on such models, as well as in modelling credit risk, mortality rates, population size and voting behaviour.


Introduction
Fast and accurate statistical methods in finance and other areas of applied probability are becoming increasingly important due to the enormous amounts of data that have to be analysed. We have extremely fast and accurate methods for modelling Gaussian dynamics, where in a continuous-time setting linearly transformed Brownian motions drive model dynamics, for example, as for the logarithm of an asset price under the Black-Scholes model (see Black and Scholes (1973)) or for the short term interest rate under the Vasicek model. This is not yet the case for next-generation models that are based on Bessel processes and square root processes, such as the CIR model in finance. In population size modelling (see e.g. Feller (1971)) the square root process or the squared Bessel process emerges naturally as a suitable model. Similarly, voting behaviour, like population size, seems to be well modelled by a squared Bessel process; see Karlin and Taylor (1981). Also, in finance the minimal market model (see Platen (1997)) emerges naturally as a suitable model. Interest has also grown in the 3/2 stochastic volatility model (see Platen (1997) and Heston (1997)), which appears to be an excellent volatility model; see Goard and Mazur (2013). Furthermore, square root processes have many relevant applications in credit risk modelling (see Bielecki, Jeanblanc, and Rutkowski (2011) for an application to intensity-based models) and in stochastic mortality modelling (see Biffis (2005) and Blackburn and Sherris (2013)).
Various methods have been proposed to estimate the parameters of a model that fits a given time series of observed short rates. For example, in Poulsen (1999) a lognormal approximation is employed. In Fergusson and Platen (2015) the parameters of the CIR short rate model were estimated by maximum likelihood, using the Newton-Raphson iterative method applied to both the log-likelihood function and an approximate log-likelihood function based on the lognormal distribution. In Fergusson and Platen (2015) the parameters of the 3/2 short rate model were also estimated using a gradient ascent method. The asymptotic properties of the Heston model are studied in Barczy, Pap, and Szabo (2016), where the volatility process is modelled by the square root model in (3). Their estimates, based on discrete-time observations, are derived from conditional least squares estimates of modified parameters, assuming that the diffusion parameter σ is known. On the other hand, in Dokuchaev (2017) estimators of σ are supplied that do not require knowledge of the drift parameters. In Overbeck and Rydén (1997), estimators of the parameters of (3), employing observations at equispaced times and based on conditional least squares, are studied. Another approach, involving quasi-maximum likelihood estimation via the Wagner-Platen approximation, is introduced in Huang (2013). However, in the current paper we supply closed-form approximate formulae for the maximum likelihood estimates of the parameters of the square root process and, consequently, of transformations of it, including squared Bessel processes and Bessel processes.
In Section 2 we describe the probabilistic framework for our stochastic processes. In Section 3 we state theorems for closed-form approximate formulae for the parameters r, κ and σ of (3) based on second order and first order approximations to the log-likelihood function. A corollary is supplied which provides estimates of (5). In Section 4 we demonstrate applications of the theorem to simulated data sets of squared Bessel processes and to an actual short rate data set, exemplifying the accuracy of the formulae. Finally, in Section 5 we conclude with some comments.

Probabilistic framework
A Bessel process R = (R_t)_{t≥0} is specified by the stochastic differential equation (SDE)
dR_t = (α/R_t + β R_t) dt + γ dZ_t, (1)
where Z = (Z_t)_{t≥0} is a Wiener process adapted to a filtered probability space (Ω, ℱ, (ℱ_t)_{t≥0}, P), (ℱ_t)_{t≥0} is a family of sigma algebras, R_0 = x is the initial value and α, β and γ are constants. The case when these real-valued coefficients are Markov chains has been considered in Elliott and Platen (2001), but we confine ourselves to the case when they are constants.
Letting r_t = R_t² and using the Itô formula, we have the SDE for the squared Bessel process, otherwise termed a square root process,
dr_t = (2α + γ² + 2β r_t) dt + 2γ √(r_t) dZ_t, (2)
where r_0 = x² > 0 is the positive initial value. A common application of the square root process in finance is modelling the short rate, which is the continuously compounded rate of interest paid on a cash deposit for an infinitesimally small unit of time. In reality, the short rate is represented by short-term rates such as the overnight cash rate, which is targeted by central banks, or money market rates having terms up to 12 months. The Cox-Ingersoll-Ross (CIR) model of the short rate, given in Cox, Ingersoll, and Ross (1985), is specified by the SDE
dr_t = κ(r − r_t) dt + σ √(r_t) dZ_t, (3)
where Z = (Z_t)_{t≥0} is as before and r, κ and σ are positive constants. It is readily seen that (3) is the particular case of the squared Bessel process in (2) with
α = (κr − σ²/4)/2, β = −κ/2, γ = σ/2. (4)
A zero-valued short rate is avoided because we set κr > σ²/2 > 0. The parameter κ denotes the speed of reversion of the short rate r_t to the mean reverting level r. The r can be thought of as a smoothed average short rate which is targeted by the central bank.
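The transition law of (3) lends itself to exact simulation: over a step of length Δ, r_{t+Δ} given r_t is a scaled non-central chi-squared random variable. The following sketch illustrates this (a minimal illustration using SciPy, not the paper's code; the parameter values are placeholders in the spirit of the simulation study later in the paper).

```python
import numpy as np
from scipy.stats import ncx2

def simulate_cir(r0, rbar, kappa, sigma, delta, n, seed=0):
    """Simulate a CIR path r_{t_0},...,r_{t_n} using the exact
    scaled non-central chi-squared transition law."""
    rng = np.random.default_rng(seed)
    # scale factor c and degrees of freedom nu of the transition
    c = sigma**2 * (1.0 - np.exp(-kappa * delta)) / (4.0 * kappa)
    nu = 4.0 * kappa * rbar / sigma**2
    r = np.empty(n + 1)
    r[0] = r0
    for i in range(1, n + 1):
        lam = r[i - 1] * np.exp(-kappa * delta) / c  # non-centrality
        r[i] = c * ncx2.rvs(nu, lam, random_state=rng)
    return r

path = simulate_cir(r0=0.05, rbar=0.0427, kappa=0.095, sigma=0.06439,
                    delta=1.0 / 365.0, n=1000)
```

Because ν = 4κr/σ² ≈ 3.9 > 2 for these values, the simulated rate stays strictly positive.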
Related to the CIR short rate model is the 3/2 short rate model, derived in Platen (1997, 1999) and subsequently studied in Ahn and Gao (1999), specified by the SDE
dr_t = (p r_t + q r_t²) dt + σ r_t^{3/2} dZ_t, (5)
where p > 0, q and σ > 0 are constants satisfying q < σ²/2 to avoid explosive values of r_t. The SDE for the reciprocal of the 3/2 short rate is that of the CIR short rate model and, therefore, maximum likelihood estimation of the parameters of the 3/2 model is equivalent to that for the CIR model. Under the above conditions on the parameters, both the CIR model and the 3/2 model have the property that interest rates can never be negative.
The transition density function of the short rate process in (3) is that of a scaled non-central chi-squared distribution, given for times s and t with 0 ≤ s < t in (6), where φ_t = φ_0 + (σ²/(4κ))(e^{κt} − 1) and ν = 4κr/σ², and where (7) is the power series expansion of the modified Bessel function of the first kind.
Restated, for t > s the conditional random variable r_t e^{κt}/(φ_t − φ_s) given r_s has a non-central chi-squared distribution with ν = 4κr/σ² degrees of freedom and non-centrality parameter λ = r_s e^{κs}/(φ_t − φ_s). Let t_0, t_1, t_2, …, t_n be n + 1 equispaced ordered times at which the short rate is observed, where the spacing is Δ = t_i − t_{i−1}, for i = 1, 2, …, n. We assume that the observed short rates are all positive. The log-likelihood function which we seek to maximise with respect to the parameters r, κ and σ is the sum of the logarithms of the transition densities over successive pairs of observations, where the transition density function p_r is as in (6).
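Since each transition density in (6) is a scaled non-central chi-squared density, this log-likelihood can be evaluated directly with library routines. A sketch (assuming SciPy; this is the exact log-likelihood, not the closed-form approximations developed below):

```python
import numpy as np
from scipy.stats import ncx2

def cir_loglik(r, rbar, kappa, sigma, delta):
    """Exact CIR log-likelihood of equispaced observations r[0..n]:
    the sum of log transition densities, using the change of variables
    r_{t_i} / c | r_{t_{i-1}} ~ chi-squared(nu, lam_i) with scale c."""
    r = np.asarray(r, dtype=float)
    c = sigma**2 * (1.0 - np.exp(-kappa * delta)) / (4.0 * kappa)
    nu = 4.0 * kappa * rbar / sigma**2
    lam = r[:-1] * np.exp(-kappa * delta) / c      # non-centrality per step
    # p(r_t | r_s) = ncx2.pdf(r_t / c, nu, lam) / c, hence the -log(c) term
    return float(np.sum(ncx2.logpdf(r[1:] / c, nu, lam) - np.log(c)))
```

As a sanity check, the one-step transition density built this way integrates to one over a fine grid of values of r_t.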

Main theorem and corollary
Let us first state and prove the main theorem of this paper, which concerns parameter estimates of the square root process based on a second order approximation to the log-likelihood function.
Before doing this, we motivate our definition of an approximately sufficient statistic by recalling the following definition of a sufficient statistic.
Definition 1: In respect of sampled data r_{t_0}, r_{t_1}, …, r_{t_n} from the stochastic process r = (r_t)_{t≥0}, having parameter θ ∈ R^d, a statistic T = T(r_{t_0}, r_{t_1}, …, r_{t_n}) ∈ R^k is said to be sufficient if knowledge of the value of T allows us to specify the likelihood function L(θ) determined by r_{t_0}, r_{t_1}, …, r_{t_n}, up to a constant of proportionality which does not depend on the parameter θ.
Remark 1: We see that a sufficient statistic T constitutes a sufficient reduction of the data for the purpose of calculating the maximum likelihood estimates. Indeed, this is a consequence of the Fisher-Neyman factorisation criterion, which characterises a sufficient statistic for a probability distribution having density f_θ(x) as one for which f_θ(x) factorises as f_θ(x) = g_θ(T) h(x_0, …, x_n), where g_θ is a function of T depending on θ, and h is a function independent of θ.
In our formulation of the main theorem we require a definition of an approximate sufficient statistic, where we are interested in the degree to which a stochastic model should be modified to make the statistic sufficient; see McCullagh (1984) for a discussion of local sufficiency or Chapter 5 of Le Cam (1986) for a discussion of approximate sufficiency, for example.
Definition 2: In respect of equispaced sampled data r_{t_0}, r_{t_1}, …, r_{t_n}, with time step Δ = t_1 − t_0, from the stochastic process r = (r_t)_{t≥0}, having parameter θ ∈ R^d, a statistic T = T(r_{t_0}, r_{t_1}, …, r_{t_n}) ∈ R^m, for some m ∈ Z^+, is said to be approximately sufficient of order k if knowledge of the value of T allows us to specify the θ-dependent part of the likelihood function determined by r_{t_0}, r_{t_1}, …, r_{t_n} up to a tolerance O(nΔ^{k+1}), as Δ → 0. Mathematically speaking, there exist a function g_θ(T), which depends on the statistic T and parameter θ, a function h(r_{t_0}, r_{t_1}, …, r_{t_n}), which is independent of θ, and positive constants K_0 and K_1 such that for all Δ ∈ [0, K_1],
|log L(r_{t_0}, r_{t_1}, …, r_{t_n}; θ) − log g_θ(T(r_{t_0}, r_{t_1}, …, r_{t_n})) − log h(r_{t_0}, r_{t_1}, …, r_{t_n})| ≤ K_0 n Δ^{k+1}.
Theorem 1: Given the observations of the square root process in (10), the maximum likelihood estimates of the parameters r, κ, σ of the model specified in (3) are approximately given, as Δ → 0, by closed-form expressions in the second order approximate sufficient statistics L and R_i for i ∈ {1, 2, 3, 5}, subject to Condition A. Also, the Fisher information matrix is approximately given by (22). Finally, asymptotic normality holds as n → ∞, as in (26). Proof: See Appendix A for a proof.
The following theorem supplies parameter estimates of the square root process using a first order approximation to the log-likelihood function.
Theorem 2: Given the observations of the short rate in (10), the maximum likelihood estimates of the parameters r, κ, σ of the CIR model specified in (3) are approximately given, as Δ → 0, by closed-form expressions in the first order approximate sufficient statistics L and R_i for i ∈ {1, 2, 3} given in (14)-(18), subject to the corresponding condition. Also, the Fisher information matrix is given approximately in (22). Finally, asymptotic normality, as n → ∞, is as in (26).
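The explicit formulae referred to in Theorems 1 and 2 live in equations not reproduced in this excerpt. To illustrate the style of closed-form, iteration-free estimation they enable, the following sketch uses the classical conditional least squares estimators for the CIR parameters — a well-known alternative (cf. Overbeck and Rydén (1997)), not the approximate MLE formulae of the theorems — exploiting E[r_{t_i} | r_{t_{i−1}}] = r(1 − e^{−κΔ}) + e^{−κΔ} r_{t_{i−1}}.

```python
import numpy as np

def simulate_cir_euler(r0, rbar, kappa, sigma, delta, n, seed=1):
    """Euler-Maruyama simulation of (3), with full truncation so the
    square root argument stays non-negative."""
    rng = np.random.default_rng(seed)
    r = np.empty(n + 1)
    r[0] = r0
    for i in range(n):
        rp = max(r[i], 0.0)
        r[i + 1] = (r[i] + kappa * (rbar - rp) * delta
                    + sigma * np.sqrt(rp * delta) * rng.standard_normal())
    return np.maximum(r, 1e-12)

def cir_cls_estimates(r, delta):
    """Conditional least squares estimates of (rbar, kappa, sigma):
    regress r_i on r_{i-1}, since E[r_i | r_{i-1}] = a + b*r_{i-1}
    with b = exp(-kappa*delta) and a = rbar*(1 - b); then estimate
    sigma from Euler-type normalised squared residuals."""
    x, y = r[:-1], r[1:]
    b, a = np.polyfit(x, y, 1)               # slope, intercept
    kappa = -np.log(b) / delta
    rbar = a / (1.0 - b)
    resid = y - x - kappa * (rbar - x) * delta
    sigma = np.sqrt(np.mean(resid**2 / (x * delta)))
    return rbar, kappa, sigma

# recover known parameters from a long simulated daily series
true_rbar, true_kappa, true_sigma = 0.05, 2.0, 0.2
series = simulate_cir_euler(0.05, true_rbar, true_kappa, true_sigma,
                            1.0 / 365.0, 30000)
rbar_hat, kappa_hat, sigma_hat = cir_cls_estimates(series, 1.0 / 365.0)
```

Consistent with the simulation study of Section 4, σ is recovered very accurately from such a long daily series, r quite accurately, and κ only roughly.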
The following corollary to Theorem 1 is straightforwardly established.
Corollary 1: Given the observations of the short rate in (10), the maximum likelihood estimates of the parameters p, q and σ of the 3/2 short rate model given in (5) are p = κ, q = σ² − rκ and σ itself, where the formulae for r, κ and σ, based upon the reciprocals of the short rates, are supplied in Theorem 1.
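Corollary 1 can be implemented mechanically: fit a CIR model to the reciprocal series and map the estimates via p = κ and q = σ² − rκ, with σ unchanged. A sketch (again using conditional least squares on the reciprocals as an illustrative stand-in for the closed-form formulae of Theorem 1):

```python
import numpy as np

def fit_three_halves(r, delta):
    """Estimate (p, q, sigma) of the 3/2 model in (5) by fitting a CIR
    model to x = 1/r and mapping p = kappa, q = sigma^2 - rbar*kappa."""
    x = 1.0 / np.asarray(r, dtype=float)
    b, a = np.polyfit(x[:-1], x[1:], 1)      # x_i = a + b*x_{i-1} + noise
    kappa = -np.log(b) / delta
    xbar = a / (1.0 - b)
    resid = x[1:] - x[:-1] - kappa * (xbar - x[:-1]) * delta
    sigma = np.sqrt(np.mean(resid**2 / (x[:-1] * delta)))
    return kappa, sigma**2 - xbar * kappa, sigma

# sanity check on data whose reciprocal is a simulated CIR path
rng = np.random.default_rng(7)
delta, n = 1.0 / 365.0, 30000
x = np.empty(n + 1)
x[0] = 20.0
for i in range(n):                           # Euler scheme for 1/r
    xp = max(x[i], 0.0)
    x[i + 1] = (x[i] + 2.0 * (20.0 - xp) * delta
                + 3.0 * np.sqrt(xp * delta) * rng.standard_normal())
p_hat, q_hat, sigma_hat = fit_three_halves(1.0 / x, delta)
# true values: p = 2, q = 3^2 - 20*2 = -31, sigma = 3
```

As in the CIR case, σ and p are recovered far more precisely than q, whose error inherits the errors of both κ and the mean reversion level.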

Applications
In this section we supply various aspects of the estimation methods. Firstly, we check the ability of the estimated parameters from simulated data to recover the actual parameter values, for varying lengths of the series. Secondly, we fit the models to US interest rate data. Thirdly, we compare the run times with the Newton-Raphson method.

Estimation from simulated data
Here we simulate data sets from each of the squared Bessel, 3/2 and Bessel processes, and estimate their parameters over varying window lengths and time steps Δ.

CIR model
Commencing with r_0 = 0.05, we simulate with a daily time step, Δ = 1/365, the CIR model having parameter values r = 0.0427, κ = 0.095 and σ = 0.06439, which correspond closely to those estimated for the CIR model in Table 1. We compare estimation of parameters over varying window lengths, from one quarter to 150 years, and time steps Δ ∈ {1/4, 1/12, 1/52, 1/365}. The evolutions of estimates of the parameters r, κ, σ and ν = 4rκ/σ² are depicted respectively in Figures 1-4, varying the time steps. We maintain shorter time horizons in these figures to allow the differences in estimated parameter values among the various time steps to be shown clearly. Because the effects of varying the time steps on estimated values become less pronounced over the long term, making them almost indistinguishable, in Figure 5 we show only the estimates of the parameters using Δ = 1/365. This allows for a comparison of the behaviour of estimated values of different parameters over various window lengths up to 150 years.
Table 1: Parameter estimates of the CIR, 3/2 and Bessel process short rate models fitted to monthly US short rates via several methods. (Columns: Model, Method, Log-likelihood, Parameter estimates; the models include the Vasicek model.)

In Figure 1 we show the effect of Δ on the estimate of r. For Δ = 1/52 and Δ = 1/365 the estimates are almost indistinguishable. For Δ = 1/4 the estimates with window lengths of less than 3 years are wildly unstable, which is to be expected when using so few data points. It is not until around 3 years, corresponding to 12 data points, that we see some stability. For Δ = 1/12 the estimates of r are unstable over window lengths of less than one year, corresponding to fewer than 12 data points. We see from Figure 5a that window lengths of many years are required to estimate the true value of r. In fact, after 25 years the estimates remain within about 50% of the true value, and accurate estimates of r are achieved for windows around 100 years in length.
In Figure 2 we show the effect of Δ on the estimate of κ. Beyond window lengths of about 10 years, all four time steps give rise to similar estimated parameter values. However, for window lengths of less than 10 years, all time steps fail to give good estimates of κ. We see from Figure 5b that the 95% confidence intervals contain the true value of κ for window lengths of at least 10 years. Also, it is not until around 25 years that we see estimates of κ being within about 80% of the true value, and not until around 85 years that estimates are within roughly 50% of the true value.
In Figure 3 we show the effect of Δ on the estimate of σ. For Δ = 1/4, the estimates stabilise after two and a half years, corresponding to 10 or more data points. For Δ = 1/12, estimates stabilise after one year, namely for 12 or more data points. For Δ = 1/52, estimates stabilise after half a year, which is for 26 or more data points. Finally, for Δ = 1/365, the estimates are stable over windows at least as long as one quarter of a year. We see in Figure 5c that estimates of σ are within 0.001 of the true value over horizons beyond 10 years and are almost exact to four decimal places after 50 years.
In Figure 4 we show the effect of Δ on the estimate of ν = 4rκ/σ², which is the degrees of freedom parameter and which is crucial in ascertaining whether the process avoids hitting zero, namely when ν ≥ 2. For all Δ ∈ {1/4, 1/12, 1/52, 1/365}, the estimates are unstable for windows shorter than two and a half to three years, after which the estimates provide similar values. In Figure 5d we see that the estimates do not appear to converge to the true value until around 30 years. Yet, even for windows longer than 30 years the estimates hover above the true value by 1 to 1.25, which is roughly 25% of the true value.
In conclusion, σ and r are the only parameters which can be accurately estimated over windows shorter than 150 years. With daily sampling of the process, good estimates of σ can be obtained in about half a year and, with weekly sampling, good estimates are obtainable in about two years. For windows longer than 10 years, quarterly, monthly, weekly and daily sampling give rise to estimates of similar accuracy. With such accurate estimates of σ, we see from the relation σ² = 4rκ/ν that inaccuracies in any one of the parameters r, κ and ν will result to some extent in inaccuracies in the other two. This is exemplified in Figures 1, 2 and 4 and, over the longer term, in Figure 5. In practical applications, such as when modelling interest rates, the choice of Δ will depend on the window length, that is, the length in years of the data series, all other variables being equal. Assuming interest rate cycles of around 20-40 years, having 50-100 years of data generally allows sufficient time for interest rates to revert, meaning that r and κ, in particular, can be estimated fairly reliably, and that using Δ = 1/4, or even Δ = 1, will give good estimates. On the other hand, having only five years of data, using Δ ≤ 1/52 will give reliable estimates only for σ, with r and κ having large standard errors.

3/2 model
Commencing with r_0 = 0.05, we simulate with a daily time step, Δ = 1/365, the 3/2 model having parameter values p = 0.11, q = −0.019 and σ = 2.25093, which correspond closely to those estimated for the 3/2 model in Table 1. We compare estimation of parameters over the same varying window lengths and time steps examined in the CIR model. The evolutions of estimates of the parameters p, q, σ and ν = 4(1 − q/σ²) are depicted respectively in Figures 6-9, varying the time steps. For the same reasons given for the CIR model, we maintain shorter time horizons in these figures, whereas in Figure 10 we show only the estimates of the parameters using Δ = 1/365 over window lengths up to 150 years.
In Figure 6 we show the effect of Δ on the estimate of p. In (31) we see that p is equal to κ in the corresponding CIR model for the reciprocal of the 3/2 process. Therefore the effects of Δ on p correspond well with those on κ shown in Figure 2. Beyond window lengths of about 10 years, all four time steps give rise to similar estimated parameter values. However, for windows shorter than 10 years, all time steps fail to give good estimates of p. We see from Figure 10a that the 95% confidence intervals contain the true value of p for window lengths of at least 10 years. Also, it is not until after around 25 years that we see estimates of p remaining within about 100% of the true value, and not until around 85 years that estimates are within roughly 50% of the true value.
In Figure 7 we show the effect of Δ on the estimate of q. In (31) we see that q is equal to σ² − rκ in the corresponding CIR model, and this can be rewritten in terms of σ and ν as σ²(1 − ν/4). Therefore, if σ is accurate then we would expect the effect of Δ on q to be similar to that on ν. Indeed, Figure 10b appears to be a mirror image of Figure 5d. Beyond window lengths of about 10 years, all four time steps give rise to similar estimated parameter values. We see from Figure 10b that windows at least around 25 years in length give estimates of q within 2 of the true value, and windows at least around 85 years in length give estimates within 1 of the true value.
In Figure 8 we show the effect of Δ on the estimate of σ, which is similar to that on σ in the corresponding CIR model. For windows longer than 14 years, the effects of Δ wear off. For Δ = 1/4 and Δ = 1/12, the estimates stabilise after 12 years or so, corresponding to at least 48, respectively 144, data points. For Δ = 1/52, estimates stabilise after 2 years, corresponding to 104 or more data points. Finally, for Δ = 1/365, the estimates are stable over windows at least as long as one quarter of a year. We see in Figure 10c that estimates of σ are within 0.05 of the true value after 5 years and are within 0.01 over horizons beyond 45 years.
In Figure 9 we show the effect of Δ on the estimate of ν = 4(1 − q/σ²), which is the degrees of freedom parameter and which is crucial in ascertaining whether the process avoids hitting zero, namely when ν ≥ 2. For all Δ ∈ {1/4, 1/12, 1/52, 1/365}, the estimates are unstable for windows shorter than two and a half to three years, after which the estimates provide similar values. In Figure 10d we see that the estimates do not appear to converge to the true value until around 30 years. Yet, even for windows longer than 30 years the estimates hover above the true value by 1 to 1.25, which is roughly 25% of the true value. In conclusion, σ is the most accurately estimated parameter over all window lengths up to 150 years, followed by p and lastly q. With daily sampling of the process, good estimates of σ can be obtained in about half a year and, with weekly sampling, good estimates are obtainable in about 2 years. For windows longer than 10 years, quarterly, monthly, weekly and daily sampling give rise to estimates of similar accuracy. With such accurate estimates of σ, we see from the relation q = σ²(1 − ν/4) that inaccuracies in q and ν are linearly related. This is exemplified in Figures 7 and 9 and, over the longer term, in Figures 10b and 10d. In practical applications, the choice of Δ is similar to the choice for the CIR model.

Bessel model
Commencing with r_0 = 0.05, we simulate with a daily time step, Δ = 1/365, the Bessel model having parameter values α = 0.000151, β = −0.107 and γ = 0.01682, which correspond closely to those estimated for the Bessel model in Table 1. We compare estimation of parameters over the same varying window lengths and time steps examined in the CIR model. The evolutions of estimates of the parameters α, β, γ and ν = 1 + 2α/γ² are depicted respectively in Figures 11-14, varying the time steps. For the same reasons given for the CIR model, we maintain shorter time horizons in these figures, whereas in Figure 15 we show only the estimates of the parameters using Δ = 1/365 over window lengths up to 150 years.
In Figure 11 we show the effect of Δ on the estimate of α, where stability in estimates is achieved for windows around 5.7 years in length and longer, regardless of the time step. For shorter windows, the estimates all diverge from the true value by about 0.005 for windows longer than one year, and diverge by varying amounts for windows shorter than one year. We see from Figure 15a that estimates are within 0.00015 of the true value for window lengths between 7.5 and 130 years, converging very close to the true value beyond 130 years.
In Figure 12 we show the effect of Δ on the estimate of β. For windows shorter than 5.7 years the estimates vary with Δ but fail to achieve any value consistently close to the true value. Beyond this, all values of Δ provide similarly valued estimates, within 0.1 of the true value of −0.107. We see from Figure 15b that windows between 60 and 100 years in length provide estimates within 0.05 of the true value, and after 100 years the estimates are almost exact.
In Figure 13 we show the effect of Δ on the estimate of γ. For Δ = 1/365 the estimates are very close to the true value, being almost indistinguishable from it after 7.5 years. The larger the value of Δ, the worse the estimates are, although most estimates improve beyond 7.5 years, except for Δ = 1/4. We see in Figure 15c that estimates of γ, based on Δ = 1/365, are within 0.0001 of the true value after 10 years and are within about 0.00005 beyond 50 years.
In Figure 14 we show the effect of Δ on the estimate of ν = 1 + 2α/γ², which is of interest for the same reasons given for the CIR and 3/2 models. For all Δ ∈ {1/4, 1/12, 1/52, 1/365}, the estimates are unstable for windows shorter than 5.5 years, after which the estimates provide similar values, achieving stability after 7.5 years. In Figure 15d we see that the estimates are within 1 of the true value beyond 10 years and within 0.1 beyond 133 years.
In conclusion, γ is the most accurately estimated parameter over all windows, particularly for Δ = 1/365, which gives good estimates over windows about half a year in length. The other parameters can be estimated fairly reliably over longer windows. The choice of Δ is largely determined by the data window. For shorter windows, using Δ = 1/365 is imperative. However, for windows over several decades, quarterly sampling provides reliable estimates.

Estimation from historical interest rates
In this section we demonstrate the parameter estimation for each of the processes when used to model interest rates. For the set of monthly US short rates over the period 1871-2019, as given by the one-year rates in Shiller (2019) and supplemented by six-month LIBOR rates in Federal Reserve Bank of St. Louis (2019), we obtain the maximum likelihood estimates in Table 1. The Newton-Raphson approximation algorithm was used to compute the exact parameter estimates for each of the models, whereas the estimates derived using the first and second order approximations employ the formulae given in Theorems 1 and 2 and Corollary 1. The parameter estimates of the Vasicek model were used in determining an initial set of parameter estimates for the Newton-Raphson method applied to the CIR model, as described in Fergusson and Platen (2015).
We can draw several conclusions from Table 1, where the data series is monthly and nearly 150 years long. Firstly, simulations performed in Section 4.1.1 for the CIR model, having the same parameters used in Table 1, indicate that the minimum window length for reasonable estimates of r is around 30 years, as shown in Figure 5a. For κ, this is around 90 years, as shown in Figure 5b, and for σ, this is around 14 years for monthly or less frequent observations, as shown in Figure 3. So, based on the monthly frequency and 150-year series length, we would expect the parameter estimates for the CIR model in Table 1 to be reliable. Secondly, analogous simulations performed in Section 4.1.2 for the 3/2 model indicate that the minimum window length for estimates of p within 0.1 of the true value is around 20 years, as shown in Figure 10a. For q being within 1 of its true value, this is around 90 years, as shown in Figure 10b, and for σ, this is around 14 years, as shown in Figure 8. As for the CIR model, we would expect the parameter estimates for the 3/2 model in Table 1 to be reliable. Thirdly, analogous simulations performed in Section 4.1.3 for the Bessel model indicate that the minimum window length for estimates of α within 0.00006 of the true value is around 133 years, as shown in Figure 15a. For β being within 0.01 of its true value, this is around 101 years, as shown in Figure 15b, and for γ being within 0.001 of its true value, this is around 8 years, as shown in Figure 13. As for the two previous models, we would expect the parameter estimates for the Bessel model in Table 1 to be reliable. In general, the minimum series length is around 100 years for estimating parameters of any of our models, for all choices of Δ ≤ 1/4. Finally, using either the log-likelihood, AIC or BIC as the criterion, the CIR model is the best of the three models.
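The AIC and BIC comparisons here need only the maximised log-likelihood ℓ̂ and the number of parameters k (three for each model): AIC = 2k − 2ℓ̂ and BIC = k log n − 2ℓ̂, with lower values preferred. A small sketch with placeholder log-likelihood values (illustrative numbers, not those of Table 1):

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik (lower is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*log(n) - 2*loglik (lower is better)."""
    return k * math.log(n) - 2 * loglik

# hypothetical maximised log-likelihoods of three 3-parameter models
# fitted to n monthly observations (placeholder numbers only)
n_obs = 1776
logliks = {"CIR": 9100.0, "3/2": 9080.0, "Bessel": 9060.0}
best = min(logliks, key=lambda m: aic(logliks[m], 3))
```

With equal parameter counts, all three criteria reduce to ranking by log-likelihood, which is why the log-likelihood, AIC and BIC pick out the same model in this comparison.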
To illustrate the estimation methods for more frequent observations, for the set of daily US Effective Federal Funds rates over the period July 1954 to January 2020, as given in Federal Reserve Bank of St. Louis (2020), we obtain the maximum likelihood estimates in Table 2. We note that the second order approximation in respect of the Bessel model fails due to Condition A not being met. We can draw several conclusions from Table 2, where the data series is daily and roughly 65 1/2 years long. Firstly, simulations performed in Section 4.1.1 for the CIR model, despite being based on different parameters, indicate that the respective minimum window lengths for reasonable estimates of r, κ and σ are around 30, 90 and 14 years for daily observations. So, based on the daily frequency and 65 1/2 year series length, we would expect the parameter estimates for r and σ of the CIR model in Table 2 to be reliable, and the estimate of κ to be unreliable. Secondly, analogous simulations performed in Section 4.1.2 for the 3/2 model indicate that the respective minimum window lengths for estimates of p, q and σ are around 20, 90 and 14 years for daily observations. So, the parameter estimates of p and σ are reliable while the estimate of q is not. Thirdly, analogous simulations performed in Section 4.1.3 for the Bessel model indicate that the respective minimum window lengths for estimates of α, β and γ are around 133, 101 and 8 years. So, in Table 2, only the estimate of γ is reliable, while the estimates of α and β are not. As mentioned previously, the minimum series length is around 130 to 140 years for estimating parameters for all of our models, for all choices of Δ ≤ 1/4. With this caveat in mind, based on either the log-likelihood, AIC or BIC, the Bessel model is the best of the three models.

Comparison of run times
To give an idea of the substantial improvement in run times of the first order and second order approximation methods over the standard Newton-Raphson estimation method, we simulated in MATLAB 1000 data sets of CIR short rates, based on the CIR parameter values r = 0.041954, κ = 0.093950 and σ = 0.064619 (i.e. ν = 3.775778), and then computed the average estimation run time per simulation, along with the averages of the parameter estimates. The results are shown in Table 3, where it is evident that the approximation methods take roughly two or three ten-thousandths of a second, far less than the one second or so taken on average by the Newton-Raphson method. This represents a speed-up of roughly 5,000 times with almost the same level of accuracy.
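The shape of this benchmark is easy to reproduce. The sketch below times a closed-form pass over the data against an iterative optimisation of the exact likelihood (illustrative only: it uses conditional least squares as the closed-form stand-in and SciPy's Nelder-Mead in place of Newton-Raphson, so the absolute numbers will differ from Table 3):

```python
import time
import numpy as np
from scipy.optimize import minimize
from scipy.stats import ncx2

def neg_loglik(params, r, delta):
    """Negative exact CIR log-likelihood (scaled non-central chi-squared)."""
    rbar, kappa, sigma = params
    c = sigma**2 * (1.0 - np.exp(-kappa * delta)) / (4.0 * kappa)
    nu = 4.0 * kappa * rbar / sigma**2
    lam = r[:-1] * np.exp(-kappa * delta) / c
    return -np.sum(ncx2.logpdf(r[1:] / c, nu, lam) - np.log(c))

def closed_form(r, delta):
    """One-pass conditional least squares estimates (no iteration)."""
    b, a = np.polyfit(r[:-1], r[1:], 1)
    kappa = -np.log(b) / delta
    rbar = a / (1.0 - b)
    resid = r[1:] - r[:-1] - kappa * (rbar - r[:-1]) * delta
    return rbar, kappa, np.sqrt(np.mean(resid**2 / (r[:-1] * delta)))

# simulate a daily CIR series (Euler scheme with full truncation)
rng = np.random.default_rng(3)
delta, n = 1.0 / 365.0, 5000
r = np.empty(n + 1)
r[0] = 0.05
for i in range(n):
    rp = max(r[i], 0.0)
    r[i + 1] = (r[i] + 2.0 * (0.05 - rp) * delta
                + 0.2 * np.sqrt(rp * delta) * rng.standard_normal())

t0 = time.perf_counter()
est_fast = closed_form(r, delta)
t_fast = time.perf_counter() - t0

t0 = time.perf_counter()
res = minimize(neg_loglik, x0=np.array(est_fast), args=(r, delta),
               method="Nelder-Mead",
               bounds=[(1e-4, 1.0), (1e-2, 20.0), (1e-2, 2.0)])
t_iter = time.perf_counter() - t0
```

On typical hardware the closed-form pass is several orders of magnitude faster than the iterative fit, mirroring the kind of speed-up reported in Table 3.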

Conclusion
The closed-form formulae presented in this paper permit accurate parameter estimation of the CIR and 3/2 short rate models to be performed in spreadsheets and other high-level software packages, such as MATLAB and R, and permit exceedingly fast computations in computer algorithms. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as in hedging interest rate and volatility derivatives based on such models.

Appendix A: Proofs of Theorems 1 and 2
In this technical appendix we supply proofs of Theorems 1 and 2. Our proof of Theorem 1 is as follows. Proof: For convenience, we make the change of variables given in (23) and rewrite the log-likelihood function in (11) in terms of these new variables as in (32), where L, R_0 and R_1 are as in (14)-(16). Our proof of the theorem involves solving for the new variables using an approximation to the first order derivatives of the log-likelihood function in (32).
The modified Bessel function of the first kind given in (7) can also be written in a form from which we can derive the asymptotic expansion given in §7.23 of Watson (1966), valid for arg z ∈ (−π/2, 3π/2). Employing the first few terms of the expansion, and taking logarithms of both sides, gives an approximation to log I_ν(z), where the subscript ν in the big-O notation indicates dependence of the implied constant upon ν. In terms of our new variables given in (23), we rewrite the log-likelihood function in (32) as ℓ_1(a, k, v) plus a remainder, where L, R_0, R_1, R_2, R_3, R_4, R_5 and R_6 are as in (14)-(21). According to (23), a ≈ (1/4)σ²Δ for small Δ, so that in (38), (1/n)ℓ(a, k, v) − (1/n)ℓ_1(a, k, v) = O_{ν,σ,R_6}(Δ³) as Δ → 0, where the subscripted variables in the big-O notation indicate dependence of the implied constant on these variables. Therefore, knowledge of the values of L, R_0, R_1, R_2, R_3 and R_5 is sufficient to determine the (r, κ, σ)-dependent part of ℓ_1 and therefore the statistic T = (L, R_0, R_1, R_2, R_3, R_5) is approximately sufficient of the second order.
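The truncation step above can be checked numerically: comparing the first terms of Watson's expansion of I_ν(z) against SciPy's exponentially scaled Bessel function shows how quickly the error decays for large z. A verification sketch (the expansion coefficients follow the pattern (μ − 1)(μ − 9)⋯(μ − (2k − 1)²)/(k!(8z)^k) with μ = 4ν²):

```python
import numpy as np
from scipy.special import ive  # ive(v, z) = I_v(z) * exp(-z)

def log_iv_asymptotic(v, z, terms=3):
    """Truncated large-z expansion of log I_v(z) (Watson, Section 7.23):
    I_v(z) ~ e^z / sqrt(2*pi*z) * (1 - (mu-1)/(8z)
             + (mu-1)(mu-9)/(2!(8z)^2) - ...), with mu = 4*v^2."""
    mu = 4.0 * v * v
    series, term = 1.0, 1.0
    for k in range(1, terms):
        term *= -(mu - (2 * k - 1) ** 2) / (k * 8.0 * z)
        series += term
    return z - 0.5 * np.log(2.0 * np.pi * z) + np.log(series)

# exact log I_v(z), computed stably via the scaled function
z, v = 200.0, 2.0
exact = np.log(ive(v, z)) + z
err_two_terms = abs(exact - log_iv_asymptotic(v, z, terms=2))
err_three_terms = abs(exact - log_iv_asymptotic(v, z, terms=3))
```

Each additional term of the expansion reduces the error by roughly a factor of 1/z, consistent with the big-O bounds used in the proof.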
Our plan is firstly to show that the solution x_1 = (a_1, k_1, v_1)^⊤ to (40) is close to the exact maximiser of the log-likelihood, and secondly to solve (40).
Write h = x − x_1 and, by Taylor's Theorem, for some ξ_1 ∈ (0, 1), expand the gradient of ℓ_1 about x_1. It is apparent from (44) and (45) that h is close to zero, that is, x_1 is close to x. To solve (40), we rewrite ℓ_1(a, k, v) as a quadratic in v, with the consequent inequality (48), where, for ease of legibility in our algebra, the functions f, g and h of k are introduced. Equality in (48) is achieved when v equals the expression given in (51).
We now focus on determining the optimum parameter values of the new log-likelihood function ℓ_2(a, k). The first order partial derivatives of ℓ_2 with respect to our new variables a and k are readily deduced. The maximum likelihood estimates are found by equating these first order partial derivatives to zero and solving for a and k. Thus, from (52) we have the two equations (53) and (54). Rearranging (54) gives a quadratic equation in a, namely (55). In the light of (54), we can reduce the degree of the polynomial in a in (53), giving a cubic equation in a that can be rearranged as (57). Euclid's greatest common divisor algorithm applied to the polynomial equations (55) and (57) gives a monic polynomial in a. The first step of this algorithm is (57) minus (3/2)a times (55), giving the quadratic polynomial equation in a in (58). The second step of this algorithm is (58) minus a multiple of (55) involving R_5, f_k and R_3, giving the monic polynomial equation in a in (59), from which we have the solution for a in terms of functions b_k and c_k of k, stated in (61).
Note that replacing a in (55) by the RHS expression in (61) gives (62), which only involves the variable k. Defining the function q_k accordingly, and making the assumption that k is close to zero, an approximate solution can be found by solving the resulting quadratic equation (64). Therefore, from (64) we obtain k, where the choice of sign of the square root is the same as that of q′_0, this being the root closest to zero. Then we can determine a from (61) and, finally, determine v from (51).
The original parameters r, κ and σ are then recovered from a, k and v via (23). The second order partial derivatives of the log-likelihood follow by direct differentiation, and from them the Fisher information matrix with respect to the variables a, k and v is obtained. From (23), for small values of κΔ/2, the Fisher information matrix with respect to r, κ and σ follows. Finally, asymptotic normality in (26) of the maximum likelihood estimates follows from (12) and the asymptotic theory of MLEs; see, for example, Theorem 3.10 in Lehmann and Casella (1998). Our proof of Theorem 2 is as follows.
Proof: We continue in a similar vein as in the proof of Theorem 1 up to the equations formed by equating the first order partial derivatives in (52) to zero and solving for a and k, where instead the log-likelihood function in (32) is approximated by a first order expression ℓ_1(a, k, v), with equality achieved at a value of v expressed in terms of h_k and aR_3. The second order partial derivatives include
∂²ℓ/∂a² = n/(2a²) − (n/a³)(R_0 e^{−k} + R_1 e^{k} − 2R_2),
∂²ℓ/∂a∂k = (n/(2a²))(R_1 e^{k} − R_0 e^{−k}),
and the Fisher information matrix with respect to the variables a, k and v follows. The Fisher information matrix I with respect to the variables r, κ and σ is derived as in the proof of Theorem 1.

Calculations of standard errors in Table 1

We supply here the full details of the calculations of the standard errors of the parameters for the CIR model, given in Table 1. The standard errors of the parameters for the 3/2 and Bessel models are calculated analogously. Once the maximum likelihood estimates of the parameters of the CIR model have been obtained, namely r = 0.042712, κ = 0.094537, σ = 0.064243, we compute numerically the Fisher information matrix, approximated by the observed Fisher information matrix.
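The observed Fisher information matrix can be approximated generically by a central-difference Hessian of the log-likelihood at the parameter estimates, with standard errors read off the inverse. A sketch of this numerical step (using the exact non-central chi-squared log-likelihood and simulated data; step sizes and parameter values are illustrative, not those of Table 1):

```python
import numpy as np
from scipy.stats import ncx2

def cir_loglik(theta, r, delta):
    """Exact CIR log-likelihood at theta = (rbar, kappa, sigma)."""
    rbar, kappa, sigma = theta
    c = sigma**2 * (1.0 - np.exp(-kappa * delta)) / (4.0 * kappa)
    nu = 4.0 * kappa * rbar / sigma**2
    lam = r[:-1] * np.exp(-kappa * delta) / c
    return np.sum(ncx2.logpdf(r[1:] / c, nu, lam) - np.log(c))

def observed_info(theta, r, delta, h=1e-4):
    """Observed Fisher information: minus the central-difference Hessian
    of the log-likelihood, using a relative step h in each coordinate."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    steps = h * np.abs(theta)
    H = np.empty((k, k))
    for i in range(k):
        for j in range(i, k):
            ei = np.zeros(k)
            ei[i] = steps[i]
            ej = np.zeros(k)
            ej[j] = steps[j]
            fpp = cir_loglik(theta + ei + ej, r, delta)
            fpm = cir_loglik(theta + ei - ej, r, delta)
            fmp = cir_loglik(theta - ei + ej, r, delta)
            fmm = cir_loglik(theta - ei - ej, r, delta)
            H[i, j] = H[j, i] = (fpp - fpm - fmp + fmm) / (4.0 * steps[i] * steps[j])
    return -H

# simulated daily data evaluated at the data-generating parameters
rng = np.random.default_rng(11)
delta, n = 1.0 / 365.0, 5000
r = np.empty(n + 1)
r[0] = 0.05
for i in range(n):
    rp = max(r[i], 0.0)
    r[i + 1] = (r[i] + 2.0 * (0.05 - rp) * delta
                + 0.2 * np.sqrt(rp * delta) * rng.standard_normal())

info = observed_info(np.array([0.05, 2.0, 0.2]), r, delta)
std_errors = np.sqrt(np.diag(np.linalg.inv(info)))
```

The diagonal of the inverse observed information gives the squared standard errors; the relatively small curvature in the κ direction reflects the large standard errors of mean reversion estimates seen in Section 4.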
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research is supported by an Australian Government Research Training Program Scholarship.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.