Sliced Inverse Regression: application to fundamental stellar parameters

We present a method for deriving stellar fundamental parameters. It is based on a regularized sliced inverse regression (RSIR). We first tested it on noisy synthetic spectra of A, F, G, and K-type stars, and inverted simultaneously their atmospheric fundamental parameters: Teff, log g, [M/H] and vsini. Different learning databases were calculated using a range of sampling in Teff, log g, vsini, and [M/H]. Combined with a principal component analysis (PCA) nearest neighbors (NN) search, the size of the learning database is reduced. A Tikhonov regularization is applied, given the ill-conditioning of SIR. For all spectral types, decreasing the size of the learning database allowed us to reach internal accuracies better than PCA-based NN-search using larger learning databases. For each analyzed parameter, we have reached internal errors that are smaller than the sampling step of the parameter. We have also applied the technique to a sample of observed FGK and A stars. For a selection of well studied stars, the inverted parameters are in agreement with the ones derived in previous studies. The RSIR inversion technique, complemented with PCA pre-processing proves to be efficient in estimating stellar parameters of A, F, G, and K stars.


Introduction
Astronomical surveys, whether spaceborne or ground-based, are gathering an unprecedented amount of data. For instance, the SDSS DR14 (Abolfathi et al., 2018) contains 154 TB of data comprising millions of spectroscopic and photometric measurements. The DR5 of the LAMOST survey (Cui et al., 2012) contains 9 million spectra in total. Gaia DR2 provides information about 1.3 billion stars (Katz & Brown, 2017). These space and ground-based surveys illustrate the volume of data that the astronomical community will face in the near future.
Spectroscopic analysis is crucial for the derivation of the fundamental stellar atmospheric parameters, which are the effective temperature (Teff), the surface gravity (log g), and the metallicity ([M/H]). In addition to these fundamentals, and because it may strongly affect the shape of the observed spectra, the projected equatorial rotational velocity, v sin i, is also retrieved from spectroscopic information. Many authors have long been using spectroscopic data to estimate the stellar atmospheric parameters (Buchhave et al., 2012, Dieterich et al., 2017, Fabbro et al., 2018, Latham et al., 2002, McWilliam, 1990, Schönrich & Bergemann, 2014). However, in order to extract the most relevant and accurate information from high-resolution stellar spectra covering large bandwidths, more effort is still required. Most of the traditional approaches and developed pipelines rely on standard procedures such as comparing an observed spectrum with a set of theoretical spectra (Morris et al., 2018, Valenti & Piskunov, 1996). The requirement for advanced computational techniques arises from the large dimensionality of the data, which is due to the wide wavelength coverage together with the high spectral resolution. Many new techniques are being developed. In Ness et al. (2015) and Casey et al. (2016), a data-driven approach (CANNON) is introduced for determining stellar labels (fundamental parameters and detailed stellar abundances) from spectroscopic data. Their learning databases (LDB) are based on a subset of reference objects for which the stellar labels are known with high accuracy. Dimension reduction techniques are also developed and used, such as the Principal Component Analysis (PCA) for data reduction (see e.g. Jolliffe 1986). PCA has shown its effectiveness in inverting the fundamental stellar atmospheric parameters in several studies (Bailer-Jones et al., 1998, Gebran et al., 2016, Paletou et al., 2015a,b, Re Fiorentin et al., 2007). Xiang et al. (2017) estimated the stellar atmospheric parameters as well as the absolute magnitudes and α-element abundances from the LAMOST spectra with a multivariate regression method based on kernel-based PCA. The LAMOST spectroscopic survey data have also been recently analyzed by Boeche et al. (2018) to invert stellar parameters and chemical abundances, in a study where several combined approaches and techniques were compared. The authors developed a code called SP_Ace which utilizes nearest neighbor comparison and non-linear model fitting techniques. In Wilkinson et al. (2017), a spectral fitting code (FIREFLY) was developed to derive the stellar population properties of stellar systems. FIREFLY uses a χ² minimization fitting procedure that fits stellar population models to spectroscopic data, following an iterative best-fitting process controlled by a Bayesian information criterion. Their approach is efficient in overcoming the so-called "ambiguities" in the spectra. More recently, Gill et al. (2018) used wavelet decomposition to distinguish between noise, continuum trends, and stellar spectral features in the CORALIE FGK-type spectra. By calculating a subset of wavelet coefficients from the target spectrum and comparing it to those from a grid of models in a Bayesian framework, they were able to derive Teff, [M/H], and v sin i for these stars. Ting et al. (2018) presented The Payne, a general method for the precise and simultaneous determination of numerous stellar labels from observed spectra. Using a simple neural-net-like functional form and a suitable choice of training labels, The Payne yields a spectral flux prediction good to 10⁻³ rms across a wide range of Teff and log g. Ting et al. (2018) applied this approach to the APOGEE DR14 data set and obtained precise elemental abundances of 15 chemical species. In the same context, Fabbro et al. (2018) applied a deep neural network architecture to analyse both SDSS-III APOGEE DR13 and synthetic stellar spectra. Their convolutional neural network model, StarNet, was able to predict precise stellar parameters when trained on APOGEE spectra or on synthetic data.
In this study, we combine dimensionality reduction with PCA and a PCA-based nearest neighbor search (Gebran et al., 2016, Paletou et al., 2015a) with a Regularized Sliced Inverse Regression (RSIR; Bernard-Michel et al., 2007, 2009) procedure in order to derive simultaneously Teff, log g, [M/H] and v sin i from spectra of A and FGK type stars. Up to now, sliced inverse regression has rarely been used in astronomy (Bernard-Michel et al., 2009, Watson et al., 2017). When combined with PCA-based techniques, the fundamental atmospheric parameters are derived with higher accuracy than with the PCA-based nearest neighbor inversion alone (Gebran et al., 2016, Paletou et al., 2015a). The mathematical description of our method is detailed in Sec. 2. Section 3 describes all elements used for the enhancement of the computational abilities of SIR. Section 4 discusses the application of the technique to synthetic spectra of A, F, G, and K-type stars. In Sec. 5, we show the results of inversions of real stars. Discussion and conclusion are gathered in Sec. 6.

Sliced inverse regression (SIR)
SIR, originally formulated by Li (1991), is a statistical technique that reduces multivariate regression to a lower dimension. It finds an inverse functional relationship between the response and the predictor, which are the fundamental parameters and the flux, respectively. Synthetic spectra flux values, x_syn, are usually calculated from the set of stellar atmospheric parameters in the form of:
$$x_{\rm syn} = f(T_{\rm eff}, \log g, [{\rm M/H}], v \sin i) \qquad (1)$$
The inverse functional relation is used to predict the parameters from the observed flux values, x_obs, in the form of:
$$(T_{\rm eff}, \log g, [{\rm M/H}], v \sin i) = f^{-1}(x_{\rm obs}) \qquad (2)$$
In our work, we have derived a functional relationship for each parameter in the following way:
$$y_j = f_j^{-1}(x_{\rm obs}) \qquad (3)$$
where j = 1, 2, 3, 4 for Teff, log g, [M/H], and v sin i.

Global covariance matrix Σ
SIR starts with the computation of the covariance matrix Σ of all the synthetic spectra x_i of the LDB. First, the spectra are gathered in a matrix of dimension N_spectra × N_λ, where N_λ is the number of wavelength points per spectrum and N_spectra is the total number of spectra in the LDB. Then, the covariance matrix Σ is defined as:
$$\Sigma = \frac{1}{N_{\rm spectra}} \sum_{i=1}^{N_{\rm spectra}} (x_i - \bar{x})^T (x_i - \bar{x}) \qquad (4)$$
where the global mean x̄ is defined as:
$$\bar{x} = \frac{1}{N_{\rm spectra}} \sum_{i=1}^{N_{\rm spectra}} x_i \qquad (5)$$
x_i being a row vector containing the flux values of spectrum i.
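As an illustration of this first step, the following minimal sketch computes the global mean and the covariance matrix of a toy LDB in plain Python. The function names, the toy flux values, and the normalization by N_spectra (rather than N_spectra − 1) are assumptions made for the example, not part of the original pipeline.

```python
from typing import List

Matrix = List[List[float]]


def global_mean(spectra: Matrix) -> List[float]:
    """Mean flux at each wavelength point over all spectra (one spectrum per row)."""
    n_spectra = len(spectra)
    return [sum(column) / n_spectra for column in zip(*spectra)]


def covariance(spectra: Matrix) -> Matrix:
    """Covariance matrix of size N_lambda x N_lambda, normalized by N_spectra (assumed)."""
    n_spectra = len(spectra)
    mean = global_mean(spectra)
    centered = [[v - m for v, m in zip(row, mean)] for row in spectra]
    n_lambda = len(mean)
    return [
        [sum(r[a] * r[b] for r in centered) / n_spectra for b in range(n_lambda)]
        for a in range(n_lambda)
    ]


# Toy LDB: 4 spectra sampled at 2 wavelength points (illustrative values only).
toy_ldb = [[1.0, 0.9], [0.8, 0.7], [0.6, 0.7], [0.4, 0.5]]
sigma = covariance(toy_ldb)
```

For real spectra with ∼10⁴ wavelength points, a dedicated linear algebra library would be preferable; the nested lists are used here only to keep the sketch self-contained.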

Intra-slices covariance matrix
In SIR, all spectra are sorted in increasing order of the parameter considered for inversion. For example, to invert Teff, the spectra database is sorted in increasing order of Teff while the other parameters are ordered randomly. We then build up subsets of spectra, called "slices", each containing the spectra that share the same value of the parameter one wishes to determine. These slices should not overlap each other (Li, 1991). Then we calculate the mean x̄_h of the spectra found in each slice S_h, which contains n_h synthetic spectra (h being the index of the slice). For the inversion of each parameter, x̄_h and x̄ are used to calculate the "intra-slices" covariance matrix, Γ:
$$\Gamma = \sum_{h} \frac{n_h}{N_{\rm spectra}} (\bar{x}_h - \bar{x})^T (\bar{x}_h - \bar{x}) \qquad (6)$$
where
$$\bar{x}_h = \frac{1}{n_h} \sum_{x_i \in S_h} x_i \qquad (7)$$
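The slicing step can be sketched as follows: spectra sharing the same parameter value are grouped, each slice is averaged, and Γ is accumulated from the deviations of the slice means from the global mean. The helper names and toy values are hypothetical, and the weight n_h / N_spectra follows the usual SIR definition of Li (1991), which is assumed here.

```python
from collections import defaultdict
from typing import Dict, List

Matrix = List[List[float]]


def column_mean(rows: Matrix) -> List[float]:
    """Mean of each column (wavelength point) over the given rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def slice_spectra(spectra: Matrix, params: List[float]) -> Dict[float, Matrix]:
    """Group spectra into non-overlapping slices sharing one parameter value."""
    slices: Dict[float, Matrix] = defaultdict(list)
    for flux, value in zip(spectra, params):
        slices[value].append(flux)
    return dict(sorted(slices.items()))  # slices in increasing parameter order


def slice_mean_covariance(spectra: Matrix, params: List[float]) -> Matrix:
    """Gamma: covariance of the slice means, each weighted by n_h / N (assumed)."""
    n_total = len(spectra)
    x_bar = column_mean(spectra)
    n_lambda = len(x_bar)
    gamma = [[0.0] * n_lambda for _ in range(n_lambda)]
    for members in slice_spectra(spectra, params).values():
        diff = [m - g for m, g in zip(column_mean(members), x_bar)]
        weight = len(members) / n_total
        for a in range(n_lambda):
            for b in range(n_lambda):
                gamma[a][b] += weight * diff[a] * diff[b]
    return gamma


# Toy LDB: 4 spectra, 2 wavelength points, 2 Teff values (illustrative only).
toy_ldb = [[1.0, 0.9], [0.8, 0.7], [0.6, 0.7], [0.4, 0.5]]
toy_teff = [5000.0, 5000.0, 5200.0, 5200.0]
gamma = slice_mean_covariance(toy_ldb, toy_teff)
```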

Dimension reduction and parameter inversion
SIR aims to build a reducing subspace that maximizes the variance between the slices while minimizing the variance within the slices. This yields a reduced predictor versus response regression relationship that is used to predict the parameters of the observed stars. In practice, the spectra are stacked in increasing order of the parameter, spectra with similar or close parameter values are averaged into a single spectrum, and these averages are projected onto a new subspace.
These new projections are later used as the predictors of the functional relationship. Since the reduced projections are formed from spectra having close parameter values, this ensures a higher accuracy of the regression predictions (Watson et al., 2017). Likewise, slicing the spectra into non-overlapping groups of similar parameter values ensures the inter-slice maximization and the intra-slice minimization. The matrix Σ⁻¹Γ is then calculated, where Σ and Γ are the two previously defined matrices. One eigenvector of Σ⁻¹Γ, called β_λ and corresponding to an eigenvalue λ, is used to form the reduction subspace. This allows us to perform the regression in a 2-dimensional space using an inverse functional relationship. This relationship is constructed via a piecewise linear interpolation between the projection coordinates of the slices on the single eigenvector of Σ⁻¹Γ and the parameters.
The selection of β_λ is based on a metric C_λ that quantifies the relationship between the spectra and the parameters. C_λ, defined as the "sliced inverse regression criterion" (Bernard-Michel et al., 2007, 2009), is calculated as follows:
$$C_\lambda = \frac{\beta_\lambda^t \, \Gamma \, \beta_\lambda}{\beta_\lambda^t \, \Sigma \, \beta_\lambda} \qquad (8)$$
where β^t_λ is the transpose of β_λ. In terms of the variance function Var, β^t_λ Γ β_λ is the "inter-slice" variance, i.e. the variance of the projected slice means, whereas β^t_λ Σ β_λ represents the total variance of the projected spectra. The β_λ that gives a C_λ value closest to 1 is considered the best choice for the reducing basis vector. In the present work, C_λ varies between 0.91 and 0.97 when using the eigenvector of Σ⁻¹Γ with the largest eigenvalue λ.
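A minimal sketch of this selection criterion is given below, assuming that C_λ is the ratio of the two quadratic forms named above (inter-slice variance over total variance). The matrices and candidate vectors are toy values chosen for the example; in practice the candidates would be the eigenvectors of Σ⁻¹Γ.

```python
from typing import List

Matrix = List[List[float]]


def quadratic_form(beta: List[float], matrix: Matrix) -> float:
    """Return beta^t M beta for a square matrix M."""
    return sum(
        beta[a] * matrix[a][b] * beta[b]
        for a in range(len(beta))
        for b in range(len(beta))
    )


def sir_criterion(beta: List[float], gamma: Matrix, sigma: Matrix) -> float:
    """Ratio of inter-slice variance to total variance along beta (assumed form)."""
    return quadratic_form(beta, gamma) / quadratic_form(beta, sigma)


# Toy 2 x 2 matrices (illustrative values only).
sigma = [[0.05, 0.03], [0.03, 0.02]]
gamma = [[0.04, 0.02], [0.02, 0.01]]

# Compare two hypothetical candidate directions and keep the one closest to 1.
candidates = [[1.0, 0.0], [0.0, 1.0]]
best = max(candidates, key=lambda beta: sir_criterion(beta, gamma, sigma))
```

Since the inter-slice variance cannot exceed the total variance, the criterion lies between 0 and 1, and picking the largest value is equivalent to picking the value closest to 1.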
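To invert the parameters, we apply piecewise linear interpolation on the coordinates of the projections of the slice means x̄_h on β_λ. Finally, for a projected spectrum x^p lying between the projected means of slices h and h+1, the estimation of the parameters is made according to:
$$\hat{y} = y_h + \left(x^p - \bar{x}_h^p\right) \frac{y_{h+1} - y_h}{\bar{x}_{h+1}^p - \bar{x}_h^p} \qquad (9)$$
where ŷ is the estimated parameter and y_h is the mean of the parameters of the spectra in slice h. The superscript "p" represents the projected value of a selected set of data on β_λ, i.e., x^p = ⟨β_λ, x⟩.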
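The interpolation step can be illustrated with the short sketch below. It assumes that the projected slice means are distinct and that at least two slices are available; the linear extrapolation from the end segments, used when the projected spectrum falls outside the range of the slice means, is a choice made for the example and is not taken from the text. All names and numbers are hypothetical.

```python
from bisect import bisect_right
from typing import List


def invert_parameter(x_p: float, slice_proj: List[float], slice_param: List[float]) -> float:
    """Piecewise-linear estimate of a parameter from a projected spectrum x_p.

    slice_proj holds the projections of the slice means on beta,
    slice_param the mean parameter value of each slice.
    """
    pairs = sorted(zip(slice_proj, slice_param))  # order slices by projection
    xs = [proj for proj, _ in pairs]
    ys = [value for _, value in pairs]
    # Index of the segment containing x_p, clamped to the first/last segment.
    h = min(max(bisect_right(xs, x_p) - 1, 0), len(xs) - 2)
    slope = (ys[h + 1] - ys[h]) / (xs[h + 1] - xs[h])
    return ys[h] + (x_p - xs[h]) * slope


# Toy example: three slices in Teff and one projected "observed" spectrum.
teff_estimate = invert_parameter(0.15, [-0.2, 0.0, 0.3], [7000.0, 7200.0, 7400.0])
```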

Enhancement of the computational abilities of SIR
In the present work, we are dealing with large amounts of high resolution spectra, so that Σ⁻¹Γ has typical dimensions of ∼10⁴ × 10⁴. In addition, using a large LDB for SIR induces an increase in the intra-slice variance, which leads to less accurate inverted parameters. To address both problems simultaneously, we applied two additional steps to SIR: first, using PCA, we reduce the dimension of every spectrum in the LDB (Watson et al., 2017) from ∼10⁴ to 12. Second, we apply a PCA-based NN-search in the reduced subspace to select a smaller LDB, more relevant for the spectra one wishes to analyze. The Σ⁻¹Γ matrix is generally ill-conditioned, and the higher the condition number is, the more noise sensitive the system becomes (Kreyszig, 2010). For the present work, values as large as 10²⁰ were found. In that case β_λ is very noise sensitive, leading to an unstable functional relationship. As a result, inaccurate inverted parameters may be derived. To solve this issue, we have applied Tikhonov regularization, which aims to improve the conditioning of Σ⁻¹Γ and to add a priori information to it based on the analysis of the noise of each observed spectrum. Several regularization methods exist; however, Tikhonov is very common and easily implemented. Other approaches can also address this issue, such as the truncated SVD used in Watson et al. (2017). Figure 1 summarizes the successive procedures we implemented, which we discuss in more detail hereafter.
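To illustrate why a Tikhonov term improves the conditioning, the sketch below considers only the factor that is inverted, assumed here to take the common form Σ² + δI. For a symmetric Σ with eigenvalues s_i, this matrix has eigenvalues s_i² + δ, so its 2-norm condition number is the ratio of the largest to the smallest of these values. The eigenvalues and the value of δ are hypothetical, and the example does not reproduce the condition numbers of the full regularized matrix.

```python
from typing import List


def cond_regularized(eigenvalues: List[float], delta: float) -> float:
    """2-norm condition number of Sigma^2 + delta * I for a symmetric Sigma.

    The eigenvalues of Sigma^2 + delta * I are s_i^2 + delta.
    """
    shifted = [s * s + delta for s in eigenvalues]
    return max(shifted) / min(shifted)


# Hypothetical covariance eigenvalues spanning ten orders of magnitude.
eigs = [1.0, 1e-3, 1e-10]

cond_plain = cond_regularized(eigs, 0.0)       # no regularization
cond_tikhonov = cond_regularized(eigs, 1e-8)   # with a small delta > 0
```

Even a small δ bounds the smallest eigenvalue from below, which is what limits the amplification of the noise in the inversion.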

LDB reduction via PCA
PCA is a numerical technique that reduces the dimension of each spectrum by projecting it on a set of orthogonal basis vectors called principal components (PCs). These components are the eigenvectors of the global covariance matrix Σ. Paletou et al. (2015a) and Gebran et al. (2016) showed that, for databases similar to the ones used in this study, the 12 PCs associated with the largest eigenvalues are enough to reduce the LDB while the reconstruction error remains less than 1%. Therefore, after this first pass, the new LDB has a dimension of N_spectra × 12.
The original LDB may reach a dimension of N_spectra × N_λ ∼ 10⁶ × 10⁴. This is due to the fine sampling in the parameters, the high resolution of the spectra, and the large wavelength range, and it makes the SIR process computationally heavy in terms of memory and time. To reduce the LDB that will be used for SIR, a PCA-based nearest neighbor search in the reduced subspace is applied for each observed star (Paletou et al., 2015a). This is done using the "PCA distance" d_j^(O), defined as:
$$d_j^{(O)} = \sqrt{\sum_{k=1}^{12} \left(\varrho_k - p_{jk}\right)^2} \qquad (10)$$
where ϱ_k is the projection coefficient on the k-th dimension for an observed spectrum, and p_jk is the projection coefficient on the k-th dimension for the j-th synthetic spectrum. Finally, for the SIR, a set of NN is selected for each observed star, as described in Sec. 3.3.
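The nearest neighbor selection in the reduced subspace can be sketched as follows, assuming a Euclidean distance between the projection coefficients of the observed and synthetic spectra. The function names are hypothetical, and the toy example uses 3 coefficients instead of 12 for brevity.

```python
import math
from typing import List


def pca_distance(obs_coeffs: List[float], syn_coeffs: List[float]) -> float:
    """Euclidean distance between two sets of PCA projection coefficients."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs_coeffs, syn_coeffs)))


def nearest_neighbors(obs_coeffs: List[float], ldb_coeffs: List[List[float]], k: int) -> List[int]:
    """Indices of the k synthetic spectra closest to the observed one."""
    order = sorted(
        range(len(ldb_coeffs)),
        key=lambda j: pca_distance(obs_coeffs, ldb_coeffs[j]),
    )
    return order[:k]


# Toy reduced LDB: 4 synthetic spectra described by 3 coefficients each.
observed = [0.10, 0.00, 0.20]
reduced_ldb = [
    [0.10, 0.00, 0.25],
    [1.00, 1.00, 1.00],
    [0.00, 0.00, 0.20],
    [0.10, 0.02, 0.20],
]
closest = nearest_neighbors(observed, reduced_ldb, k=2)
```

A full sort is used for clarity; for an LDB of ∼10⁶ spectra a partial selection (for example with `heapq.nsmallest`) would avoid sorting all distances.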

Tikhonov regularization
For the Tikhonov method (Vogel, 2002), one usually inserts a regularization parameter δ > 0 into the ill-conditioned system, usually based on a priori information gathered by analyzing the noise of each observed spectrum. We consider the following matrix:
$$(\Sigma^2 + \delta I)^{-1} \, \Sigma \, \Gamma \qquad (11)$$
The eigenvector β_λ(δ) associated with the largest eigenvalue of the matrix defined in Eq. 11 is calculated based on an optimization approach. For each parameter of each observed star, an optimum and specific δ is calculated. This procedure is initiated by estimating the signal to noise ratio (S/N) of the observed spectrum using the procedure of Stoehr et al. (2008). Then a random set of synthetic spectra is selected from the LDB, and Gaussian white noise having the same S/N as the observed spectrum is added to them. SIR is finally applied to this selected random set and the prediction of their parameters is done via the piecewise interpolation process described in Eq. 9. This simulated inversion leads to the selection of an optimum β_λ(δ). δ is estimated by minimizing the difference between the newly inverted parameter values of the randomly selected noise-added spectra (ŷ_i) and their initial noiseless values (y_i). The comparison is done using a normalized χ², denoted χ²_N. It was found that log10(χ²_N) as a function of log(δ) is a unimodal function which has a local minimum. This function is displayed in Fig. 3 for a synthetic spectrum having Teff, log g, [M/H], v sin i and S/N of 7 600 K, 2.50 dex, 0.0 dex, 197 km s⁻¹, and 196, respectively. The original LDB used in this example is the one of Gebran et al. (2016), explained in detail in Sec. 4. To find the minimum of these curves, we applied a golden-section search algorithm (Kiefer, 1953). It is a classical numerical technique that minimizes unimodal functions which have a global minimum. The inversion process for each analyzed spectrum, and each parameter, has its own χ²_N = f(δ) that needs to be minimized.
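The golden-section search mentioned above can be sketched as follows. The objective used here is a hypothetical unimodal stand-in for log10(χ²_N) as a function of log10(δ); in the actual procedure each evaluation would require a full simulated inversion. The search interval and tolerance are also assumptions made for the example.

```python
import math
from typing import Callable

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # inverse of the golden ratio, ~0.618


def golden_section_min(f: Callable[[float], float], a: float, b: float, tol: float = 1e-6) -> float:
    """Locate the minimum of a unimodal function f on the interval [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]: reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def toy_log_chi2(log_delta: float) -> float:
    """Hypothetical unimodal curve with its minimum at log10(delta) = -4."""
    return (log_delta + 4.0) ** 2 + 1.0


best_log_delta = golden_section_min(toy_log_chi2, -12.0, 2.0)
```

Each iteration reuses one of the two previous function values, so only one new evaluation of the objective is needed per step, which matters when every evaluation is itself an inversion.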

Integrated scheme of the enhancements
Now that we have described the tools that were used to improve the SIR procedure, we discuss how these techniques are integrated to increase the accuracy of the inversion process. The flowchart in Fig. 1 summarizes our adopted approach. In our work, Σ and Σ⁻¹Γ have reached dimensions of the order of ∼10⁴ × 10⁴. For that reason, and before applying the SIR process to each spectrum to be analyzed, we have reduced the dimension of these matrices by reducing the size of the original LDB using PCA, as described in Sec. 3.1.
During SIR, at least two distinct parameter values, and hence two slices, are required to construct the functional relationship. Therefore, to select the optimum reduced LDB, a test for the construction of this relationship is required. We iteratively tested the number of distinct parameter values while increasing the number of nearest neighbors. Whenever the number of distinct values of the parameter concerned becomes greater than or equal to 2, the iteration stops and the inversion proceeds to the interpolation. During the tests, there were situations where [Σ² + δI]⁻¹ was singular. The iterative approach solved this problem by adding nearest neighbors. Generally, using a smaller LDB which contains only the set of spectra closest to the observed one theoretically ensures the success of SIR compared to using the entire original LDB. When selecting a set of nearest neighbors, we ensure a lower value of the intra-slice variance Var(S_h) in Eq. 8: within each slice the spectra are closer to each other and to the average spectrum of the slice. At the same time, choosing these optimized reduced LDBs overcomes the issue of degeneracies. In the PCA-based NN-search (Gebran et al., 2016, Paletou et al., 2015a), we had cases where the distances d were extremely close to each other, or even equal, for a variety of parameters. In SIR, we do not face such an issue because the values are regressed for each parameter and the synthetic spectra with similar or close parameters are averaged into a single slice. Now Σ⁻¹Γ has a dimension of 12 × 12 and is constructed from the optimized reduced LDB. Its high condition number implies that it is ill-conditioned (see the example in the left panel of Fig. 2). Therefore, to improve the inversion process for each observed spectrum, we iteratively apply the Tikhonov regularization in SIR to our selected optimized reduced LDBs. By applying this regularization, we are effectively taking advantage of the S/N analysis and inserting the propagated noise information as a priori information. In other words, we are applying a denoising procedure.
In Fig. 2, we display the inversion results for Teff of a noisy synthetic spectrum. This spectrum has a Teff value of 7 600 K with an added Gaussian white noise of S/N = 196. As we iterate over different sizes of optimized reduced LDBs, a convergence is achieved in every case. For all of our tests, we noticed that the number of spectra in the optimized reduced LDBs did not surpass 500. This figure shows that the condition number of the non-regularized matrix is ∼5 orders of magnitude larger than the ones for which the Tikhonov regularization was applied. The right panel displays the effect of the regularization on the inverted parameter (Teff) of the same spectrum. Whatever the number of nearest neighbors in the optimized reduced LDB, the inversion is achieved with higher accuracy than the one without regularization. The convergence occurs irrespective of the value of the condition number, as long as it is smaller than the one without Tikhonov regularization. Figure 3 represents the minimization of log10(χ²_N) as a function of log10(δ) for different sets of optimized reduced LDBs. This figure shows the unimodal nature of the curves irrespective of the size of the LDB.
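The iterative selection of the reduced LDB described in this section can be sketched as below: the neighbor set is grown until it contains at least two distinct values of the parameter being inverted. The starting size and the increment are hypothetical, since the text does not specify them, and the input is assumed to be already ordered by increasing PCA distance to the observed spectrum.

```python
from typing import List


def select_reduced_ldb(sorted_params: List[float], start: int = 10,
                       step: int = 10, min_distinct: int = 2) -> int:
    """Return the number of nearest neighbors to keep.

    sorted_params holds the parameter values of the LDB spectra, ordered by
    increasing PCA distance to the observed spectrum. The set is enlarged
    until it holds at least min_distinct distinct parameter values.
    """
    n = min(start, len(sorted_params))
    while len(set(sorted_params[:n])) < min_distinct:
        if n >= len(sorted_params):
            raise ValueError("LDB holds fewer distinct values than required")
        n = min(n + step, len(sorted_params))
    return n


# Toy case: the 12 closest spectra all share Teff = 7600 K.
toy_sorted_teff = [7600.0] * 12 + [7700.0] * 5 + [7500.0] * 3
n_neighbors = select_reduced_ldb(toy_sorted_teff, start=5, step=5)
```

In the toy case the first two candidate sets (5 and 10 neighbors) contain a single Teff value, so the loop stops at 15 neighbors, where a second value appears.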

Simulations and tests
In this section, we present the implementation and results of RSIR for two different sets of synthetic spectra. We also compare these results to the ones of the PCA NN-search to show the improvement in the accuracies of the derived parameters. To each of these spectra, white Gaussian noise was added with a random S/N. The spectra were calculated in the range of A to K type stars. The reason for selecting this spectral range is that in Sec. 5, we apply this procedure to a sample of the observed stars studied in Paletou et al. (2015a) and Gebran et al. (2016).
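The construction of a simulated observation can be sketched as follows. The example assumes a continuum-normalized spectrum, so that the standard deviation of the Gaussian noise is taken as 1/(S/N); this definition, the function name, the fixed seed, and the flat toy spectrum are assumptions made for the illustration.

```python
import random
from typing import List


def add_white_noise(flux: List[float], snr: float, rng: random.Random) -> List[float]:
    """Add Gaussian white noise to a continuum-normalized spectrum.

    The noise standard deviation is taken as 1 / snr, which assumes a
    continuum level of 1 (an assumption of this sketch).
    """
    sigma = 1.0 / snr
    return [f + rng.gauss(0.0, sigma) for f in flux]


rng = random.Random(42)        # fixed seed for reproducibility
clean = [1.0] * 20000          # flat toy spectrum at the continuum level
noisy = add_white_noise(clean, snr=100.0, rng=rng)
```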

The learning databases
Figure 3. log10(χ²_N) versus log(δ) for Teff of a synthetic spectrum having Teff = 7 600 K, log g = 2.50 dex, v sin i = 197 km s⁻¹, and a S/N of 196. Each curve displays the minimization using a different set of nearest neighbors generated iteratively.
As done in Paletou et al. (2015b) and Gebran et al. (2016), model atmospheres were calculated using ATLAS9 with the new opacity distribution function (Castelli & Kurucz, 2003, Kurucz, 1992). These models assume local thermodynamic equilibrium (LTE), hydrostatic equilibrium, and a 1D plane-parallel atmosphere. Convection was treated using a mixing length parameter of 0.5 for 7 000 K ≤ Teff ≤ 8 500 K, and 1.25 for Teff ≤ 7 000 K, following the prescriptions of Smalley (2004). Synthetic spectra were calculated using SYNSPEC48. The adopted line lists were from Kurucz gfhyperall.dat and modified with more recent and accurate atomic data retrieved from the VALD and the

Inversion of simulated A stars
We used the LDB of Gebran et al. (2016), in which the effective temperature varies from 6 800 up to 11 000 K. The wavelength region was chosen between 4 450 and 4 990 Å. This wavelength region harbors lines that are sensitive to all stellar parameters, and insensitive to the microturbulent velocity, which was adopted to be ξ_t = 2 km s⁻¹ based on the work of Gebran et al. (2014, 2016). The adopted resolution is 76 000, as it corresponds to most of the analyzed stars in Sec. 5. The ranges of all parameters in the A-star LDB are summarized in Tab. 1. Noise-added synthetic spectra were calculated to be used as simulated observations. Around 1 500 spectra were calculated for A stars with parameters randomly selected within the range of the LDB but not necessarily at the grid points. To analyze the effect of the sampling on the RSIR technique, we have inverted these spectra using 3 different LDBs. The range of every parameter is the same in all three databases; only the step was modified. As an example, in LDB 1, Teff has a step of 100 K, whereas in LDBs 2 and 3, the steps are 200 K and 400 K, respectively. The same was done for all parameters, and the details about the steps are found in Tab. 2. The sampling of v sin i in the LDBs is not constant and depends on the value of v sin i (Gebran et al., 2016).
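To compare the results of the inversion of the PCA NN-search and RSIR for the 1 500 spectra, we estimate the root mean square error, Λ, for both techniques, defined as:
$$\Lambda = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i^{\rm (true)} - y_i^{\rm (inv)}\right)^2} \qquad (13)$$
where N is the number of inverted spectra, y_i^(true) is the known parameter of the i-th synthetic spectrum, and y_i^(inv) its corresponding inverted one.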
Columns 4 and 5 of Tab. 2 display the Λ results using the PCA NN-search and the RSIR for the 3 LDBs. The offsets, calculated as a signed mean difference between the inverted and the true values, are presented in the last two columns of Tab. 2. Comparing the Λ values of each approach, an improvement is achieved using RSIR for all parameters. One exception exists in the case of v sin i for test 3. The large v sin i step of the original LDB causes the PCA NN-search pre-processing stage to select inaccurate NNs. For most cases, RSIR with a coarse sampling in the parameters produces more accurate inversions than PCA with a denser sampling. This directly implies a gain in computational time, as a coarse sampling leads to a smaller LDB. The time required to invert the parameters of one synthetic spectrum depends on the computational facilities. For instance, the gain in time for using the A-star LDB of test 2 instead of the one of test 1 is ∼25%.
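The two statistics used in this comparison can be computed as in the minimal sketch below. The sign convention of the offset (inverted minus true) is an assumption, and the toy Teff values are purely illustrative; they are not results of this work.

```python
import math
from typing import List


def rms_error(true_vals: List[float], inverted_vals: List[float]) -> float:
    """Root mean square error between the true and the inverted parameters."""
    n = len(true_vals)
    return math.sqrt(sum((t - v) ** 2 for t, v in zip(true_vals, inverted_vals)) / n)


def signed_offset(true_vals: List[float], inverted_vals: List[float]) -> float:
    """Signed mean difference, inverted minus true (sign convention assumed)."""
    n = len(true_vals)
    return sum(v - t for t, v in zip(true_vals, inverted_vals)) / n


# Toy Teff values in K (illustrative only).
true_teff = [7000.0, 8000.0, 9000.0, 10000.0]
inverted_teff = [7100.0, 7900.0, 9100.0, 10100.0]

lam = rms_error(true_teff, inverted_teff)
offset = signed_offset(true_teff, inverted_teff)
```

Reporting both quantities is useful because the RMS error measures the overall scatter while the signed offset reveals a systematic bias that the RMS alone would hide.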
To analyze the effect of the S/N on the inversions, we display in Fig. 4 the inverted Teff as a function of the real Teff for the 1 500 A-star spectra, for different S/N and different LDBs (tests 1, 2 and 3). The results of the PCA NN-search are affected both by the sampling step and by the S/N of the analyzed stars, whereas for RSIR, with the PCA pre-processing and a Tikhonov regularization, these effects become less significant for the accuracy of the inversion. In the appendix, we present the behavior of the inversion of log g, [M/H], and v sin i. These parameters behave similarly to Teff, as shown in Figs. 8, 9 and 10.

Inversion of Simulated FGK type stars
The same procedure was applied to FGK star-like spectra. Around 2 500 noisy spectra were produced in the ranges described in Tab. 1. As done for the A stars, the parameters of the FGK synthetic spectra were also selected randomly. The chosen resolution of 50 000 is the same as the one used in Paletou et al. (2015a). The wavelength range was selected from 5 000 to 5 400 Å, containing the Mg i b triplet, a good indicator of log g that is sensitive to Teff as well. The microturbulent velocity was set to ξ_t ∼ 1 km s⁻¹ (Gebran et al., 2014). The inversion results (Λ and offsets) as a function of the sampling steps are shown in Tab. 3. These results show a behavior similar to that of the A stars in terms of improvement in accuracy when comparing RSIR to the PCA NN-search. In Fig. 4 we also overplot the Teff for our F/G/K noisy synthetic spectra. The effect of the S/N and of the sampling on the inversion is very similar to the one found for A stars, for all the parameters (Figs. 8, 9 and 10).

Application to observed spectra
The performance of the RSIR has been tested on two samples of stellar spectra. The first sample is the one of the Spectroscopic Survey of Stars in the Solar Neighborhood (S⁴N, Allende Prieto et al. 2004). These are spectra of bright FGK stars at distances of less than 15 pc. We have estimated the S/N of these spectra in the wavelength range used for the inversion of the parameters (5 000−5 400 Å). This ratio ranges between 40 and 450. These spectra are at a resolution of λ/∆λ ∼ 50 000. All the details about the acquisition and the reduction procedure of the S⁴N data can be found in Allende Prieto et al. (2004). These spectra were inverted using the database of Paletou et al. (2015a), made of 905 spectra retrieved from the ELODIE stellar library (Prugniel & Soubiran, 2001, Prugniel et al., 2007), in the Mg i b triplet wavelength range as explained in Sec. 4.3. We then compared the values of the inverted parameters with the ones of Allende Prieto et al. (2004) and with the medians found in the Vizier catalog for all these stars. The main reason for using this catalog for our comparison is the need for reliable and objective catalogs that are built from values previously adopted by the astronomical community.
Comparing our inverted Teff to the ones of Allende Prieto et al. (2004), we found an average signed difference of 2.09 K with a standard deviation of 102 K. For log g, the average signed difference and the standard deviation are both 0.15 dex. For [M/H] and v sin i, we found -0.06±0.08 dex and -0.21±1.89 km s⁻¹, respectively. If we compare our inverted values to the medians of Vizier, we find -85±110 K, -0.07±0.16 dex, 0.01±0.10 dex and -0.50±2.25 km s⁻¹ as the signed mean difference and standard deviation between the catalogued values and the inverted ones for Teff, log g, [M/H] and v sin i, respectively. Figure 5 displays in filled circles, for the four parameters, the comparison between our inverted values for the FGK observed spectra and the ones derived from Vizier. We have also assigned to the catalogue values an error bar corresponding to the standard deviation of the dispersion of the catalogue values for each star.
The second sample of our analysis consists of the well studied A stars of Gebran et al. (2016). These are the 19 stars that have been studied extensively by different authors using different techniques (Vega, Sirius A, HD 22484, HD 15318, HD 76644, HD 49933, HD 214994, HD 214923, HD 113139, HD 114330, HD 27819, HD 5448, HD 33256, HD 29388, HD 91480, HD 30210, HD 32301, HD 28355, and HD 222603) and have more than 120 references each. The source of these high resolution spectra is explained in detail in Gebran et al. (2016). They were observed using the ELODIE, NARVAL, ESPaDOnS and SOPHIE spectrographs. ELODIE has a resolution of 42 000, whereas NARVAL, ESPaDOnS and SOPHIE have a resolution of ∼76 000. We have applied the RSIR to these data using the database of Test 1 in Sec. 4.2 at both resolutions. The S/N of these spectra is between 180 and 360. The inverted parameters of each star were compared to the ones retrieved from Vizier and added to the plots of Fig. 5 as filled triangles. We found an average signed difference and a standard deviation of -0.14±245 K, -0.20±0.30 dex, -0.11±0.09 dex and -2.07±8.5 km s⁻¹ for Teff, log g, [M/H] and v sin i, respectively, between the inverted and the Vizier parameters.

These results show that most of our inverted parameters are in agreement with previous studies. Considering that the most accurate parameters of these A and FGK stars are the Vizier medians, our values are less spread with respect to the median than the ones of Paletou et al. (2015a) and Gebran et al. (2016). The standard deviations that we found could be assigned as an estimation of the errors.

The case of log g
Surface gravity is systematically the most difficult parameter to determine, with typical errors of the order of 0.15 to 0.3 dex. This parameter is very important for chemical analysis, as some line profiles could be very sensitive to log g values. Spectroscopic determinations of surface gravity have always been assigned moderately large error bars, especially for A stars (Smalley, 2005). The same applies to FGK stars but with smaller error bars. Asteroseismic log g determinations remain the best tools for achieving accuracies better than 0.05 dex (Chaplin et al., 2014, Creevey et al., 2013, Hekker et al., 2013). RSIR is mainly based on finding the best set of spectra in the database that correspond to the observed one. As it is a spectroscopic method, we should not expect an accurate recovery of log g. Using our values for ∆log g, we can work out a posteriori what they mean in terms of discernibility between two spectra whose respective log g differ by this quantity. This also gives us relevant information about (i) which specific bandwidth(s) are the most sensitive to such differences, and (ii) how significant they are for various S/N. Figures 6 and 7 display the variation in the spectrum profile as a function of log g, fixing all the remaining parameters, for G and A stars, respectively. In Fig. 6, we calculated synthetic spectra for a typical G star with a Teff of 5 200 K, [M/H] of 0.0 dex, and v sin i of 6 km s⁻¹, at a resolution of 50 000 and in the wavelength range of 5 000−5 400 Å. The only parameter that differs between the 3 spectra is log g, ranging between 4.00 dex and 4.30 dex with a step of 0.15 dex. The upper panel of Fig. 6 displays the normalized flux of the synthetic spectra in the Mg i b triplet region. The flux level in this region is very sensitive to variations in log g. The following panels display the same spectra for different values of S/N. When no noise is added (panel with S/N ∼ ∞), the distinction between the 3 spectra is clear, but when the S/N starts to decrease, the distinction between the noisy spectra becomes harder to detect. This shows that for a S/N of the order of 100, the noisy spectra with log g of 4.00 and 4.15 dex are very similar, and therefore the best corresponding synthetic spectrum in our LDB could have a log g differing by at least 0.15 dex from the correct value. We are not trying to quantify the minimum S/N required for an accurate inversion of log g, as the RSIR is not based on a pixel-to-pixel comparison, but we are showing the effect of our derived standard deviations in log g on the flux for noisy spectra. Figure 7 displays a similar behaviour for A stars having a Teff of 8 500 K, [M/H] of 0.0 dex, and v sin i of 40 km s⁻¹, at a resolution of 76 000 in the wavelength range of 4 500−5 000 Å. The surface gravity of these spectra ranges between 3.60 and 4.20 dex with a step of 0.30 dex. This figure shows a behavior similar to that of Fig. 6. At a S/N of ∼150, the distinction between spectra having a difference of 0.30 dex in log g becomes hardly noticeable. Figures 6 and 7 also show that the effect of weak metallic lines on the derivation of log g becomes negligible as the S/N decreases. The log g information that these lines contain is mainly lost in the noise.

Discussion and conclusion
RSIR tests for nearly 4 000 synthetic stars of different spectral types and different noise levels showed an improvement over the PCA-based method of Paletou et al. (2015a,b) and Gebran et al. (2016) for the inversion of stellar parameters. The results of Tabs. 2 and 3 and Fig. 4, for FGK and A stars, show that for most of the tests, the Λ values of RSIR are lower than those of the nearest-neighbor PCA approach. Having prior information about the star, obtained by using PCA as a pre-processing step, allows us to narrow down the selection of the optimized reduced databases. This drastically decreases the size of the LDBs. Achieving lower Λ with larger steps helps to reduce the prohibitive computation time needed for the construction of the databases and the calculation of the PCs. Simulated tests revealed that the computation time of RSIR is nearly 1% of that of the PCA nearest-neighbor approach.
One should be very careful when increasing the step sizes of the parameters, because the PCA pre-processing step could then deviate drastically from the true values, thereby excluding the spectra that actually best describe the observed one.
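This caveat can be made concrete with a small sketch of the pre-selection step. It is only an illustration under stated assumptions: the LDB is represented as a list of parameter dictionaries, the reduced database is taken to be a box of hypothetical half-widths around the PCA first guess, and the function name `select_reduced_ldb`, the grid, and all numerical values are invented for the example rather than taken from our pipeline.

```python
def select_reduced_ldb(ldb, first_guess, half_widths):
    """Keep the LDB entries whose parameters all lie within
    +/- half_widths of the (PCA-based) first guess."""
    return [
        entry for entry in ldb
        if all(abs(entry[k] - first_guess[k]) <= half_widths[k]
               for k in first_guess)
    ]

# Hypothetical coarse grid: Teff every 100 K, log g every 0.2 dex.
ldb = [{"teff": t, "logg": g / 10}
       for t in range(5000, 6001, 100)
       for g in range(40, 51, 2)]

# Hypothetical PCA first guess and selection window.
first_guess = {"teff": 5450, "logg": 4.45}
half_widths = {"teff": 200, "logg": 0.3}

reduced = select_reduced_ldb(ldb, first_guess, half_widths)
print(len(reduced), "entries kept out of", len(ldb))

# A star whose true Teff is 5 200 K lies outside the selected window,
# so no spectrum with its parameters is left in the reduced database.
print(any(entry["teff"] == 5200 for entry in reduced))
```

If the first guess is off by more than the window half-width, as in the last line, the subsequent inversion has no access to the spectra closest to the truth. The window therefore has to be chosen with the expected error of the PCA first guess in mind.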
Application to observed FGK and A stars reveals a good agreement between the inverted parameters and the ones derived in previous studies. The comparison with Vizier catalog values shows an improvement in the derived parameters as compared to the results of Paletou et al. (2015a) and Gebran et al. (2016) for the same stars and LDBs. Surface gravity remains the parameter with the least accuracy. Our derived errors on log g are of the order of 0.15−0.30 dex. Smarter LDBs should therefore be considered, for instance with an "adaptive sampling" of the parameters under study that takes into account the typical flux variations in the most sensitive wavelength (sub-)domains, together with the S/N of the observations, instead of an a priori sampling in the parameters. Also, a commonly reported issue with the inversion of stellar parameters using an LDB of synthetic spectra is that of the so-called "ambiguities". This means that two sets of distinct parameters may generate spectra which are beyond "discernibility". Given a set of observed spectra to characterize, we could naturally relate that discernibility to their level of S/N. Using a nearest-neighbor search PCA-based method, for instance, such a level of S/N can easily be translated into a distance threshold δ PCA. We can then anticipate that, instead of relying on LDBs built with the usual a priori sampling in the parameters, a smarter database should rely on δ PCA. This would imply setting up LDBs for fundamental stellar parameters in a fashion radically different from common practice. In the frame of PCA, it would be more relevant to sample the full range of parameters properly with a "constrained-random" process ensuring that there are no nearest neighbors closer than δ PCA. Such a "sieve algorithm" was first proposed by López Ariste & Casini (2002) in the context of the characterization of magnetic fields from spectropolarimetric data (see also Casini et al. 2013).
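The constrained-random idea can be sketched as follows. This is a minimal illustration of the principle and not the implementation of López Ariste & Casini (2002): the names `sieve_sample`, `draw` and `project` are hypothetical, the parameter ranges are arbitrary, and `project` is a placeholder that merely rescales the parameters, whereas a real application would compute the synthetic spectrum and its PCA coefficients.

```python
import math
import random

def sieve_sample(draw, project, delta, n_target, max_draws=100_000, seed=0):
    """Constrained-random sampling: a random draw is kept only if its
    projection lies farther than `delta` from every projection kept so far."""
    rng = random.Random(seed)
    kept_params, kept_proj = [], []
    for _ in range(max_draws):
        if len(kept_params) >= n_target:
            break
        params = draw(rng)
        proj = project(params)
        if all(math.dist(proj, q) > delta for q in kept_proj):
            kept_params.append(params)
            kept_proj.append(proj)
    return kept_params, kept_proj

def draw(rng):
    """Hypothetical uniform draw of (Teff, log g)."""
    return (rng.uniform(4000.0, 8000.0), rng.uniform(3.0, 5.0))

def project(params):
    """Placeholder for the PCA projection of the corresponding spectrum:
    here the parameters are only rescaled to comparable ranges."""
    teff, logg = params
    return ((teff - 6000.0) / 2000.0, logg - 4.0)

params, proj = sieve_sample(draw, project, delta=0.1, n_target=50)
print(len(params), "parameter sets kept")
```

By construction, no two retained entries are closer than the threshold in the projected space, so the database contains no pairs that would be indiscernible at the S/N that sets δ PCA. The loop compares each candidate with all retained entries, which is adequate for a sketch but would call for a spatial index in a large database.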
Another line of development relates to the "structure" of our LDBs. Smarter, or optimal, LDBs using different sampling methods should be considered. Such a general issue was already raised by Bijaoui et al. (2012), for instance. Databases available online are usually calculated with large steps in Teff and log g. Our RSIR technique, as it does not require small steps in the LDB, is a good tool to be used with synthetic spectra available online, such as the POLLUX database (pollux.oreme.org; Palacios et al., 2010), which contains models with temperatures ranging between 3 000 and 50 000 K, or the TLUSTY Non-LTE Line-blanketed Model Atmospheres of O-Type Stars (Lanz & Hubeny, 2003), with Teff ranging between 27 500 and 55 000 K in steps of 2 500 K, and log g between 3.0 and 4.75 in steps of 0.25 dex. We can also mention the PHOENIX (Husser et al., 2013) models database for stars having Teff < 12 000 K, and the AMBRE (de Laverny et al., 2012) project, which contains high-resolution FGKM stellar synthetic spectra.
As an output of the new Gaia Data Release 2, Cropper et al. (2018) describe the Gaia RVS specification as well as the predicted performance at the end of the mission. Gaia RVS will provide us with a large number of spectra in the calcium triplet regime (845−872 nm). This triplet is very sensitive to Teff and log g. The medium resolution (11 500) of the RVS and the small range in wavelength would require LDBs smaller than the ones used in our work, leading to a fast application of the RSIR. As we did for the inversion of the S4N data in Sec. 5, LDBs could be constructed with real observed stars having well-known fundamental parameters and with the same resolution. Finally, since RSIR is based on a single-parameter inversion process, one can also incorporate other parameters, for example microturbulent velocity and individual chemical abundances, at the cost of computing and handling a larger number of individual spectra.

Fig. 10. Results of projected equatorial velocity inversion for the synthetic A to K stars (axis label: inverted v sin i in km s−1).