Advancing interpretation of stable isotope assignment maps: comparing and summarizing origins of known-provenance migratory bats

Abstract Probability-of-origin maps deduced from stable isotope data are important for inferring broad-scale patterns of animal migration, but few resources and tools for interpreting and validating these maps exist. For example, quantitative tools for comparing multiple probability-of-origin maps do not exist, and many existing approaches for geographic assignment of individuals have not been validated or compared with respect to precision and accuracy. To address these challenges, we created and analyzed probability-of-origin maps using stable hydrogen isotope values from known-origin individuals of three species of migratory bat. We used a metric of spatial overlap to group individuals by areas of origin without a priori knowledge of such regions. The metric of spatial similarity allowed for quantitative comparison of geographic origins and grouping of individuals with similar origins. We then compared four approaches for inferring origins (cumulative-sum, odds-ratio, quantile-only, and quantile-simulation) across a range of thresholds and probable minimum distance traveled. The accuracy of geographic origins and minimum distance traveled varied across species at most threshold values for most approaches. The cumulative-sum and quantile-simulation approaches had generally higher precision at a given level of accuracy than the odds-ratio and quantile-only approaches, and many threshold values were associated with a relatively high degree (> 300 km) of variation in minimum distance traveled. Overall, these results reinforce the importance of validating assignment techniques with known-origin individuals when possible. We present the tools discussed as part of an R package, ‘isocat’ (“Isotope Origin Clustering and Assignment Tools”).

Identifying the connections between habitats used by migratory animals is essential to understanding the ecology and movement of wildlife as well as informing conservation and management [1,2]. Endogenous markers are particularly useful in this regard. For example, stable hydrogen isotope (δ 2 H) values obtained from metabolically inert tissues (e.g., hair, feathers) provide a cost-effective means of identifying the geographic origins, and thus movement and connectivity, of terrestrial organisms [3,4]. The rationale behind this approach is that δ 2 H values of animal tissues reflect those of the environment where they were synthesized [3]. Thus, δ 2 H values obtained from tissues grown at one location and sampled elsewhere can be used to infer the location(s) where the tissues were grown, given an understanding of spatial variation in environmental δ 2 H values and relationships between environmental and tissue δ 2 H values.
In recent years, a general framework has been developed for using δ 2 H values to generate probability-of-origin maps of individual organisms. By comparing tissue δ 2 H values to model projections of δ 2 H values of precipitation (known as isoscapes), one can generate predictions of the relative likelihood of a tissue sample's true origin being a particular location (e.g., [5,6]). Different approaches have then been used to identify likely regions of origin from these maps, summarize patterns across individuals, and quantify the distance and direction of movement (e.g., [7,8]). However, tools for interpreting model output, including classifying and comparing groups of individuals of similar origins post hoc, have lagged behind the development of tools for creating isoscapes and probability-of-origin maps (e.g., [5,9]).

Existing approaches for interpreting model output
Some prior studies have summarized animal origins inferred from probability-of-origin maps by partitioning probability surfaces into previously identified discrete geographic regions, such as breeding ground locations (e.g., [10-12]) or management units (e.g., [13,14]). This approach calculates the mean of all probability-of-origin values associated with all cells within a particular geographic region of the probability surface of each organism; individuals are then typically considered to originate from the region with the highest mean probability. However, identifying the region of origin of individuals based on the highest mean probability can be problematic. For example, by summarizing all probabilities within a region with the mean probability, potential biases could be introduced as a result of disparate region sizes or skewed within-region probability distributions. Additionally, an individual assigned to one geographic region might have a similar mean probability of originating in another region, and such differentiation between regions based on minor differences in mean probability values may be arbitrary and/or insignificant. Furthermore, a priori information about potential regions of geographic origin (e.g., breeding locations) is largely unavailable for many species [15] and therefore it is often not possible to define meaningful regions of potential origin for comparison.
As an alternative to a priori identification of potential geographic regions of origin, relative probabilities of origin may be spatially projected as a continuous probability surface in which every location (i.e., raster cell) contains a probability value. Prior studies have used different approaches for distilling and summarizing these data-rich surfaces to emphasize the most likely region of origin. For example, several studies have divided probability surfaces into binary regions of "high" and "low" probability of origin (e.g., [16,17]). By considering only cells of highest relative probability, the most likely regions of origin for multiple individuals can be compared and summarized. These approaches have been used to identify the most likely region of origin of individuals and compile results from multiple individuals into collective maps [18,19]. However, these approaches have shortcomings: (1) much of the probabilistic information of the original surfaces is lost [9], (2) unique-but-uncommon regions of origin among individuals can be obscured [20], and (3) aggregate surfaces can become difficult to interpret when applied to numerous individuals with overlapping or differently sized regions of most-likely origin.
Recent studies have also used data from individuals of known geographic origin to assess the likelihood that individuals of uncertain origin came from particular areas [21,22]. This approach uses Monte Carlo simulations of the minimum probability values that contain the true location of known-origin individuals to select a threshold to establish areas of origin for individuals of unknown origin. However, the ability of this method to identify the geographic region of origin of individuals has not been assessed relative to more widely used methods, such as the odds-ratio approach (e.g., [23]).
Finally, recent studies have estimated seasonal animal movement distance using probability-of-origin maps derived from stable isotope data. For example, the distance between the sampling location and the nearest cell(s) of maximum probability [8,9] or to centroids or boundaries of likely regions-of-origin has been quantified [24,25] and used to assess the potential distance traveled by individuals. However, these approaches have not been rigorously compared and the uncertainty associated with such distance estimates is unknown.
We herein develop and compare approaches for using continuous probability surfaces generated from stable isotope data to assess the geographic origins of individuals, and we present an R package for performing such analyses. Although the approaches we use are widely applicable, we illustrate them with δ 2 H data from known-origin individuals of three North American bat species: the hoary bat (Lasiurus cinereus), eastern red bat (Lasiurus borealis), and silver-haired bat (Lasionycteris noctivagans). These three tree-roosting Vespertilionid bat species are thought to be long-distance seasonal migrants with broad geographic ranges, but given that they are difficult to re-capture and too small to carry modern real-time geographic tracking systems, little is known about their habitat use, abundance, and migration patterns [8,26]. There is growing interest in using δ 2 H data to assess the movement of these and similar species that present challenges to traditional methods of tracking movement [27]. Our study has two primary objectives: (1) demonstrate a hierarchical clustering approach to group individuals by similar geographic regions of origin without a priori knowledge of such regions and (2) quantify the efficacy of four methods commonly applied for delineating (a) region of geographic origin and (b) probable distance traveled.
In each of the four methods, every potential origin (cell in a probability-of-origin surface) was adjusted as follows: (1) a cumulative-sum approach, for which probabilities-of-origin were transformed to reflect the cumulative-sum of probabilities below the original value [28]; (2) an odds-ratio approach, where probabilities were transformed to reflect the relativized odds of assignment with respect to the odds of the maximum probability of origin [29]; (3) a quantile-only approach, where probabilities were ranked by quantile with respect to all probabilities in the original surface; and (4) a quantile-simulation approach, which uses Monte Carlo simulation to estimate the likelihood that the quantile probability could fall within a distribution of quantiles of individuals of known geographic origin [22,24]. The methods developed and explored herein are available in the R package 'isocat' [30].

Isotope data and analysis, isoscape, and probability surfaces
We collected and analyzed δ 2 H values of hair (δ 2 H hair ) from hoary (n = 147), eastern red (n = 182), and silver-haired (n = 39) bats, sampled from turbine-killed carcasses, live captures, and museum specimens. We combined these results with published δ 2 H hair data from these same species (hoary, n = 432; eastern red, n = 124; and silver-haired, n = 88) as reported respectively in [8,17,31] (Figure 2). All 1,012 individuals were of known geographic origin (i.e., they were sampled during the period of presumed molt) based on estimated dates of molt from [8,17,31].
Hair samples for which isotope analysis had not been previously performed were prepared using the sample preparation and δ 2 H measurement protocols detailed in [31]. In brief, samples were cleaned using 1:200 Triton X-100 detergent, 100% ethanol, and then air-dried [32]. To account for exchange of keratin hydrogen with ambient vapor, we used a comparative equilibration approach (Wassenaar & Hobson, 2003) in which samples were analyzed alongside international standards (USGS42; USGS43; CBS, Caribou Hoof Standard; KHS, Kudu Horn Standard; [32,34]) and an internal keratin standard (porcine hair and skin, product # K3030; Spectrum Chemicals, New Brunswick, NJ, USA). The cleaned, homogenized hair from each sample was exposed to ambient air to permit equilibration of exchangeable hydrogen in keratin. Samples were analyzed for δ 2 H values using a ThermoFisher high temperature conversion/ elemental analyzer (TC/EA) pyrolysis unit interfaced with a ThermoFisher Delta V+ isotope ratio mass spectrometer. Values of δ 2 H were normalized to the Vienna Standard Mean Ocean Water-Standard Light Antarctic Precipitation (VSMOW-SLAP) scale using USGS42, USGS43, CBS, and KHS. The δ 2 H values of non-exchangeable hydrogen of these standards are -72.9, -44.4, -157.0, and -35.5‰, respectively [33,35].
Elevation, latitude, and latitude² were used to generate a 2 km² resolution isoscape of June-August δ 2 H values of precipitation (δ 2 H precip ) across North America using IsoMAP (IsoMAP jobs 66087 and 66098, see [5,31,36]). Values of δ 2 H precip were obtained from the isoscape for the site at which each known-origin individual was sampled. When geographic coordinates for a sample site were not available, georeferencing to site or county level was conducted using the 'geocode' function of the R package ggmap (v. 2.6.1; [37]). We fit reduced major axis regressions between values of δ 2 H hair and δ 2 H precip for each species. All δ 2 H hair values were transformed into δ 2 H precip values using mean regression coefficients (Figure S1).
The probability of an individual having originated in a given cell of the isoscape was predicted using the following equation, as described in [5]:
$$ f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left( -\frac{(y - \mu)^{2}}{2\sigma^{2}} \right) $$
where the probability that the given cell represented the origin of the individual with a given δ 2 H precip value y is f(y|μ,σ), given an expected mean (μ) and combined error (σ) within the δ 2 H precip isoscape and mean regression coefficients. The combined error was defined as:
$$ \sigma = \sqrt{\sigma_{isoscape}^{2} + \sigma_{regression}^{2}} $$
where σ isoscape is the standard deviation of a given cell of the isoscape model (IsoMAP jobs 66087 and 66098) and σ regression is the standard deviation of the variance of the reduced major axis regression between δ 2 H hair and δ 2 H precip for each species (Figure S1). The resulting surfaces were normalized to sum to 1. Analyses were performed in R v. 3.3.1 [38] using the likelihood function adapted from [39] within the isocat function 'isotopeAssignmentModel'.
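The assignment step described above can be sketched in a few lines of Python. This is an illustration only, not the 'isotopeAssignmentModel' code: it assumes a normal likelihood and a combined error formed as the root sum of squares of the isoscape and regression standard deviations, and the function names and the toy four-cell isoscape are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    """Normal density of observation y given mean mu and standard deviation sigma."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def probability_surface(y, cell_means, cell_sds, sd_regression):
    """Per-cell likelihood of origin, normalized so the surface sums to 1.

    Assumption: combined error = sqrt(sd_isoscape^2 + sd_regression^2).
    """
    densities = []
    for mu, sd_isoscape in zip(cell_means, cell_sds):
        sigma = math.sqrt(sd_isoscape ** 2 + sd_regression ** 2)
        densities.append(normal_pdf(y, mu, sigma))
    total = sum(densities)
    return [d / total for d in densities]

# Hypothetical isoscape with four cells (predicted precipitation values and their SDs).
cell_means = [-120.0, -90.0, -60.0, -30.0]
cell_sds = [8.0, 6.0, 6.0, 8.0]

# One individual whose hair value has already been converted to a precipitation value.
surface = probability_surface(-62.0, cell_means, cell_sds, sd_regression=10.0)
most_likely_cell = surface.index(max(surface))
```

In this toy case the cell whose predicted value lies closest to the observation receives the highest normalized probability; on a real isoscape the same calculation is applied to every raster cell.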

Surface similarity and clustering
To quantitatively compare continuous probability surfaces, we used a metric of niche overlap, Schoener's D-metric ([40]; as described in [41] and reviewed in [42]):
$$ D(z_{1}, z_{2}) = 1 - \frac{1}{2} \sum_{i} \left| z_{1i} - z_{2i} \right| $$
where z 1i and z 2i are the values of two surfaces z 1 and z 2 in grid cell i. The metric is applied to normalized surfaces and varies between no overlap (D = 0) and complete overlap (i.e., identical surfaces, D = 1). We conducted pairwise analyses among the probability surfaces for all individuals of each species, populating an n × n symmetric matrix with D values, where n is the number of individuals.
Figure caption fragment: Colors represent the cluster associated with highest mean within-cluster probability for a given cell and are meant to indicate the geographic regions of origin most associated with each specific cluster.
To create groups of individuals of each species that had similar probable regions of origin, we applied hierarchical clustering to each similarity matrix using the "average" method to cluster by correlation distance in the R package pvclust (v. 2.0-0; [43]). We cut each tree at height h = 0.5 ( Figure 3) to account for different sample sizes among species while retaining a similar likelihood of detecting unique regions of origin. However, numerous qualitative and quantitative methods exist for deciding the height at which to cut dendrograms (e.g., [43][44][45][46]) and such approaches could be explored in future studies.
For each group of individuals, we created a mean probability surface representing the average probability-of-origin for group members. To assess the degree to which grouping individuals improves a summarized representation of all likely origins, we compared the probabilities-of-origin at known sampling sites between mean aggregate surfaces summarizing within-species (i.e., before clustering) and within-cluster (i.e., after clustering) probabilities using a paired t-test. For summary purposes, we also used these mean aggregated surfaces to depict spatially distinct regions of probable origin of each species.
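The pairwise similarity matrix used in this section can be illustrated with a short Python sketch. It assumes the standard form of Schoener's D for surfaces that each sum to 1 (one minus half the summed absolute cell-wise difference); the function names and the four toy surfaces are hypothetical, and the clustering step itself is not reproduced here.

```python
def schoener_d(z1, z2):
    """Schoener's D between two normalized surfaces given as equal-length lists."""
    if len(z1) != len(z2):
        raise ValueError("surfaces must cover the same cells")
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(z1, z2))

def similarity_matrix(surfaces):
    """Symmetric n x n matrix of pairwise D values for n surfaces."""
    n = len(surfaces)
    return [[schoener_d(surfaces[i], surfaces[j]) for j in range(n)] for i in range(n)]

# Four hypothetical individuals on a four-cell grid; each surface sums to 1.
surfaces = [
    [0.50, 0.50, 0.00, 0.00],
    [0.50, 0.50, 0.00, 0.00],  # identical to the first surface
    [0.00, 0.00, 0.50, 0.50],  # no overlap with the first surface
    [0.25, 0.25, 0.25, 0.25],  # partial overlap with all others
]
D = similarity_matrix(surfaces)
```

Identical surfaces give D = 1 and non-overlapping surfaces give D = 0, so 1 - D can serve as a dissimilarity for hierarchical clustering routines that expect distances.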

Comparison of existing methods for interpreting model output
Using data from the known-origin bats, we evaluated four thresholding methods to delineate the most likely region of origin for each individual: cumulative-sum, odds-ratio, quantile-only, and quantile-simulation (Figure 1). The cumulative-sum surfaces were created with reference to all probabilities in the surface. That is, the cumulative-sum value (CS) of each grid cell z i in surface z was the sum of all cells in z less than or equal to z i :
$$ CS_{i} = \sum_{j \,:\, z_{j} \le z_{i}} z_{j} $$
Since input probability surfaces had been normalized to sum to 1, the outputs of the cumulative-sum model ranged from the minimum of the original surface (the smallest probability value) to 1 (at the cell with the largest original probability value).
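As a small illustration of the cumulative-sum transformation described above, the following Python sketch applies the verbal definition directly (each cell receives the sum of all cell values less than or equal to its own). The function name and the toy surface are hypothetical, and the quadratic-time loop is chosen for readability rather than speed.

```python
def cumulative_sum_surface(z):
    """Replace each cell value with the sum of all cell values <= that value."""
    return [sum(v for v in z if v <= zi) for zi in z]

# Hypothetical normalized surface (sums to 1).
z = [0.1, 0.4, 0.2, 0.3]
cs = cumulative_sum_surface(z)
```

For this surface the smallest cell keeps its own value and the largest cell receives a value of 1 (up to floating-point rounding), matching the output range stated in the text.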
The odds-ratio (OR) surfaces were calculated for each probability surface using the ratio of ratios:
$$ OR_{i} = \frac{z_{i} / (1 - z_{i})}{\max(z) / (1 - \max(z))} $$
within grid cell i of surface z (note that probability surfaces had already been normalized to sum to 1). These OR i values correspond to the X:1 odds-ratio format with X = (1 / OR i ) - 1, e.g., odds-ratio threshold values of 0.5 correspond to 1:1 odds and an expected accuracy of 50%, threshold values of 0.25 to 3:1 odds and an expected accuracy of 75%, and threshold values of 0.05 to 19:1 odds and an expected accuracy of 95% (Figure S2).
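The odds-ratio rescaling and the threshold-to-odds conversion can be sketched in Python as follows. Two assumptions are made: the odds of each cell are divided by the odds of the maximum-probability cell, and the conversion X = 1/OR - 1 is inferred from the paired examples in the text (0.5 with 1:1, 0.25 with 3:1, 0.05 with 19:1). The function names and toy surface are hypothetical.

```python
def odds(p):
    """Odds corresponding to a probability p (requires p < 1)."""
    return p / (1.0 - p)

def odds_ratio_surface(z):
    """Odds of each cell relative to the odds of the maximum-probability cell."""
    max_odds = odds(max(z))
    return [odds(p) / max_odds for p in z]

def threshold_to_odds(threshold):
    """X in 'X:1' odds for an odds-ratio threshold, assuming X = 1/threshold - 1."""
    return 1.0 / threshold - 1.0

def expected_accuracy(threshold):
    """Expected accuracy implied by X:1 odds, i.e., X / (X + 1) = 1 - threshold."""
    x = threshold_to_odds(threshold)
    return x / (x + 1.0)

z = [0.1, 0.4, 0.2, 0.3]  # hypothetical normalized surface
or_surface = odds_ratio_surface(z)
```

Under this conversion a threshold of 0.33 corresponds to roughly 2:1 odds and an expected accuracy of about 67%, in line with the values quoted in the Results.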
The quantile-only (QO) surfaces were determined by estimating the quantile value at each sampling location relative to all probabilities within the probability-of-origin surface using an empirical cumulative distribution function. This function estimates the quantiles of surface z for cell values X = (z 1 , z 2 , …, z n ), where F n (t) is the fraction of observations less than or equal to t:
$$ F_{n}(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[ z_{i} \le t \right] $$
and the output quantile-only surface is a transformed probability surface where the value of grid cell z i is:
$$ QO_{i} = F_{n}(z_{i}) $$
Like the quantile-only method, the quantile-simulation approach also estimates quantile probabilities, but then quantifies the similarity of those quantile probabilities to those of known-origin individuals. This simulation element effectively corrects for poor or variable model accuracy by incorporating the model performance of known-origin individuals. Our quantile-simulation approach, modified from Pylant et al. [22], incorporated model performance for known-origin individuals as follows: we fit six candidate distribution functions (normal, log-normal, exponential, Weibull, gamma, and logistic) to the distribution of 1 - q values, where q is the quantile of the probability at the sampling location of each known-origin individual, using maximum likelihood estimation within the 'fitdistrplus' package (v. 1.0.9, [47]). When fitting the quantile-simulation model, the best-fit distributions for each species were Weibull, exponential, and gamma (AIC scores -369.3, -55.8, -18.4, Figure S3) for hoary, eastern red, and silver-haired bats, respectively. We subsampled (with replacement) from the best-fit distribution for each species 10,000 times, discarding values outside of [0, 1]. We applied Monte Carlo simulation to each surface cell, counting how many times 1 - q fell below the simulated values; output was the proportion of times 1 - q for a given cell remained below the simulated values. The resulting surface was rescaled between 0 and 1.
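The two quantile-based transformations can be illustrated with the Python sketch below. The quantile-only step uses an empirical cumulative distribution function over all cell values. The quantile-simulation step is a simplified stand-in for the procedure described above: the Weibull parameters are hypothetical placeholders rather than fitted values, the final rescaling step is omitted, and the function names are illustrative only.

```python
import random
from bisect import bisect_right

def quantile_only_surface(z):
    """Empirical CDF value of each cell: fraction of cells with value <= that cell."""
    ranked = sorted(z)
    n = len(ranked)
    return [bisect_right(ranked, v) / n for v in z]

def quantile_simulation_surface(q_surface, draws):
    """Proportion of simulated values that exceed 1 - q for each cell."""
    ranked = sorted(draws)
    n = len(ranked)
    return [(n - bisect_right(ranked, 1.0 - q)) / n for q in q_surface]

z = [0.1, 0.4, 0.2, 0.3]  # hypothetical normalized surface
qo = quantile_only_surface(z)

# Simulated 1 - q values for known-origin individuals. The Weibull scale (0.2)
# and shape (1.5) are placeholders, not parameters estimated in the study.
rng = random.Random(1)
draws = [d for d in (rng.weibullvariate(0.2, 1.5) for _ in range(10000)) if 0.0 <= d <= 1.0]
qs = quantile_simulation_surface(qo, draws)
```

Cells with higher quantiles have smaller 1 - q values, so a larger share of the simulated values exceeds them and their quantile-simulation value is higher.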
We assessed the accuracy of each of the four methods by measuring the proportion of "true origins" (sampling sites of known-origin individuals) with probabilities greater than each of a range of threshold proportional probability values (from 0.0 to 1.0 in increments of 0.01). All individuals would be expected to be included with a probability threshold value of 0, with progressively fewer included as the probability thresholds increased. We calculated the precision of each method as the proportion of cells in the whole surface higher than each candidate threshold value. To recommend thresholds for geographic assignment for each of the four methods, we selected the smallest threshold that correctly assigned > 75% of known-origin individuals of each species. We calculated the area under the curve (AUC; an indicator of model precision and accuracy) for both the threshold/accuracy curves (for each species and method) and the accuracy/precision curves (for each individual) using the 'auc' function of the 'MESS' package (v. 0.5.6, [48]) in R. We calculated AUC for every range under the curve associated with the top 99−0% of thresholds and accuracies, respectively. To provide a basis for comparing the accuracy/precision tradeoffs associated with each approach [49], we report several ranges of AUC for the threshold/accuracy curves, corresponding to very high accuracies (accuracies encompassing 95% and 100% of validation individuals included), moderate to high (between 75% and 100% included), and low to high (1% to 100% included).
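The validation quantities described above can be sketched in Python as follows. Several choices here are assumptions: accuracy counts known origins with values strictly greater than the threshold; precision is taken as the share of cells at or below the threshold (the definition used in the Results, which is the complement of the share of cells above the threshold); and a plain trapezoidal rule stands in for the AUC calculation rather than reproducing the 'auc' function. All names and numbers are hypothetical.

```python
def accuracy_at(threshold, origin_values):
    """Share of known-origin individuals whose surface value at the true origin exceeds the threshold."""
    hits = sum(1 for v in origin_values if v > threshold)
    return hits / len(origin_values)

def precision_at(threshold, surface):
    """Share of cells excluded at this threshold (values at or below it)."""
    excluded = sum(1 for v in surface if v <= threshold)
    return excluded / len(surface)

def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve through (xs[k], ys[k])."""
    area = 0.0
    for k in range(1, len(xs)):
        area += (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
    return area

thresholds = [k / 100 for k in range(101)]  # 0.0 to 1.0 in steps of 0.01
origin_values = [0.9, 0.8, 0.6, 0.2]         # hypothetical transformed values at true origins
accuracy_curve = [accuracy_at(t, origin_values) for t in thresholds]
auc = trapezoid_auc(thresholds, accuracy_curve)
```

With these toy values, three of the four individuals are retained at a threshold of 0.5, so accuracy at that threshold is 0.75.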
For each method, we also measured the geographic distance from each known-origin individual's sampling location to the nearest cell with a probability greater than each threshold value as follows: for each cell in the specified isoscape, the distance to the sampling location was computed using the 'distanceFromPoints' function of the 'raster' package (v. 3.0-7, [50]) in R. Then, we calculated the minimum distance from the sampling location to the nearest edge of the region delineated by a given threshold (in which the surface values were equal to or greater than a specific threshold) for each standardized probability surface for each method (i.e., cumulative-sum, odds-ratio, quantile-only, and quantile-simulation). We then calculated the AUC for each individual for each threshold/distance curve. To evaluate the relationships among approaches across species with respect to AUC, we conducted a two-way factorial ANOVA ('aov' function, [38]). To summarize the distance to the edge of each potential threshold for each surface, we considered mean and variance in minimum distance from the sampling location of each individual to the nearest cell containing a given threshold.
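The minimum-distance calculation described above can be illustrated with the Python sketch below. It uses great-circle (haversine) distances on a spherical Earth of radius 6371 km as a simplifying assumption and does not reproduce the 'distanceFromPoints' function; cell coordinates, values, and function names are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))

def min_distance_km(site, cells, threshold):
    """Distance from the sampling site to the nearest cell with value >= threshold.

    site is (lat, lon); cells is a list of (lat, lon, value) for cell centers.
    Returns None when no cell meets the threshold.
    """
    distances = [
        haversine_km(site[0], site[1], lat, lon)
        for lat, lon, value in cells
        if value >= threshold
    ]
    return min(distances) if distances else None

# Hypothetical transformed surface along one parallel, with the sampling site at (40, -90).
cells = [(40.0, -90.0, 0.30), (40.0, -89.0, 0.60), (40.0, -88.0, 0.95)]
site = (40.0, -90.0)
d_low = min_distance_km(site, cells, 0.25)   # the site's own cell qualifies
d_high = min_distance_km(site, cells, 0.90)  # only the farthest cell qualifies
```

Raising the threshold shrinks the region of likely origin, so the minimum distance from the sampling site can only stay the same or increase.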

Results
The δ 2 H hair values for the 1,012 individuals included in this study ranged between 24 and -210 ‰. There was a strong positive relationship between δ 2 H hair and δ 2 H precip values during the presumed period of molt for each species ( Figure S1).
Pairwise comparisons of spatial similarity (Schoener's D metric) conducted on the normalized probability-of-origin surfaces revealed mean within-species similarity values across all individuals of 0.53 (range: 1.210 × 10^-7 to 1). After clustering of the similarity matrices of D values for each species, individuals were assigned to 4 (hoary), 2 (eastern red), and 4 (silver-haired) groups of likely origin (Table 1) based on cutting at h = 0.5 (Figure 3). The aggregated probabilities of origin at "true origins" increased when probabilities of origin were averaged across all within-group surfaces (Figure 4) relative to ungrouped aggregate probability surfaces (p < 0.001; Table 2).
The accuracy of assignment for known-origin individuals at a given threshold varied among species for each method (Figure 5). Among species, there was a 48.1% difference between the minimum and maximum AUC for the cumulative-sum approach, a 50.8% difference for the odds-ratio approach, a 16.2% difference for the quantile-only approach, and a 9.4% difference for the quantile-simulation approach (Table 3). At the commonly applied odds ratio of 2:1 [29,51] (corresponding to a threshold value of 0.33 and expected accuracy of 66%, Figure S2), 72, 96, and 59% of known-origin hoary, eastern red, and silver-haired bats, respectively, were accurately assigned (Figure 5). As expected, there was a trade-off between accuracy and precision of geographic assignment for known-origin individuals of each species at a given threshold for each method. As accuracy (proportion of known-origin individuals correctly assigned) increased between 0 and ~0.75, precision (representing the proportion of the surface below the threshold cutoff for a given accuracy) gradually declined, with some variation in the rate of decline among species and methods. For example, the decline in precision across species was greater for the odds-ratio and quantile-only approaches than for the cumulative-sum and quantile-simulation approaches.
As accuracy increased beyond ~0.75, precision generally declined more rapidly (Figure 6). Across species, the mean precision with respect to accuracy varied among methods (Figure 7). The mean AUC for the top 99% of all accuracies ranged from a minimum of 0.74 for silver-haired bats with the odds-ratio approach to a maximum of 0.95 for eastern red bats with the cumulative-sum approach. At this accuracy level, the odds-ratio and quantile-only approaches had the lowest AUC values for each species, and the cumulative-sum approach had the highest AUC value for eastern red bats, a similar AUC value to that of the quantile-simulation approach for hoary bats, and a lower AUC value than that of the quantile-simulation approach for silver-haired bats. For the top 25 and 5% of accuracies the cumulative-sum approach, followed by the quantile-simulation approach, had the highest AUC values for eastern red bats. In contrast, the quantile-simulation approach, followed by the cumulative-sum approach (for the top 25% of accuracies) and quantile-only approach (for the top 5% of accuracies), had the highest AUC values for hoary and silver-haired bats (Figure 7). Across species, the distance from each known-origin individual's sampling location to the nearest cell with a probability greater than each threshold value increased gradually below threshold values of ~0.75 and more rapidly thereafter (Figure 8). There was no difference in distance with respect to threshold values across method (ANOVA of AUC values across methods: p = 0.45, F = 0.89, df = 3), although there were differences across species (p < 0.001, F = 240, df = 2). At the commonly applied odds-ratio threshold of 0.33 (2:1 odds), this mean distance was relatively low but with moderate variance (mean error across species = 117 km, sd = 329 km). The distance extending from each known-origin individual's sampling location was 448 km (sd = 466 km) at the 99th quantile of cells, 723 km (sd = 448 km) at the odds-ratio threshold of 0.99, 1053 km (sd = 1128 km) at the quantile-simulation threshold of 0.99, and 642 km (sd = 613 km) at the cumulative-sum threshold of 0.99.
Figure 5. The proportion of individual bats (vertical axes) for which the known origin is contained within the probability-of-origin surface region exceeding a given threshold value (horizontal axes) for each of the four thresholding methods. The horizontal lines indicate 75% accuracy, i.e., 75% of individuals of a given species having probability-of-origin values above the corresponding threshold value.
Figure 6. Accuracy (proportion of known-origin individuals assigned to their known-origin sampling location at each proportional probability value) of each method relative to the precision of each method (inferred by the proportion of the probability-of-origin model surface below a given threshold) for each species. Solid lines indicate mean precision at a given accuracy; shaded areas indicate the range of one standard deviation from the mean for given accuracy. The vertical lines indicate 75% accuracy of assignment as local.
Figure 7. Area under the curve (AUC) relating accuracy and precision values for each species and method. Panels arranged vertically show the AUC for varying ranges of accuracy, from (respectively) 99% of all potential accuracies (from 1% accurate to 100% accurate), the top 25% (from 75% to 100% accurate), and the top 5% (from 95% to 100% accurate). Letters denote the results of Tukey's HSD tests (p-value < 0.05) for within-species and AUC-threshold analyses.

Spatial similarity and hierarchical clustering
Our results illustrate that Schoener's D-metric and hierarchical clustering provide a powerful approach for grouping and comparing probability-of-origin maps. Furthermore, we found that hierarchical clustering using mean aggregate surfaces increased the relative probability of origin at the sampling location of known-origin individuals relative to their probability values prior to clustering. A key advantage of this clustering approach relative to existing approaches for partitioning probability surfaces is that the results of the former can be used to define geographic regions of origin without a priori knowledge of such regions.
The use of continuous aggregate probability surfaces overcomes challenges associated with other methods of combining probability surfaces and summarizing individual origins. First, the resulting probability surfaces from the clustering approach are straightforward to read and interpret, even when analyzing and presenting large datasets. It is often difficult to compare and interpret large numbers of individual probability surfaces or a single summary map of the origins of multiple individuals, especially when the probable regions of origin of different individuals overlap substantially and/or differ in size [9]. When applying hierarchical clustering to create and group aggregate probability surfaces, there is in principle no upper limit on the sample size that yields interpretable output. Second, the cluster approach does not obscure relatively small subsets of individuals with unique origins, which may go unnoticed in a group mean or be masked by biased sampling effort within an aggregate probability surface. Grouping individuals by common origin should be preferred over approaches that consider only average within-group (e.g., species or demographic subgroup) origins (e.g., [52]) or stacked regions of probability (e.g., [29,53]) because the grouping approach retains all of the available information about the diversity of geographic origins across individuals. Finally, aggregate maps representing the common origins of groups of individuals retain the entirety of the data within those surfaces while avoiding potential drawbacks associated with high/low designations (e.g., arbitrary thresholding, loss of relatively low-probability, yet potentially informative, regions of each map) and remaining suitable for further analyses (e.g., of spatial connectivity, distance, and/or direction traveled).
Future studies could build upon the hierarchical clustering approach presented herein. We divided the bats in our study into a relatively small number of groups per species (2 -4) based on a subjective tree-pruning height specification (h = 0.5; Table 1, Figures 3-4). This approach could be modified to suit specific research objectives. For example, the number of clusters could alternatively be selected by k-means. Hierarchical clustering could also potentially be performed across, rather than within, species to assess inter-specific differences in geographic origins. After defining probable regions of origin, one can quantify the strength of migratory connectivity (e.g., [54]) by associating individual sampling locations with each probable region of origin. Future studies could apply this clustering method to help estimate the presence and type of migratory structure (e.g., chain, leapfrog migration) exhibited by various species.

Comparison of existing methods for interpreting model output
We found substantial variation in model accuracy among species at most realistic thresholds (i.e., > 50% accuracy), which indicates that those thresholds may not correspond predictably or consistently to expected accuracy levels. For example, there was substantial variation in accuracy across species at an odds-ratio threshold of 0.33 (corresponding to 2:1 odds), with a 37% difference in accuracy between silver-haired and eastern red bats (Figure 5). Such variation in accuracy has implications for studies that compare the geographic origin at a given threshold for multiple species, since it indicates that a single threshold is unlikely to yield assignments with the same accuracy among different species. For example, Pylant et al. [22] used a single quantile-simulation threshold value of 0.5 for assessing whether hoary and eastern red bats obtained as carcasses at wind-energy facilities in the central Appalachian Mountains were killed in their geographic region of origin. Our results suggest that there is likely a ≥ 12% difference in accuracy of classification between these species at this threshold value. Furthermore, thresholds selected a priori for a given species or dataset may not necessarily reflect a predictably accurate level of assignment. When possible, we recommend that future studies comparing summary statistics (e.g., distance moved from a probability threshold-defined region) among species use known-origin individuals to select species-specific thresholds associated with consistent accuracy levels to ensure comparable levels of data accuracy across species.
We consider the accuracy/precision relationship to be the most reliable indicator of how well a method for interpreting model output performs [55]. Probability threshold selection should be associated with high accuracy rates; however, accuracy is not the only indicator of the utility of a method. Precision is similarly important, since high accuracy but low precision would result in increased uncertainty of geographic assignment. To minimize potential sources of spatial biases across species in our accuracy/precision comparisons, we considered the entire extent of North America with respect to potential origins of individuals, rather than clipping or weighting potential origins by species range or habitat suitability. However, future studies might apply this accuracy/precision framework when evaluating the addition of other factors to further constrain isotope-based models of animal origin, e.g., occupancy data [56] and abundance [8,55,57].
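The accuracy/precision trade-off at a single threshold can be sketched as follows. The operational definitions here are assumptions for illustration: accuracy is the proportion of known-origin individuals whose sampling cell falls at or above the threshold, and precision is the mean proportion of cells excluded by the threshold. The function name and toy values are hypothetical.

```python
def accuracy_and_precision(surfaces, site_cells, threshold):
    """Return (accuracy, precision) across known-origin individuals.

    surfaces:   one transformed surface (list of cell values) per individual
    site_cells: index of the cell containing each individual's sampling site
    """
    hits = 0
    excluded = 0.0
    for surface, site in zip(surfaces, site_cells):
        if surface[site] >= threshold:
            hits += 1  # true origin is inside the retained region
        excluded += sum(1 for v in surface if v < threshold) / len(surface)
    n = len(surfaces)
    return hits / n, excluded / n

# Three hypothetical individuals on a five-cell surface.
surfaces = [
    [0.05, 0.15, 0.30, 0.60, 1.00],
    [1.00, 0.60, 0.30, 0.15, 0.05],
    [0.10, 0.20, 1.00, 0.40, 0.30],
]
site_cells = [4, 3, 2]

acc, prec = accuracy_and_precision(surfaces, site_cells, threshold=0.5)
# A threshold of 0 retains every cell: perfect accuracy, no precision.
acc_all, prec_all = accuracy_and_precision(surfaces, site_cells, threshold=0.0)
```

Sweeping the threshold from 0 to 1 and recording each (accuracy, precision) pair traces the kind of curve on which methods can be compared.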
Within the scope of our study, each method performed at a reasonably high degree of accuracy at most threshold values across species and maintained relatively high levels of precision with respect to most accuracy levels (Figures 5-7). However, differences in accuracy and precision among species illustrate considerations for selecting which approach to use to infer geographic origin and distance traveled. Of the four methods considered, the quantile-simulation and cumulative-sum approaches appear generally superior to the odds-ratio and quantile-only approaches based on the former two approaches having relatively higher precision at a given level of accuracy (Figure 6). Since accuracy/precision relationships vary depending on the range of the curves under which AUC is measured, we considered the AUC ranges for three potential accuracy ranges: very high (between 95% and 100% of validation individuals included), moderate to high (between 75% and 100% included), and low to high (1% to 100% included). For the former two AUC ranges, which should be considered the most informative (as they correspond to high rates of accuracy at corresponding probability thresholds), the quantile-simulation method outperformed the other methods for two of the three species (hoary and silver-haired bats, Figure 7). This outcome is likely a result of the Monte Carlo simulation component of the quantile-simulation approach, which accounts for many of the relatively low initial probability-of-origin values for known-origin individuals of these two species (Figure S3). For the eastern red bats, which had relatively high initial quantile values for known-origin individuals, the cumulative-sum method outperformed the quantile-simulation approach.
Thus, we infer that the cumulative-sum approach is likely to perform well in situations where the initial isotope assignment of origin performs well, which is when known-origin individuals have relatively high and unimodal initial probabilities of origin at their sampling sites. However, we expect that high and unimodal initial probabilities of origin at sampling sites will not be the norm for many species. Thus, when data from known-origin individuals are available, we recommend the quantile-simulation method because of its more consistent precision (Figure 6) and incorporation of model performance into designation of geographic origin. In cases where the quantile-simulation approach is not applicable (e.g., because data from known-origin individuals do not exist), we recommend the cumulative-sum approach because it is the most consistently accurate method at fixed threshold values (Figures 5 and 7). However, there was substantial variation in accuracy among species at common threshold values using this method, so future studies should avoid or cautiously apply comparisons between species when selection of a threshold is necessary, but potentially arbitrary.
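The comparison of methods by area under the accuracy/precision curve over a restricted accuracy range can be sketched with a trapezoidal rule. This is a minimal illustration under stated assumptions: the curve is treated as piecewise linear between sampled points, the area is not rescaled by the width of the range, and the function name and toy curve are hypothetical.

```python
def partial_auc(accuracy, precision, lo, hi):
    """Area under precision as a function of accuracy, restricted to [lo, hi].

    Assumes lo and hi lie within the sampled accuracy range and that the
    curve is piecewise linear between sampled (accuracy, precision) points.
    """
    pts = sorted(zip(accuracy, precision))

    def interp(x):
        # Linear interpolation of precision at accuracy x.
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                if x1 == x0:
                    return y0
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("x lies outside the sampled accuracy range")

    xs = [lo] + [x for x, _ in pts if lo < x < hi] + [hi]
    ys = [interp(x) for x in xs]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

# Toy curve: precision falls as accuracy approaches 100%.
accuracy = [0.50, 0.75, 0.95, 1.00]
precision = [0.90, 0.80, 0.60, 0.20]

auc_very_high = partial_auc(accuracy, precision, 0.95, 1.00)
auc_moderate_to_high = partial_auc(accuracy, precision, 0.75, 1.00)
```

Because the ranges differ in width, areas from different ranges are comparable among methods within a range but not across ranges unless they are rescaled.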
We expected that the distance from each known-origin individual's sampling location to the nearest cell above each threshold value would be inversely proportional to precision (Figures 6 and 8). That is, with increasing precision, the area included within a particular threshold shrinks, thus the distance from a sampling location outside that threshold to the edge of the encompassed area would increase. Thus, we propose that these distances should be considered a quantification of uncertainty at a given threshold, i.e., as a mechanism to establish probable error in calculating minimum distance traveled. These results have implications for studies that estimate distance of animal movement using probability-of-origin maps. For example, Cryan et al. [8] used the points of highest probability in probability-of-origin maps to approximate the probable movement and distance traveled by hoary bats in North America. As the point of highest probability is contained within the 99th quantile, our quantile-only error validation suggests a general mean error of > 448 km, with a hoary bat-specific mean error of > 496 km, for the estimates of distance traveled in that study. We recommend that future studies consider distance traveled from the boundary associated with a threshold corresponding to a fixed and high accuracy rate, or develop methods of incorporating the entire probability surface into a probabilistic assessment of distance traveled.
Data from known-origin individuals are not available for many species because of the high cost and challenges involved with obtaining such datasets. In such situations the predictability of model accuracy at a given threshold value (i.e., as shown in Figure 5) is an important consideration when selecting a method to interpret probability surfaces. In cases where all individuals are of unknown origin, a method that performs with a consistent degree of accuracy vs. threshold value would be valuable. Our results indicate that the cumulative-sum method performs best in this respect. Relatively few studies have quantified accuracy at given thresholds using known-origin individuals (e.g., [58]), but those that exist indicate that odds ratios can perform above [20] and below [9] expected accuracy levels. Similarly, in our study the odds-ratio approach performed at both higher and lower levels of accuracy than predicted, depending on species (Figures 5 and S2). Our results also indicate that even methods that consistently perform within an acceptable range of accuracies across thresholds might vary across species in terms of accuracy at a specific threshold. For example, we found substantial variation in accuracy across species at the commonly applied odds ratio of 2:1. Thus, studies that do not have access to known-origin individuals and must select an arbitrary threshold value should expect potentially high and unpredictable levels of uncertainty of geographic assignment.
The use of known-origin individuals to quantify an acceptable level of accuracy presents a promising approach to improving the consistency and applicability of isotope assignment approaches, and we encourage the collection of known-origin individuals whenever possible. As isotopic data become available in publicly available repositories (e.g., as proposed by [59]), we anticipate the opportunity to quantify expected accuracy and/or distance-based error at specific threshold values for many taxa. Such future studies will permit a more informed selection of threshold values in studies for which known-origin individuals are not available. Thus, even when accuracy for a particular group of individuals cannot be quantified, it could be estimated based on the distributions of accuracy for similar species and spatial extents at a given threshold.
The hierarchical clustering methods and associated spatial similarity metrics described herein are available in the R package 'isocat' (Isotope Origin Clustering and Assignment Tools), which is available on CRAN ("https://CRAN.R-project.org/package=isocat") and GitHub ("https://github.com/cjcampbell/isocat").
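The distance-based error discussed earlier in this section, i.e., the distance from a known-origin sampling location to the nearest cell above a threshold, can be sketched as follows. This is an illustrative stand-in and not code from isocat. It assumes cells are represented by their center coordinates, uses a spherical-Earth haversine distance with a mean radius of 6371 km, and uses hypothetical function names and toy coordinates.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))

def min_distance_km(site, cells, threshold):
    """Distance from a sampling site to the nearest cell center at or above
    the threshold; None if no cell meets the threshold.

    site:  (lat, lon) of the sampling location
    cells: iterable of (lat, lon, value) tuples, one per raster cell center
    """
    above = [(lat, lon) for lat, lon, value in cells if value >= threshold]
    if not above:
        return None
    return min(haversine_km(site[0], site[1], lat, lon) for lat, lon in above)

# Hypothetical sampling site and three cell centers with transformed values.
site = (40.0, -100.0)
cells = [
    (40.0, -100.0, 0.2),  # cell containing the sampling site
    (40.0, -99.0, 0.6),
    (45.0, -100.0, 0.9),
]

d_strict = min_distance_km(site, cells, threshold=0.5)  # site cell is excluded
d_loose = min_distance_km(site, cells, threshold=0.1)   # site cell is retained
```

For a known-origin individual the true distance traveled is zero, so under these assumptions any positive value returned at a given threshold can be read as the minimum-distance error attributable to that threshold.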