Reanalysis of an Allocentric Navigation Strategy Scale based on Item Response Theory

Abstract

Focusing on the 12 allocentric/survey-based strategy items of the Navigation Strategy Questionnaire (Zhong & Kozhevnikov, 2016), the current study applied item response theory-based analysis to determine whether a bidimensional model could better describe the latent structure of the survey-based strategy. Results from item and model fit diagnostics and from category response and item information curves showed that the item with the lowest rotated component loading (.27) [SURVEY12] could be considered for exclusion in future studies, and that a bidimensional model with three preference-related items constituting a content factor offered a better representation of the latent structure than a unidimensional model per se. Mean scores from these three items also correlated significantly with a pointing-to-landmarks task at a magnitude comparable to that of the mean scores from all items, and from all items excluding SURVEY12. These findings provided early evidence suggesting that the three preference-related items could constitute a subscale for deriving quick estimates of large-scale allocentric spatial processing in healthy adults in both experimental and clinical settings. Potential cognitive and brain mechanisms are discussed, followed by calls for future studies to gather further evidence confirming the predictive validity of the full scale and subscale, along with the design of new items focusing on environmental familiarity.

The current paper focuses on the Navigation Strategy Questionnaire (Zhong, 2013; Zhong & Kozhevnikov, 2016), which includes a survey-based (allocentric) strategy scale composed of items that were adapted/modified in relation to conceptually similar items from the first three questionnaires mentioned above (cf. Lawton, 1994; Kato & Takeuchi, 2003; Pazzaglia & De Beni, 2001). Specifically, these survey-based strategy items were designed to assess how well one visualizes environmental features in a schematic third-person/allocentric format and engages the third-person/allocentric perspective when navigating to different destinations. Twelve items constituted the scale, and they were found to have relatively high interrelatedness (Cronbach's α = .86) and test-retest reliability [r(40) = .88]. Notably, this scale possessed predictive validity with respect to an online/in-situ pointing task that required participants to point to previously observed landmarks/objects immediately after traversing a route that was encountered for the first time (Zhong, 2013; Zhong & Kozhevnikov, 2016); the mean scale scores correlated moderately with the accuracy scores of this pointing task [r(500) = .35, p < .01 (see Table 4, Zhong & Kozhevnikov, 2016)].1 Even though the survey-based strategy scale was shown to possess adequate reliability and validity, it contained one item with a low discriminant loading of .27 [derived from principal component analysis] assessing one's ability to visualize a mental map positioned in a fixed orientation (see last item in Table 1), and three items assessing a selective preference for representing environmental knowledge from the third-person/allocentric perspective (see items 3, 4, and 9 in Table 1).
The first item was initially designed by Zhong (2013) with reference to classical behavioral studies that implicated the involvement of a disembodied, object-to-object reference system in encoding and retrieving spatial relations between objects/landmarks (Easton & Sholl, 1995; Rieser, 1989; Sholl, 2001). As for the other three items, they were originally designed to assess one's ability to visualize environmental elements and associated interobject relationships from a third-person/allocentric perspective. Notably, they are exceptional for having the comparative conjunction "rather than," which was introduced for the purpose of distinguishing between descriptions of allocentric and egocentric environmental representations (Zhong, 2011, 2013). This wording style was supported by qualitative findings from sketch maps and interviews suggesting that the fidelity of one's cognitive map could be discerned and distinguished based on whether one engaged a first- or third-person point of view (Blajenkova et al., 2005; Zhong, 2011, 2013; Zhong & Kozhevnikov, 2016). Critically, these three items differ conceptually from the other nine items in that they do not conform to traditional notions of navigation strategies as outcome- or goal-oriented (Kato & Takeuchi, 2003) or as socially contextualized plans of action (Dalton, Hölscher, & Montello, 2019).
As the discrimination of items assessing survey-based strategy from items assessing alternative strategy types was done previously using principal component analysis (Zhong, 2011; Zhong & Kozhevnikov, 2016), a technique whose quality of findings is largely contingent on the magnitude or regularity of inter-item covariance/correlation (Stevens, 2009), the suitability of each survey-based strategy item was not confirmed with regard to the varying patterns of item responses demonstrated by participants. Therefore, this paper presents a reanalysis of the 12 items of the survey-based strategy scale through item response theory (IRT)-related analysis, which allows a fine-grained examination of item and component/factor properties by modelling statistical information collected from individual responses to each item (Embretson & Reise, 2000).
The aims of this reanalysis were twofold. First, it aimed to determine whether the last item of the scale, which exhibited the lowest component loading among all 12 items, could be considered for exclusion in future studies with regard to item response parameter estimates, item- and model-level fit indices, and item/test information. This aim was set forth in view of the possibility that the poor loading of the last item might have reflected uncertainties associated with assessing one's ability to construct an overview of an environment's layout set in a fixed orientation. Second, the reanalysis aimed to verify whether a bidimensional model could better represent the latent structure of the survey-based strategy scale, inasmuch as the three items assessing a selective preference for third-person/allocentric environmental representation could constitute a subscale that associates equally well with online pointing performance as the existing unidimensional scale with 12 items. This aim relates to the possibility that there may exist a component of allocentric/survey-based strategy use that is concerned more with differentiating third-person images from first-person images than with any specific act of navigation. In more technical terms, this proposed allocentric subscale relates to a preconceived notion of local dependence between individual responses to these three preference-related items (i.e., the response to one item affects the response to another item within the same cluster; see Chen & Thissen, 1997; Liu & Chen, 2012), inasmuch as a composite response or subscale score may better convey the latent trait commonly assessed by all items within that cluster.

Table 1. Survey-based strategy items and their discriminant component loadings (table content not fully recovered; the lowest loading was .27).
Note. The loadings pertain to the discriminant loadings on the latent component representing the survey-based strategy [Source: Table 3 in Zhong (2013); Table 2 in Zhong & Kozhevnikov (2016). Reproduced with permission]. Asterisks denote the three items that were conceived as assessing a selective preference for third-person/allocentric mental representation. " †" indicates the item that was considered for exclusion in future studies.

Method

Participants
An initial sample of 110 participants was recruited at the National University of Singapore (NUS) for behavioral assessment; none of them had visited the School of Design and Environment (SDE) nor travelled to any places within SDE previously.2 They received either modular credits or monetary compensation for their participation. Subsequently, these participants, together with 416 more participants from other departments and schools at NUS, were surveyed online with the Navigation Strategy Questionnaire (NSQ) [Zhong, 2013; Zhong & Kozhevnikov, 2016]. Survey data from 26 participants were collected after the publication of the questionnaire.3 Both the behavioral assessment and the online survey were approved by the institutional review board at NUS. All participants gave consent to participate in the online survey by accepting the terms and conditions stated in an online advertisement posted on the university's intranet. Access to the online survey was provided through a hyperlink in the online advertisement. Archived survey data from three of the initial sample of 110 participants who were tested behaviorally were lost during a file transfer process between computers. Consequently, survey data from a total of 523 participants (254 females) were involved in the data analysis below. Participants ranged from 18 to 45 years of age (M = 21.98, SD = 2.84).

Procedure 4
Each of the 107 participants recruited for behavioral assessment was led by the experimenter individually on a route that traversed two floors of the SDE complex (see Figure 1). The route covered approximately 600 meters and took about 10 minutes to traverse. In each experimental session, the experimenter carried a laptop and led the way for one or two participants, who followed closely behind him. Before commencement, the participants were told to remember the configuration of the route and the relative locations of all objects/landmarks they spotted during route traversal and at the route's periphery. Importantly, the experimenter told the participants that they would have to point to a selection of these landmarks at the end of the route based on a task presented on his laptop. The entrance to the Department of Architecture was selected as the mid-way point (labeled "4", at the right side of Figure 1). Once it was reached, the participants were given a short period of rest to inspect and update their memories of the route traveled thus far. Upon arrival at the ending point (see location in Figure 1), all participants performed a pointing task on the experimenter's laptop (see the section below). After completion, the participants proceeded to an experimental lab and completed the NSQ online. Their survey data were subsequently merged with those of the latter pool of 416 respondents for IRT-related analysis of item responses.

Route-based pointing task
This task was programmed in E-Prime v1.1 and was presented to participants at the end of the route (dot number six in Figure 1). When performing the task, each participant sat facing northwards at a bench and decided on the directions to the landmarks they had encountered during route traversal. The landmarks to which they pointed were beyond their line of sight. Although the task was conceived to assess participants' ability to retrieve self-to-object relationships that were updated with progression along the route (Zhong, 2013; Zhong & Kozhevnikov, 2016), it was also designed to assess participants' ability to infer the spatial relationships between the route's ending location and the locations of the landmarks/objects they observed during route traversal, following the assumption that an allocentric mental representation of one's whereabouts and the relative locations of landmarks would enhance pointing accuracy (Zhong, 2013).
2 Note that these were the same pool of participants reported by Zhong & Kozhevnikov (2016) in their second study/experiment. The sample size of 110 was determined with respect to a power analysis conducted in that study.
3 This sample was an add-on to the published data involving 500 survey respondents. The previously published sample size of 500 ensured that each of the original 59 survey items was answered approximately eight times across eight subsamples of 59 respondents.
4 For a more detailed description of how the route was planned and the purpose of the different route segments depicted in Figure 1, please refer to Zhong (2013) or Zhong & Kozhevnikov (2016).

Figure 1.
Floor plan of the route at School of Design and Environment (SDE) at National University of Singapore (NUS) (not drawn to scale). Black dots numbered from 1 to 5 represent the start of each of five route segments. Dot numbers "1" and "6" represent the starting and ending locations, respectively. Double arrow heads represent the direction along the first leg of each segment. There were 12 landmarks whose names and locations (designated by white circles) were pointed out to participants in sequence. These 12 landmarks were explicitly mentioned because of their inclusion in an imaginal pointing direction task (not mentioned in this paper due to non-significant correlation between its accuracy scores and the survey-based scale scores). The participants were tested on their spatial knowledge of the relative locations of eight of these 12 landmarks in the route-based pointing task. This pointing task also presented the names of seven more landmarks (not shown) located at the periphery or sidewalks of the route [Source: Figure 1 in Zhong (2013) and Zhong & Kozhevnikov (2016). Reproduced with permission].
On each trial, the name of an out-of-view landmark was displayed in white on a black background. A white fixation cross against a black background separated successive trials with a one-second delay. The participants were instructed to focus their gaze on the screen while doing the task, and to make their responses by pressing one of four buttons on the number pad ('1', '3', '7', and '9'), which had stickers of arrows glued over them. The participants were instructed to press the key that represented the approximate direction to a specified landmark on every trial. The front-left (FL) and front-right (FR) pointing directions were indicated by the buttons '7' and '9', respectively, whereas the back-left (BL) and back-right (BR) pointing directions were indicated by the buttons '1' and '3', respectively. To ensure a relatively equal distribution of trials for each pointing direction, three landmarks corresponded to the FR direction, and four landmarks each corresponded to the FL, BL, and BR directions. On each trial, the name of the target landmark remained on display until a button press was made. Accuracy score ("1" for correct and "0" for incorrect) and reaction time (i.e., the time elapsed from the presentation of each name to the button press, in milliseconds) were recorded with each button press. Each participant performed 15 test trials. These trials presented the names of eight landmarks that were encountered en route to the finishing point and seven landmarks that were located at the periphery of the route. The ordering of the trials followed a randomized sequence.
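For illustration, the response-key mapping and accuracy scoring described above can be sketched in a few lines of Python (a minimal sketch; the names are illustrative and are not taken from the original E-Prime script):

```python
# Illustrative sketch of the route-based pointing task's scoring logic.
# Key-to-direction mapping follows the arrow stickers described above.
KEY_TO_DIRECTION = {
    "7": "FL",  # front-left
    "9": "FR",  # front-right
    "1": "BL",  # back-left
    "3": "BR",  # back-right
}

def score_trial(pressed_key: str, correct_direction: str) -> int:
    """Return 1 for a correct pointing response and 0 otherwise."""
    return int(KEY_TO_DIRECTION.get(pressed_key) == correct_direction)

def accuracy(responses) -> float:
    """Mean accuracy over (pressed_key, correct_direction) trial pairs."""
    return sum(score_trial(k, d) for k, d in responses) / len(responses)
```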

Navigation Strategy Questionnaire (NSQ)
The online NSQ was created at SurveyTool.com and contained 59 items, 19 of which were specifically designed to assess survey-based navigation strategy (for the complete list of pilot test items, see the appendices of Zhong, 2013; Zhong & Kozhevnikov, 2016). Each participant's responses were registered on a five-point Likert scale. Ratings at the extremities, "1" and "5", were associated with totally disagree and totally agree, respectively, while intermediate ratings of "2" and "4" were associated with disagree and agree, respectively. The rating of "3" was associated with a neither agree nor disagree (neutral) response. Before the participants filled out the questionnaire, on-screen instructions informed them that the questionnaire concerned the different navigational techniques people adopt when travelling on foot to different places in their everyday environments, and that it was crucial for them to be as honest as possible when rating each statement. Fully completed survey responses were recorded and stored by the online server. Based on principal component analyses that were conducted previously (Zhong, 2013; Zhong & Kozhevnikov, 2016), 12 survey-based strategy items were found to have discriminant component loadings that made them distinct from two other components representing two other types of navigation strategies (egocentric spatial updating and route/procedural); these items were retained to constitute the survey-based strategy scale (see Table 1). Following the present study aims, IRT-related analysis focused on participants' rating patterns on these 12 items.

IRT-based Graded Response Model
Responses to the 12 survey-based strategy items collected from 523 respondents were analyzed based on a graded response model (GRM) implemented in IRTPRO v4.1 (Scientific Software International, Inc., Skokie, IL). The GRM is a general framework selected on the basis of a two-parameter logistic (2PL) model that predicts how well a person responds to a particular category of ratings for an item of concern. This model/framework is characterized by the following formula:

P_ij(θ_s) = 1 / {1 + exp[-α_i(θ_s - β_ij)]}

in which
P_ij = probability of person s endorsing one or more ratings grouped under category j in item i
θ_s = trait level of person s
α_i = item discrimination or slope of item i
β_ij = threshold parameter/value of response category j in item i

In this model, as well as in other IRT models, parameters related to person (θ) and item (α, β) characteristics are placed on a common scale, such that the logit α_i(θ_s - β_ij) represents a weighted difference score between a person's trait level and an item's category threshold value (Embretson & Reise, 2000). In the GRM, a large positive difference between these two parametric values means that a person is very likely to endorse a particular response category, which generally contains an assortment of ratings within a specified range.
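As a concrete illustration, the cumulative and category probabilities defined by this formula, together with the item (Fisher) information underlying the information curves examined in this study, can be sketched in Python (an illustrative sketch, not the IRTPRO implementation):

```python
import math

def cum_prob(theta, alpha, beta):
    """P(endorsing response category j or higher): the 2PL curve above."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def category_probs(theta, alpha, betas):
    """Probability of each ordered category, given increasing thresholds.
    The probability of a category is the difference between two adjacent
    cumulative curves; the lowest and highest are bounded by 1 and 0."""
    cums = [1.0] + [cum_prob(theta, alpha, b) for b in betas] + [0.0]
    return [cums[j] - cums[j + 1] for j in range(len(cums) - 1)]

def item_information(theta, alpha, betas):
    """Fisher information of one item at theta; low values imply large
    standard errors of the trait estimate."""
    cums = [1.0] + [cum_prob(theta, alpha, b) for b in betas] + [0.0]
    info = 0.0
    for j in range(len(cums) - 1):
        p = cums[j] - cums[j + 1]
        # derivative of each cumulative curve is alpha * P * (1 - P)
        dp = alpha * (cums[j] * (1 - cums[j]) - cums[j + 1] * (1 - cums[j + 1]))
        if p > 0:
            info += dp * dp / p
    return info
```

Under this sketch, an item with a low discrimination value (such as α = 0.56 for SURVEY12) yields near-zero information across trait levels, consistent with the curves reported below.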

GRM Analysis
To evaluate the item parameter estimates of each survey-based strategy item, a unidimensional GRM analysis was first performed. Parameter estimation was conducted with the Bock-Aitkin Expectation/Maximization (EM) algorithm, a marginal maximum likelihood (MML) method for modelling response pattern probability iteratively from an assumed/prior distribution of trait levels. The default/recommended configuration settings used in IRTPRO were as follows: (i) max. no. of cycles (convergence criterion) = 500 (0.005); (ii) max. no. of M-step iterations (convergence criterion) = 50 (0.006); (iii) no. of quadrature points (min., max.) = 49 (-6.00, 6.00). In short, this algorithm works by first computing the probability of endorsing a particular item category at a specific trait level from initial expectations of the number of persons who endorsed that item category at that trait level and the total number of persons who possessed that trait level. Once obtained, the probability/likelihood estimates are maximized over many iterations (i.e., M-step iterations), through the use of updated expectation values in each successive iteration, until convergence is reached.5 Table 2 shows the GRM item parameter estimates and the associated communalities and factor loadings for each item.6 There were four response categories associated with each item. In ascending order, they were tied to computing the probabilities of endorsing: (i) a rating of "2" (disagree) or higher (response category 1), (ii) a rating of "3" (neutral) or higher (response category 2), (iii) a rating of "4" (agree) or higher (response category 3), and (iv) a rating of "5" (strongly agree) [response category 4].
An examination of the parameter estimates and factor loadings showed that SURVEY12 (When I reconstruct my mental map, its environmental orientation is fixed and does not change with my imagined heading directions) yielded the lowest item discrimination value (α = 0.56) and factor loading (λu = .31), the lowest threshold values in the first (β1 = -6.41) and second (β2 = -1.97) response categories, and the highest threshold value in the fourth (β4 = 4.62) response category. The standard errors of these threshold values were also the highest in their corresponding response categories. These parameter estimates attested to the fact that a relatively low percentage of participants endorsed strongly disagree (3.1%), disagree (23.3%), and strongly agree (7.8%). Further examination of the S-χ2 item level diagnostic statistics (Orlando & Thissen, 2000) [see Table 3] showed that SURVEY12 stood out for exhibiting the lowest degree of item level fit between the observed data and the predictions of the unidimensional GRM, S-χ2(83) = 116.51, p = .009. An examination of the category response and item information curves further showed that SURVEY12 featured low and irregularly distributed response probabilities in the first and last response categories (ps < .10 for category 0; ps < .30 for category 4) [see Figure 2], as well as consistently low item information values that were close to zero across the different trait levels (min. = 0.086, max. = 0.095) [see Figure 3]. These item information values are inversely related to the standard errors of the theta estimates; higher information values indicate lower standard errors of the theta/trait estimates.

Figure 2. Category response curves of the survey-based strategy items under the graded response model. Theta represents the parameter estimates of the ability or trait levels associated with survey-based strategy use in each response category.
The probability distributions of the five response categories were illustrated to represent, in ascending order, the probabilities of endorsing: (i) a rating of "1" (strongly disagree) [response category 0; in black], (ii) a rating of "2" (disagree) or higher (response category 1; in blue), (iii) a rating of "3" (neutral) or higher (response category 2; in green), (iv) a rating of "4" (agree) or higher (response category 3; in red), and (v) a rating of "5" (strongly agree) [response category 4; in turquoise]. Positive theta/trait levels were associated with higher probabilities of endorsing higher ratings belonging to response categories 3 and 4.

Next, a bidimensional GRM analysis was performed, with the three items that were conceived as assessing a selective preference for third-person/allocentric environmental representation (SURVEY03, SURVEY04, and SURVEY09; see Table 1) classified as constituting a content/specific factor. This subfactor represented a subdomain of the central/general factor that comprised all 12 items; cross-loadings on both the general and specific factors were specified with regard to the three preference-related items only. Parameter estimation was conducted with the Bock-Aitkin EM algorithm using the same configuration settings as those mentioned in the unidimensional GRM analysis above. Table 4 shows the GRM item parameter estimates and the factor loadings of the central and content factors. The three content factor loadings fell in the moderate range (.30 to .50), and two preference-related items (SURVEY03 and SURVEY09) exhibited discrimination estimates on the content factor that were lower than those on the central factor.
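For readers unfamiliar with the bidimensional specification, the cumulative response probability for an item that loads on both factors can be sketched as follows (an illustrative sketch in the slope-intercept parameterization, logit = αθ + δ, reported by IRTPRO; parameter names are illustrative):

```python
import math

def bifactor_cum_prob(theta_gen, theta_spec, a_gen, a_spec, d):
    """Cumulative probability P(response >= category) for an item that loads
    on both the central/general factor and the content/specific factor.
    The logit is a_gen*theta_gen + a_spec*theta_spec + d; items without a
    content-factor loading simply have a_spec = 0."""
    logit = a_gen * theta_gen + a_spec * theta_spec + d
    return 1.0 / (1.0 + math.exp(-logit))
```

With a_spec = 0 this reduces to the unidimensional 2PL curve, which is how the nine non-preference items enter the bidimensional model.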
In addition, an examination of the likelihood-based goodness-of-fit indices for overall model fit (see Table 5) showed that the bidimensional model offered a better representation of the latent structure of the item response data than the unidimensional model, χ2(3) = 39.57, p < .001 (reduction in the value of -2loglikelihood from the unidimensional to the bidimensional model). This finding corresponded well with the lower Akaike and Bayesian information criterion values associated with the bidimensional model. With SURVEY12 excluded, the reduction of fit index values from the unidimensional to the bidimensional model occurred at a comparable magnitude, χ2(3) = 40.92, p < .001. Noticeably, in each of these two types of model analysis, the exclusion of SURVEY12 led to a relatively large reduction in the model fit index values.

Note. Logit: αθ + δ. δ = -αβ. α: discrimination parameter. β: category threshold parameter. λ1: factor loadings on the central/general factor. λ2: factor loadings on the content/specific factor. h2: communality. For each item, the subscript tied to each intercept parameter (δ) represents one of four response categories. The β values tied to each response category and to each dimension can be calculated by dividing δ by the negative value of α. Items without loadings on the content factor did not possess discrimination and category threshold estimates related to the dimension of that factor.
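The model comparison statistics used here follow directly from the reported -2loglikelihood values and parameter counts, and can be computed as follows (a minimal sketch; the numeric inputs below are placeholders, not the actual fitted values):

```python
import math

def lr_chi_square(neg2ll_unidim, neg2ll_bidim):
    """Likelihood-ratio chi-square: the reduction in -2loglikelihood when
    moving from the (nested) unidimensional to the bidimensional model.
    Compare against a chi-square distribution with df equal to the number
    of additional parameters (here, 3)."""
    return neg2ll_unidim - neg2ll_bidim

def aic(neg2ll, n_params):
    """Akaike information criterion (lower values indicate better fit)."""
    return neg2ll + 2 * n_params

def bic(neg2ll, n_params, n_obs):
    """Bayesian information criterion (lower values indicate better fit)."""
    return neg2ll + n_params * math.log(n_obs)
```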

Correlational Analysis
Correlations were performed between the mean scores/ratings of the items constituting the general and content factors, and route-based pointing performance in terms of accuracy and reaction times (natural log transformed). Table 6 shows the descriptive statistics of the variables involved in the correlational analysis. Route-based pointing accuracy was found to correlate moderately and significantly with the mean scale scores obtained from: (i) the full set of 12 items; (ii) the 11 survey-based strategy items that excluded SURVEY12; and (iii) the three preference-for-allocentric-representation items, .30 ≤ r(107) ≤ .34 (see Table 7). The exclusion of SURVEY12 had a negligible impact on the magnitude of the correlation (Fisher's z = 0.07, p = .471). It is also worth noting that the mean scale scores obtained from the three preference-related items elicited a numerically higher correlation with route-based pointing accuracy [r(107) = .34, p < .001] compared with the correlations obtained from the two other sets of mean scale scores [r(107) = .31, p = .001 (all 12 items); r(107) = .30, p < .002 (11 items)].
To confirm that the significant correlations between the three sets of mean scale scores and route-based pointing accuracy were primarily related to strategic influences, partial correlations were further performed with the effect of sex controlled for. This was because sex correlated significantly with route-based pointing accuracy [point-biserial r(110) = .315, p < .001], with males (coded as "1") exhibiting higher accuracy scores than females ("0"). After controlling for the extraneous effect of sex, the patterns of significant correlations found previously between the mean scale scores and route-based pointing accuracy remained, albeit with slightly lower correlational values: .21 ≤ r(104) ≤ .28 (see Table 8). Like the earlier findings, the exclusion of SURVEY12 had a negligible impact on the magnitude of the partial correlation (Fisher's z = 0.07, p = .473). The mean scale scores obtained from the three preference-related items elicited a slightly higher correlation with route-based pointing accuracy [r(104) = .28, p = .004] compared with the correlations obtained from the two other sets of mean scale scores [r(104) = .22, p = .024 (12 items); r(104) = .21, p = .031 (11 items)].
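For reference, the Fisher r-to-z transformation and the first-order partial correlation underlying these analyses can be computed as follows (a minimal sketch assuming simple bivariate correlation inputs):

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation of a correlation coefficient."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z
    (e.g., scale scores and pointing accuracy with sex partialled out)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))
```

Partialling out a covariate that correlates with the outcome generally attenuates the bivariate correlation, consistent with the slightly lower values reported above.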
To ascertain that the normality assumption of Pearson's correlation was met, tests of normality were performed on the standardized residuals emanating from linear regressions corresponding to the bivariate correlations shown in Tables 7 and 8. Joint assessments of Q-Q plots and normality test statistics showed that none of these distributions of residuals deviated significantly from normality with alpha set at .05 (.053 ≤ Kolmogorov-Smirnov's D ≤ .076, .154 ≤ p ≤ .200; .981 ≤ Shapiro-Wilk's W ≤ .990, .119 ≤ p ≤ .597).
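The Kolmogorov-Smirnov statistic referenced here can be sketched with the standard library as follows (an illustrative one-sample implementation computed on standardized residuals, without the Lilliefors correction that statistical packages typically apply when parameters are estimated from the data):

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(sample):
    """One-sample Kolmogorov-Smirnov D against the standard normal,
    computed on standardized values of the sample."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted((x - mean) / sd for x in sample)
    d = 0.0
    for i, zi in enumerate(z):
        cdf = normal_cdf(zi)
        # largest gap between the empirical and theoretical CDFs,
        # checked just above and just below each step of the ECDF
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d
```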

Summary of Findings and Study Limitations
Based on a reanalysis of the survey-based strategy items through IRT, the current findings rendered a clearer picture of their psychometric properties. Two major findings emerged: (i) the last item constituting the scale, SURVEY12, exhibited low levels of item fit, category response probabilities, and item information; and (ii) a bidimensional model with three preference-for-allocentric-representation items loading on a content factor offered a better representation of the latent structure of the observed data compared to a unidimensional model that represented survey-based strategy as a singular cognitive construct. In view of the IRT-based findings generated by the GRM analysis, researchers administering the survey-based navigation strategy scale can consider omitting SURVEY12 in future studies. Note, however, that this omission is not mandatory, as SURVEY12 may still carry conceptual merit when administered to professional navigators working in non-university settings, either as a standalone item or together with other types of strategy items; any consideration for exclusion should therefore be exercised with a clear purview of the research aims.
On the other hand, the significant and moderate correlations observed between pointing accuracy and the three preference-related items provided the first piece of evidence suggesting that these items might constitute a subscale assessing a specific component of the survey-based strategy. To my knowledge, the mechanisms linking subjective perception of allocentric/survey-based strategy use to spatial navigation or orientation activity remain unknown; hence, I argue that these three preference-related items are linked to a key decision-making aspect of survey-based strategy use that requires a comparison of environmental information (either retrieved or computed) from the vantage points of allocentric and egocentric reference frames (see Klatzky, 1998; Mou, McNamara, Valiquette, & Rump, 2004, for conceptual frameworks of spatial reference frames/systems). This interpretation relates well to the presence of "rather than" in all three items, which might have compelled participants to take a clear stance (between two imagined viewpoints) on their self-perceived ability to generate allocentric environmental representations. As the current study did not involve substantial cognitive-behavioral testing, this possibility needs verification in future studies, preferably with respect to navigational tasks that assess one's ability to switch between egocentric and allocentric perspectives (see, e.g., Harris & Wolbers, 2014). This is important because the current proposal of these three items as constituting a subscale can only be confirmed with respect to more spatial navigation tasks in addition to the route-based pointing task.

Potential Brain Mechanisms
Crucially, the argument for the existence of this subscale aligns well with existing theories and numerous behavioral findings suggesting that the processing and long-term storage of environmental information can occur spontaneously through the use of allocentric reference frames (Blajenkova et al., 2005; Gramann, 2013; Zhong, 2011, 2013; Zhong & Kozhevnikov, 2016) or in relation to a disembodied/external reference system (Greenauer & Waller, 2010; McNamara, Rump, & Werner, 2003; Mou et al., 2004; Zhong & Kozhevnikov, 2016). Extant neuroscience research in spatial navigation also supports this by showing that a direct acquisition of allocentric spatial knowledge is associated with a network of functional activity spanning the parietal cortex, the retrosplenial cortex, and the hippocampal formation (Aguirre & D'Esposito, 1999; Byrne, Becker, & Burgess, 2007; Gramann et al., 2010; Sherrill et al., 2013, 2015). Interestingly, these regions have been proposed as comprising a "computational core" for processing spatial orientation information (Gramann, 2013). Moreover, within the human entorhinal cortex, head direction cells (Jacobs, Kahana, Ekstrom, Mollison, & Fried, 2010) and grid cells (Doeller, Barry, & Burgess, 2010; Jacobs et al., 2013; Stangl et al., 2018) may also play active roles in the implementation of an allocentric reference system that registers regular patterns of spatial movements in the global environment. This accords with findings showing that head direction signals coding for allocentric heading directions or bearings can be updated without overt attention to egocentric/landmark cues on the part of the moving agent (Jacobs et al., 2010; Taube et al., 1990a, 1990b), and that grid cells firing along running directions arranged in a six-fold rotational symmetry engender a "grid-like" map of body movements and positioning in the global environment (Doeller et al., 2010; Jacobs et al., 2013; Stangl et al., 2018).

Recommendations for Future Research
To gather more evidence confirming the predictive validity of the proposed three-item subscale, as well as of the full scale (12 or 11 items, contingent on the exclusion of the last item), I recommend that future studies employ additional spatial navigation tasks that engage one's capacity for large-scale allocentric spatial processing, as mentioned previously by Zhong and Kozhevnikov (2016). One exemplary task is an outdoor sporting activity such as orienteering, in which participants must use a map and/or a compass to reach a series of waypoints (in a forest, for instance) within a specified amount of time (see, e.g., Di Tore, 2016; Golden, Levy, & Vohra, 1987). Because this activity spans large navigable spaces, one would be obliged to imagine or visualize an overview of the environment in order to reach the waypoints in the shortest time possible. Moreover, to ensure greater accuracy in data recording and analysis, future studies should also consider recording pointing or orientation errors [i.e., the discrepancy between a correct/precomputed directional angle and an observed directional angle (derived from the participant)] in degrees or radians. Such errors would offer a better portrayal of behavioral performance, as well as a larger range of values than accuracy scores, for correlational or regression analysis.
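As a minimal sketch of such an error measure, the function below computes the absolute angular discrepancy between a precomputed correct bearing and a participant's observed bearing, wrapped to the shorter arc of the circle. The function name and the degree convention are illustrative choices for this paper's pointing-error suggestion, not part of any published protocol:

```python
def angular_error_deg(correct_deg: float, observed_deg: float) -> float:
    """Absolute pointing/orientation error in degrees, wrapped to [0, 180].

    Takes the shorter way around the circle, so a correct bearing of
    350 deg and an observed bearing of 10 deg yield a 20-deg error,
    not 340 deg.
    """
    diff = (observed_deg - correct_deg) % 360.0
    return min(diff, 360.0 - diff)
```

Errors computed this way yield a continuous, bounded variable (0-180 degrees) suitable for the correlational or regression analyses discussed above; dividing by 180 and multiplying by pi converts the same quantity to radians.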
Furthermore, considering that the retrieval of third-person/allocentric environmental knowledge is inevitably affected by environmental familiarity, inasmuch as familiar environments facilitate the retrieval process (Piccardi & Nori, 2011; Zhong, 2011), it will be interesting for future studies to introduce more survey-based strategy items that query respondents on how differently they encode and retrieve environmental features from the third-person/allocentric perspective in familiar versus unfamiliar environments. An exemplary item could be: "In a place that I traveled to for the first time, I attempt to imagine the shape of the route I traversed from an aerial perspective." Future studies could then assess whether these new items constitute additional content factors in a multidimensional model, and whether such subfactors exhibit predictive validity with respect to navigational performance across environments evoking varying levels of familiarity.

Conclusion
Taken together, these recommendations for further investigation will offer a more nuanced understanding of the survey-based strategy as a multidimensional construct, as well as of how individuals differ in navigational performance when implementing different survey-based component strategies. Despite the aforementioned limitations, this is, to date, the first study to show that a small number of self-report items assessing allocentric mental imagery can predict pointing/orientation task performance in a large-scale, real-world environment. Since administering the three items assessing a preference for allocentric environmental representation requires minimal effort, future studies aiming at a preliminary assessment of participants' spatial orientation ability may consider surveying them with these preference-related items.
As the assessment of allocentric mental representation and navigation strategy use facilitates the detection of spatial memory deficits associated with the progression of both normal (Reynolds, Zhong, Clendinen, Moffat, & Magnusson, 2019; Zhong et al., 2017; Zhong & Moffat, 2016) and pathological aging (Laczó et al., 2010; Vlček & Laczó, 2014), administering short surveys on allocentric environmental representation will benefit clinicians and neuropsychologists who aim to obtain quick estimates of older adults' ability to process allocentric spatial information or memory in environmental space. This suggestion aligns well with the current trend of using navigation/wayfinding questionnaires to assess navigational complaints or disabilities in older adults and patients (see de Rooij, Claessen, van der Ham, Post, & Visser-Meily, 2019). Ideally, such surveys, like the one presented in this paper, should be administered to cognitively intact participants (i.e., individuals with no evidence of cognitive impairment or dementia) and combined with behavioral tasks that specifically evaluate allocentric environmental knowledge (as mentioned above), so as to ensure confidence in clinical assessment or diagnosis.
Overall, the current findings and recommendations for improvement take a small step toward this goal, and it is hoped that this study will be the first of many to acknowledge the usefulness of short self-report questionnaires/surveys for assessing allocentric environmental knowledge in human participants with intact cognition.