The purpose of this analysis is to determine how strongly the context predicts ratings (measured as match to the original aspect), and to weigh this variable against other candidate predictors. The input for our analysis is the data from our study, collected in 111,364 lines, each of which recorded values for the variables detailed in Table 3.^{5}

Table 3: Variables, their levels, and the distribution of responses across levels.

“Rating” is the dependent (response) variable, which consists of ordered categories. “Excellent” was the rating chosen most often, in 57,116 responses, while “Acceptable” was the rating chosen least often, in only 17,395 responses. The remaining variables in Table 3 are independent (predictor) variables.

“Matches Original” tells us whether the form being rated has the same aspect as the form in the original text. See Table 4 below.

Table 4: Interpretation of combinations of values for “Matches Original” and “Aspect”.

“Logarithm of Relative Frequency” is the natural logarithm of the relative frequency of the form being rated vs. the frequency of the corresponding form of the opposite aspect. This measure is called a “logit”. It is customary to logarithmically transform corpus frequency data in order to correct for the extreme skewing of corpus word frequencies, known as “Zipf’s Law” (1949). Logits are logarithmically transformed odds ratios. They have the admirable property of transforming odds ratios (which normally range from zero to 1 on one side, and from 1 to infinity on the other) into a symmetrical distribution. For example, an odds ratio of 1000/1 (relative frequency where one item appears 1000 times and the other only once) = 1000 yields a logit of 6.9, and the reverse relative frequency of 1/1000 = 0.001 yields a logit of −6.9. When the frequencies of two items are the same, the odds ratio is 1, and the logit is 0. The purpose of this measure is to determine whether the relative frequency of the two aspectual forms has any influence on the ratings.
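The logit values cited above can be reproduced directly; the following is a minimal Python sketch (the function name `logit` is ours, for illustration):

```python
import math

def logit(freq_a, freq_b):
    """Natural logarithm of the odds ratio freq_a : freq_b."""
    return math.log(freq_a / freq_b)

# An odds ratio of 1000/1 and its reverse yield symmetric logits,
# and equal frequencies yield a logit of 0.
print(round(logit(1000, 1), 1))  # 6.9
print(round(logit(1, 1000), 1))  # -6.9
print(logit(50, 50))             # 0.0
```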

“Text” shows the number of responses for each of the six text stimuli in the study.

“Aspect” indicates the aspect of the form rated by the participant, which is either Perfective (p) or Imperfective (i). Table 4 shows what all the combinations of the variables “Matches Original” and “Aspect” mean.

“Subparadigm” records the number of responses for each subparadigm.

“Cue Match” tells us whether there was a cue word present, and, if so, whether the cue is usually associated with the same aspect as in the original text (“True”), with the opposite aspect (“False”), or there was no cue (“None”, the most common value).

“Age” is the age of the participants, which ranged from 16 to 78.

We analyzed the ratings with the mgcv package, using the gam function and setting the family directive to ocat(R = 3), where R specifies the number of ordered categories. The response variable needs to be coded with integers 1 … R.

A main-effects model with by-participant random intercepts, representing the variables given in Table 3, is summarized in Table 5.

Table 5: Main-effects model fitted to acceptability ratings.

The central concept underlying this implementation of ordinal regression is the following. We assume there is a latent random variable *U* that reflects subjects’ intuitions about acceptability. *U* can assume any value on the real axis. To discretize *U* into 3 rating categories, the real axis is divided into three bins. For this, we need cut-off points specifying the boundaries between the bins. The first cut-off point is set at −1. For ratings on a three-point scale, a second cut-off point is required that is greater than −1. For the present main-effects model, this cut-off point is estimated at 0.33.

Thus, the mapping of intervals on the real axis to ratings is as follows:

| *U* | Rating |
|---|---|
| (−∞, −1) | 1 |
| (−1, 0.33) | 2 |
| (0.33, +∞) | 3 |

The ordinal GAM models the latent variable *U* as a function of the predictors, i.e. the linear predictor *η*_{i} is the following:

$$\eta_i = U_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots$$

Given the predicted value of *U*_{i} (which is taken to follow a logistic distribution), we inspect which interval on the real axis it falls into, and this in turn determines which rating is predicted (see also Baayen and Divjak 2017 for further discussion).
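The cut-point mechanism can be sketched in a few lines of Python; this is an illustration of the general logic, not the mgcv implementation, with the two cut-offs (−1 and 0.33) taken from the main-effects model above:

```python
import math
import bisect

# Cut-off points: the first is fixed at -1, the second was
# estimated at 0.33 for the main-effects model.
CUTPOINTS = [-1.0, 0.33]

def predicted_rating(u):
    """Map a latent value U to an ordered rating 1..3 via the cut-points."""
    return bisect.bisect_left(CUTPOINTS, u) + 1

def rating_probabilities(eta):
    """P(rating = k) when U follows a logistic distribution centred on eta."""
    cdf = lambda x: 1.0 / (1.0 + math.exp(-(x - eta)))
    bounds = [-math.inf] + CUTPOINTS + [math.inf]
    return [cdf(hi) - cdf(lo) for lo, hi in zip(bounds, bounds[1:])]

print(predicted_rating(-2.0))  # 1
print(predicted_rating(0.0))   # 2
print(predicted_rating(1.0))   # 3
print([round(p, 3) for p in rating_probabilities(0.0)])
```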

Table 5 shows that the effect of the variable Matches Original is larger than any other, even when corrections are made for differences in units and numbers of levels. In other words, even when we take into account the fact that there is some tendency to choose a form that is more frequent, and that there are differences associated with various factors relating to the form and context of the token and with the participants, there is a very strong tendency to prefer the original form over the alternative.

As a next step, we considered a model that included two interactions. The first is the interaction of age by frequency ratio, the idea being that experience with language accumulates over the lifetime in such a way that speakers become more proficient as they age, and hence may rely less on the relative frequency of use and more on the “hidden factors” that drive aspect selection (Ramscar et al. 2017). We also considered the interaction of Matches Original by Text, as it is known that the use of aspect can vary substantially between, e.g., scientific texts and fiction. Older speakers appear to rely less on the relative frequency of aspect use, possibly because they are more sensitive than younger speakers to the stylistic/discourse factors that determine aspect use (further details about the interactions can be found in the link in Footnote 5).

Word identity is not included as a random effect. The reason for this is that the frequency distribution of verbs is Zipfian, with many verb forms appearing only once and a small number of verbs being used intensively. Including item as a random effect forces the model to find a set of by-item adjustments that follow a normal distribution. Given the Zipfian nature of word probabilities, this is impossible. A model including by-word random intercepts would be misspecified. To see this, consider the large proportion of forms (typically around 50% of the word types) that occur once only. For each of these forms, the model would include not only an intercept adjustment, but also several other item-bound predictors such as Logarithm of Relative Frequency, Matches Original, Aspect, and Subparadigm. Thus, such a model would be overspecified.
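The preponderance of once-only forms in Zipfian data is easy to see in a small simulation; the following Python sketch (vocabulary size, sample size, and seed are arbitrary choices of ours) draws tokens with probability proportional to 1/rank and reports the share of word types that occur exactly once:

```python
import random
from collections import Counter

# Simulate a Zipfian corpus: token probability proportional to 1/rank.
random.seed(42)
VOCAB = 2000
weights = [1.0 / rank for rank in range(1, VOCAB + 1)]
tokens = random.choices(range(1, VOCAB + 1), weights=weights, k=5000)

counts = Counter(tokens)
hapaxes = sum(1 for c in counts.values() if c == 1)
print(f"types: {len(counts)}, hapax share: {hapaxes / len(counts):.2f}")
```

Under these settings, a large fraction of the observed types are hapaxes, which is exactly the situation in which by-item intercept adjustments cannot plausibly follow a normal distribution.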

In this context, it is worth noting that the logistic GAM is not a Gaussian model, and that there is no error term that should be independently and identically distributed (iid) for p-values in the model summary to be trustworthy. Thus, whereas in a standard linear mixed model for, e.g. reaction times, it would be desirable to include word as random-effect factor to avoid structured errors and violation of the iid model assumption, this issue does not arise in the context of the present ordinal regression model. Furthermore, as this model (for details, see Wood et al. 2016) is not a proportional odds model, assumptions about proportional odds need not be made.

In sum, the statistical analysis brings into relief the importance of the variable Matches Original in determining the rating of a verb form, even when other factors are taken into account. The remainder of our analysis focuses on the rating of original aspect vs. non-original aspect, and how this manifests as an indicator of redundancy vs. open construal in our data.
