Volume 4, Issue 1

# The Yale Grammatical Diversity Project: Morphosyntactic variation in North American English

Raffaella Zanuttini / Jim Wood / Jason Zentz / Laurence Horn
Published Online: 2018-03-09 | DOI: https://doi.org/10.1515/lingvan-2016-0070

## Abstract

The Yale Grammatical Diversity Project approaches the empirical domain of North American English from the perspective of generative microcomparative syntax. In addition to eliciting judgments from speakers of particular varieties, we also conduct large-scale surveys, map the results of those surveys geographically, conduct statistical tests taking geography and other social variables into account, and look for theoretically significant linguistic correlations. In all cases, we do this with the primary goal of understanding variation between speakers at the individual level. While our goals and methodologies are informed by our theoretical perspective, we expect that our work and results will be of interest to linguists working in other frameworks and even to the public more generally. This article outlines the goals and methodologies of the project and describes in broad strokes some of the results obtained so far, as well as some of the ways we have shared our findings with others, inside and outside academia.

This article offers supplementary material which is provided at the end of the article.

## 1 Project overview

The Yale Grammatical Diversity Project (YGDP) approaches the empirical domain of North American English from the perspective of generative microcomparative syntax.

## 1.1 Our approach to the study of morphosyntactic variation

Generative linguists are primarily interested in the mental grammar of individual speakers, modeled as a system of rules that can form some linguistic units (syllables, words, sentences, etc.) but not others. While it is obvious that there are systematic grammatical differences across speakers of different languages or dialects, differences also exist among people within their speech communities (see Trousdale and Adger [2007] and Cornips [2015] for some interesting perspectives on this issue). If every individual has a mental grammar, we might expect that what it means to “speak the same language” is to have mental grammars that are similar but not necessarily identical (and, of course, to share a significant proportion of the lexicon).

Generative microcomparative syntax, then, is the study of the differences between similar mental grammars, with the goal of furthering our broader understanding of the human language faculty. As Kayne (2005: 283) points out, “microcomparative syntax work provides us with a new kind of microscope with which to look into the workings of syntax.” Several projects conducted within this paradigm are listed at www.dialectsyntax.org, including ASIS (Poletto and Benincà 2007), SAND (Barbiers et al. 2005, Barbiers et al. 2008), and ScanDiaSyn (http://uit.no/scandiasyn/?Language=en), in Italy, the Netherlands/Belgium, and Scandinavia, respectively.

For our project, we recruit (and in some cases develop) methodologies especially suited to the kinds of questions we are interested in asking. We collect data in the form of acceptability judgments in order to determine which kinds of sentences can be generated by individual speakers’ mental grammars and which cannot. In addition to eliciting judgments from speakers of particular varieties, we also conduct large-scale surveys, map the results of those surveys geographically, conduct statistical tests taking geography and other social variables into account, and look for theoretically significant linguistic correlations. In all cases, we do this with the primary goal of understanding variation between speakers at the individual level.1 While our goals and methodologies are informed by our theoretical perspective, we expect that our work and results will be of interest to linguists working in other frameworks and even to the public more generally. Therefore, in addition to our technical theoretical work, we are committed to highlighting our descriptive findings and providing freely accessible resources intended for a broader audience.

## 1.2 Situating the YGDP within the study of variation in English

From an empirical perspective, we are interested in morphosyntactic variation across the varieties of English spoken across North America. Interspeaker variation within North American English has long been an object of linguistic investigation (see Schneider [2008] and Wolfram and Schilling [2016] for useful overviews). However, most studies of morphosyntactic variation, as noted by Kortmann (2003), have focused on particular phenomena such as negative concord, positive anymore, subject–verb agreement, and multiple modals (Reed and Montgomery 2016), or on particular varieties such as Alabama English (Feagin 1979), Appalachian English (Wolfram and Christian 1976; Montgomery and Hall 2004b; Hazen et al. 2013), African American English (Labov et al. 1968; Baugh 1983; Rickford 1999; Green 2002; Lanehart 2015), and Canadian English (Tagliamonte 2006).

When it comes to large-scale investigations (considering multiple phenomena across multiple varieties), researchers of North American English have conducted surveys, compiled corpora, or collected data from a variety of resources and expert contributors. The most prominent nationwide surveys of individual speakers of American English have focused on phonology (Labov et al. 2006), the lexicon (Carver 1987; Hall 2013), or both (Kurath et al. 1939–1943 and subsequent work on the Linguistic Atlas of the United States and Canada;2 Vaux and Golder 2003). Grieve (2009, 2016) has created a large corpus of letters to the editor from regional newspapers all over the country, which he uses to conduct statistical and geographical analyses of lexical variation.3

Perhaps most comparable to our project in empirical focus and scope is a large body of work (Kortmann and Schneider 2004; Kortmann 2005; Kortmann et al. 2005; Szmrecsanyi and Kortmann 2009; Hernández et al. 2011; Hickey 2012; Kortmann and Lunkenheimer 2012; Kortmann and Lunkenheimer 2013; Siemund 2013; Gerwin 2014) that considers questions of morphosyntactic variation in English within a research paradigm that integrates functional typology with dialectology. This approach is sometimes called variationist typology or sociolinguistic typology (Siemund 2013, 283). These researchers provide systematic overviews of the features attested (along with the degree of attestation) in particular varieties of English and useful summaries of existing literature on these topics. In many cases, they utilize quantitative techniques to reveal generalizations that illuminate how different varieties of English reflect crosslinguistic tendencies. Moreover, Kortmann and Lunkenheimer (2013) (eWAVE) provides an interactive interface to explore, compare, and geographically visualize the presence of certain features across varieties of English worldwide.

Our project differs from this work in several respects. First, our empirical scope is distinct: we focus exclusively on North American English, whereas most of the work in this line of research is on English as spoken in the British Isles and around the world.4 Second, variationist typologists approach dialect variation from a top-down perspective: they begin with a list of varieties of English and a corresponding list of experts on these varieties, who are responsible for determining the prevalence of a given syntactic property or phenomenon in each variety.5 Our approach, on the other hand, does not take specific varieties as a given; we are trying to discover what they are bottom-up. In order to do so, we quantify over individual speakers’ acceptability judgments on specific sentences rather than over varieties or over categorical ratings of the prevalence of a grammatical property as assigned by researchers at the level of the variety. Third, while many of our goals overlap with those of the variationist typologists (e.g., mapping the geographic distribution of morphosyntactic phenomena that vary across English speakers/varieties), one key goal of the typological investigation is to evaluate whether varieties with distinct historical trajectories or contact scenarios (e.g., first-language varieties versus second-language varieties versus English-based pidgins and creoles) have fundamentally different structural properties. By contrast, we are interested in discovering how observations gleaned from the study of English morphosyntactic variation can answer formal questions framed within generative syntactic theory. In sum, both lines of research seek to explore morphosyntactic variation in English, but our scope, methods, and goals are complementary.

Our next section outlines the overall goals of our project in more detail, and then we move on (Section 3) to discuss methodological issues, including the use of geographical maps of our findings. Finally, we describe in broad strokes some of the results we have obtained so far (Section 4) and some of the ways we have been able to share our findings with others, inside and outside academia (Section 5).

## 2 Project goals

The project has a number of aims, which can be grouped under two overarching goals:

1. to collect and make available information about morphosyntactic variation found across speakers of English in North America;

2. to conduct and foster new research on morphosyntactic variation that can broaden our knowledge at both the empirical and theoretical levels.

To reach our first goal, we have been gathering information about syntactic variation in North American English from a number of different sources. We make it publicly available on a website (http://ygdp.yale.edu), which is organized in a series of pages, each devoted to a particular aspect of the syntax of English that varies across native speakers. For example, the phenomena illustrated in (1) all have dedicated pages:

(1)
a.
 He just kept a-beggin’ and a-cryin’ and a-wantin’ to go out. a-prefixing (Wolfram 1976: 45; McQuaid 2012: 32(11c))
b.
 I am done dinner. done my homework (Yerastov 2010: 5 (1a); Fruehwald and Myler 2015; Yerastov 2015: 157 (1a))
c.
 That’s so Eighties. drama SO (Adams 2003: 77; Irwin 2014)
d.
 The cat wants fed. needs washed (Murray and Simon 1999: 162 (9); Edelstein 2014: 243 (4a))
e.
 Bill can touch the ceiling, and so can’t I. so don’t I (Lawler 1974: 359 (10); Wood 2014)
f.
 They didn’t nobody like him. split subject (Feagin 1979: 238; Zanuttini and Bernstein 2014)
g.
 I BÍN had this. stressed BIN (Rickford 1975: 106 (12); Harris 2013)

These pages are intended to be useful to scholars and at the same time accessible to anyone who is not an academic but is interested in language for professional or personal reasons. For the linguist, the website is a repository of information concerning minimal differences in the syntax of North American English. For non-linguists (teachers, journalists, people who are curious about the way we speak, etc.), it is a place to find a detailed yet accessible description of aspects of the syntax of English that might have attracted interest because they differ across speakers (often raising the questions of who’s right and who’s wrong), or because they are associated (often pejoratively) with a certain group of people.

All the pages (found under the Phenomena tab) have a similar structure: they start with an example sentence that exemplifies the syntactic property that varies across speakers, followed by an accessible description (see, for example, the page on negative inversion, http://ygdp.yale.edu/phenomena/negative-inversion). They then give a paragraph or two on any extralinguistic factors that have been identified as restricting the phenomenon’s distribution, such as geographic region, age, gender, or ethnicity. Next come one or more sections with a slightly more technical description of the syntactic and semantic properties of the element under investigation, followed by a regularly updated list of bibliographic references.

We strive to reach our second goal by conducting research on some of the microvariation we find, and by encouraging other linguists (regardless of theoretical framework or level of expertise) to do the same. Recently, we have focused on pronouns, specifically examining two phenomena that vary across American English speakers. The phenomena of interest are exemplified by sentences such as (2) and (3):

(2)
 Here’s you a dog. (Here’s a dog for you.) (Horn 2014: 334n7)
(3)
 We don’t any of us need anything. (Montgomery and Hall 2004a: 413; Zanuttini and Bernstein 2014: 152 (25a))

From the empirical point of view, much needs to be discovered about these two types of sentences. Wood et al. (2015a) and Section 3.4 below detail the geographic coverage of the “dative presentative” construction shown in (2). Wood and Zanuttini (2016) propose a syntactic analysis of this phenomenon, revolving around the features of a functional head Appl (Pylkkänen 2008), but this analysis is still under development and revision (Wood and Zanuttini forthcoming). (See also the open questions in Wood et al. 2015a, 313.) As for (3), our initial research has shown that for such sentences, both the nominative subject (e.g., we) and the partitive (e.g., of us) generally must be pronominal. Building on Zanuttini and Bernstein (2014), Wood et al. (2015b) proposed that such structures are derived by movement. We are currently testing predictions this proposal makes about more complex structures such as we linguists.

## 3 Project methodology

While one primary goal of our current research is to gain a better understanding of the syntax of pronouns, we are also committed to developing methodologies that can be used by dialect syntacticians working in other empirical domains. This section provides some information about these methodologies, and further details are provided in the supplementary materials.

## 3.1 Examining existing sources

In our efforts to document grammatical variation within North American English, we have examined a broad swath of relevant literature on English morphosyntactic variation (http://ygdp.yale.edu/references) in order to compile a list of phenomena, build a bibliography, and collect examples. We have also explored less formal forums such as blogs and social media sites in search of the same kinds of information – we have found that these can be a particularly rich source of data and commentary on understudied and/or new phenomena.

We store bibliographic metadata and PDFs of our sources in a group Zotero database, which allows us to search across all our sources, tag and filter sources by the phenomena they discuss, and export references in whatever format is required for a particular publication venue. The bibliography is publicly accessible at https://www.zotero.org/groups/yale_grammatical_diversity_project/items.

## 3.2 Plotting attested examples

We use Google Fusion Tables to catalog the attested examples we have found in the literature. Each example is associated with metadata including the source where we found it, the phenomenon (or phenomena) it illustrates, the nature and date of attestation, the acceptability of the sentence for the speaker, and the speaker’s speech variety, ethnicity, age, socioeconomic class, locale, and region. We assign geographic coordinates to each example based on the descriptions given in the original source, which allows us to display these examples in interactive Google Maps embedded on our website. Each example is displayed as a pin on the map, and hovering over a pin will reveal the metadata available for that example.
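As a rough illustration of the record structure just described, the sketch below models one attested example as a Python dataclass. The class name, the field names, and the `is_mappable` helper are assumptions made for illustration; the article does not give the actual column names used in the project's tables. The sample sentence and source are taken from example (1f).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttestedExample:
    """One attested sentence plus the metadata listed in the text.

    All field names are hypothetical; they mirror the prose description,
    not a published schema.
    """
    sentence: str
    source: str
    phenomena: List[str]                      # one example may illustrate several
    attestation_type: Optional[str] = None    # nature of attestation
    attestation_date: Optional[str] = None
    acceptability: Optional[str] = None       # acceptability for the speaker
    variety: Optional[str] = None
    ethnicity: Optional[str] = None
    age: Optional[str] = None
    socioeconomic_class: Optional[str] = None
    locale: Optional[str] = None
    region: Optional[str] = None
    latitude: Optional[float] = None          # assigned from the source's description
    longitude: Optional[float] = None

    def is_mappable(self) -> bool:
        """An example can be shown as a map pin only if it has coordinates."""
        return self.latitude is not None and self.longitude is not None


# Sentence and source come from example (1f); everything else is left unset.
example = AttestedExample(
    sentence="They didn't nobody like him.",
    source="Feagin 1979: 238",
    phenomena=["split subject"],
)
```

Leaving most fields optional reflects the fact that sources differ in how much they report about a speaker; an example without coordinates can still be cataloged but not plotted.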

## 3.3 Conducting online surveys

One key methodology we have used to gather new data is to administer acceptability judgment surveys online.

## 3.3.1 Design

In our online surveys, we collect both demographic information about each participant and their acceptability judgments on a set of sentences. All sentences are provided in written form, accompanied by a 5-point Likert scale, bounded by 1 (labeled “totally unacceptable, even in informal settings”) and 5 (labeled “totally acceptable”). Our supplementary materials include a survey manual with details about our design decisions, as well as a complete sample survey in Qualtrics (QSF) format. That file contains our survey instructions and questions, as well as all flow logic, formatting, and answer choices. We also include an annotated version of the same survey in PDF format.

Our surveys are designed and hosted using Qualtrics online software. So far, we have administered these surveys using Amazon Mechanical Turk (MTurk), an online crowdsourcing platform that allows “requesters” to pay freelance workers to complete “Human Intelligence Tasks” (HITs). Workers select which HITs they want to complete; examples include tagging photographs based on their content, transcribing audio recordings, and increasingly, academic research experiments and surveys. A number of studies have validated the use of MTurk for social science research (Behrend et al. 2011; Sprouse 2011; Johnson and Borden 2012), and it is quickly becoming a popular tool for experimental syntax and semantics specifically (Gibson et al. 2011; Kotek et al. 2011; Sprouse 2011; Karttunen 2014; Erlewine and Kotek 2016).

We find online crowdsourcing to be an ideal way to distribute our surveys because it allows us to very quickly and inexpensively receive responses from hundreds of participants distributed widely throughout the United States. As we show in Wood et al. (2015a), our participant pool is quite diverse in terms of age, gender, education, and income, but not in terms of race/ethnicity, mirroring Ipeirotis’s (2010) findings for the MTurk user population as a whole.6 This means that we can use our MTurk surveys to test the influence of some but not all social variables that may play a role in language variation; in particular, other methods are better suited for studying variation that correlates with race/ethnicity.

## 3.3.3 Data processing

After downloading our survey results from Qualtrics in CSV format, we process the dataset so that it can be used for geospatial and statistical analysis. One primary task is to geocode the data; that is, to add geographic coordinates for each location provided by the participant. A second major task is to remove responses that we cannot use because the participant:

• did not complete the survey,

• completed the survey more than once,

• grew up outside the United States,

• or failed the controls.

We typically end up keeping only about 50% of responses. For further details about our geocoding workflow and response exclusion criteria, see the survey manual in the supplementary materials.
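The exclusion step above can be sketched as a simple filter. This is only an illustration: the dictionary keys (`participant_id`, `finished`, `grew_up_in_us`, `passed_controls`) are hypothetical names, and dropping every attempt by a repeat participant (rather than keeping the first) is an assumption, since the text does not say which is done. The actual criteria are documented in the survey manual.

```python
from collections import Counter
from typing import Dict, List

Response = Dict[str, object]


def usable_responses(responses: List[Response]) -> List[Response]:
    """Return the responses that survive the four exclusion criteria."""
    # Count completed attempts per participant to detect repeat takers.
    completed = Counter(r["participant_id"] for r in responses if r["finished"])

    kept = []
    for r in responses:
        if not r["finished"]:
            continue  # did not complete the survey
        if completed[r["participant_id"]] > 1:
            continue  # completed more than once (all attempts dropped: an assumption)
        if not r["grew_up_in_us"]:
            continue  # grew up outside the United States
        if not r["passed_controls"]:
            continue  # failed the control items
        kept.append(r)
    return kept


# Toy data, invented purely to exercise each criterion.
sample = [
    {"participant_id": "p1", "finished": True, "grew_up_in_us": True, "passed_controls": True},
    {"participant_id": "p2", "finished": False, "grew_up_in_us": True, "passed_controls": True},
    {"participant_id": "p3", "finished": True, "grew_up_in_us": True, "passed_controls": True},
    {"participant_id": "p3", "finished": True, "grew_up_in_us": True, "passed_controls": True},
    {"participant_id": "p4", "finished": True, "grew_up_in_us": False, "passed_controls": True},
    {"participant_id": "p5", "finished": True, "grew_up_in_us": True, "passed_controls": False},
    {"participant_id": "p6", "finished": True, "grew_up_in_us": True, "passed_controls": True},
]
kept_ids = [r["participant_id"] for r in usable_responses(sample)]
```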

## 3.4 Conducting geospatial analysis

Our survey methodology lends itself nicely to studying geographical variation in acceptability judgments, since we get participants from all across the country. This benefit does come with an analytical problem: acceptability judgments are gradient, and in most cases, their spatial distribution is gradient too. So we face two kinds of questions. First, how do we represent the spatial distribution of judgments graphically, so that one can examine a map and glean patterns from it? Second, when are the patterns we see statistically reliable?

For the first question, there are many possibilities. After piloting some of them, we determined that a twofold approach is the most useful. First, we plot the individual participants’ primary childhood residence as points on a map. Often, we present judgments of 1–2 in one color and judgments of 4–5 in a second color, omitting 3s. This is only for visual presentation; 3s are included in all calculations, including the interpolation and hot spot analyses discussed below.

Second, we use interpolation to visualize patterns of values. Interpolation fills in, for each part of the map, what its expected value is, based on a computation over the values of the points closest to it. We use the inverse-distance weighted algorithm (see Wood [2016]). We visualize the interpolation with different shades for 1–2, 2–3, 3–4, and 4–5. The result smooths over the points to reveal broader patterns in a visually clear way.
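To make the idea concrete, here is a minimal sketch of inverse-distance weighted interpolation: the estimate at a location is a weighted average of nearby judgments, with each weight equal to the inverse of the distance raised to some power. The function name, the planar (x, y) coordinates, the neighbor count `k`, and the exponent `power` are all assumptions for illustration; they are not the settings reported for the project's maps.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, judgment on the 1-5 scale)


def idw_estimate(x: float, y: float, points: List[Point],
                 k: int = 12, power: float = 2.0) -> float:
    """Estimate the judgment at (x, y) from the k nearest data points.

    Weights are 1 / distance**power, so closer points count for more.
    Coordinates are treated as planar, which is a simplification.
    """
    if not points:
        raise ValueError("need at least one data point")

    nearest = sorted(
        (math.hypot(px - x, py - y), value) for px, py, value in points
    )[:k]

    # If the location coincides with data points, average those directly
    # (the inverse-distance weight would otherwise be undefined).
    exact = [value for dist, value in nearest if dist == 0.0]
    if exact:
        return sum(exact) / len(exact)

    weights = [1.0 / dist ** power for dist, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Toy data: one low judgment at x=0 and one high judgment at x=2.
toy = [(0.0, 0.0, 1.0), (2.0, 0.0, 5.0)]
midpoint = idw_estimate(1.0, 0.0, toy)   # equidistant, so a plain average
near_low = idw_estimate(0.5, 0.0, toy)   # pulled toward the nearer point
```

Evaluating such a function over a grid of locations and shading the results by interval (1–2, 2–3, 3–4, 4–5) would yield a surface of the kind described above.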

Together, collapsing the values of the point data (1 and 2 as the same color and 4 and 5 as the same color) and projecting interpolation under them generally provide a good visual overview of a dataset. But it is not always clear whether a geographic pattern is statistically reliable. One useful tool for this is the Getis-Ord $G_i^*$ statistic (Grieve et al. 2011; Tamminga 2013), referred to as a “hot spot” analysis in ArcGIS, the software we use for mapping and geospatial analysis. The hot spot test is conducted for each data point. If there are any hot spots, we then draw borders around them to indicate an overall “region” of contiguous hot spots.7

An example is shown in Figure 1, which presents the results for the sentence Here’s you some money. The shaded interpolation reveals that the sentence is judged better in the South than in other areas. The preponderance of green dots there (representing judgments of 4 or 5), versus black dots (representing judgments of 1 or 2), reinforces this, and tells us what the interpolation is based on. Some areas, for example, have more data than others, and this is valuable information. The red border indicates a hot spot region, and blue borders indicate cold spot regions. This tells us that the pattern we see is statistically significant.
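The hot spot test behind such borders can be sketched as follows. For each data point, the Getis-Ord statistic compares the sum of judgments in that point's neighborhood (including the point itself) with what would be expected from the overall mean, and expresses the difference as a z-score. The sketch assumes binary weights within a fixed distance band and planar coordinates; the function name and the `band` parameter are hypothetical, and the settings actually used in ArcGIS for the project may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, judgment)


def gi_star_scores(points: List[Point], band: float) -> List[float]:
    """Return one Getis-Ord Gi* z-score per point.

    Neighbors are all points within `band` of the focal point, the focal
    point included. A large positive score marks a hot spot, a large
    negative score a cold spot.
    """
    n = len(points)
    values = [v for _, _, v in points]
    mean = sum(values) / n
    variance = max(0.0, sum(v * v for v in values) / n - mean * mean)
    s = math.sqrt(variance)
    if s == 0.0:
        raise ValueError("all judgments are identical; Gi* is undefined")

    scores = []
    for xi, yi, _ in points:
        weights = [
            1.0 if math.hypot(xj - xi, yj - yi) <= band else 0.0
            for xj, yj, _ in points
        ]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        spread = (n * w_sq_sum - w_sum ** 2) / (n - 1)
        if spread <= 0.0:
            scores.append(math.nan)  # every point is a neighbor: no local contrast
        else:
            scores.append(numerator / (s * math.sqrt(spread)))
    return scores


# Toy data: a cluster of high judgments and a distant cluster of low ones.
toy = [
    (0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0), (1.0, 1.0, 5.0),
    (10.0, 0.0, 1.0), (11.0, 0.0, 1.0), (10.0, 1.0, 1.0), (11.0, 1.0, 1.0),
]
scores = gi_star_scores(toy, band=2.0)
```

Because each score is a z-score, an absolute value above roughly 1.96 corresponds to the conventional two-sided 5% level; in the toy data the first cluster scores positive (hot) and the second negative (cold).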

Figure 1: Here’s you some money.

Another technique we find useful is to overlay previously postulated dialect boundaries on our data. We can label each data point with its dialect region, and include that information in a regression analysis along with age, sex, race, etc. We also average over such regions and map them out. For example, the map in Figure 2 shows the average judgments for do-support with the have yet to construction (see Section 4). The dialect regions come from the Atlas of North American English (Labov et al. 2006). The darker the shade of blue, the higher the judgment.8 This map shows that such sentences are degraded for Southern speakers, but we find many acceptances in several areas in the North. Figure 3 shows the values and 95% confidence intervals for these regions; a one-way ANOVA reveals that the differences among means are statistically significant (F[12, 494] = 3.054, p = 0.0004); Tukey-corrected multiple comparisons reveal significant pairwise differences between the South and both Inland North and Western Pennsylvania, and between Western Pennsylvania and New York.
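For readers less familiar with the test, the sketch below computes the F statistic of a one-way ANOVA from judgments grouped by region, using only the Python standard library. The function name and the toy data are invented for illustration and are not the project's data. The degrees of freedom are the number of groups minus one and the number of observations minus the number of groups, so the reported F[12, 494] corresponds to 13 regions and 507 responses.

```python
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for judgments grouped by region."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual judgments around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy judgments for three hypothetical regions.
toy = {
    "Region A": [1.0, 2.0, 3.0],
    "Region B": [2.0, 3.0, 4.0],
    "Region C": [5.0, 6.0, 7.0],
}
f_stat, df_between, df_within = one_way_anova(toy)
```

Turning F into a p-value requires the F distribution, which the standard library does not provide; a statistics package (for example, `scipy.stats.f_oneway`) would normally be used for that step and for the pairwise follow-up comparisons.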

Figure 2: Do-support with the have yet to construction.

Figure 3: Do-support with the have yet to construction.

## 4 Overview of project results

In this section, we briefly discuss some of what we have been finding in our ongoing survey work.

## 4.1 Interspeaker variation “in every room”

In many cases we find that some sentence or construction has a particular geographical distribution (see Figure 1). We will discuss examples of this kind of result below. In many other cases, however, we find rampant interspeaker variation without any geographic or demographic correlate. Some examples of this type are shown in (4).

(4)
a.
 Shouldn’t have Pam remembered her name? (Johnson 1988: 160 (13a))
b.
 John threatened me to come to my house. (Hartman 2011: 127 (33b); see also Zubizarreta 1982)
c.
 John seems like Mary defeated him. (Asudeh and Toivonen 2012: 329 (20b); see also Rogers 1973)

In (4a), two auxiliaries appear to the left of the subject of a yes–no question (rather than the standard single auxiliary). Sentence (4b) exemplifies subject control in the presence of an indirect object, an important construction type in the control literature (see Landau [2013: 149] and references therein). Copy raising, recently discussed by Landau (2011) and Asudeh and Toivonen (2012), is exemplified in (4c) – in this case involving an embedded object pronoun (him) that matches the matrix subject (John). All three of these sentence types vary across speakers, but none of the variation, as far as we can tell, has geographic correlates. See Wood et al. (2015a, 307) for a map of (4a) and Wood (2016) for a map of (4b). Figure 4 presents a map of a variant of (4c).

Figure 4: John seems like Mary offended him.

These kinds of results are certainly open to interpretation; the point here is just that a variety of sentences that are reported in the syntax literature to vary across speakers exhibit “variation in every room” – that is, interspeaker variation that might be expected in any given room of native speakers. Knowing what kinds of syntactic phenomena vary across individuals in the same speech community raises a host of interesting questions about language acquisition and online sentence processing. Moreover, generative syntacticians aim to understand interspeaker variation in general, whether it is tied to geography or not (Kayne 2013: 133).

## 4.2 Implicational relationships

One kind of result that has been theoretically illuminating, independent of geographic concerns, involves implicational relationships of the kind familiar from linguistic typology (Greenberg [1963, 1966] and later work; see Szmrecsanyi and Kortmann [2009] and Siemund [2013] for further discussion of these relationships across English dialects). For example, Tyler and Wood (forthcoming) study survey results focusing on the have yet to (HYT) construction, illustrated with sentences like (5). One of the questions they pursue is whether have is an auxiliary, leading us to expect a yes–no question as in (6a), or a main verb, leading us to expect a yes–no question as in (6b).

(5)
 I have yet to visit my grandmother.
(6)
a.
 Have you yet to visit your grandmother?
b.
 Do you have yet to visit your grandmother?

There is quite a bit of variation in the judgments of sentences like those in (6). The quantitative results lead Tyler and Wood (forthcoming) to two conclusions. First, there are enough speakers who accept (6b) to take it to be a genuine syntactic option for many speakers. Second, speakers who accept (6b) are overwhelmingly likely to accept (6a) as well, but not the other way around: many speakers accept (6a) but reject (6b). Tyler and Wood (forthcoming) develop a syntactic analysis of the HYT construction intended to derive this asymmetry. Results like this strike us as one key area where quantitative studies can help raise (and answer) interesting theoretical questions.
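An implicational pattern of this kind can be checked by cross-tabulating each participant's judgments of the two sentences. The sketch below is an illustration under stated assumptions: the function names are hypothetical, a rating of 4 or 5 is treated as acceptance (a threshold chosen here, not taken from Tyler and Wood), and the ratings are toy values rather than survey results.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[Tuple[bool, bool], int]


def cross_tabulate(ratings: List[Tuple[int, int]], threshold: int = 4) -> Table:
    """Count participants by (accepts 6a, accepts 6b).

    Each input pair is one participant's ratings of (6a) and (6b).
    """
    table: Table = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for rating_a, rating_b in ratings:
        table[(rating_a >= threshold, rating_b >= threshold)] += 1
    return table


def conditional_rates(table: Table) -> Tuple[Optional[float], Optional[float]]:
    """Return (share of 6b-accepters who accept 6a, share of 6a-accepters who accept 6b)."""
    both = table[(True, True)]
    accept_b = both + table[(False, True)]
    accept_a = both + table[(True, False)]
    a_given_b = both / accept_b if accept_b else None
    b_given_a = both / accept_a if accept_a else None
    return a_given_b, b_given_a


# Toy ratings: (rating of 6a, rating of 6b) for seven imaginary participants.
toy = [(5, 5), (5, 4), (4, 1), (5, 2), (4, 2), (1, 1), (2, 5)]
table = cross_tabulate(toy)
a_given_b, b_given_a = conditional_rates(table)
```

An asymmetry of the kind described above would show up as a first rate close to 1 alongside a clearly lower second rate; the (False, True) cell, participants who accept (6b) but not (6a), is the one the implication predicts to be nearly empty.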

## 4.3 Geographic diffusion

Because we ask for the ages of our survey participants, we are able to analyze changes in a grammatical phenomenon’s geographic distribution over time. An example of this comes from the “personal dative” construction (Christian 1991; Webelhuth and Dannenberg 2006; Horn 2008, Horn 2013; Gerwin 2014: Ch. 7; Hutchinson and Armstrong 2014). As illustrated in (7), the dative (her) is obligatorily pronominal and coreferent with the subject (she):

(7)
 She has her a new boyfriend.

Personal datives were thought to be characteristic of the South (Webelhuth and Dannenberg 2006), though some (e.g., Christian 1991: 18) have conjectured that they may be found in other vernacular varieties as well. Our research has shown that the construction may be spreading geographically in “apparent time” (Bailey et al. 1991). Among speakers over 40, the construction is primarily accepted in the South. Among speakers between 18 and 30, however, acceptance is much more widespread. (Speakers between 31 and 40 fall somewhere in the middle.) This result, which may reflect what Horn (2008: 176) calls the “Braxton effect,”9 illustrates the kind of results we might expect when we intersect geographic region with other linguistically relevant social categories.
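An apparent-time comparison of this sort amounts to grouping responses by age band and region and comparing acceptance rates across the cells. The sketch below uses the three age bands mentioned above; the function names, the two-way split into "South" and "Elsewhere", the acceptance threshold of 4, and the toy responses are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def age_band(age: int) -> str:
    """Assign a participant to one of the three age bands used in the text."""
    if age > 40:
        return "over 40"
    if age >= 31:
        return "31-40"
    if age >= 18:
        return "18-30"
    raise ValueError("participants are assumed to be 18 or older")


def acceptance_by_band_and_region(
    responses: List[Tuple[int, str, int]], threshold: int = 4
) -> Dict[Tuple[str, str], float]:
    """Map (age band, region) to the share of ratings at or above threshold.

    Each response is (age, region, rating on the 1-5 scale).
    """
    counts = defaultdict(lambda: [0, 0])  # [accepted, total] per cell
    for age, region, rating in responses:
        cell = counts[(age_band(age), region)]
        cell[1] += 1
        if rating >= threshold:
            cell[0] += 1
    return {key: accepted / total for key, (accepted, total) in counts.items()}


# Toy responses, invented for illustration only.
toy = [
    (55, "South", 5), (60, "Elsewhere", 1), (45, "Elsewhere", 2),
    (25, "South", 5), (22, "Elsewhere", 4), (29, "Elsewhere", 2),
    (35, "Elsewhere", 4), (38, "Elsewhere", 1),
]
shares = acceptance_by_band_and_region(toy)
```

A spread of the kind reported would appear as a rising acceptance share outside the South as one moves from the oldest band to the youngest.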

## 4.4 Known geographic distributions

A final kind of result involves confirming or elaborating on previously hypothesized geographic distinctions. We will discuss one example of each, both taken from Wood (2016), to which the reader is referred for more details.

For an example of confirmation, the so don’t I construction previously mentioned in (1e) has long been thought to be restricted to eastern New England (Labov 1972: 815; Hall 2013). This is borne out in our survey data. While nearly half of the participants in eastern New England accept sentences like (1e), exceedingly few outside of eastern New England do.

For an example of elaboration, the be done my homework construction previously mentioned in (1b) has been thought to be characteristic of Canadian, Vermont, and Philadelphia English (Labov 2001; Yerastov 2008, Yerastov 2010, Yerastov 2015; Fruehwald and Myler 2015).10 This is borne out in our survey data, but we also find a more complex picture: it is favored not only in Philadelphia, but also in surrounding areas such as Delaware, southern New Jersey, and Maryland. It is also highly favored in much of New England, including not just Vermont, but New Hampshire, eastern Massachusetts, and Maine.

## 4.5 Summary

In sum, our primary research objectives stem from theoretical questions, and our survey methodology is designed to collect data that will bear on syntactic theory. But since our surveys also collect geographic and demographic information, we can analyze this information to determine which demographic factors, if any, constrain the phenomena we study. Knowing this can certainly help us to advance knowledge relevant to syntacticians, at the very least to help us find speakers of a construction we are interested in. Beyond that, we also aim, with this aspect of our project, to build on a rich tradition of dialect research and contribute to a broader understanding of how syntactic variation distributes across speakers of North American English.

## 5 Project outreach

As should be clear by now, the Yale Grammatical Diversity Project consists of many components. This has allowed us to collaborate with linguists at different levels of seniority and experience, from undergraduates with limited background, to graduate students, postdoctoral fellows, and faculty members. Each person has the opportunity to learn new content and skills, and to be involved in the mentoring of less experienced team members.

We have integrated our research into our teaching. For example, we have offered a course called “Grammatical Diversity in US English” as a freshman seminar, as a seminar for linguistics majors, and as an advanced syntax seminar open to both advanced undergraduate and graduate students. Our students tell us they have enjoyed sharing their new understanding of linguistic variation with interested friends and family members. They also valued the opportunity to meet professional linguists in the classroom, when we were able to invite the author(s) of their readings or public intellectuals who address interspeaker variation in American English in the media. Beyond teaching, we have advised PhD dissertations (Harris In progress; Matyiku 2017) and senior essays on topics related to the project.

We share our work with linguists outside our institution by publishing in a broad variety of venues; by seeking opportunities to give talks at conferences, workshops, and departmental colloquia; and by inviting other linguists with relevant interests to speak at our group meetings. We have also organized two workshops at annual meetings of the Linguistic Society of America. Both represented valuable opportunities to present our work to other members of our professional organization and to give exposure to the work of younger colleagues working on these topics, whether at Yale or at other institutions.

To share our findings and insights with non-linguists, we have been using our website (described in Section 2), Zotero bibliographic database (see Section 3.1), public lectures, and various types of media outlets. These include a Facebook page (http://www.facebook.com/YaleGramDiv/), a blog (http://ygdp.wordpress.com/), two op-ed pieces (Zanuttini 2014, Zanuttini 2015), and press interviews. We value these opportunities to share what we learn from this project with a wider audience, not only to pass on what we know, but also to correct some misconceptions about language in general and dialects in particular. For example, some people think that one may find different lexical items and different “accents” in American English, i.e., differences in the phonological system, but not differences in the grammar (the morphosyntactic system); when grammatical differences are noted, they are attributed to simple ignorance of the “correct grammar” of English. As linguists, we can address such misconceptions, to raise awareness of what it means for an individual to master a language and, perhaps even more important, to discredit prejudice masquerading as the preservation of “correct” or “proper” grammar.

## 6 Conclusion

We have presented an overview of the goals and methods of the Yale Grammatical Diversity Project and sketched the types of results we have been finding. We see this work as informing and being informed by theoretical syntax, as well as illuminating the factors that correlate with the distribution of grammatical phenomena. We also see this line of work as a great opportunity to capitalize on existing public interest in language variation to illustrate the different ways in which linguists approach the study of language, to dispel myths, and to make scientific results accessible outside of academia.

## Acknowledgement

Our work has benefited from interactions with more colleagues and students than we could possibly mention here, but to name just a few, we are especially grateful for ongoing discussions and collaborations with Judy Bernstein, Bob Frank, Lisa Green, Bill Haddican, Tricia Irwin, Greg Johnson, Goldie Ann McQuaid, Neil Myler, Teresa O’Neill, and Christina Tortora. Thanks also to the audiences at a variety of conferences where we have presented parts of the work described here. We are indebted to Bernd Kortmann and Eric Potsdam for their incisive comments on our manuscript. Finally, thanks to former and current members of the project for their contributions: Matt Barros, Phoebe Gaston, Alysia Harris, Nick Huang, Aidan Kaplan, Luke Lindemann, Zach Maher, Sabina Matyiku, Tom McCoy, Rachel Regan, Katie Ruffing, Peter Staub, Dennis Storoshenko, and Matt Tyler.

## Footnotes

• 1

In order to reliably assess individual patterns relevant to grammar, it is often necessary to use quantitative statistical tests that distinguish the signal from the noise. See Sections 3.4 and 4 for further discussion.

• 2

See Grieve (2016, ch. 1) for a comprehensive history of this project.

• 3

Grieve (2009, 2016) does consider some variables that might be of interest to those working on morphosyntax, as do the surveys cited above. He has also begun to replicate findings from the letters to the editor corpus with a Twitter corpus (Huang et al. 2016).

• 4

Schneider (2012) and the contributions in Schneider (2008) specifically address North American English as part of the worldwide project, however.

• 5

The categories used in Kortmann and Lunkenheimer (2012, 2013) are “A: feature is pervasive or obligatory; B: feature is neither pervasive nor extremely rare; C: feature exists, but is extremely rare; D: attested absence of feature; X: feature is not applicable (given the structural make-up of the variety); ?: no information on feature is available.” In our reading of this work, there seems to be a blurring of interspeaker and intraspeaker variation: it is not clear to us whether a property’s rarity means that it happens not to come up much in natural speech because it is lexically or pragmatically restricted, or that it could come up often in natural speech but does not (e.g., agreement forms), or that only a few speakers of the variety use it.

• 6

See also Ipeirotis (2015) and http://demographics.mturk-tracker.com/ for more up-to-date research on MTurk user demographics.

• 7

The regions are drawn as Voronoi polygons. Each polygon is drawn around a single point such that every location within the polygon is closer to that point than to any other point on the map. In this way, the geographical space is partitioned completely among all of the points. These polygons serve as the basis for the borders around the significant hot/cold spots.
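
The nearest-point property that defines a Voronoi polygon can be illustrated with a minimal Python sketch. This is not the project's mapping software. The function names and coordinates below are hypothetical, and the sketch assumes flat (planar) Euclidean distance rather than distance on the earth's surface.

```python
from math import hypot


def nearest_site(location, sites):
    """Return the index of the site closest to `location`.

    Assumption: planar Euclidean distance. A location belongs to the
    Voronoi polygon of whichever site this function returns.
    """
    x, y = location
    return min(
        range(len(sites)),
        key=lambda i: hypot(x - sites[i][0], y - sites[i][1]),
    )


def voronoi_labels(locations, sites):
    """Label each location with the index of its Voronoi polygon."""
    return [nearest_site(loc, sites) for loc in locations]


# Hypothetical survey points (the "sites") and some locations to classify.
sites = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
locations = [(1.0, 0.5), (3.5, 0.2), (2.0, 2.5)]

labels = voronoi_labels(locations, sites)
print(labels)
```

Because every location receives exactly one label, the labels partition the whole space among the points, which is the property the footnote relies on.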

• 8

The colors here are divided up automatically with the Natural Breaks (Jenks) function.

• 9

This refers to the role that Toni Braxton’s 1996 hit song “I Love Me Some Him” may have had in spreading the use or awareness of personal datives beyond the South.

• 10

Labov (2001: 46) notes that this construction likely has its roots in Northern Irish varieties, whereas Yerastov (2010) explores the possibility that it originates in Scots English.

