September 18, 2012
Abstract
Using data from a 100-million-word representative corpus and a large-scale acceptability survey, we have investigated the relationship between corpus data and acceptability judgments. We conclude that the relative proportions of morphosyntactic variants in a corpus are the most significant predictor of a variant's acceptability to native speakers, and that in particular high relative proportions of one variant in a corpus are reliable indicators of high acceptability to native speakers. At the same time we note the limits of this predictability: low-frequency items, as noted elsewhere in the literature, often enjoy high levels of acceptability. Statistical preemption thus appears as a more limited phenomenon than had heretofore been posited.
Abstract
Speech researchers often rely on human annotation of prosody to generate data for testing hypotheses and building models. We present an overview of two prosodic annotation systems: ToBI (Tones and Break Indices; Silverman et al., 1992) and RaP (Rhythm and Pitch; Dilley & Brown, 2005), the latter designed to address several limitations of ToBI. The paper reports two large-scale studies of inter-transcriber reliability for ToBI and RaP. Comparable reliability for both systems was obtained across a variety of prominence- and boundary-related agreement categories. These results help establish RaP as an alternative to ToBI for research and technology applications.
Abstract
Although widely seen as critical both for its frequency and for its social significance as a prime means of encoding and perpetuating moral stance and configuring self and identity, conversational narrative has received little attention in corpus linguistics. In this paper we describe the construction and annotation of a corpus intended to advance the linguistic theory of this fundamental mode of everyday social interaction: the Narrative Corpus (NC). The NC contains narratives extracted from the demographically sampled subcorpus of the British National Corpus (BNC) (XML version). It includes more than 500 narratives, socially balanced in terms of participant sex, age, and social class. We describe the extraction techniques, selection criteria, and sampling methods used in constructing the NC. Further, we describe four levels of annotation implemented in the corpus: speaker (social information on speakers), text (text IDs, title, type of story, type of embedding, etc.), textual components (pre-/post-narrative talk, narrative, and narrative-initial/final utterances), and utterance (participation roles, quotatives, and reporting modes). A brief rationale is given for each level of annotation, and possible avenues of research facilitated by the annotation are sketched out.