June 3, 2008
Abstract
This paper investigates the random textual vocabulary coverage of the nine domains in the BNC, using sets of five hundred 2,000-word samples drawn at random from each domain. Random textual vocabulary coverage is the degree to which the vocabulary of one text (or collection of texts) of a given length covers the vocabulary of another text (or collection of texts) of a given length. Estimating it depends on determining the relationship between vocabulary size and text length, and Brunet's model proves robust in capturing this relationship. A mathematical estimator for random textual vocabulary coverage is developed that incorporates Brunet's model, and a method for computing the 95% random textual vocabulary coverage interval is also devised.
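For readers unfamiliar with Brunet's model, it relates token count N and vocabulary size V as W = N^(V^-a), where W stays roughly constant across text lengths and the exponent a is conventionally set to 0.172. The sketch below illustrates calibrating W on one sample and inverting the model to predict vocabulary size at another length; the sample figures are hypothetical, and the paper's own coverage estimator and 95% interval method are not reproduced here.

```python
import math

def brunet_w(n_tokens: int, v_types: int, a: float = 0.172) -> float:
    """Brunet's measure W = N ** (V ** -a); roughly constant across text lengths."""
    return n_tokens ** (v_types ** -a)

def predict_vocabulary(w: float, n_tokens: int, a: float = 0.172) -> float:
    """Invert Brunet's model to predict V at length N: V = (ln N / ln W) ** (1/a)."""
    return (math.log(n_tokens) / math.log(w)) ** (1.0 / a)

# Hypothetical illustration: calibrate W on a 2,000-word sample with 750 types,
# then extrapolate the expected vocabulary size of a 10,000-word sample.
w = brunet_w(2000, 750)
print(f"W = {w:.2f}; predicted V at N = 10000: {predict_vocabulary(w, 10000):.0f}")
```

Because the predicted vocabulary grows sublinearly in N, the overlap between two samples' vocabularies can then be reasoned about as a function of their respective lengths, which is the relationship the paper's coverage estimator builds on.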
June 3, 2008
Abstract
Using the Nepali National Corpus, a collocation-based technique is applied to the categorization of Nepali postpositions. Ergative le, accusative lāī, and genitive ko/kā/kī are frequently treated in the literature as part of the Nepali nominal inflection paradigm, but opinion differs on other postpositions. The most significant collocations of several postpositions are examined for patterns that characterize postpositions as a category or categories. Two overarching patterns are identified: collocation with semantically coherent nouns, and collocation with words for which the postposition functions as a subcategorizer. These patterns do not support the analysis of some postpositions as part of the nominal paradigm; however, the postpositions traditionally placed in that paradigm do collocate more with non-lexical words, especially pronouns.
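The abstract does not name the association measure used to rank collocations; pointwise mutual information is one common choice, and the minimal sketch below shows how a ranked collocate list for a target postposition such as le could be produced from a tokenized corpus. The window size, frequency threshold, and the variable corpus_tokens are all assumptions for illustration, not details from the paper.

```python
import math
from collections import Counter

def pmi_collocates(tokens, target, window=2, min_count=3):
    """Rank words co-occurring with `target` within +/- `window` tokens by PMI."""
    freq = Counter(tokens)
    pair = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair[tokens[j]] += 1
    n = len(tokens)
    # PMI = log2( P(target, w) / (P(target) * P(w)) ), with window counts
    # as a rough approximation of the joint probability.
    scores = {
        w: math.log2((c / n) / ((freq[target] / n) * (freq[w] / n)))
        for w, c in pair.items()
        if freq[w] >= min_count
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical usage on a tokenized corpus slice:
# top_collocates = pmi_collocates(corpus_tokens, "le")[:20]
```

The min_count filter matters in practice: raw PMI inflates the scores of rare words, so low-frequency collocates are excluded before ranking.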
June 3, 2008
Abstract
Many efforts in corpus annotation start by segmenting discourse into units of analysis. In this paper, we present a method for deciding on segmentation units within Centering Theory (Grosz et al. 1995). We survey the existing methods for breaking discourse down into utterances and discuss the results of a comparison study among them. The contribution of our study is that it was carried out on spoken data and in two different languages (English and Spanish). Our comparison suggests that the best unit of analysis for Centering-based annotation is the finite clause. The final result is a set of guidelines for segmenting discourse for Centering analysis, which is also potentially applicable to other analyses.
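As a hedged sketch of how the finite-clause criterion might be operationalized (this is not the authors' procedure or tooling), one could use a dependency parser and treat each token whose morphology marks a finite verb as a clause head, grouping every token under its nearest finite-verb ancestor. The example below assumes spaCy with its small English model installed; spoken or Spanish data would need a different model and, likely, extra handling of disfluencies.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; any pipeline with morphology works

def finite_clause_units(text):
    """Segment text into finite-clause units: assign each token to the
    closest ancestor (or itself) whose morphology includes VerbForm=Fin."""
    doc = nlp(text)

    def clause_head(tok):
        node = tok
        while node is not None:
            if "Fin" in node.morph.get("VerbForm"):
                return node
            node = node.head if node.head is not node else None
        return tok.sent.root  # fallback: whole sentence as one unit

    units = {}
    for tok in doc:
        units.setdefault(clause_head(tok).i, []).append(tok.text)
    return [" ".join(words) for _, words in sorted(units.items())]

print(finite_clause_units("She left because the meeting, which ran long, bored her."))
```

On the example sentence this yields three units anchored on "left", "bored", and "ran", which matches the intuition that each finite verb opens its own unit of analysis.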