This rather programmatic paper discusses the use of parallel corpora in the typological study of grammatical categories. In the author's earlier work, tense-aspect categories were studied by means of a translational questionnaire, and cross-linguistic gram types were identified through their distribution in the questionnaire. It is proposed that a similar methodology could be applied to multilingual parallel corpora. The possibility of identifying grammatical markers by word-alignment methods is demonstrated with examples from Bible texts.
A central methodological issue in language typology is sampling: how to choose a representative set of languages for a typological investigation. Most proposed typological sampling methods are a priori in the sense that they are based on assumed, rather than observed, effects of biasing factors such as genealogical and areal proximity. The advent of the World Atlas of Language Structures (WALS) creates for the first time a chance to attempt a posteriori sampling. The basic idea is to create a sample by removing from the set of available languages one member of each pair of languages whose typological distance, as defined in terms of the features in WALS, does not reach a predefined threshold. In this way, a sample of 101 languages was chosen from an initial set of the 222 languages that are best represented in WALS. The number of languages from different macroareas in this sample can be taken as an indication of the internal diversity of the area in question. Two issues are discussed in some detail: (i) the high diversity of the indigenous languages of the Americas and the tendency for these to be underrepresented by previous sampling methods; (ii) the extreme areal convergence of Mainland South East Asian languages. It is concluded that areal factors cannot be neglected in typological sampling, and that it must be questioned whether the creation of elaborate sampling algorithms makes sense.
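The pruning idea behind a posteriori sampling can be illustrated with a minimal sketch. It is not the procedure used in the paper: the abstract does not specify the distance measure or which member of a close pair is removed. The sketch assumes that distance is the proportion of differing values among the features coded for both languages, that the closest pairs are handled first, and that the language with fewer coded features is dropped. The function names, the toy languages A, B, C and the features f1 to f4 are all hypothetical.

```python
from itertools import combinations


def typological_distance(a, b):
    """Share of differing values among features coded for both languages.

    Assumption: a relative Hamming distance; the source does not say
    which measure is used.
    """
    shared = [f for f in a if f in b]
    if not shared:
        return 1.0  # assumption: no shared features counts as maximally distant
    differing = sum(1 for f in shared if a[f] != b[f])
    return differing / len(shared)


def prune_sample(languages, threshold):
    """Drop one member of every pair whose distance is below the threshold."""
    sample = dict(languages)
    # Assumption: visit the closest pairs first.
    pairs = sorted(
        combinations(sorted(languages), 2),
        key=lambda p: typological_distance(languages[p[0]], languages[p[1]]),
    )
    for x, y in pairs:
        if x not in sample or y not in sample:
            continue  # one member was already removed
        if typological_distance(sample[x], sample[y]) < threshold:
            # Assumption: drop the language with fewer coded features;
            # on a tie, drop the alphabetically later one.
            drop = x if len(sample[x]) < len(sample[y]) else y
            del sample[drop]
    return sorted(sample)


# Hypothetical toy data, not taken from WALS.
toy = {
    "A": {"f1": 1, "f2": 1, "f3": 1, "f4": 1},
    "B": {"f1": 1, "f2": 1, "f3": 1, "f4": 2},
    "C": {"f1": 2, "f2": 2, "f3": 2, "f4": 2},
}
print(prune_sample(toy, threshold=0.5))  # ['A', 'C']
```

In the toy data, A and B differ in one of four features (distance 0.25), which falls below the threshold of 0.5, so one of them is removed; C is at least 0.75 away from both and stays in the sample.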