Martin Dittus, Mark Graham
November 29, 2019
Wikipedia is one of the predominant ways in which internet users obtain knowledge about the world. It is also one of the most important mirrors, or augmentations, of the world: it contains representations of all manner of places. However, Wikipedia’s knowledge of the world is characterised by a linguistic inequality. Although it is written in a growing number of languages, some languages are overrepresented and contribute significantly more to Wikipedia’s body of knowledge than others. This deeply affects how the world is represented on Wikipedia, and by whom: it has been shown that for many countries in the Global South, there are more articles written in English than in their respective native languages. As a result, a significant number of people are being excluded from the collective process of knowledge production, solely on the basis of their native language. Who writes these representations of local places, and for which audiences? We present early findings from the first study of Wikipedia’s geolinguistic contours. We investigate to what extent local languages are involved in the process of creating local representations. In a large-scale quantitative analysis across the almost 300 language versions of Wikipedia, we identify regions of the world where local languages such as Armenian, Catalan or Malay are dominant sources of representation for local places, and we contrast these findings with instances where representations are significantly shaped by foreign languages. Where do we see significant amounts of local content available in local languages, and where do we not? Where are the most detailed local representations largely written in foreign languages, intended for foreign audiences? And what factors can explain this?