Open Access, BY-NC-ND 3.0 license. Published by De Gruyter Open Access, August 26, 2015.

Geo-spatial modelling with unbalanced data: modelling the spatial pattern of human activity during the Stone Age

  • Jarosław Jasiewicz and Iwona Sobkowiak-Tabaka
From the journal Open Geosciences


With the increasing availability of data, geoscience provides many methods to model the spatial extent of various phenomena. Acquiring representative, high-quality data is the most important criterion to assess the value of any spatial analysis; however, there are many situations in which these criteria cannot be fulfilled. Archived data, collected in the past, for which analysis cannot be repeated or supplemented, are a very common information source. Archaeological data collected at a regional extent during years of field work and superficial observations are an additional example. Such data rarely provide representative samples and are usually imbalanced; only very few examples contain useful data, while many examples remain without any archaeological traces. In spite of these limitations, archaeological information presented in the form of maps can be a useful and helpful tool to analyse the spatial patterns of some phenomena and, from a more practical point of view, a tool to predict the location of undiscovered occurrences. The primary goal of this paper is to present a methodology for modelling spatial patterns that is based on imbalanced categorical data which do not fulfil the criteria of spatial representation, and that incorporates uncertainty in its decision process. This concept will be discussed using a collection of Stone Age sites and a set of environmental variables from the postglacial lowlands in Western Poland. We will propose a machine-learning system which adopts CART through bootstrap simulation to incorporate uncertainty into the spatial model and utilise that uncertainty in the decision-making process. Finally, we will describe the relationships between the model and environmental variables and present our results in cartographic form using the principles of decision-tree cartography.
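The abstract names the ingredients of the approach (CART, bootstrap simulation, imbalanced presence/absence data, uncertainty in the decision) but this page does not give the implementation. The following is only a minimal sketch of how such pieces can fit together, not the authors' system. It assumes a balanced bootstrap (equal draws with replacement from each class) and a small Gini-based tree written in plain Python; every function name and parameter here is hypothetical and chosen for illustration.

```python
import random
import statistics


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between two observed values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, max_depth=3):
    """Grow a small CART-style tree; each leaf stores the share of positives."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return sum(labels) / len(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    }


def predict_tree(tree, row):
    """Walk the tree down to a leaf and return its positive share."""
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


def balanced_bootstrap(rows, labels, rng):
    """Assumed resampling scheme: draw equally many cases from each class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(k)] + [rng.choice(neg) for _ in range(k)]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_bootstrap_cart(rows, labels, n_trees=50, max_depth=3, seed=0):
    """Fit one tree per balanced bootstrap replicate."""
    rng = random.Random(seed)
    return [
        build_tree(*balanced_bootstrap(rows, labels, rng), max_depth=max_depth)
        for _ in range(n_trees)
    ]


def predict_with_uncertainty(trees, row):
    """Mean prediction across trees and its spread (population std. dev.)."""
    votes = [predict_tree(t, row) for t in trees]
    return statistics.mean(votes), statistics.pstdev(votes)


if __name__ == "__main__":
    # Synthetic, imbalanced toy data: one made-up environmental variable,
    # 200 cells without sites and 20 cells with sites.
    rng = random.Random(42)
    rows = [(rng.uniform(0.0, 0.7),) for _ in range(200)]
    rows += [(rng.uniform(0.8, 1.0),) for _ in range(20)]
    labels = [0] * 200 + [1] * 20
    trees = fit_bootstrap_cart(rows, labels)
    for x in (0.1, 0.75, 0.95):
        print(x, predict_with_uncertainty(trees, (x,)))
```

In this toy setting the mean across trees plays the role of a site-presence score, and the spread indicates how much the bootstrap replicates disagree. Cells that fall between the two classes will typically show a larger spread, which a decision rule could treat as "uncertain" rather than forcing a presence or absence label.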



Received: 2015-02-20
Accepted: 2015-05-09
Published Online: 2015-08-26

©2015 J. Jasiewicz and I. Sobkowiak-Tabaka

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
