The importance of geographic space to minimize the error of representative samples
Keywords:
Regionalization, spatial stratification, spatial sampling
Abstract
This paper discusses the importance of geographic space when building a sampling frame for surveys, questioning the traditional statistical premise that observations are random and independent. It analyzes the contribution of quantitative geography to regionalization methodologies, since these can reduce the sampling error of surveys, particularly in urban areas and when the stratification variables exhibit spatial autocorrelation.
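Why regionalization can lower sampling error follows from the classical stratified-sampling variance (as presented, for example, in Cochran's Sampling Techniques). The sketch below assumes proportional allocation and large strata; $W_h = N_h/N$ is the population share of stratum $h$, $S_h^2$ its within-stratum variance, $\bar{Y}_h$ its mean, $\bar{Y}$ the population mean, $S^2$ the population variance, and $f = n/N$ the sampling fraction.

```latex
\begin{aligned}
V(\bar{y}_{\text{srs}}) &= \frac{1-f}{n}\,S^2, \\
V(\bar{y}_{\text{st}}) &= \frac{1-f}{n}\sum_{h=1}^{H} W_h S_h^2 \quad \text{(proportional allocation)}, \\
S^2 &\approx \sum_{h=1}^{H} W_h S_h^2 + \sum_{h=1}^{H} W_h\,(\bar{Y}_h - \bar{Y})^2, \\
V(\bar{y}_{\text{srs}}) - V(\bar{y}_{\text{st}}) &\approx \frac{1-f}{n}\sum_{h=1}^{H} W_h\,(\bar{Y}_h - \bar{Y})^2 \;\ge\; 0 .
\end{aligned}
```

The gain equals the between-stratum variance. When the stratification variable is spatially autocorrelated, contiguous regions tend to be internally homogeneous and to differ from one another, which is precisely what makes that term large.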
Regionalization algorithms with and without heuristic optimization are tested empirically on census data. The resulting sampling error is then estimated with a Monte Carlo procedure and compared against traditional random sampling and two-stage random sampling.
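A minimal Monte Carlo sketch of this kind of comparison follows, under assumptions that are not the paper's: a synthetic grid population (not census data) with a linear spatial trend plus Gaussian noise, square contiguous blocks standing in for the regions a regionalization algorithm would produce, proportional allocation, and simple random sampling as the "traditional" baseline. Two-stage sampling and heuristic optimization are not modeled, and all function names are hypothetical.

```python
import math
import random
import statistics


def make_population(size, seed=0):
    """Hypothetical population on a size x size grid: a linear spatial trend
    plus Gaussian noise, so nearby units have similar values (assumption)."""
    rng = random.Random(seed)
    return {
        (x, y): 10 + 0.5 * x + 0.3 * y + rng.gauss(0, 1)
        for x in range(size)
        for y in range(size)
    }


def block_strata(size, block):
    """Split the grid into contiguous square blocks. This is only a stand-in
    for the output of a regionalization algorithm, not an actual one."""
    strata = {}
    for x in range(size):
        for y in range(size):
            strata.setdefault((x // block, y // block), []).append((x, y))
    return list(strata.values())


def srs_mean(pop, n, rng):
    """Sample mean under simple random sampling without replacement."""
    units = rng.sample(list(pop), n)
    return statistics.fmean(pop[u] for u in units)


def stratified_mean(pop, strata, n, rng):
    """Stratified estimator sum_h W_h * ybar_h with proportional allocation."""
    N = len(pop)
    estimate = 0.0
    for units in strata:
        n_h = max(1, round(n * len(units) / N))  # proportional allocation
        sample = rng.sample(units, n_h)
        estimate += len(units) / N * statistics.fmean(pop[u] for u in sample)
    return estimate


def monte_carlo_rmse(draw, true_mean, reps):
    """Root mean squared error of an estimator over repeated samples."""
    sq_errors = [(draw() - true_mean) ** 2 for _ in range(reps)]
    return math.sqrt(statistics.fmean(sq_errors))


if __name__ == "__main__":
    pop = make_population(20)
    strata = block_strata(20, 5)  # 16 contiguous blocks of 25 units each
    true_mean = statistics.fmean(pop.values())
    rng = random.Random(1)
    n, reps = 64, 2000
    rmse_srs = monte_carlo_rmse(lambda: srs_mean(pop, n, rng), true_mean, reps)
    rmse_st = monte_carlo_rmse(
        lambda: stratified_mean(pop, strata, n, rng), true_mean, reps
    )
    print(f"SRS RMSE: {rmse_srs:.3f}")
    print(f"Stratified RMSE: {rmse_st:.3f}")
    print(f"Relative reduction: {1 - rmse_st / rmse_srs:.1%}")
```

With a spatial trend this strong the stratified estimator should show a clearly lower RMSE, but the size of the gain depends entirely on the synthetic surface and says nothing about the paper's census results.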
The results show a reduction of up to 20% in error relative to traditional methodologies or, alternatively, a reduction of up to 100 cases in sample size for the same level of error. It is concluded that spatialized sampling methodologies with heuristic optimization offer advantages in urban areas when spatial autocorrelation is present.
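The two ways of reporting a gain (lower error at the same sample size, or fewer cases at the same error) are linked through the design effect. A minimal worked relation, assuming the finite-population correction is negligible, that the error is a margin of error $e$ at a fixed confidence multiplier $z$, and that $\sigma_d^2$ denotes the per-unit variance of design $d$ (so that $V = \sigma_d^2/n$):

```latex
e = z\sqrt{\frac{\sigma_d^2}{n}}
\;\Longrightarrow\;
n_d = \frac{z^2\,\sigma_d^2}{e^2},
\qquad
\text{deff} = \frac{\sigma_{\text{st}}^2}{\sigma_{\text{srs}}^2} = \frac{n_{\text{st}}}{n_{\text{srs}}} .
```

For instance, if a design cuts the standard error by 20% at fixed $n$, then $\sqrt{\text{deff}} = 0.8$, so $\text{deff} = 0.64$ and the same error is reached with 36% fewer cases. These figures are illustrative only and are not derived from the paper's results.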
License
Copyright (c) 2022 Revista de Geografía Norte Grande
This work is licensed under a Creative Commons Attribution 4.0 International License.