Research Design and Societal Implications

5/2/2024

GIS and Research Design

The “credibility revolution” is the name Angrist and Pischke (2010) give to the movement within applied microeconomics to pay greater attention to:
- The robustness of empirical results to changes in statistical specification (i.e. is a particular result merely an artifact of particular modelling choices, or does it describe a real-world relationship?)
- The validity of causal claims (i.e. does the independent variable of interest actually cause variation in the outcome variable, or are the independent and dependent variables merely correlated?)
While the credibility revolution took root in applied microeconomics, it has come to influence many applied social scientific fields beyond economics as well.

Spatial data and methods can play an important role in research designs inspired by the credibility revolution.
- Indeed, John Snow's epidemiological analysis of cholera in 19th-century London, often considered the first GIS analysis, is also important in the intellectual history of causal inference
Kudamatsu (2018) provides a useful survey of some of the ways in which GIS techniques (many of which we learned about in the past four classes) can facilitate credible inference:
- Spatial join (Class 2)
- Vector intersections (Class 2)
- Zonal statistics (Class 3, Class 4)
- Buffers and distance calculations (Class 2, Class 3)
- Raster calculations/cell statistics (Class 3, Class 4)
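
As a rough illustration of how one of these operations can feed a research design, the sketch below uses distance and buffer calculations to build a nearest-source assignment and a proximity-based "exposure" indicator, loosely in the spirit of Snow's pump analysis. The coordinates, names, and the 100 m radius are all invented for illustration, and the code assumes planar (projected) coordinates in meters; in practice a GIS library would usually handle this step.

```python
import math

# Hypothetical projected coordinates in meters (invented for illustration).
pumps = {"pump_a": (0.0, 0.0), "pump_b": (400.0, 0.0)}
households = {"h1": (50.0, 30.0), "h2": (390.0, 10.0), "h3": (150.0, 200.0)}


def distance(p, q):
    """Straight-line distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def nearest_pump(location):
    """Return (name, distance) of the pump closest to a location."""
    name = min(pumps, key=lambda k: distance(location, pumps[k]))
    return name, distance(location, pumps[name])


def in_buffer(location, center, radius_m):
    """True if the location falls inside a circular buffer around center."""
    return distance(location, center) <= radius_m


# Distance calculation: assign each household to its nearest pump.
assignments = {h: nearest_pump(xy) for h, xy in households.items()}

# Buffer: flag households within an assumed 100 m of pump_a as "exposed".
exposed = {h: in_buffer(xy, pumps["pump_a"], 100.0) for h, xy in households.items()}

for h in households:
    pump, dist = assignments[h]
    print(f"{h}: nearest={pump} ({dist:.1f} m), exposed_to_pump_a={exposed[h]}")
```

The resulting indicator could then serve as a treatment or instrument variable in a regression; whether that is credible depends on the design, not on the GIS step itself.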

Identify 3-4 “big” theoretical or conceptual questions in your field (e.g. What is the relationship between political institutions and economic outcomes? How do people construct their social identities? What are the determinants of state capacity?).
Make a list of articles/books that attempt to explore these questions in concrete empirical settings, and consider how the authors attempted to address relevant inferential challenges in their research designs. What were some potential problems with those designs?
Familiarize yourself (if you haven’t already) with some of the empirical methods used to facilitate credible inference (see the ITSS mini-course on causal inference, or this relatively accessible online book by Scott Cunningham).
Finally, develop a strategy for grafting spatial data onto one of these methods (using the papers discussed in Kudamatsu, or those we read earlier in this course, as a template for how you might do so) to credibly answer your question.

Is it appropriate to actually encode spatial variables as lines/points/polygons (vector data) or grids (raster data)? Do these data models adequately capture the underlying theoretical construct the researcher wants to measure?
In many cases, yes. But in some cases, maybe not (e.g. Euratlas uses polygons to represent pre-modern political units, where political authority was much more fluid than today and was not delimited by the hard borders that polygons imply).
- “The validity of linear boundaries as a measure of the extent of political organization changes over time, an evolution that is not captured by the Euratlas data set” (Branch 2016, 855)

The things left out of GIS datasets (because they are difficult to represent with GIS data models) may, depending on the context of your question, lead to biased results.
Example: Say you are interested in the settlement patterns of ethnic groups, and those settlement patterns are represented by polygons. The dataset will likely leave out groups whose spatial distribution is difficult to represent with polygons (e.g. diaspora groups), and this omission could lead to biased inferences (Branch 2016, 862).

Results of statistical analyses using data derived with GIS methods can be sensitive to the geographic units of analysis that are used
- Important to keep in mind the Modifiable Areal Unit Problem (MAUP), and ensure that the units of analysis are theoretically justified and appropriate.
- If data are not provided at the right unit of analysis (for your question), sensitivity analysis can help to gauge the robustness of your results.
- It is easy to commit the ecological fallacy when working with GIS data: make sure you do not test individual-level hypotheses using geographically aggregated data. Just because a statistical relationship holds at an aggregate level doesn’t mean it holds at the individual level (e.g. richer counties may be more likely to vote Democratic, but it does not necessarily follow that richer individuals are more likely to vote Democratic).
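
The ecological fallacy can be made concrete with a toy example. The sketch below uses entirely invented data (three hypothetical counties, four individuals each, income in thousands of dollars, and a 0/1 Democratic vote) constructed so that the county-level and individual-level correlations have opposite signs. It illustrates the logic only and says nothing about real voting patterns.

```python
import math

# Invented data: county -> list of (income in $1000s, voted Democratic 0/1).
counties = {
    "poor":   [(20, 1), (30, 0), (40, 0), (50, 0)],
    "middle": [(40, 1), (50, 1), (60, 0), (70, 0)],
    "rich":   [(60, 1), (70, 1), (80, 1), (90, 0)],
}


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Aggregate level: county mean income vs. county Democratic vote share.
county_income = [mean([inc for inc, _ in people]) for people in counties.values()]
county_share = [mean([vote for _, vote in people]) for people in counties.values()]
county_r = pearson(county_income, county_share)

# Individual level: pool all individuals across counties.
everyone = [person for people in counties.values() for person in people]
individual_r = pearson([inc for inc, _ in everyone], [vote for _, vote in everyone])

# Individual level within each county.
within_r = {
    name: pearson([inc for inc, _ in people], [vote for _, vote in people])
    for name, people in counties.items()
}

print(f"county-level r:     {county_r:+.2f}")
print(f"individual-level r: {individual_r:+.2f}")
for name, r in within_r.items():
    print(f"within {name} county r: {r:+.2f}")
```

In this constructed dataset, richer counties have higher Democratic vote shares, yet within every county, and in the pooled individual data, higher income goes with a lower probability of voting Democratic. An analyst with only the county aggregates would get the sign of the individual-level relationship wrong.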

Our course has focused on developing a practical GIS skillset that you can use in your research as social scientists.
But GIS and mapping technologies are themselves an interesting topic for social scientific inquiry:
- What are the economic, political, social, and technological forces that shape the mapping and geospatial industries?
- What is the causal role of cartography and GIS technology in shaping social, economic, and political processes, ideas, and institutions (both historically, and in today’s world)?