10 Data Transfer Part 1: Importing Data into R Studio

Typically, the first step when working with research data in R Studio is to load your relevant data into memory. There are many ways to do this; the right one depends on where your data is stored and how it is structured. Below, we’ll cover the process of reading your data into R Studio under a few different scenarios.

We will be working extensively with a country-level cross-national dataset collected by the political economists Torsten Persson and Guido Tabellini for their book The Economic Effects of Constitutions. If you’d like to learn more about this dataset, please refer to its codebook. In addition, we will also work with four country-level datasets extracted from the World Bank’s World Development Indicators (WDI) series. These datasets contain information on central government debt as a percentage of GDP; net foreign direct investment inflows as a percentage of GDP; trade as a percentage of GDP; and the urban population as a percentage of the total population. The data in all four of these WDI datasets corresponds to the year 2019.

10.1 Reading in a dataset from a directory on your computer

Often (especially when a dataset is of tractable size), you will have the dataset you would like to analyze stored in a directory on your computer. To read in a dataset from a local directory, you can use the read_csv() function from the readr package (provided the file is stored as a CSV; if the file type is different, then the import function will be different as well), and pass the dataset’s file path as an argument to the function. Typically, you will want to assign the dataset you read in to a new R object:

# Reads in Persson/Tabellini Data from local directory, and assigns it to new object named "pt"
pt<-read_csv("data/pt/persson_tabellini_workshop.csv")

## Rows: 85 Columns: 75
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): country, continent
## dbl (73): oecd, pind, pindo, ctrycd, col_uk, t_indep, col_uka, col_espa, col_otha, legor_uk, legor_so,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
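As a minimal sketch of how the import function changes with the file type, the example below writes a small, made-up data frame to temporary CSV and TSV files and reads each one back with the matching readr function. The object names (toy, csv_path, tsv_path) and the toy values are invented for illustration and are not part of the Persson-Tabellini data. The show_col_types = FALSE argument is the option mentioned in the message above for quieting the column specification output.

```r
library(readr)

# Hypothetical toy data written to temporary files (not the Persson/Tabellini data)
toy <- data.frame(country = c("A", "B", "C"), oecd = c(1, 0, 1))
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
write_csv(toy, csv_path)
write_tsv(toy, tsv_path)

# Comma-separated file: read_csv(); show_col_types = FALSE quiets the column message
toy_csv <- read_csv(csv_path, show_col_types = FALSE)

# Tab-separated file: read_tsv()
toy_tsv <- read_tsv(tsv_path, show_col_types = FALSE)

# Other delimited text files: read_delim() with the delimiter stated explicitly
toy_delim <- read_delim(csv_path, delim = ",", show_col_types = FALSE)
```

All three calls should return the same three-row tibble; only the function (and, for read_delim(), the delim argument) changes with the file format.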

If you’d like to view the contents of the dataset, pass it to the R Studio data viewer:

# views "pt" data frame in R Studio data viewer
View(pt)
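View() opens an interactive viewer, so it is mainly useful inside an R Studio session. As a rough sketch, the base R functions below give a quick look at a data frame from the console instead; pt_demo is a hypothetical stand-in for pt, made up for this illustration.

```r
# Hypothetical stand-in for the "pt" data frame
pt_demo <- data.frame(country = c("A", "B", "C"), oecd = c(1, 0, 1))

# Number of rows and columns
dim(pt_demo)

# Column names
names(pt_demo)

# First two rows
head(pt_demo, 2)

# Compact summary of each column's type and first values
str(pt_demo)
```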

10.2 Reading in multiple datasets from your disk

Sometimes, your data is spread out over multiple files. For example, you may have multiple CSV files stored on disk that you want to read into R in one go, instead of loading each file individually.

To do so, we can use the list data structure to hold all of the desired files, and use the map() function we learned about above to iteratively read these files into our R environment.

The first step is to use the list.files() function to create a character vector of the file names we want to read in; if all of the files you want to read in are already in your working directory, you don’t need to supply any arguments to the list.files() function. If the files are stored in another location, you can specify the relevant file path as an argument to list.files(). In the case below, the individual files we want to read in are four World Development Indicators datasets (which were extracted using the menu on the WDI website); these files are in the “wb” directory within the “data” subdirectory of the working directory:

# stores the names of the relevant files, which are in the data/wb subdirectory, in a character vector named "wb_files"
wb_files<-list.files("data/wb")

Let’s now print the contents of “wb_files” and observe the file names:

# prints contents of "wb_files"
wb_files

## [1] "wdi_debt2019.csv"  "wdi_fdi2019.csv"   "wdi_trade2019.csv" "wdi_urban2019.csv"
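Two optional arguments to list.files() are often handy at this step: pattern, which filters file names with a regular expression, and full.names, which returns each name with its directory path attached. The sketch below demonstrates both on a throwaway directory; the directory and the file names in it are created only for this illustration and are not the actual contents of data/wb.

```r
# Build a throwaway directory containing empty, hypothetically named files
demo_dir <- file.path(tempdir(), "wb_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("wdi_debt2019.csv", "wdi_fdi2019.csv", "notes.txt")
)))

# pattern keeps only the file names ending in ".csv"
csv_names <- list.files(demo_dir, pattern = "\\.csv$")
csv_names

# full.names = TRUE returns each file name with the directory path prepended
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
csv_paths
```

Filtering by pattern guards against stray non-CSV files in the directory, and full paths can be passed to an import function without changing the working directory.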

Now that we have our file names, we can iteratively pass them through the read_csv() function, and deposit the files as data frames in a list, which we’ll assign to a new object named wb_file_list:

# Sets the working directory to "data/wb", iteratively reads in all individual WB files from it, and assigns the resulting list to an object named "wb_file_list"
setwd("data/wb")
wb_file_list<-map(wb_files, read_csv)

## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The code above takes the first file name in wb_files, passes it to the read_csv() function, and deposits the file as the first data frame in a new list; it then takes the second file name in wb_files, passes it to read_csv(), and deposits that file as the second data frame in the list; and so on. The list containing all of the files is assigned to a new object named wb_file_list; we’ll print the contents below:

# prints contents of "wb_file_list"
wb_file_list

## [[1]]
## # A tibble: 271 × 5
##    `Country Name`      `Country Code` `Series Name`                          `Series Code` `2019 [YR2019]`
##    <chr>               <chr>          <chr>                                  <chr>         <chr>          
##  1 Afghanistan         AFG            Central government debt, total (% of … GC.DOD.TOTL.… ..             
##  2 Albania             ALB            Central government debt, total (% of … GC.DOD.TOTL.… 75.69848824949…
##  3 Algeria             DZA            Central government debt, total (% of … GC.DOD.TOTL.… ..             
##  4 American Samoa      ASM            Central government debt, total (% of … GC.DOD.TOTL.… ..             
##  5 Andorra             AND            Central government debt, total (% of … GC.DOD.TOTL.… ..             
##  6 Angola              AGO            Central government debt, total (% of … GC.DOD.TOTL.… ..             
##  7 Antigua and Barbuda ATG            Central government debt, total (% of … GC.DOD.TOTL.… ..             
##  8 Argentina           ARG            Central government debt, total (% of … GC.DOD.TOTL.… ..             
##  9 Armenia             ARM            Central government debt, total (% of … GC.DOD.TOTL.… 50.02842068637…
## 10 Aruba               ABW            Central government debt, total (% of … GC.DOD.TOTL.… ..             
## # ℹ 261 more rows
## 
## [[2]]
## # A tibble: 271 × 5
##    `Country Name`      `Country Code` `Series Name`                          `Series Code` `2019 [YR2019]`
##    <chr>               <chr>          <chr>                                  <chr>         <chr>          
##  1 Afghanistan         AFG            Foreign direct investment, net inflow… BX.KLT.DINV.… 0.124495985791…
##  2 Albania             ALB            Foreign direct investment, net inflow… BX.KLT.DINV.… 7.797920483865…
##  3 Algeria             DZA            Foreign direct investment, net inflow… BX.KLT.DINV.… 0.804144058246…
##  4 American Samoa      ASM            Foreign direct investment, net inflow… BX.KLT.DINV.… ..             
##  5 Andorra             AND            Foreign direct investment, net inflow… BX.KLT.DINV.… ..             
##  6 Angola              AGO            Foreign direct investment, net inflow… BX.KLT.DINV.… -5.78081314444…
##  7 Antigua and Barbuda ATG            Foreign direct investment, net inflow… BX.KLT.DINV.… 7.433324076307…
##  8 Argentina           ARG            Foreign direct investment, net inflow… BX.KLT.DINV.… 1.485006875706…
##  9 Armenia             ARM            Foreign direct investment, net inflow… BX.KLT.DINV.… 0.736361516844…
## 10 Aruba               ABW            Foreign direct investment, net inflow… BX.KLT.DINV.… -2.21528256776…
## # ℹ 261 more rows
## 
## [[3]]
## # A tibble: 271 × 5
##    `Country Name`      `Country Code` `Series Name`    `Series Code`  `2019 [YR2019]` 
##    <chr>               <chr>          <chr>            <chr>          <chr>           
##  1 Afghanistan         AFG            Trade (% of GDP) NE.TRD.GNFS.ZS ..              
##  2 Albania             ALB            Trade (% of GDP) NE.TRD.GNFS.ZS 76.2791946495763
##  3 Algeria             DZA            Trade (% of GDP) NE.TRD.GNFS.ZS 51.8097384415762
##  4 American Samoa      ASM            Trade (% of GDP) NE.TRD.GNFS.ZS 156.568778979907
##  5 Andorra             AND            Trade (% of GDP) NE.TRD.GNFS.ZS ..              
##  6 Angola              AGO            Trade (% of GDP) NE.TRD.GNFS.ZS 57.8295381183036
##  7 Antigua and Barbuda ATG            Trade (% of GDP) NE.TRD.GNFS.ZS 137.625175755884
##  8 Argentina           ARG            Trade (% of GDP) NE.TRD.GNFS.ZS 32.6306150458499
##  9 Armenia             ARM            Trade (% of GDP) NE.TRD.GNFS.ZS 96.1141541288708
## 10 Aruba               ABW            Trade (% of GDP) NE.TRD.GNFS.ZS 145.343572735289
## # ℹ 261 more rows
## 
## [[4]]
## # A tibble: 271 × 5
##    `Country Name`      `Country Code` `Series Name`                          `Series Code` `2019 [YR2019]`
##    <chr>               <chr>          <chr>                                  <chr>         <chr>          
##  1 Afghanistan         AFG            Urban population (% of total populati… SP.URB.TOTL.… 25.754         
##  2 Albania             ALB            Urban population (% of total populati… SP.URB.TOTL.… 61.229         
##  3 Algeria             DZA            Urban population (% of total populati… SP.URB.TOTL.… 73.189         
##  4 American Samoa      ASM            Urban population (% of total populati… SP.URB.TOTL.… 87.147         
##  5 Andorra             AND            Urban population (% of total populati… SP.URB.TOTL.… 87.984         
##  6 Angola              AGO            Urban population (% of total populati… SP.URB.TOTL.… 66.177         
##  7 Antigua and Barbuda ATG            Urban population (% of total populati… SP.URB.TOTL.… 24.506         
##  8 Argentina           ARG            Urban population (% of total populati… SP.URB.TOTL.… 91.991         
##  9 Armenia             ARM            Urban population (% of total populati… SP.URB.TOTL.… 63.219         
## 10 Aruba               ABW            Urban population (% of total populati… SP.URB.TOTL.… 43.546         
## # ℹ 261 more rows
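To make the iteration concrete, the sketch below reads two small, made-up CSV files first with map() and then with an explicit for loop that does the same work step by step. It also uses full file paths (via full.names = TRUE) rather than setwd(), which is one possible way to leave the working directory unchanged. The directory, file names, and values are all invented for illustration.

```r
library(readr)
library(purrr)

# Throwaway directory with two hypothetical CSV files
demo_dir <- file.path(tempdir(), "wb_read_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(code = c("AAA", "BBB"), value = c(1.5, 2.5)),
          file.path(demo_dir, "first.csv"))
write_csv(data.frame(code = c("AAA", "BBB"), value = c(10, 20)),
          file.path(demo_dir, "second.csv"))

# Full paths, so no call to setwd() is needed
demo_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# map() applies the function to each path and returns a list of data frames
demo_list <- map(demo_paths, function(path) read_csv(path, show_col_types = FALSE))

# The same iteration written out as an explicit loop
demo_loop <- vector("list", length(demo_paths))
for (i in seq_along(demo_paths)) {
  demo_loop[[i]] <- read_csv(demo_paths[[i]], show_col_types = FALSE)
}
```

Both demo_list and demo_loop should hold the same two data frames in the same order; map() simply expresses the loop in a single call.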

We will work with the separate data frames in wb_file_list later in the tutorial.
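Two features of the printed list are worth noting: its elements are identified only by position ([[1]], [[2]], and so on), and the 2019 column is read as character because missing values appear as "..". The sketch below shows one possible way to address both at import time, using named file paths and the na argument of read_csv(). It runs on two tiny made-up files that mimic the WDI layout; the file names and values are hypothetical, and this is not necessarily the approach taken later in the tutorial.

```r
library(readr)
library(purrr)

# Throwaway directory with two hypothetical files that mimic the WDI layout
demo_dir <- file.path(tempdir(), "wb_names_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Country Code,2019 [YR2019]", "AFG,..", "ALB,75.7"),
           file.path(demo_dir, "wdi_debt_demo.csv"))
writeLines(c("Country Code,2019 [YR2019]", "AFG,25.754", "ALB,61.229"),
           file.path(demo_dir, "wdi_urban_demo.csv"))

demo_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path after its file (minus the extension); map() carries the names over
names(demo_paths) <- tools::file_path_sans_ext(basename(demo_paths))

# Treat ".." as missing so that the value column is parsed as numeric
demo_list <- map(demo_paths, function(path) {
  read_csv(path, na = c("", "NA", ".."), show_col_types = FALSE)
})

# Elements can now be extracted by name rather than by position
demo_list$wdi_debt_demo
```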

10.3 Reading in a dataset from cloud storage

At this point, we have all of the data we need for subsequent sections loaded in our R environment. However, before proceeding, it’s worth noting some additional methods of reading data into R.

If you store your data in the cloud using a standard storage service such as Dropbox, you can copy the URL to the data from your service provider, and pass it as an argument to a data transfer function in R such as read_csv():

# Reads in PT dataset from Dropbox and assigns it to a new object named "pt_cloud"
pt_cloud<-read_csv("https://www.dropbox.com/scl/fi/cow7hk7zsdp46tkzrebxf/persson_tabellini_workshop.csv?rlkey=3dx0dul8uv0c4gpy41ucxwlyo&dl=1")

## Rows: 85 Columns: 75
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): country, continent
## dbl (73): oecd, pind, pindo, ctrycd, col_uk, t_indep, col_uka, col_espa, col_otha, legor_uk, legor_so,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The code above reads in the Persson-Tabellini dataset that is stored on a Dropbox account straight into R using its URL as an argument, and assigns it to a new object named pt_cloud. If you view pt_cloud in the data viewer, you’ll notice that the dataset is exactly the same as the one assigned to the pt object.
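Note that the Dropbox URL used above ends in dl=1. Dropbox share links typically end in dl=0, which opens a preview page rather than the file itself, so the link generally needs to be switched to dl=1 before it is passed to read_csv(). The helper below is a minimal sketch of that substitution; dropbox_direct_url() is a hypothetical function written for this illustration (it is not part of any package), and the share link is made up.

```r
# Hypothetical helper (not from any package): switch a Dropbox share link to direct download
dropbox_direct_url <- function(url) {
  if (grepl("[?&]dl=[01]", url)) {
    # Replace an existing dl=0 or dl=1 parameter with dl=1
    sub("([?&])dl=[01]", "\\1dl=1", url)
  } else if (grepl("?", url, fixed = TRUE)) {
    # Other query parameters are present: append dl=1 to them
    paste0(url, "&dl=1")
  } else {
    # No query string yet: start one
    paste0(url, "?dl=1")
  }
}

# Made-up share link, for illustration only
share_link <- "https://www.dropbox.com/s/abc123/example.csv?dl=0"
dropbox_direct_url(share_link)
```

The returned string could then be passed to read_csv() in the same way as the URL in the example above.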

An R Primer for Social Scientists
