10 Data Transfer Part 1: Importing Data into R Studio
Typically, the first step when working with research data in R Studio is to load the relevant data into memory. There are many ways to do this, and the right one depends on where your data is stored and how it is structured. Below, we’ll cover the process of reading your data into R Studio under a few different scenarios.
We will be working extensively with a country-level cross-national dataset collected by the political economists Torsten Persson and Guido Tabellini for their book, The Economic Effects of Constitutions. If you’d like to learn more about this dataset, please refer to its codebook. In addition, we will also work with four country-level datasets extracted from the World Bank’s World Development Indicators (WDI) series. These datasets contain information on central government debt as a percentage of GDP; net foreign direct investment inflows as a percentage of GDP; trade as a percentage of GDP; and the urban population as a percentage of the total population. The data in all four of these WDI datasets are for the year 2019.
10.1 Reading in a dataset from a directory on your computer
Often (especially when a dataset is of tractable size), you will have the dataset you would like to analyze stored in a directory on your computer. To read in a dataset from a local directory, you can use the read_csv() function and pass the dataset’s file path as an argument to the function. This assumes the dataset is stored as a CSV; if the file type is different, then the import function will be different as well. Typically, you will want to assign the dataset you read in to a new R object:
# Reads in Persson/Tabellini Data from local directory, and assigns it to new object named "pt"
pt <- read_csv("data/pt/persson_tabellini_workshop.csv")
## Rows: 85 Columns: 75
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, continent
## dbl (73): oecd, pind, pindo, ctrycd, col_uk, t_indep, col_uka, col_espa, col_otha, legor_uk, legor_so,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
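The message printed above mentions two options, spec() and show_col_types. As a minimal sketch of how they might be used, the example below writes a small made-up CSV to a temporary file and reads it back in. The file contents and the object names csv_path, example_data and example_typed are assumptions for illustration only and are not part of the workshop data.

```r
library(readr)

# Hypothetical example file, written to a temporary location so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(country = c("A", "B"), continent = c("X", "Y"), oecd = c(1, 0)),
  csv_path
)

# show_col_types = FALSE suppresses the column specification message
example_data <- read_csv(csv_path, show_col_types = FALSE)

# spec() returns the column types that read_csv() guessed
spec(example_data)

# Column types can also be declared up front instead of being guessed
example_typed <- read_csv(
  csv_path,
  col_types = cols(
    country = col_character(),
    continent = col_character(),
    oecd = col_double()
  )
)
```

Declaring col_types is optional; it can be useful when you want the import to behave the same way even if the contents of the file change.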
If you’d like to view the contents of the dataset, pass it to the R Studio data viewer:
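The viewer is typically opened with the View() function. The sketch below assumes a small stand-in data frame named pt_example (a hypothetical object, used here so the code runs on its own); in the tutorial, you would pass the pt object instead. The call to View() is wrapped in interactive() so that it is skipped when the code is run outside of an interactive session.

```r
library(tibble)

# Small stand-in data frame; in the tutorial this would be the "pt" object
pt_example <- tibble(
  country = c("A", "B"),
  continent = c("X", "Y"),
  oecd = c(1, 0)
)

# View() opens the data viewer; the guard skips it in non-interactive sessions
if (interactive()) {
  View(pt_example)
}

# Console-based alternatives for a quick look at the data
glimpse(pt_example)
head(pt_example)
```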
10.2 Reading in multiple datasets from your disk
Sometimes, your data is spread out over multiple files. For example, you may have multiple CSV files stored on disk that you want to read into R in one go, instead of loading each file individually.
To do so, we can use the list data structure to hold all of the desired files, and use the map() function we learned about above to iteratively read these files into our R environment.
The first step is to use the list.files() function to create a character vector of the file names we want to read in; if all of the files you want to read in are already in your working directory, you don’t need to supply any arguments to the list.files() function. If the files are stored in another location, you can specify the relevant file path as an argument to list.files(). In the case below, the individual files we want to read in are four World Development Indicators datasets (which were extracted using the menu on the WDI website); these files are in the “wb” directory within the “data” subdirectory of the working directory:
# Stores the names of the files in the data/wb subdirectory in a character vector named "wb_files"
wb_files <- list.files("data/wb")
Let’s now print the contents of wb_files and observe the file names:
## [1] "wdi_debt2019.csv" "wdi_fdi2019.csv" "wdi_trade2019.csv" "wdi_urban2019.csv"
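If a directory also contains files you do not want to read in, list.files() can filter them out. The sketch below is a minimal illustration that uses a temporary folder with made-up files; the folder name wb_example and the file notes.txt are assumptions, not part of the workshop materials. It uses the pattern argument to keep only CSV files, and the full.names argument to return paths that include the directory.

```r
# Hypothetical folder standing in for "data/wb"
wb_dir <- file.path(tempdir(), "wb_example")
dir.create(wb_dir, showWarnings = FALSE)
file.create(file.path(wb_dir, c("wdi_debt2019.csv", "wdi_fdi2019.csv", "notes.txt")))

# pattern is a regular expression; this one keeps only names ending in .csv
csv_names <- list.files(wb_dir, pattern = "\\.csv$")
csv_names

# full.names = TRUE returns each file name with the directory path prepended
csv_paths <- list.files(wb_dir, pattern = "\\.csv$", full.names = TRUE)
csv_paths
```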
Now that we have our file names, we can iteratively pass them through the read_csv() function, and deposit the files as data frames in a list, which we’ll assign to a new object named wb_file_list:
# Iteratively reads in all individual WB files from the "data/wb" directory and assigns the resulting list to an object named "wb_file_list"
# Note: setwd() changes the working directory for the rest of the session
setwd("data/wb")
wb_file_list <- map(wb_files, read_csv)
## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 271 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country Name, Country Code, Series Name, Series Code, 2019 [YR2019]
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The code above takes the first file name in wb_files, passes it to the read_csv() function, and deposits the file as the first data frame in a new list; it then takes the second file name in wb_files, passes it to the read_csv() function, and deposits that file as the second data frame in the list; and so on. The list containing all of the files is assigned to a new object named wb_file_list; we’ll print the contents below:
## [[1]]
## # A tibble: 271 × 5
## `Country Name` `Country Code` `Series Name` `Series Code` `2019 [YR2019]`
## <chr> <chr> <chr> <chr> <chr>
## 1 Afghanistan AFG Central government debt, total (% of … GC.DOD.TOTL.… ..
## 2 Albania ALB Central government debt, total (% of … GC.DOD.TOTL.… 75.69848824949…
## 3 Algeria DZA Central government debt, total (% of … GC.DOD.TOTL.… ..
## 4 American Samoa ASM Central government debt, total (% of … GC.DOD.TOTL.… ..
## 5 Andorra AND Central government debt, total (% of … GC.DOD.TOTL.… ..
## 6 Angola AGO Central government debt, total (% of … GC.DOD.TOTL.… ..
## 7 Antigua and Barbuda ATG Central government debt, total (% of … GC.DOD.TOTL.… ..
## 8 Argentina ARG Central government debt, total (% of … GC.DOD.TOTL.… ..
## 9 Armenia ARM Central government debt, total (% of … GC.DOD.TOTL.… 50.02842068637…
## 10 Aruba ABW Central government debt, total (% of … GC.DOD.TOTL.… ..
## # ℹ 261 more rows
##
## [[2]]
## # A tibble: 271 × 5
## `Country Name` `Country Code` `Series Name` `Series Code` `2019 [YR2019]`
## <chr> <chr> <chr> <chr> <chr>
## 1 Afghanistan AFG Foreign direct investment, net inflow… BX.KLT.DINV.… 0.124495985791…
## 2 Albania ALB Foreign direct investment, net inflow… BX.KLT.DINV.… 7.797920483865…
## 3 Algeria DZA Foreign direct investment, net inflow… BX.KLT.DINV.… 0.804144058246…
## 4 American Samoa ASM Foreign direct investment, net inflow… BX.KLT.DINV.… ..
## 5 Andorra AND Foreign direct investment, net inflow… BX.KLT.DINV.… ..
## 6 Angola AGO Foreign direct investment, net inflow… BX.KLT.DINV.… -5.78081314444…
## 7 Antigua and Barbuda ATG Foreign direct investment, net inflow… BX.KLT.DINV.… 7.433324076307…
## 8 Argentina ARG Foreign direct investment, net inflow… BX.KLT.DINV.… 1.485006875706…
## 9 Armenia ARM Foreign direct investment, net inflow… BX.KLT.DINV.… 0.736361516844…
## 10 Aruba ABW Foreign direct investment, net inflow… BX.KLT.DINV.… -2.21528256776…
## # ℹ 261 more rows
##
## [[3]]
## # A tibble: 271 × 5
## `Country Name` `Country Code` `Series Name` `Series Code` `2019 [YR2019]`
## <chr> <chr> <chr> <chr> <chr>
## 1 Afghanistan AFG Trade (% of GDP) NE.TRD.GNFS.ZS ..
## 2 Albania ALB Trade (% of GDP) NE.TRD.GNFS.ZS 76.2791946495763
## 3 Algeria DZA Trade (% of GDP) NE.TRD.GNFS.ZS 51.8097384415762
## 4 American Samoa ASM Trade (% of GDP) NE.TRD.GNFS.ZS 156.568778979907
## 5 Andorra AND Trade (% of GDP) NE.TRD.GNFS.ZS ..
## 6 Angola AGO Trade (% of GDP) NE.TRD.GNFS.ZS 57.8295381183036
## 7 Antigua and Barbuda ATG Trade (% of GDP) NE.TRD.GNFS.ZS 137.625175755884
## 8 Argentina ARG Trade (% of GDP) NE.TRD.GNFS.ZS 32.6306150458499
## 9 Armenia ARM Trade (% of GDP) NE.TRD.GNFS.ZS 96.1141541288708
## 10 Aruba ABW Trade (% of GDP) NE.TRD.GNFS.ZS 145.343572735289
## # ℹ 261 more rows
##
## [[4]]
## # A tibble: 271 × 5
## `Country Name` `Country Code` `Series Name` `Series Code` `2019 [YR2019]`
## <chr> <chr> <chr> <chr> <chr>
## 1 Afghanistan AFG Urban population (% of total populati… SP.URB.TOTL.… 25.754
## 2 Albania ALB Urban population (% of total populati… SP.URB.TOTL.… 61.229
## 3 Algeria DZA Urban population (% of total populati… SP.URB.TOTL.… 73.189
## 4 American Samoa ASM Urban population (% of total populati… SP.URB.TOTL.… 87.147
## 5 Andorra AND Urban population (% of total populati… SP.URB.TOTL.… 87.984
## 6 Angola AGO Urban population (% of total populati… SP.URB.TOTL.… 66.177
## 7 Antigua and Barbuda ATG Urban population (% of total populati… SP.URB.TOTL.… 24.506
## 8 Argentina ARG Urban population (% of total populati… SP.URB.TOTL.… 91.991
## 9 Armenia ARM Urban population (% of total populati… SP.URB.TOTL.… 63.219
## 10 Aruba ABW Urban population (% of total populati… SP.URB.TOTL.… 43.546
## # ℹ 261 more rows
We will work with the separate data frames in wb_file_list later in the tutorial.
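The read step described in this section can be varied in a few ways. The sketch below is one possible variant, not the approach used in the rest of the tutorial. It uses full file paths so that setwd() is not needed, names each list element after its file, and treats the ".." placeholder seen in the WDI output as missing via read_csv()’s na argument. The temporary folder, the two tiny CSV files, and the object names wb_paths and wb_list are assumptions made so the example runs on its own.

```r
library(readr)
library(purrr)

# Hypothetical folder with two tiny CSV files, standing in for "data/wb"
wb_dir <- file.path(tempdir(), "wb_sketch")
dir.create(wb_dir, showWarnings = FALSE)
write_csv(
  data.frame(code = c("AAA", "BBB"), value = c("1.5", "..")),
  file.path(wb_dir, "wdi_debt2019.csv")
)
write_csv(
  data.frame(code = c("AAA", "BBB"), value = c("2.5", "3.5")),
  file.path(wb_dir, "wdi_fdi2019.csv")
)

# Full paths let read_csv() find the files without changing the working directory
wb_paths <- list.files(wb_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path after its file (minus the extension)
names(wb_paths) <- tools::file_path_sans_ext(basename(wb_paths))

# map() carries the names of its input over to the output list;
# na = ".." tells read_csv() to treat that placeholder as a missing value
wb_list <- map(
  wb_paths,
  function(path) read_csv(path, na = c("", "NA", ".."), show_col_types = FALSE)
)

names(wb_list)
wb_list$wdi_debt2019
```

With named elements, a data frame can be extracted by name (for example, wb_list$wdi_fdi2019) as well as by position.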
10.3 Reading in a dataset from cloud storage
At this point, we have all of the data we need for subsequent sections loaded in our R environment. However, before proceeding, it’s worth noting some additional methods of reading data into R.
If you store your data in the cloud using a standard storage service such as Dropbox, you can copy the URL to the data from your service provider, and pass it as an argument to a data transfer function in R such as read_csv():
# Reads in PT dataset from Dropbox and assigns it to a new object named "pt_cloud"
pt_cloud <- read_csv("https://www.dropbox.com/scl/fi/cow7hk7zsdp46tkzrebxf/persson_tabellini_workshop.csv?rlkey=3dx0dul8uv0c4gpy41ucxwlyo&dl=1")
## Rows: 85 Columns: 75
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, continent
## dbl (73): oecd, pind, pindo, ctrycd, col_uk, t_indep, col_uka, col_espa, col_otha, legor_uk, legor_so,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The code above reads in the Persson-Tabellini dataset that is stored on a Dropbox account straight into R using its URL as an argument, and assigns it to a new object named pt_cloud. If you view pt_cloud in the data viewer, you’ll notice that the dataset is exactly the same as the one assigned to the pt object.
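Note that the URL used above ends in dl=1. Dropbox share links are often copied with dl=0, which generally opens a preview page rather than returning the file itself, so the link may need to be adjusted before it is passed to read_csv(). The sketch below shows one way this could be done. The helper as_direct_dropbox_url() is a hypothetical function written for this illustration (it is not part of any package), and the share link is made up.

```r
# Hypothetical helper: sets the dl query parameter of a Dropbox share link to 1
as_direct_dropbox_url <- function(url) {
  if (grepl("[?&]dl=[01]", url)) {
    # Replace an existing dl=0 or dl=1 with dl=1
    sub("([?&])dl=[01]", "\\1dl=1", url)
  } else if (grepl("?", url, fixed = TRUE)) {
    # Other query parameters exist, so append with &
    paste0(url, "&dl=1")
  } else {
    # No query string yet, so start one with ?
    paste0(url, "?dl=1")
  }
}

# Made-up share link, for illustration only
share_link <- "https://www.dropbox.com/s/abc123/example.csv?dl=0"
direct_link <- as_direct_dropbox_url(share_link)
direct_link

# The adjusted link could then be passed to read_csv(), e.g. read_csv(direct_link)
```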