2 Loading data

Let’s begin by reading in the dataset on Colorado State Patrol traffic stops released by the Stanford Open Policing Project. The data can be downloaded from the project's website, and it is also included in the workshop materials as a CSV file.

Once the traffic patrol data has been downloaded into your working directory, pass the name of the file (in quotation marks) to the read_csv() function, which comes from the readr package in the tidyverse, and assign the result to an object. Here, we’ll assign the traffic patrol data to an object named co_traffic_stops. The name of an object is arbitrary, but ideally it should meaningfully describe the data assigned to it.
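
Before calling read_csv(), it can help to confirm that the file really is in the working directory. The following is a minimal sketch using only base R; the names csv_file and file_found are illustrative and not part of the workshop code.

```r
# File name used in this workshop
csv_file <- "co_statewide_2020_04_01.csv"

# Show the current working directory
getwd()

# TRUE if the file is in the working directory, FALSE otherwise
file_found <- file.exists(csv_file)

if (!file_found) {
  message("Could not find ", csv_file, " in ", getwd())
}
```

If the file is not found, either move it into the directory reported by getwd() or pass its full path to read_csv().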

# Read in Stanford police data for Colorado and assign to object named
# "co_traffic_stops"
co_traffic_stops <- read_csv("co_statewide_2020_04_01.csv")

── Column specification ────────────────────────────────────────────────────────────────────────────────────
cols(
  .default = col_character(),
  date = col_date(format = ""),
  time = col_logical(),
  subject_age = col_double(),
  arrest_made = col_logical(),
  citation_issued = col_logical(),
  warning_issued = col_logical(),
  contraband_found = col_logical(),
  search_conducted = col_logical()
)
ℹ Use `spec()` for the full column specifications.
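
The column specification above shows that time was guessed as a logical column, which typically happens when the rows readr inspects for type guessing contain no values in that column. If the column does hold times further down the file (an assumption worth checking against the real data), the guess can be overridden with the col_types argument. The sketch below uses a tiny made-up CSV written to a temporary file, so its values and the object name stops_small are hypothetical.

```r
library(readr)

# Hypothetical miniature file standing in for the real CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "raw_row_number,date,time,subject_age",
  "1,2013-06-19,,26",
  "2,2012-08-24,14:05:00,NA"
), tmp)

# Force "time" to be parsed as a time; the other columns are still guessed
stops_small <- read_csv(
  tmp,
  col_types = cols(time = col_time())
)

# Inspect the column specification that was actually used
spec(stops_small)
```

The same col_types argument could be added to the read_csv() call for the full file; columns not named inside cols() are still guessed.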

Once the traffic patrol data has been read into RStudio and assigned to an object, we can print the contents of the dataset by typing the name of that object into the RStudio console. Note that only the first few records will be printed.

# Print the contents of "co_traffic_stops" (i.e. the CO traffic patrol data) 
# to the console; the first few records of the dataset will print
co_traffic_stops
# A tibble: 3,112,853 × 20
   raw_row_number date       time  location county_name subject_age subject_race subject_sex officer_id_hash
   <chr>          <date>     <lgl> <chr>    <chr>             <dbl> <chr>        <chr>       <chr>          
 1 1947986|19479… 2013-06-19 NA    19, I70… Mesa County          26 hispanic     male        b942632983     
 2 1537576        2012-08-24 NA    254, H2… Jefferson …          NA <NA>         <NA>        f3d4f46927     
 3 1581594        2012-09-23 NA    115, I7… Logan Coun…          52 white        male        6e49e2fbc8     
 4 1009205        2011-08-25 NA    197, H8… Douglas Co…          32 white        female      eaea851669     
 5 1932619        2013-06-08 NA    107, H2… Kiowa Coun…          33 hispanic     male        d18e34d749     
 6 1179436        2011-12-23 NA    48, 384… Boulder Co…          NA <NA>         <NA>        b84c696aed     
 7 1326795        2012-04-07 NA    0, R250… Boulder Co…          39 white        male        4c0279748e     
 8 1786795        2013-03-03 NA    19, E47… Arapahoe C…          44 white        female      e6b5b9bb98     
 9 1552164        2012-09-02 NA    224, H2… Park County          NA <NA>         <NA>        43f1f150d3     
10 1004281|10042… 2011-08-21 NA    R2000, … Adams Coun…          32 hispanic     male        dd2f10b6f8     
# … with 3,112,843 more rows, and 11 more variables: officer_sex <chr>, type <chr>, violation <chr>,
#   arrest_made <lgl>, citation_issued <lgl>, warning_issued <lgl>, outcome <chr>, contraband_found <lgl>,
#   search_conducted <lgl>, search_basis <chr>, raw_Ethnicity <chr>
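
Because printing shows only the first ten rows and as many columns as fit in the console, a few other functions are useful for a quick look at a large dataset. The sketch below uses a small made-up tibble named demo_stops (hypothetical, not the workshop data) so that it runs on its own; the same calls should work on co_traffic_stops.

```r
library(tibble)

# Hypothetical stand-in for the traffic stop data
demo_stops <- tibble(
  county_name = c("Mesa County", "Logan County", "Park County"),
  subject_age = c(26, 52, NA),
  subject_sex = c("male", "male", NA)
)

# Number of rows and columns
dim(demo_stops)

# Print a chosen number of rows instead of the default
print(demo_stops, n = 2)

# Show every column, one per line, with its type and first values
glimpse(demo_stops)
```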

We can also view the co_traffic_stops object (or any other dataset) in the RStudio data viewer by passing the name of the relevant object to the View() function:

# Inspect co_traffic_stops in the RStudio data viewer
View(co_traffic_stops)