9 Iteration
Once we have written a function, it is straightforward to apply it to multiple inputs iteratively. For example, suppose you have four Fahrenheit temperature values that you would like to convert to Celsius using the fahrenheit_to_celsius_converter function we developed above: 45.6, 95.9, 67.8, and 43. One option is to run each of these values through the function individually, as below:
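The individual calls might look like the following sketch. The function definition is repeated only so that the block runs on its own; it assumes the converter from the previous section takes a single argument named fahrenheit_input and uses the same formula that appears later in this section, (fahrenheit_input-32)*(5/9).

```r
# Assumed definition of the converter from the previous section,
# repeated here so that this block is self-contained
fahrenheit_to_celsius_converter <- function(fahrenheit_input) {
  celsius_output <- (fahrenheit_input - 32) * (5/9)
  return(celsius_output)
}

# applies the function to each Fahrenheit value, one call at a time
fahrenheit_to_celsius_converter(45.6)
fahrenheit_to_celsius_converter(95.9)
fahrenheit_to_celsius_converter(67.8)
fahrenheit_to_celsius_converter(43)
```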
## [1] 7.555556
## [1] 35.5
## [1] 19.88889
## [1] 6.111111
This is manageable with only four Fahrenheit values, but it would quickly become tedious with a substantially larger set of temperatures to convert. Instead of manually applying the function to each input value, we can put the values into a vector, and then iteratively apply the fahrenheit_to_celsius_converter function to each element of that vector.
Let’s first assign our Fahrenheit temperature values to a numeric vector object named fahrenheit_input_vector:
# makes a vector out of Fahrenheit values we want to convert, and assigns it to a new object named "fahrenheit_input_vector"
fahrenheit_input_vector<-c(45.6, 95.9, 67.8, 43)
Our goal is to iteratively apply our function to each of these vector elements, and deposit the transformed results into a new vector. In many programming languages, functions are typically applied to multiple inputs using a construct known as a for-loop, which some of you may already be familiar with. R users also frequently use specialized functions (instead of for-loops) to iterate over elements; this can be faster and, at the very least, tends to make R scripts more readable. One family of these iterative functions is the “Apply” family of functions. A more recent set of functions that facilitate iteration is part of the tidyverse, and is found within the purrr package. These functions are known as map() functions, and we will use them here to iteratively apply our functions to multiple inputs.
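For comparison, here is a minimal sketch of the same task using vapply(), one member of the base R “Apply” family mentioned above. The object name celsius_outputs_vapply is hypothetical, and the converter definition is repeated (as assumed from the previous section) so that the block runs on its own.

```r
# Assumed definition of the converter from the previous section
fahrenheit_to_celsius_converter <- function(fahrenheit_input) {
  (fahrenheit_input - 32) * (5/9)
}

fahrenheit_input_vector <- c(45.6, 95.9, 67.8, 43)

# applies the converter to each element; FUN.VALUE = numeric(1) declares that
# every call is expected to return a single number
celsius_outputs_vapply <- vapply(fahrenheit_input_vector,
                                 fahrenheit_to_celsius_converter,
                                 FUN.VALUE = numeric(1))
celsius_outputs_vapply
```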
Let’s see how we can use a map() function to sequentially apply the fahrenheit_to_celsius_converter() function we created to several different values for the “fahrenheit_input” argument, contained in fahrenheit_input_vector. We’ll pass fahrenheit_input_vector as the first argument to the map_dbl() function, and fahrenheit_to_celsius_converter (i.e. the function we want to apply iteratively to the elements of fahrenheit_input_vector) as the second argument. The result of this operation will be a new “results vector”, containing the transformed temperature value for each input in the original vector of Fahrenheit values (fahrenheit_input_vector). We’ll assign this output vector to a new object named celsius_outputs_vector:
# Iteratively applies the "fahrenheit_to_celsius_converter" function to the Fahrenheit input values in "fahrenheit_input_vector" and assigns the resulting vector of converted temperature values to "celsius_outputs_vector"
# (map_dbl() comes from the purrr package, so purrr or the tidyverse must already be loaded)
celsius_outputs_vector<-map_dbl(fahrenheit_input_vector, fahrenheit_to_celsius_converter)
In short, the code above takes fahrenheit_input_vector (i.e. a vector with the numbers 45.6, 95.9, 67.8, 43), runs each of these numbers through the fahrenheit_to_celsius_converter() function, and sequentially deposits the transformed results in the newly created celsius_outputs_vector object, which contains the following elements:
## [1] 7.555556 35.500000 19.888889 6.111111
More explicitly, the code that reads celsius_outputs_vector<-map_dbl(fahrenheit_input_vector, fahrenheit_to_celsius_converter) did the following:
- Pass 45.6 (the first element in the input vector, fahrenheit_input_vector) to the fahrenheit_to_celsius_converter() function, and place the output (7.555556) as the first element in a new vector of transformed values, named celsius_outputs_vector.
- Pass 95.9 (the second element in the input vector, fahrenheit_input_vector) to the fahrenheit_to_celsius_converter() function, and deposit the output (35.500000) as the second element in celsius_outputs_vector.
- Pass 67.8 (the third element in the input vector, fahrenheit_input_vector) to the fahrenheit_to_celsius_converter() function, and deposit the output (19.888889) as the third element in celsius_outputs_vector.
- Pass 43 (the fourth element in the input vector, fahrenheit_input_vector) to the fahrenheit_to_celsius_converter() function, and deposit the output (6.111111) as the fourth element in celsius_outputs_vector.
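The four steps above can also be written out as an explicit for-loop. This is a sketch of what map_dbl() accomplishes conceptually, not a description of how purrr is implemented; the object name celsius_outputs_loop is hypothetical, and the converter definition is repeated (as assumed from the previous section) so that the block is self-contained.

```r
# Assumed definition of the converter from the previous section
fahrenheit_to_celsius_converter <- function(fahrenheit_input) {
  (fahrenheit_input - 32) * (5/9)
}

fahrenheit_input_vector <- c(45.6, 95.9, 67.8, 43)

# creates an empty numeric vector with one slot per input value
celsius_outputs_loop <- numeric(length(fahrenheit_input_vector))

# passes each input element to the converter, and deposits the result
# in the matching position of the output vector
for (i in seq_along(fahrenheit_input_vector)) {
  celsius_outputs_loop[i] <- fahrenheit_to_celsius_converter(fahrenheit_input_vector[i])
}

celsius_outputs_loop
```

The map_dbl() call does the same work in a single line, which is one reason many R users prefer it to a hand-written loop.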
There are a variety of map() functions in the purrr package, and the one you should use depends on the number of arguments taken by the function you are applying (in this example there is only one argument, i.e. “fahrenheit_input”), and on the desired class of the output (i.e. numeric vector, character vector, data frame, list etc.). For example, let’s say we want to apply the fahrenheit_to_celsius_converter function iteratively to the input values in fahrenheit_input_vector, but want the output values to be stored as a list, rather than as a vector. Instead of using the map_dbl() function, we can use the map() function, which always returns its outputs as a list. Below, we pass our input vector (fahrenheit_input_vector), and the function we want to iteratively apply to the elements of the input vector (fahrenheit_to_celsius_converter), to the map() function. We’ll assign the output list to a new object named celsius_outputs_list:
# iteratively applies the "fahrenheit_to_celsius_converter" function to the input values in "fahrenheit_input_vector", and assigns the list of celsius output values to a new object named "celsius_outputs_list"
celsius_outputs_list<-map(fahrenheit_input_vector, fahrenheit_to_celsius_converter)
Let’s print out the list of output values:
## [[1]]
## [1] 7.555556
##
## [[2]]
## [1] 35.5
##
## [[3]]
## [1] 19.88889
##
## [[4]]
## [1] 6.111111
We can confirm that celsius_outputs_list is indeed a list using the class() function that we introduced earlier:
## [1] "list"
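The class check, and the difference between the two return types, can be sketched as follows. This is a minimal, self-contained example that assumes the purrr package is installed; the converter definition is repeated as assumed from the previous section.

```r
library(purrr)

# Assumed definition of the converter from the previous section
fahrenheit_to_celsius_converter <- function(fahrenheit_input) {
  (fahrenheit_input - 32) * (5/9)
}

fahrenheit_input_vector <- c(45.6, 95.9, 67.8, 43)

# map() returns a list; map_dbl() returns a numeric vector
celsius_outputs_list <- map(fahrenheit_input_vector, fahrenheit_to_celsius_converter)
celsius_outputs_vector <- map_dbl(fahrenheit_input_vector, fahrenheit_to_celsius_converter)

class(celsius_outputs_list)    # "list"
class(celsius_outputs_vector)  # "numeric"

# the two objects hold the same values; unlist() flattens the list into a numeric vector
all.equal(unlist(celsius_outputs_list), celsius_outputs_vector)
```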
Now, let’s say we want to organize our information in a data frame, where one column represents our Fahrenheit input values, and the other column represents the corresponding Celsius output values. To do so, we’ll first slightly modify our function to return a data frame:
# Creates function that takes an input value in degrees Fahrenheit (fahrenheit_input), converts this value to Celsius, and returns a data frame with the input Fahrenheit temperature value as one column, and the corresponding Celsius temperature value as another column; the function is assigned to a new object named "fahrenheit_to_celsius_converter_df"
fahrenheit_to_celsius_converter_df<-function(fahrenheit_input){
  celsius_output<-(fahrenheit_input-32)*(5/9)
  celsius_output_df<-data.frame(fahrenheit_input, celsius_output)
  return(celsius_output_df)
}
Now, let’s test out this new function for a single “fahrenheit_input” value, to make sure it works as expected; we’ll test it out for a value of 63 degrees Fahrenheit:
# applies "fahrenheit_to_celsius_converter_df" function to input value of 63 degrees Fahrenheit
fahrenheit_to_celsius_converter_df(fahrenheit_input=63)
## fahrenheit_input celsius_output
## 1 63 17.22222
Having confirmed that the function works as expected, let’s now assemble a dataset using multiple Fahrenheit input values, where one column consists of these input values, and the second column consists of the corresponding Celsius outputs. We can do so using the map_dfr() function from the purrr package, which is a cousin of the map() and map_dbl() functions we explored above. While the map() function returns function outputs in a list, and the map_dbl() function returns function outputs in a numeric vector, the map_dfr() function binds multiple function outputs together row-wise into a single data frame. To make this more concrete, let’s consider the code below, which uses map_dfr() to iteratively apply the fahrenheit_to_celsius_converter_df function to the Fahrenheit values in fahrenheit_input_vector, and assemble the resulting rows into a data frame that is assigned to a new object named celsius_outputs_df:
# Iteratively applies the "fahrenheit_to_celsius_converter_df" function to input values in "fahrenheit_input_vector" to generate a data frame with column of input Fahrenheit values, and column of corresponding output Celsius values; assigns this data frame to a new object named "celsius_outputs_df"
celsius_outputs_df<-map_dfr(fahrenheit_input_vector, fahrenheit_to_celsius_converter_df)
Let’s now print the contents of celsius_outputs_df:
## fahrenheit_input celsius_output
## 1 45.6 7.555556
## 2 95.9 35.500000
## 3 67.8 19.888889
## 4 43.0 6.111111
We now have a dataset with one column consisting of our Fahrenheit inputs (taken from fahrenheit_input_vector), and a second column consisting of our Celsius outputs (derived by applying the fahrenheit_to_celsius_converter_df() function to our vector of input values, fahrenheit_input_vector).
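In purrr 1.0.0 and later, map_dfr() is marked as superseded, and the purrr documentation suggests combining map() with list_rbind() instead. The sketch below shows that alternative producing the same kind of data frame. It assumes purrr 1.0.0 or newer is installed, and the object name celsius_outputs_df_alt is hypothetical.

```r
library(purrr)

# Same data frame-returning converter as defined above
fahrenheit_to_celsius_converter_df <- function(fahrenheit_input) {
  celsius_output <- (fahrenheit_input - 32) * (5/9)
  data.frame(fahrenheit_input, celsius_output)
}

fahrenheit_input_vector <- c(45.6, 95.9, 67.8, 43)

# map() returns a list of one-row data frames; list_rbind() stacks them row-wise
celsius_outputs_df_alt <- list_rbind(
  map(fahrenheit_input_vector, fahrenheit_to_celsius_converter_df)
)

celsius_outputs_df_alt
```

Either approach gives one row per input value; map_dfr() still works, so there is no need to change existing scripts.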
We’ve just covered three different purrr functions: map() (which returns a list), map_dbl() (which returns a numeric vector), and map_dfr() (which returns a data frame). There are other map functions that return different types of objects; you can see a list of them by inspecting the documentation for the map() function, for example by running ?map in the console once purrr is loaded.
The process of iteratively applying a function with more than one argument is beyond the scope of the workshop, but the same general principles are at work in those cases. If you’d like to explore the process of iteratively applying a function with two arguments, or more than two arguments, check out the documentation for the map2() and pmap() functions, respectively.
Before we move into the next section, let’s consider one more example of how you can use your own custom-written functions in conjunction with the iteration functions in the purrr package to write scripts that can help you to automate tedious tasks. In particular, we’ll demonstrate the utility of the list data structure in helping you to carry out such automation tasks.
Let’s say, for example, that you have temperature values stored in Fahrenheit, for multiple countries, and want to quickly convert those country-level values to degrees Celsius. Suppose that these Fahrenheit values are stored in a series of vectors:
# creates sample country-level Fahrenheit data for Country A
countryA_fahrenheit<-c(55,67,91,23, 77, 98, 27)
# creates sample country-level Fahrenheit data for Country B
countryB_fahrenheit<-c(33,45,11,66, 44)
# creates sample country-level Fahrenheit data for Country C
countryC_fahrenheit<-c(60,55,12,109)
# creates sample country-level Fahrenheit data for Country D
countryD_fahrenheit<-c(76, 24, 77, 78)
Let’s say that we want to take all of these vectors, and iteratively pass them as arguments to the fahrenheit_to_celsius_converter_df function, thereby creating four country-specific data frames that have the original Fahrenheit values in one column and the transformed Celsius values in the other column. The easiest way to do this is to first put our input vectors into a list, which we’ll assign to a new object named temperature_input_list:
# Creates list of input vectors and assigns this list to new object named "temperature_input_list"
temperature_input_list<-list(countryA_fahrenheit, countryB_fahrenheit, countryC_fahrenheit, countryD_fahrenheit)
Now, we’ll use the map() function to iteratively pass the vectors in temperature_input_list as arguments to the fahrenheit_to_celsius_converter_df function, and deposit the resulting data frames into a list; we’ll assign this list of output data frames to a new object named processed_temperature_data_list:
# Iteratively passes vectors in "temperature_input_list" as arguments to "fahrenheit_to_celsius_converter_df" and deposits the resulting data frames to a list, which is assigned to a new object named "processed_temperature_data_list"
processed_temperature_data_list<-map(temperature_input_list, fahrenheit_to_celsius_converter_df)
In effect, the code above takes the countryA_fahrenheit vector, uses it as the argument to the fahrenheit_to_celsius_converter_df function, and deposits the resulting data frame as the first element in the processed_temperature_data_list list; it then takes the countryB_fahrenheit vector, uses it as the argument to the fahrenheit_to_celsius_converter_df function, and deposits the resulting data frame as the second element in the processed_temperature_data_list list; and so on.
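The element-by-element process just described can be sketched with an explicit loop. This only illustrates what map() does conceptually, not how purrr implements it; the object name processed_temperature_loop_list is hypothetical, and only two of the country vectors are used to keep the example short.

```r
# Same data frame-returning converter as defined above
fahrenheit_to_celsius_converter_df <- function(fahrenheit_input) {
  celsius_output <- (fahrenheit_input - 32) * (5/9)
  data.frame(fahrenheit_input, celsius_output)
}

countryA_fahrenheit <- c(55, 67, 91, 23, 77, 98, 27)
countryB_fahrenheit <- c(33, 45, 11, 66, 44)
temperature_input_list <- list(countryA_fahrenheit, countryB_fahrenheit)

# creates an empty list with one slot per input vector
processed_temperature_loop_list <- vector("list", length(temperature_input_list))

# passes each vector to the converter, and deposits the resulting
# data frame in the matching slot of the output list
for (i in seq_along(temperature_input_list)) {
  processed_temperature_loop_list[[i]] <-
    fahrenheit_to_celsius_converter_df(temperature_input_list[[i]])
}

processed_temperature_loop_list
```

Note the double square brackets: each list element is a whole vector (on the input side) or a whole data frame (on the output side), so [[i]] is used to reach the element itself rather than a one-element sub-list.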
Let’s print the contents of processed_temperature_data_list and confirm that our data frames have been created as expected:
## [[1]]
## fahrenheit_input celsius_output
## 1 55 12.777778
## 2 67 19.444444
## 3 91 32.777778
## 4 23 -5.000000
## 5 77 25.000000
## 6 98 36.666667
## 7 27 -2.777778
##
## [[2]]
## fahrenheit_input celsius_output
## 1 33 0.5555556
## 2 45 7.2222222
## 3 11 -11.6666667
## 4 66 18.8888889
## 5 44 6.6666667
##
## [[3]]
## fahrenheit_input celsius_output
## 1 60 15.55556
## 2 55 12.77778
## 3 12 -11.11111
## 4 109 42.77778
##
## [[4]]
## fahrenheit_input celsius_output
## 1 76 24.444444
## 2 24 -4.444444
## 3 77 25.000000
## 4 78 25.555556
As an exercise, try to extract a given dataset from processed_temperature_data_list using the indexing method we discussed above. Additionally, see if you can assign names to the list elements in processed_temperature_data_list.