Pre-requisites:
tidyverse
Loops are coding structures that help repeat highly similar or repetitive code effectively, without the need for copy-pasting. They increase code efficiency and readability.
Types of loops:
for
loop (do something x times)while
loop (do something until a certain event is reached)if
statements (control flow, not actually a ‘proper’ loop)For
loopsWhat is a for
loop? It does something repeatedly a certain amount of times.
Syntax
for (values to loop over) {
do all of this stuff for the current value
}
The for
loop keeps repeating until the last value is reached.
In a bit more appropriate coding jargon:
for (var in seq) {
expr
}
Where var
= variable, seq
= sequence, and expr
= expression. You need to use these specific types of brackets in the correct place, otherwise the for
loop won’t work.
A really simple example is:
for (x in 1:10) {
print(x)
}
Here, x is defined as 1:10 (a sequence of 1 to 10 in steps of 1, i.e., 1, 2, 3, … 10). The first time the loop runs, x <- 1
, so it prints 1. The second time, x <- 2
, so it prints 2. Etc. Until x <- 10
has been printed, then it stops as x has no further values.
What is the output of the following for loop:
for (x in 1:2) {
y <- x + 3
print(y)
}
What is the output of the following for loop:
for (x in 1:2) {
y <- x + 3
print(y)
}
for
loopConsider the following scenario. You want to create 3 variables, each is equal to the previous variable, plus 1.
Easy enough to type out, right? If a is a different number, e.g., 3; b, c and d are still calculated correctly and you don’t need to change any lines other than the first where a value is assigned to a. Done!
However, it turns out you now need to do the same thing, but rather than just 3 variables, you need to create 100! That’s a lot of copy-pasting to create e, f, g, …
Let’s try to write a for
loop to obtain the above 3 output variables:
Not only did you just do the same thing super fast, you also tidied the output into a single list, so you don’t clutter your workspace with lots of separate variables. If you needed the 3rd output, you can obtain it using:
output[3]
Hey, but that was more lines of code than the copy-pasting above! Yeah, well, now think about when you needed to do this 100 times, instead of 3. All you need to change is:
No additional lines! And you can still easily change the input variable.
No! ‘x’ is just a placeholder variable (a bit of a habit in coding examples, you may also see ‘i’ used often), it can be anything you like. In fact, it is a better idea to give it a sensible name (ideally following coding style best practice!) so you and anybody else reading your code can remember/figure out what this loop is actually doing.
Let’s try a more sensible example:
Yes, a lot! It is a fundamental tool in coding. It takes some practice, but saves lots of time and effort in computing.
Your Console is hanging waiting for input, showing a ‘+’, and not a ‘>’. You have most likely forgotten a bracket somewhere. Press ESC and you can correct your loop and try again.
while
loopsWhat is a while
loop? It keeps on doing something continuously until a certain event is reached.
while (condition) {
expr
}
Where condition
is a criterion that needs to be satisfied for the loop to keep on running, and expr
= expression (some bit of code). The condition uses a Boolean statement, that is: TRUE
of FALSE
. As long as the condition equals TRUE
, the while loop will keep on executing the expression. As soon as this condition equals FALSE
, it stops.
while
loopYou have 20 apples. You are handing out 1 at a time to others. Once you only have 2 apples left, you want to stop handing them out, so that you have some for yourself.
What happened there? The loop ran initially with num_apples
= 20. On the next iteration, num_apples <- 19
, and 19 > 2
== TRUE
, so it kept on running, then it was 18, 17, etc, until num_apples <- 2
and 2 > 2
== FALSE
, so the loop stopped and the print line below the closing bracket of the loop was executed.
Fun anecdote - while (!!) I was writing this loop, I made a typo, which meant the loop kept going forever as the condition never changed to FALSE
! (Try the following code, then Press ESC to exit the stuck loop):
num_apples <- 20
while (num_apples > 2) {
print("Keep handing out apples")
num_applies <- num_apples - 1
}
print(num_apples)
You can see that num_apples
never changed from 20, so the while
condition remained TRUE
at all times!
Depends. In analytical work, probably not as much as for
loops. But it has its uses.
# Exercise 1: Try and write a for loop, printing the numbers 1 to 20:
# Exercise 2: This for loop doesn't work. Why not? Try to fix it:
for i in 1:10 {
print(i)
}
# Exercise 3: Try and write a while loop, increasing a variable 'i' by 1 at each
# iteration, starting with i <- 1. Keep running until i reaches 6, then stop.
# Print i at each iteration to keep track of your loop:
# Exercise 1: Try and write a for loop, printing the numbers 1 to 20:
for (i in 1:20) {
print(i)
}
# Exercise 2: This for loop doesn't work. Why not? Try to fix it:
# Answer: you forgot some brackets!
for (i in 1:10) {
print(i)
}
# Exercise 3: Try and write a while loop, increasing a variable 'i' by 1 at each
# iteration, starting with i <- 1. Keep running until i reaches 6, then stop.
# Print i at each iteration to keep track of your loop:
i <- 1
while (i < 6) {
print(i)
i <- i + 1
}
if
statements (control flow)What is an if
statement/control flow? It does what it suggests: conditional statements. If some criterion is TRUE
, execute an expression. If it is FALSE
, don’t execute the expression. This can be expanded with else if
to include an additional alternative statement, and/or else
to express what to do if none of the above criteria are satisfied (else
always goes last).
Syntax
if (test_expression) {
statement
}
Relational operators
You need relational operators to run if
statements. These simply compare a statement on the left with a statement on the right. Here is a list:
==
b “a is equal to b”>
b “a is greater than b”>=
b “a is greater than or equal to b”<
b “a is smaller than b”<=
b “a is smaller than or equal to b”!=
b “a is not equal to b”Try them out, for example:
a <- 1
b <- 2
a == b
b < a
a != b
You can see the output in the console is a boolean/logical value, either TRUE
or FALSE
.
What is the output of: a < b
a <- 4
b <- 6
a < b
What is the output to: a < b
<- 4
a <- 6
b < b
a }
Let’s try to code how a traffic lights work. We’ll start with green:
light_status <- "green"
if (light_status == "green") {
print("go")
}
You will get to “go” only if the light is green and for no other value. (Try some other values for light_status)
else
Same as before, except anything other than green now gets a “stop” message. (Try some other values for light_status)
else if
A green light still gets “go”, a red light still gets “stop”, but an orange light now gets a different message.
The traffic light works as it should! However… in coding you usually want to catch errors before they happen. Imagine the light cover is broken and the light is white. It might actually be a green, orange, or red. You will have to do a double take and determine which position the light is located and/or any further information available before deciding what to do. In the above example, anything other than green or orange would tell you to “stop”. We could expand this example by adding another else if
statement:
(Try some other values for light_status)
Now you have catered for all possible scenarios. You can add as many else if
statements as you wish.
ifelse
There is a quicker way to code an if
statement, whenever the situation is binary, i.e., there are only 2 options: if TRUE
, do this, if FALSE
, do that. There is a special built-in function in R to do this in one line:
Consider the following 2-option if
, else if
statement, where there are no other scenarios possible:
This can be replaced with ifelse
, syntax:
ifelse(test, yes, no)
So here, this would be:
5 lines of code became 1. Please note to only use this function when you have a binary decision. If there are 3 or more options, you need the full-version if
statement.
A small note: note how R printed the output twice. Because ifelse
is a function, it returns a value. So if you didn’t want R to print out the resulting value twice in the console, you would need to assign the output to a variable (using <- before ifelse
).
What will be the output of:
What will be the output of:
# Exercise 4: Try and write an if statement, printing "below 5" if the input
# value is <5, and "5 up" if 5 or above:
# Exercise 5: This if statement doesn't work. Why not? Try to fix it:
<- "apple"
input if (input == "apple") {
<- "This is an apple"
output else {
} <- "This is not an apple or a pear"
output else if (input == "pear") {
} <- "This is a pear"
output
}print(output)
# Exercise 4: Try and write an if statement, printing "below 5" if the input
# value is <5, and "5 up" if 5 or above:
x <- 6
if (x < 5) {
print("below 5")
} else if (x >= 5) {
print("5 up")
}
# Exercise 5: This if statement doesn't work. Why not? Try to fix it:
# Answer: else and else if were the wrong way around!
input <- "apple"
if (input == "apple") {
output <- "This is an apple"
} else if (input == "pear") {
output <- "This is a pear"
} else {
output <- "This is not an apple or a pear"
}
print(output)
You can combine any of the above loops/conditional statements by nesting them.
Let’s combine an if
statement and for
loop. Let’s expand the baby names examples from earlier, and only print a name if the name was in the top 10 in 2021. We’ll need to draw from a database to do this.
Note the use of %in%
here for the if
statement only. The for
loop knows how to use a regular in
, as it iterates over the contents of the sequence. However, the if
statement requires a relational operator with result TRUE
or FALSE
. To check whether a variable is included within a list or vector, you can use %in%
as you can see here.
I want to print a baby name if it’s in the top 10, but only if there are more than 5 babies in my database. What type of control flow structure would I start with (i.e. the first/outer-most one)?
I want to print a baby name if it’s in the top 10, but only if there are more than 5 babies in my database. What type of control flow structure would I start with (i.e. the first/outer-most one)?
In nested loops, it is easily done to forget a bracket or have an extra one. In your script, you can place your cursor directly after one of the brackets, and the corresponding bracket will be highlighted.
To keep track of your nested loops, and all the brackets required, it also helps a lot to use the correct indentation. You can re-indent lines automically by highlighting a section of code, then Code -> Reindent Lines, or use the shortcut ctrl-I.
Sometimes, you want the loop to run, but stop when a certain criterion is reached. In this scenario, you can use the break
statement.
An example: you want to write out some data, but your system can’t handle more than 10 characters in the filename. Your for
loop will print the output (in your real code, this would be a statement writing the data using the supplied filename), but it will first count the number of characters in your filename, and stop if the name is too long, so that your system doesn’t crash.
Actually, we want the above loop to not just stop, but skip filenames that are too long and carry on with the next one! For this, you can use next
.
As you know, a for
loop will run for all values within the sequence. It can be very useful to have access to the index that goes along with each value. For example, you can use this to store the output in a vector, or to use a counter. The index reflects the current iteration of the loop.
The following loop illustrates the difference between index and variable value:
And a numerical example, to show how you can use this to store results in a new vector:
Note: seq_along()
can be used in place of 1:length(input)
. This is a built-in method that, by default, creates a sequence of consecutive integers from 1 to the length of the object (vector, list, data frame).
i.e., the previous for
loop can be written as:
# Exercise 6: Can you expand the baby names example to print "This baby name is
# in the top 10 girls names" when this is the case?
# (hint: you'll need to add another 'database':
# top_10_2021_girls <- c("Olivia", "Amelia", "Isla", "Ava", "Ivy", "Freya",
# "Lily", "Florence", "Mia", "Willow")
# This is a copy of the babies loop used in the training course:
baby_names <- list("Oliver", "Leo", "Olivia", "Frank")
top_10_2021_boys <- c("Noah", "Oliver", "George", "Arthur", "Muhammad", "Leo",
"Harry", "Oscar", "Archie", "Henry")
for (baby_name in baby_names) {
if (baby_name %in% top_10_2021_boys) {
print(baby_name)
}
}
# Now expand this to print "This baby name is in the top 10 girls names" when
# applicable:
# Exercise 7: This nested loop doesn't work. Why not? Try to fix it:
x <- 1:10
for (i in x) {
if (x < 3) {
print(paste0("i = ", i, " = below 3"))
} else {
print(paste0("i = ", i, " = not below 3"))
}
}
## The following are more advanced exercises:
# Exercise 8: Can you write the apples example `while` loop as a `for` loop
# instead? Try to use an index iterator, and print this each time the loop runs:
# A repeat of the while loop used in the course:
num_apples <- 20
while (num_apples > 2) {
print("Keep handing out apples")
num_apples <- num_apples - 1
}
print(paste0("Stop handing out apples, because you only have ",
num_apples, " left"))
# Now try writing this as a for loop instead (requires some simple maths):
# Exercise 6: Can you expand the baby names example to print "This baby name is
# in the top 10 girls names" when this is the case?
# (hint: you'll need to add another 'database':
# top_10_2021_girls <- c("Olivia", "Amelia", "Isla", "Ava", "Ivy", "Freya",
# "Lily", "Florence", "Mia", "Willow")
# This is a copy of the babies loop used in the training course:
baby_names <- list("Oliver", "Leo", "Olivia", "Frank")
top_10_2021_boys <- c("Noah", "Oliver", "George", "Arthur", "Muhammad", "Leo",
"Harry", "Oscar", "Archie", "Henry")
top_10_2021_girls <- c("Olivia", "Amelia", "Isla", "Ava", "Ivy", "Freya",
"Lily", "Florence", "Mia", "Willow")
for (baby_name in baby_names) {
if (baby_name %in% top_10_2021_boys) {
print(baby_name)
} else if (baby_name %in% top_10_2021_girls) {
print(baby_name)
print("This baby name is in the top 10 girls names" )
}
}
# Exercise 7: This nested loop doesn't work. Why not? Try to fix it:
# Answer: you used the wrong iterator (x vs i)!
x <- 1:10
for (i in x) {
if (i < 3) {
print(paste0("i = ", i, " = below 3"))
} else {
print(paste0("i = ", i, " = not below 3"))
}
}
## The following are more advanced exercises:
# Exercise 8: Can you write the apples example `while` loop as a `for` loop
# instead? Try to use an index iterator, and print this each time the loop runs:
# for loop:
num_apples <- 20
num_apples_to_keep <- 2
num_apples_hand_out <- 1
for (i in 1:((num_apples - num_apples_to_keep) / num_apples_hand_out)) {
print(paste0("Loop iteration: ", i))
print(paste0("Keep handing out apples, you still have ",
num_apples, " left"))
num_apples <- num_apples - num_apples_hand_out
}
print(paste0("Stop handing out apples, because you only have ",
num_apples, " left"))
# or, without the variables assigned in advance:
num_apples <- 20
for (i in 1:((num_apples - 2) / 1)) {
print(paste0("Loop iteration: ", i))
print(paste0("Keep handing out apples, you still have ",
num_apples, " left"))
num_apples <- num_apples - 1
}
print(paste0("Stop handing out apples, because you only have ",
num_apples, " left"))
For more information about control flow see: