Control Flow and Loops in R

Pre-requisites:

What are loops?

Loops are coding structures that help repeat highly similar or repetitive code effectively, without the need for copy-pasting. They increase code efficiency and readability.

Types of loops:

For loops

What is a for loop? It does something repeatedly a certain amount of times.

Syntax

for (values to loop over) {
  do all of this stuff for the current value
}

The for loop keeps repeating until the last value is reached.

In a bit more appropriate coding jargon:

for (var in seq) {
  expr
}

Where var = variable, seq = sequence, and expr = expression. You need to use these specific types of brackets in the correct place, otherwise the for loop won’t work.

A really simple example is:

for (x in 1:10) {
  print(x)
}

Here, x is defined as 1:10 (a sequence of 1 to 10 in steps of 1, i.e., 1, 2, 3, … 10). The first time the loop runs, x <- 1, so it prints 1. The second time, x <- 2, so it prints 2. Etc. Until x <- 10 has been printed, then it stops as x has no further values.

Quiz 1

Question

What is the output of the following for loop:

for (x in 1:2) {
  y <- x + 3
  print(y) 
}
  1. 1, 2
  2. 3, 4
  3. 4, 5
  4. 5, 6

See Answer

What is the output of the following for loop:

for (x in 1:2) {
  y <- x + 3
  print(y) 
}
  1. 1, 2
  2. 3, 4
  3. 4, 5
  4. 5, 6

Example of the advantages of a for loop

Consider the following scenario. You want to create 3 variables, each is equal to the previous variable, plus 1.

a <- 1
b <- a + 1
c <- b + 1
d <- c + 1
print(c(b, c, d))

Easy enough to type out, right? If a is a different number, e.g., 3; b, c and d are still calculated correctly and you don’t need to change any lines other than the first where a value is assigned to a. Done!

However, it turns out you now need to do the same thing, but rather than just 3 variables, you need to create 100! That’s a lot of copy-pasting to create e, f, g, …

Let’s try to write a for loop to obtain the above 3 output variables:

input_variable <- 1
number_of_outputs <- 3
output <- c() # empty vector to put outputs into
for (x in 1:number_of_outputs) {
  output[x] <- x + 1
}
print(output)

Not only did you just do the same thing super fast, you also tidied the output into a single list, so you don’t clutter your workspace with lots of separate variables. If you needed the 3rd output, you can obtain it using:

output[3]

Hey, but that was more lines of code than the copy-pasting above! Yeah, well, now think about when you needed to do this 100 times, instead of 3. All you need to change is:

input_variable <- 1
number_of_outputs <- 100
output <- c() #empty vector to put outputs into
for (x in 1:number_of_outputs) {
  output[x] <- x + 1
}
print(output)

No additional lines! And you can still easily change the input variable.

Do I have to use x?

No! ‘x’ is just a placeholder variable (a bit of a habit in coding examples, you may also see ‘i’ used often), it can be anything you like. In fact, it is a better idea to give it a sensible name (ideally following coding style best practice!) so you and anybody else reading your code can remember/figure out what this loop is actually doing.

Let’s try a more sensible example:

baby_names <- list("Oliver", "Leo", "Olivia", "Frank")

for (baby_name in baby_names) {
  print(baby_name)
}

Will I actually use this much?

Yes, a lot! It is a fundamental tool in coding. It takes some practice, but saves lots of time and effort in computing.

My loop is stuck!

Your Console is hanging waiting for input, showing a ‘+’, and not a ‘>’. You have most likely forgotten a bracket somewhere. Press ESC and you can correct your loop and try again.

while loops

What is a while loop? It keeps on doing something continuously until a certain event is reached.

Syntax

while (condition) {
  expr
}

Where condition is a criterion that needs to be satisfied for the loop to keep on running, and expr = expression (some bit of code). The condition uses a Boolean statement, that is: TRUE of FALSE. As long as the condition equals TRUE, the while loop will keep on executing the expression. As soon as this condition equals FALSE, it stops.

Example of a while loop

You have 20 apples. You are handing out 1 at a time to others. Once you only have 2 apples left, you want to stop handing them out, so that you have some for yourself.

num_apples <- 20
while (num_apples > 2) {
  print("Keep handing out apples")
  num_apples <- num_apples - 1
}
print(paste0("Stop handing out apples, because you only have ", num_apples, " left"))

What happened there? The loop ran initially with num_apples = 20. On the next iteration, num_apples <- 19, and 19 > 2 == TRUE, so it kept on running, then it was 18, 17, etc, until num_apples <- 2 and 2 > 2 == FALSE, so the loop stopped and the print line below the closing bracket of the loop was executed.

Fun anecdote - while (!!) I was writing this loop, I made a typo, which meant the loop kept going forever as the condition never changed to FALSE! (Try the following code, then Press ESC to exit the stuck loop):

num_apples <- 20
while (num_apples > 2) {
  print("Keep handing out apples")
  num_applies <- num_apples - 1
}
print(num_apples)

You can see that num_apples never changed from 20, so the while condition remained TRUE at all times!

Will I actually use this much?

Depends. In analytical work, probably not as much as for loops. But it has its uses.

Loop excercises

Exercises

# Exercise 1: Try and write a for loop, printing the numbers 1 to 20:


# Exercise 2: This for loop doesn't work. Why not? Try to fix it:

for i in 1:10 {
  print(i)
}

# Exercise 3: Try and write a while loop, increasing a variable 'i' by 1 at each
# iteration, starting with i <- 1. Keep running until i reaches 6, then stop.
# Print i at each iteration to keep track of your loop:

Example solutions

# Exercise 1: Try and write a for loop, printing the numbers 1 to 20:
for (i in 1:20) {
  print(i)
}

# Exercise 2: This for loop doesn't work. Why not? Try to fix it:
# Answer: you forgot some brackets!
for (i in 1:10) {
  print(i)
}

# Exercise 3: Try and write a while loop, increasing a variable 'i' by 1 at each
# iteration, starting with i <- 1. Keep running until i reaches 6, then stop.
# Print i at each iteration to keep track of your loop:
i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
}

if statements (control flow)

What is an if statement/control flow? It does what it suggests: conditional statements. If some criterion is TRUE, execute an expression. If it is FALSE, don’t execute the expression. This can be expanded with else if to include an additional alternative statement, and/or else to express what to do if none of the above criteria are satisfied (else always goes last).

Syntax

if (test_expression) {
  statement
}

Relational operators

You need relational operators to run if statements. These simply compare a statement on the left with a statement on the right. Here is a list:

Try them out, for example:

a <- 1
b <- 2
a == b
b < a
a != b

You can see the output in the console is a boolean/logical value, either TRUE or FALSE.

Quiz 2

Question

What is the output of: a < b

a <- 4
b <- 6
a < b
  1. 2
  2. TRUE
  3. NA
  4. FALSE

See Answer

What is the output to: a < b

a <- 4
b <- 6
a < b
}
  1. 2
  2. TRUE
  3. NA
  4. FALSE

Example: traffic lights

Let’s try to code how a traffic lights work. We’ll start with green:

light_status <- "green"
if (light_status == "green") {
  print("go")
}

You will get to “go” only if the light is green and for no other value. (Try some other values for light_status)

else

light_status <- "green"
if (light_status == "green") {
  print("go")
} else {
  print("stop")
}

Same as before, except anything other than green now gets a “stop” message. (Try some other values for light_status)

else if

light_status <- "green"
if (light_status == "green") {
  print("go")
} else if (light_status == "orange") {
  print("stop if you are able to...")
} else {
  print("stop")
}

A green light still gets “go”, a red light still gets “stop”, but an orange light now gets a different message.

The traffic light works as it should! However… in coding you usually want to catch errors before they happen. Imagine the light cover is broken and the light is white. It might actually be a green, orange, or red. You will have to do a double take and determine which position the light is located and/or any further information available before deciding what to do. In the above example, anything other than green or orange would tell you to “stop”. We could expand this example by adding another else if statement:

light_status <- "white"
if (light_status == "green") {
  print("go")
} else if (light_status == "orange") {
  print("stop if you are able to...")
} else if (light_status == "red") {
  print("stop")
} else {
  print("Warning: unexpected colour! Be careful and assess the situation...")
}

(Try some other values for light_status)

Now you have catered for all possible scenarios. You can add as many else if statements as you wish.

ifelse

There is a quicker way to code an if statement, whenever the situation is binary, i.e., there are only 2 options: if TRUE, do this, if FALSE, do that. There is a special built-in function in R to do this in one line:

Consider the following 2-option if, else if statement, where there are no other scenarios possible:

x <- 6
if (x < 10) {
  print("smaller than 10")
} else if (x >= 10) {
  print("greater than 10")
}

This can be replaced with ifelse, syntax:

ifelse(test, yes, no)

So here, this would be:

ifelse(x < 10, print("smaller than 10"), print("greater than 10"))

5 lines of code became 1. Please note to only use this function when you have a binary decision. If there are 3 or more options, you need the full-version if statement.

A small note: note how R printed the output twice. Because ifelse is a function, it returns a value. So if you didn’t want R to print out the resulting value twice in the console, you would need to assign the output to a variable (using <- before ifelse).

Quiz 3

Question

What will be the output of:

x <- 20
ifelse(x == 20, print("FALSE"), print("TRUE"))
  1. TRUE
  2. FALSE

See Answer

What will be the output of:

x <- 20
ifelse(x == 20, print("FALSE"), print("TRUE"))
  1. TRUE
  2. FALSE

Control flow exercises

Exercises

# Exercise 4: Try and write an if statement, printing "below 5" if the input
# value is <5, and "5 up" if 5 or above:


# Exercise 5: This if statement doesn't work. Why not? Try to fix it:
input <- "apple"
if (input == "apple") {
  output <- "This is an apple"
} else {
  output <- "This is not an apple or a pear"
} else if (input == "pear") {
  output <- "This is a pear"
}
print(output)

Example solutions

# Exercise 4: Try and write an if statement, printing "below 5" if the input
# value is <5, and "5 up" if 5 or above:
x <- 6
if (x < 5) {
  print("below 5")
} else if (x >= 5) {
  print("5 up")
}

# Exercise 5: This if statement doesn't work. Why not? Try to fix it:
# Answer: else and else if were the wrong way around!
input <- "apple"
if (input == "apple") {
  output <- "This is an apple"
} else if (input == "pear") {
  output <- "This is a pear"
} else {
  output <- "This is not an apple or a pear"
}
print(output)

Combining loops/conditional statements

You can combine any of the above loops/conditional statements by nesting them.

Example

Let’s combine an if statement and for loop. Let’s expand the baby names examples from earlier, and only print a name if the name was in the top 10 in 2021. We’ll need to draw from a database to do this.

baby_names <- list("Oliver", "Leo", "Olivia", "Frank")
top_10_2021_boys <- c("Noah", "Oliver", "George", "Arthur", "Muhammad", "Leo", "Harry", "Oscar", "Archie", "Henry")

for (baby_name in baby_names) {
  if (baby_name %in% top_10_2021_boys) {
    print(baby_name)
  }
}

Note the use of %in% here for the if statement only. The for loop knows how to use a regular in, as it iterates over the contents of the sequence. However, the if statement requires a relational operator with result TRUE or FALSE. To check whether a variable is included within a list or vector, you can use %in% as you can see here.

Quiz 4

Question

I want to print a baby name if it’s in the top 10, but only if there are more than 5 babies in my database. What type of control flow structure would I start with (i.e. the first/outer-most one)?

  1. for loop
  2. if statement
  3. while loop

See Answer

I want to print a baby name if it’s in the top 10, but only if there are more than 5 babies in my database. What type of control flow structure would I start with (i.e. the first/outer-most one)?

  1. for loop
  2. if statement
  3. while loop

Keep track of your brackets

In nested loops, it is easily done to forget a bracket or have an extra one. In your script, you can place your cursor directly after one of the brackets, and the corresponding bracket will be highlighted.

To keep track of your nested loops, and all the brackets required, it also helps a lot to use the correct indentation. You can re-indent lines automically by highlighting a section of code, then Code -> Reindent Lines, or use the shortcut ctrl-I.

Advanced loop features

Break

Sometimes, you want the loop to run, but stop when a certain criterion is reached. In this scenario, you can use the break statement.

An example: you want to write out some data, but your system can’t handle more than 10 characters in the filename. Your for loop will print the output (in your real code, this would be a statement writing the data using the supplied filename), but it will first count the number of characters in your filename, and stop if the name is too long, so that your system doesn’t crash.

filenames <- c("filename1", "filename2", "fartoolongfilename3", "filename4")
for (filename in filenames) {
  if (nchar(filename) > 10) {
    break
  }
  print(filename)
}

Next

Actually, we want the above loop to not just stop, but skip filenames that are too long and carry on with the next one! For this, you can use next.

filenames <- c("filename1", "filename2", "fartoolongfilename3", "filename4")
for (filename in filenames) {
  if (nchar(filename) > 10) {
    next
  }
  print(filename)
}

Looping index

As you know, a for loop will run for all values within the sequence. It can be very useful to have access to the index that goes along with each value. For example, you can use this to store the output in a vector, or to use a counter. The index reflects the current iteration of the loop.

The following loop illustrates the difference between index and variable value:

filenames <- c("filename1", "filename2", "fartoolongfilename3", "filename4")
for (index in 1:length(filenames)) {
  print(paste0("index: ", index))
  print(paste0("current element: ", filenames[index]))
}

And a numerical example, to show how you can use this to store results in a new vector:

input <- c(6, 3, 10, 2)
result <- c()
for (i in 1:length(input)) {
  print(paste0("index: ", i))
  print(paste0("current value: ", input[i]))
  result[i] <- input[i] * 2
}

Note: seq_along() can be used in place of 1:length(input). This is a built-in method that, by default, creates a sequence of consecutive integers from 1 to the length of the object (vector, list, data frame).

input <- c(6, 3, 10, 2)
seq_along(input)
[1] 1 2 3 4

i.e., the previous for loop can be written as:

input <- c(6, 3, 10, 2)
result <- c()
for (i in seq_along(input)) {
  print(paste0("index: ", i))
  print(paste0("current value: ", input[i]))
  result[i] <- input[i] * 2
}

Control flow and loops excercises

Exercises

# Exercise 6: Can you expand the baby names example to print "This baby name is 
# in the top 10 girls names" when this is the case?
# (hint: you'll need to add another 'database': 
# top_10_2021_girls <- c("Olivia", "Amelia", "Isla", "Ava", "Ivy", "Freya",
# "Lily", "Florence", "Mia", "Willow")

# This is a copy of the babies loop used in the training course:
baby_names <- list("Oliver", "Leo", "Olivia", "Frank")
top_10_2021_boys <- c("Noah", "Oliver", "George", "Arthur", "Muhammad", "Leo", 
                      "Harry", "Oscar", "Archie", "Henry")

for (baby_name in baby_names) {
  if (baby_name %in% top_10_2021_boys) {
    print(baby_name)
  }
}

# Now expand this to print "This baby name is in the top 10 girls names" when 
# applicable:


# Exercise 7: This nested loop doesn't work. Why not? Try to fix it:

x <- 1:10
for (i in x) {
  if (x < 3) {
    print(paste0("i = ", i, " = below 3"))
  } else {
    print(paste0("i = ", i, " = not below 3"))
  }
}


## The following are more advanced exercises:

# Exercise 8: Can you write the apples example `while` loop as a `for` loop
# instead? Try to use an index iterator, and print this each time the loop runs:

# A repeat of the while loop used in the course:
num_apples <- 20
while (num_apples > 2) {
  print("Keep handing out apples")
  num_apples <- num_apples - 1
}
print(paste0("Stop handing out apples, because you only have ",
             num_apples, " left"))

# Now try writing this as a for loop instead (requires some simple maths):

Example solutions

# Exercise 6: Can you expand the baby names example to print "This baby name is 
# in the top 10 girls names" when this is the case?
# (hint: you'll need to add another 'database': 
# top_10_2021_girls <- c("Olivia", "Amelia", "Isla", "Ava", "Ivy", "Freya",
# "Lily", "Florence", "Mia", "Willow")

# This is a copy of the babies loop used in the training course:
baby_names <- list("Oliver", "Leo", "Olivia", "Frank")
top_10_2021_boys <- c("Noah", "Oliver", "George", "Arthur", "Muhammad", "Leo", 
                      "Harry", "Oscar", "Archie", "Henry")
top_10_2021_girls <- c("Olivia", "Amelia", "Isla", "Ava", "Ivy", "Freya",
                       "Lily", "Florence", "Mia", "Willow")

for (baby_name in baby_names) {
  if (baby_name %in% top_10_2021_boys) {
    print(baby_name)
  } else if (baby_name %in% top_10_2021_girls) {
    print(baby_name)
    print("This baby name is in the top 10 girls names" )
  }
}

# Exercise 7: This nested loop doesn't work. Why not? Try to fix it:
# Answer: you used the wrong iterator (x vs i)!
x <- 1:10
for (i in x) {
  if (i < 3) {
    print(paste0("i = ", i, " = below 3"))
  } else {
    print(paste0("i = ", i, " = not below 3"))
  }
}

## The following are more advanced exercises:

# Exercise 8: Can you write the apples example `while` loop as a `for` loop
# instead? Try to use an index iterator, and print this each time the loop runs:

# for loop:
num_apples <- 20
num_apples_to_keep <- 2
num_apples_hand_out <- 1
for (i in 1:((num_apples - num_apples_to_keep) / num_apples_hand_out)) {
  print(paste0("Loop iteration: ", i))
  print(paste0("Keep handing out apples, you still have ",
               num_apples, " left"))
  num_apples <- num_apples - num_apples_hand_out
}
print(paste0("Stop handing out apples, because you only have ",
             num_apples, " left"))

# or, without the variables assigned in advance:
num_apples <- 20
for (i in 1:((num_apples - 2) / 1)) {
  print(paste0("Loop iteration: ", i))
  print(paste0("Keep handing out apples, you still have ",
               num_apples, " left"))
  num_apples <- num_apples - 1
}
print(paste0("Stop handing out apples, because you only have ",
             num_apples, " left"))

Additional resources

For more information about control flow see: