Control Flow

Definition

Programming languages allow us to express the way that execution components (statements, expressions, and declarations) are wired together to effect a computation.

There are (at least) seven types of flow

  1. SEQUENCING
  2. SELECTION (if, unless, switch, case, ...)
  3. ITERATION (for, while, repeat, until, ...)
  4. PROCEDURAL ABSTRACTION (subroutine call)
  5. RECURSION
  6. NON-DETERMINACY
  7. CONCURRENCY

Execution Components

Languages allow a lot of flexibility in which

In some languages, for example, C++, there is a declaration statement and an expression statement.

void f() {
    cout << "hello";       // expression
    int x = 10;            // declaration
    if (x < 2) return;     // statement
}

Some languages don't have statements at all — code is just made up of expressions. Yes, they even have while-expressions, not while-statements. Example in ML:

- val x = ref 5;
> val x = ref 5 : int ref
- val y = while !x > 0 do (print(Int.toString(!x)^"\n"); x := !x - 1);
5
4
3
2
1
> val y = () : unit

... and an example in Algol68

begin
   a := if b < c then d else e;
   a := begin f(b); g(c) end;
   g(d);
   2 + 3
end             # whole expression evaluates to 5 #

Expressions

At a fundamental level, programming can be viewed as nothing more than applying operators to operands. Operators can be simple things like "+" or "<", or can be user-defined functions, or can introduce declarations, or even modify control flow.

Some examples of applying operators

Ada
F
F(X, Y, Z)
"+"(X, Y)        -- abbreviated X + Y
">"(X, Y)        -- abbreviated X > Y
C++
f()
f(x, y, z)
operator+(x, y)  // abbreviated x + y
operator>(x, y)  // abbreviated x > y
Lisp
(f)
(f x y z)
(+ x y)
(+ a b c d e)    /* Not just binary! */
(+ x (* y z))
Perl
&f
f x, y, z        # Hmm, what does this mean?
f(x), y, z
f(x, y, z)
print 1, 3, sort 4, 2    # Gaaack
Standard ML
f ()             (* f must be applied to something *)
f x y z          (* curried *)
f(x, y, z)       (* uncurried *)
op+(x, y)        (* same as x + y *)

Operator Precedence

Motivation:

Does a • b ¶ c mean ((a • b) ¶ c) or does it mean (a • (b ¶ c)) ?

Higher precedence operators are "applied first".

Language Issues

Operator Associativity

Motivation:

Does a • b • c mean ((a • b) • c) or does it mean (a • (b • c)) or are we not even allowed to write such a thing?

We speak of left-associativity, right-associativity, and non-associativity.

Observations

Operator Arity

The arity of an operator is the allowed number of operands. It can be fixed or variable. A variable arity operator is said to be variadic.

Operator Fixity

Prefix, infix, postfix, overfix, underfix, outfix, ...

Syntax of Expressions

We can encode precedence, associativity, arity, and fixity directly in the syntax, for example

    EXP   →  ID  ":="  EXP  |  EXP1
    EXP1  →  (EXP1  LOGOP)?  EXP2
    EXP2  →  EXP3  (RELOP  EXP3)?
    EXP3  →  (EXP3  ADDOP)?  EXP4
    EXP4  →  (EXP4  MULOP)?  EXP5
    EXP5  →  EXP6  (EXPOP  EXP5)?
    EXP6  →  (UNARYOP  EXP6)?  |  EXP7
    EXP7  →  LITERAL  |  ID  |  FUNCALL  |  "(" EXP ")"

This implies

OpPrecAssocArityFixity
Unary -
Unary +
!
HighestR1Prefix
** R2Infix
* / % L2Infix
+ - L2Infix
< <= == != >= > NO!2Infix
and or L2Infix
:=LowestR2Infix

Evaluation Order

Java forces a left-to-right ordering: a–f(b)–c*d means do the following, one after another:

  t0 ← f(b)
  t1 ← a - t0
  t2 ← c * d
  t3 ← t1 - t2

Most languages allow the evaluation order to be undefined so that the compiler can choose the best order it can. This is especially important for parallel architectures (multiprocessor or multicore).

For example, a–f(b)–c*d can be parallelized as

  t0 ← f(b)
  t1 ← a - t0     t2 ← c * d
  t3 ← t1 - t2

Also, a:=b[i]; c:=a*2+d*3 can be done like this:

  t0 ← b + i      t1 ← d * 3
  a ← *t0
  t2 ← a * 2
  c ← t1 + t2

Undefined ordering can lead to ambiguities or errors:

Short-Circuit Operators

The famous short-circuit logical operators are really control-flow mechanisms...

Expression Meaning Can also be written
e1 andalso e2
e1 and then e2
e1 && e2
First evaluate e1.
If this is false, the whole expression is false so don't evaluate e2 at all.
Otherwise evaluate and "return" e2.
if e1 then e2 else e1
e1 orelse e2
e1 or else e2
e1 || e2
First evaluate e1.
If this is true, the whole expression is true so don't evaluate e2 at all.
Otherwise evaluate and "return" e2.
if e1 then e1 else e2
Exercise: The languages in the C family handles these operators a bit differently from most other languages. Explain why.

Short-circuiting appears frequently in many programming idioms

Side Effects

Note that evaluation order really only matters when side effects can occur (which is why immutability rocks!). Side effects are what occurs when a storage location is updated.

Lvalues and Rvalues

Storage locations are denoted by lvalues. They are called lvalues because they can appear on the Left side of an assignment. Examples in C:

x
x[4]
y[5].p()->q
*(e1 + e2)
*y[6]

An example of an rvalue is 500 (an integer literal).

Sometimes pretty complicated looking expressions can evaluate to lvalues! Examples:

// C++, but not C
(x *= 10) += 7

# Perl
($x < 5 ? $y : $z) = 10;

# Perl
$x = "dog";
${$x} = 2;
$x = <STDIN>
chomp $x;
${$x} = 2;

(* ML *)
val x = ref 0;
x := 3;
x := !x + 1;

Sometimes, lvalues can be made read-only. Examples:

Initialization vs. Assignment

Initialization and assignment are very different.

One language that makes the distinction explicit in code is C++. Examples:

// Initializations:
int x = 10;
int y(15);
int z(a + 5 / 2);
int w(x);
Point p(5, 12);
Point q = p;

// Assignments:
x = 12;
y = x / 6;
q = p;
q = midpoint(p1, p2);

In C++, initializations are effected with constructor invocations and assignments with assignment operator applications. You can define your own constructors and assignment operators:

// ---------------------------------------------------------------------------
// SimpleLinkedList.h
//
// A generic linked list class.  (There is already a list class in the
// Standard C++ Library; this class is just an illustration of how such a
// class is written from scratch!)
// ---------------------------------------------------------------------------

#ifndef _SIMPLE_LINKED_LIST_H
#define _SIMPLE_LINKED_LIST_H

#include <stdexcept>
using namespace std;

// A list object is represented as a doubly-linked ring of nodes.  One
// header (or "dummy") node always exists, so that insertion and removal
// can be implemented consistently whether or not it is taking place
// at the middle or the ends.

template<class T>
class List {

// A private local class for the nodes that are linked together.
private:
  class Node {
  public:
    Node* previous;
    T data;
    Node* next;
    Node(Node* previous, T data, Node* next):
      previous(previous), data(data), next(next) {
    }
  };

// The list itself is just a pointer to the dummy header node.
private:
  Node* header;
  int size;

// Constructors and Destructors
public:

  // Constructs a new EMPTY list.  The empty list is represented as one
  // containing only the header.
  List(): header(new Node(0, T(), 0)), size(0) {
    header->previous = header;
    header->next = header;
  }

  // The copy constructor!
  List(const List& other): header(new Node(0, T(), 0)), size(other.size) {
    header->previous = header;
    header->next = header;
    for (Node* n = other.header->next; n != other.header; n = n->next) {
      Node* newNode = new Node(header->previous, n->data, header);
      newNode->previous->next = newNode;
      newNode->next->previous = newNode;
    }
  }

  // Deletes all the nodes.  XXX: INEFFICIENT.
  ~List() {
    destroy();
    delete header;
  }

// Behavior
public:

  // Returns the number of items in the list.
  int getSize() {
    return size;
  }

  // Returns whether the list is empty.
  bool isEmpty() {
    return size == 0;
  }

  // Adds an item so that it has position index.  The index positions start
  // at zero.  If the index argument is not in the range 0..size then an
  // out_of_range is thrown.
  void add(int index, T item) {
    addBefore(item, (index == size ? header : nodeAt(index)));
  }

  // Removes the item at position index.  Throws an out_of_range if the
  // index is not in the range 0..size-1.
  void remove(int index) {
    remove(nodeAt(index));
  }

  // Returns the index of the first occurence of the given item, or -1
  // if not found.
  int indexOf(T item) {
    int index = 0;
    for (Node* n = header->next; n != header; n = n->next) {
      if (n->data == item) {
        return index;
      }
      index++;
    }
    return -1;
  }

  // Retrieve the item at the given index.
  T get(int index) {
    return nodeAt(index)->data;
  }

  // Update the item at the given index.
  void set(int index, T item) {
    nodeAt(index)->data = item;
  }

  // Assignment Operator
  List& operator=(const List& other) {
    if (&other == this) {
      return *this;
    }
    destroy();
    size = other.size;
    for (Node* n = other.header->next; n != other.header; n = n->next) {
      Node* newNode = new Node(header->previous, n->data, header);
      newNode->previous->next = newNode;
      newNode->next->previous = newNode;
    }
    return *this;
  }

  // Returns whether this list has the same size, and values in the
  // same positions, as another.
  bool operator==(const List& other) {
    if (size != other.size) {
      return false;
    }
    Node* n1 = header->next;
    Node* n2 = other.header->next;
    while (n1 != header && n2 != other.header) {
      if (n1->data != n2->data) {
        return false;
      }
      n1 = n1->next;
      n2 = n2->next;
    }
    return true;
  }

// Helpers
private:

  // Deletes all the nodes in this list except the header.
  void destroy() {
    while (!isEmpty()) {
      remove(0);
    }
  }

  // Return a pointer to the node at the given index position.
  Node* nodeAt(int index) {
    if (index < 0 || index > size) {
      throw out_of_range("Invalid list position");
    }
    Node* n = header;
    for (int i = 0; i <= index; i++) {
      n = n->next;
    }
    return n;
  }

  // Add a new node containing the given value before the node
  // pointed to by n.
  void addBefore(T item, Node* n) {
    Node* newNode = new Node(n->previous, item, n);
    newNode->previous->next = newNode;
    newNode->next->previous = newNode;
    size++;
  }

  // Remove the node pointed to by n.
  void remove(Node* n) {
    if (n == 0 || n == header) {
      throw runtime_error("The list class is screwed up");
    }
    n->previous->next = n->next;
    n->next->previous = n->previous;
    size--;
  }

};

#endif

Prefer initialization to assignment where possible. Here's a case where you must. Suppose we had a point class in C++

class Point {
public:
    int x; int y;
    Point(int x1, int y1): x(x1), y(y1) {}
};

Because we defined a constructor with parameters, we cannot ever define uninitialized points:

Point p; // ILLEGAL

Point* p = new Point[10]; // ILLEGAL

class Rectangle {
public:
    Point corner1, corner2;
    Rectangle(Point p, Point q) {  // ILLEGAL
        corner1 = p;
        corner2 = q;
    }
};

The Rectangle constructor failed because it is trying to initialize the fields to their default values and then assign them in the constructor body.... but there is NO default initializer for class Point. You MUST write the Rectangle constructor like this:

class Rectangle {
public:
    Point corner1, corner2;
    Rectangle(Point p, Point q): corner1(p), corner2(q) {}
};

Sequencing

Sequencing is the most basic control flow... it refers to doing elaborations, evaluations, or executions one after another. Not in parallel and not out of order (unless a compiler can guarantee that such optimizations don't change the meaning).

C uses semi-colons for sequentializing statements and declarations, but uses the comma operator for expressions, for example.

x = 5;
y = x;                    /* x and y are both 5 */

x = f(a), g(b), 3;        /* x ends up as 3 */

ML uses the semi-colon for sequencing, for example.

val x = 5;
val y = x                 (* x and y now both 5 *)

val x = (f(a); g(b); 3)   (* x now 3 -- note parens required BTW *)

(* However "and" does parallel binding, NOT sequential *)
val x = y
and y = x                 (* Yes, it's a swap *)

Selection

Selection means we do one of several alternatives. Often done with an "if" statement, e.g.

// Java, C
if (e1) {
    s1
} else if (e2) {
    s2
} else if (e3) {
    s3
} else if (e4) {
    s4
} else {
    s5
}

/* For PHP, replace "else if" with "elseif" above */
/* For Perl, replace "else if" with "elsif" above */

# Ruby
if e1
  s1
elsif e2
  s2
elsif e3
  s3
elsif e4
  s4
else
  s5
end


# Python
if e1:
  s1
elif e2:
  s2
elif e3:
  s3
elif e4:
  s4
else:
  s5

# Bash
if e1; then
  s1
elif e2; then
  s2
elif e3; then
  s3
elif e4; then
  s4
else
  s5
fi
Exercise: Show the equivalent of the above in Ada, Fortran, Scala, and ML.

Some languages have things called switch or case statements. Lots of people argue about whether one should "fall through" each case or not.

// Java, C
switch (e) {
    case c1 : s1;
    case c2 : s2;    // These "fall through", so you might need
    case c3 : s3;    // ... a "break" or a "return"
    case c4 : s4:
    default : s5;
}

-- Ada
case e is
    when c1 => s1;
    when c2 => s2;
    when c3 => s3;
    when c4 => s4;
    when others => s5;
end case;

Ada even checks to make sure at compile time that the cases are exhaustive! (ML does this too.)

Exercise: Look up the equivalents in lots of other languages. Find out whether they fall through or not, or whether a compiler can check for exhaustive matches.

Perl and Ruby have an unless statement

#Perl
unless ($x >= 0) {   # Braces required in Perl for compound statements
    die "must be non-negative";
}

Perl and Ruby allow if and unless to be used as modifiers for simple statements — allowing you to dispense with the braces or "end"s

die "must be non-negative" unless $x >= 0;
die "must be non-negative" if $x < 0;        # Same as above

Finally short-cirucit operators can be used for selection too...

chdir $dir or die "Can't change to $dir: $!";
open F, $file or die "Can't open $file: $!";
@lines = <F> or die "$file may be empty";
close F or die "Can't close $file: $!";

This is appealing code because one's eye can scan the "normal" flow of control down the left margin.

Exercise: Write the above using (ugly) if-statements.

Iteration

Iteration is executing code repeatedly, either for each value in an enumeration, or while some condition holds.

Logically Controlled Loops

These are of the "while" or "until" variety and come in many forms:

while c do s                         ML, Pascal
WHILE c DO s END                     Modula-2, Modula-3
while C do S end loop                Ada
while (c) s                          Java, C, JavaScript (braces optional for one stmt)
while (c) {s}                        Perl (braces required)
while (c) {s} continue {s}           Perl
e while c                            Perl, Ruby
while e                              |
  c                                  +---- Ruby
end                                  |

repeat s until c                     Pascal
REPEAT s UNTIL c                     Modula-2, Modula-3
until (c) {s}                        Perl
until (c) {s} continue {s}           Perl
e until c                            Perl, Ruby
until e                              |
  c                                  +---- Ruby
end                                  |
do s while (c)                       C, C++, Java

Enumeration Controlled Loops

These execute code "for" each item in a collection. Some examples in Perl:

my @pets = ("dog", "rat", "fish", "rabbit");
for my $p (@pets) {...}
for my $i (1..10) {...}
for my $count (10,9,8,7,6,5,4,3,2,1,"liftoff") {...}
for my $count (reverse "liftoff", 1..10) {...}
for my $key (sort keys %h) {...}
for my $field (split /:/, $data) {...}
for (1..10) {print $_;}
for (1..10) {print}
print for (1..10)
In Java
Collection<Pet> pets = ........
...
for (Pet p: pets) {        # Works with collections and arrays
    p.eat();               # ... in fact, with any object of
    p.sleep();             # ... of a class implementing Iterable
    ...
}

Ruby has a for-statement, too, but the usual way to iterate is with a cool method that repeatedly yields to its block.

pets = ["dog", "rat", "fish", "rabbit"]

for p in pets          # Okay, but less idiomatic
  puts p.upcase
end

pets.each {|p| puts p.upcase}   # preferred

Sometimes you will want to (or need to) iterate over numbers that index a collection (i.e., you want to find the index of something, or your language doesn't let you iterate over the items directly!)

#Perl
for my $i (0..$#pets) {return $i if $pets[i] eq "rat";}

// JavaScript -- no choice!
for (var i = 0; i < a.length; i++) {
    ... some code with a[i] ...
}

-- Ada
for I in 1..10 loop ........ end loop;
for I in A'First .. A'Last loop ........ end loop;
for I in A'Range loop ........ end loop;

/* C */
for (int* p = a; p; p = p ->next) ...

Loop control modifiers

You can

Exercise: Fill out this list for many more languages.

Examples

# Perl
LINE: while (<STDIN>) {
    next LINE if /^#/;
    next LINE if /^$/;
    ... process $_ ....
} continue {
    count++;
}

Things to note:

Combination Loops

Algol 60 has this:

for i := 1, i+2, while i<10 do ...

Iterators

An iterator is an object that keeps track of where you are during an interation. If using explicit iterators, watch out for "concurrent modification exceptions".

Collection<Pet> pets = ........
...
for (Iterator<Pet> i = pets.iterator(); i.hasNext();) {
    Pet p = i.next();
    p.eat();
    p.sleep();
    ...   # don't modify pets while there is more iteration to do!
}
Exercise: Write a paper, with code snippets, about iterator objects in C++, Python, and Ruby.

Recursion

A recursive subroutines calls itself. Recursion is

Recursive code is

What could go wrong?

(* THIS IS A REALLY STUPID USE OF RECURSION *)
fun fact n = if n < 0 then 1 else n * fact(n-1)

evaluates as

    fact 4
    = 4 * fact 3
    = 4 * (3 * fact 2)
    = 4 * (3 * (2 * fact 1))
    = 4 * (3 * (2 * (1 * fact 0)))
    = 4 * (3 * (2 * (1 * 1)))
    = 4 * (3 * (2 * 1))
    = 4 * 3 * 2
    = 4 * 6
    = 24

Here, all these pieces of partially complete computations take up a lot of space, hurting performance.

Tail Recursion

Tail recursive code can always be implemented efficiently. Contrast that last factorial implementation with this one:

(* Tail Recursive *)
fun gcd x y = if y = 0 then x else gcd y (x mod y)

This evaluation is much cleaner:

    gcd 444 93
    = gcd 93 72
    = gcd 72 21
    = gcd 21 9
    = gcd 9 3
    = gcd 3 0
    = 3

A tail recursive function returns exactly the result of calling itself — no partial results need to be stored. A good compiler can recognize tail recursion and generate code that has no calls at all: just reload the parameters and jump back to the top. It's as if you wrote:

def gcd(x, y)
  while true
    return x if y == 0
    x, y = y, x % y
  end
end

though the tail-recursive code is cleaner, and better because it has no side effects!.

In functional programming the programmer will sometimes have to turn a non-tail-recursive function into a tail recursive one. The idea is to "pass along" arguments (like a counter or partial result) "into" the next call. So factorial can be written

fun fact n =
  let
    fun f i a = if i = n then a else f (i+1) ((i+1)*a)
  in
    f 0 1
  end

which is evaluated like so

    fact 5
    = f 0 1
    = f 1 1
    = f 2 2
    = f 3 6
    = f 4 24
    = f 5 120
    = 120

and Fibonacci like this

fun fib n =
  let fun f i last current =
    if i = n then current else f (i+1) current (last+current)
  in
    f 0 0 1
  end

This yields

    fib 7
    = f 0 0 1
    = f 1 1 1
    = f 2 1 2
    = f 3 2 3
    = f 4 3 5
    = f 5 5 8
    = f 6 8 13
    = f 7 13 21
    = 21

This sure beats the naive implementation, which requires 535,821,591 calls to compute fib(40).

Exercise: Oops, these functions don't handle an initial negative n. Fix them.

Non-Determinacy

Non-deterministic control flow occurs when the next computation step is made randomly (not arbitrarily) from a set of alternatives, something like:

select
    x := 4;
or
    y := 6;
or
    print "Hello";
end;

Generally each of the arms are guarded. To execute a select statement, first the guards are evaluated, then a choice is made among the open alternatives (those with true guards). A missing guard is assumed to be true. Example

select when x > y =>
    y := x * x - 3;
or when not found =>
    print x;
or
    close(f)
or when x % y >= 25 || finished =>
    crash();
end;

What if all guards are false? In Ada, this raises an exception. In other langauges, the statement simply has no effect.

Non-deterministic control flow can hide some asymmetries in code. Some examples:

select when (a >= b) =>
    max := a
or when (a <= b) =>
    max := b
end;
loop
    select when a > b =>
        a := a - b;
    or when b > a =>
        b := b - a;
    or when a = b =>
        return a;
    end;
end;

Non-determinisitic constructs turn out to be very useful in concurrent programming, because random execution paths sometimes help to avoid deadlock.