Subroutines

What is a Subroutine?

Example in C

/* A subroutine declaration */
int play(char* filename, double frequency, int times, audio_stream* stream) {
    ...
}

/* A subroutine call */
play("happy_birthday.mid", 35, 2, &s);

Subroutine Signatures

A subroutine's signature specifies the number, order, names, modes and types of its parameters and return values. Example:

// Declaration specifies the subroutine name and its signature
sub f(p1: m1 T1, p2: m2 T2, ... pn: mn Tn) returns T0;

// Here is an example call
T0 x = f(e1, e2, ... ek);

Signature Matching

How do we know if a call matches a signature?

One very common rule is:

k=n and the type of each ei must be compatible with the corresponding Ti

but many languages have many different rules.

Parameter Association

How do we know which argument goes with which parameter? Many answers to this question!

Positional and Named Association

Arguments are usually associated with parameters positionally, but in Ada you can name the parameters in the call:

procedure P(X: Integer; Y: Float; C: Duration; B: Boolean);
...
P(X => 4, B => False, C => 4.0, Y => 35.66);
P(B => True, C => 12.0, Y => -11.4 * Z, X => 1);
P(C => 12.0, Y => -11.4, B => Found or not Odd(Q), X => 1);
P(8, 4.2, B => True, C => Stop_Time - End_Time);
...
procedure Push(On_The_Stack: Stack, The_Value: Character);
...
Push(On_The_Stack => S, The_Value => '$');
Push(The_Value => '^', On_The_Stack => T);

In some languages you can "fake" named association by passing in hash literals:

>> def f(h); puts h[:x], h[:y]; end
=> nil
>> f(:y=>3, :x=>2)
2
3
=> nil

Overloading

With overloaded subroutines the compiler has to figure out which subroutine is being called; this is called overload resolution.

How to deal with ambiguity? Is the return type considered? Are the modes? How about the number of arguments, when variadic subroutines are in the mix? Should we search in outer scopes, even when subroutines of the same name are "closer" to the call? Should ambiguities be detected at the point of call, or at the point of declaration?

Default Parameters

A kind of overloading, actually.

-- Ada
function F(X: Integer; Y: Integer := 1; Z: Integer := 0) return Integer is
begin
  return X * Y + Z;
end F;

-- F(4, 3, 9) returns 21
-- F(4, 3) returns 12
-- F(4) returns 4
// C++
int f(int x, int y = 1, int z = 0) {
    return x * y + z;
}

// f(4, 3, 9) returns 21
// f(4, 3) returns 12
// f(4) returns 4
; Common Lisp
(defun f (x &optional (y 1) (z 0)) (+ (* x y) z))

; (f 4 3 9) returns 21
; (f 4 3) returns 12
; (f 4) returns 4

You fake them in Perl because all the arguments are passed in one big list:

sub f {
    die "At least one argument required" if not @_;
    my $x = shift;
    my $y = @_ ? shift : 1;
    my $z = @_ ? shift : 0;
    return $x * $y + $z;
}

Some languages have de facto defaults -- if you don't call the subroutine with enough arguments, the "missing" parameters automatically get some kind of undefined or null value.

Exercise: For languages that do this, identify the exact default value. Write sample code to illustrate.

Variadic Subroutines

A variadic subroutine can accept an arbitrary number of parameters.

In Lisp, you make a function variadic by marking the last parameter with &rest.

(defun f (x &rest y) (* x (apply #'+ y)))

; (f 2) means x=2 and y=()
; (f 3 5) means x=3 and y=(5)
; (f 2 8 7 5 3 4) means x=2 and y=(8 7 5 3 4)

In Perl, all subroutines are variadic, since all arguments are collected into the array @_.

sub f {
    my $x = shift;
    my $result = 0;
    $result += $_ for (@_);
    return $x * $result;
}

f 2 == 0 or die;
f 3, 5 == 15 or die;
f 2, 8, 7, 5, 3, 4 == 54 or die;

In Java, you can mark the last parameter of a method with "..." to mean that it is really an array and the caller can pass in a variable number of arguments without the clunkiness of putting them in an array.

public static int f(int x, int... y) {
    int result = 0;
    for (int a : y) result += a;
    return x * result;
}

; f(2) means x=2 and y=new int[]{} and returns 0
; f(3,5) means x=3 and y=new int[]{5} and returns 15
; f(2,8,7,5,3,4) means x=2 and y=new int[]{8,7,5,3,4} and returns 54

In C, you need the stdarg.h library module and a lot of code.

#include <assert.h>
#include <stdarg.h>

/*
 * f(k, x, y1, y2, y3, ..., yk) returns the value of
 * x * sum(y1 ... yk).  Note that because this is C,
 * you have to pass in the argument count!
 */
int f(int k, int x, ...) {
    int result = 0;
    va_list ap;
    va_start(ap, x);             /* x is the param just before the extras */
    while (k--) {                /* simple count down */
        int y = va_arg(ap, int);
        result += y;             /* sum up the extras */
    }
    va_end(ap);
    return x * result;
}

int main() {
    assert( f(2, 2, 5, 9) == 28 );
    assert( f(3, 2, 1, 6, 2) == 18 );
    assert( f(0, 2) == 0 );
    assert( f(1, 2, 50) == 100 );
    assert( f(9, 4, 1, 2, 3, 2, 1, 1, 1, 1, 5) == 68 );
}

Function Returns

Some design considerations:

Parameter Passing

There are many ways parameter passing can be defined. In:

subroutine f (x) begin ... end f;
...
f (y);
  1. [Semantic] Can x be written to in the body of f, and if so, will the change be made in y, and if y does change, does the change happen right away or only after f returns?

  2. [Pragmatic] What exactly happens at the call? Is a copy of y made and passed to x? Are x and y to be aliases? Is the "address" of y passed? Is a block of code to evaluate y passed?

Common mechanisms

Scott's table

  Implementation Allowable operations
on parameter
Can argument change? Aliasing?
in any R No Maybe
out any W YesMaybe
in out any RWYesMaybe
value value RWNo No
value-resultvalue RWYesNo
sharing any RWYesYes
reference addressRWYesYes
name closureRWYesYes

Generic Subroutines

When calling a subroutine, you pass an object. What if you wanted to pass a type?

Not an issue in ML (polymorphism), Lisp (dynamic typing).

In C++ and Ada the compiler generates code for every instance of a "generic" subroutine. In C++ each actual subroutine is implicitly generated when needed:

// ---------------------------------------------------------------------------
// mintemplate.cpp
//
// This program illustrates the declaration of a template function and how
// it can be called with many different types, including user-defined ones.
// ---------------------------------------------------------------------------

#include <iostream>
using namespace std;

template <class T>
T minimum(T x, T y) {
    return x < y ? x : y;
}

class A {
    int x;
    int y;
public:
    A(int x, int y): x(x), y(y) {}
    bool operator<(A other) {return x + y < other.x + other.y;}
    friend ostream& operator<<(ostream& s, A a) {
        return s << "(" << a.x << "," << a.y << ")";
    }
};


// Show it off

int main() {
    cout << minimum(4, 5) << "\n";
    cout << minimum(3.3, 4.1) << "\n";
    cout << minimum("dog", "bat") << "\n";
    cout << minimum(A(1,3), A(6,0)) << "\n";
    cout << minimum(A(5,5), A(6,0)) << "\n";
    cout << minimum(A(9,1), A(2,1)) << "\n";
    cout << minimum(A(2,3), A(-6,0)) << "\n";
    return 0;
}

But in Ada you have to implicitly instantiate them:

------------------------------------------------------------------------------
-- generic_min_demo.adb
--
-- This program illustrates the declaration of a generic function and how
--/ it can be called with many different types, including user-defined ones.
------------------------------------------------------------------------------

with Ada.Text_IO, Ada.Integer_Text_IO, Ada.Float_Text_IO;
use Ada.Text_IO, Ada.Integer_Text_IO, Ada.Float_Text_IO;

procedure Generic_Min_Demo is

  -- Define a generic Minimum function callable on any type.
  -- The second parameter is a default parameter.  Cool, eh?

  generic
    type T is limited private;
    with function "<"(X: T; Y: T) return Boolean is <>;
  function Minimum(X: T; Y: T) return T;

  function Minimum(X: T; Y: T) return T is
  begin
    if X < Y then
      return X;
    else
      return Y;
    end if;
  end Minimum;

  -- Make a user defined type called A, with "<" and Put operations.

  type A is record
    X: Integer;
    Y: Integer;
  end record;

  function "<"(X: A; Y: A) return boolean is
  begin
      return X.X + X.Y < Y.X + Y.Y;
  end "<";

  procedure Put (X: A) is
  begin
    Put ("(");
    Put (X.X, 0);
    Put (",");
    Put (X.Y, 0);
    Put (")");
  end Put;

  -- Instantiate a few

  function Min_Integer is new Minimum(Integer);
  function Min_Float is new Minimum(Float);
  function Min_A is new Minimum(A);

  -- Show off the calls

begin
  Put(Min_Integer(4, 5), 0); New_Line;
  Put(Min_Float(3.3, 4.1), 0); New_Line;
  Put (Min_A(A'(1,3), A'(6,0))); New_Line;
  Put (Min_A(A'(5,5), A'(6,0))); New_Line;
  Put (Min_A(A'(9,1), A'(2,1))); New_Line;
  Put (Min_A(A'(2,3), A'(-6,0))); New_Line;
end Generic_Min_Demo;
Exercise: Fill the inside of the C++ template with lots of syntax errors and show that it compiles anyway. Experiment with the Ada version to see if you can do a similar dirty deed.

Here's an example of a Java generic method:

    public static <T, S extends T> boolean in(S x, T[] a) {
        for (T y : a) {
            if (y.equals(x)) return true;
        }
        return false;
    }

Subroutines as Values

In some languages, a subroutine s may be "just another object," like an integer, string or list. If so, we can ask these questions:

  1. Does s have a type and if so, what is it, and what are the type equivalence and compatibility rules?
  2. Can s be assigned to a variable?
  3. Can s be passed as an argument to a subroutine?
  4. Can s be returned from a subroutine?
  5. Can we generate a new (unnamed) subroutine on the fly, at run time?

In ML, Lisp, Scheme, Haskell and other "functional" languages, then answer to all 5 questions is definitely yes.

Other languages answer many of these questions no, but others might say the answer is technically no but allow references to subroutines everywhere. However the answer to question 5 is often no. Except in Perl:

# Some higher order functions in Perl

use strict;
use warnings;

# ----------------------------------------------------------------------------
# compose($f, $g) returns the function which is the composition
# of &$f and &$g.
# ----------------------------------------------------------------------------

sub compose {
  my ($f, $g) = @_;
  return sub { $f->($g->($_[0])); }
}

# ----------------------------------------------------------------------------
# twice($f) returns the function which applies &$f to an argument twice.
# ----------------------------------------------------------------------------

sub twice {
    return compose($_[0], $_[0]);
}

# ----------------------------------------------------------------------------
# TESTING
# ----------------------------------------------------------------------------

sub square { $_[0] * $_[0]; }
sub addSix { $_[0] + 6; }

my $fun = compose(\&square, \&addSix);

$fun->(9) == 225 or die;
twice(\&square)->(3) == 81 or die;

Closures

A closure is a subroutine that refers to variables defined in an enclosing scope.

function counterFrom(start) {
    var number = start;
    return function(increment) {
        number += increment;
        return number;
    }
}

var c1 = counterFrom(20);
var c2 = counterFrom(100);
alert(c1(2));
alert(c1(5));
alert(c2(3));
alert(c1(4));
alert(c2(1));

They are interesting when the enclosing scope itself is a function, because when the enclosing function goes out of scope, its local variables must be retained because instances of the inner function might still need them! This is both powerful, but you have to be careful not to fill up memory since a lot of stuff becomes ineligible for garbage collection.

Study:

Exercise: Rewrite Martin Fowler's Ruby examples in JavaScript and ML.

Implementation Issues

Activation Records (Stack Frames)

Each invocation of a subroutine has a record associated with it at runtime, holding some or all of the following:

Note that some of the stack frame may be in registers and some of it may be in memory.

Activation records are also called stack frames. When the runtime supports closures or concurrency, something like a spaghetti stack is required.

Calling Conventions

Calling conventions specify:

Example: C on the IA-32

The C-calling convention on the IA-32 is:

int example(int x, int y) {
  int a, b, c;
  ...
  f(y, a, b, b, x);
  ...
}

The prolog/epilog for this function would look like:

	push	ebp			; must save old ebp
	mov	ebp, esp		; point ebp to this frame
	sub	esp, ___		; make space for locals
	...
	mov	esp, ebp		; clean up locals
	pop	ebp			; restore old ebp
	ret

with stack frame

                +---------+
         ebp-12 |    a    |
                +---------+
         ebp-8  |    b    |
                +---------+
         ebp-4  |    c    |
                +---------+
         ebp    | old ebp |
                +---------+
         ebp+4  | retaddr |
                +---------+
         ebp+8  |    x    |
                +---------+
         ebp+12 |    y    |
                +---------+

Example: SPARC

The SPARC uses register windows

! ----------------------------------------------------------------------------
! writeint
!
! A small program that contains a function to write an unsigned integer value
! to standard output.
! ----------------------------------------------------------------------------

        .global	_start

        .text

! writeint(unsigned int n) writes n to standard output as text.  It is
! inefficient since it is recursive and invokes a system call to write
! each digit.
!
! It is equivalent to the C function
!
!   void writeint(unsigned n) {
!       if (n >= 10) writeint(n / 10);
!       putchar(n + '0');
!   }

writeint:
	save	%sp, -96, %sp

	mov	10, %l0			! Keep 10 in %l0
	cmp	%i0, %l0		! Is it less than 10?
	bl	writedigit              ! If so just write the digit
	nop
	mov	%g0, %y
	udiv	%i0, %l0, %l1		! n / 10 --> %l1
	umul	%l1, %l0, %l2		! n * (n/10) --> %l2
	sub	%i0, %l2, %l2		! n mod 10 --> %l2

	call	writeint
	mov	%l1, %o0		! call writeint(n / 10)

	mov	%l2, %i0		! Put digit into %i0 before writing

writedigit:				! Writes the digit in %i0
	add	48, %i0, %l1		! Character value of the digit
	set	digit, %o1
	stb	%l1, [%o1]		! Must be in memory to write
	mov	4, %g1			! System call write
	mov	1, %o0			! stdout
	mov	1, %o2			! number of characters to write
	ta	0

	ret
	restore

! A starting point for the program that just makes a couple of calls
! to the writeint function.

_start:
	call	writeint		! writeint(5)
	mov	5, %o0

	call	writeint		! writeint(1312)
	mov	1312, %o0

        sethi	%hi(7308841), %o0	! writeint(7308841)
	call	writeint
	or	%o0, %lo(7308841), %o0

	mov	1, %g1
	ta	0

	.data
	.align	4
digit:	.skip	4

Leaf Subroutines

A leaf subroutine makes no calls. A good compiler can generate code to make a leaf subroutine's activation use the stack frame of its caller. The SPARC has direct support for this.

! write_stars(n) writes n stars followed by a newline

write_stars:
	mov	%o0, %l0        ! save the argument in %l0
	mov	4, %g1          ! syscall write
	mov	1, %o0          ! stdout
	set	star, %o1       ! message to write
	mov	1, %o2          ! one character in message
write_star:
	tst	%l0             ! no more characters to write
	be	write_newline   ! write terminating newline if so
	nop
	ta	0
	ba	write_star      ! go see if more to write
	dec	%l0             ! count off one more char as written
write_newline:
	set	newline, %o1
	retl
	ta	0               ! write the newline (before returning)
star:
	.ascii	"*"
newline:
	.ascii	"\n"

Inlining

A compiler inlines a subroutine call when it replaces the call with the body of the called subroutine (with all argument-parameter bindings resolved, of course). If all calls are inlined, there is no need to generate separate code for the subroutine. Inlining should be faster in general, since there is no call/return overhead (which is substantial for short subroutines). But if there are thousands of such calls you can end up with more compiled code.

And compilers have to be careful when inlining recursive subroutines....

Non Local Lexical Variables

Given

function f(a, b) {
    var x = 1;

    function g(a, y) {
        var z = 3;

        function h(z) {
            print x;
        }

        function k(c) {
            var x = c;
            if (x < 5) k(c+1); else h(1);
        }

        if (a == 0) g(1, 0); else k(2);
    }

    g(0, 0);
}

The call sequence goes f → g → g → k → k → k → k → h. What gets printed? It depends on whether we have static or dynamic scoping.

If dynamically scoped, use a-lists or central reference tables.

If statically scoped, use static links (or displays). In this case we rely on the compiler to generate code that says:

"To find x from the body of k, take two steps down the static chain and look in offset -4".

statdynlink.png