Names

Why Study Naming?

One of the most important influences on a programming language's usability and readability is its use of names.

What can be named?

Types, variables, constants, labels. blocks, subroutines, fields, methods, packages, modules, classes, operators, threads, tasks, parameters, ...

What can names look like?

Terminology

Binding
The attachment of a name to a entity
Binding Time
The instant a binding is applied
Binding Lifetime
The duration in which the binding is active (Note: the lifetime of the binding can be different from the lifetime of the entity to which the name is bound.)
Scope
The textual area of the code in which the binding is usable
Referencing Environment
The current set of bindings in effect at a given point
Polymorphism
The same name being bound to different entities at the same time
Aliasing
Multiple names being bound to the same entity at the same time

Object Lifetime vs. Binding Lifetime

A Typical Scenario

Typically we see:

  1. Object is created
  2. Binding is created
  3. Over and over again...
    1. The object is accessed through the binding
    2. Binding gets deactivated
    3. Binding gets reactivated
  4. Binding is destroyed
  5. Object is destroyed

Example

int a;                 // 1. Object is created
                       // 2. Name 'a' is bound to it

int main() {
    a = 2;             // 3. Object is accessed through bound name
    f();
    printf("%d\n", a);
    return 0;
}

void f() {
    int a;             // 4. The binding of name 'a' to the global
                       //    object is deactivated.
    ...
}                      // 5. On exit, binding of 'a' to global object
                       //    is reactivated

                       // 6. After main() exits, the binding is destroyed
                       // 7. Finally, the object is destroyed

Objects Outliving Bindings

This happens a lot, and is normal

void f(int& x) {x = 5;}
.
.
.
int y;
f(y);
cout << y;

Bindings Outliving Objects

This is usually a bug...

int* a = malloc(sizeof(int));
*a = 2;
free(a);
*a = 3;  // NOOOOOOO!!!

Object Allocation

There are three kinds of objects

Allocation Type Description Examples Notes
Static Allocated at fixed location in memory Global variables
String and floating point constants
 
Stack (a.k.a. Auto) Allocated and deallocated with subroutine calls and returns Locals
Temporaries
"LIFO"
Heap Dynamically allocated and deallocated Things made with 'new' or 'malloc' Allocation can be implicit or explicit
Deallocation can be implicit or explicit
Watch out for memory leaks and dangling references

An example in C++:

int x = 5;                      // Static

void f() {
    static float y = 0;         // Static
    int z;                      // Stack
    y++;
    print("abcdef");            // Literal probably static
}                               // z deallocated here

void g() {
    int* p = new int;           // p on stack, *p on heap
    .
    .
    .
    delete p;                   // explicit deallocation of *p
    int* q = new int[10];       // q on stack, *q on heap
    .
    .
    .
    h(q);
    .
    .
    .
    delete[] q;
}

void h(int* a) {                // a on stack, *a don't care!
    .
    .
    .
    *a = 5;
    a[6] = 3;
    a = new int;                // doesn't affect old object
}                               // a goes away here, but not *a

int main() {
    f(); f(); g();
}

Exercise: Draw pictures that show what's going on in this example.

Scope

The scope of a binding is the region in a program in which the binding is active.

Static Scope

Static scoping generally goes with block-structure

                    +-----------------------------------------------+
       procedure P  |  (X: Integer) is                              |
     +--------------+                                               |
     |    A, B: Integer;                                            |
     |                +------------------------------------------+  |
     |    procedure Q | (Y: Integer) is                          |  |
     |  +-------------+  +------------------------------------+  |  |
     |  |    function R  |  (X, Y: Integer) return Integer is |  |  |
     |  |  +-------------+    +---------------+               |  |  |
     |  |  |    procedure T   |  is           |               |  |  |
     |  |  |  +---------------+               |               |  |  |
     |  |  |  |    T: Integer;                |               |  |  |
     |  |  |  | begin                         |               |  |  |
     |  |  |  |    ...                        |               |  |  |
     |  |  |  | end T;                        |               |  |  |
     |  |  |  +-------------------------------+               |  |  |
     |  |  |    W: Float;                                     |  |  |
     |  |  | begin                                            |  |  |
     |  |  |    ...                                           |  |  |
     |  |  | end R;                                           |  |  |
     |  |  +--------------------------------------------------+  |  |
     |  |                 +-----------------------------------+  |  |
     |  |    procedure S  |  (Y: Integer) is                  |  |  |
     |  |  +--------------+                                   |  |  |
     |  |  | begin                                            |  |  |
     |  |  |   ...                                            |  |  |
     |  |  | end S;                                           |  |  |
     |  |  +--------------------------------------------------+  |  |
     |  | begin                                                  |  |
     |  |    ...                                                 |  |
     |  |    declare X: Integer; begin ... end;                  |  |
     |  |    ...                                                 |  |
     |  | end Q;                                                 |  |
     |  +--------------------------------------------------------+  |
     |    M: Integer;                                               |
     | begin                                                        |
     |    ...                                                       |
     | end P;                                                       |
     +--------------------------------------------------------------+

The general rule is that bindings are determined by finding the "innermost" declaration.

Interestingly, not everything is obvious:

Static scoping is implemented with static chains or displays. Details in the Compiler Construction course.

Modules

Modules affect static scoping in interesting ways. Wait — what are modules? Modules are structures that

In practice, modules are often used to define data abstractions using information hiding to safeguard its internal state.

Modules are more powerful than the static local variables of C, since modules spread state among several subroutines, not just one.

module Dictionary {
    import capacity, String;
    export put, get, getSize;
    struct Entry {String key; String value;}
    Entry[] entries = new Entry[capacity];
    int size = 0;
    void put(String key, String value) {...}
    String get(String key) {...}
    int getSize() {return size;}
}

// Clients will say things like
//     Dictionary.put("mega", "either 1000000 or 1048576");
//     print(Dictionary.getSize());

In this module

So this module gives us only one dictionary. How can we get more? Answer: export a dictionary type.

module DictionaryModule {
    import String;
    export Dictionary, put, get, getSize;
    struct Entry {String key; String value;}
    struct Dictionary(int capacity) {
        Entry[] entries = new Entry[capacity];
        int size = 0;
    }
    void put(Dictionary d, String key, String value) {...}
    String get(Dictionary, String key) {...}
    int getSize(Dictionary d) {return d.size;}
}

// Clients will say things like
//     DictionaryModule.Dictionary d1, d2, d3;
//     Dictionary.put(d2, "mega", "either 1000000 or 1048576");
//     print(getSize(d3));

This is the module-as-manager style: you compile the module only once, but you can declare many variables of the exported type.

The next technical advance is to make the module itself a type, rather than just a plain encapsulation device. This is the module-as-type style.

module Dictionary {
    import ...
    export ...
    ...
}

// Clients will say things like
//     Dictionary d1, d2, d3;
//     d2.put("mega", "either 1000000 or 1048576");
//     print(d3.getSize());

Some people define a class as a module type that can participate in inheritance relationships.

Open and Closed Scopes

Nested scopes introduce even more questions. Consider

subroutine f {
    int x, y, z;

    subroutine g {
        int a, b, c;
        ...
    }

    module m {
        import x;
        export j;
        int i, j, k;
        subroutine h {print k;}
        ...
    }

    ....
}

In g, are x, y, and z automatically imported or not? Are a, b, and c automatically exported or not? What about in m?

The usual rule is that nothing is ever automatically exported from an inner scope, but for imports the rule is different:

How some languages do things:

LanguageSubroutinesModules
Modula optionally closed closed
Modula 2, 3 open closed
Ada open open
Euclid closed closed
Turing optionally closed ?
Clu closed closed
Exercise: Investigate and write an article on the scoping issues involved in:

Dynamic Scope

With dynamic scoping, the current binding for a name is the one most recently encountered during execution. The following script prints 2, not 1:

our $x = 1;

sub f{
    print "$x\n";
}

sub g {
    local $x = 2;
    f();
}

g();
print "$x\n";

dynamicscope.png

Dynamic scopes tend to require lots of runtime work: type checking, binding resolution, argument checking, etc.

They are also prone to redeclaration problems.

Exercise: Explain this.

The principal argument in favor of dynamic scoping is the way they allow subroutines to be customized by using an "environment variable" style that allows selective overriding of a default (more or less). Scott's example:

our $print_base = 10;

sub print_integer {
    ...
    # some code using $print_base
    ...
}

# So generally we want to use 10, but say at one point
# we were in a hex mood.  We wrap the call in a block like this:
{
    local $print_base = 16;
    print_integer($n);
}
# At the end of the block the old value is essentially restored
Exercise: Show how default parameters make this argument seem not so good.

Managing Names

In a statically scoped language, bindings are kept in a symbol table at compile time. You a free to throw away the names at runtime, unless you want to run the code under a good debugger.

In a dynamically scoped language, you need to keep bindings at run time. The common approaches are to use association lists or a central reference table.

Here's an example we'll use to illustrate each approach.

var i;
var j;
function p(var i) { ... }
function q(var j) {var x; ... p(j); ...}
q();

Let's say main calls q, then q calls p.

Association Lists

The A-list works like this:

alist.png

Generally speaking, scope entry and exit is fast, lookup is slow.

Exercise: Really? What exactly is the entry/exit time proportional to? What exactly is the lookup time proportional to?

Central Reference Table

The central reference table works like this:

crtable.png

Generally speaking, scope entry and exit is slow, lookup is blazingly fast (assuming your hash function is cheap).

Exercise: Really? What exactly is the entry/exit time proportional to?

Finding the Referencing Environment

The referencing environment is the set of bindings active at a given point in time. While this seems easy to figure out, it's complicated by two things: masking and the passing of subroutines.

Masking Declarations

Some languages let you mask earlier declarations in the same scope:

    # Perl                          (* ML *)

    my $x = 3;                      val x = 3;
    sub f {return $x + 5;}          fun f() = x + 5;
    my $x = 2;                      val x = 2;
    print f(), "\n";                f();

Both of these programs print 8:

mask.png

Exercise: Find a language in which the direct translation of the code about would cause 7 to be printed. Write the program.

Subroutines as Arguments

At first glance, passing subroutines as arguments doesn't appear to raise any scoping problems:

fun twice f x = f(f(x));
fun square x = x * x;
fun addSix x = x + 6;
twice square 5;
twice addSix 8;
twice (fn x => x / 2.0) 1.0;
twice (fn x => x ^ "ee");
typedef int fun(int);
int square(int x) {return x * x;}
int addSix(int x) {return x + 6;}
int twice(fun f, int x) {return f(f(x));}
int main() {
    printf("%d\n", twice(square, 5));
    printf("%d\n", twice(addSix, 8));
    return 0;
}
sub twice {
    my ($f, $x) = @_;
    return &$f(&$f(x));   # or $f->$f->(x)
}
sub square {$_[0] * $_[0];}
sub addSix {$_[0] + 6;}
print twice(\&square, 5), "\n";
print twice(\&addSix, 8), "\n";
function twice (f, x) {return f(f(x));}
function square(x) {return x * x;}
function addSix(x) {return x + 6;}
twice(square, 5);
twice(addSix, 8);
with Ada.Text_IO, Ada.Integer_Text_IO;
use Ada.Text_IO, Ada.Integer_Text_IO;

procedure Twice_Demo is
  type Fun is access function(X: Integer) return Integer;

  function Twice(F: Fun; X: Integer) return Integer is
  begin
    return F.all(F.all(X));
  end Twice;

  function Square(X: Integer) return Integer is
  begin
    return X * X;
  end Square;

  function Add_Six(X: Integer) return Integer is
  begin
    return X + 6;
  end Add_Six;

begin
  Put(Twice(Square'Access, 5));
  New_Line;
  Put(Twice(Add_Six'Access, 8));
  New_Line;
end Twice_Demo;

However, what happens when you pass a subroutine g, that is nested within another subroutine f, in which g makes reference to entities declared within f?

Deep and Shallow Binding

Consider the following code (from Scott):

function a(i, p) {
    function b() {document.write(i);}
    if (i > 1) p(); else a(2, b);
}
function c() {}
a(1, c);

Let's trace the execution....

deepshallow.png

The function in blue is defined in the context of the first activation of a in which i is 1, but is called from a context in which i is 2. Some languages will be defined to use one or the other. We say the language uses:

Try the same example in different languages...

sub a {
    my ($i, $p) = @_;
    sub b {print "$i\n";}
    if ($i > 1) {
        &$p();
    } else {
        &a(2, \&b);
    }
}
sub c {}
&a(1, \&c);
fun a(i: int, p: unit->unit) =
    let
        fun b() = print(if i=1 then "Deep" else "Shallow")
    in
        if i > 1 then p() else a(2, b)
    end;
fun c() = ();
a(1,c);
Exercise: Write this example in Lisp, Python, and Ada.
Exercise: Get the Perl example to work using "local" instead of "my" for $i and $p. Did you notice a change from deep to shallow?

Dangling References

In this crazy example, a reference to a subroutine outlives the execution of the scope in which it was declared:

var x = 5;
function a(i) {
    var y = 3;
    function b() {document.write(i + y + x);}
    y = 10;
    i = 2;
    return b;
}
p = a(4);
p();

But it's okay, since JavaScript, like Perl and ML, retain call frames (I guess we can't call them stack frames!) that are still being hung onto by active subroutines. The more static languages would never let you get away with this. How do they prevent it?

Exercise: Rewrite the crazy example in Perl and ML and check the output value against the value written by the JavaScript fragment. Then write the equivalent code in Ada, and make a note of the compile-time error.