Types

Why Types?

Type Systems

In programming language theory, a language's type system defines:

  1. A set of basic types
  2. A mechanism for defining new types
  3. Rules for type equivalence
  4. Rules for type compatibility
  5. Rules for type inference
  6. Rules for handling type clashes (a.k.a type checking)

Kinds of Types

Languages will differ as to which types are built-in, and which have to be defined using definition mechanisms. Some kinds of types are:

Type
    Data
        Simple
            Discrete
                Ordinal
                    Integer
                        Signed
                        Unsigned (Modular)
                    Character
                    Boolean
                    Other Enumeration
                Pointer
            Real
                Floating-Point
                Fixed-Point
        Composite (Aggregate)
            String
            Array
            List
            Set
            Bag
            Hierarchy
            Graph
            Dictionary
            Stack
            Queue
            Priority Queue
            Record
            Union
            Class
            File
    Computational
        Synchronous
            Procedure
            Function
            Anonymous Block of Code
        Asynchronous
            Process
            thread (Task)

Example

Java has eight built-in types: boolean, byte, char, short, int, long, float, double. All other types are defined with the array or class definition mechanism.

Type Operators

Some languages allow type "operators" to make new types. Examples in ML include "*" and list. Users can define their own:

7                                     int
(4, "dog")                            int * string
[5, 3, 2, 7]                          int list
[(3, 4.0),(1, 5.5)]                   (int * real) list
[ [], [3,3,3], [], [1] ]              int list list
[5, [2.2, 1.0E~4]]                    int * real list

datatype primary_color = Red | Blue | Green

Red                                   primary_color
(Red, 4)                              primary_color * int
fn x => x + 4                         int -> int
fn x => fn y => x ^ Int.toString(y)   string -> int -> string

Type Equivalence

Question:

When are two types the same?

Several ways to answer this. Consider:

typedef struct {
   int a;
   int b;
} Point;

typedef struct {
   int a;
   int b;
} Pair;

Point x;
Pair y;

Do x and y have the same type? We could say yes (because they have the same structure), or no (because their types have different names, and furthermore appear in different declarations).

Structural Equivalence Name Equivalence
Check equivalence by expanding structures all the way down to basic types Strict: Every type declaration defines a new type
Loose: Factor out aliases

Example (from [Scott])

type alink = pointer to cell;
subtype blink = alink;
p, q : pointer to cell;
r    : alink;
s    : blink;
t    : pointer to cell;
u    : alink;

-- If structural: [p q r s t u]

-- If strict name: [p q] [r u] [s] [t]

-- If loose name: [p q] [r s u] [t]

Loose name equivalence is pretty common: we like to distinguish things like Point and Pair above but want the flexibility of aliasing. In Ada we can do both explicitly:

type Height is new Integer;
type Weight is new Integer;
-- Can't add heights and weights

subtype Height is Integer;
subtype Weight is Integer;
-- Now you can freely mix them

ML seems to use structural equivalence for anonymous types and also for types named in a type declaration:

type pair = int * int;
type point = int * int;
val x: point = (1, 4);
val y: pair = x;
type person = {name: string, age: int};
type school = {name: string, age: int};
val p: person = {name="Alice", age=22};
val s: school = p;

that's because "type" does not define a new type! Only a datatype or abstype declaration creates a new type.

abstype person = P of string * int with
  fun new_person (s, i) = P(s, i)
  fun name (P(s, i)) = s
  fun age (P(s, i)) = i
end;

Type Compatibility

Question:

When can a value of type A be used in a context that expects type B?

Possible answers:

Example

    int a;
    real b;
    real c;
    c = a + b;

Is this

A language that allows this is said to coerce ints to reals, and we say "int is compatible with real".

Definitions

Type Conversion
Explicit operation that takes in an object of one type and returns an object of a different type that has the "same value" (not necessarily the same bit pattern).
Type Coercion
Implicit
Non-converting Type Cast
"Reinterpret" an object as an object of another type by preserving its bit pattern, regardless of value.

Type Inference

Question:

How do we determine the type of an expression?

Answer: Look at all the types of the primitive expressions, then at the types of arguments that the subroutines and operators expect, and work your way out to the whole expression.

Example (from [Scott])

1   fun fib (n) =
2       let fun fib_helper (f1, f2, i) =
3           if i = n then f2
4           else fib_helper (f2, f1+f2, i+1)
5       in
6           fib_helper (0, 1, 0)
7       end;

What kinds of things complicate type inference?

Type Variables

Here are some functions and their types

fun I x = x;                              'a -> 'a
fun p x y = y                             'a -> 'b -> 'b
fun first (x, y) = x;                     'a * 'b -> 'a
fun q x = (hd o hd) x;                    'a list list -> 'a
fun c x y = if x = y then "y" else "n";   ''a -> ''a -> string

(In ML, a type variable beginning with two primes can only be instantiated with a type that admits equality.)

Type Checking

Question:

What if you use an expression in a manner inconsistent with its type, like trying to compute the age of a string (instead of a person). Should the evaluation result in an error, or return a "best guess"?

Strong Typing vs. Weak Typing

Strong TypingWeak Typing
Type clashes among unrelated types result in errors. they may be compile time, or even run time (e.g. NoSuchMethodError, or ClassCastException) but they are errors. Type clashes don't really exist... the language implementation will try to cast the argument into some reasonable (!) type and carry out the operation.
Alternate definition: you can't really circumvent the type system. Example: In Java, even if you explicitly add a cast to cast a string to an int, you get an error. Alternate definition: you can easily circumvent the type system. Example: In C++, you can cast a pointer to an int to a void*, then to a string*, and you get away with it.

Static Typing vs. Dynamic Typing

Static TypingDynamic Typing
Type checking done at compile time

Once a variable's type is known, it can only be assigned expressions of that type (including related types that can be coerced to it, of course).

Type checking done at run time

Because checking is deferred until run-time a variable can be assigned an expression of one type and then later an expression of another type.

var x;
x = 2;
print x + 5;
x = "dog";
print concat(x, "house");

Manifest Typing vs. Implicit Typing

Manifest TypingImplicit Typing
Types of variables must be given in their declarations.
void f(int x, list<int> z) {
  int y = x * x;
  z.push_back(y);
}
the types of variables will be inferred from context
fun f x z =
  let y = x * x in z @ [y] end
x must be an int because of the "*" operator, so y must be an int as well, and then z must be an int list, and finally f must be int -> int list -> int list.

Examples

Unification in type checking

Suppose f has type 'a * int -> 'b * 'b -> string. then the following expressions are legal

    f(4, 5)("sd","de")
    f(1,1)(2,2)
    f([],5)([],[4,4])

but these are not

    f(3,2)(4,"p")
    f((3,2),5)([5.5,2.2],[1,6])

Links

the literature on strong/weak, static/dynamic, and manifest/implicit typing is enormous, as are the debates and flame wars. Some good reading:

Overview of Common Types

Numeric Types

Enumerations

Subranges

Records

Unions

Arrays

Usually by array we mean a collection of elements indexed by integers or some other enumerated type, that has constant time access of any element given its index. Contrast this with a list, which needn't have constant time access, and with a map (a.k.a dictionary or associative array), which can be indexed by anything, not just integers or related enumerated types.

Array Bounds

Lifetime and Shape

  Static Bounds Bounds Set at Elaboration Bounds Can be Changed
(Dynamic Shape)
Global
Lifetime
C globals
Local
Lifetime
C locals Ada locals
Arbitrary
Lifetime
Java Perl, APL

Implementation

Slices

Several languages (Perl, APL, Fortran 90) have nice syntax for slices. Fortran 90 allows (these examples are from Scott):

    matrix(3:6, 4:7)        columns 3-6, rows 4-7
    matrix(6:, 5)           columns 6-end, row 5
    matrix(:4, 2:8:2)       columns 1-4, every other row from 2-8
    matrix(:, /2, 5, 9/)    all columns, rows 2, 5, and 9

Strings

Sets

Pointers

A pointer is, more or less, an object through which you reference some other object.

    int x = 5;
    int* p = &x;
    int* q = NULL;
    int* r = new int;
    int* s = new int[100];

    cout << *p << *r << s[20];
    cout << *q; // crash

Basics

Reference and Dereference Syntax

Can be always explicit, always implicit, or sometimes-implicit (like in Ada).

pointer.png

If the pointer is called p (or $p in Perl), then

Inthe referent is calledand the field is called
C*p(*p).x or p->x
Adap.allp.all.x or p.x
Modula-2p^p^.x
Javapp.x
Perl%$p$$p{x} or $p->{x}

If the referent is called p (or %p in Perl), then

Inthe pointer is calledand the field is called
C&pp.x
Adap'accessp.x
Perl\%p$p{x}

Dangling Pointers

Only a problem in languages td at require/allow the programmer to explicitly deallocate.

Can there be run time checks to ensure pointers cannot outlive referenced objects?

Exercise: Research how tombstones are used.

Garbage Collection

Covered separately. For now, you can read Wikipedia's article on garbage collection, and an interesting developerworks article on Java performance and garbage collection.

Pointers and Arrays in C

this can get confusing. After all,

    e1[e2] is the same as *(e1 + e2)
For definitions these are completely different things:

    int *x;           /* is totally different from: */
    int x[100];

    int *a[n];        /* is totally different from: */
    int a[n][100];

But for declarations, at least in parameter declarations, you can blur the distinction:

    void f(int* a) { ... }
    void g(int b[]) { ... }

An array of ints, or pointer to an int, can be passed to either.

Streams

A stream

Subroutines

Subroutines are often objects with their own types. We'll see these later when we discuss subroutines.

Processes and threads

Will be covered later when we discuss concurrency

Orthogonality

Can an object of any type...

Are types themselves objects?

Issues in Orthogonality