Introduction to ML

Overview

ML is really the name for a family of languages; the two important ones are Standard ML and Objective CAML (a.k.a O'Caml or ocaml). These notes only cover Standard ML (the latest version, SML 97).

Links

Features of Standard ML

You will love ML because it:

More reasons: Read this overview of features.

A First Example

The traditional "Hello, world" program:

print "Hello, World\n";

Running Standard ML Programs

There are many implementations of Standard ML, including Moscow ML, Standard ML of New Jersey, MLton, Poly/ML, SML.NET, MLJ and ML Kit. These notes only cover Moscow ML.

You can "run" the code by dropping it into an interactive shell, or by using a compiler to produce executables.

The Interactive Shell

Just type your declarations and expressions into the interactive shell and the system will respond. Here is an example session. (Assume the file hello.sml contains the code above.)

C:\lmu\web\code\ML>mosml
Moscow ML version 2.00 (June 2000)
Enter `quit();' to quit.
- 4;
> val it = 4 : int
- "hello " ^ " there";
> val it = "hello  there" : string
- fun fact n = if n = 0 then 1 else n * fact(n - 1);
> val fact = fn : int -> int
- fact 5;
> val it = 120 : int
- val x = 12 and y = ~5;
> val x = 12 : int
  val y = ~5 : int
- x - y * 2;
> val it = 22 : int
- use "hello.sml";
[opening file "hello.sml"]
Hello, World
> val it = () : unit
[closing file "hello.sml"]
> val it = () : unit
- length [4, 3, 8, 7, 3];
> val it = 5 : int
- quit();

Note how nice the system is.... If you don't specify types, it figures them out for you (as long as what you wrote makes sense). If you type in an expression directly, it turns it into a declaration binding the result to the special name it.

Compiling

The Moscow ML compiler is called mosmlc. It won't compile our "Hello, World" program because it doesn't like standalone expressions: it requires declarations:

val _ = print "Hello, World\n";

Then on Windows you can enter

    mosmlc helloworld.sml -o helloworld.exe

and on Unix you can enter

    mosmlc -standalone helloworld.sml -o helloworld
and you have your executable.

Here's a slightly more interesting "program"

(*
 * reverseargs.sml
 *
 * A little program that writes the command line arguments
 * in reverse, to standard output.  It is longer than necessary
 * because it is a teaching example.
 *)
 
(* Returns the reversal of a list.  Warning: wheel reinvention. *)

fun reverse [] = []
  | reverse (h :: t) = reverse t @ [h];

fun concat_space s = s ^ " ";

(* Prints each command line arg, suffixed with a space. *)

val _ = 
  let 
    val args = CommandLine.arguments()
  in 
    map (print o concat_space) (reverse args);
    print "\n"
  end;

Program Structure

A "program" is a sequence of declarations. Declarations begin with:

  • val         declares a new value binding
  • fun         declares a new function
  • type        introduces a type abbreviation
  • datatype    defines a new type with visible constructors
  • abstype     defines a new type with hidden constructors
  • exception   defines a new exception constructor
  • signature   defines a new signature
  • structure   defines a new structure
  • functor     defines a new function
  • infix       gives infix, left-associative status to an operator
  • infixr      gives infix, right-associative status to an operator
  • nonfix      declares an operator to be prefix
  • local       makes a declaration that uses private local declarations
  • open        allows all items in a structure to be used without qualification

Most program entities (types, exceptions, values, etc.) are defined in structures. For example, the structure Timer contains the type cpu_timer and the functions startCPUTimer and checkCPUTimer, among other things, so you need to write Timer.cpu_timer and Timer.checkCPUTimer, etc. But entities in the package General (which contains types like int, string, and real and functions like <=, not, and mod do not have to be qualified with the package name.

(*
 * timerExample.sml
 *
 * Trivial example of timing.
 *)
 
fun count(n) =
  if (n > 0) then count(n - 1) else ();
  
val _ =
   let
     val t = Timer.startCPUTimer()
   in
     count(10000000);
     print (Time.toString(#usr(Timer.checkCPUTimer(t))) ^ "\n")
   end;

Lexical Matters

Reserved Words

  abstype and       andalso   as     case  do       datatype   else
  end     eqtype    exception fn     fun   functor  handle     if
  in      include   infix     infixr let   local    nonfix     of
  op      open      orelse    raise  rec   sharing  sig        signature
  struct  structure then      type   val   where    with       withtype
  while   (         )         [      ]     {        }          ,
  :       :>        ;         ...    _     |        =          =>
  ->      #

Identifiers

Are either

But reserved words are excluded. And notice no mixing of alphanumeric and symbolic characters! You can't have the identifier |-o-|, though you can have |-*-|.

Comments

Between (* and *) and they can nest!

Types

Here are some of the types in Standard ML

Name Defined in Examples
unit General ()
bool Bool, General true
false
int General, Int 0
~4
999999
0xfa33bc
~0x1fcf5
word General, Word, Word8 0w0
0w4
0wx4ccab
real General, Real, Math 0.7
~0.7
3.32E5
3E~7
~3E~7
char Char, General #"s"
string General, String "hello"
α -> β the type of functions taking an argument of type α and returning a value of type β fn x => x + 1
fn x => fn y => 1 - x / y
record types a built-in type former {name="Masha", age=25}
{re=6.2, im=~3.0}
{1=4, 2="d", 3=false}
α * β abbreviation for records whose fields are numbers starting with 1 (4, "d", false)
α list General, List nil
4::3::12::5::nil
[4, 3, 12, 5]
[ ]
α option General, Option SOME 2
NONE
exn General
(This is a special datatype because new constructors can be added at runtime)
Div
Fail "dunno"
α ref General ref 2
α frag General QUOTE "hello"
ANTIQUOTE 9

Defining your own types

The type declaration does not make a new type, just a type abbreviation.

type intpair = int * int;
type person = {name: string, birthday: date, weight: real};
type simplefun = real -> real;
type text = string;
type intset = bool list;
type coin_flipping_results = bool list;

Here coin_flipping_results and intset are exactly the same type.

You need to use datatype or abstype to create a new type, distinct from any other. Datatypes are defined with constructors.

datatype weekday = Monday | Tuesday | Wednesday | Thursday | Friday;

datatype file_descriptor = Stdin | Stdout | Stderr | Other of int;

datatype 'a tree = Empty
                 | Node of 'a * 'a tree * 'a tree;

datatype Shape = Circle of {radius: real}
               | Rectangle of {height: real, width: real}
               | Polygon of (real * real) list;

abstype 'a stack = Data of 'a list
with
    exception Underflow;
    val newStack = Data []
    fun push (Data s) x = Data (x::s)
    fun pop (Data []) = raise Underflow
      | pop (Data (_::t)) = Data t
    fun top (Data []) = raise Underflow
      | top (Data (x::t)) = x
    fun listToStack s = Data s
    fun stackToList (Data s) = s
end;

The abstract type is just a datatype that hides its constructors. But its probably better to use structures anyway.

Functions

Functions are just regular values who happen to have a type like 'a -> 'b. The syntax of a function is:

    fn pattern => exp ('|' pattern => exp)*

Function calls are straighforward:

(fn x => x + 6) 20;
(fn [] => false | (h::t) => true) [4, 5, ~3];
(fn _ => 0) (4, "dog", [4.5, 1.1], (1,1));
(fn 5 => 0 | x => x) 5;
(fn 5 => 0 | x => x) 12;

Note that each match is tried in order. Order matters, so be careful.

Defining Functions

Because functions are values, you declare them just like any other value. You can take advantage of the fact that each declaration can use declarations that precede it.

val two = 2;
val addSix = fn x => x + two + 4;
val square = fn x:real => x * x;
val magnitude = fn (x,y) => Math.sqrt(square x + square y);
val rec fact = fn n => if n = 0 then 1 else n * fact (n-1);

You can use fun for convenience (it's basically val rec):

fun addSix x = x + 6;
fun square (x:real) =  x * x;
fun magnitude (x,y) = Math.sqrt(square x + square y);
fun fact n = if n = 0 then 1 else n * fact (n-1);

You can use multiple patterns with fun too; it sure makes writing recursive functions easy. And pattern-based definitions are much preferred over if-expressions.

fun fact 0 = n
  | fact n = n * fact(n - 1);

fun pow (0, _) = 1.0
  | pow (n, x) = x * pow(n - 1, x);

fun size Empty = 0
  | size (Node (x, left, right)) = 1 + size left + size right;

fun preorder Empty = []
| preorder (Node (x, left, right)) = [x] @ preorder left @ preorder right;

Currying and Uncurrying

The functions fn (x,y) => x + y and fn x => fn y => x + y are similar. The former is the uncurried version; the latter is curried. Curried functions allow "partial application":

fun add x y = x + y;
fun addSix = add 6;
add 6 5;                    (* returns 11 *)
addSix 5;                   (* also returns 11 *)

Higher Order Functions

A higher order function is one that takes a function as a parameter and/or returns a function as its result. The curried plus function above is a higher order function, because it takes in an integer and returns a function.

Operators

Operators are really just functions that are given precedence and fixity in a infix, infixr or nonfix directive. These are some of the infix operators defined in the standard library. (Note that many of the built-in operators are overloaded, so the specification has "fake types" — wordint means int, word or word8; num means wordint or real; numtxt means num, char or string.)

Precedence Id Type Description Associativity
7 * num * num -> num product L
/ real * real -> real floating-point quotient L
div wordint * wordint -> wordint quotient (rounded toward -infinity) L
mod wordint * wordint -> wordint remainder (of div) L
6 + num * num -> num sum L
- num * num -> num difference L
^ string * string -> string concatenation L
5 :: 'a * 'a list -> 'a list push to head of list R
@ 'a list * 'a list -> 'a list append lists R
4 = 'a * 'a -> bool equal to L
<> 'a * 'a -> bool not equal to L
< numtxt * numtxt -> bool less than L
<= numtxt * numtxt -> bool less than or equal to L
> numtxt * numtxt -> bool greater than L
>= numtxt * numtxt -> bool greater than or equal to L
3 := 'a ref * 'a -> unit assignment L
o ('b*'c) * ('a*'b) -> ('a*'c) composition L
0 before 'a * 'b -> 'a return first argument L

And you can define your own:

fun |-*-| (x, y) = 2 * x + 3 * y;
infix 5 |-*-|;
10 |-*-| 4;                            (* returns 32 *)
1 |-*-| 2 |-*-| 3;                     (* returns 25 *)
1 |-*-| (2 |-*-| 3);                   (* returns 41 *)
5 + 3 |-*-| 6 before 8;                (* returns 34 *)

You can abuse the "built-in" operators:

- infix 2 *;
> infix 2 *
- 4 + 8 * 1 + 4;
> val it = 60 : int
- val op+ = op-;
> val + = fn : int * int -> int
- 5 + 2;
> val it = 3 : int

Modules

Remember, useful programming languages provide modules for

Structures

structure Set =
struct
    datatype 'a set = Items of 'a list;
    val empty = Items [];
    fun add (Items xs) x =
        if List.exists (fn y => y = x) xs then Items xs else Items(x::xs);
    fun remove (Items xs) x =
        Items (List.filter (fn y => y <> x) xs);
    fun cardinality (Items xs) = length xs;
    fun contains (Items xs, x) = List.exists (fn y => y = x) xs;
end;

Example of using the structure

val s = Set.add Set.empty 7;
val s = Set.add s 8;
val s = Set.remove s 2;
Set.cardinality s;
Set.contains (s, 7);
Set.Items [8,8,4];              (**** uggh ****)

Signatures

The previous structure exposes too much information. We can use a signature to restrict the view. We should write:

signature SET =
sig
    type 'a set;
    val empty: 'a set;
    val add: ''a set -> ''a -> ''a set;
    val remove: ''a set -> ''a -> ''a set;
    val cardinality: 'a set -> int;
    val contains: ''a set * ''a -> bool;
end;

Then we say

structure Set :> SET =
struct
    datatype 'a set = Items of 'a list;
    val empty = Items [];
    fun add (Items xs) x =
        if List.exists (fn y => y = x) xs then Items xs else Items(x::xs);
    fun remove (Items xs) x =
        Items (List.filter (fn y => y <> x) xs);
    fun cardinality (Items xs) = length xs;
    fun contains (Items xs, x) = List.exists (fn y => y = x) xs;
end;

Now Set.Items doesn't exist! We can now even say

signature READ_ONLY_SET =
sig
    type 'a set;
    val empty: 'a set;
    val cardinality: 'a set -> int;
    val contains: ''a set * ''a -> bool;
end;

structure ReadOnlySet :> READ_ONLY_SET = Set;

now ReadOnlySet.empty exists, but ReadOnlySet.remove does not.

Functors

If we change the set representation from a list to a binary search tree, we would have to specify how the set elements were to be ordered. This "less than" relation is an input to the structure. The way to parameterize structures in SML is with functors.

functor MakeSet (eqtype t; val lt: t * t -> bool):
    sig
        type ''a set
        val empty: t set
        val add: t set -> t -> t set
        val remove: t set -> t -> t set
        val cardinality: t set -> int
        val contains: t set * t -> bool
    end =
struct
    datatype ''a set = Empty | Node of ''a * ''a set * ''a set
    val empty = Empty;
    fun add Empty x = Node (x, Empty, Empty)
      | add (s as Node(y, left, right)) x =
          if (x = y) then s
          else if lt(x, y) then Node(y, add left x, right)
          else Node(y, left, add right x);
    fun remove s x = raise (Fail "remove not implemented");
    fun cardinality Empty = 0
      | cardinality (Node(_, left, right)) =
          1 + cardinality left + cardinality right;
    fun contains (Empty, x) = false
      | contains (Node(y, left, right), x) =
          if lt(x, y) then contains(left, x)
          else if lt(y,x) then contains(right, x)
          else true;
end;

Then

structure IntSet = MakeSet(type t = int; val lt = op<);

val s = IntSet.add IntSet.empty 8;
IntSet.contains(s, 7);
...

The Standard Basis Library

The following structures comprise the Standard Basis Library

Arraymutable constant-time-access arrays
Array2two-dimensional arrays
Binarysetbinary tree implementation of finite sets
BoolBooleans
Bytecharacter-byte conversion
Charcharacters
CharArrayarrays of characters
CharVectorvectors of characters (= strings)
CommandLineprogram name and arguments
Datemanipulation of calendar dates
FileSysinteraction with the file system
Generalvarious top-level primitives
Intoperations on integers
Listclassic list manipulation functions
ListPairoperations on pairs of lists
Mathtrigonometric functions etc.
OSoperating system information
Optionpartial functions
Pathfile-system independent path manipulation
Processmanipulating processes
Realarithmetics on floating-point numbers
SignalUnix signals
SML90top-level compatibility with SML'90
Stringstring manipulation
StringCvtconversion to and from strings
Substringmanipulation of constant-time substrings
TextIOtext input-output streams (imperative)
Timetime points and durations
Timermeasuring real time and cpu time
Unixstarting concurrent subprocesses under Unix
Vectorimmutable constant-time-access vectors
Wordwords (31-bit unsigned integers)
Word8bytes (8-bit unsigned integers)
Word8Arrayarrays of bytes
Word8Vectorvectors of bytes

Different installations will add many more structures. For example Moscow ML adds Arraysort, Binarymap, BinIO, Dynarray, Dynlib, Intset, Meta, Mosmlcgi, Mysql, Regex, Signal, Splaymap, Gdimage, among others.

The Moscow ML Library Reference is online.