The Language Iki

1  Introduction

Iki is a small language, which is why it is not named Nui. This document defines the language Iki. The definition has the usual format; first we define which strings are Iki programs (and which are not), and for those that are, we define what those strings mean.

2  Syntax

The source of an Iki program is a Unicode string. A string s is an Iki program if and only if its tokenization is derivable from the grammar in Section 2.2 below.

2.1 Microsyntax

A token in Iki is one of:

A skip is one of:

Tokens and skips are formally defined as follows:

    KEYWORD   →  'begin' | 'end' | 'var' | 'read' | 'write' | 'while' | 'loop'
    ID        →  [\p{L}] [\p{L}\p{Nd}'_']* - KEYWORD
    INTLIT    →  [\p{Nd}]+
    SYMBOL    →  [+\-*/=,;()]
    SKIP      →  [\x20\x09\x0A\x0D] | '--' [^\x0A\x0D]* [\x0A\x0D]
    TOKEN     →  INTLIT | ID | KEYWORD | SYMBOL

The tokenization of a string of characters is the token sequence t1 t2 ... tn whenever the string greedily matches

    SKIP* t1 SKIP* t2 SKIP* ... tn SKIP*

Recall that "greedily matches" implies "--" always starts a comment and is never a sequence of two "-" tokens, "xy" is always a single identifier token and never a sequence of two identifier tokens. and "vary=2" is a three-token sequence (not one, two, four, five, or six).

Example: The following character string:
    -- This doesn't write hello world
    begin  -- My first program
      var x  ;       var y;
           -- uggh: lousy indentation   and     SPACing

         while y - 5 loop
         var y;
                      read x ,y;
      x = 2 * (3+y);
        end;
          write 5;    -- Not sure why five, but okay.
    end

(uniquely) tokenizes into

begin var ID(x) ; var ID(y) ; while ID(y) - INTLIT(5) loop var ID(y) ; read ID(x) , ID(y) ; ID(x) = INTLIT(2) * ( INTLIT(3) + ID(y) ) ; end ; write INTLIT(5) ; end

Any string of characters that cannot be tokenized is not an Iki program.

2.2  Macrosyntax

The following grammar completely defines those token strings which are syntactically valid Iki programs:

    PROGRAM   →  'begin' BLOCK 'end'
    BLOCK     →  (DEC ';')* (STMT ';')+
    DEC       →  'var' ID
    STMT      →  ID '=' EXP
              |  'read' ID (',' ID)*
              |  'write' EXP (',' EXP)*
              |  'while' EXP 'loop' BLOCK 'end'
    EXP       →  TERM (ADDOP TERM)*
    TERM      →  FACTOR (MULOP FACTOR)*
    FACTOR    →  INTLIT | ID | '(' EXP ')'
    ADDOP     →  '+' | '-'
    MULOP     →  '*' | '/'

2.3  Abstract Syntax

The function Ψ defined below, maps legal Iki token sequences to Iki abstract syntax tree fragments.

    Ψ[begin b end] = (Program Ψb)
    Ψ[d1 ';' ... dn ';' s1 ';' ... sn ';'] = (Block Ψd1 ... Ψdn Ψs1 ... Ψsn)
    Ψ[var ID(i)] = (Var i)
    Ψ[ID(i) '=' e] = (Assign (Varref i) Ψe)
    Ψ[read ID(i1), ... ID(in)] = (Read (Varref i1) ... (Varref in))
    Ψ[write e1, ... en] = (Write Ψe1 ... Ψen)
    Ψ[while e loop b end] = (While Ψe Ψb)
    Ψ[t0 o1 t1 ... on tn] = (Ψon (Ψon-1 (... (Ψo1 Ψt0 Ψt1) ...) Ψtn-1) Ψtn)
    Ψ[f0 o1 f1 ... on fn] = (Ψon (Ψon-1 (... (Ψo1 Ψf0 Ψf1) ...) Ψfn-1) Ψfn)
    Ψ[INTLIT(n)] = (Intlit n)
    Ψ[ID(i)] = (Varref i)
    Ψ['+'] = Plus
    Ψ['-'] = Minus
    Ψ['*'] = Times
    Ψ['/'] = Divide
Example: The token sequence from Section 2.1 is translated into the following abstract syntax tree.
(Program
  (Block (Var x) (Var y)
    (While (Minus (Varref y) (Intlit 5))
      (Block
        (Var y)
        (Read (Varref x) (Varref y))
        (Assign (Varref x) (Times (Intlit 2) (Plus (Intlit 3) (Varref y))))
      )
    )
    (Write (Intlit 5))
  )
)

ikiast.png

3  Semantics

This section defines the semantics of Iki. As is customary, semantics generally has a static and a dynamic part; the static semantics describes rules that can be checked "at compile time" but that are impossible (or ridiculously inconvenient) to put into a context free grammar. The dynamic part defines what the program actually does when it runs. This is done on a macro as well as a micro level.

3.1  Static Semantics

In Iki, like almost every other programming language, we can distinguish between defining occurrences and using occurrences of identifier. One aspect of the Iki semantics is mapping each using occurrence into the actual variable being referred to. This can be done at compile time (i.e., without execution), so it is part of the static semantics.

The function T below transforms an abstract syntax tree into a semantic graph. The semantic graph is almost identical to the abstract syntax tree; the only exception (in Iki, anyway) being that variable references now refer to actual variables, rather than just names. The definition of T mathematically encodes the rules we intuitively know as "identifiers must be declared before being used," "inner declarations hide outer ones," and "variables local to a block are invisible to sibling and ancestor blocks." The definition makes use of environments which are mappings from variable names (string) to actual variables. At any point in time we have of stack of environments corresponding to the nesting of the block currently being processed. Every time we enter a block we push a new, empty environment on this stack, and fill it with the declarations of this block. When leaving a block, the topmost environment is popped. When resolving a variable reference, we "search" the environments from the top of the stack to the bottom.

    type environment = (string * Var) map
    fun find [env1 ... envn] i =
      n = 0 ? NOT_FOUND_ERROR :
      env1 containsKey i ? lookup env1 i :
      find [env2 ... envn] i

    T (Program b) = (Program (T b emptystack))
    T (Block d1 ... dn s1 ... sn) stack =
        let extendedstack = (T (Decs d1 ... dn) emptyenv)::stack in
        (Block d1 ... dn (T s1 extendedstack) ... (T sn extendedstack))
    T (Decs (v1 as Var i1 ... vn as Var in)) env =
        (n = 0) ? env :
        (env containskey i1) ? REDECLARATION_ERROR :
        T (Var i2 ... Var in) (add env i1 v1)
    T (Assign (Varref i) e) stack = (Assign (find stack i) (T e stack))
    T (Read (Varref i)) stack = (Read (find stack i))
    T (Write e) stack = (Write (T e stack))
    T (While e s) stack = (While (T e stack) (T s stack))
    T (Intlit n) stack = (Intlit n)
    T (Varref i) stack = find stack i
    T (Add e1 e2) stack = (Add (T e1 stack) (T e2 stack))
    T (Minus e1 e2) stack = (Minus (T e1 stack) (T e2 stack))
    T (Times e1 e2) stack = (Times (T e1 stack) (T e2 stack))
    T (Divide e1 e2) stack = (Divide (T e1 stack) (T e2 stack))

Note that T may fail to produce a semantic graph for some abstract syntax trees. In such a case we have a program which is syntactically valid (possesses no syntax errors), but semantically invalid.

Example: For the abstract syntax tree in the preceding section, the corresponding semantic graph is:

ikisemgraph.png

3.2  Dynamic Semantics

The dynamic meaning of a program is its effect on its environment. As Iki is so simple, the meaning of an Iki program is some transformation from an input file to an output file. For example, the meaning of the Iki program begin var x; read x; write x+1; end is "the function that takes in an arbitrary input file, and returns an output file containing a single element whose value is one more than the first item on the input file.

3.2.1. Macrosemantics

The meaning is given here by a function E from semantic graphs to their "meanings." While the meaning of a program is a mapping from inputs to outputs, the meanings of other semantic entities are different. Statements change the state of the computation; the state is the aggregation of (1) its current data store (mapping from its variables to their values), (2) the current input file, and (3) the current output file. The meaning of an expression is the value it produces, relative to a store, of course. Declarations in Iki have no dynamic effect.

    type file = int list
    type store = Var -> int
    type state = (store * file * file)

    E (Program b) input = #3(E b (λv.0, input, []))
    E (Block d1 ... dn s1 ... sn) ψ = E (Block s1 ... sn) ψ
    E (Block s1 ... sn) ψ = (n = 0) ? ψ : E (Block s2 ... sn) (E s1 ψ)
    E (Assign (Varref v) e) (σ, input, output) = (σ[(E e σ)/v], input, output)
    E (Read (Varref v1 ... vn)) ψ = E (Read (Varref v2 ... vn)) (E (Read (Varref v1)) ψ)
    E (Read (Varref v)) (σ, input, output) =
        length input = 0 ? END_ERROR :
        (σ[(head input)/v], tail input, output)
    E (Write e1 ... en) ψ = E (Write v2 ... vn) (E (Write v1) ψ)
    E (Write e) (σ, input, output) = (σ, input, output @ E e σ)
    E (While e b) ψ = (E e (#1 ψ) = 0) ? ψ : E (While e b (E s ψ))
    E (Intlit n) σ = n
    E (Varref v) σ = σ v
    E (Add e1 e2) σ = E e1 σ + E e2 σ
    E (Minus e1 e2) σ = E e1 σ - E e2 σ
    E (Times e1 e2) σ = E e1 σ * E e2 σ
    E (Divide e1 e2) σ = E e1 σ / E e2 σ

Note how the semantics mathematically captures the intuitive requirement that "all variables are initialized to 0." There is one runtime error defined: reading past the end of the input.

3.2.2  Microsemantics

The meaning of a program is a mapping from an input file to an output file; these correspond to the environment's standard input and standard output files respectively. Integers are read from standard input as digit strings optionally prefixed by a single '-' and separated by whitespace. The first integer in the file may be preceded by whitespace and the last integer may be followed by whitespace. Note that the denotational semantics does allow a program to terminate without reading the entire input file; in this case an implementation must ignore the unread portion of the input file. If an error occurs during reading, the program aborts and must signal its environment in some fashion (e.g. it may set an exit status value, or write a message). Integers are written to the output file as digit strings, prefixed by a single '-' if negative, separated by a single space. There may be whitespace before the first integer and after the last.

4  Implementation Permissions

An implementation may fix a maximum length for identifiers, and may fix a maximum bit length for integers. Arithmetic may be signed or modular.