The Language Carlos

1  Introduction

Carlos is an imperative, block structured programming language. It has much in common with C, but is an applications language, not a systems language. It features heap-allocated arrays and records without explicit pointers, automatic garbage collection, a built-in immutable string type, and functions which may be nested and overloaded (but are not first-class values).

This document defines the language Carlos.

2  Syntax

The string s is a syntactically valid Carlos program if it greedily matches

    SKIP* TOKEN1 SKIP* TOKEN2 SKIP* ... TOKENn SKIP*
where tokens and skips are defined in the following microsyntax rules
    LETTER      →  [\p{L}]
    DIGIT       →  [\p{Nd}]
    KEYWORD     →  'boolean' | 'if' | 'break' | 'else' | 'int' | 'for' | 'new'
                |  'return' | 'char' | 'struct' | 'null' | 'while' | 'real'
                |  'true' | 'string' | 'void' | 'false' | 'length' | 'print'
    ID          →  LETTER (LETTER | DIGIT | '_')* - KEYWORD
    INTLIT      →  DIGIT+
    FLOATLIT    →  DIGIT+ '.' DIGIT+ ([Ee] [+-]? DIGIT+)?
    CHAR        →  [^\p{Cc}'"\\] | ESCAPE
    ESCAPE      →  '\\' ([nt'"\\] | HEX{1,8} ';')
    HEX         →  [0-9a-fA-F]
    CHARLIT     →  '\x27' (CHAR | '"') '\x27'
    STRINGLIT   →  '"' (CHAR | '\x27')* '"'
    SYMBOL      →  [+\-~!*/%&|^<>=,.;()[\]{}]
                |  '||' | '&&' | '<<' | '>>' | '<=' | '>=' | '==' | '++' | '--'
    SKIP        →  [\x20\x09\x0A\x0D]
                |  '//' [^\x0A\x0D]* [\x0A\x0D]
    TOKEN       →  INTLIT | FLOATLIT | CHARLIT | STRINGLIT | ID | KEYWORD | SYMBOL

and the string TOKEN1TOKEN2...TOKENn is derivable from this grammar

    PROGRAM     →  STMT+
    STMT        →  DEC
                |  ASSIGNMENT ';'
                |  CALL ';'
                |  'break' ';'
                |  'return' EXP? ';'
                |  'print' ARGS ';'
                |  'if' '(' EXP ')' BLOCK ('else' 'if' '(' EXP ')' BLOCK)* ('else' BLOCK)?
                |  'while' '(' EXP ')' BLOCK
                |  'for' '(' (TYPE ID '=' EXP)? ';' EXP? ';' ASSIGNMENT? ')' BLOCK
    ASSIGNMENT  →  INCREMENT | VAR '=' EXP
    INCREMENT   →  INCOP VAR | VAR INCOP
    DEC         →  TYPEDEC | VARDEC | FUNDEC
    TYPEDEC     →  'struct' ID '{' (TYPE ID ';')* '}'
    TYPE        →  'boolean' | 'char' | 'int' | 'real' | 'string' | ID | TYPE '[' ']'
    VARDEC      →  TYPE ID ('=' EXP)? ';'
    FUNDEC      →  (TYPE | 'void') ID '(' PARAMS ')' BLOCK
    PARAMS      →  (TYPE ID (',' TYPE ID)*)?
    BLOCK       →  '{' STMT* '}'
    EXP         →  EXP1 ('||' EXP1)*
    EXP1        →  EXP2 ('&&' EXP2)*
    EXP2        →  EXP3 ('|' EXP3)*
    EXP3        →  EXP4 ('^' EXP4)*
    EXP4        →  EXP5 ('&' EXP5)*
    EXP5        →  EXP6 (RELOP EXP6)?
    EXP6        →  EXP7 (SHIFTOP EXP7)*
    EXP7        →  EXP8 (ADDOP EXP8)*
    EXP8        →  EXP9 (MULOP EXP9)*
    EXP9        →  PREFIXOP? EXP10
    EXP10       →  LITERAL
                |  VAR
                |  INCREMENT
                |  NEWOBJECT
                |  '(' EXP ')'
    LITERAL     →  'null'
                |  'true'
                |  'false'
                |  INTLIT
                |  FLOATLIT
                |  CHARLIT
                |  STRINGLIT
    VAR         →  ID | CALL | VAR '[' EXP ']' | VAR '.' ID
    NEWOBJECT   →  'new' ID '{' ARGS '}'
                |  'new' TYPE '[' ']' '{' ARGS '}'
                |  'new' TYPE ('[' EXP ']')+
    CALL        →  ID '(' ARGS ')'
    ARGS        →  (EXP (',' EXP)*)?
    RELOP       →  '<' | '<=' | '==' | '!=' | '>=' | '>'
    SHIFTOP     →  '<<' | '>>'
    ADDOP       →  '+' | '-'
    MULOP       →  '*' | '/' | '%'
    PREFIXOP    →  '-' | '!' | '~' | 'char' | 'int' | 'string' | 'length'
    INCOP       →  '++' | '--'

3  Semantics

We describe the semantics of Carlos informally but somewhat precisely.

3.1  Programs

A program is a sequence of one or more statements. Some statements, called declaration statements, declare entities; others simply execute.

// This is a complete Carlos program.  When executed, it writes
// "hello, world" to standard output.

string greeting = "hello";
print greeting;
print ", ";
print place();
string place() {return "world";}

3.2  Blocks

Blocks exist to control the scope of declarations. A block is a sequence of zero or more statements.

3.3  Declarations

A declaration binds an identifier to an entity. There are six kinds of declarations:

Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. The visible region of a declaration is the declaration's scope minus any "inner" scopes of declarations of identifiers with the same name. (This means the visible region may be discontinuous).

int c = g(null);                  // line 1
void g(int[] a) {}                // line 2
real y = c + 2;                   // line 3
print c;                          // line 4
char f(string s) {                // line 5
    Point q() {return null;}      // line 6
    int y;                        // line 7
    struct Point {int x; int y;}  // line 8
    c = 50;                       // line 9
}                                 // line 10
f("Can't see a point here");      // line 11

// Identi-   Declared    Scope    Visible
//   fier    on  line             Region
// ----------------------------------------
//    c          1        1-11      1-11
//    g          2        1-11      1-11
//    a          2        2-2       2-2
//    y          3        3-11    3-6,11-11
//    f          5        1-11      1-11
//    s          5        6-10      6-10
//    q          6        6-10      6-10
//    y          7        7-10      7-10
//  Point        8        6-10      1-10

The declared identifiers at the top-level of a block of program (types, variables, and functions) must be mutually distinct, except that multiple functions can share the same name. The parameters of a function are logically top-level identifiers of the function's body block, and an iterator is logically a top-level identifier of its statement's block.

void test(int x, int y) {
    real z;
    string x;             // ERROR: x is already a parameter
    if (z > 1.0) {
        int x = 5;        // this x is fine, however
    }
    for (int i = 0; i < 10; i++) {
        int i = 4;        // ERROR: clashes with iterator i
        while (true) {
            int i = 2;    // this i is fine
        }
    }
}

3.4  Types

Carlos features the following types:

The types int and real are arithmetic types; the type string, together with array types and structure types, are the reference types.

3.5  Functions

A function has a name, an optional return type, a parameter list, and a body. The identifiers declared as parameters must all be unique. Functions marked void in their declarations are called "void functions" or "procedures" and have no return type.

The signature of a function refers to the number, type, and order of its parameters, for example, if a function f is declared

t0 f(t1 p1, t2 p2, t3 p3)

then its signature is the type list (t1, t2, t3). Note that the return type does not affect the signature. An expression list (e1, ..., en) is said to match a signature (t1, ..., tk) if n = k and each ei is type-compatible (see Section 3.8) with ti.

3.6  Variables

A variable is something that stores a value. All variables have a type. Variables are either writable or not writable. The kinds of variables are:

3.7  Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

3.8  Expressions

Each expression has a type and a value. The value of an expression with a reference type is either null or a reference to an object. String, array and structure values are therefore never manipulated directly, but only through references.

An expression e is type-compatible with a type t if and only if

An expression of type int can appear anywhere an expression of type real is expected; in this case the integer value is implicitly converted to one of type real. The conversion must maintain the expression's value; this is always possible since the type real has 53 bits of precision.

The Carlos expressions are:

4  Built-in Functions

The following functions are assumed to exist in the outermost scope of every Carlos program.