Carlos is an imperative, block-structured programming language. It has much in common with C, but is an applications language, not a systems language. It features heap-allocated arrays and records without explicit pointers, automatic garbage collection, a built-in immutable string type, and functions that may be nested and overloaded (but are not first-class values).
This document defines the language Carlos.
The string s is a syntactically valid Carlos program if it greedily matches
SKIP* TOKEN1 SKIP* TOKEN2 SKIP* ... TOKENn SKIP*
where tokens and skips are defined in the following microsyntax rules
LETTER → [\p{L}]
DIGIT → [\p{Nd}]
KEYWORD → 'boolean' | 'if' | 'break' | 'else' | 'int' | 'for' | 'new'
| 'return' | 'char' | 'struct' | 'null' | 'while' | 'real'
| 'true' | 'string' | 'void' | 'false' | 'length' | 'print'
ID → LETTER (LETTER | DIGIT | '_')* - KEYWORD
INTLIT → DIGIT+
FLOATLIT → DIGIT+ '.' DIGIT+ ([Ee] [+-]? DIGIT+)?
CHAR → [^\p{Cc}'"\\] | ESCAPE
ESCAPE → '\\' ([nt'"\\] | HEX{1,8} ';')
HEX → [0-9a-fA-F]
CHARLIT → '\x27' (CHAR | '"') '\x27'
STRINGLIT → '"' (CHAR | '\x27')* '"'
SYMBOL → [+\-~!*/%&|^<>=,.;()[\]{}]
| '||' | '&&' | '<<' | '>>' | '<=' | '>=' | '==' | '!=' | '++' | '--'
SKIP → [\x20\x09\x0A\x0D]
| '//' [^\x0A\x0D]* [\x0A\x0D]
TOKEN → INTLIT | FLOATLIT | CHARLIT | STRINGLIT | ID | KEYWORD | SYMBOL
and the string TOKEN1TOKEN2...TOKENn is derivable from this grammar
PROGRAM → STMT+
STMT → DEC
| ASSIGNMENT ';'
| CALL ';'
| 'break' ';'
| 'return' EXP? ';'
| 'print' ARGS ';'
| 'if' '(' EXP ')' BLOCK ('else' 'if' '(' EXP ')' BLOCK)* ('else' BLOCK)?
| 'while' '(' EXP ')' BLOCK
| 'for' '(' (TYPE ID '=' EXP)? ';' EXP? ';' ASSIGNMENT? ')' BLOCK
ASSIGNMENT → INCREMENT | VAR '=' EXP
INCREMENT → INCOP VAR | VAR INCOP
DEC → TYPEDEC | VARDEC | FUNDEC
TYPEDEC → 'struct' ID '{' (TYPE ID ';')* '}'
TYPE → 'boolean' | 'char' | 'int' | 'real' | 'string' | ID | TYPE '[' ']'
VARDEC → TYPE ID ('=' EXP)? ';'
FUNDEC → (TYPE | 'void') ID '(' PARAMS ')' BLOCK
PARAMS → (TYPE ID (',' TYPE ID)*)?
BLOCK → '{' STMT* '}'
EXP → EXP1 ('||' EXP1)*
EXP1 → EXP2 ('&&' EXP2)*
EXP2 → EXP3 ('|' EXP3)*
EXP3 → EXP4 ('^' EXP4)*
EXP4 → EXP5 ('&' EXP5)*
EXP5 → EXP6 (RELOP EXP6)?
EXP6 → EXP7 (SHIFTOP EXP7)*
EXP7 → EXP8 (ADDOP EXP8)*
EXP8 → EXP9 (MULOP EXP9)*
EXP9 → PREFIXOP? EXP10
EXP10 → LITERAL
| VAR
| INCREMENT
| NEWOBJECT
| '(' EXP ')'
LITERAL → 'null'
| 'true'
| 'false'
| INTLIT
| FLOATLIT
| CHARLIT
| STRINGLIT
VAR → ID | CALL | VAR '[' EXP ']' | VAR '.' ID
NEWOBJECT → 'new' ID '{' ARGS '}'
| 'new' TYPE '[' ']' '{' ARGS '}'
| 'new' TYPE ('[' EXP ']')+
CALL → ID '(' ARGS ')'
ARGS → (EXP (',' EXP)*)?
RELOP → '<' | '<=' | '==' | '!=' | '>=' | '>'
SHIFTOP → '<<' | '>>'
ADDOP → '+' | '-'
MULOP → '*' | '/' | '%'
PREFIXOP → '-' | '!' | '~' | 'char' | 'int' | 'string' | 'length'
INCOP → '++' | '--'
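As a rough illustration of how the levels EXP through EXP10 fix operator precedence, here is a small sketch showing how a few declarations are grouped under the grammar above. The variable names are made up for the example, and the commented-out lines are, as far as we can tell from the rules, not derivable.

```carlos
int a = 1 + 2 * 3;                     // MULOP binds tighter than ADDOP: 1 + (2 * 3)
int b = 1 << 2 + 3;                    // ADDOP binds tighter than SHIFTOP: 1 << (2 + 3)
boolean c = a < b || a == 7 && b > 0;  // && binds tighter than ||
boolean d = (a & 1) == 0;              // without parentheses this groups as a & (1 == 0)
// boolean e = a < b < 10;             // EXP5 permits at most one RELOP
// int f = - -a;                       // EXP9 permits at most one PREFIXOP
int g = -(-a);                         // a parenthesized expression is a fresh EXP10
```

Note that, as in C, the bitwise operators sit below the relational operators, so mixed expressions such as the one initializing d usually need explicit parentheses.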
We describe the semantics of Carlos informally but somewhat precisely.
A program is a sequence of one or more statements. Some statements, called declaration statements, declare entities; others simply execute.
// This is a complete Carlos program. When executed, it writes
// "hello, world" to standard output.
string greeting = "hello";
print greeting;
print ", ";
print place();
string place() {return "world";}
Blocks exist to control the scope of declarations. A block is a sequence of zero or more statements.
A declaration binds an identifier to an entity. There are six kinds of declarations:
Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. The visible region of a declaration is the declaration's scope minus any "inner" scopes of declarations of identifiers with the same name. (This means the visible region may be discontinuous).
int c = g(null); // line 1
int g(int[] a) {return 0;} // line 2
real y = c + 2; // line 3
print c; // line 4
void f(string s) { // line 5
Point q() {return null;} // line 6
int y; // line 7
struct Point {int x; int y;} // line 8
c = 50; // line 9
} // line 10
f("Can't see a point here"); // line 11
// Identifier  Declared on line  Scope  Visible region
// ---------------------------------------------------
// c           1                 1-11   1-11
// g           2                 1-11   1-11
// a           2                 2-2    2-2
// y           3                 3-11   3-6,11-11
// f           5                 1-11   1-11
// s           5                 6-10   6-10
// q           6                 6-10   6-10
// y           7                 7-10   7-10
// Point       8                 6-10   6-10
The identifiers declared at the top level of a block or program (types, variables, and functions) must be mutually distinct, except that multiple functions can share the same name. The parameters of a function are logically top-level identifiers of the function's body block, and an iterator is logically a top-level identifier of its statement's block.
void test(int x, int y) {
real z;
string x; // ERROR: x is already a parameter
if (z > 1.0) {
int x = 5; // this x is fine, however
}
for (int i = 0; i < 10; i++) {
int i = 4; // ERROR: clashes with iterator i
while (true) {
int i = 2; // this i is fine
}
}
}
The types int and real are the arithmetic types; the type string, the array types, and the structure types are the reference types.
A function has a name, an optional return type, a parameter list, and a body. The identifiers declared as parameters must all be unique. Functions marked void in their declarations are called "void functions" or "procedures" and have no return type.
The signature of a function is the list of its parameter types, in order. For example, if a function f is declared
t0 f(t1 p1, t2 p2, t3 p3)
then its signature is the type list (t1, t2, t3). Note that the return type does not affect the signature. An expression list (e1, ..., en) is said to match a signature (t1, ..., tk) if n = k and each ei is type-compatible (see Section 3.8) with ti.
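A short sketch may help make signature matching concrete. The function name area and its bodies are hypothetical; the point is only that two functions can share a name when their signatures differ, and that each call must match exactly one of them.

```carlos
int area(int side) {return side * side;}                   // signature (int)
int area(int width, int height) {return width * height;}   // signature (int, int)

print area(4);       // the expression list (4) matches only (int)
print area(3, 5);    // the expression list (3, 5) matches only (int, int)
// print area();     // ERROR: the empty list matches neither signature
```

Overloads that differ only in int versus real parameters deserve care: an int argument is acceptable where a real is expected, so a call with int arguments could match more than one such function.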
A variable is something that stores a value. All variables have a type. Variables are either writable or not writable. The kinds of variables are:
i
(Simple variable) Here i is a simple identifier with the same name as an identifier declared in a visible variable declaration, parameter declaration, or iterator declaration. The type of this variable is the type given the identifier in the innermost visible declaration. It is writable.
v[e]
(Subscripted variable) Here v is a variable of an array type or the string type and e is an expression of type int. The type of this variable is v's base type if v is an array, or char if v is a string. This variable is the array component or character at (zero-based) index e, and is writable unless v is a string. If during execution v is null, or e evaluates to a value less than zero or greater than or equal to the length of v, the program dies.
v.f
(Selected variable) Here v is a variable of a struct type and f must be an identifier declared as a field of v's type. This variable refers to the f-field of the object referred to by v, and is writable. The type of this variable is the type associated with the field f. If during execution v is null, the program dies.
f(e1, ..., en)
(Function call result) Here f must name a function whose signature is matched by the argument list e1 through en, and it must be the only visible function that is so matched. The argument expressions are evaluated in any order, and f is called with the arguments copied to the parameters. The function must not have been declared as void. This variable refers to the result of calling the function, has the type of the function, and is not writable.
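The kinds of variables above can be sketched as follows. The struct Point and all variable names are invented for the example; the commented-out lines show uses that the rules above reject or that would make the program die.

```carlos
struct Point {int x; int y;}

Point p = new Point{3, 4};
int[] a = new int[]{10, 20, 30};
string s = "hello";

p.x = 7;             // selected variable: writable
a[0] = a[1] + p.y;   // subscripted variable of an array: writable
char first = s[0];   // subscripted variable of a string: can be read...
// s[0] = 'j';       // ERROR: ...but is not writable
// print a[3];       // index 3 is not less than the length of a: the program dies
```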
A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:
v++; or ++v;
(Increment statement) Here v must have type int. Increments v.
v--; or --v;
(Decrement statement) Here v must have type int. Decrements v.
f(e1, ..., en);
(Call statement) Here f must name a function whose signature is matched by the argument list e1 through en, and it must be the only visible function that is so matched. The argument expressions are evaluated in any order, and f is called with the arguments copied to the parameters. The function must have been declared void.
v = e;
(Assignment statement) Here e must be type-compatible with the type of v. v is determined and e is evaluated, then the value of e is copied into v.
break;
(Break statement) This statement may only appear within a while or for statement that is properly within the same function as the break statement. The break statement terminates the execution of the innermost enclosing while or for statement.
return;
(Return statement) Causes an immediate return from the innermost enclosing function, which must have been marked void.
return e;
(Return statement) Evaluates e, then causes the innermost enclosing function to immediately return the value of e. The function must not have been marked void, and e must be type-compatible with its return type.
print e1, ..., en;
(Print statement) Evaluates each ei in order and writes the value of string(ei) to standard output.
while (e) b
(While statement) Here e must have type boolean. First e is evaluated. If e produces false, the execution of the while statement terminates. If e produces true, b is executed and then the while statement is executed again.
if (e1) b1 else if (e2) b2 ... else bn
(If statement) Each ei must have type boolean. The ei are evaluated in order from left to right until one of them produces true or all of them have been evaluated. If some ei produces true, the corresponding bi is executed, completing the execution of the if statement. If none of the ei produces true, bn is executed (if it exists).
for (t i = e1; e2; e3) b
(For statement) This is equivalent to {t i = e1; while (e2) {b e3;}}, where e3 is an assignment. If e2 is missing it is assumed to be true.
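The equivalence given for the for statement can be sketched with two hypothetical procedures that should behave identically. Function bodies are used here to delimit the scope of i, since the grammar has no free-standing block statement.

```carlos
void withFor() {
    for (int i = 0; i < 3; i++) {
        print i;
    }
}

void withWhile() {
    int i = 0;           // t i = e1
    while (i < 3) {      // e2
        print i;         // b
        i++;             // e3
    }
}

withFor();
withWhile();
```

This is only an approximation of the rule: in the for statement the iterator counts as a top-level identifier of the loop body, so redeclaring i inside that body is an error, which the hand-written while version does not capture.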
Each expression has a type and a value. The value of an expression with a reference type is either null or a reference to an object. String, array and structure values are therefore never manipulated directly, but only through references.
An expression e is type-compatible with a type t if and only if
An expression of type int can appear anywhere an expression of type real is expected; in this case the integer value is implicitly converted to one of type real. The conversion must maintain the expression's value; this is always possible since the type real has 53 bits of precision.
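A minimal sketch of the implicit conversion, using a hypothetical function half, is:

```carlos
real half(real x) {return x / 2.0;}

real r = 3;          // the int value 3 is converted to real before being stored
real h = half(7);    // the int argument 7 is converted to real for parameter x
print r, h;
```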
The Carlos expressions are:
The literal consists of one non-control character or escape sequence. If a non-control character, the value of the character literal is the character; if an escape sequence the value is as follows:
| \n | newline |
| \t | tab |
| \xxxxxxxx; | the character whose codepoint is xxxxxxxx, where xxxxxxxx is a sequence of one to eight hexadecimal digits |
| \" | the double quote character |
| \' | the single quote character |
| \\ | the backslash character |
The value of the string literal is the character sequence within the double quotes, where each character is interpreted as described for character literals above.
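The following sketch shows a few character and string literals written according to the CHARLIT, STRINGLIT, and ESCAPE rules; the variable names are made up.

```carlos
char newline = '\n';
char quote = '"';          // a bare double quote is allowed in a character literal
char apostrophe = '\'';    // a single quote must be escaped in a character literal
char eAcute = '\e9;';      // hexadecimal escape: the character with codepoint E9
string s = "It's a \"test\"\t\41;";   // a bare ' is allowed in a string; \41; is the letter A
```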
true
The literal of type boolean denoting truth.
false
The literal of type boolean denoting falsity.
null
A literal of the internal null type representing a reference to no object.
v
Here v is a variable. The type of this expression is the type of v, and its value is the current value stored in v.
v++
Here v must have type int. Produces the value of v, then increments v immediately after producing the value.
v--
Here v must have type int. Produces the value of v, then decrements v immediately after producing the value.
++v
Here v must have type int. Increments v, then produces the new value of v.
--v
Here v must have type int. Decrements v, then produces the new value of v.
new i {e1, ..., en}
(Structure object construction) Here i must be the name of a visible structure type. This expression refers to a newly constructed object of the type denoted by i whose field values are, in order, e1 through en.
new t[] {e1, ..., en}
(Array object construction) Here t must be the name of a type, and each ei must have type t. This expression refers to a newly constructed array of n items whose values are, in order, those of e1 through en.
new t[e1]...[en]
(Object construction) Here each of the ei's must have type int. This expression refers to a newly constructed array of en items, each of which is a newly constructed array of en-1 items, and so on, down to the innermost arrays, each of which is a newly constructed array of e1 objects of type t. All array elements are initialized with the proper initial values for their type. If during execution any of the ei's evaluates to a non-positive integer, the program dies.
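The three construction forms might be used as in the sketch below; the struct Point and the variable names are hypothetical.

```carlos
struct Point {int x; int y;}

Point p = new Point{3, 4};               // structure construction: x is 3, y is 4
int[] primes = new int[]{2, 3, 5, 7};    // array construction from four int values
real[] samples = new real[100];          // array of 100 elements, each given its initial value
// int[] bad = new int[0];               // the size is not positive: the program dies
```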
char i
Here i must be an expression of type int. Produces the character whose codepoint is i.
int c
Here c must be an expression of type char. Produces the codepoint of c.
string e
Produces the printable form of its argument. For ints, chars, reals and strings the returned string is identical to the output of the C function printf with the %d, %c, %f and %s format specifiers, respectively. For the value null, the returned string is "null". For booleans, the returned string is either "true" or "false". For other objects and arrays, the returned string is the object's type name followed by "@" followed by some unique integer identifier.
length e
Here e must be an expression of type string or of an array type. Produces the length of the array or the number of characters in the string. If e evaluates to null, the program dies.
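Because char, int, string and length are prefix operators in the grammar (PREFIXOP), they can be applied without parentheses to a single operand. A brief sketch, with invented variable names:

```carlos
char a = char 97;            // the character with codepoint 97
int code = int 'a';          // the codepoint of the character a
string n = string 42;        // the printable form of 42
int size = length "hello";   // the number of characters in the string
int[] xs = new int[]{1, 2, 3};
print length xs;             // length also applies to arrays
print string(code + 1);      // parentheses make code + 1 the operand
```

Without the parentheses in the last line, string code + 1 would group as (string code) + 1, which is rejected because + requires arithmetic operands.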
(e)
Evaluates e and produces this value.
-e
Here e must have an arithmetic type. Evaluates e and produces the negation of e.
~e
Here e must have type int. Evaluates e and produces the bitwise complement of e.
!e
Here e must have type boolean. If e evaluates to true, the entire expression produces false; otherwise it produces true.
e1 * e2
Both subexpressions must have arithmetic type. The subexpressions are evaluated in any order and their product is produced. The type of the product is real only if either operand is real, otherwise the type is int.
e1 / e2
Both subexpressions must have arithmetic type. The subexpressions are evaluated in any order, and the entire expression produces the quotient of e1 divided by e2. The type of the quotient is real only if either operand is real, otherwise the type is int.
e1 % e2
Both subexpressions must have type int. The subexpressions are evaluated in any order, and the entire expression produces an int which is e1 modulo e2.
e1 + e2
Both subexpressions must have arithmetic type. The subexpressions are evaluated in any order and their sum is produced. The type of the sum is real only if either operand is real, otherwise the type is int.
e1 - e2
Both subexpressions must have arithmetic type. The subexpressions are evaluated in any order, and the entire expression produces the difference of e1 and e2. The type of the difference is real only if either operand is real, otherwise the type is int.
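The result-type rule for the arithmetic operators can be seen in a few declarations (the variable names are illustrative only):

```carlos
int q = 7 / 2;         // both operands int: the quotient has type int
real r = 7 / 2.0;      // one operand real: the quotient has type real
real s = 1 + 2.5;      // the same rule gives the sum type real
int m = 7 % 2;         // % requires both operands to have type int
// int t = 7.5 % 2;    // ERROR: the left operand of % has type real
```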
e1 << e2
Each ei must have type int. Produces the value of e1 shifted left e2 positions.
e1 >> e2
Each ei must have type int. Produces the value of e1 arithmetically shifted right e2 positions.
e1 <= e2
Either both subexpressions must have arithmetic type, or both must have type char, or both must have type string. The subexpressions are evaluated in any order, and the entire expression produces whether the value of e1 is less than or equal to the value of e2.
e1 < e2
Either both subexpressions must have arithmetic type, or both must have type char, or both must have type string. The subexpressions are evaluated in any order, and the entire expression produces whether the value of e1 is less than the value of e2.
e1 == e2
e1 must be type-compatible with the type of e2, or e2 must be type-compatible with the type of e1. The subexpressions are evaluated in any order, and the entire expression produces whether these values are the same, promoting int values to real where necessary.
e1 != e2
Equivalent to !(e1==e2).
e1 > e2
Either both subexpressions must have arithmetic type, or both must have type char, or both must have type string. The subexpressions are evaluated in any order, and the entire expression produces whether the value of e1 is greater than the value of e2.
e1 >= e2
Either both subexpressions must have arithmetic type, or both must have type char, or both must have type string. The subexpressions are evaluated in any order, and the entire expression produces whether the value of e1 is greater than or equal to the value of e2.
e1 & e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise and of e1 and e2.
e1 ^ e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise exclusive or of e1 and e2.
e1 | e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise inclusive or of e1 and e2.
e1 && e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to false, the entire expression immediately produces false (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
e1 || e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to true, the entire expression immediately produces true (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
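Short-circuit evaluation is what makes guards against null safe to write. In the sketch below, the struct Point and the variable p are hypothetical.

```carlos
struct Point {int x; int y;}

Point p = null;

// When p is null the left operand is false, so p.x is never evaluated.
if (p != null && p.x > 0) {
    print "positive x";
}

// When p is null the left operand is true, so p.x is never evaluated.
if (p == null || p.x == 0) {
    print "no point, or x is zero";
}
```

If both operands were always evaluated, each condition would select a field of a null reference and the program would die.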
The following functions are assumed to exist in the outermost scope of every Carlos program.
Reads from standard input up to and including the first newline character, or until the end of the input file is reached. Returns a string consisting of all consumed characters not including the newline character. Bytes are converted to characters according to the default character encoding. Returns null if the end of file had previously been reached. This is a blocking call.
Returns the string consisting of the length characters of s starting at startIndex. If startIndex is less than 0, the program dies. If startIndex is beyond the end of s, returns the empty string. If length is too large, then the returned string consists only of the characters up to the end of s.
Returns the square root of x.
Returns pi.
Returns the sine of x (in radians).
Returns the cosine of x (in radians).
Returns the angle between the positive x-axis and the line from the origin to (x,y).
Returns the natural log of x.