CMSI 488 - Compiler Construction
Quiz 1 Answers
================================

PROBLEM 1
---------
Is it possible for the assignment statement

    x.x = x;

to ever legally appear in a Carlos program?  If it can, state what it
means, semantically.  If not, state why it is impossible (technically),
hinting at either (1) why allowing it would be stupid (inconsistent with
the language design), or (2) the inexplicable oversight or stupidity of
the language designer for leaving it out.

Answer:

Yes it's legal.  It makes a field of an object reference the object
itself, as in this Carlos program:

    struct s {s x;}     // This is a TYPE DECLARATION in Carlos
    s x;                // This declares x to be a variable of type s
    x.x = x;            // x.x is also a variable of type s

PROBLEM 2
---------
For each of the following Carlos fragments, tell whether it is a lexical
error, syntax error, static semantic error, dynamic semantic error, or
no error.

    a) c.real = 2.0; c.imag = -3.5;
    b) int f(int x) {int x = x;}
    c) string `ohana = "family";
    d) string pet = "dog"; pet = "rat";
    e) string pet = "cat"; pet[0] = 'r';
    f) int f(int x) {} int g() {f(1,2,3,4);}

Answer:

    a) syntax (real is a reserved word)
    b) static semantic (but note this is NOT AN ERROR in C!)
    c) lexical ("`" is not an allowable character).
    d) no error
    e) static semantic - strings are immutable in Carlos
    f) static semantic (too many arguments to f)

PROBLEM 3
---------
Sketch the AST for the Java fragment:

    static private synchronized long woofer(Object... v) {
        for (int y : f(x)) {
            x = p.data[0] * (2<< 5 |- x ---c);
        }
    }

Answer:

                           method
                              |
   +-------+---------+--------+-----+------+--------+
   |       |         |        |     |      |        |
 static private synchronized long woofer param    block
                                          / \       |
                                       ...   v     for
                                        |           |
                                     Object         |
                                                    |
                                    +-----+-----+---+-----+
                                    |     |     |         |
                                   int    y   call        =
                                               / \       / \
                                              f   x     x   *
                                                       ____/ \_______
                                                      /              \
                                                    []                |
                                                   /  \           ___/ \_
                                                  .    0        <<       -
                                                 / \           /  \     / \
                                                p   data      2    5   -   c
                                                                       |
                                                                    post--
                                                                       |
                                                                       x

PROBLEM 4
---------
Consider a language for describing vector graphics.
An example program in this language is:

    deg
    color 1 0 0
    ccw 90
    forward 4
    color 0 0 1
    [ ccw 90 forward 1.5 ]
    cw 90
    forward 1.5

This program draws the letter T with a red vertical line 4 units long,
topped with a 3-unit blue line.  A program is a sequence of
instructions.  The instructions are:

    deg            - switch to degree mode
    rad            - switch to radians mode
    ccw a          - turn left by angle a
    cw a           - turn right by angle a
    forward n      - draw a line by moving forward n units
    backward n     - draw a line by moving backward n units
    color r g b    - set color (r,g,b); values are floats in the
                     range 0 to 1
    [              - save current state
    ]              - restore previously saved state

Write a macrosyntax in EBNF for this language.  Handle the brackets
reasonably, please.  State whether your grammar is LL(1) (meaning "can
be parsed top-down with only one lookahead symbol") or not, and whether
it is ambiguous or not.

Answer:

    PROGRAM -> INST+
    INST    -> deg | rad | ccw NUM | cw NUM
            |  forward NUM | backward NUM
            |  color NUM NUM NUM
            |  "[" INST+ "]"

Note that NUM is a primitive token, and note that the value constraint
on color arguments is not specified in the syntax; we're leaving that to
the static semantic description.

The grammar is LL(1) and non-ambiguous.

PROBLEM 5
---------
Write a regular expression for the language of "positive hexadecimal
numerals divisible by 65536".  Is it possible to implement a parser for
this language in JavaCC using only numeric lookahead values?  If so,
what is the smallest lookahead value you would need?

Answer:

The regex is:

    0*[1-9A-Fa-f][0-9A-Fa-f]*0000

That's not LL(1), it's LL(5).  You have to look ahead 5 symbols to know
whether to take the final four zeros of the right alternative (the
fifth being the end of input).  I think 5 is the minimum we can get.

PROBLEM 6
---------
a) Write a regular expression for "any string of alphanumeric
   characters, beginning with a letter, which is anything BUT a
   three-character string ending with 'oo'" (case insensitive).
   You may use any of Java's regex machinery like lookahead or
   lookbehind if you like.

b) Suppose you were trying to encode this regex as a JavaCC token rule.
   Show a JavaCC token specification for this class of strings.  Hard,
   isn't it?

c) Fortunately, if you were really using JavaCC, you could avoid the
   hassle of ugly token specifications.  Explain how the prohibition of
   -oo variables would be handled in a real JavaCC application.

Answer:

a) Here's one answer using negative lookahead

       \p{L}(?![Oo][Oo]$)[\p{L}\p{N}]*

   and here's a more "positive" approach, which enumerates the allowed
   strings by length: one or two characters, three characters not
   ending in "oo", or four or more characters

       [\p{L}] ( [\p{L}\p{N}]?
               | [\p{L}\p{N}&&[^oO]][\p{L}\p{N}]
               | [oO][\p{L}\p{N}&&[^oO]]
               | [\p{L}\p{N}]{3,} )

b) Frightening.  Best thing I can think of is to make a class of
   letters without o's.  That takes too long to write out for an exam,
   unless we allow the minus operator.  If JavaCC allows a minus-like
   operator for token formation, we could do something like the second
   answer to part (a) above.

c) The way to reject certain things as identifiers in JavaCC is to
   place a regex for them ***BEFORE*** the one for ID.  After all, this
   is how we defined reserved words.  So you could place the following
   line

       < ST00P1D_NAME: ["A"-"Z","a"-"z"] ["O","o"] ["O","o"] >

   just before the line defining ID.  Both rules match a three-character
   string like "foo" or "goo" at the same length, and the earlier rule
   wins the tie, so whenever the scanner saw "foo" or "goo" it would be
   classified as a ST00P1D_NAME and not an ID.  Exactly what we want!

PROBLEM 7
---------
EBNF generally uses

    * A B to mean exactly one A followed by exactly one B
    * A? to mean zero or one A
    * A* to mean zero or more As
    * A | B to mean either exactly one A OR exactly one B

Suppose I wanted to add a new one:

    * A1 # A2 # ... # An to mean "a non-empty string in which each of
      the Ais occurs zero or one times."

Show how to write A # B # C using only the conventional EBNF markup.

Answer:

    A(B?C?|C?B?) | B(A?C?|C?A?) | C(A?B?|B?A?)
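
The Problem 7 expansion can be sanity-checked by brute force.  What
follows is a minimal sketch, not part of the quiz itself.  It assumes
A, B, and C are single characters, so that the EBNF expansion can be
read directly as a Java regex, and it compares that regex against a
plain restatement of the definition on every string over {A, B, C} up
to a small length.  The names HashOperatorCheck, inLanguage, and
allAgree are made up for illustration.

```java
import java.util.regex.Pattern;

public class HashOperatorCheck {
    // The expansion of A # B # C from the answer, read as a Java regex
    // over the single-character symbols A, B, and C (an assumption).
    static final Pattern EXPANSION =
        Pattern.compile("A(B?C?|C?B?)|B(A?C?|C?A?)|C(A?B?|B?A?)");

    // Direct restatement of the definition: a non-empty string over
    // {A, B, C} in which each symbol occurs zero or one times.
    static boolean inLanguage(String s) {
        if (s.isEmpty() || !s.matches("[ABC]+")) {
            return false;
        }
        for (char c : new char[] {'A', 'B', 'C'}) {
            // First and last index differ only if c occurs more than once.
            if (s.indexOf(c) != s.lastIndexOf(c)) {
                return false;
            }
        }
        return true;
    }

    // True if the regex and the definition agree on every string over
    // {A, B, C} of length 0 through maxLen.
    static boolean allAgree(int maxLen) {
        return agree("", maxLen);
    }

    private static boolean agree(String prefix, int remaining) {
        if (EXPANSION.matcher(prefix).matches() != inLanguage(prefix)) {
            return false;
        }
        if (remaining == 0) {
            return true;
        }
        for (char c : "ABC".toCharArray()) {
            if (!agree(prefix + c, remaining - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree(4)
            ? "expansion agrees with the definition"
            : "mismatch found");
    }
}
```

Checking up to length 4 is enough here: no string longer than three
symbols can be in the language, so length 4 only confirms that the
regex rejects everything with a repeated symbol.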