CMSI 488/588
Final Exam

The test is open-everything with the sole limitation that you neither solicit nor give help while the exam is in progress.

The final exam is in two parts.

Problem   You got   Out of
1                   15
2                   10
3                   15
4                   10
5                   15
6                   10
7                   15
8                   10
TOTAL               100
  1. Consider a little language for specifying (JavaScript-style) object literals, where objects are key-value pairs separated by commas and enclosed in curly braces. Keys are identifiers (non-empty strings of ASCII letters, digits, and underscores that must start with a letter), and values are either numeric literals, string literals, identifiers, other objects, or arrays (comma-separated values enclosed in square brackets). For simplicity, assume string literals are just character sequences enclosed in double quotes, with no special escape sequences; all characters except control characters are allowed. This little language has no comments, but the usual whitespace can be used liberally between tokens. An example object description is:
        {dog: "spike",
         id: 234234,
         pups: [234, 25254, "spot", 3.2e5, false],
         house: {color: "GREEN", date: {m:4, y:2005, d:22}},
         nothings: [{}, {}]
        }
    

    Write the JavaCC TOKEN clauses, SKIP clauses, and parsing functions for syntax analysis of this little object description language. Do not include the PARSER_BEGIN/PARSER_END section or any action routines.

  2. Assume we decided to extend Hana so that structs could contain functions. Draw the abstract syntax tree for this Hana function declaration:

        void f(string z, ...) {
            struct e {
                double x;
                string f() {return " " * y * #z;}
            };
            e[] p;
            print (p.c[1].f()[6 |~ x+2 >> x]);
        }
    
  3. Write a NASM function to return the product of its input (which must be a double) and 7.0, without using multiplication or loops. USE AT MOST 4 ADDITIONS. The return type is double. Assume the function will be called from a C program built under gcc.
  4. Classify each of the following Hana conditions as a lexical error, a syntax error, a static semantic error, a dynamic semantic error, or just fine.
    1. Applying the atan standard function to two ints
    2. The declaration struct Chemical {boolean volatile; string color;}
    3. A string literal with a non-breaking space in it.
    4. x < y < z, where x and y are ints, and z is a boolean.
    5. Applying the "$" operator to an int
    6. Having the wrong number of components in a struct's constructor
    7. A return expression at the "top level" of a script.
    8. The statement f(x)[3] = 1, where f is a function of one integer argument returning an integer array.
    9. Returning a thread from a function.
    10. Application of the "." operator to an identifier that is not a field of the struct.
  5. In my Hana compiler, I chose to represent expressions containing the short-circuit ands and ors as binary expressions. One ramification of this was that the intermediate code produced for expressions like x || y || z is not efficient.

    1. Write out the tuple sequence produced by the naive translation (go ahead and use my Hana viewer program).
    2. Write out a more efficient sequence of tuples (Hint: only one register should be needed).
    3. How would an optimizer detect that the naive sequence is lousy (by looking only at the tuples)? What kind of transformations would an optimizer perform (at the tuple level) to turn the lousy tuple sequence into the good one?
    4. Explain how treating these operators as n-ary rather than binary simplifies this issue a great deal. Use a tree grammar in your explanation.
  6. Here is a grammar for a small language:

        EXP  -> "{" EXP "}" | "[" EXP "]" | "(" EXP ")" | EXP EXP | ID
    
    1. What language is this?
    2. Is the grammar ambiguous? Why or why not?
    3. Is it LL(k) for any k? If so, for which k? If not, why not?
    4. Give an "attribute grammar" for this language that attaches a "nesting level" to each identifier. You will have to make a slight modification to the original grammar for this to make sense.
  7. Here's some Hana code that prints the elements of an integer array separated by commas:

        for (int i = 0; i < #a; i++) {
            print($a[i]);
            print(", ") if (i != #a-1);
        }
    

    With optimizations turned off, my compiler produces:

    p0:
      copy 0, i1
    L0:
      copy [i0-4], r0
      less i1, r0, r1
      jz r1, L1
      assert_not_null i0
      copy [i0-4], r2
      assert_in_range i1, 0, r2
      mul i1, 4, r3
      add i0, r3, r4
      copy [r4], r5
      to_string r5, r6
      param r6
      call __print, 4
      copy [i0-4], r7
      sub r7, 1, r8
      not_equal i1, r8, r9
      jz r9, L2
      param s0
      call __print, 4
    L2:
      inc i1
      jump L0
    L1:
      exit
    s0:
      [44, 32]
    
    1. Describe, in high-level terms, what each of the assert tuples is doing. Are both of them necessary? Why or why not?
    2. Rewrite this code fragment showing what it would look like without using the variable i (but rather stepping through the array elements by incrementing an internal pointer). Note that this problem does not require you to know anything about how optimizers work. You are only being asked to show off your understanding of Squid to come up with a super-efficient Squid tuple sequence for a specific algorithm.
  8. Write regular expressions for:
    1. Unsigned binary numbers, of any size, divisible by 8.
    2. Sixteen-bit hexadecimal numerals (signed or unsigned!) divisible by 8.
    3. Floating point constants that are not allowed to have an empty fractional part and can have no more than three digits in the exponent part.
    4. The set of all character strings that contain neither the substring "return" nor "retry".
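
One way to sanity-check a candidate regular expression for a problem like 8.1 is to compare it against a brute-force definition of the language on all short inputs. The following is a minimal sketch in Python and is not part of the exam itself. The names `divisible_by_8` and `first_counterexample` are hypothetical helpers introduced here for illustration, and the candidate pattern shown is deliberately incomplete.

```python
import re
from itertools import product


def divisible_by_8(bits: str) -> bool:
    """Reference predicate: the binary numeral denotes a multiple of 8."""
    return int(bits, 2) % 8 == 0


def first_counterexample(pattern: str, max_len: int = 8):
    """Return the shortest binary string on which the candidate regex and
    the reference predicate disagree, or None if they agree on every
    binary string of length 1..max_len."""
    candidate = re.compile(pattern)
    for length in range(1, max_len + 1):
        for chars in product("01", repeat=length):
            bits = "".join(chars)
            matched = candidate.fullmatch(bits) is not None
            if matched != divisible_by_8(bits):
                return bits
    return None


# A deliberately incomplete candidate (an assumption for illustration):
# it insists on three trailing zeros, so it rejects the numeral "0".
print(first_counterexample(r"[01]*000"))  # prints 0
```

Agreement up to `max_len` is only evidence, not a proof, that a candidate is correct; the same harness can be reused for the other parts by swapping in a different reference predicate and alphabet.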