CMSI 488/588
Homework #1

Read the first two chapters in the Scott textbook, skipping the section in Chapter 2 entitled "Theoretical Foundations." Do all of the exercises marked "Check Your Understanding" in a social setting with classmates or friends, but do not turn these in.

Submit, in hardcopy, answers to the following problems generated from LaTeX source according to the usual homework submission guidelines. Also build up your CVS repository with the following:

    /homework/cmsi488/hw1.tex
    /homework/cmsi488/iki/src/main/javacc/Iki.jj

Make sure that you have run the setup-class script so that I can check out and run your code from CVS while grading.

  1. Write regular expressions for:
    1. U.S. Zip Codes
    2. Legal Visa® Card Numbers (extra credit if you do the checksums)
    3. Legal MasterCard® Numbers (extra credit if you do the checksums)
    4. Floating-Point constants in Ada
    5. Strings over {a,b,c} not containing the substring "aba" or "bb"

    Include descriptions of what you did if you want full credit.
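
    For reference, the checksum mentioned in the two card-number items is commonly the Luhn (mod 10) check. The following is a minimal sketch of that check only, not a regular expression and not a solution to either item; the class name Luhn and method name isValid are made up for illustration, and the input is assumed to be a bare digit string with no spaces or dashes.

```java
// Hypothetical helper, for illustration only: a plain Luhn (mod 10) check.
final class Luhn {

    // Returns true if the digit string passes the Luhn check.
    // Assumes the caller has already stripped spaces and dashes.
    static boolean isValid(String number) {
        if (number.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = number.length() - 1; i >= 0; i--) {
            char c = number.charAt(i);
            if (c < '0' || c > '9') {
                return false; // reject anything that is not a decimal digit
            }
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid("4111111111111111")); // prints true
        System.out.println(isValid("4111111111111112")); // prints false
    }
}
```

    The two strings in main are widely published test numbers, not real accounts. Doubling every second digit from the right depends on the whole string, which is why the checksum is awkward to express as a regular expression.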

  2. We saw in the course notes, on the page entitled "Syntax", an example of an abstract syntax tree with five concrete syntaxes. Show this same example using JSON syntax.
  3. Here's a grammar.
        S -> A M
        M -> S?
        A -> 'a' E | 'b' A A
        E -> ('a' B | 'b' A)?
        B -> 'b' E | 'a' B B
    
    1. Describe, in English, the language of this grammar.
    2. Draw a parse tree for the string "abaa".
    3. Prove or disprove: "This grammar is LL(1)."
    4. Prove or disprove: "This grammar is ambiguous."
  4. Here's a grammar that's trying to capture the usual expressions, terms, and factors, while considering assignment to be an expression.
        EXP         -> ID ":=" EXP | TERM TERM_TAIL
        TERM_TAIL   -> ("+" TERM TERM_TAIL)?
        TERM        -> FACTOR FACTOR_TAIL
        FACTOR_TAIL -> ("*" FACTOR FACTOR_TAIL)?
        FACTOR      -> "(" EXP ")" | ID
    
    1. Prove that this grammar is not LL(1).
    2. Rewrite it so that it is LL(1).
  5. In Ada, comments start with "--" and run to the end of the line. Therefore the designers decided not to give the unary negation operator the highest precedence. Instead, expressions are defined as follows:
        EXP  → EXP1 ('and' EXP1)*  |  EXP1 ('or' EXP1)*
        EXP1 → EXP2 (RELOP EXP2)?
        EXP2 → '-'? EXP3 (ADDOP EXP3)*
        EXP3 → EXP4 (MULOP EXP4)*
        EXP4 → EXP5 ('**'  EXP5)?  |  'not' EXP5  |  'abs' EXP5
    
    Explain why this choice was made. Also, give an abstract syntax tree for the expression -8 * 5, and explain how it is similar to, and how it differs from, the tree you would get under the alternative of dropping the negation from EXP2 and adding '-' EXP5 to EXP4.
  6. Here is a description of a language. Programs in this language are made up of a non-empty sequence of function declarations, followed by a single expression. Each function declaration starts with the keyword fun followed by the function's name (an identifier), then a parenthesized list of zero or more parameters (also identifiers) separated by commas, then the body, which is a sequence of one or more expressions terminated by semicolons with the sequence enclosed in curly braces. Expressions can be numeric literals, string literals, identifiers, function calls, or can be made up of other expressions with the usual binary arithmetic operators (plus, minus, times, divide) and a unary prefix negation and a unary postfix factorial ("!"). There's a conditional expression with the weird syntax "x if y else z". Factorial has the highest precedence, followed by negation, the multiplicative operators, the additive operators, and finally the conditional. Parentheses are used, as in most other languages, to group subexpressions. Numeric literals are non-empty sequences of decimal digits with an optional fractional part and an optional exponent part. String literals are as in Carlos. Identifiers are those non-empty sequences of letters, decimal digits, underscores, at-signs, and dollar signs, beginning with a letter or dollar sign, that are not also reserved words. Function calls are as in Carlos, with the arguments in a comma-separated list of expressions bracketed by parentheses. There are no comments in this language, and whitespace can be used liberally between tokens.

    Write the microsyntax and macrosyntax of this language, using the definition of Carlos as a guide.

  7. Give an abstract syntax tree for the following Java code fragment:
        if (x > 2 || !String.matches(f(x))) {
            write(- 3*q);
        } else if (! here || there) {
            do {
                while (close) tryHarder();
                x = x >>> 3 & 2 * x;
            } while (false);
            q[4].g(6) = person.list[2];
        } else {
            throw up;
        }
    
  8. Write an abstract syntax tree generator for the Iki programming language. You can base it on the example from the course notes.