CMSI 488/588
Homework #1
Partial Answers
  1. Some regexes:
    1. U.S. Zip Codes
            \d{5}(-\d{4})?
      
    2. Legal Visa Card Numbers
            4\d{12}(\d{3})?
      
    3. Legal Master Card Numbers
            5[1-5]\d{14}
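
      The three patterns above can be spot-checked with a short Python sketch. The helper names (is_zip, is_visa, is_mastercard) are made up for illustration, and re.ASCII is an added assumption so that \d means only 0-9. fullmatch is used because the patterns as written carry no ^ or $ anchors.

```python
import re

# The three patterns from the answers, compiled so that \d means 0-9 only.
ZIP = re.compile(r"\d{5}(-\d{4})?", re.ASCII)
VISA = re.compile(r"4\d{12}(\d{3})?", re.ASCII)
MASTERCARD = re.compile(r"5[1-5]\d{14}", re.ASCII)


def is_zip(text):
    """True if the whole string is a 5-digit or ZIP+4 code."""
    return ZIP.fullmatch(text) is not None


def is_visa(text):
    """True if the whole string is 13 or 16 digits starting with 4."""
    return VISA.fullmatch(text) is not None


def is_mastercard(text):
    """True if the whole string is 16 digits starting with 51 through 55."""
    return MASTERCARD.fullmatch(text) is not None
```

      These checks cover only the shape of the number; they do not verify a card's checksum digit.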
      
    4. Floating-point constants in Ada. This is easiest to write if you assume the free-spacing (x) and case-insensitive (i) modes:
            \d (_?\d)*
            (
              \. \d (_?\d)*
            |
              \#
                [\da-f](_?[\da-f])* (\. [\da-f](_?[\da-f])*)?
              \#
            )?
            (e [+-]? \d(_?\d)* )?
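
      As a rough illustration of the two modes, here is the same pattern in Python, where free-spacing is re.VERBOSE and case-insensitivity is re.IGNORECASE. The name ADA_NUMBER and the re.ASCII flag are assumptions added for this sketch.

```python
import re

# Same pattern as above. In verbose mode whitespace is ignored and an
# unescaped '#' would start a comment, which is why the pattern writes \#.
ADA_NUMBER = re.compile(
    r"""
    \d (_?\d)*
    (
      \. \d (_?\d)*
    |
      \#
        [\da-f](_?[\da-f])* (\. [\da-f](_?[\da-f])*)?
      \#
    )?
    (e [+-]? \d(_?\d)* )?
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)


def matches(text):
    """True if the whole string matches the pattern."""
    return ADA_NUMBER.fullmatch(text) is not None
```

      With this sketch, matches("3.141_592"), matches("1.0E-6") and matches("16#F.FF#E+2") hold, while "1__0.5", "3." and "_1.0" are rejected. Because the middle group is optional, the pattern as written also accepts integer literals such as 42.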
      
    5. Strings over {a,b,c} not containing the substring "aba" or "bb". Tracking the FSM directly, you get
          (c|a+c|a+bc|bc|ba+c|ba+bc)*(a+|a+b|b|ba+|ba+b)?
      
      which simplifies to this:
          ((b|b?a+b?)?c)*(b|b?a+b?)?
      
      or just use negative lookahead, which is trivial, but can be slower in theory.
         (a(?!ba)|b(?!b)|c)*
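
      One way to gain confidence that the three expressions describe the same language is to compare each of them against a direct substring test on every short string. This is a brute-force sketch, not a proof; the function names and the length bound are arbitrary choices.

```python
import re
from itertools import product

# The three candidate expressions from the answer above.
PATTERNS = {
    "from_fsm": re.compile(r"(c|a+c|a+bc|bc|ba+c|ba+bc)*(a+|a+b|b|ba+|ba+b)?"),
    "simplified": re.compile(r"((b|b?a+b?)?c)*(b|b?a+b?)?"),
    "lookahead": re.compile(r"(a(?!ba)|b(?!b)|c)*"),
}


def allowed(text):
    """Reference definition: no 'aba' and no 'bb' anywhere in the string."""
    return "aba" not in text and "bb" not in text


def disagreements(max_len):
    """List (pattern name, string) pairs where a regex and the reference differ."""
    bad = []
    for length in range(max_len + 1):
        for chars in product("abc", repeat=length):
            text = "".join(chars)
            for name, pattern in PATTERNS.items():
                if (pattern.fullmatch(text) is not None) != allowed(text):
                    bad.append((name, text))
    return bad
```

      Calling disagreements(7) checks all 3280 strings of length at most 7 and returns an empty list when the three expressions agree with the reference on every one of them.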
      
  2. A JSON version of the example code fragment:
    {
      "program" : [
        {"var" : "x"},
        {"var" : "y"},
        {
          "while" : [
            {"minus" : ["y", 5]},
            [
              {"var" : "y"},
              {"read" : "x"},
              {"read" : "y"},
              {
                "assign" : [
                  "x",
                  {
                    "times" : [
                      2,
                      {"plus" : [3, "y"]}
                    ]
                  }
                ]
              }
            ]
          ]
        },
        {"write" : 5}
      ]
    }
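
    To check that a fragment like this is well-formed, it can be run through a strict JSON parser. The sketch below uses Python's json module on a compact copy of the tree with every key double-quoted, as strict parsers require; node_kinds is a made-up helper that lists the node labels in document order.

```python
import json

# Compact copy of the fragment, with every object key in double quotes.
FRAGMENT = """
{
  "program": [
    {"var": "x"},
    {"var": "y"},
    {"while": [
      {"minus": ["y", 5]},
      [
        {"var": "y"},
        {"read": "x"},
        {"read": "y"},
        {"assign": ["x", {"times": [2, {"plus": [3, "y"]}]}]}
      ]
    ]},
    {"write": 5}
  ]
}
"""

tree = json.loads(FRAGMENT)


def node_kinds(node):
    """Yield the key of every object in the tree, in document order."""
    if isinstance(node, dict):
        for kind, child in node.items():
            yield kind
            yield from node_kinds(child)
    elif isinstance(node, list):
        for child in node:
            yield from node_kinds(child)
```

    An unquoted key such as {program: []} makes json.loads raise json.JSONDecodeError.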
    
  3. For the grammar
        S -> A M
        M -> S?
        A -> 'a' E | 'b' A A
        E -> ('a' B | 'b' A)?
        B -> 'b' E | 'a' B B
    
    1. This is the language of strings over {a,b} with more a's than b's. The start symbol S generates a sequence of one or more A's. Each A expands to a string containing exactly one more a than b, each B to a string containing exactly one more b than a, and each E to a string containing equally many a's and b's.
    2.           S
              /   \
             A     M
            / \    |
           a   E   S
               |  / \
                 A   M
                /|\  |
               b A A
                /| |\
               a E a E
                 |   |
      
    3. This grammar is not LL(1). When expanding an E with an 'a' as the next token, the parser can either apply E=>aB or turn E into the empty string, because an E can be followed by an 'a': an E can end an A, an A can be followed by another A, and an A can begin with an 'a'.

      By the way, you can use JavaCC to inform you of the conflict! Try:

            PARSER_BEGIN(Test)
            public class Test {
                public static void main(String[] args) throws ParseException {
                    Test parser = new Test(System.in);
                    parser.S();
                }
            }
            PARSER_END(Test)
      
            TOKEN: {"a" | "b"}
            void S(): {} {A() M()}
            void M(): {} {(S())?}
            void A(): {} {"a" E() | "b" A() A()}
            void E(): {} {("a" B() | "b" A())?}
            void B(): {} {"b" E() | "a" B() B()}
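
      The same conflict can be found by computing FIRST and FOLLOW sets directly. The Python sketch below is not how JavaCC works internally; it assumes the grammar has been rewritten in plain BNF with an explicit empty alternative for each optional part, and the names GRAMMAR, first_of and ll1_conflicts are illustrative.

```python
from itertools import combinations

END = "$"  # end-of-input marker

# Grammar 3 in plain BNF; () is the empty right-hand side.
GRAMMAR = {
    "S": [("A", "M")],
    "M": [("S",), ()],
    "A": [("a", "E"), ("b", "A", "A")],
    "E": [("a", "B"), ("b", "A"), ()],
    "B": [("b", "E"), ("a", "B", "B")],
}


def first_of(seq, grammar, first, nullable):
    """Return (FIRST set of the symbol sequence, whether it can derive empty)."""
    result = set()
    for symbol in seq:
        if symbol not in grammar:  # terminal
            result.add(symbol)
            return result, False
        result |= first[symbol]
        if symbol not in nullable:
            return result, False
    return result, True


def ll1_conflicts(grammar, start):
    """Map each nonterminal to the lookahead tokens shared by two alternatives."""
    nullable = set()
    first = {n: set() for n in grammar}
    follow = {n: set() for n in grammar}
    follow[start].add(END)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                rhs_first, rhs_empty = first_of(rhs, grammar, first, nullable)
                if not rhs_first <= first[lhs]:
                    first[lhs] |= rhs_first
                    changed = True
                if rhs_empty and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for i, symbol in enumerate(rhs):
                    if symbol in grammar:
                        tail, tail_empty = first_of(rhs[i + 1:], grammar, first, nullable)
                        new = tail | (follow[lhs] if tail_empty else set())
                        if not new <= follow[symbol]:
                            follow[symbol] |= new
                            changed = True
    conflicts = {}
    for lhs, alternatives in grammar.items():
        predict = []
        for rhs in alternatives:
            rhs_first, rhs_empty = first_of(rhs, grammar, first, nullable)
            predict.append(rhs_first | (follow[lhs] if rhs_empty else set()))
        for p, q in combinations(predict, 2):
            if p & q:
                conflicts.setdefault(lhs, set()).update(p & q)
    return conflicts
```

      For this grammar, ll1_conflicts(GRAMMAR, "S") reports conflicts only for E, on both lookahead tokens 'a' and 'b': in each case the empty alternative competes with the alternative that starts with that token.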
      
    4. This grammar is ambiguous since we can demonstrate two parse trees for aaba:
                   S                                 S
                /     \                           /     \
              A         M                       A         M
             / \        |                      / \        |
            a   E       S                     a   E       S
               / \     / \                        |      / \
              a   B   A   M                             A   M
                 /|   |\  |                            / \  |
                b E   a E                             a   E
                  |     |                                / \
                                                        b   A
                                                           / \
                                                          a   E
                                                              |
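
      The number of distinct parse trees for a string can also be counted mechanically. The following is a small memoized counter in Python, assuming the grammar is written in plain BNF with explicit empty alternatives; GRAMMAR and count_parses are names invented for this sketch. It relies on the grammar having no left recursion, which holds here.

```python
from functools import lru_cache

# Grammar 3 in plain BNF; () is the empty right-hand side.
GRAMMAR = {
    "S": [("A", "M")],
    "M": [("S",), ()],
    "A": [("a", "E"), ("b", "A", "A")],
    "E": [("a", "B"), ("b", "A"), ()],
    "B": [("b", "E"), ("a", "B", "B")],
}


def count_parses(text, grammar=GRAMMAR, start="S"):
    """Count the parse trees that derive exactly `text` from `start`."""

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives text[i:j].
        if symbol not in grammar:  # terminal
            return 1 if text[i:j] == symbol else 0
        return sum(sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives text[i:j].
        if not rhs:
            return 1 if i == j else 0
        total = 0
        for k in range(i, j + 1):
            head = derive(rhs[0], i, k)
            if head:  # only explore the rest when the head can match
                total += head * sequence(rhs[1:], k, j)
        return total

    return derive(start, 0, len(text))


print(count_parses("aaba"))  # 2
```

      A count above 1 for any string shows the grammar is ambiguous. A count of 0 means the string is not in the language, so the same function can be used to test the claim in part 1 on short strings.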
      
  4. For the grammar
        EXP         -> ID ":=" EXP | TERM TERM_TAIL
        TERM_TAIL   -> ["+" TERM TERM_TAIL]
        TERM        -> FACTOR FACTOR_TAIL
        FACTOR_TAIL -> ["*" FACTOR FACTOR_TAIL]
        FACTOR      -> "(" EXP ")" | ID
    
    1. This grammar is not LL(1) because when trying to expand an EXP when looking at an ID, you can expand EXP in two ways (note that a TERM can start with an ID).
    2. Rewrite it so that it is LL(1):
          EXP  ->  ID (":=" EXP | FACTOR_TAIL TERM_TAIL)
               |   "(" EXP ")" FACTOR_TAIL TERM_TAIL
         (all other rules are unchanged)
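
    Because the rewritten grammar is LL(1), a recursive-descent recognizer needs only one token of lookahead. The sketch below is in Python; the token syntax is an assumption, since the grammar does not define ID (here an ID is a letter or underscore followed by word characters), and the names tokenize, Parser and accepts are illustrative.

```python
import re

# Assumed token syntax: ":=", "+", "*", "(", ")" and identifiers.
TOKEN_RE = re.compile(r"\s*(:=|[+*()]|[A-Za-z_]\w*)")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]  # "$" marks end of input
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def at_id(self):
        return self.look[0].isalpha() or self.look[0] == "_"

    def expect(self, token):
        if self.look != token:
            raise SyntaxError(f"expected {token!r}, found {self.look!r}")
        self.pos += 1

    def exp(self):
        # EXP -> ID (":=" EXP | FACTOR_TAIL TERM_TAIL)
        #      | "(" EXP ")" FACTOR_TAIL TERM_TAIL
        if self.at_id():
            self.pos += 1
            if self.look == ":=":
                self.pos += 1
                self.exp()
                return
        else:
            self.expect("(")
            self.exp()
            self.expect(")")
        self.factor_tail()
        self.term_tail()

    def term_tail(self):
        if self.look == "+":
            self.pos += 1
            self.term()
            self.term_tail()

    def term(self):
        self.factor()
        self.factor_tail()

    def factor_tail(self):
        if self.look == "*":
            self.pos += 1
            self.factor()
            self.factor_tail()

    def factor(self):
        if self.look == "(":
            self.pos += 1
            self.exp()
            self.expect(")")
        elif self.at_id():
            self.pos += 1
        else:
            raise SyntaxError(f"expected a factor, found {self.look!r}")


def accepts(text):
    """True if the whole input is an EXP."""
    try:
        parser = Parser(tokenize(text))
        parser.exp()
        parser.expect("$")
        return True
    except SyntaxError:
        return False
```

    For example, accepts("x := y + z * (a := b)") is True, while accepts("x + y := z") is False because only a TERM may follow "+".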
      
  5. I can't read their minds, but I suspect they either (1) figured that negation was a kind of additive operator and wanted to keep it with the binary additive operators, or (2) didn't want programmers agonizing over the problems that might come from writing x--y: with the dashes run together, the rest of the line would be treated as a comment, but with a space between them it would mean x minus negative y. With the grammar the way it is, one can't even write x- -y. Anyway, the two approaches differ in structure, as seen in these two abstract syntax trees for -8*5:

          -            *
          |           / \
          *          -   5
         / \         |
        8   5        8
    

    They are similar in that the dynamic semantics, the value produced by the expression at runtime, will be the same regardless of structure.
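
    A tiny evaluator makes the point concrete. In this Python sketch the two trees are written as nested tuples; the tuple encoding and the "neg" label are assumptions made for illustration.

```python
def evaluate(node):
    """Evaluate a tree of ints, ("neg", x) nodes and ("*", x, y) nodes."""
    if isinstance(node, int):
        return node
    op, *operands = node
    values = [evaluate(operand) for operand in operands]
    if op == "neg":
        return -values[0]
    if op == "*":
        return values[0] * values[1]
    raise ValueError(f"unknown operator {op!r}")


# Left tree: negation applied to the product.
NEGATE_PRODUCT = ("neg", ("*", 8, 5))
# Right tree: negation applied to the first factor only.
PRODUCT_OF_NEGATION = ("*", ("neg", 8), 5)

print(evaluate(NEGATE_PRODUCT), evaluate(PRODUCT_OF_NEGATION))  # -40 -40
```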

  6.     LETTER      ->  [\p{L}]
        DIGIT       ->  [\p{Nd}]
        KEYWORD     ->  'fun' | 'if' | 'else'
        ID          ->  [\p{L}$] [\p{L}\p{Nd}_@$]* - KEYWORD
        NUMLIT      ->  DIGIT+ ('.' DIGIT+ ([Ee] [+-]? DIGIT+)?)?
        CHAR        ->  [^\p{Cc}"\\] | ESCAPE
        ESCAPE      ->  '\\' ([nt'"\\] | HEX {1,8} ';')
        HEX         ->  [0-9a-fA-F]
        STRINGLIT   ->  '"' CHAR* '"'
        SYMBOL      ->  [-+*/!,;(){}]
        SKIP        ->  [\x20\x09\x0A\x0D]
        TOKEN       ->  NUMLIT | STRINGLIT | ID | KEYWORD | SYMBOL
    
        PROGRAM     ->  FUNDECL+  EXP
        FUNDECL     ->  'fun' ID '(' PARAMS? ')' BODY
        PARAMS      ->  ID (',' ID)*
        BODY        ->  '{' (EXP ';')+ '}'
        EXP         ->  EXP1 ('if' EXP1 'else' EXP)*
        EXP1        ->  EXP2 (ADDOP EXP2)*
        EXP2        ->  EXP3 (MULOP EXP3)*
        EXP3        ->  '-'? EXP4
        EXP4        ->  EXP5 '!'?
        EXP5        ->  NUMLIT | STRINGLIT | ID | CALL
        CALL        ->  ID '(' (EXP (',' EXP)*)? ')'
        ADDOP       ->  '+' | '-'
        MULOP       ->  '*' | '/'
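
    The token rules can be tried out in Python. This is a sketch of three of them only (NUMLIT, STRINGLIT and ID), not a full lexer. Python's re module has no \p{...} classes, so the sketch makes these substitutions: \d, which matches Unicode decimal digits in str patterns, stands in for \p{Nd}; the explicit ranges U+0000-U+001F and U+007F-U+009F stand in for \p{Cc}; and str.isalpha and str.isdecimal stand in for \p{L} and \p{Nd} in the ID check. All function names are illustrative.

```python
import re

KEYWORDS = {"fun", "if", "else"}

NUMLIT = re.compile(r"\d+(\.\d+([Ee][+-]?\d+)?)?")
ESCAPE = r"""\\([nt'"\\]|[0-9a-fA-F]{1,8};)"""
# Any character except control characters, the double quote and the backslash.
CHAR = r"""[^\x00-\x1f\x7f-\x9f"\\]|""" + ESCAPE
STRINGLIT = re.compile('"(' + CHAR + ')*"')


def is_numlit(text):
    return NUMLIT.fullmatch(text) is not None


def is_stringlit(text):
    return STRINGLIT.fullmatch(text) is not None


def is_id(text):
    """ID -> [\\p{L}$] [\\p{L}\\p{Nd}_@$]* - KEYWORD, checked character by character."""
    if not text or text in KEYWORDS:
        return False
    head, tail = text[0], text[1:]
    if not (head.isalpha() or head == "$"):
        return False
    return all(c.isalpha() or c.isdecimal() or c in "_@$" for c in tail)
```

    Under these rules "3." and "3e5" are not number literals, because the fraction needs at least one digit and the exponent is only allowed after a fraction.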
        
  7. This is one possible abstract syntax tree:
                                    if
                                    |
               +--------------------+---------+--------------------+
               |                              |                    |
              case                           case                throw
             /    \                        /      \                |
            /      call                  ||        \               up
           /       /    \              /   \        \
         ||      write   *            !    there      ;
        /   \           / \           |             /   \
       >       !       -   q        here     dowhile      =
      / \      |       |                   /     \      /   \
     x   2   call      3                  /    false  call   \
            /    \                       /            /  \    \
          .        call                ;             .    6    \
         / \       /  \              /   \          /  \        \
        /   \     f    x          while    =       []   g        []
       /     \                   /  \     / \     /  \          /   \
    String  matches         close  call  x   &   q    4        .     2
                                    |       / \              /   \
                                tryharder  >>>  *         person list
                                          / |  / \
                                         x  3 2   x