Nearly every student of programming languages encounters the so-called dangling else problem. That is, the code
if e1 then if e2 then s1 else s2
can seem to mean one of two possible things:

Similar ambiguities occur with statements like
if e1 then while e2 do if e3 then s1 else s2
This problem arises in languages with (ambiguous) grammars like Pascal's:
STMT -> ASSIGNMENT | IFSTMT | WHILESTMT | BLOCK | ...
IFSTMT -> 'if' EXP 'then' STMT ('else' STMT)?
WHILESTMT -> 'while' EXP 'do' STMT
BLOCK -> 'begin' STMT* 'end'
or C's:
STMT -> ASSIGNMENT | IFSTMT | WHILESTMT | BLOCK | ...
IFSTMT -> 'if' '(' EXP ')' STMT ('else' STMT)?
WHILESTMT -> 'while' '(' EXP ')' STMT
BLOCK -> '{' STMT* '}'
There are many ways to deal with this problem, but most lead to rather unsatisfactory ways to handle the true multiway conditional. We want a design that has no dangling else problems and allows (no, requires) a nice multiway conditional statement.
Here are seven existing approaches:
In Pascal and C, the syntax is ambiguous, but a semantic rule says that dangling elses match the closest if, regardless of indentation! (This can be very confusing to beginners.) So,
(* Without blocks, 'else' tied to second 'if' *)
if cond1 then
  if cond2 then
    stmt1
  else (* Goes w/ cond2 *)
    stmt2
|
(* MUST use block to tie 'else' to first 'if' *)
if cond1 then begin
  if cond2 then
    stmt1
end else (* Goes w/ cond1 *)
  stmt2
|
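To see why the closest-if rule falls out so naturally, here is a minimal Python sketch of a recursive-descent parser for just the if-statement. It is an illustration under simplifying assumptions, not how any real Pascal or C compiler is written: conditions and simple statements are single tokens, and `parse_stmt` is a made-up name.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]          # assumption: a condition is one token
        assert tokens[pos + 2] == "then"
        then_part, pos = parse_stmt(tokens, pos + 3)
        else_part = None
        # Greedy choice: if an 'else' comes next, the innermost 'if' that is
        # still being parsed claims it before any outer 'if' gets a chance.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return tokens[pos], pos + 1         # any other token is a simple statement


tree, _ = parse_stmt("if cond1 then if cond2 then stmt1 else stmt2".split())
print(tree)
```

In the printed tree the else part sits inside the inner `if` node, and the outer `if` has no else part at all, whatever the indentation of the source was.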
Purists will probably hate how this solution relies on a semantic rule to handle something so inherently structural! Anyway, let's see how multiway conditionals look:
if cond1 then
  stmt1
else if cond2 then
  stmt2
else if cond3 then
  stmt3
else if cond4 then
  stmt4
else
  stmt5
|
if cond1 then begin
  stmts1
end else if cond2 then begin
  stmts2
end else if cond3 then begin
  stmts3
end else if cond4 then begin
  stmts4
end else begin
  stmts5
end
|
Kind of ugly. But it can be a lot worse: programmers often mix up their conditional arms by sometimes using blocks and sometimes not, and get the ugly code they deserve. The worst part of the dangling else syntax is that there seem to be hundreds of different ways to format multiway if statements. Take a look at some old Pascal textbooks if you want to be grossed out.
With a lot of effort ... TODO
This grammar is no longer ambiguous, and every 'else' is automatically connected to the nearest 'if'. But it does nothing to prevent the confusion arising from misindented code. Java uses this approach.
If there always has to be an else part, there is never any ambiguity, but you'll have funny looking code whether your empty statement is blank or a special null keyword.
if cond1 then
  if cond2 then
    stmt1
  else
    null
else
  stmt2
|
if cond1 then
  if cond2 then
    stmt1
  else
    stmt2
else
  null
|
Not pretty. It works, though, even if the while statement (and other compound statements) aren't fully bracketed, for example:
if e1 then while e2 do if e3 then s1 else null else s2
if e1 then while e2 do if e3 then s1 else s2 else null
Ultimately, languages that have compound statements where the bodies are "trailing statements" leave too many formatting choices and look ugly and unbalanced to many people. Defining a language's syntax to require bracketed compound statements is a Good Thing for that reason, and as a bonus it removes the dangling else problem completely. The idea is that a block should not be a kind of statement, and that all compound statements use blocks for bodies, never simple statements! This grammar fragment:
STMT -> ASSIGNMENT | IFSTMT | WHILESTMT | ...
IFSTMT -> 'if' '(' EXP ')' BLOCK ('else' BLOCK)?
WHILESTMT -> 'while' '(' EXP ')' BLOCK
BLOCK -> '{' STMT* '}'
yields code like
if (cond1) {
  if (cond2) {
    stmts1
  } else {
    stmts2
  }
}
|
if (cond1) {
  if (cond2) {
    stmts1
  }
} else {
  stmts2
}
|
"Bracketing" can also be done with just a terminating 'end' instead of curly braces or begin-end pairs:
STMT -> ASSIGNMENT | IFSTMT | WHILESTMT | ...
IFSTMT -> 'if' EXP 'then' STMT+ ('else' STMT+)? 'end'
WHILESTMT -> 'while' EXP 'do' STMT+ 'end'
yielding rather clean code like this:
if cond1 then
  if cond2 then
    stmts1
  else
    stmts2
  end
end
|
if cond1 then
  if cond2 then
    stmts1
  end
else
  stmts2
end
|
But, wait, if we really require bracketing, won't that make multiway conditionals ugly? Either you start indenting too much or you get a bunch of "}"s (or ENDs) at the end. Like this, right?
if cond1 {
  stmts1
} else {
  if cond2 {
    stmts2
  } else {
    if cond3 {
      stmts3
    } else {
      if cond4 {
        stmts4
      } else {
        stmts5
      }
    }
  }
}
|
if cond1 {
  stmts1
} else { if cond2 {
  stmts2
} else { if cond3 {
  stmts3
} else { if cond4 {
  stmts4
} else {
  stmts5
}}}}
|
In Python, there is no dangling else problem since the indentation makes things clear:
if cond1:
    if cond2:
        stmts1
    else:
        stmts2
|
if cond1:
    if cond2:
        stmts1
else:
    stmts2
|
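The two layouts can be checked directly in a Python interpreter. The sketch below wraps each one in a function; the function names and the returned strings are placeholders standing in for `stmts1` and `stmts2`, chosen only for this illustration.

```python
def else_with_inner(cond1, cond2):
    # The 'else' is indented to line up with the inner 'if'.
    if cond1:
        if cond2:
            return "stmts1"
        else:
            return "stmts2"
    return None


def else_with_outer(cond1, cond2):
    # The 'else' is indented to line up with the outer 'if'.
    if cond1:
        if cond2:
            return "stmts1"
    else:
        return "stmts2"
    return None


for c1 in (True, False):
    for c2 in (True, False):
        print(c1, c2, else_with_inner(c1, c2), else_with_outer(c1, c2))
```

The functions differ only in the indentation of the `else`, yet they disagree whenever `cond1` is false or `cond2` is false.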
But, in general, required indentation might lead to funny looking code in multiway conditionals:
if cond1:
    stmts1
else:
    if cond2:
        stmts2
    else:
        if cond3:
            stmts3
        else:
            if cond4:
                stmts4
            else:
                stmts5
While the code above is legal Python, real Python programmers use the next solution....
In most languages where bracketing (or indentation) is required, the multiway conditional is described syntactically as a single if-statement, usually with the help of a special keyword (called elsif in Ada, Ruby, and Perl; elif in bash and Python; and elseif in PHP). Curly-brace style:
IFSTMT -> 'if' '(' EXP ')' BLOCK
('elsif' '(' EXP ')' BLOCK)*
('else' BLOCK)?
BLOCK -> '{' STMT* '}'
and terminating-end style:
IFSTMT -> 'if' EXP 'then' STMT+
('elsif' EXP 'then' STMT+)*
('else' STMT+)?
'end'
Code looks like this:
# Ruby
if cond1
  stmts1
elsif cond2
  stmts2
elsif cond3
  stmts3
elsif cond4
  stmts4
else
  stmts5
end
|
# Python
if cond1:
    stmts1
elif cond2:
    stmts2
elif cond3:
    stmts3
elif cond4:
    stmts4
else:
    stmts5
|
// PHP
if (cond1) {
  stmts1
} elseif (cond2) {
  stmts2
} elseif (cond3) {
  stmts3
} elseif (cond4) {
  stmts4
} else {
  stmts5
}
|
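In Python's case the special keyword is purely a surface convenience. A small sketch using the standard `ast` module suggests that CPython builds the same tree for the `elif` chain and for the deeply indented nested form; the names `cond1`, `stmts1`, and so on are just placeholder identifiers that are parsed, never executed.

```python
import ast

nested = """
if cond1:
    stmts1
else:
    if cond2:
        stmts2
    else:
        stmts3
"""

chained = """
if cond1:
    stmts1
elif cond2:
    stmts2
else:
    stmts3
"""

# ast.dump omits line and column numbers by default, so only the tree
# structure is being compared here.
same = ast.dump(ast.parse(nested)) == ast.dump(ast.parse(chained))
print(same)
```

So `elif` changes how the code looks on the page, not what the parser produces.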
Because Lisp is basically written in abstract syntax trees, you won't ever have a dangling else, and the multiway conditional is already handled by COND:
(COND
(condition1 block1)
(condition2 block2)
(condition3 block3)
(condition4 block4)
(T block5))
|
There is a way to require bracketing without resorting to special words like elsif or elif and without ending up with a whole mess of terminators at the end. The solution is amazingly simple:
IFSTMT -> 'if' '(' EXP ')' BLOCK
('else' 'if' '(' EXP ')' BLOCK)*
('else' BLOCK)?
BLOCK -> '{' STMT* '}'
Why is this not popular? The only thing I can think of is that top-down parsers will need a two-token lookahead when encountering an 'else'. Why should this be a big deal? Just peek ahead to see if the token after the 'else' is an 'if' (or a '{').
How can we do this for a terminating-end syntax? This doesn't work very well:
IFSTMT -> 'if' EXP 'then' STMT+
('else' 'if' EXP 'then' STMT+)*
('else' STMT+)?
'end'
because the amount of required lookahead is infinite. What about rejecting 'if' statements in the final 'else' part?
STMT -> IFSTMT | NONIFSTMT
IFSTMT -> 'if' EXP 'then' STMT+
('else' 'if' EXP 'then' STMT+)*
('else' NONIFSTMT STMT*)?
'end'
As long as NONIFSTMT cannot be empty and cannot start with 'if', we're parsable top-down with a lookahead of 2.
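As a sanity check on that claim, here is a hedged Python sketch of a top-down parser for this terminating-end grammar. The same simplifying assumptions apply: tokens are a list of strings, an EXP is one token, a NONIFSTMT is any single token other than the keywords, and all function names are made up for the illustration.

```python
def parse_if(toks, i):
    """Parse 'if' ... 'end' starting at toks[i]; return (node, next_index)."""
    cond, i = parse_guard(toks, i)
    body, i = parse_stmts(toks, i)
    arms = [(cond, body)]
    # Lookahead of 2: 'else' followed by 'if' always means another arm.
    while toks[i:i + 2] == ["else", "if"]:
        cond, i = parse_guard(toks, i + 1)
        body, i = parse_stmts(toks, i)
        arms.append((cond, body))
    otherwise = None
    if toks[i] == "else":
        # The loop above has ruled out 'if' as the next token, so the
        # final else part starts with a NONIFSTMT, as the grammar demands.
        otherwise, i = parse_stmts(toks, i + 1)
    assert toks[i] == "end"
    return ("if", arms, otherwise), i + 1


def parse_guard(toks, i):
    """Parse 'if' EXP 'then' where EXP is one token (a simplification)."""
    assert toks[i] == "if" and toks[i + 2] == "then"
    return toks[i + 1], i + 3


def parse_stmts(toks, i):
    """Parse STMT+ up to, but not including, the next 'else' or 'end'."""
    stmts = []
    while toks[i] not in ("else", "end"):
        if toks[i] == "if":
            stmt, i = parse_if(toks, i)
        else:
            stmt, i = toks[i], i + 1
        stmts.append(stmt)
    assert stmts, "a statement list may not be empty"
    return stmts, i


src = "if c1 then s1 else if c2 then if c3 then s2 end s3 else s4 s5 end"
node, _ = parse_if(src.split(), 0)
print(node)
```

Note that the nested `if c3` inside the second arm carries its own `end`, while the `else if c2` arm does not; that is exactly the distinction the two-token peek makes.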
If the objection to 'elsif' and related words is just that they are made up, we could give the if-statement a makeover, using more reasonable words, or symbols, even. Let's see:
when cond1 do
  stmts1
or when cond2 do
  stmts2
or when cond3 do
  stmts3
or when cond4 do
  stmts4
or else do
  stmts5
end
|
try cond1 do
  stmts1
or cond2 do
  stmts2
or cond3 do
  stmts3
or cond4 do
  stmts4
otherwise
  stmts5
end
|
try [ cond1 ] =>
  stmts1
[ cond2 ] =>
  stmts2
[ cond3 ] =>
  stmts3
[ cond4 ] =>
  stmts4
[] =>
  stmts5
end
|