Language Theory Basics

Strings

A symbol is a primitive unit of data representation.

An alphabet is a finite set of symbols.

Example: Σ = {a, b, c}.

A string is a finite sequence of symbols over some alphabet.

Examples: w = abbcbabb, x = b, y = aaa.

The length of a string is the number of symbol occurrences in the string.

Example: if w = abbac, then |w| = 5.

The empty string, denoted ε, is the (unique) string of length 0.

The concatenation of two strings x and y is the string obtained by appending y at the end of x. Also, we define w0 = ε. w1 = w. w2 = ww. wk+1 = wwk.

Example: if x = abb and y = ca then xy = abbca.

Example: wε = εw = w.

Example: (abc)4 = abcabcabcabc.

The reversal wR of a string w is the string formed by reversing w; formally εR = ε and (σw)R = wRσ for some symbol σ

x is a substring of y iff ∃v ∃w. y = vxw.

Languages

A language is a set of strings over some alphabet.

The concatenation of two languages L1 and L2 is L1L2 = {xy | x ∈ L1 ∧ y ∈ L2}. Also, we define L0 = {ε}, L1 = L, L2 = LL, ... Lk+1 = LLk.

The Kleene Closure of L is L* = L0 ∪ L1 ∪ L2 ∪ ... = {w1w2...wk | wi ∈ L ∧ k ≥ 0}.

The Positive Closure of L is L+ = L1 ∪ L2 ∪ L3 ∪ ... = {w1w2...wk | wi ∈ L ∧ k > 0}.

If Σ is an alphabet then Σ is also a language over Σ. So is Σ*. Σ* is important: it is the set of all strings over Σ. So for any language L over Σ, L ⊆ Σ*. Also important is that &Sigma* is countable.

Examples:

Σ = {a,b}
Σ* = {ε,a,b,aa,ab,ba,bb,aaa,aab,aba,...}
L1 = {ba, b}
L2 = {abb, bb}
L1* = {ε,ba,b,baba,bab,bba,bb,...}
L1+ = {ba,b,baba,bab,bba,bb,...}
L1L2 = {baabb, babb, bbb}

Some Language Theorems

Let Σ be an alphabet and A, B, C, and D be languages over Σ (i.e. subsets of &Sigma*).

A∅ = ∅A = ∅
A{ε} = {ε}A = A
(AB)C = A(BC)
(A ⊆ B) ∧ (C ⊆ D) ⇒ AC ⊆ BD
A(B ∪ C) = AB ∪ AC
(B ∪ C)A = BA ∪ CA
A(B ∩ C) ⊆ AB ∩ AC
(B ∩ C)A ⊆ BA ∩ CA
A* = A+ ∪ {ε}
An ⊆ A*, n ≥ 0
An ⊆ A+, n ≥ 1
A ⊆ AB*
A ⊆ B*A
A ⊆ B ⇒ A* ⊆ B*
A ⊆ B ⇒ A+ ⊆ B+
AA* = A*A = A+
{ε} ∈ A iff A+ = A*
(A*)* = A*A* = A*
(A*)+ = (A+)* = A*
A*A+ = A+A* = A*
(A*B*)* = (A ∪ B)* = (A* ∪ B*)*
(C = AC ∪ B) ⇒ (C = A*B)

Types of Languages

Finite Languages

Regular Languages

Context Free Languages

Context Sensitive Languages

Recursive Languages

Recursively Enumerable Languages

Non-Recursively Enumerable Languages