A symbol is a primitive unit of data representation.
An alphabet is a finite set of symbols.
Example: Σ = {a, b, c}.
A string is a finite sequence of symbols over some alphabet.
Examples: w = abbcbabb, x = b, y = aaa.
The length of a string is the number of symbol occurrences in the string.
Example: if w = abbac, then |w| = 5.
The empty string, denoted ε, is the (unique) string of length 0.
The concatenation of two strings x and y is the string obtained by appending y at the end of x. Also, we define w0 = ε. w1 = w. w2 = ww. wk+1 = wwk.
Example: if x = abb and y = ca then xy = abbca.
Example: wε = εw = w.
Example: (abc)4 = abcabcabcabc.
The reversal wR of a string w is the string formed by reversing w; formally εR = ε and (σw)R = wRσ for some symbol σ
x is a substring of y iff ∃v ∃w. y = vxw.
A language is a set of strings over some alphabet.
The concatenation of two languages L1 and L2 is L1L2 = {xy | x ∈ L1 ∧ y ∈ L2}. Also, we define L0 = {ε}, L1 = L, L2 = LL, ... Lk+1 = LLk.
The Kleene Closure of L is L* = L0 ∪ L1 ∪ L2 ∪ ... = {w1w2...wk | wi ∈ L ∧ k ≥ 0}.
The Positive Closure of L is L+ = L1 ∪ L2 ∪ L3 ∪ ... = {w1w2...wk | wi ∈ L ∧ k > 0}.
If Σ is an alphabet then Σ is also a language over Σ. So is Σ*. Σ* is important: it is the set of all strings over Σ. So for any language L over Σ, L ⊆ Σ*. Also important is that &Sigma* is countable.
Examples:
| Σ = {a,b} |
| Σ* = {ε,a,b,aa,ab,ba,bb,aaa,aab,aba,...} |
| L1 = {ba, b} |
| L2 = {abb, bb} |
| L1* = {ε,ba,b,baba,bab,bba,bb,...} |
| L1+ = {ba,b,baba,bab,bba,bb,...} |
| L1L2 = {baabb, babb, bbb} |
Let Σ be an alphabet and A, B, C, and D be languages over Σ (i.e. subsets of &Sigma*).
| A∅ = ∅A = ∅ |
| A{ε} = {ε}A = A |
| (AB)C = A(BC) |
| (A ⊆ B) ∧ (C ⊆ D) ⇒ AC ⊆ BD |
| A(B ∪ C) = AB ∪ AC |
| (B ∪ C)A = BA ∪ CA |
| A(B ∩ C) ⊆ AB ∩ AC |
| (B ∩ C)A ⊆ BA ∩ CA |
| A* = A+ ∪ {ε} |
| An ⊆ A*, n ≥ 0 |
| An ⊆ A+, n ≥ 1 |
| A ⊆ AB* |
| A ⊆ B*A |
| A ⊆ B ⇒ A* ⊆ B* |
| A ⊆ B ⇒ A+ ⊆ B+ |
| AA* = A*A = A+ |
| {ε} ∈ A iff A+ = A* |
| (A*)* = A*A* = A* |
| (A*)+ = (A+)* = A* |
| A*A+ = A+A* = A* |
| (A*B*)* = (A ∪ B)* = (A* ∪ B*)* |
| (C = AC ∪ B) ⇒ (C = A*B) |