Regular Expressions
The language accepted by a DFA, NFA or ε-NFA is called a regular language. A regular language can be described by a regular expression built from the symbols of an alphabet Σ and operators such as +, . and *.
+ is the alternation operator - meaning either this or that (lowest precedence).
. is the concatenation operator - meaning this followed by that (higher precedence).
* is the Kleene star operator - meaning this is repeated zero or more times (highest precedence).
The symbols ( and ) are also used in regular expressions for grouping. The concatenation operator is usually omitted when writing the expression.
Additional operators
? operator - meaning the preceding expression/symbol occurs once or does not occur.
+ (unary) operator - meaning the preceding expression/symbol occurs one or more times.
Definition: A regular expression is recursively defined as follows:
- φ is a regex denoting the empty language.
- ε is a regex denoting the language that contains only the empty string.
- a is a regex denoting the language that contains the single string a, for each symbol a in Σ.
- If R is a regex denoting the regular language L_R and S is a regex denoting the regular language L_S, then
  - R + S is a regex denoting L_R ∪ L_S
  - R.S is a regex denoting L_R . L_S
  - R* is a regex denoting (L_R)*
The expressions obtained by applying any of the four rules above are regular expressions.
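The recursive definition maps naturally onto a small data type with one case per rule. The following is a minimal sketch, not a standard library API: the class names (`Empty`, `Epsilon`, `Symbol`, `Union`, `Concat`, `Star`) and the helpers `nullable`, `derive` and `matches` are illustrative names chosen here. Membership is decided with derivatives of regular expressions (Brzozowski's technique), which is one of several possible approaches and is used only because it follows the structure of the definition closely.

```python
from dataclasses import dataclass


class Regex:
    """Base class for the regex forms in the recursive definition."""


@dataclass(frozen=True)
class Empty(Regex):      # φ: the empty language
    pass


@dataclass(frozen=True)
class Epsilon(Regex):    # ε: the language containing only the empty string
    pass


@dataclass(frozen=True)
class Symbol(Regex):     # a: the language containing the single string a
    char: str


@dataclass(frozen=True)
class Union(Regex):      # R + S
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Concat(Regex):     # R.S
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):       # R*
    inner: Regex


def nullable(r: Regex) -> bool:
    """Return True if the language of r contains the empty string."""
    if isinstance(r, (Epsilon, Star)):
        return True
    if isinstance(r, Union):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Concat):
        return nullable(r.left) and nullable(r.right)
    return False  # Empty and Symbol


def derive(r: Regex, c: str) -> Regex:
    """Return a regex for the strings w such that c + w is in the language of r."""
    if isinstance(r, Symbol):
        return Epsilon() if r.char == c else Empty()
    if isinstance(r, Union):
        return Union(derive(r.left, c), derive(r.right, c))
    if isinstance(r, Concat):
        first = Concat(derive(r.left, c), r.right)
        return Union(first, derive(r.right, c)) if nullable(r.left) else first
    if isinstance(r, Star):
        return Concat(derive(r.inner, c), r)
    return Empty()  # Empty and Epsilon contain no string starting with c


def matches(r: Regex, w: str) -> bool:
    """Decide membership of w by taking one derivative per input symbol."""
    for c in w:
        r = derive(r, c)
    return nullable(r)


# The regex (a+b)*abb, which denotes the strings ending with abb.
a, b = Symbol("a"), Symbol("b")
ends_with_abb = Concat(Star(Union(a, b)), Concat(a, Concat(b, b)))
print(matches(ends_with_abb, "babb"))  # True
print(matches(ends_with_abb, "abba"))  # False
```

The sketch does not simplify intermediate expressions, so the derived regex grows with the input; that is acceptable for short strings but not how a production matcher would be written.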
Examples of regular expressions
a? → Strings with 0 or 1 a. L = {ε, a}
a* → Strings with 0 or more a's. L = {ε, a, aa, aaa, ...}
a+ → Strings with 1 or more a's. L = {a, aa, aaa, ...}
a+b → Either a or b. L = {a, b}
(a+b)(a+b) → Combination of a or b of length 2. L = {aa, ab, ba, bb}
(a+b)* → Strings with any combination of a's and b's. L = {ε, a, b, aa, ab, ba, bb, aaa, ...}
(a+b)*abb → Strings ending with abb. L = {abb, aabb, babb, aaabb, ababb, baabb, bbabb, ...}
ab(a+b)* → Strings starting with ab. L = {ab, aba, abb, abaa, abab, abba, abbb, ...}
(a+b)*aa(a+b)* → Strings that contain aa. L = {aa, aaa, baa, aab, ...}
a*b*c* → 0 or more a's, followed by 0 or more b's, followed by 0 or more c's. L = {ε, a, b, c, aa, ab, ac, bb, bc, cc, aaa, ...}
a+b+c+ → 1 or more a's, followed by 1 or more b's, followed by 1 or more c's. L = {abc, aabc, abbc, abcc, aabbc, aabcc, abbcc, ...}
aa*bb*cc* → Same as above
(a+b)*(a+bb) → Strings that end with a or bb. L = {a, bb, aa, abb, ba, bbb, ...}
(aa)*(bb)*b → Strings with even number of a's followed by odd number of b's. L = {b, aab, bbb, aabbb, ...}
(0+1)*000 → Binary strings ending with 3 0's. L = {000, 0000, 1000, 00000, 01000, 10000, 11000, ...}
(11)* → Strings of 1's of even length. L = {ε, 11, 1111, 111111, ...}
01* + 1 → {1} U Strings that start with 0 followed by zero or more 1's. L = {1, 0, 01, 011, ...}
(01)* + 1 → {1} U Strings with zero or more 01's. L = {1, ε, 01, 0101, 010101, ...}
0(1* + 1) → 0 followed by any number of 1's. L = {0, 01, 011, 0111, ...}
(1+ε)(00*1)*0* → Strings with no consecutive 1's. L = {ε, 1, 10, 101, 1001, 1010, 10101, 101001, 101010, ...}
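The listed languages can be checked mechanically. The sketch below assumes Python's standard `re` module and translates the notation used here into its syntax: binary + becomes |, and ε becomes an empty alternative. The helper names `to_python` and `language` are hypothetical, and the translation deliberately does not handle the unary + and ? shorthands.

```python
import re
from itertools import product


def to_python(textbook: str) -> str:
    """Translate the notation used here into Python `re` syntax.

    Assumes every `+` is the binary alternation operator; the unary
    `+` and `?` shorthands are not handled by this sketch.
    """
    pattern = textbook.replace(" ", "").replace("+", "|")
    # ε matches the empty string, so an empty alternative stands in for it.
    return pattern.replace("ε", "")


def language(textbook: str, alphabet: str, max_len: int) -> list[str]:
    """List every string over `alphabet` up to `max_len` that the regex accepts."""
    compiled = re.compile(to_python(textbook))
    accepted = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if compiled.fullmatch(w):   # the whole string must match
                accepted.append(w)
    return accepted


print(language("(a+b)*abb", "ab", 4))        # ['abb', 'aabb', 'babb']
print(language("01* + 1", "01", 3))          # ['0', '1', '01', '011']
print(language("(1+ε)(00*1)*0*", "01", 2))   # ['', '0', '1', '00', '01', '10']
```

`fullmatch` is used rather than `search` or `match` because a string belongs to the language only if the expression describes all of it, not just a prefix or a substring.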
Exercises
Try each exercise yourself before reading the answer.
1. a's and b's of length 2
aa + ab + ba + bb OR (a+b)(a+b)
2. a's and b's of length ≤ 2
ε + a + b + aa + ab + ba + bb OR (ε + a + b)(ε + a + b) OR (a+b)? (a+b)?
3. a's and b's of length ≤ 10
(ε + a + b)¹⁰
4. Even-length strings of a's and b's
(aa + ab + ba + bb)* OR ((a+b)(a+b))*
5. Odd-length strings of a's and b's
(a+b) ((a+b)(a+b))*
6. L(R) = { w : w ∈ {0,1}* with at least three consecutive 0's }
(0+1)* 000 (0+1)*
7. Strings of 0's and 1's with no two consecutive 0's
(1 + 01)* (0 + ε) OR (0 + ε) (1 + 10)*
8. Strings of a's and b's starting with a and ending with b.
a (a+b)* b
9. Strings of a's and b's whose second last symbol is a.
(a+b)* a (a+b)
10. Strings of a's and b's whose third last symbol is a and fourth last symbol is b.
(a+b)* b a (a+b) (a+b)
11. Strings of a's and b's whose first and last symbols are the same.
(a (a+b)* a) + (b (a+b)* b) + a + b
12. Strings of a's and b's whose first and last symbols are different.
(a (a+b)* b) + (b (a+b)* a)
13. Strings of a's and b's whose last and second last symbols are the same.
(a+b)* (aa + bb)
14. Strings of a's and b's whose length is even or a multiple of 3 or both.
R1 + R2 where R1 = ((a+b)(a+b))* and R2 = ((a+b)(a+b)(a+b))*
15. Strings of a's and b's such that every block of 4 consecutive symbols has at least 2 a's.
(aaxx + axax + axxa + xaax + xaxa + xxaa)* where x = (a+b)
16. L = {aⁿbᵐ : n ≥ 0, m ≥ 0}
a* b*
17. L = {aⁿbᵐ : n > 0, m > 0}
aa* bb* OR a⁺b⁺
18. L = {aⁿbᵐ : n + m is even}
(aa)* (bb)* + a(aa)* b(bb)*
19. L = {a²ⁿb²ᵐ : n ≥ 0, m ≥ 0}
(aa)* (bb)*
20. Strings of a's and b's containing not more than three a's.
b* (ε + a) b* (ε + a) b* (ε + a) b*
21. L = {aⁿbᵐ : n ≥ 3, m ≤ 3}
aaa a* (ε + b) (ε + b) (ε + b)
22. L = { w : |w| mod 3 = 0 and w ∈ {a,b}* }
( (a+b)(a+b)(a+b) )*
23. L = { w : nₐ(w) mod 3 = 0 and w ∈ {a,b}* }
(b* a b* a b* a)* b*
24. Strings of 0's and 1's that do not end with 01
(0+1)* (00 + 10 + 11) + 0 + 1 + ε
25. L = { vuv : u, v ∈ {a,b}* and |v| = 2}
aa (a+b)* aa + ab (a+b)* ab + ba (a+b)* ba + bb (a+b)* bb
26. Strings of a's and b's that end with ab or ba.
(a+b)* (ab + ba)
27. L = {aⁿbᵐ : m,n ≥ 1 and mn ≥ 3}
This can be broken down into 3 problems:
- n = 1, m ≥ 3
- n ≥ 3, m = 1
- n ≥ 2, m ≥ 2
a bbb b* + aaa a* b + aa a* bb b*
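An answer like this one can be sanity-checked by brute force: enumerate every string up to some length and compare the regex against the set-builder definition. The sketch below does this for exercise 27 using Python's standard `re` module, with binary + written as |. The names `ANSWER`, `in_language` and `disagreements` are hypothetical, and agreement up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product

# Answer to exercise 27 in Python `re` syntax (binary + written as |).
ANSWER = re.compile(r"abbbb*|aaaa*b|aaa*bbb*")


def in_language(w: str) -> bool:
    """Set-builder definition: w = a^n b^m with m, n >= 1 and mn >= 3."""
    n = len(w) - len(w.lstrip("a"))      # length of the leading run of a's
    m = len(w) - n                       # everything after it must be b's
    return w == "a" * n + "b" * m and n >= 1 and m >= 1 and n * m >= 3


def disagreements(max_len: int) -> list[str]:
    """Strings up to max_len on which the regex and the definition differ."""
    bad = []
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            w = "".join(letters)
            if bool(ANSWER.fullmatch(w)) != in_language(w):
                bad.append(w)
    return bad


print(disagreements(8))  # []
```

The same pattern works for the other exercises: replace `ANSWER` with the candidate expression and `in_language` with a direct test of the stated property.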