
Regular Expressions

The language accepted by a DFA, NFA or ε-NFA is called a regular language. A regular language can be described by a regular expression built from symbols of the alphabet Σ and operators such as +, . and *.

    + is the alternation operator - meaning either this or that (lowest precedence).

    . is the concatenation operator - meaning this followed by that (higher precedence).

    * is the Kleene star operator - meaning this is repeated zero or more times (highest precedence).

The symbols ( and ) are used for grouping in regular expressions. The concatenation operator is usually omitted when writing an expression, so a.b is written ab.

Additional operators

    ? operator - meaning the preceding expression/symbol occurs once or does not occur.

    ⁺ (unary, written as a superscript) operator - meaning the preceding expression/symbol occurs one or more times.
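The operators above map closely onto the syntax of Python's `re` module, which makes it easy to experiment with the expressions on this page. The following is a minimal sketch, not a full parser: `to_python_regex` and `accepts` are hypothetical helper names, and the sketch assumes that binary + always means alternation and that the unary "one or more" operator is written as a superscript ⁺ so the two can be told apart.

```python
import re

def to_python_regex(textbook: str) -> str:
    """Translate textbook notation into Python `re` syntax (illustrative helper)."""
    out = []
    for ch in textbook:
        if ch in " .":      # spaces and the explicit concatenation dot are dropped
            continue
        elif ch == "+":     # binary + is alternation, written | in Python
            out.append("|")
        elif ch == "⁺":     # superscript plus (one or more) is written + in Python
            out.append("+")
        elif ch == "ε":     # ε becomes an empty group that matches the empty string
            out.append("(?:)")
        else:               # symbols, parentheses, * and ? carry over unchanged
            out.append(ch)
    return "".join(out)

def accepts(textbook: str, s: str) -> bool:
    """True if the whole string s belongs to the language of the expression."""
    return re.fullmatch(to_python_regex(textbook), s) is not None

print(to_python_regex("(a+b)*abb"))    # (a|b)*abb
print(accepts("(a+b)*abb", "babb"))    # True
print(accepts("(a+b)*abb", "ab"))      # False
```

`re.fullmatch` is used rather than `re.search` because a regular expression in the formal sense describes whole strings, not substrings. The precedence rules carry over as they are: in Python, too, alternation binds loosest and repetition binds tightest.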

Definition: A regular expression is recursively defined as follows:

  1. φ is a regex denoting the empty language.
  2. ε is a regex denoting the language that contains only the empty string.
  3. For each symbol a in Σ, a is a regex denoting the language that contains the single string a.
  4. If R is a regex denoting the regular language L_R and S is a regex denoting the regular language L_S, then
    - R + S is a regex corresponding to L_R ∪ L_S
    - R.S is a regex corresponding to L_R . L_S
    - R* is a regex corresponding to (L_R)*

Only the expressions obtained by applying rules 1 to 4 a finite number of times are regular expressions.
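The recursive definition translates almost line by line into a small data type. The sketch below is one possible encoding, not a standard library: the class names `Empty`, `Eps`, `Sym`, `Alt`, `Cat` and `Star` and the function `language` are hypothetical. Since a regular language can be infinite, `language(e, n)` returns only the strings of length at most n.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:            # rule 1: φ, the empty language
    pass

@dataclass(frozen=True)
class Eps:              # rule 2: ε, the language containing only the empty string
    pass

@dataclass(frozen=True)
class Sym:              # rule 3: a single symbol (assumed to be one character)
    ch: str

@dataclass(frozen=True)
class Alt:              # rule 4: R + S
    r: object
    s: object

@dataclass(frozen=True)
class Cat:              # rule 4: R.S
    r: object
    s: object

@dataclass(frozen=True)
class Star:             # rule 4: R*
    r: object

def language(e, n):
    """Set of all strings of length <= n in the language denoted by e."""
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.ch} if n >= 1 else set()
    if isinstance(e, Alt):
        return language(e.r, n) | language(e.s, n)
    if isinstance(e, Cat):
        return {x + y for x in language(e.r, n) for y in language(e.s, n)
                if len(x) + len(y) <= n}
    if isinstance(e, Star):
        base = language(e.r, n) - {""}   # non-empty pieces, so every step grows
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")

a_or_b = Alt(Sym("a"), Sym("b"))
ends_with_abb = Cat(Star(a_or_b), Cat(Sym("a"), Cat(Sym("b"), Sym("b"))))
print(sorted(language(ends_with_abb, 4)))   # ['aabb', 'abb', 'babb']
```

Each branch of `language` mirrors one rule: union for +, pairwise concatenation for ., and repeated concatenation starting from ε for *.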

Examples of regular expressions

a? → Strings with 0 or 1 a. L = {ε, a}

a* → Strings with 0 or more a's. L = {ε, a, aa, aaa, ...}

a⁺ → Strings with 1 or more a's. L = {a, aa, aaa, ...}

a+b → Either a or b. L = {a, b}

(a+b)(a+b) → Combination of a or b of length 2. L = {aa, ab, ba, bb}

(a+b)* → Strings with any combination of a's and b's. L = {ε, a, b, aa, ab, ba, bb, aaa, ...}

(a+b)*abb → Strings ending with abb. L = {abb, aabb, babb, aaabb, ababb, baabb, bbabb, ...}

ab(a+b)* → Strings starting with ab. L = {ab, aba, abb, abaa, abab, abba, abbb, ...}

(a+b)*aa(a+b)* → Strings that contain aa. L = {aa, aaa, baa, aab, ...}

a*b*c* → 0 or more a's, followed by 0 or more b's, followed by 0 or more c's. L = {ε, a, b, c, aa, ab, ac, bb, bc, cc, aaa, ...}

a⁺b⁺c⁺ → 1 or more a's, followed by 1 or more b's, followed by 1 or more c's. L = {abc, aabc, abbc, abcc, aabbc, aabcc, abbcc, ...}

aa*bb*cc* → Same as above

(a+b)*(a+bb) → Strings that end with a or bb. L = {a, bb, aa, abb, ba, bbb, ...}

(aa)*(bb)*b → Strings with even number of a's followed by odd number of b's. L = {b, aab, bbb, aabbb, ...}

(0+1)*000 → Binary strings ending with 3 0's. L = {000, 0000, 1000, 00000, 01000, 10000, 11000, ...}

(11)* → Strings consisting of an even number of 1's. L = {ε, 11, 1111, 111111, ...}

01* + 1 → {1} ∪ strings that start with 0 followed by zero or more 1's. L = {1, 0, 01, 011, ...}

(01)* + 1 → {1} ∪ strings with zero or more 01's. L = {1, ε, 01, 0101, 010101, ...}

0(1* + 1) → 0 followed by any number of 1's. L = {0, 01, 011, 0111, ...}

(1+ε)(00*1)*0* → Strings with no consecutive 1's. L = {ε, 0, 1, 00, 01, 10, 000, 001, 010, 100, 101, ...}
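A listing such as L = {...} can be spot-checked by brute force: generate every string over the alphabet up to some length and keep those that the expression matches in full. The sketch below uses Python's `re` module, so alternation is written | instead of + and ε is written as an empty alternative; `members` is a hypothetical helper name.

```python
import re
from itertools import product

def members(pattern, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len that fully match `pattern`,
    listed shortest first."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if compiled.fullmatch(s):
                found.append(s)
    return found

print(members("a*b*c*", "abc", 2))
# ['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'bb', 'bc', 'cc']
print(members("(a|b)*abb", "ab", 4))
# ['abb', 'aabb', 'babb']
print(members("(1|)(00*1)*0*", "01", 2))
# ['', '0', '1', '00', '01', '10']
```

The empty string '' in the output corresponds to ε. The number of candidate strings grows exponentially with `max_len`, so this is only practical for short lengths.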

Exercises

Try each exercise yourself first. The answer is given below each question.

1. a's and b's of length 2

aa + ab + ba + bb OR (a+b)(a+b)

2. a's and b's of length ≤ 2

ε + a + b + aa + ab + ba + bb OR (ε + a + b)(ε + a + b) OR (a+b)? (a+b)?

3. a's and b's of length ≤ 10

(ε + a + b)¹⁰, that is, (ε + a + b) concatenated with itself 10 times

4. Even-length strings of a's and b's

(aa + ab + ba + bb)* OR ((a+b)(a+b))*

5. Odd-length strings of a's and b's

(a+b) ((a+b)(a+b))*

6. L(R) = { w : w ∈ {0,1}* with at least three consecutive 0's }

(0+1)* 000 (0+1)*

7. Strings of 0's and 1's with no two consecutive 0's

(1 + 01)* (0 + ε) OR (0 + ε) (1 + 10)*
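Answers to exercises like this one are easy to get subtly wrong, so it helps to compare a candidate expression with the stated property on all short strings. The sketch below does that with Python's `re` module (alternation is written |, ε is an empty alternative); `counterexamples` and `no_double_zero` are hypothetical names. Passing the check up to some length is evidence, not a proof.

```python
import re
from itertools import product

def counterexamples(pattern, predicate, alphabet, max_len):
    """Strings of length <= max_len on which the regex and the predicate disagree."""
    compiled = re.compile(pattern)
    bad = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if (compiled.fullmatch(s) is not None) != predicate(s):
                bad.append(s)
    return bad

def no_double_zero(s):
    """The property from the exercise: no two consecutive 0's."""
    return "00" not in s

print(counterexamples("(1|01)*(0|)", no_double_zero, "01", 8))   # []
print(counterexamples("(1|01*)*", no_double_zero, "01", 2))      # ['00']
```

The second call shows why (1 + 01*)* is not a valid answer: choosing 01* with zero 1's twice in a row produces 00.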

8. Strings of a's and b's starting with a and ending with b.

a (a+b)* b

9. Strings of a's and b's whose second last symbol is a.

(a+b)* a (a+b)

10. Strings of a's and b's whose third last symbol is a and fourth last symbol is b.

(a+b)* b a (a+b) (a+b)

11. Strings of a's and b's whose first and last symbols are the same.

(a (a+b)* a) + (b (a+b)* b) + a + b

12. Strings of a's and b's whose first and last symbols are different.

(a (a+b)* b) + (b (a+b)* a)

13. Strings of a's and b's whose last and second last symbols are the same.

(a+b)* (aa + bb)

14. Strings of a's and b's whose length is even or a multiple of 3 or both.

R1 + R2 where R1 = ((a+b)(a+b))*   and   R2 = ((a+b)(a+b)(a+b))*

15. Strings of a's and b's such that every block of 4 consecutive symbols has at least 2 a's.

(aaxx + axax + axxa + xaax + xaxa + xxaa)* where x = (a+b), reading the string as a sequence of consecutive, non-overlapping blocks of 4 symbols

16. L = {aⁿbᵐ : n ≥ 0, m ≥ 0}

a* b*

17. L = {aⁿbᵐ : n > 0, m > 0}

aa* bb* OR a⁺b⁺

18. L = {aⁿbᵐ : n + m is even}

(aa)* (bb)* + a(aa)* b(bb)*

19. L = {a²ⁿb²ᵐ : n ≥ 0, m ≥ 0}

(aa)* (bb)*

20. Strings of a's and b's containing not more than three a's.

b* (ε + a) b* (ε + a) b* (ε + a) b*

21. L = {aⁿbᵐ : n ≥ 3, m ≤ 3}

aaa a* (ε + b) (ε + b) (ε + b)

22. L = { w : |w| mod 3 = 0 and w ∈ {a,b}* }

( (a+b)(a+b)(a+b) )*

23. L = { w : nₐ(w) mod 3 = 0 and w ∈ {a,b}* }, where nₐ(w) is the number of a's in w

b* (a b* a b* a b*)*

24. Strings of 0's and 1's that do not end with 01

(0+1)* (00 + 10 + 11) + ε + 0 + 1

25. L = { vuv : u, v ∈ {a,b}* and |v| = 2}

aa (a+b)* aa + ab (a+b)* ab + ba (a+b)* ba + bb (a+b)* bb
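In this exercise the same string v must appear at both ends, which is easy to overlook. A quick way to test a candidate answer is to compare it with the definition on all short strings. The sketch below uses Python's `re` module (| for alternation); the names `in_language`, `same_v`, `any_v` and `disagreements` are hypothetical, and `any_v` is an expression that lets the two ends be chosen independently.

```python
import re
from itertools import product

def in_language(s: str) -> bool:
    """w = vuv with |v| = 2: length at least 4, same first two and last two symbols."""
    return len(s) >= 4 and s[:2] == s[-2:]

# One alternative per choice of v, so both ends are forced to agree.
same_v = re.compile(r"aa(a|b)*aa|ab(a|b)*ab|ba(a|b)*ba|bb(a|b)*bb")
# Both ends chosen independently: accepts too much.
any_v = re.compile(r"(aa|ab|ba|bb)(a|b)*(aa|ab|ba|bb)")

def disagreements(compiled, max_len=7):
    """Strings of length <= max_len on which the regex and the definition differ."""
    return [s
            for n in range(max_len + 1)
            for s in map("".join, product("ab", repeat=n))
            if (compiled.fullmatch(s) is not None) != in_language(s)]

print(disagreements(same_v))        # []
print(disagreements(any_v, 4)[0])   # aaab
```

The string aaab starts with aa and ends with ab, so it is not of the form vuv, yet the independent-ends expression accepts it.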

26. Strings of a's and b's that end with ab or ba.

(a+b)* (ab + ba)

27. L = {aⁿbᵐ : m, n ≥ 1 and mn ≥ 3}

This can be broken down into 3 problems:

  1. n = 1, m ≥ 3
  2. n ≥ 3, m = 1
  3. n ≥ 2, m ≥ 2

a bbb b* + aaa a* b + aa a* bb b*
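The case split can be checked mechanically by comparing the expression with the condition mn ≥ 3 for small n and m. This is a sketch using Python's `re` module, with | in place of +; `in_language` and `mismatches` are hypothetical names. Only strings of the shape aⁿbᵐ are tested, since every alternative in the expression has that shape.

```python
import re

# a bbb b*  +  aaa a* b  +  aa a* bb b*
pattern = re.compile(r"abbbb*|aaaa*b|aaa*bbb*")

def in_language(n: int, m: int) -> bool:
    """Membership condition from the exercise."""
    return n >= 1 and m >= 1 and n * m >= 3

# Pairs (n, m) where the regex and the condition disagree on a^n b^m.
mismatches = [(n, m)
              for n in range(8)
              for m in range(8)
              if (pattern.fullmatch("a" * n + "b" * m) is not None) != in_language(n, m)]

print(mismatches)   # []
```

An empty list means the three cases cover exactly the pairs with mn ≥ 3 in the tested range, including the boundary pairs (1, 2) and (2, 1), which are correctly rejected.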