Every regular expression describes regular language, let R be an arbitrary regular expression over the alphabet Σ. We will prove that the language described by R is a regular language. The proof is by induction on the structure of R.
The first base case of induction: Assume that R = ε. The R describes the language of {ε}. In order to prove that this language is regular, it suffices, by the theorem which says,
Theorem 1: Let A be a language. Then A is regular if and only if there exists a nondeterministic finite automaton that accepts A.
thus, let construct the NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q={q}, q is the start state, F = {q}, and δ(q,a) = ε, for all a ∈ Σε . The figure below gives the state diagram of M:
The second base case:Assume that R= ε. The R describes the language of {ε}. In order to prove that this language is regular, we know , by theorem 1, which state that if language is regular then it should be accepted by NFA.
So, let construct the NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q={q}, q is the start state, F = θ, means final state not exist, and δ(q,a) = θ, for all a ∈ Σε . The figure below gives the state diagram of M:
The third base case: Let a ∈ Σ and assume that R = a. The R describes the language of {a}. In order to prove that this language is regular, we know , by theorem 1, which state that if language is regular then it should be accepted by NFA.
So, let construct the NFA M = (Q, Σ, δ, q1, F) that accepts this language. This NFA is obtained by defining Q={q1, q2}, q1 is the start state, F = {q2}, and
δ(q1,a) ={q2},
δ(q1,b) = θ for all b ∈ Σε \ {a}
δ(q1,b) = θ for all b ∈ Σε
The figure below gives the state diagram of M:
The first case of the induction step: Assume that R = R1 ∪ R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R describes the language L1 ∪ L2, which, by,
Theorem 2: The set of regular languages is closed under the union operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ, then A1 ∪ A2 is also a regular language.
The second case of the induction step: Assume that R = R1 ∪ R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R
describes the language L1 ∪ L2, which, by Theorem 3, is regular.
Theorem 3: The set of regular languages is closed under the concatenation operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ , then A1A2 is also a regular language.
The third case of the induction step: Assume that R = (R1)*, where R1 is a regular expression. Let L1 be the language described by R1 and assume that L1 is regular. Then R describes the language (L1)*, which, by Theorem 4, is regular.
Theorem 4: The set of regular languages is closed under the star (Kleene) operation, i.e., if A is a regular language, then A* is also a regular language.
This concludes the proof of the claim that every regular expression describes a regular language.