Regular Language Archives -

Every regular expression describes regular language

ComputeNow — Sat, 27 Jul 2019 08:51:39 +0000

Every regular expression describes regular language, let R be an arbitrary regular expression over the alphabet Σ. We will prove that the language described by R is a regular language. The proof is by induction on the structure of R.

The first base case of induction: Assume that R = ε. The R describes the language of {ε}. In order to prove that this language is regular, it suffices, by the theorem which says,

Theorem 1: Let A be a language. Then A is regular if and only if there exists a nondeterministic finite automaton that accepts A.

thus, let construct the NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q={q}, q is the start state, F = {q}, and δ(q,a) = ε, for all a ∈ Σ_ε. The figure below gives the state diagram of M:

The second base case:Assume that R= ε. The R describes the language of {ε}. In order to prove that this language is regular, we know , by theorem 1, which state that if language is regular then it should be accepted by NFA.

So, let construct the NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q={q}, q is the start state, F = θ, means final state not exist, and δ(q,a) = θ, for all a ∈ Σ_ε. The figure below gives the state diagram of M:

The third base case: Let a ∈ Σ and assume that R = a. The R describes the language of {a}. In order to prove that this language is regular, we know , by theorem 1, which state that if language is regular then it should be accepted by NFA.

So, let construct the NFA M = (Q, Σ, δ, q₁, F) that accepts this language. This NFA is obtained by defining Q={q₁, q₂}, q₁ is the start state, F = {q₂}, and

δ(q₁,a) ={q₂},

δ(q₁,b) = θ for all b ∈ Σ_ε\ {a}

δ(q₁,b) = θ for all b ∈ Σ_ε

The figure below gives the state diagram of M:

The first case of the induction step: Assume that R = R1 ∪ R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R describes the language L1 ∪ L2, which, by,

Theorem 2: The set of regular languages is closed under the union operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ, then A1 ∪ A2 is also a regular language.

The second case of the induction step: Assume that R = R1 ∪ R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R
describes the language L1 ∪ L2, which, by Theorem 3, is regular.

Theorem 3: The set of regular languages is closed under the concatenation operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ , then A1A2 is also a regular language.

The third case of the induction step: Assume that R = (R1)*, where R1 is a regular expression. Let L1 be the language described by R1 and assume that L1 is regular. Then R describes the language (L1)*, which, by Theorem 4, is regular.

Theorem 4: The set of regular languages is closed under the star (Kleene) operation, i.e., if A is a regular language, then A* is also a regular language.

This concludes the proof of the claim that every regular expression describes a regular language.

Read: Regular Language in Automata Thoery

The post Every regular expression describes regular language appeared first on .

Regular Language in Automata Thoery

ComputeNow — Thu, 20 Sep 2018 17:22:27 +0000

Regular Languages or Formal Language : A language is regular if it can be expressed in terms of regular expression.

Closure Properties of Regular Languages

Union : If L1 and If L2 are two regular languages, their union L1 ∪ L2 will also be regular. For example, L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}
L3 = L1 ∪ L2 = {aⁿ ∪ bⁿ | n ≥ 0} is also regular.
Intersection : If L1 and If L2 are two regular languages, their intersection L1 ∩ L2 will also be regular. For example,
L1= {a^m bⁿ | n ≥ 0 and m ≥ 0} and L2= {a^m bⁿ ∪ bⁿ a^m | n ≥ 0 and m ≥ 0}
L3 = L1 ∩ L2 = {a^m bⁿ | n ≥ 0 and m ≥ 0} is also regular.
Concatenation : If L1 and If L2 are two regular languages, their concatenation L1.L2 will also be regular. For example,
L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}
L3 = L1.L2 = {a^m . bⁿ | m ≥ 0 and n ≥ 0} is also regular.
Kleene Closure : If L1 is a regular language, its Kleene closure L1* will also be regular. For example,
L1 = (a ∪ b)
L1* = (a ∪ b)*
Complement : If L(G) is regular language, its complement L’(G) will also be regular. Complement of a language can be found by subtracting strings which are in L(G) from all possible strings. For example,
L(G) = {aⁿ | n > 3}
L’(G) = {aⁿ | n <= 3}

Note : Two regular expressions are equivalent if languages generated by them are same. For example, (a+b*)* and (a+b)* generate same language. Every string which is generated by (a+b*)* is also generated by (a+b)* and vice versa.

How to solve problems on regular expression and regular languages?

Question 1 : Which one of the following languages over the alphabet {0,1} is described by the regular expression?
(0+1)*0(0+1)*0(0+1)*
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0’s.
(C) The set of all strings containing at least two 0’s.
(D) The set of all strings that begin and end with either 0 or 1.

Solution : Option A says that it must have substring 00. But 10101 is also a part of language but it does not contain 00 as substring. So it is not correct option.
Option B says that it can have maximum two 0’s but 00000 is also a part of language. So it is not correct option.
Option C says that it must contain atleast two 0. In regular expression, two 0 are present. So this is correct option.
Option D says that it contains all strings that begin and end with either 0 or 1. But it can generate strings which start with 0 and end with 1 or vice versa as well. So it is not correct.

Question 2 : Which of the following languages is generated by given grammar?
S -> aS | bS | ∊
(A) {aⁿ b^m | n,m ≥ 0}
(B) {w ∈ {a,b}* | w has equal number of a’s and b’s}
(C) {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} ∪ {aⁿ bⁿ | n ≥ 0}
(D) {a,b}*

Solution : Option (A) says that it will have 0 or more a followed by 0 or more b. But S -> bS => baS => ba is also a part of language. So (A) is not correct.
Option (B) says that it will have equal no. of a’s and b’s. But But S -> bS => b is also a part of language. So (B) is not correct.
Option (C) says either it will have 0 or more a’s or 0 or more b’s or a’s followed by b’s. But as shown in option (A), ba is also part of language. So (C) is not correct.
Option (D) says it can have any number of a’s and any numbers of b’s in any order. So (D) is correct.

Question 3 : The regular expression 0*(10*)* denotes the same set as
(A) (1*0)*1*
(B) 0 + (0 + 10)*
(C) (0 + 1)* 10(0 + 1)*
(D) none of these

Solution : Two regular expressions are equivalent if languages generated by them are same.
Option (A) can generate 101 but 0*(10*)* cannot. So they are not equivalent.
Option (B) can generate 0100 but 0*(10*)* cannot. So they are not equivalent.
Option (C) will have 10 as substring but 0*(10*)* may or may not. So they are not equivalent.

The post Regular Language in Automata Thoery appeared first on .

What is Chomsky Hierarchy in Theory of Computation

ComputeNow — Wed, 19 Sep 2018 16:54:42 +0000

What is Chomsky Hierarchy?

Noam Chomsky categorised regular and other languages which called as Chomsky Hierarchy.

Language Class	Grammar	Automaton
3	Regular	NFA or DFA
2	Context-Free	Push-Down Automaton
1	Context-Sensitive	Linear-Bounded Automaton
0	Unrestricted (or Free)	Turing Machine

This is a hierarchy, so every language of type 3 is also of types 2, 1 and 0; every language of type 2 is also of types 1 and 0 etc.

The distinction between languages can be seen by examining the structure of the production rules of their corresponding grammar, or the nature of the automata which can be used to identify them.

Type 3 – Regular Languages

A regular language is one which can be represented by a regular grammar, described using a regular expression, or accepted using an NFA or a DFA.

Type 2 – Context-Free Languages

A Context-Free Grammar (CFG) is one whose production rules are of the form: A -> α , where A is any single non-terminal, and α is any combination of terminals and non-terminals.

A NFA/DFA cannot recognise strings from this type of language since we must be able to “remember” information somehow. Instead we use a Push-Down Automaton which is like a DFA except that we are also allowed to use a stack.

Type 1 – Context-Sensitive Languages

Context-Sensitive grammars may have more than one symbol on the left-hand-side of their production rules (provided that at least one of them is a non-terminal). However, the production rules must now obey the following:

CS1: The number of symbols on the left-hand-side must not exceed the number of symbols on the right-hand-side
CS2: We do not allow rules of the form A → ε unless A is the start symbol and does not occur on the right-hand-side of any rule.

Since we allow more than one symbol on the left-hand-side, we refer to those symbols other than the one we are replacing as the context of the replacement.

The automaton which recognises a context-sensitive language is called a linear-bounded automaton: this is basically a NFA/DFA which can store symbols in a list.

Conditions CS1 and CS2 above mean that the sentential form in any derivation must always increase in length every time a production rule is applied. This basically means that the size of a sentential form is bounded by the length of the sentence (ie. word) we are deriving.

Since the sentinel form cannot thus grow infinitely large before deriving a sentence, a linear-bounded automaton always uses a finitely-long list as its store.

Type 0 – Unrestricted (Free) Languages

Free grammars have absolutely no restrictions on their grammar rules, (except, of course, that there must be at least one non-terminal on the left-hand-side).

The type of automata which can recognise such a language is basically a NFA/DFA with an infinitely-long list at its disposal to use as a store; this is called a Turing machine.

The post What is Chomsky Hierarchy in Theory of Computation appeared first on .

Regular Expression Using Perl

ComputeNow — Mon, 03 Sep 2018 17:07:52 +0000

Regular Expression Using Perl : Perl is the language that is the most famous for its use of regular expression for good reasons.

We use the =~ operator to denote a match or an assignment depending upon the context. The use of !~ is to reverse the sense of the match.

There are basically two regex operators in perl:

Matching: m//
Substitution: s///

The purpose of the // is to enclose the regex. However, any other delimiters like {}, "", etc could be used.

Matching

To use the matching operator, we simply check both sides using the =~ and m// operator.

The following sets $true to 1 if and only if $foo matches the regular expression foo:

$true = ($foo =~ m/foo/);

It is not difficult to see that just the opposite is achieved with !~:

$false = ($foo !~ m/foo/);

Capturing

As promised, the () could be used for capturing parts of the regexes. When the pattern inside a parentheses match, they go into special variables like $1, $2, etc in that order.

Example:

Here’s how one would extract the hours, minutes, seconds from a time string:

if ($time =~ /(\d\d):(\d\d):(\d\d)/) { # match hh:mm:ss format
 $hours = $1;
 $minutes = $2;
 $seconds = $3;
}

In list context, the list ($1, $2, $3, .. ) would be returned.

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

Substitution

This is our favorite search and replace feature. Almost the same syntax rules apply here except that there is an extra clause between the second // that tells us what to match with.

$x = "Time to feed the cat!";
$x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
if ($x =~ s/^(Time.*hacker)!$/$1 now!/) {
 $more_insistent = 1;
}
$y = "'quoted words'";
$y =~ s/^'(.*)'$/$1/; # strip single quotes,
# $y contains "quoted words"

Modifiers

Modifiers could be appended to the end of the regex operation expression to modify their matching behavior.

Here is a list of some important modifiers:

Modifier	Description
`i`	Case insensisitive matching
`s`	Allows the use of `.` to match newlines
`x`	Allows use of whitespace in the regex for clarity
`g`	Globally find all matches

Here’s how one might want to use the g modifier:

$x = "I batted 4 for 4";
$x =~ s/4/four/; # doesn't do it all:
# $x contains "I batted four for 4"

$x = "I batted 4 for 4";
$x =~ s/4/four/g; # does it all:
# $x contains "I batted four for four"

Read About: Regular Expressions – (Regex) – Regular Expression

The post Regular Expression Using Perl appeared first on .

Context Free Grammars

ComputeNow — Thu, 23 Aug 2018 18:21:25 +0000

Context free grammars (CFGs) are used to describe context-free languages. A context-free grammar is a set of recursive rules used to generate patterns of strings. A context-free grammar can describe all regular languages and more, but they cannot describe all possible languages.

Context-free grammars are studied in fields of theoretical computer science, compiler design, and linguistics. CFG’s are used to describe programming languages and parser programs in compilers can be generated automatically from context-free grammars.

Two parse trees that describe CFGs that generate the string “x + y * z”. Source: Context-free grammar wikipedia page.

Context Free Grammars:

Context-free grammars can generate context-free languages. They do this by taking a set of variables which are defined recursively, in terms of one another, by a set of production rules. Context-free grammars are named as such because any of the production rules in the grammar can be applied regardless of context—it does not depend on any other symbols that may or may not be around a given symbol that is having a rule applied to it.

Context-free grammars have the following components:

A set of terminal symbols which are the characters that appear in the language/strings generated by the grammar. Terminal symbols never appear on the left-hand side of the production rule and are always on the right-hand side.

A set of nonterminal symbols (or variables) which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols. These are the symbols that will always appear on the left-hand side of the production rules, though they can be included on the right-hand side. The strings that a CFG produces will contain only symbols from the set of nonterminal symbols.

A set of production rules which are the rules for replacing nonterminal symbols. Production rules have the following form: variable  string of variables and terminals.

A start symbol which is a special nonterminal symbol that appears in the initial string generated by the grammar.

For comparison, a context-sensitive grammar can have production rules where both the left-hand and right-hand sides may be surrounded by a context of terminal and nonterminal symbols.

To create a string from a context-free grammar, follow these steps:

- Begin the string with a start symbol.

- Apply one of the production rules to the start symbol on the left-hand side by replacing the start symbol with the right-hand side of the production.

- Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right-hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols. Note, it could be that not all production rules are used.

Formal Definition

A context-free grammar can be described by a four-element tuple (V, Σ, R, S) , where

V is a finite set of variables (which are non-terminal)
Σ is a finite set (disjoint from V) of terminal symbols
R is a set of production rules where each production rule maps a variable to a string s ∈ (V ∪ Σ) *
S (which is in V ) which is a start symbol.

Example:
Come up with a grammar that will generate the context-free (and also regular) language that contains all strings with matched parentheses.

There are many grammars that can do this task. This solution is one way to do it, but should give you a good idea of if your (possibly different) solution works too.

Starting symbol -> S
Non-terminal variables = {(,)}
Production rules:

- S -> ( )

- S -> SS

- S -> (S).

A way to condense production rules is as follows:

We can take

S->()
S->SS
S->(S)

and translate them into a single line: S -> ( ) | SS | (S) | ε where ε is an empty string.

Context-free grammars can be modeled as parse trees. The nodes of the tree represent the symbols and the edges represent the use of production rules. The leaves of the tree are the end result (terminal symbols) that make up the string the grammar is generating with that particular sequence of symbols and production rules.

The parse trees below represent two ways to generate the string “a + a – a” with the grammar

Example of an ambiguous grammar—one that can have multiple ways of generating the same string

Because this grammar can be implemented with multiple parse trees to get the same resulting string, this is said to be ambiguous.

Relationship with other Computation Models

A context-free grammar can be generated by pushdown automata just as regular languages can be generated by finite state machines. Since all regular languages can be generated by CFGs, all regular languages can too be generated by pushdown automata.

Any language that can be generated using regular expressions can be generated by a context-free grammar.

The way to do this is to take the regular language, determine its finite state machine and write production rules that follow the transition functions.

The post Context Free Grammars appeared first on .