Automata Theory Archives

Every regular expression describes a regular language

ComputeNow — Sat, 27 Jul 2019 08:51:39 +0000

Let R be an arbitrary regular expression over the alphabet Σ. We will prove that the language described by R is a regular language. The proof is by induction on the structure of R.

The first base case of the induction: Assume that R = ε. Then R describes the language {ε}. To prove that this language is regular, it suffices, by Theorem 1 below, to construct an NFA that accepts it.

Theorem 1: Let A be a language. Then A is regular if and only if there exists a nondeterministic finite automaton that accepts A.

Thus, let us construct an NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q = {q}, where q is the start state, F = {q}, and δ(q,a) = ∅ for all a ∈ Σ_ε. The state diagram of M is a single state q that is both the start state and an accept state, with no transitions.

The second base case: Assume that R = ∅. Then R describes the empty language ∅. By Theorem 1, to prove that this language is regular it suffices to construct an NFA that accepts it.

So, let us construct an NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q = {q}, where q is the start state, F = ∅ (there is no accept state), and δ(q,a) = ∅ for all a ∈ Σ_ε. The state diagram of M is a single non-accepting start state q with no transitions.

The third base case: Let a ∈ Σ and assume that R = a. Then R describes the language {a}. By Theorem 1, to prove that this language is regular it suffices to construct an NFA that accepts it.

So, let us construct an NFA M = (Q, Σ, δ, q₁, F) that accepts this language. This NFA is obtained by defining Q = {q₁, q₂}, where q₁ is the start state, F = {q₂}, and

δ(q₁,a) = {q₂},

δ(q₁,b) = ∅ for all b ∈ Σ_ε \ {a},

δ(q₂,b) = ∅ for all b ∈ Σ_ε.

The state diagram of M has a single transition, labelled a, from the start state q₁ to the accept state q₂.

The first case of the induction step: Assume that R = R1 ∪ R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R describes the language L1 ∪ L2, which, by Theorem 2, is regular.

Theorem 2: The set of regular languages is closed under the union operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ, then A1 ∪ A2 is also a regular language.

The second case of the induction step: Assume that R = R1R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R describes the language L1L2, which, by Theorem 3, is regular.

Theorem 3: The set of regular languages is closed under the concatenation operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ , then A1A2 is also a regular language.

The third case of the induction step: Assume that R = (R1)*, where R1 is a regular expression. Let L1 be the language described by R1 and assume that L1 is regular. Then R describes the language (L1)*, which, by Theorem 4, is regular.

Theorem 4: The set of regular languages is closed under the star (Kleene) operation, i.e., if A is a regular language, then A* is also a regular language.

This concludes the proof of the claim that every regular expression describes a regular language.
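The induction above is constructive, so it can be turned into a small program. The following is a minimal sketch, assuming a regular expression is encoded as a nested tuple (this encoding, and the names build_nfa and accepts, are inventions of this example). Each base case builds the NFA described in the proof, and each induction step glues sub-automata together with ε-moves in the way the usual proofs of Theorems 2, 3 and 4 do.

```python
from itertools import count

_fresh = count()  # supplies unused state names


def build_nfa(regex):
    """Return (start, accept_states, delta) for a regex written as a nested tuple.

    Accepted shapes (an encoding assumed for this sketch):
    ("eps",), ("empty",), ("sym", a), ("union", r1, r2),
    ("concat", r1, r2), ("star", r1).
    delta maps (state, symbol) to a set of states; the symbol "" is an ε-move.
    """
    kind = regex[0]
    start = next(_fresh)
    delta = {}

    def add(p, a, q):
        delta.setdefault((p, a), set()).add(q)

    if kind == "eps":                      # first base case: accepts {ε}
        return start, {start}, delta
    if kind == "empty":                    # second base case: accepts ∅
        return start, set(), delta
    if kind == "sym":                      # third base case: accepts {a}
        end = next(_fresh)
        add(start, regex[1], end)
        return start, {end}, delta

    s1, f1, d1 = build_nfa(regex[1])
    delta.update(d1)
    if kind == "star":                     # star: new accepting start, loop back
        add(start, "", s1)
        for f in f1:
            add(f, "", s1)
        return start, f1 | {start}, delta

    s2, f2, d2 = build_nfa(regex[2])
    delta.update(d2)
    add(start, "", s1)
    if kind == "union":                    # union: branch into either automaton
        add(start, "", s2)
        return start, f1 | f2, delta
    for f in f1:                           # concatenation: chain the automata
        add(f, "", s2)
    return start, f2, delta


def accepts(regex, word):
    """Simulate the NFA built for regex on word."""
    start, finals, delta = build_nfa(regex)

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for q in delta.get((p, ""), ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    current = closure({start})
    for a in word:
        moved = set()
        for p in current:
            moved |= delta.get((p, a), set())
        current = closure(moved)
    return bool(current & finals)


# (a ∪ b)* a : all strings over {a, b} that end in a
ends_in_a = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(accepts(ends_in_a, "abba"))  # True
```

The recursion in build_nfa follows the structure of R exactly as the proof does: three non-recursive cases and three cases that first build the automata for the sub-expressions.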


Turing Machine Definition

ComputeNow — Sat, 22 Dec 2018 19:19:28 +0000

Definition of a Turing Machine

We start with an informal description of a Turing Machine. Such a machine consists of the following:

There are k tapes, for some fixed k ≥ 1. Each tape is divided into cells and is infinite both to the left and to the right. Each cell stores a symbol belonging to a finite set Γ, which is called the tape alphabet. The tape alphabet contains the blank symbol Δ. If a cell contains Δ, the cell is considered empty.
[Figure: a Turing machine with k = 2 tapes]
Each tape has a tape head which can move along the tape, one cell per move. It can also read the cell it currently scans and replace the symbol in this cell by another symbol.
There is a state control, which can be in any one of a finite number of states. The finite set of states is denoted by Q. The set Q contains three special states: a start state, an accept state, and a reject state.

The Turing machine performs a sequence of computation steps. In one such step, it does the following:

Immediately before the computation step, the Turing machine is in a state r of Q, and each of the k tape heads is on a certain cell.
Depending on the current state r and the k symbols that are read by the tape heads,
1. the Turing machine switches to a state r’ of Q (which may be equal to r)
2. each tape head writes a symbol of Γ in the cell it is currently scanning (this symbol may be equal to the symbol currently stored in the cell), and
3. each tape head either moves one cell to the left, moves one cell to the right, or stays at the current cell.

We now give a formal definition of a deterministic Turing machine.

Definition: A deterministic Turing machine is a 7-tuple

M = (Σ, Γ, Q, δ, q, q_accept, q_reject),

where

Σ is a finite set, called the input alphabet; the blank symbol Δ is not contained in Σ,
Γ is a finite set, called the tape alphabet; this alphabet contains the blank symbol Δ, and Σ ⊆ Γ,
Q is a finite set, whose elements are called states,
q is an element of Q; it is called the start state,
q_accept is an element of Q; it is called the accept state,
q_reject is an element of Q; it is called the reject state,
δ is called the transition function, which is a function

δ: Q × Γ^k → Q × Γ^k × {L, R, N}^k.

The transition function δ is basically the “program” of the Turing machine. This function tells us what the machine can do in “one computation step”: Let r ∈ Q, and let a₁, a₂, …, a_k ∈ Γ. Furthermore, let r’ ∈ Q, a’₁, a’₂, …, a’_k ∈ Γ, and σ₁, σ₂, …, σ_k ∈ {L,R,N} be such that

δ(r, a₁, a₂, …, a_k) = (r’, a’₁, a’₂, …, a’_k, σ₁, σ₂, …, σ_k).

This transition means that if

the Turing machine is in state r, and
the head of the i-th tape reads the symbol a_i, 1 ≤ i ≤ k,

then

the Turing machine switches to state r’,
the head of the i-th tape replaces the scanned symbol a_i by the symbol a’_i, 1 ≤ i ≤ k, and
the head of the i-th tape moves according to σ_i, 1 ≤ i ≤ k: if σ_i = L, then the tape head moves one cell to the left; if σ_i = R, then the tape head moves one cell to the right; if σ_i = N, then the tape head does not move.

We will write the computation step in the form of the instruction

r a₁ a₂ … a_k → r’ a’₁ a’₂ … a’_k σ₁ σ₂ … σ_k

We now specify the computation of the Turing Machine

M = (Σ, Γ, Q, δ, q, q_accept, q_reject).
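The computation steps described above can be sketched as a small simulator. This is a minimal sketch for the single-tape case k = 1 only. The function name run, the step limit, and the example machine (which accepts strings of the form aⁿbⁿ by crossing off one a and one b per pass) are assumptions of this example. As a convenience that is not part of the formal definition, a missing entry in δ is treated as a move to the reject state.

```python
BLANK = "Δ"


def run(delta, start, accept, reject, word, max_steps=10_000):
    """Run a deterministic single-tape Turing machine on word.

    delta maps (state, symbol) to (new_state, written_symbol, move),
    where move is "L", "R" or "N". Returns True on accept, False on reject.
    """
    tape = {i: c for i, c in enumerate(word)}   # cells not stored hold the blank
    state, head = start, 0
    for _ in range(max_steps):
        if state == accept:
            return True
        if state == reject:
            return False
        symbol = tape.get(head, BLANK)
        # Missing instructions send the machine to the reject state.
        state, written, move = delta.get((state, symbol), (reject, symbol, "N"))
        tape[head] = written
        head += {"L": -1, "R": 1, "N": 0}[move]
    raise RuntimeError("step limit reached")


# Example machine (hypothetical): accepts a^n b^n for n >= 0.
ANBN = {
    ("q0", "a"): ("q1", "X", "R"),      # cross off an a
    ("q0", "Y"): ("q3", "Y", "R"),      # no a left: check only Y's remain
    ("q0", BLANK): ("acc", BLANK, "N"),
    ("q1", "a"): ("q1", "a", "R"),      # walk right to the first b
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "L"),      # cross off the matching b
    ("q2", "a"): ("q2", "a", "L"),      # walk back to the last X
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): ("acc", BLANK, "N"),
}

print(run(ANBN, "q0", "acc", "rej", "aabb"))  # True
```

Each dictionary entry corresponds to one instruction r a → r’ a’ σ in the notation above.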


Context Sensitive Grammar and Linear Bounded Automata

ComputeNow — Fri, 21 Sep 2018 18:15:41 +0000

A context sensitive grammar (CSG) is a grammar where all productions are of the form αAβ → αγβ where γ ≠ ε.

During a derivation, the non-terminal A is rewritten to γ only when it appears in the context α … β.

Note the constraint γ ≠ ε on the replacement string; as a consequence, α ⇒ β implies |α| ≤ |β|.

A CSG is therefore a noncontracting grammar.

Formal Definition of Context Sensitive Grammar

A context sensitive grammar G = ( N, Σ, P, S), where

N is a set of non-terminal symbols
Σ is a set of terminal symbols
S is the start symbol, and
P is a set of production rules of the form αAβ → αγβ, where A ∈ N, α, β ∈ (N ∪ Σ)* and γ ∈ (N ∪ Σ)⁺

The production S → ε is also allowed if S is the start symbol and it does not appear on the right side of any production.
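The noncontracting property is what makes membership decidable by brute force: since sentential forms never shrink, any form longer than the target string can be discarded. The sketch below illustrates this with a standard textbook grammar for {aⁿbⁿcⁿ | n ≥ 1}, which is an assumption of this example and not taken from the text above. Note that its rule cB → Bc is noncontracting but not literally of the form αAβ → αγβ; noncontracting grammars generate the same class of languages as context sensitive grammars.

```python
from collections import deque

# Upper-case letters are non-terminals, lower-case letters are terminals.
RULES = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]


def derivable(target, rules=RULES, start="S"):
    """Breadth-first search over sentential forms no longer than target."""
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                # Forms never shrink, so longer ones cannot lead to target.
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False


print(derivable("aabbcc"))  # True
print(derivable("aabbc"))   # False
```

The search always terminates because there are only finitely many strings of bounded length over a finite alphabet.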

Linear Bounded Automata

A linear bounded automaton (LBA) is a single-tape Turing machine with two special tape symbols, the left marker < and the right marker >.

The transitions should satisfy these conditions:

It should not replace the marker symbols by any other symbol.
It should not write on cells beyond the marker symbols.

Thus the initial configuration will be: < q0 a1 a2 a3 … an >

Formal Definition

Formally, a linear bounded automaton is a nondeterministic Turing machine M = (Q, Σ, Γ, δ, ε, q0, <, >, t, r), where

Q is the set of all states
Σ is the set of all input symbols (terminals)
Γ is the set of all tape symbols, Σ ⊂ Γ
δ is the set of transitions
ε is the blank symbol
q0 is the start state
< is the left marker and > is the right marker
t is the accept state
r is the reject state


Regular Language in Automata Theory

ComputeNow — Thu, 20 Sep 2018 17:22:27 +0000

Regular Languages: A language is regular if it can be described by a regular expression.

Closure Properties of Regular Languages

Union: If L1 and L2 are two regular languages, their union L1 ∪ L2 is also regular. For example, L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}
L3 = L1 ∪ L2 = {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} is also regular.
Intersection: If L1 and L2 are two regular languages, their intersection L1 ∩ L2 is also regular. For example,
L1 = {a^m bⁿ | n ≥ 0 and m ≥ 0} and L2 = {a^m bⁿ | n ≥ 0 and m ≥ 0} ∪ {bⁿ a^m | n ≥ 0 and m ≥ 0}
L3 = L1 ∩ L2 = {a^m bⁿ | n ≥ 0 and m ≥ 0} is also regular.
Concatenation: If L1 and L2 are two regular languages, their concatenation L1.L2 is also regular. For example,
L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}
L3 = L1.L2 = {a^m . bⁿ | m ≥ 0 and n ≥ 0} is also regular.
Kleene Closure: If L1 is a regular language, its Kleene closure L1* is also regular. For example,
L1 = (a ∪ b)
L1* = (a ∪ b)*
Complement: If L(G) is a regular language, its complement L’(G) is also regular. The complement of a language consists of all strings over the alphabet that are not in L(G). For example, over the alphabet {a},
L(G) = {aⁿ | n > 3}
L’(G) = {aⁿ | n ≤ 3}

Note: Two regular expressions are equivalent if they generate the same language. For example, (a+b*)* and (a+b)* generate the same language: every string generated by (a+b*)* is also generated by (a+b)* and vice versa.

How to solve problems on regular expression and regular languages?

Question 1: Which one of the following languages over the alphabet {0,1} is described by the regular expression (0+1)*0(0+1)*0(0+1)*?
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0’s.
(C) The set of all strings containing at least two 0’s.
(D) The set of all strings that begin and end with either 0 or 1.

Solution: Option A says that every string must contain the substring 00. But 10101 is in the language and does not contain 00, so A is not correct.
Option B says that a string can contain at most two 0’s, but 00000 is also in the language, so B is not correct.
Option C says that a string must contain at least two 0’s. The expression has two mandatory 0’s, and each (0+1)* around them matches any string, so it generates exactly the strings with at least two 0’s. C is the correct option.
Option D describes all strings that begin and end with 0 or 1, which is every non-empty string over {0,1}. That set includes strings such as 1 and 01, which contain fewer than two 0’s and are not generated by the expression, so D is not correct.
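The reasoning for Question 1 can be checked mechanically on short strings. The sketch below uses Python's re module, where union is written | instead of +. It compares the expression against each option on every string over {0,1} up to length 8. The length bound and the way each option is phrased as a predicate are choices made for this example, and a bounded check is supporting evidence, not a proof.

```python
import re
from itertools import product

PATTERN = re.compile(r"(0|1)*0(0|1)*0(0|1)*")


def words(alphabet, max_len):
    """All strings over alphabet of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


options = {
    "A": lambda w: "00" in w,            # contains the substring 00
    "B": lambda w: w.count("0") <= 2,    # at most two 0's
    "C": lambda w: w.count("0") >= 2,    # at least two 0's
    "D": lambda w: len(w) >= 1,          # begins and ends with 0 or 1
}

matching = [
    name
    for name, holds in options.items()
    if all(bool(PATTERN.fullmatch(w)) == holds(w) for w in words("01", 8))
]
print(matching)  # ['C']
```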

Question 2 : Which of the following languages is generated by given grammar?
S -> aS | bS | ∊
(A) {aⁿ b^m | n,m ≥ 0}
(B) {w ∈ {a,b}* | w has equal number of a’s and b’s}
(C) {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} ∪ {aⁿ bⁿ | n ≥ 0}
(D) {a,b}*

Solution: Option (A) says that a string has 0 or more a’s followed by 0 or more b’s. But S => bS => baS => ba shows that ba is also in the language. So (A) is not correct.
Option (B) says that a string has an equal number of a’s and b’s. But S => bS => b shows that b is also in the language. So (B) is not correct.
Option (C) says that a string consists only of a’s, or only of b’s, or of a’s followed by the same number of b’s. But as shown for option (A), ba is also in the language. So (C) is not correct.
Option (D) says that a string can have any number of a’s and any number of b’s in any order. So (D) is correct.
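To see why (D) is the answer, the derivations of the grammar can be enumerated. This is a minimal sketch that relies on the fact that every sentential form of S -> aS | bS | ∊ is a run of terminals followed by a single S. The function name generated and the length bound are assumptions of this example.

```python
from itertools import product


def generated(max_len):
    """Terminal strings of length <= max_len derivable from S -> aS | bS | ε."""
    results, frontier = set(), {"S"}
    while frontier:
        following = set()
        for form in frontier:
            prefix = form[:-1]            # drop the trailing S
            results.add(prefix)           # apply S -> ε
            if len(prefix) < max_len:     # apply S -> aS and S -> bS
                following.add(prefix + "aS")
                following.add(prefix + "bS")
        frontier = following
    return results


everything = {"".join(p) for n in range(5) for p in product("ab", repeat=n)}
print(generated(4) == everything)  # True
```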

Question 3 : The regular expression 0*(10*)* denotes the same set as
(A) (1*0)*1*
(B) 0 + (0 + 10)*
(C) (0 + 1)* 10(0 + 1)*
(D) none of these

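Solution: Two regular expressions are equivalent if they generate the same language. The expression 0*(10*)* generates every string over {0,1}: any string is a block of 0’s followed by zero or more blocks, each consisting of a 1 followed by 0’s.
Option (A) also generates every string over {0,1}: (1*0)* produces any string that is empty or ends in 0, and 1* appends the trailing 1’s. So (A) is equivalent and is the correct option.
Option (B) cannot generate 1 (or 11), because every 1 it produces is followed by a 0, but 0*(10*)* can. So they are not equivalent.
Option (C) generates only strings containing the substring 10, but 0*(10*)* also generates strings without it, such as ε and 01. So they are not equivalent.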
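When two expressions are not equivalent, the most convincing argument is a concrete string that one generates and the other does not. The sketch below searches for the shortest such string using Python's re module, with + rewritten as |. The function name first_difference and the length bound of 8 are assumptions of this example, and a result of None only means that no difference exists up to that length.

```python
import re
from itertools import product


def first_difference(p, q, alphabet="01", max_len=8):
    """Shortest string matched by exactly one of the two patterns, or None."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return w
    return None


TARGET = r"0*(10*)*"
print(first_difference(TARGET, r"(1*0)*1*"))          # None
print(first_difference(TARGET, r"0|(0|10)*"))         # 1
print(repr(first_difference(TARGET, r"(0|1)*10(0|1)*")))  # ''
```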


What is Chomsky Hierarchy in Theory of Computation

ComputeNow — Wed, 19 Sep 2018 16:54:42 +0000

What is Chomsky Hierarchy?

Noam Chomsky classified regular languages and other formal languages into what is called the Chomsky hierarchy.

Language Class	Grammar	Automaton
3	Regular	NFA or DFA
2	Context-Free	Push-Down Automaton
1	Context-Sensitive	Linear-Bounded Automaton
0	Unrestricted (or Free)	Turing Machine

This is a hierarchy, so every language of type 3 is also of types 2, 1 and 0; every language of type 2 is also of types 1 and 0 etc.

The distinction between languages can be seen by examining the structure of the production rules of their corresponding grammar, or the nature of the automata which can be used to identify them.

Type 3 – Regular Languages

A regular language is one which can be represented by a regular grammar, described using a regular expression, or accepted using an NFA or a DFA.

Type 2 – Context-Free Languages

A Context-Free Grammar (CFG) is one whose production rules are of the form: A -> α , where A is any single non-terminal, and α is any combination of terminals and non-terminals.

An NFA/DFA cannot recognise every language of this type, since it has no way to “remember” an unbounded amount of information. Instead we use a push-down automaton, which is like a finite automaton except that it is also allowed to use a stack.

Type 1 – Context-Sensitive Languages

Context-Sensitive grammars may have more than one symbol on the left-hand-side of their production rules (provided that at least one of them is a non-terminal). However, the production rules must now obey the following:

CS1: The number of symbols on the left-hand-side must not exceed the number of symbols on the right-hand-side
CS2: We do not allow rules of the form A → ε unless A is the start symbol and does not occur on the right-hand-side of any rule.

Since we allow more than one symbol on the left-hand-side, we refer to those symbols other than the one we are replacing as the context of the replacement.

The automaton which recognises a context-sensitive language is called a linear-bounded automaton: this is basically an NFA/DFA which can store symbols in a list of bounded length.

Conditions CS1 and CS2 above mean that the sentential form in any derivation never decreases in length when a production rule is applied. This means that the size of every sentential form is bounded by the length of the sentence (i.e. word) we are deriving.

Since the sentential form cannot grow beyond the length of the sentence being derived, a linear-bounded automaton always uses a finitely-long list as its store.

Type 0 – Unrestricted (Free) Languages

Free grammars have absolutely no restrictions on their grammar rules, (except, of course, that there must be at least one non-terminal on the left-hand-side).

The type of automata which can recognise such a language is basically a NFA/DFA with an infinitely-long list at its disposal to use as a store; this is called a Turing machine.


The Pumping Lemma for Context-Free Languages

ComputeNow — Mon, 10 Sep 2018 18:57:45 +0000

The Pumping Lemma for Context-Free Languages (CFL)

To prove that a language is context-free, it is enough to find a context-free grammar (or a pushdown automaton) that describes it. Proving that a language is not context-free requires a different technique, and the most commonly used one is the Pumping Lemma for Context-Free Languages.

Theorem
The pumping lemma for context-free languages states that if a language L is context-free, there exists some integer p ≥ 1 (the pumping length) such that every string s ∈ L with |s| ≥ p can be written as s = uvwxy, where u, v, w, x and y are substrings of s such that:

- |vwx| ≤ p
- |vx| ≥ 1
- uvⁿwxⁿy ∈ L ∀ n ≥ 0

All context-free languages are “pumpable” meaning that the pumping lemma constraints hold true for all context-free languages. If a language is not pumpable, then it is not a context-free language. However, if a language is pumpable, it is not necessarily a context-free language. Because the set of regular languages is contained in the set of context-free languages, all regular languages must be pumpable too.

Essentially, the pumping lemma says that every sufficiently long string in the language can be pumped without ever producing a string that is not in the language.

To prove that a language L is not context-free, use proof by contradiction together with the pumping lemma. Assume that L is context-free, pick a string s ∈ L with |s| ≥ p, and show that every way of writing s = uvwxy that satisfies the first two constraints violates the third.

Basically, the idea behind the pumping lemma for context-free languages is that there are certain constraints a language must adhere to in order to be a context-free language. You can use the pumping lemma to test if all of these constraints hold for a particular language, and if they do not, you can prove with contradiction that the language is not context-free.

Example

Use the Pumping Lemma to prove that L = { aⁿbⁿcⁿ|n>0 } is not a context-free language.

Assume, for the sake of contradiction, that L = {aⁿbⁿcⁿ | n > 0} is a context-free language. By the pumping lemma, there exists a pumping length p for L. We need a string s ∈ L whose length is at least p. Certainly s = a^p b^p c^p is longer than p, so we choose it. This s is in L since it has p a's, p b's and p c's, in that order.

Now by the pumping lemma, |vwx| ≤ p. There are five possible places in the string where vwx can lie:

- vwx = a^j for some j ≤ p. This means that vwx is contained purely in the a’s section.
- vwx = a^jb^k for some j and k where j+k ≤ p. This means that the vwx segment is contained somewhere in the a’s and b’s section.
- vwx = b^j for some j ≤ p. This means that vwx is contained purely in the b’s section.
- vwx = b^jc^k for some j and k where j+k ≤ p. This means that the vwx segment is contained somewhere in the b’s and c’s section.
- vwx = c^j for some j ≤ p. This means that vwx is contained purely in the c’s section.

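In each of these five cases, the third constraint of the pumping lemma, that uvⁿwxⁿy ∈ L ∀ n ≥ 0, does not hold. Because |vwx| ≤ p, the segment vwx never contains all three letters, and because |vx| ≥ 1, pumping changes the count of at least one letter while leaving the count of at least one other letter unchanged. So for any of these five choices of vwx, pumping s produces a string that does not have an equal number of a’s, b’s and c’s, and is therefore not in L. This contradicts the pumping lemma, so L is not context-free.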
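The case analysis can be double-checked by brute force for small values of p. The following sketch tries every decomposition s = uvwxy with |vwx| ≤ p and |vx| ≥ 1 and tests the pumped strings for n = 0, 1, 2, 3. The function names, the range of p and the range of n are assumptions of this example. Checking a few values of n is enough to refute a decomposition, but a decomposition that survives them is not thereby proven to pump for all n.

```python
def in_anbncn(w):
    n = len(w) // 3
    return n > 0 and w == "a" * n + "b" * n + "c" * n


def in_anbn(w):
    n = len(w) // 2
    return w == "a" * n + "b" * n


def survives_pumping(s, p, member, max_n=3):
    """True if some split s = uvwxy with |vwx| <= p and |vx| >= 1
    keeps u v^n w x^n y in the language for every n in 0..max_n."""
    for i in range(len(s) + 1):                 # u = s[:i]
        limit = min(i + p, len(s))              # vwx must end by here
        for j in range(i, limit + 1):           # v = s[i:j]
            for k in range(j, limit + 1):       # w = s[j:k]
                for m in range(k, limit + 1):   # x = s[k:m]
                    u, v, w, x, y = s[:i], s[i:j], s[j:k], s[k:m], s[m:]
                    if not v and not x:
                        continue
                    if all(member(u + v * n + w + x * n + y)
                           for n in range(max_n + 1)):
                        return True
    return False


for p in range(1, 5):
    s = "a" * p + "b" * p + "c" * p
    print(p, survives_pumping(s, p, in_anbncn))   # False for every p tried

print(survives_pumping("aabb", 2, in_anbn))       # True
```

The last line is a sanity check on a language that is context-free: for aⁿbⁿ the split u = a, v = a, w = ε, x = b, y = b pumps correctly.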

Context Free Languages

ComputeNow — Thu, 06 Sep 2018 19:18:28 +0000

Context-free languages (CFLs) are generated by context-free grammars. The set of all context-free languages is identical to the set of languages accepted by pushdown automata, and the set of regular languages is a subset of context-free languages.

An input string is accepted by a computational model if the model, run on that string, ends in an accepting final state. All regular languages are context-free languages, but not all context-free languages are regular. Most arithmetic expressions are generated by context-free grammars and are therefore context-free languages.

Context-free languages and context-free grammars have applications in computer science and linguistics such as natural language processing and computer language design.

Context Free Languages Definition

In formal language theory, a language is defined as a set of strings of symbols that may be constrained by specific rules. Similarly, the written English language is made up of groups of letters (words) separated by spaces. A valid (accepted) sentence in the language must follow particular rules, the grammar.

A context-free language is a language generated by a context-free grammar. Context-free languages are more general than (and include) the regular languages. The same context-free language might be generated by multiple context-free grammars.

The set of all context-free languages is identical to the set of languages that are accepted by pushdown automata (PDA).

Here is an example of a language that is not regular but is context-free:

{ aⁿbⁿ | n ≥ 0}. This is the language of all strings that consist of some number of a’s followed by the same number of b’s.

In this notation, a⁴b⁴ can be expanded out to aaaabbbb, where there are four a’s and then four b’s. (So this isn’t exponentiation, though the notation is similar.)

Closure Properties

Context-free languages have the following closure properties. A set is closed under an operation if applying the operation to members of the set always produces a member of the same set. This means that if one of these closed operations is applied to context-free languages, the result will also be a context-free language.

Union: Context-free languages are closed under the union operation. This means that if L and P are both context-free languages, then L ∪ P is also a context-free language.

Proof:

Here is a proof that context-free languages are closed under union.

Let L and P be generated by the context-free grammars, G_L = (V_L, Σ_L, R_L, S_L) and G_P = (V_P, Σ_P, R_P, S_P), respectively.

Without loss of generality, subscript each nonterminal symbol in G_L with an L, and each nonterminal of G_P with a P such that V_L ∩ V_P = ∅.

Define the CFG, G, that generates L ∪ P as follows: G=(V_L ∪ V_P ∪ {S}, Σ_L ∪ Σ_P, R_L ∪ R_P ∪ {S -> S_L|S_P}, S).

Concatenation: If L and P are both context-free languages, then LP is also context-free. The concatenation of two languages S₁ and S₂ is defined as follows: S₁S₂ = {vw : v ∈ S₁ and w ∈ S₂}.

Proof:

Here is a proof that context-free languages are closed under concatenation.

Let L and P be generated by the context-free grammars, G_L = (V_L, Σ_L, R_L, S_L) and G_P = (V_P, Σ_P, R_P, S_P), respectively.

Without loss of generality, subscript each nonterminal symbol in G_L with an L, and each nonterminal of G_P with a P such that V_L ∩ V_P = ∅.

Define the CFG, G, that generates LP as follows: G=(V_L ∪ V_P ∪ {S}, Σ_L ∪ Σ_P, R_L ∪ R_P ∪ {S -> S_L S_P}, S).

Every word that G generates is a word in L followed by a word in P, which is the definition of concatenation.
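Both constructions only add one new start rule, which is easy to see in code. The sketch below represents a grammar as a tuple (variables, terminals, rules, start), mirroring (V, Σ, R, S). The two example grammars, the function names, and the bounded enumerator language are assumptions of this example. The enumerator relies on the example grammars having no ε-productions, so a sentential form never yields a shorter string.

```python
def union(g1, g2, new_start="S"):
    (v1, t1, r1, s1), (v2, t2, r2, s2) = g1, g2
    assert not (v1 & v2) and new_start not in v1 | v2
    rules = {**r1, **r2, new_start: [(s1,), (s2,)]}        # S -> S_L | S_P
    return v1 | v2 | {new_start}, t1 | t2, rules, new_start


def concat(g1, g2, new_start="S"):
    (v1, t1, r1, s1), (v2, t2, r2, s2) = g1, g2
    assert not (v1 & v2) and new_start not in v1 | v2
    rules = {**r1, **r2, new_start: [(s1, s2)]}            # S -> S_L S_P
    return v1 | v2 | {new_start}, t1 | t2, rules, new_start


def language(grammar, max_len):
    """Strings of length <= max_len, assuming the grammar has no ε-rules."""
    variables, _, rules, start = grammar
    found, seen, stack = set(), {(start,)}, [(start,)]
    while stack:
        form = stack.pop()
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:                       # only terminals left
            found.add("".join(form))
            continue
        for rhs in rules[form[idx]]:          # expand the leftmost variable
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return found


# G_L generates a^n b^n (n >= 1); G_P generates c^n (n >= 1).
G_L = ({"S_L"}, {"a", "b"}, {"S_L": [("a", "S_L", "b"), ("a", "b")]}, "S_L")
G_P = ({"S_P"}, {"c"}, {"S_P": [("c", "S_P"), ("c",)]}, "S_P")

print(sorted(language(union(G_L, G_P), 4)))   # ['aabb', 'ab', 'c', 'cc', 'ccc', 'cccc']
print(sorted(language(concat(G_L, G_P), 4)))  # ['abc', 'abcc']
```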

Kleene Star: If L is a context-free language, then L∗ is also context-free. The Kleene star repeats the strings of the language it is attached to any number of times (including zero times), so it amounts to concatenating a language with itself repeatedly. For example, {a,b}∗ = {ε, a, b, ab, aab, aaab, abb, …}. Given a grammar for L with start symbol S_L, adding a new start symbol S with the rules S -> S_L S | ε gives a grammar for L∗.

Context-free languages are not closed under complement or intersection.

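If CFLs were closed under intersection, then there would be CFLs that violate the pumping lemma for context-free languages, which cannot be. For example, {aⁿbⁿc^m | n, m ≥ 0} and {a^m bⁿcⁿ | n, m ≥ 0} are both context-free, but their intersection is {aⁿbⁿcⁿ | n ≥ 0}, which is not. Since CFLs are closed under union, closure under complement would imply closure under intersection by De Morgan’s laws, so they are not closed under complement either.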


Regular Expressions – (Regex) – Regular Expression

ComputeNow — Sun, 02 Sep 2018 04:35:46 +0000

“Regular expression” was originally a term from automata theory in theoretical computer science. Broadly, it refers to a pattern against which a string (or sub-string) is matched.

Regular expressions are useful wherever text has to be searched or checked. It should not be surprising that many programming languages, text processing tools, data validation tools and search engines make extensive use of them.

The key idea is that a regular expression is a pattern which matches a set of target strings.

\w+@\w+\.(com|org|net|in) is a regex that matches most email addresses that end with .com, .net, .org or .in.

Regular Expressions Concepts

There are many forms of regex syntax, varying with the language. Here, we will be examining Perl regex syntax, since most other regex flavours are variations on it.

Before we dive into the syntax, these are the kinds of things that the patterns consist of:

Literals: They are the simplest things to match. When they are there, we just match them. It could be like an a or a 1.
Meta characters: They do not mean what they look like. They usually refer to something else. For example, \d could refer to any digit.
Vertical Bar: The | is a symbol of boolean OR. It gives an option to match any of the things it delimits.
Quantifiers: They specify how many times the pattern they apply to needs to be matched.
Grouping and Capturing: Parentheses can be used to group parts of the regex or to capture parts for later use.

Regular Expression Syntax

Let’s look at what the meta characters do in a little more detail.

Meta character	Description
`^`	Start of a string
`$`	End of a string
`\t`	Tab
`\n`	Newline
`\r`	Carriage Return
`\s`	Any whitespace character
`\S`	Any non-whitespace character
`\d`	Any Digit
`\D`	Any non-digit
`\w`	Any word-character
`\W`	Any non-word character
`\b`	Any word boundary
`\B`	Any non-word-boundary
`.`	Any single character, usually barring a newline

By the way, if you want to match a metacharacter literally, you need to use \ to escape it. For example, \. would just match the . character.

Now, let us look at the constructs that give more flexibility.

Expression	Meaning
`[abc]`	Matches any of `a`,`b`, or `c`
`[^abc]`	Matches anything other than `a`, `b`, or `c`
`[a-d]`	Matches any of the characters in the range `a-d`
`a*`	Matches `a` zero or more times
`a?`	Matches `a` zero or one time
`a+`	Matches `a` one or more times
`a|b`	Matches either `a` or `b`
`a{3}`	Matches exactly 3 of `a`
`a{3,}`	Matches 3 or more of `a`
`a{3,5}`	Matches 3, 4 or 5 of `a` (inclusive range)
`( )`	Captures everything inside the parentheses
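The rows of this table can be tried out directly. The sketch below uses Python's re module, whose syntax follows the Perl style for all the constructs listed here. The test strings and the date-like capture example are made up for illustration. re.fullmatch is used so that the whole string has to match the pattern.

```python
import re

# (pattern, test string, whether the whole string should match)
checks = [
    (r"[abc]", "b", True),
    (r"[^abc]", "b", False),
    (r"[a-d]", "c", True),
    (r"a*", "", True),
    (r"a?", "aa", False),
    (r"a+", "", False),
    (r"a|b", "b", True),
    (r"a{3}", "aaaa", False),
    (r"a{3,}", "aaaa", True),
    (r"a{3,5}", "aaaaaa", False),
]

results = [bool(re.fullmatch(p, text)) == expected for p, text, expected in checks]
print(all(results))  # True

# Parentheses capture the parts they enclose for later use.
captured = re.fullmatch(r"(\d+)-(\d+)", "2018-09").groups()
print(captured)  # ('2018', '09')
```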

Example:

We are now ready to explain why \w+@\w+\.(com|org|net|in) does what it claims.

Firstly, what should an email look like? That's right, it should have a structure like user@domain.extension.

The user and domain consist of letters, digits or underscores, with at least one character each. So, we use \w+.

We restrict the extension to org, com, net or in by using the |.
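Here is the same pattern in use, as a minimal sketch with Python's re module. The helper name looks_like_email and the sample addresses are made up for this example. It also shows why the pattern only covers most addresses: a dot in the user part is not a word character, so such an address is rejected.

```python
import re

EMAIL = re.compile(r"\w+@\w+\.(com|org|net|in)")


def looks_like_email(text):
    # fullmatch requires the whole string to fit the pattern;
    # search would also accept a match buried inside a longer string.
    return EMAIL.fullmatch(text) is not None


print(looks_like_email("user@example.com"))        # True
print(looks_like_email("user@example.io"))         # False
print(looks_like_email("first.last@example.com"))  # False
print(looks_like_email("user@example.info"))       # False
```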

Context Free Grammars

ComputeNow — Thu, 23 Aug 2018 18:21:25 +0000

Context free grammars (CFGs) are used to describe context-free languages. A context-free grammar is a set of recursive rules used to generate patterns of strings. Context-free grammars can describe all regular languages and more, but they cannot describe all possible languages.

Context-free grammars are studied in fields of theoretical computer science, compiler design, and linguistics. CFGs are used to describe programming languages, and parser programs in compilers can be generated automatically from context-free grammars.

[Figure: two parse trees for the string “x + y * z”. Source: Context-free grammar Wikipedia page.]

Context Free Grammars:

Context-free grammars can generate context-free languages. They do this by taking a set of variables which are defined recursively, in terms of one another, by a set of production rules. Context-free grammars are named as such because any of the production rules in the grammar can be applied regardless of context—it does not depend on any other symbols that may or may not be around a given symbol that is having a rule applied to it.

Context-free grammars have the following components:

A set of terminal symbols which are the characters that appear in the language/strings generated by the grammar. Terminal symbols never appear on the left-hand side of the production rule and are always on the right-hand side.

A set of nonterminal symbols (or variables) which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols. These are the symbols that will always appear on the left-hand side of the production rules, though they can be included on the right-hand side. The strings that a CFG produces will contain only symbols from the set of terminal symbols.

A set of production rules which are the rules for replacing nonterminal symbols. Production rules have the following form: variable → string of variables and terminals.

A start symbol which is a special nonterminal symbol that appears in the initial string generated by the grammar.

For comparison, a context-sensitive grammar can have production rules where both the left-hand and right-hand sides may be surrounded by a context of terminal and nonterminal symbols.

To create a string from a context-free grammar, follow these steps:

- Begin the string with a start symbol.

- Apply one of the production rules whose left-hand side is the start symbol, replacing the start symbol with the right-hand side of the production.

- Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right-hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols. Note, it could be that not all production rules are used.

Formal Definition

A context-free grammar can be described by a four-element tuple (V, Σ, R, S) , where

V is a finite set of variables (which are non-terminal)
Σ is a finite set (disjoint from V) of terminal symbols
R is a set of production rules where each production rule maps a variable to a string s ∈ (V ∪ Σ) *
S (which is in V ) which is a start symbol.

Example:
Come up with a grammar that will generate the context-free (but not regular) language that contains all strings with matched parentheses.

There are many grammars that can do this task. This solution is one way to do it, and it should give you a good idea of whether your (possibly different) solution works too.

Start symbol: S
Terminal symbols = {(, )}
Non-terminal variables = {S}
Production rules:

- S -> ( )

- S -> SS

- S -> (S).

A way to condense production rules is as follows:

We can take

S->()
S->SS
S->(S)

and translate them into a single line: S -> ( ) | SS | (S). Adding a further alternative ε, where ε is the empty string, would let the grammar generate the empty string as well.
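The three rules S -> ( ), S -> SS and S -> (S) translate almost word for word into a recursive membership test. This is a minimal sketch; the function name derives is an assumption of this example, and the memoised recursion is chosen for brevity, not efficiency.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def derives(s):
    """True if S =>* s for the grammar S -> ( ) | SS | (S)."""
    if s == "()":                                    # S -> ( )
        return True
    if len(s) >= 4 and s[0] == "(" and s[-1] == ")" and derives(s[1:-1]):
        return True                                  # S -> (S)
    # S -> SS: try every split into two non-empty, even-length parts.
    return any(derives(s[:i]) and derives(s[i:])
               for i in range(2, len(s) - 1, 2))


print(derives("(()())"))  # True
print(derives("(()"))     # False
```

Because none of the three rules produces the empty string, derives("") is False.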

Context-free grammars can be modeled as parse trees. The nodes of the tree represent the symbols and the edges represent the use of production rules. The leaves of the tree are the end result (terminal symbols) that make up the string the grammar is generating with that particular sequence of symbols and production rules.

The parse trees below represent two ways to generate the string “a + a – a” with the grammar

Example of an ambiguous grammar—one that can have multiple ways of generating the same string

Because this grammar admits multiple parse trees for the same string, it is said to be ambiguous.

Relationship with other Computation Models

Context-free languages are recognized by pushdown automata, just as regular languages are recognized by finite state machines. Since all regular languages can be generated by CFGs, all regular languages can also be recognized by pushdown automata.

Any language that can be described by a regular expression can be generated by a context-free grammar.

The way to do this is to take the regular language, determine its finite state machine and write production rules that follow the transition function.
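As a sketch of that recipe: each state of a DFA becomes a variable, each transition δ(q, a) = r becomes a rule q -> a r, and each accepting state q gets a rule q -> ε. The example DFA (binary strings ending in 0, with states named E and Z) and the function names are assumptions of this example.

```python
def dfa_to_grammar(states, alphabet, delta, accepting):
    """Rules as (variable, right-hand side); the empty tuple stands for ε."""
    rules = [(q, (a, delta[(q, a)]))
             for q in sorted(states) for a in sorted(alphabet)]
    rules += [(q, ()) for q in sorted(accepting)]
    return rules


def generates(rules, start, word):
    """Follow the right-linear rules symbol by symbol."""
    current = {start}
    for a in word:
        current = {rhs[1] for v, rhs in rules
                   if v in current and rhs and rhs[0] == a}
    return any(v in current and rhs == () for v, rhs in rules)


# DFA for binary strings ending in 0: E is the start state, Z is accepting.
delta = {("E", "0"): "Z", ("E", "1"): "E", ("Z", "0"): "Z", ("Z", "1"): "E"}
rules = dfa_to_grammar({"E", "Z"}, {"0", "1"}, delta, {"Z"})

for variable, rhs in rules:
    print(variable, "->", " ".join(rhs) or "ε")
print(generates(rules, "E", "10"))  # True
```

The start symbol of the grammar is the start state of the DFA, here E.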


Translating Between Context-Free Grammars and Pushdown Automata

ComputeNow — Sun, 19 Aug 2018 10:29:46 +0000

Context-free Grammar to Pushdown Automata

Each derivation or sequence of production rules that results in a given string is made up of intermediate strings (which are made at each step of the derivation).

The pushdown automaton’s nondeterminism helps it to guess the sequence of steps in the derivation that will result in the desired string. So at each step in the derivation, one of the production rules for a given variable is selected nondeterministically and substituted in for the variable.

The pushdown automaton begins by pushing a symbol onto the stack and then goes through the series of intermediate strings until it arrives at a string that contains only terminal symbols (this will happen if the string is actually in the language, otherwise it will reject).

Here’s what to do

Push the bottom-of-stack marker $ onto the stack, followed by the grammar’s start symbol.

Then the following steps are repeated until the automaton finishes:

If there is a variable X on top of the stack, nondeterministically pick one of the production rules for X and replace X with the string on the right-hand side of the production rule.
If there is a terminal a on top of the stack, read the next symbol from the input and compare it to a. If they are the same, pop a and repeat; if they are not, reject on this branch of the nondeterminism.
If it is the end of the input and the top of the stack is the marker $, then accept.

Can you come up with a diagram and formal description of a pushdown automaton that reads strings containing only parentheses and accepts exactly those that have matched parentheses?

Σ = {(,)}

Γ = {$, X} (note: X could be any symbol you want)

Q = { A, B, C, D }

F = {D}

q0 = A

Z = $

Each transition is written as (state, input symbol, symbol popped, next state, symbol pushed):

δ = {(A, ε, ε, B, $), (B, (, ε, B, X), (B, ), X, C, ε), (C, (, ε, B, X), (C, ), X, C, ε), (C, ε, $, D, ε)}
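A pushdown automaton for matched parentheses can be simulated with a breadth-first search over configurations (state, input position, stack). The sketch below is one possible machine using the states A, B, C, D. Its transition table is an assumption of this example, written in the order (state, input symbol, symbol popped, next state, symbol pushed) with the empty string standing for ε. It pushes the marker $ once, pushes an X for every ( and pops an X for every ), and it accepts only non-empty strings.

```python
from collections import deque

DELTA = [
    ("A", "", "", "B", "$"),     # push the bottom-of-stack marker
    ("B", "(", "", "B", "X"),    # remember an open parenthesis
    ("B", ")", "X", "C", ""),    # match a closing parenthesis
    ("C", "(", "", "B", "X"),
    ("C", ")", "X", "C", ""),
    ("C", "", "$", "D", ""),     # everything matched: go to the accept state
]


def accepts(word, delta=DELTA, start="A", finals=frozenset({"D"})):
    first = (start, 0, "")       # the stack is a string, top at the end
    seen, queue = {first}, deque([first])
    while queue:
        state, pos, stack = queue.popleft()
        if state in finals and pos == len(word):
            return True
        for p, a, pop, q, push in delta:
            if p != state:
                continue
            if a and (pos >= len(word) or word[pos] != a):
                continue
            if pop and not stack.endswith(pop):
                continue
            following = (q, pos + len(a), stack[:len(stack) - len(pop)] + push)
            if following not in seen:
                seen.add(following)
                queue.append(following)
    return False


print(accepts("(()())"))  # True
print(accepts("())"))     # False
```

The search terminates here because no ε-transition can be repeated to grow the stack indefinitely; a table with such a loop would need an extra bound.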
