Use of NFAs for Closure Properties of Regular Languages

The set of regular languages is closed under the following operations:

Complement
Intersection
Union
Concatenation
Star (Kleene Closure)

L1 = { w ∊ {0,1}^* | w contains the substring "1101" }
L2 = { 0ⁿ1^m | n >= 0 and m >= 0}
L3 = { aa, ab, ba, bb }
L4 = { a, aa, aaa}
∅ (the empty set; not the empty string)

What is the complement of L1?

What is L3 ∩ L4?

What is L3 o L4 (concatenation)?

What is L3^*?

What is ∅^*?
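
For finite languages such as L3 and L4, these operations can be tried out directly with Python sets. The following is only a sketch: the helper names `concat` and `star_up_to` are made up for illustration, and because A^* is usually infinite, `star_up_to` only builds the strings made from at most k pieces.

```python
def concat(A, B):
    """Concatenation A o B = { xy | x in A and y in B }."""
    return {x + y for x in A for y in B}

def star_up_to(A, k):
    """Strings of A^* that are built from at most k pieces of A."""
    result = {""}            # zero pieces: the empty string is always in A^*
    layer = {""}
    for _ in range(k):
        layer = concat(layer, A)
        result |= layer
    return result

L3 = {"aa", "ab", "ba", "bb"}
L4 = {"a", "aa", "aaa"}

by_length = lambda w: (len(w), w)
print(sorted(L3 & L4))                        # intersection
print(sorted(concat(L3, L4), key=by_length))  # concatenation
print(sorted(star_up_to(L3, 2), key=by_length))
print(star_up_to(set(), 5))                   # star of the empty language
```

Running it and comparing with answers worked out by hand is a quick check on the questions above.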

How does one prove that regular languages are closed under each of these operations?

NFAs are useful to show regular languages are closed under the last three operations (union, concatenation, star).

Union

An NFA to recognize L₁ ∪ L₂

s₁ is the old start state of L₁.

s₂ is the old start state of L₂.

f₁, f₂ are old final states of L₁.

f is the old final state of L₂.

Concatenation

An NFA to recognize L₁ o L₂.

s₁ is the old start state of L₁.

s₂ is the old start state of L₂.

f₁, f₂ are old final states of L₁.

f is the old final state of L₂.

Star (Kleene Closure)

An NFA to recognize L^*

s₁ is the old start state of L.

f₁, f₂ are old final states of L.
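
The three constructions can be sketched in code. This is a minimal sketch assuming the usual textbook constructions (a new start state with ε arrows for union and star, ε arrows from old final states for concatenation and star); the figures themselves are not reproduced here, and the names `NFA`, `tagged`, `add_eps`, `symbol` and `accepts` are hypothetical. The empty string "" stands for ε.

```python
from collections import namedtuple

# delta maps (state, symbol) -> set of next states; the symbol "" means ε
NFA = namedtuple("NFA", "delta start accepts")

def tagged(nfa, tag):
    """Rename every state q to (tag, q) so two machines never share state names."""
    delta = {((tag, q), a): {(tag, r) for r in rs} for (q, a), rs in nfa.delta.items()}
    return NFA(delta, (tag, nfa.start), {(tag, f) for f in nfa.accepts})

def add_eps(delta, src, dst):
    delta.setdefault((src, ""), set()).add(dst)

def union(n1, n2):
    a, b = tagged(n1, 1), tagged(n2, 2)
    delta = {**a.delta, **b.delta}
    add_eps(delta, "s", a.start)          # new start state "s" branches to s1 and s2
    add_eps(delta, "s", b.start)
    return NFA(delta, "s", a.accepts | b.accepts)

def concat(n1, n2):
    a, b = tagged(n1, 1), tagged(n2, 2)
    delta = {**a.delta, **b.delta}
    for f in a.accepts:                   # old final states of the first machine
        add_eps(delta, f, b.start)        # jump to the start of the second
    return NFA(delta, a.start, b.accepts)

def star(n):
    a = tagged(n, 1)
    delta = dict(a.delta)
    add_eps(delta, "s", a.start)          # new accepting start state (accepts ε)
    for f in a.accepts:
        add_eps(delta, f, a.start)        # loop back for another piece
    return NFA(delta, "s", a.accepts | {"s"})

def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in nfa.delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, w):
    current = eps_closure(nfa, {nfa.start})
    for ch in w:
        moved = set()
        for q in current:
            moved |= nfa.delta.get((q, ch), set())
        current = eps_closure(nfa, moved)
    return bool(current & nfa.accepts)

def symbol(a):
    """Two-state NFA for the language { a }."""
    return NFA({(0, a): {1}}, 0, {1})

# L2 = { 0^n 1^m | n >= 0 and m >= 0 } as star(0) o star(1)
l2 = concat(star(symbol("0")), star(symbol("1")))
print(accepts(l2, "0011"), accepts(l2, "10"))
```

Tagging the states before combining two machines is one simple way to keep their state sets disjoint, which the constructions assume.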

Equivalences

NFAs
DFAs
regular expressions
regular grammars

Regular Expressions (1.3)

A regular expression over an alphabet ∑ represents a set of strings over the alphabet; that is, a formal language over the alphabet.

So a regular expression describes a formal language.

In the notation for regular expressions, an element a ∈ ∑ is also used to represent the regular expression that denotes the set of strings consisting of 'a' alone: { a }.

Formal Definition

The formal definition of a regular expression over an alphabet ∑ is (page 64):

R is a regular expression if R is

a for some a in the alphabet ∑,
ε,
∅,
(R1 ∪ R2), where R1 and R2 are regular expressions,
(R1 o R2), where R1 and R2 are regular expressions, or
(R1^*), where R1 is a regular expression

∅ represents the empty language

ε represents the language { ε }

a ∈ ∑ represents the language { a }

Operations and Examples

The operations on regular expressions are

union
concatenation
star (closure)

∑ = {0,1}

Union: 0 ∪ 1 or 0 | 1 represents the set {0} ∪ {1} = {0,1}
Concatenation: (0 ∪ 1)1 represents the set {01, 11}
Star: (01|11)* represents the set {ε, 01, 11, 0111, 1101, ... }
∅^* represents the set { ε }
(0 ∪ 1)^*1101(0 ∪ 1)^* What language does this describe?
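
These examples can be tried with Python's `re` module. This is only an approximation of the formal notation: Python writes union as `|`, has no symbol for ∅, and offers many extensions that go beyond regular expressions in the sense of these notes. `fullmatch` is used so that the whole string must be described by the expression.

```python
import re

# (0 ∪ 1)^*1101(0 ∪ 1)^* written in Python syntax
contains_1101 = re.compile(r"(0|1)*1101(0|1)*")

for w in ["1101", "0011011", "0101", ""]:
    print(repr(w), bool(contains_1101.fullmatch(w)))

# (01|11)* from the Star example
pairs = re.compile(r"(01|11)*")
for w in ["", "0111", "1101", "011"]:
    print(repr(w), bool(pairs.fullmatch(w)))
```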

Theorem

A language is regular if and only if some regular expression describes it.

The proof requires two parts.

First Part: If a language is regular, then it is described by some regular expression.
Second Part: If a language is described by some regular expression, then it is a regular language.

GNFAs

A Generalized Nondeterministic Finite Automaton (GNFA) is similar to an NFA, but each transition is labeled with a regular expression over the alphabet instead of a single alphabet element or ε.

A GNFA for strings over {0,1} that contain the substring "1101".

The idea is that in state q₀ the transition to state q₁ can be taken if the next input matches the regular expression 1101.

The formal definition is given (on page 73) by:

A generalized nondeterministic finite automaton is a 5-tuple, (Q,∑, δ, q_start, q_accept), where

Q is the finite set of states,
∑ is the input alphabet
δ : (Q - {q_accept}) x (Q - {q_start}) → R is the transition function, where R is the set of all regular expressions over ∑
q_start is the start state
q_accept is the accept state

Note that there is only one accept state. However, this is no real restriction for a nondeterministic automaton. (Why?)

On the other hand, the transition function is defined on different arguments than for an ordinary NFA.

GNFA Transition Function Example

The transition function, δ, for the example GNFA:

The example GNFA

	q₁	q₂	q₃
q₀	ε	∅	∅
q₁	(0 | 1)	1101	∅
q₂	∅	(0 | 1)	ε

Language of a GNFA

The definition of the language of a GNFA is technically different than that of an NFA because the transition function is defined differently. However, the idea is really similar, but extended to allow regular expressions on the transitions.

The formal definition is given by (page 73):

A GNFA accepts a string w in ∑^* if w = w₁w₂...w_k, where each w_i is in ∑^* and a sequence of states q₀, q₁, ..., q_k exists such that

q₀ is the start state
q_k is the accept state
for each i, w_i ∊ L(R_i) where R_i = δ(q_{i-1}, q_i); that is, where R_i is the expression on the arrow from q_{i-1} to q_i.

Example String Accepted by GNFA

The string 0011011 is accepted by the example GNFA.

If 0011011 is partitioned into blocks w₁=ε w₂=0 w₃=0 w₄=1101 w₅=1 w₆=ε; that is,

0011011 = ε 0 0 1101 1 ε

then

	State	Input	Next
w₁=ε ∊ L(δ(q₀, q₁))	q₀	w₁	q₁
w₂=0 ∊ L(δ(q₁, q₁))	q₁	w₂	q₁
w₃=0 ∊ L(δ(q₁, q₁))	q₁	w₃	q₁
w₄=1101 ∊ L(δ(q₁, q₂))	q₁	w₄	q₂
w₅=1 ∊ L(δ(q₂, q₂))	q₂	w₅	q₂
w₆=ε ∊ L(δ(q₂, q₃))	q₂	w₆	q₃

For example, w₁=ε ∊ L(δ(q₀, q₁)) means that in state q₀ with input block w₁, the next state is q₁.
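
The acceptance definition can be checked mechanically for the example GNFA. The sketch below is one possible way to do it, not a standard algorithm from the text: it searches over pairs (state, number of characters consumed) and uses Python's `re.fullmatch` to test whether a block w_i belongs to L(R_i). The function name `gnfa_accepts` is hypothetical; ε is written as the empty pattern "" and a missing entry stands for ∅.

```python
import re

# Labels of the example GNFA; a pair that is absent has label ∅ (no arrow)
delta = {
    ("q0", "q1"): "",        # ε
    ("q1", "q1"): "(0|1)",
    ("q1", "q2"): "1101",
    ("q2", "q2"): "(0|1)",
    ("q2", "q3"): "",        # ε
}

def gnfa_accepts(delta, start, accept, w):
    """True if w can be cut into blocks that label a path from start to accept."""
    seen = {(start, 0)}
    frontier = [(start, 0)]
    while frontier:
        q, i = frontier.pop()
        if q == accept and i == len(w):
            return True
        for (src, dst), label in delta.items():
            if src != q:
                continue
            for j in range(i, len(w) + 1):      # try every block w[i:j]
                if re.fullmatch(label, w[i:j]) and (dst, j) not in seen:
                    seen.add((dst, j))
                    frontier.append((dst, j))
    return False

print(gnfa_accepts(delta, "q0", "q3", "0011011"))
print(gnfa_accepts(delta, "q0", "q3", "0101"))
```

Recording visited (state, position) pairs keeps the search finite even though ε-labeled arrows consume no input.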

Converting a DFA to a GNFA

A DFA can be converted to an equivalent GNFA by

1. Adding a new start state with an ε transition to the old start state.
2. Adding a new accepting state and adding ε transitions from all old accepting states to the new one.
3. Defining the transition function δ^' for the GNFA in terms of the transition function δ for the DFA by
   δ^'(q_i, q_j) = a if and only if δ(q_i, a) = q_j
   (if several symbols lead from q_i to q_j, the label is their union; if none does, the label is ∅).
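
The three steps can be sketched as a small function. This is an illustration under stated assumptions: the DFA is given as a dictionary (state, symbol) -> state, the names `dfa_to_gnfa`, `q_start` and `q_accept` are hypothetical and assumed not to clash with DFA state names, ε is written as the empty string, and a missing pair stands for ∅. The example DFA (strings of even length over {0,1}) is made up for the demonstration.

```python
def dfa_to_gnfa(delta, start, accepts):
    """Return (labels, new_start, new_accept) for the GNFA built from a DFA."""
    new_start, new_accept = "q_start", "q_accept"
    labels = {(new_start, start): ""}            # step 1: ε into the old start state
    for f in accepts:
        labels[(f, new_accept)] = ""             # step 2: ε out of each old accepting state
    for (q, a), r in delta.items():              # step 3: relabel the DFA arrows
        if (q, r) in labels:
            labels[(q, r)] += "|" + a            # several symbols on one arrow: union
        else:
            labels[(q, r)] = a
    return labels, new_start, new_accept

# Hypothetical DFA: state E after an even number of symbols, O after an odd number
dfa_delta = {("E", "0"): "O", ("E", "1"): "O", ("O", "0"): "E", ("O", "1"): "E"}
labels, s, t = dfa_to_gnfa(dfa_delta, "E", {"E"})
for arrow, label in labels.items():
    print(arrow, repr(label))
```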

Example conversion of DFA to GNFA

Steps 1 and 2 require adding a new start state and a new accepting state. Step 3 requires no changes in the diagram except labeling the transitions out of the new start state and into the new accepting state.

DFA to GNFA

Eliminating a state in a GNFA

Any state except the start state and the accept state of a GNFA can be eliminated to obtain a new equivalent GNFA with one fewer state.

For example, to eliminate state q₁:

Replace each existing path q to q₁ to q^' by a path from q to q^'.
Label the new path with a regular expression that describes the strings that would cause a transition from state q to q₁ to q^'.

There are 3 arrows coming into state q₁ from other states: q₀, q₂, and q₄.

If state q₁ is removed, paths must be replaced by new transitions:

q₀ to q₂
q₂ to q₂
q₄ to q₂

Determining the Regular Expression Labels

Eliminating q₁, what regular expressions are needed?

GNFA Regular Expression Labels

Suppose q_a is to be removed and

the path being considered goes from q_i to q_a to q_j,
the transition from q_i to q_a is labeled by the regular expression R_ia,
the self-loop from q_a to q_a is labeled by the regular expression R_a,
the transition from q_a to q_j is labeled by the regular expression R_aj, and
the transition from q_i to q_j is labeled by the regular expression R_ij (note this may be the regular expression ∅).

Then the path from q_i to q_j should be labeled

    (R_ia)(R_a)^*(R_aj) ∪ (R_ij)
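
A single elimination step can be sketched in code. The following assumes labels are stored as strings in Python `re` syntax (union is `|`, ε is the empty string, and a missing pair stands for ∅); the function name `eliminate` and the small three-state machine are hypothetical.

```python
import re

def eliminate(labels, states, q_a):
    """Remove state q_a and relabel every remaining arrow q_i -> q_j."""
    loop = labels.get((q_a, q_a))
    # If there is no self-loop (∅) or it is ε, the starred factor contributes only ε
    star = f"({loop})*" if loop else ""
    new_labels = {}
    for q_i in states:
        for q_j in states:
            if q_a in (q_i, q_j):
                continue
            r_in = labels.get((q_i, q_a))
            r_out = labels.get((q_a, q_j))
            direct = labels.get((q_i, q_j))
            via = None
            if r_in is not None and r_out is not None:
                via = f"({r_in}){star}({r_out})"     # path through q_a
            parts = [p for p in (via, direct) if p is not None]
            if parts:                                # otherwise the label stays ∅
                new_labels[(q_i, q_j)] = "|".join(parts)
    return new_labels

# Hypothetical GNFA: s --1--> a, a --0--> a, a --1--> t, s --0--> t
labels = {("s", "a"): "1", ("a", "a"): "0", ("a", "t"): "1", ("s", "t"): "0"}
result = eliminate(labels, ["s", "a", "t"], "a")
print(result)
print(bool(re.fullmatch(result[("s", "t")], "1001")))
```

Wrapping each factor in parentheses keeps the precedence right when a label is itself a union.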

Result

Original DFA (converted to a GNFA):

After eliminating state q₁:

Now the Theorem Again

A language is regular if and only if some regular expression describes it.

The proof requires two parts.

First Part: If a language is regular, then it is described by some regular expression.
Second Part: If a language is described by some regular expression, then it is a regular language.

DFA to Regular Expression

To show the first part, if we are given a DFA, we need to show that there is a regular expression that describes exactly the language of the DFA.

The construction of the regular expression begins with the DFA and each step will eliminate one state of the DFA until the only state(s) remaining are the start state and a final state.

Example

Exercise 1.21

Convert this DFA to a regular expression that describes the same language.

Regular Expression to NFA

To show that for any regular expression there is an NFA that recognizes the same language described by the regular expression, the proof describes a procedure for constructing the NFA from the regular expression.

Parse the regular expression into its parts based on the regular expression operator precedence and parentheses used if any to determine the operands of each operator.
Star has the highest precedence, then concatenation; union has the lowest.
For each of the operators use the construction described in showing the closure properties of regular languages to construct an NFA for each operator and its operands.

See Lemma 1.55 in the text.

Example regular expression

   a(ba)*b ∪ bab

1. Construct the NFA for (ba): concatenation of b and a
2. Construct the NFA for (ba)*: star of (1)
3. Construct the NFA for a(ba)*: concatenation of a and (2)
4. Construct the NFA for a(ba)*b: concatenation of (3) and b
5. Construct the NFA for ba: concatenation of b and a
6. Construct the NFA for bab: concatenation of (5) and b
7. Construct the NFA for a(ba)*b ∪ bab: union of (4) and (6)
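
The seven steps above can be followed literally in code. This is a sketch, not the construction exactly as drawn in the text: an NFA fragment is stored as a start state, a set of accept states and a list of edges, fresh integer state names come from a counter, and "" stands for ε. The names `Frag`, `sym`, `cat`, `alt`, `star` and `accepts` are hypothetical.

```python
import itertools

_ids = itertools.count()     # source of fresh state names
EPS = ""                     # label used for ε edges

class Frag:
    def __init__(self, start, accepts, edges):
        self.start, self.accepts, self.edges = start, accepts, edges

def sym(a):
    s, f = next(_ids), next(_ids)
    return Frag(s, {f}, [(s, a, f)])

def cat(x, y):
    link = [(f, EPS, y.start) for f in x.accepts]
    return Frag(x.start, y.accepts, x.edges + y.edges + link)

def alt(x, y):
    s = next(_ids)
    branch = [(s, EPS, x.start), (s, EPS, y.start)]
    return Frag(s, x.accepts | y.accepts, x.edges + y.edges + branch)

def star(x):
    s = next(_ids)
    back = [(f, EPS, x.start) for f in x.accepts]
    return Frag(s, x.accepts | {s}, x.edges + [(s, EPS, x.start)] + back)

def accepts(n, w):
    def close(states):
        todo, seen = list(states), set(states)
        while todo:
            q = todo.pop()
            for src, a, dst in n.edges:
                if src == q and a == EPS and dst not in seen:
                    seen.add(dst)
                    todo.append(dst)
        return seen
    cur = close({n.start})
    for ch in w:
        cur = close({dst for src, a, dst in n.edges if src in cur and a == ch})
    return bool(cur & n.accepts)

n1 = cat(sym("b"), sym("a"))     # (1) ba
n2 = star(n1)                    # (2) (ba)*
n3 = cat(sym("a"), n2)           # (3) a(ba)*
n4 = cat(n3, sym("b"))           # (4) a(ba)*b
n5 = cat(sym("b"), sym("a"))     # (5) ba, built again with fresh states
n6 = cat(n5, sym("b"))           # (6) bab
n7 = alt(n4, n6)                 # (7) a(ba)*b ∪ bab

for w in ["ab", "abab", "bab", "ba", ""]:
    print(repr(w), accepts(n7, w))
```

Step (5) builds ba a second time rather than reusing the fragment from step (1), since each fragment's states should appear in only one place in the final machine.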

More Text Exercises

1.19
1.20
Is this language L over the alphabet {a, b} regular?
L = {aⁿb^m | n >= 0, m >= 0 and n ≠ m }

Pumping Lemma

Pumping Lemma for Regular Languages: If A is a regular language, then there is a number p such that if s is any string in A of length at least p, then s may be divided into three pieces, s = xyz with the properties:

for each i >= 0, xyⁱz ∈ A,
|y| > 0, and
|xy| <= p

Strategies

Strategies for deciding whether a formal language is regular.

It may be easier to show whether the complement is regular.
Use the pumping lemma to show the language is not regular.

Example 1: { aⁿbⁿ | n >= 0 } is not regular.
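
The pumping argument for Example 1 can be explored by brute force. The sketch below takes a candidate pumping length p and the string s = aᵖbᵖ, tries every split s = xyz with |xy| <= p and |y| > 0, and keeps the splits for which xyⁱz stays in the language for a few values of i. The names `in_A` and `surviving_splits` are hypothetical. Finding no surviving split for one value of p is an illustration rather than a proof; the proof is the observation that y consists only of a's, so pumping changes the number of a's but not the number of b's.

```python
def in_A(w):
    """Membership in A = { a^n b^n | n >= 0 }."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def surviving_splits(s, p, member, max_i=3):
    """Splits (x, y, z) with |xy| <= p, |y| > 0 and x y^i z in the language for i = 0..max_i."""
    good = []
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):              # y = s[x_len:xy_len] is nonempty
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(member(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good

p = 5                                            # any candidate pumping length
s = "a" * p + "b" * p
print(surviving_splits(s, p, in_A))              # prints [] : every split fails
```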

Example 2: {aⁿb^m | n >= 0, m >= 0 and n ≠ m } is not regular.