THEENWTThffiSSPROBLEMFOREXTENDED
REGULAR EXPRESSIONS
By
MINCAI
Bachelor of Science
Shantou University
Shantou, China
1997 .
Submitted to the Faculty oftheı
Graduate College oftbeı
Oklahoma State Universityı
in partial fulfillment ofı
the requirements forı
the Degree ofı
MASTER OF SCIENCEı
May, 2003ı
ACKNOWLEDGEMENTS
I wish to express my sincere appreciation to my major advisor, Dr. H. K. Dai for
his intelligent supervision, constructive guidance, inspiration and encouragement. My
sincere appreciation extends to my other committee members Dr. Hedrick and Dr.
Chandler, whose guidance, assistance and support are also invaluable.
More over, I wish to express my sincere gratitude to Dr. H. G. W. Burchard,
who provided suggestions and assistance for this study.
I would also like to give my special appreciation to my parents in Hong Kong,
my two elder brothers, Ben Tsoi, and Yingcnao Cai whose support, encouragement and
love go through my study in us.
ill
Contents
1 Preliminaries 1
1.1 Problems, Algorithms, and Complexity 1
1.2 The Time Complexity . ..
. 2
1.2.1 Deterministic Computation and the Class P 2
1.2.2 Nondeterministic Computation and the Class NP 3
1.2.3 The Relationship between P and NP 5
1.2.4 The Structure of NP 5
1.2.5 Some Problems in NP 7
1.2.6 The Class CoNP . 9
1.2.7 Exponential Time . 9
.1"'
1.2.8 The Polynomial Hierarchy 11
1.3 The Space Complexity 12
1.3.1 The Class PSPACE . 12
1.3.2 The Class NPSPACE . 14
1.3.3 The Class DLOGSPACE . 15
1.3.4 The Class NLOGSPACE . 16
IV
1.4 Relations of the Standard Complexity Classes 18
2 Intractable Problems for Extended Regular Expressions 19
2.1 Intractability . . . . 19
2.2 Intractable Problem.s for Extended Regular Expressions 21
2.2.1 Regular Expressions 21
2.2.2 Semiextended Regular Expressions 24
2.2.3 Extended Regular Expressions . . . 26
3· Emptiness Problelll for Extended Regular Expressions 29
3.1 An Algorithm for Solving the Emptiness Problem for Extended Regular
Expressions . . _.............. 29
3.1.1 Constructions of Finite Automata. 30
3.1.2 Reachability Problem .. 35
3.2 A Complexity Analysis 38
3.3 Conclusions . . . . . 39
Bibliography 41
Appendixes 44
A A Program to Solve the Emptiness ProblelIl for Extended Regular
Expressions 44
v
LIST OF FIGURES
Figure Page
1. The World of Complexity Classes 18
2. NFAAs for <1>, {A}, {a} 32
·f'
vi
Chapter 1
Preliminaries
1.1 Problems, Algorithms, and Complexity
A problem is a general question to be answered, and it has several parameters. To
describe a problem, we describe all its parameters, and state what properties the
answer is required to satisfy.. Given a problem, if we specify particular values for all
its parameters, then it is an instance of the problem. The input length for an instance
of a problem is defined to be the number of symbols in the description of a problem
instance. If a problem IT has only two possible·solutions, either "yes" or "No", then it
.f'
is a decision problem. Such a problem IT consists of an instance set Drr and a subset
. .
Yrr ~ D n of yesinstances.
Algorithms are procedures for solving problems, they are general, stepbystep.. If
an algorithm can be used for any instance of a problem and can always solve that
instance, then we say that this algorithm solve the problem. In general, we· are
1
I
space are usually two factors determining whether or not an algorithm is efficient.
Complexity theories provide mechanisms to classify combinatorial problems and measure
computational resources that are necessary to solve them. Two most important
and common measures in the computation are time and space complexities. The
time complexity is the number of steps a program takes to execute, and the space
complexity is the amount of storage used in the computation.
:
1.2 The Time Complexity
1.2.1 Deterministic Computation and the Class P
Deterministic Thring Machine is a particular model for computation. It is defined as
follows:
Definition 1 (Deterministic Turing Machine) (10J
A deterministic Turing machine is a system
M = (Q, :E, r, 0, qo, H, qaccept, qreject), where
.t·
Q is the finite set of states,
r is the finite tape alphabet,
B E r is the blank symbol,
:E is the input alphabet, :E c r  {B},.
qo E Q is the initial state,
qaccept is the accepting state,
2
o is the transition junction, 0 : (Q  {qaccept, qrejecd) x r t Q x r x {£, R}.
P is the class of all languages accepted by a deterministic Turing machine program
that runs in polynomial time in the input length. A polynomial time algorithm is the
polynomial time deterministic Turing machine program.
1.2.2 Nondeterministic Computation and the Class NP
Informally the class NP can be defined in terms of a nondeterministic algorithm.
Such an algorithm consists of two separate stages: guessing stage and checking stage.
Given a problem instance I, the guessing stage "guesses" some structure S. Then we
take I and S as inputs of checking stage and compute it in a normal deterministic
. .
manner. A nondeterministic algorithm "solves" a decision problem if, for all instants
of the decision problem, there exists some structure S that, when guessed for input
I, will lead the checking stage to respond "yes" if and only if I is a yesinstance.
A nondeterministic algorithm is said to solve a decision problem in "polynomial time"
if, for every yesinstance, th~e is some guess S that leads the deterministic checking
stage to respond "yes" in the polynomial time in the input length. The class NP
is defined informally to be the class of all decision problems that can be solved by
polynomial time nondeterministic algorithms.
A nondeterministic 'lUring machine is the same as a Thring machine, except that the
transition function has the fonowing form:
3
A.
The formal counterpart of a nondeterministic algorithm is a program for a nondeterministic
Thring machine.
The language recognized by a nondeterministic 'lUring machine is the set of all input
strings accepted by the nondeterministic Turing machine. The time required by a
nondeterministic Turing machine to accept a string is defined to be the minimum, over
all accepting computations of the nondeterministic Turing machine, of the number of
steps occurring in the guessing and checking stages up until the halt state is entered.
NP is the class of all languages accepted by a nondeterministic Turing machine program
that runs in polynomial time in the input length.
We usually envision a nondeterministic algorithm as guessing a structure that in some
way depends on the given instance. The guessing module of a nondeterministic Thring
machine disregards the given input. However, we always design out a nondeterministic
'lUring machine program s}1' that the checking stage begins by checking whether or
not the guessed string corresponds to an appropriate guess for the given input. If
not, the program can halt immediately.
4
We observe that every deterministic algorithm can be used as the checking stage of a
nondeterministic algorithm. From this, we can conclude that every decision problem
that can be solved by a polynomial time deterministic algorithm can be also solved
by a polynomial time nondeterministic algorithm. That implies that P ~ NP. There
are many reasons to believe that this inclusion is proper, which means that P is not
equal to NP [8]. It is not surprising that polynomial time nondeterministic algorithms
are more powerful than polynomial time deterministic algorithms, even though it has
not been proved as yet. Therefore, it looks reasonable to assume that P =I= NP.
1.2.4 The Structure of NP
If P =I= NP, then the distinction between P and NP  P is meaningful and important.
There is no hope of showing that any problem is in NP  P until we can prove that
P =I= NP.
Definition 2 (Polynomial Transformation) fB}
A polynomial transformation from one language L1 ~ Ei to another language L2
.r>
~ E; is a function f : L1 f=...+ L2 that satisfies two conditions:
1. There exists a polynomial time deterministic Turing machine program computing
f, and
2. For all x E Ei, x E L1 if and only if f(x) E L 2 •
5
but not symmetric. A language L 1 is defined to be NPcomplete if L 1 is in NP and,
for any language L 2 that is in NP, there exists a polynomial transformation from
L2 to £1' The NPcomplete theory focuses on proving results of weaker form "if P
f:. NP, then a decision problem is in NP  P". NPC is made up of all NPcomplete
languages. The "polynomial transformability" relation imposes a partial order on the
equivalence classes of languages (or decision problems). The class P is the Hleast"
equivalence class under this partial order. The class of NPcomplete problems contains
the "hardest" languages (or decision problems) in NP. If any NPcomplete problem
can be solved in polynomial time, then all problems in NP can be solved in polynomial
time. Therefore, if P =f: NP, then any NPcomplete problem is in NP  P. Moreover,
any NPcomplete problem is in P if and only if P = NP.
Problems in NP are considered to be in NPI if they have not yet been proved either
in P or in NPC. Since it has been showed that there are some problems in NPI [8], we
can conclude that if P f:. NP, then there exist some problems in NP neither solvable
in polynomial time nor NPcomplete.. That is, there exists some problems in NP but
not in P or NPC. The clasStJOf all languages that are not in P Or NPC but in NP is
the class NPL
Assuming that P =1= NP, the NP class consists of three parts: P, NPC, and NPl. Their
"difficulty" levels from the least difficult to the most difficult are P, NPl, NPC.
6
As we mentioned before, the P class includes all languages accepted by deterministic
'lUring machines in polynomial time, without any regard to the degree of the polynomial.
Integer Divisibility by Four [8], Primes and its complement problem Composite
Numbers have be~m proved to be in P [9].
Integer Divisibility by Four
Instance: Given an integer n > 1.
,
Question: Is there an integer m 2::: 1, such that n I m = 4 and n mod m = O?
Primes
Instance: Given an integer k 2::: 1.
Question: Is k a prime?
Composite Numbers
Instance: Given an integer k ~ 1.
Question: Are there two integers m, n ~ 2, such that mn = k?
.f'
The NPC class includes many problems that are natural and have been solved efficiently
[8]. Some problems such as the Vertex Cover problem, the Hamilton Circuit
problem, and the Clique problem have been proved to be NPcomplete [7]. For all
of these problems we can find exponential algorithms, but so far no polynomial time
algorithm has been found to solve any of the NPcomplete problems.
7
Instance: Given a graph G = (V, E) and an integer k, where 1 :S k :S IVl·
Question: Is there a set V' ~ V, such that IV' I :S k and, V {u, v} E E, 3 at least one
u or v in V' ?
Hamiltonian Circuit
Instance: Given a graph G = (V, E).
Question: Is there an ordering of vertices of G, (Vb V2, .... vn ), where n = lVI, such
that ei E E, V 1 :S i < n, where ei ={Vi1 Vi+l}, V 1 :S i :S n 1, and en = {Vn , VI}?
Clique
Instance: Given a graph G = (V, E) and an integer j, where 1 :S j :::; IVI·
Question: Is there a set V' ~ V such that I~ I 2: j and for any two vertices Vi, Vj E
V', {ViI Vj}E E?
NPI consists of problems in NP which have not yet been proved either in P or in NPC.
The NPI class is not empty if P =I NP. Graph Isomorphism, and Linear Programming
are some examples of problems in NPI [8].
Graph Isomorphism
Instance: Given two graphs G = (V, E) and d = (V, E').
Question: Is there an injective function f: V t V, such that {u, v} E E if and only
if {f(u), f(v) } E E' ?
8
I
,
I
Instance: Given an integer B and three integer vectors Vi = (vi[l], vd2], ..., vdn]), D
Question: Is there a rational vector X = (Xl, X2, .•. , xn ) such that Vi . X ::; di and
c . X > B, where 1 ::;: i < m ?
1.2.6 The Class CoNP
The class coNP is the set of all languages whose complement is in NP. It is defined
as follows:
Definition 3 (coNP) [8}
co  N P = {B*  L I L is a language ove, the alphabet Band LENP.}
Many problems in coNP seem not to be in NP, which means NP =I coNP. The class
P is closed under complementation, so NP =1= coNP implies P =f. NP, although P =1=
NP does not imply NP =f. coNP. Nevertheless, there exists a link between the NPcomplete
problems and the conjecture that NP =I coNP. This link is that if there is
an NPcomplete problem whose complement is in NP, then NP = coNP [8]. From
.f'
this, we can conclude that a problem whose complement is in NP can not .be in NPC
unless NP = coNP.
1.2.7 Exponential Time
There are some decision problems only solvable by a Turing machine in exponential
time. The set of an decision problems that can be solved by a deterministic [respec9
pen) is a polynomial function of n, is the class EXPTIME [respectively NEXPTIME].
For all polynomial functions q(n), mq(n) = 2P(n), where m is any positive integer, and
pen) is a polynomial function of n, the class EXPTIME [respectively NEXPTIME] is
also the set of all decision problems solved by a deterministic [respectively nondeterministic]
Turing machine in O(mq(n» time, where m is any positive integer, and q(n)
is a polynomial function of n. We believe that there exists some decision problems
that are beyond the class EXPTIME. That means, those problem can only be solved
by a deterministic or nondeterministic Turing machine in O(2P/(n» time, where p'(n)
is an exponential function of n.
The class EXPTIMEcomplete is also a set of decision problems. A decision problem is
in EXPTIMEcomplete [respectively NEXPTIMEcomplete] if it is in EXPTIME ~respectively
NEXPTIME], and every problem in EXPTIME [respectively NEXPTIME]
has a polynomial transformation to it. EXPTIMEcomplete might be thought of as
the hardest problem in EXPTIME. EXPTIME is a strict superset of NPComplete,
NP, and P.
.ft
One example of EXPTIMEcomplete problems is the Chess problem [10].
Chess
Instance: Given a chess or go position.
Question: Can the first player force a win?
10
an nxn board instead of the usual board with fixed size. Since EXPTIMEcomplete
is defined by asymptotic behavior as the problem size grows without bound.
1.2.8 The Polynomial Hierarchy
For two sets A and B, we can write a program that is an acceptor for A and allow it
to make subroutine calls of the form "y E B" . These calls return true if the Boolean
test is true and return false otherwise. Such a program is called a reduction procedure
and the set B is called an oracle set. An oracle TUring machine is a standard Thring
machine with an additional oracle tape and three special states: Q, YES and NO.
When the Turing machine is in state Q, if the word currently written on the oracle
tape is in the oracle. set, then the next state is YES, otherwise, the next .s tate is NO.
A Thring reduction from one oracle set A to another oracle set B is an oracle TUring
machine M whose oracle is B such that M accepts A and M halts on every input
[11].
The polynomial hierarchy is a useful tool to classify and measure the complexity of
,f'
combinatorial problems. A' class pY [respectively N pYj is the set of languages from
which there is a polynomial time deterministic [respectively nondeterministic] Turing
reduction to a language in Y. A class coY is the set oflanguages whose complement
is in Y. The polynomial hierarchy is defined inductively as foHows [12]:
~p  IIP  A P  P L10  0  L..lo  (1.1)
11
(1.2)
(1.3)
(1.4)
The process of inductively defining new classes can be extended infinitely and it
creates classes of greater and greater apparent difficulty.. To check if a problem is in
the hierarchy at all or not, it is useful to show it is in a particular class directly rather
than apply the inductive definitions. If we can show a "hardest" problem in L;~, then
it must be in L;~  L;tl if the two classes are not equal.
The polynomial hierarchy extends the classes P and NP. Nevertheless, under the
assumption P i= NP, the polynomial hierarchy remains of theoretical interest. It is
not known whether any of the classes are distinct or whether there exists infinitely
many classes in the polynomial hierarchy so far.
1.3 The Space Complexity
1.3.1 The Class PSPACE
What we have introduced above is j,ust on the time complexity. In practice, the space
complexity is also important. In a Turing machine computation, the time complexity
12
the number of distinct tape squares visited by the readwrite head. The number of
tape squares visited is less than or equal to the number of steps in the computation.
It follows that all decision problems in polynomial time can be solved in polynomial
space, however, there stiU exists some decision problems in polynomial space that
cannot be solved in polynomial time.
The class PSPACE [8] is the set of languages that are recognizable by polynomial
space bounded deterministic Turing machines that halt on every input. There exist
some problems solvable in PSPACE that appear to be "harder" than problems in P
orNP.
PSPACE is a class beyond the poly~omial hierarchy. A language L 1 is PSPACEcomplete
(with respect to polynomial transformability) if L 1 is in PSPACE and, for
any language L 2 that is in PSPACE, there exists a polynomial transformation from
L2 to L 1 . From this, we can conclude that if L} is PSPACEcomplete, then L1 is in
P[respectively NP] if and only if P(respectively NP] = PSPACE. Quantified Boolean
Formulas and Linear Space Acceptance are PSPACEcomplete.
.;'
Quantified Boolean Formulas
Instance: Given a formula F = (Q1xd(Q2XZ) ... (Qnxn)E, where E is a Boolean
expression over Xi, and Qi E {3, V}, for all 1 ~ i ~ n.
Question: Is F true?
13
r ~~',~l
Instance: Given a linear bounded deterministic Turing machine M and a finite string
x that is over the input alphabet of M.
Question: Does the Turing machine M accept the string x?
1.3.2 The Class NPSPACE
The class of NPSPACE consists of those languages that can be recognized by a nondeterministic
Turing machine in polynomial space bounded. How to deal with the
space used by the "guess" in a nondeterministic 'lUring machine? In fact, for many
computations, it is not necessary to remember an the symbols once they have been
read. A nondeterministic Turing machine that is used to measure space can be viewed
as an additional device from which the program can always request the immediately
following symbol of the guess without using any other space. The program records
the symbol that is needed later on its tape and use "space" for it only if the program
wants to remember the symbol. Defining the class NPSPACE to be the set of
languages recognized by programs for this additional device in polynomialiy bounded
.
space in its accepting comp)1tation, then we can ask a question: Is PSPACE equal
to NPSPACE? Savitch has implied that the answer is "yes" [15]. This will follow
that PSPACEcompleteness is the strongest type of completeness result we have introduced
above.
14
Since an input string with length n takes up n tapes squares by itself, any deterministic
Turing machine seems taking at least linear space. However, it is different between the
space required by the input string and the additional space in which the computation
is carried out. In fact, it is possible to use less than linear space for a computation.
This is the class DLOGSPACE. The class DLOGSPACE is the class in which an
languages can be recognized by a deterministic Turing machine in a space that is
only logarithmic in the input length. It is within P and NP.
There are some nontrivial problems solvable in logarithmic space. It has been shown
that DLOGSPACE =I P [15]. Many problems in P look to require more than logarithmic
space. Moreover,. both P = DLOGSPACE and P = PSPACE can not be
held.
Since polynomial transformation can not make distinctions within P, let us introduce
logspace transformation.
Definition 4 (Logspace Transformation) [8j
·f·
A logspace transformation from one language L1 S; Ei to another language L2 C
E; is a function f : L 1 It L2 that satisfies two conditions:
1. f can be computed by a deterministic Turing machine program using space
bounded by rlogn + 11, where n is the input length, and
2. For all x E Ei, x E L 1 if and only if f (x) E £2'
15
that is in P, there exists a logspace transformation from L 2 to L 1 • If there exists a
logspace transformation from one problem A in P to another problem B in P, then
we can conclude that problem A is also logspace complete for P. The Path System
Accessibility problem has been proved to be logspace complete for P [5]. .
Path System Accessibility
Instance: Given a finite set X, a relation R ~ XxX xX, and two sets S and T of
"source" and "terminal" nodes, where S, T ~ X.
Question: Is there an "accessible" terminal node, where a node x E X is accessible if
xES or if there are accessible nodes y and z such that (x, y, z) E R?
Logspace t~ansformation is not only used to prove logspace com?leteness for P. Most
transformations used to prove NPcompleteness and PSPACEcompleteness are also
logspace transformations. The set of all languages that are logspace complete for
NP is at least a large subset of NPC, while we can not conclude that all languages
that are logspace complete are PSPACEcomplete.
·f' 1.3.4 The Class NLOGSPACE
The logspace transformation can be used to address another question of determinism
versus nondeterminism. Similar to the class DLOGSPACE, the class NLOGSPACE is
the class of all languages in which all languages can be recognized by a nondeterministic
Turing machine in spacebounded of logarithmic in the input length. It has been
16
therefore, DLOGSPACE =1= NLOGSPACE. Like DLOGSPACE ~ P, NLOGSPACE
~ P.
1.3.5 Exponential Space
Similar to the time complexity, there are some decision problems unsolvable by a 1\1ring
machine in polynomial space. The class EXPSPACE [respectively NEXPSPACE]
is the set of decision problems which are solved by a deterministic [respectively nondeterministic]
Turing machine in O(2P(n» space, where n is the input length, and
p(n) is a polynomial function of n. That is, the class EXPSPACE [respectively
NEXPSPACE] is the set of all decision problems which are solved by a deterministic
[respectively no.ndeterministic] Turing machine in O(mq(n» space, w. here m is
any positive integer, and q(n) is a polynomial function of n. The class EXPSPACEcomplete
is also a set of decision problems. A decision problem is EXPSPACEcomplete
!respectively NEXPSPACEcomplete] if it is in EXPSPACE [respectively
NEXPSPACE], and every problem in EXPSPACE [respectively NEXPSP.ACE] has a
polynomial transformation to it. EXPSPACEcomplete might be thought of as the
.T'
hardest problems in EXPSPACE. EXPSPACE is a superset of EXPTIME, PSPACE~
NPcomplete, NP, and P. We believe that there exists some decision problems that
are beyond EXPSPACE and those problems can only be solved by a deterministic
or nondeterministic Turing machine in O(2P'(n» space, where p'(n) is an exponential
function of n.
17
Figure l.i: The world of complexity classes.
1.4 Relations of the Standard COIllplexity Classes
Figure 1.1 shows the relationship of complexity classes we introduced above.
18
Chapter 2
Intractable Problems for Extended
Regular Expressions
2.1 Intractability
Different algorithms process different time and space complexities, and which are "efficient
enough" and which are "too inefficient" will always depend on the situation at
hand. Fortunately, the distinction between polynomial time algorithms and exponential
time algorithms offers considerable insight into these matte.r. Polynomial time
.f'
algorithms are considered as "good" algorithms, whereas exponential time algorithms
are not "good" algorithms.
A problem has not been "wellsolved" until a polynomial time algorithm can be
found for it. Hence, a problem is said to be intractable if there is no polynomial
time algorithm that can solve it. That means, all algorithms to solve an intractable
19
algorithm to solve a problem, then this problem is tractable. Although an exponential
time algorithm may be faster than a polynomial time algorithm for a problem instance
with some limited input length, such kind of problems are quite rare in practice.
Therefore, it is appropriate to define intractability as above.
There are two reasons for the intractability of a problem: one is that the problem is
so difficult that an exponential time algorithm is needed to solve it, the other is that
the solution itself is so extensive that it can not be described with an expression with
length bounded by a polynomial function of the input length. To show a problem is
intractable, we need to show it is beyond the class P. Since several complexity classes
have been known to contain intractable sets, one approach to prove a problem is
intractable is by proving that the problem is complete for a complexity class that is
known to contain intractable problems.
To prove a particular problem is NPcomplete or PSPACEcomplete, we show how to
express an arbitrary problem in NP or PSPACE in terms of the particular problem.
Essentially the technique oft.proof is simulation. However, so far nobody can find a
problem in NP or PSPACE but be proved not in P. To show a problem is not in P,
we need to show there exists at least one language not accepted by any deterministic
'lUring machine in polynomial time. The diagonalization technique is usually used.
If a deterministic Turing machine whose input is a string of n 1's halts after it takes
exactly F(n) steps, then we call F(n) a time constructible function. For the time
20
given time constructible functions TI and T2 , if T1 grows "faster" than T2 , then there
is a language accepted by a deterministic Turing machine of time complexity T2 but
by no deterministic Turing machine of time complexity TI .
For the space complexity, we have observed that any multitape deterministic Thring
machine has an equivalent onetape deterministic 'lUring machine, and both deterministic
Turing machines have the same space complexity. If a deterministic Thring
machine whose input is a string of n l's halts after its readwrite head has visited
exactly F(n) tape squares, then we call F(n) a space constructible function. Furthermore,
it has been proved that for two given space constructible functions 8 1 and
82 , if 81 grows "faster" than 8 2 , then there is a language accepted by a deterministic
Turing machine of space complexity 82 but by no deterministic Thring machine of
space complexity 8 1 0
2.2 Intractable Problems for Extended Regular Ex.
pressIons
2.2.1 Regular Expressions
A regular expression is recursively defined as follows:
Definition 5 (Regular Expression) flO}
Let ~ be an alphabet. The regular expressions over 'E and the sets that they denote
21
1. 0 is a regular expression and denotes the empty set.
2. E· is a regular expression and denotes the set {E}.
3. For each a in E, a is a regular expression and denotes the set {a}.
4. If l' and 8 are regular expressions denoting the languages Rand S, respectivf
then (r+s), (1'8), and (1'*) are regular expressions that denote the sets R RS, and R*, respectively.
Some intractable problems concerning regular expressions are the followings:
Regular Expression Inequivalence
Instance: Gi~en two regular expressions R1 and R2 over an alphabet E.
Question: Does the language denoted by R1 differ from the language denoted Let us introduce some related definitions before giving the complexity for the Reg'/;
Expression Inequivalence problem.
·r·
Let X be a finite alphabet, a language L ~ X* is bounded if there exist words
W2, ... Wn E X* such that L ~ w;W2"'w~, If L is not bounded, then we call unbounded language [4]. Star height of a regular expression is a limited nesting of Kleene stars in the regular expression. If the star height of a regular expressio
equal to 0, then it is star free [3].
22
• If Rz is a fixed expression and denoting an {(unbounded" language, then the
Regular Expression lnequivalence problem is PSPACEcomplete,
• If Rz is a fixed expression and denoting an infinite "bounded" language, then
the Regular Expression lnequivalence problem is NPcomplete,
• If Rz is a fixed expression and denoting an finite language, then the Regular
Expression lnequivalence problem is solvable in polynomial time,
• If both R 1 and R2 have star height k and k is a fixed number which is greater
than 0, then the Regular Expression lnequivalence problem is PSPACEcomplete,
• If both R 1 and R z are star free, then the Regular Expression lnequivalence
problem is NPcomplete,
• If one or both of R 1 and Rz denote bounded languages or the size of L; is exactly
equal to 1, then the Regular Expression Inequivalence problem is NPcomplete
.ft
/
• If the regular expressions are limited to four operators: union, concatenation,
the Kleene star, and squaring (two copies of an expression), then the Regular
Expression Inequivalence problem is EXPSPACEcomplete, and
• If the Kleene star is left out, then the Regular Expression Inequivalence problem
is NEXPTIMEcomplete.
23

and R2 is ~"', we can get the following problem.
Regular Expression Nonuniversality
Instance: Given a regular expression R over a finite alphabet E.
Question: Does the language denoted hy R differ from ~?
The Regular Expression Nonuniversality prohlem is PSPACBt..'>{}mpiete if the bet is {O,l}. But if we allow the abbreviation (ar~ denoting a ~ a, then the pro~
will become intractable and has been shown to have eA.'])onential spaee complail;Emptinessofcomplement of Regular Expressions
Instance: Given a regular expression R over the alphabet E.
Question: Is the complement of the language denoted by R empty?
By constructing of a PSPACE nondeterministic Turing machine, the Emptinesscomplement oj Regular Expressions problem can be proved to be PSPACEcompl~
[1]. Since the PSPACE class is known to include the set of intractilble problems, Emptinessojcomplement qf Regular Expressions problem is intractable.
2.2.2 Semiextended Regular Expressions
Since the class of regular set is closed under intersection, it does not increase the of sets described by adding the intersection operator to "ordinary" regular expn
sions. The semiextended regular expression allows the intersection operator to 24
the Emptinessojcomplement oj Semiextended Regular Expressions problem.
Emptinessofcomplement of Semiextended Regular Expressions
Instance: Given a semiextended regular expression R over the alphabet L:.
Question: Is the complement of the language denoted by R empty?
The Emptinessojcomplement oj Semiextended Regular Expressions problem has .proved to not in NPSPACE, by the relationship between PSPACE and NP, we that under the assump~ion P =I NP, we have NP is a subset of PSPACE, and subset of NP, hence the Emptinessojcomplement oj Semiextended Regular Expre
sions problem is not to be in P, NP, or PSPACE, that is, there is no polynomialbounded or polynomialspacebounded a:lgorithm for the Emptinessojcomplement
Semiextended Regular Expressions problem. Furthermore, it has been shown that algorithm to solve the Emptinessojcomplement oj Semiextended Regular Expressio'
problem requires more than c' cvn/logn time and space, where c' ~ 1 and c ~ 2, the length of the semiextended regular expression[l].
,tt
One equivalent problem of the Emptinessojcomplement oj Semiextended Regul
Expressions problem is that whether a given semiextended regular expression denot
all strings over its alphabet, so any algorithm to decide whether a semiextendl
'f
regular expression denotes all strings over its alphabet requires at least PSPAC
complexity, particularly, it has been proved its space and time complexity are
least c' cvn/logn, where c' ~ 1 and c~ 2, n is the length of the semiextended regul
25
problem for semiextended regular expressions. Fortunately, it has been proved the equivalent problem for semiextended regular expression requires c' cvn/logn, c' ~ 1 and c~ 2, and an infinity of n's.
2.2.3 Extended Regular Expressions
Similar to the intersection operator, the class of regular set is closed under the plementation operator. Extended regular expressions allow intersection and comp:
mentation operators in regular expressions. Formally, it is defined as the following
Definition 6 (Extended Regular Expression) [1]
An extended regular expression over an alphabet L: is defined as follows:
1. E, '/) and a for each in L:, are extended regular expressions denoting {€}, empty set, and {a}, respectively.
2. If R 1 and R2 are extended regular expressions denoting the languages ££2J respectively, then (R1 + R 2), (R1 • R 2), (R1*), (R1 n R 2), and (. R extended regular expr~,*sions, denoting £1 U £2, £1£2J £1 *, £1 n £2, and £1, respectively.
Adding the intersection and complementation operators will make the length expressions denoting certain regular languages shorter, but some problems for
tended regular expressions require more time than regular expressions.
26
Jor Extended Regular Expressions.
Emptiness Problem for Extended Regular Expressions
Instance: Given a extended regular expression R over the alphabet :E.
Question: Is the language denoted by R empty?
The Emptiness Problem Jor Extended Regular Expressions is harder than the Emptin<
ojcomplement oj Regular Expressions problem or the Emptinessojcomplement
Semiextended Regular Expressions problem.
If a function g(m, n) is defined as:
n for m = 0
g(m, n) = { 29(ml,n) for m > 0 Then a function is elementary if it is bounded above for all but a finite set of n's g(mo, n) for some fixed mo.
Based on the elementary function and impossibility of construction of determinist
Turing machines for extended regular expressions, it has been shown that there
.'"
no elementary function S(n) for which the Emptiness Problem Jor Extended Regul
Expressions is of space complexity with S(n) [1]. Moreover, there is no S(n)spa(
bounded or S(n)timebounded deterministic Turing machine to decide whether language denoted by a extended regular expression is empty or not.
27
beyond PSPACE and any algorithm to solve the Emptiness Problem for Extend(
Regular Expressions requires at least exponential time and space complexity. [r particular, the Emptiness Problem for Extended Regular Expressions requires at g(m,n) time complexity, where m is any finite number. Therefore, the Emptine.
Problem for Extended Regular Expressions is intractable.
28
Chapter 3
Emptiness Problem for Extended
Regular Expressions
3.1 An Algorithm for Solving the Emptiness Prob
lem for Extended Regular Expressions
In this thesis we present an algorithm to solve the emptiness problem for extendE
regular expressions. By analyzing the complexity of our algorithm, we verify the emptiness problem for .e~tended regular expressions requires at least exponenti
. time and space. Our algorithm consists of two parts: one is constructions of automata, the other is the reachability algorithm.
29
1
Finite automata are useful tools to present regular expressions. They are defined the followings:
Definition 7 (Finite Automaton) [10]
A finite automaton, or finitestate machine (abbreviated FA) is a 5tuple (qo, <5, A), where
• Q is a finite set (whose elements we will think of as states),
• ~ is a finite alphabet of input symbols,
• qo E Q (the initial state),
• A ~ Q (the set of accepting states), and
• <5 is a function from Q x ~ to Q (the transition function).
For any element q E Q and any symbol a E ~, we interpret <5(q, a} as the state
which the FA moves, if it is in state q and receives the input a.
Definition 8 (Nondeterministic Finite Automaton) [10]
A nondeterministic finite automaton, (abbreviated NFA), is a 5tuple M = (qo, A, <5), where Q, ~, qo, and A are defined as for FAs, and the transition functi.
1,S
<5 : Q x ~ + p(Q) As usual, p(Q) means the power set of a set Q.
30
A nondeterministic finite automaton with Atransitions (abbreviated NFAA) 5tuple M = (Q, L:, qo, A, 0), where Q, L:, qo, and A are defined as jar FAs, and transition junction is
0: Q x (L:U {A}) + p(Q) p(Q) means the power set oj a set Q.
The above three models of finite automata are equivalent. The following theorems us the connection between finite automata and regular expressions. In the proof Kleene's theorem [10], structural induction is used, that is, construct a finite automc
ton equivalent to the given extended regular expression by combining finite automat
which is corresponding to the subexpressions of the given extended regular expressior
Theorem 1 (Kleene's Theorem) [10]
Any regular language can be accepted by a finite automaton.
The extended regular expression is obtained by adding the intersection and complE
mentation operators to the,t~gular expression. Kleene's theorem has showed that FA accepting the regular expression can be constructed. In order to construct an equivalent to the given extended regular expression, we only need to show that complementation and intersection can be accepted by an FA. As in Kleene's theoren
structural induction is used.
31
0 0
(a) (b) (c)
Figure 3.l: NFAAs for fIJ, {A}, {a}.
Theorem 2 Any language denoted by an extended regular expression can be accepte
by a finite automaton.
Proof: It is sufficient to show any language obtained by an extended regular pression can be accepted by an NFAA. The set of languages obtained by extende<
regular expressions over the alphabet L; is defined to be the smallest set of language
containing the basic languages fIJ, {A}, and each of the languages {a} (a E L;). class of regular sets is closed under the operations of union, concatenation, Kleene*
complementation, and intersection. Using structural induction to prove that any guage denoted by an extended regular expression over an alphabet L; can be accepte(
by an NFAA, we must show that the three basic languages can be accepted by NFAA, and that if £1 and £2 are languages that can be accepted by an NFAA, their union, concatenation, Kleene*, complementation, and intersection can also accepted by an NFAA. .f'
NFAAs for the three basic languages are obvious. They are shown in Figure 3.1.
Suppose that £1 and £2 are accepted by the N FAAs M1 and M2 , respectively. and M2 are defined as the following:
32
1
necessary). We will construct NFAAs Mu, Me' Mk, Mcom , and Min, recognizing The constructions of Mu, Me, and Mk are given in Kleene's theorem. Here we describ
the constructions of Mcom and Min'
(1) Construction of Mcorn = (Qcorn, E, qcom, Acorn, ocom). Since £1 is accepted by NFAA M1 = (Ql, E, ql, AI, 01), by the equivalence of NFAA and DFA, £1 can accepted by a DFA.
• Qdl = P(Ql)' p(Qd means the power set of a set Qll
be a DFA recognizing £1,' then Mcom can be obtained by swapping the acceptin
states with the nonaccepting states of Mdl . That is,
33
(2) Construction of Min There are two methods to construct Min'
The first one is to construct Min directly. Since £1 and £2 are accepted by the NFAM1 and M2 , respectively, £1 and £2 can be recognized by the DFAs.
Let Md1 = (Qdl' E, qdl, Ad1 , Od1) and Md2
recognizing £1 and £2, respectively.
• Oin is defined as follows:
,f'
for all PI E Qdl, P2 E Qd2, and a E E.
The second method is based on De Morgan's law.
34
Since languages are sets, all set operations on languages are inherited from those sets. De Morgan's law applies to sets, so it also applies to languages. If L1 and L2 regular languages, then their complements ,(L1) and ,(L2) are regular languages
Since the regular sets are closed under union, (,(L}))U(,(L2 )) is regular. Henci
,((,(Ld)u('(L2 ))) is regular. Therefore, L1nL2 is regular.
Since L} and L2 are accepted by NFAAs M} and M2 , respectively, from the construe
tion of Mcom , we can construct FAs Mcom1 and Mcom2 accepting ,(L1 ) and ,(respectively. By Kleene's Theorem, an NFAA MU1 accepting (,(L1 ))U(,(L2 )) be constructed. Using the construction of Mcom again, we can construct an FA accepting ,((,(Ld)u(,(L2))). Obviously, MJ accepts L}nL2 .
3.1.2 Reachability Problem
The two most common computational representations of graphs are adjacency and adjacency matrices.
The adjacencylist representation of a graph G = (V, E) consists of an array Adj IVllists, one for each vertex in V. For each U E V, the adjacency list Adj[u] contain
all the vertices v such that there is an edge (u, v) E E. For the adjacencymatri
representation of a graph G = (V, E), we assume that the vertices are numberel
35
G that consists of a IV Ix IVI matrix A. = (aij) such that
a.. = {1 if {i, j} EE, tJ 0 otherwise.
For any constructed FA M = (Q, L;, q, A, 0) that recognizes the given extendel
regular expressions, if M has k final states, i.e., k = IAI, where k ~ 1, then we convert M into an NFAA M' with exactly one final state F by setting F = {A) I Fi E A }.
Without loss of generality, we may assume that the first vertex in the graph is start state of the NFAA, the last vertex in the graph is the final state of the NFAIf there exists a path from the first vertex reachable to the last vertex in the graph
i.e., there exists a string accepted by the NFAA, then the language denoted by extended regular expressions is not empty. Otherwise, the language denoted by extended regular expressions is empty.
We use the depthfirst search algorithm to solve the reachability problem. The depth
first search algorithm is described as follows [1.3}:
.r<
DFS(G)
1. for each vertex u E V[G]
2. do colorful + WHITE
3. 1f[u] + NIL
36
5. for each vertex u E V[C]
6. do if colorful = WHITE
7. then DFSVisit(u)
DFSVisit(u)
1. colorful ~ GRAY
2. d[u] ~ time ~ time + 1
3. for each v E Adj[u]
1 4. do if color[vl = WHITE
5. then 7f[v] ~ U
6. DFSVisit(v)
7. colorful ~ BLACK
8. flu] ~ time ~ time/too 1
In the depthfirst search, each vertex is initially white, becomes grey when it is covered in the search, and becomes black when it is finishes. Let the start state of NFA be the root of a new tree in the depthfirst forest. If the final state of the is discovered or finished, Le., the color of the final state of the NFA is changed white to grey or black, after the depthfirst search algorithm is done, then there 37
accepted by the NFA, and the language denoted by the extended regular expression:
is not empty, Otherwise, the language denoted by the extended regular expression:
is empty.
3.2 A Complexity Analysis
Let us consider the time and space complexities of our algorithm.
In the procedure of constructions of FAs, since in Kleene's theorem, NFA is for the union, concatenation and Kleene closure operators, the number of states additive. The complementation operator requires a conversion from an NFA DFA, so the number of states increases exponentially. For the intersection operator
if we construct the FA using the first method, then the number of states increases the product. If we use the second method which depends on the complementatiOI
operator, then the number of states increases exponentially. So the procedure constructions of FAs for extended regular expressions requires at least exponentia
time and space.
Considering the time complexity of the solution for the reachability problem, let be the number of vertices, and lEI be the number of edges in the graph G(V, then the running time of depthfirst search algorithm is 8(IVI + lEI). For the spaCI
complexity of the solution for the reachability problem, obviously, the adjacency requires O(IVI + lEI)·
consists of (l)constructions of FAs and (2) the reachability algorithm. Since of FAs require exponential time and space, our algorithm for solving Emptiness Problem for Extended Regular Expressions requires at least exponential
time and space.
3.3 Conclusions
In this thesis, we solve the Emptiness Problem for Extended Regular Expressions constructing FAs equivalent to the given extended regular expressions and applying
depthfirst search algorithm on the reachability problem.
The pr?cedure of constructions of FAs works by creating N~AA for the three basic
. languages 0, {A}, and {a} and then constructing FAs for the union, concatenation
and intersection of two languages accepted by FAs, and the Kleene closure, and complementation
of a language accepted by FAs. By structural induction, all language~
denoted by extended regular expressions over the alphabet 2; can be accepted by FA.
.t'
For the procedure of solution to the reachability problem, all vertices can be reached
from the start state of the NFA will be discovered in the depthfirst search algorithm
The algorithm will explore all the possible paths that exist between the start statE
and the final state of the NFA. If there is a path from the start state to the state, then the algorithm can find it.
39
Emptiness Problem for Extended Regular Expressions is intractable and it requires least exponential time and space.
,f'
40
Bibliography
[1] AHO, HOPCROFT, AND ULLMAN. NPcomplete problems. The Design Analysis of Computer Algorithm (1974), pp. 395423.
[2] G. L. MILLER. Riemann's hypothesis and tests for primary. J. Comput. Syste17l
Sci. (1976), pp. 300317.
[3] H. B. HUNT. On the time and tape complexity of languages. Doctoral Thesis)
Dept. of Computer Science, Cornell University, Ithaca, NY. (1973).
[4] H. B. HUNT, AND T. G. SZYMANSKI. Complexity metatheorems for contextfree
grammar problems. J. Comput. System Sci. 13 (1976), pp. 318334.
[5] J. S. VITTER, AND R. A. SIMONS. Parallel algorithms for unification other complete problems in P. In Proceedings of the 1984 Annual Conference .ft
the ACM on the Fifth Generation Challenge (1984), pp. 7584.
[6] 1. J. STOCKMEYER. Planar 3colorability is NPcomplete. SIGACT New.
(1973), pp. 1925.
41
NPcomplete problems. In Proceedings of the Sixth Annual ACM Symposium Theory of Computing (1974), pp. 4763.
[8] M. R. GAREY, AND D. S. JOHNSON. Computers and Intractability.. W.Freeman and Company, 1979.
[9] M. AGRAWAL, N. KAYAL, AND N. SAXENA. Primes is III http://www.cse.iitk.ac.in/news/primality.html (2002).
[10] M. SIPSER. Introduction to the Theory of Computation. PWS Publishing Company,
1997.
[11] S. HOMER, AND A. L. SELMAN. Computability and Complexity Theory.
Springer, 2001.
[12] S. R. BUSS. The polynomial hierarchy and fragments of bounded arithmetic. Proceedings of the Seventeenth Annual A CM Symposium on Theory of Computing
(1985), pp. 285290.
[13) T. H. CORMEN, C. E. LEISERSON, AND R. 1. RIVEST. Introduction to ·ft
gorithms. The MIT Press, 1999.
[14] V. PRATT. Every prime has a succinct certificate. SIAM J. Comput. (1975), 214220.
[15] W. J. SAVITCH. Relationship between nondeterministic and deterministic tape
complexities. J. Comput. System Sci. (1970), pp. 177192.
42
Conf. on Information Science and Systems (1974), pp. 2123.
·ft
43ı
Appendix A
A Program to Solve the Elllptiness
Problem for Extended Regular
Expressions
44ı
A PROGRAM TO SOLVE THE EMPTINESS PROBLEM FOR
EXTENDED REGULAR EXPRESSIONS
The emptiness problem for extended regular expressions is described as follows:ı
Instance: Given a extended regular expression R over the alphabet]; .ı
Question: Is the language denoted by R empty.ı
The algorithm consists oftwo parts:
1. constructions of finite automata for the given extended regular expressions
2. the depth_fIrst search algoritlun to solve the reachability problem
,The algorithm is implemented in a C/C++ program. Th~ program includes two files:
pgm1.h and main.cpp.
.,..
PROGRAMMERS' GUIDE
This program is to solve the emptiness problem for extended regular expressions.
It is implemented in Microsoft Visual C++. The program consists of two files: one head
file (pgml.h) and one C/C++ source code .file (main.cpp). In the program, we define one
class,. two structures, one main function and twentyseven subroutines.
There are two following structures:
1.ı State: it holds four elements: start,. final, value and color. Start is a character to
check the state is the start state ofthe finite automaton or not. Ifthe state is the
start state, then we assign 'Y' to the element state, else we assign 'N' to the
element state. Final is a character to check the state is the final state of the
finite automaton or not. If the state ~s the final state, then we assign 'Y' to the
element final, else we assign 'N' to the element final. Value is a character to
give the input data of states. In the program, let us assume that the symbol '$'
is not in the alphabet of extended regular expressions,. then we initialize values
of all states in the finite automaton to be '$'. The above three elements are
used in the constructions of FAs. The fourth element of the state structure is
.rcolor,
which is a character to assign the color to every state in the finite
automaton. It could be 'w', 'g', or 'h': 'w' means white; 'g' means grey; 'h' means
black. The element color is used in the depth_first search algorithm to solve
the reachability problem.
PROGRAMMERS' GUIDE
This program is to solve the emptiness problem for extended regular expressions.
It is implemented in Microsoft Visual C++. The program consists of two files: one head
file (pgm1.h) and one C/C++ source code file (main.cpp). In the program, we define one
class, two structures, one main function and twentyseven subroutines.
There are two following structures:
1.ı State: it holds fOUf elements: start, final, value and color. Start is a character to
check the state is the start state ofthe finite automaton or not. If the state is the
start state, then we assign 'Y' to the element state, else we assign 'N' to the
element state. Final is a character to check the state is the final state of the
finite automaton or not. If the state ~s the final state, then we assign 'Y' to the
element final, else we assign 'N' to the element final. Value is a character to
give the input data of states. In the program, let us assume that the symbol '$'
is not in the alphabet of extended regular expressions, then we initialize values
of all states in the finite automaton to be '$'. The above three elements are
used in the constructions of FAs. The fourth element 01 the state structure is
,f'
color, which is it character to assign the color to every state in the finite
automaton. It could be 'w', 'g', or 'h': 'w' means white; 'g' means grey; 'h' means
black. The element color is used in the depth_first search algorithm to solve
the reachability problem.
Every element in the array is a structure of state. Size is an integer denoting
the number of states in a finite automaton.
There"is one class named FA: it holds a variable and twentyone functions.
1.ı The variable called m_fa which is a pointer pointing to a structure of fa.
2.ı Twentyone functions are described as follows:
•ı FA( )  it creates the constructor function
•ı ~FA( )  it creates the destructor function
•ı GetUserInput (char *, char *)  it gets the alphabet and extended
regular expressions from the input
•ı Get_Array(int *, char *)  it decides the highest priority ofoperators
in the extended regular expression
•ı Get_Max(int *, char *)  it gets the maximum in an. integer array
•ı Get_Left(char *, char *, int}  it gets the left hand side of the k
elements in a given sentence of size 3
•ı Get_Right(char *, char *, int)  it gets the right hand side of the k
elements in a given sentence ofsize 3
•ı Get_Left...L{)p(char *, int *, char *, int)  it gets the left hand side ofthe
k elements in a given sentence
•ı Get_Right_Op(char *, int *, char *, int)  it gets the right hand side of
the k elements in a given sentence
•ı Is_In_Sigma (char *, char)  it checks a character is in the alphabet or
not
•ı Transfer_Basic (fa *, char *, char)  it transfers the tlrree basic
languages into a structure of fa
•ı Get_C_Value (int n, int m)  it calculates the choice number which is
the number of ways of picking m unordered outcomes from n
possibilities
•ı Get C Sum M (int n, int m_b, int m_e)  it calculates the sum of
some choice numbers, n is unchanged
choice numbers, m is unchanged
•ı Get_Index (int *, int, int)  it calculates the index III a DFA
corresponding to an NFA
•ı Sort_Array (int *, int)  it sorts one array in increasing order
•ı NFA_To_DFA (fa *, fa *, char *)  it transfers an NFA to a DFA by
using the subset construction
•ı Union (fa *, fa *, fa *)  it does the union operation on two structures
of fa
.
•ı Concatenatj.on (fa *, fa *, fa *)  it does the concatenation operation on
two structures of fa
•ı Star (fa *, fa *)  it does the Kleene star operation on one structure of
fa
•ı Complementation (fa *, fa *)  it does the complementation operation
on one structure of fa
•ı Intersection (fa *, fa *, fa *)  it does the intersection operation on two
structures offa
•ı Func1 (char *, char *, int)  it does the operation on the highest
priority operator from an input string
•ı Func2 (fa *, char, fa *, fa *, char *)  it does five operators for the
given extended regular expressions
•ı FuncO (fa *, char *, char *, int)  it transfers the extended regular
expressions to a "tree" structure and calls the corresponding
subroutines to do five operations on structures of fa
•ı DFS_Visit (fa *,. int)  it changes the color of vertices on the path
starting from the start state of the finite automaton, it is used for
function DFS
• DFS (fa *)  it uses depthfirst search to solve the reachbility problem
There IS one main function: it inputs the alphabet and extended regular
expressions and output the given extended regular expression is empty or not
The program requires exponential time and space.
This program is using C/C++ to solve the emptiness problem for extended regular
expressions. The operating system is Windows XP/2000/98/95. It runs in Microsoft
Visual C++ (Version 6.0).
Users may use the tools of the menu bar in Microsoft Visual C++ (Version 6.0) to
compile and execute the program. After clicking the "!" button in the tool bar, the
program executes, and one window to get the input and show the output comes out on the
screen.
In the program, we use the following denotation when defining the five operations
on extended regular expressions:
•ı 'u' denotes union
•ı '1' denotes intersection
•ı 'c' denotes concatenation
•ı '!' denotes implement
• ,*, denotes star
The input of the program consists of two parts: the alphabet and extended regular
.f'
expressions. Such as the follows:
1.ı fu the program, the alphabet could be any character except'$' and five
operators that are 'U', '1', 'C', or', and '*'. For example: "abcdefg",
"1234567", etc.
•ı One message "please enter the alphabet" to ask users to input the alphabet.
•ı Users may enter (1) "abc", then press the "return" key.
!~.'\
or (2) «a" then press the «return" key.
2.ı In the program, we suppose that every extended regular expression is fully
parenthesized. For example: «aUb)I(aUc)), (a*), (b!), etc.
•ı One message "please enter the extended regular expression" to ask users
to input the extended regular expression.
•ı Users may enter (1) "«aUb)I(aUc)" (or «a!") then press the "return" key.
or (2) "a!" then press the "return" key..
The output of the program is one message to show that the extended regular
expression is empty or not. Such as the output ofthe above input will be:
•ı (1) The extended regular expression is not empty.
•ı (2) The extended regular expression is empty.
.f·
Ilpgrn1.hı
Ilpgm1.h is the head file of the program.ı
IIIt defines all classes, structures, and functions.ı
11m this program, there is one class: FA,ı
I/two struotures: state and fa,ı
Iione main function, and twentyone called functions.ı
IIı
11==
I!Define a structure named state which holds four elements:ı
liThe first three elements are: start, final, and valueı
Iistart is a character to check the state is the startı
Iistate of the finite automaton or not:ı
llifthe state is the start state, then we assign 'Y' toı
lithe element state, else we assign 'N' to the element state.ı
lifinal is a character to check the state is the finalı
Iistate of the finite automaton or not:ı
Ilif the state is the final state, then we assign 'Y' toı
lithe element final, else we assign 'N' to the element final.ı
Ilvalue is a character to give the input value of states:ı
IILet us assume that the symbol '$' is not in the alphabetı
Ilof extended regular expressions. Then we initialize the valueı
IlofaU states in the finite automaton '$'.ı
liThe above three elements are used in the constructions ofFAs.ı
liThe last element ofthe state structure is color,ı
Ilwhich is a character to assign the color to every state inı
lithe finite automaton. It could be 'w', 'g', or 'b'.ı
II'w' means white; 'g' means grey; 'h' means black.ı
Ilcolor is used in the depth_first search algorithm toı
Iisolve the reachability problemı
11==========
typedef stroct state{
char start; .f'
char final;
char value;
char color;
}state;
11================
l!Define a structure named fa which holds two elements:ı
I/node and size. Node is a twodimension array. Every elementı
Ilin the array is a structure of state.ı
IISize is an integer denoting the number of states in a faı
II
typedefstruct fa{
state node[SO][80J;
int size;
}fa;
//==ı
I!Define a class named FA.ı
IIThis class has a public variable m faı
Iland twentyone functions ~
1/=========================ı
class FA {
public:
fa* ill_fa;
public:
1/
I/constructor and destructor
/1'=========================
FAO; .
~FAO;
/1======================
lifive operator functions: Union, Concatenation,
118tar, Complementation, and Intersection
11'===================
void Union (fa* pM3, fa* pMl, fa* pM2);ı
void Concatenation(fa* pM3, fa* pM!, fa* pM2);ı
void Star(fa* pM3, fa;'pMl);ı
void Complementation(fa* pM2, fa* pM!);ı
void Intersection(fa* pM3, fa* pM!, fa* pM2);ı
II
lifunction NFA To DFA is to convert an NFA to a DFA
11=================== =====
void NFA_To_DFA (fa* pMd, fa* pMn, char* Sigma);
II ===:::=: =====
lifunctions FuncO, Func! and Func2 are used to
/Itransfer the extended regular expressions
//'=================
typedef struct fa{
state node[80] [80];
int size;
}fa;ı
//'==============='ı
/fDefine a class named FA.ı
//This class has a public variable rn_faı
//and twentyone functions "ı
//'=========================ı
class FA {
public:
fa* rn fa",
public:
//'========================
//constructor and destructor
//'========================
FAO; .ı
FAO;ı
/1'==============,===========
//five operator functions: Union, Concatenation,
//Star, Complementation, and Intersection
//'===================
void Union (fa* pM3, fa* pMI, fa* pM2);ı
void Concatenation(fa* pM3, fa* pMI, fa* pM2);ı
void Star(fa* pM3, fa"pMI);ı
void Complementation(fa* pM2, fa* pMl);ı
void Intersection(fa* pM3, fa* pMl, fa* pM2);ı
//'========================
//function NFA_To_DFA is to convert an NFA to a DFA
//'=========================
void NFA_To_DFA (fa* pMd, fa* pMn, char* Sigma);
//'========================
//functions FuncO, Funcl and Func2 are used to
//transfer the extended regular expressions
llsubroutines to do the operation on five operators
1/
void FuncO (fa* resultO, char* SigrnaO, char* sentenceO, int lengthO);
struct fa* Func1 (char* Sigmal, char* sentence, int length) ;
void Func2(fa* pRes,char op, fa* pL, fa* pR, char* Sigma);
//'====:=======================
//function GetUserInput is to get the alphabet
//and extended regular expressions
/1'=========================
void GetUserInput(char* sigma, char* input);
//'=========================
IIfunction Get_Array, Get_Max, Get_Left, Get_right,
I/Get_Left_Op, and Get_Right_Op are to get the left_hand
/Iside and the right_hand side ofthe operator in
//a given sentence
1/==================:========
void Get_Array(int* arrayl, char* sentencel);ı
int Get_Max(int*· array2, char* sentence2);ı
void Get_Left(char* LHS, char* sentence2, int k);ı
void Get_Right(char* RHS, char* sentence3, int k);ı
void Get_Left_Op(char* LHS, int* array, char* sentence2, int k);ı
void Get_Right_Op(char* RHS, int* array, char* sentence2, int k);ı
/f================:======
f/ftmctions Get_C_Value, Get_C_SuID_M, Get:C_Sum_N, and Get_Index
flare to get the index in the DFA corresponding to the NFA
//'============================
int Get_C_Value(int l}.. int m);ı
int Get:.C_SuID_M(int n, int ill_b, int ill_e);ı
int Get_C_Suffi_N(int n_b, int b_e, int m);ı
int Get_Index(int *p, int size, int n);ı
11'=========================
flftmction Transfer_Basic is to transfer the three basic
Ilianguage into fa structures
1/'=========================
void Transfer_Basic(fa* pM, char *Sigma, char c);
!ı
11=========
lifunction Is_In_Sigma is to check a character is
Ilin the alphabet or not
11============
bool Is_In_Sigma(char *Sigma, char c);
11'============================
lifunction Sort_Array is to sort an array in
/Iincreasing order
11=================,======,===
void Sort_Array(int*p, int size);
11==========================
lifunctions DFS_Visit and DFS use depthfirst search
lito solve the reachability problem
11=========================
void DFS_Visit(fa* pMM, int u);ı
void DFS(fa* pMM);ı
};
·ft
/Imain.cpp
/IThis program is implemented in Microsoft Visual C++ 6.0
1/After compiling and executing the program, one
I/message "please enter the alphabet" appears
lion the screen. The user enters the alphabet
/Isuch as "abcdefg" and then presses the enter key.
IIAnother message "please enter the extended regular
/Iexpressions" comes out. The user can enter the
/Iextended regular expressions and presses the enter
l/key. Finally, after the user enter
lithe alphabet and extended regular expressions,
lithe program will output whether the languages denoted by the
/Iextended regular expressions
/lis empty or not.
11'======
#include <iostream.h>
#inc1ude <string.h>
#include <iomanip.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "pgm1.h"
1/==============
II Functions declaration
11======:=====
void Union (fa* pM3, fa* pMl, fa* pM2);ı
void Concatenation(fa* pM3, fa* pMl, fa* pM2);ı
void Star(fa* pM3, fa* pMl);ı
void Complementation(fa* pM2, fa* pMl);ı
void lntersection(fa* pM3, fa* pM!, fa* pM2);ı
.f"
void NF'A_To_DFA (fa* pMd, fa* pMn, char* Sigma);
void FuncO (fa* resultO, char* SigmaO, char* sentenceO, iut lengthO);
struct fa* Funcl (char* Sigma!, char* sentence, int length) ;
void Func2(fa* pRes,char op, fa* pL, fa* pR, char* Sigma);
void GetUserInput(char* sigma, char* input);ı
void Get_Array(int* arrayl, char* sentencel);ı
int Get_Max(int* array2, 'char* sentence2);ı
void Get_Left(char* LHS, char* sentence2, int k);ı
void Get_Right(char* RHS, char* sentence3, int k);ı
~~~..'.
   
void Get_Left_Op(char* LHS, int* array, char* sentence2, int k);
void GetYight_Op(char* RRS, int* array, char* sentence2, int k);
int Get C Value(int n, int m);
int GetCSum M(int n, int rn_b, int m_e);
int GetCSumN(int n b, int b_e, int m);
int Get_Index(int *p, int size, int n);
void Transfer Basic(fa* pM, char *Sigma, char c);
bool Is In Sigma(char *Sigma, char c);
void Sort_Array(int*p, int size);
void DFS_Visit(fa* pMM, int u);
void DFS(fa* pMM);
//========================
//constructor function
//==================
FA::FAO {
}
1/===============ı
Iidestructor functionı
1/'==============================ı
FA::~FAO {}
1/=========:==
/Imain function: its input is sigma and extended regular expressions,
I/its output is the given extended regular expression is empty or not
11===================
·r'
int main(void) {
char sigma[80} = {O};
char input[80] = {O};
GetUserInput(sigma, input);
fa* rfa = new fa;ı
FA* pFA = new FAO;ı
if(strlen(input) = 1) Ilfor the case ofinput which is only one character
{
/f0 means empty set
cout « "the extended regular expression is not empty'" « '\0';
else
cout « "the extended regular expression is empty" «\n';
}
else
{
for(mt i=O; i<80; i++)
for(mt j=O; j<80; j++)
{
rfa>node{ilf.j].final = 'Nf~
rfa>node[i][j].start = 'N';
rfa>node[i][j].value = '$';
1/ $ denote initial value and it is not in the sigma
}
FuncO{rfa, sigma, input, strlen(sigma»;ı
DFS(tfa);ı
intflag=O;ı
fortin! IT = 0; IT <: rfa>size; rr++)ı
{ı
if«rfa>node[rr][rr].color != 'w') && (rfa>node[rr][rr].final = 'Y'»
flag = 1;
}
if (flag = 1)
cout« "the extended regular expression is not empty"« '\n';
else
cout « "the extended regular expression is empty" « '\n';
}
pFA>~FAO;
delete pFA;.
pFA=O;
return 0;
}
//'==========================
/lFllnction GetUserInput is to get the alphabet and extended regular
//expressions
//'====================='
I/O means empty set
cout « "the extended regular expression is not empty" « '\n';
else
cout « "the extended regular expression is empty" « '\n';
}
else
{
for(int i=O; i<80; i++)
for(int j=O; j<80; j++)
{
rfa>node[i][j].final = 'N';
rfa>node[i)[j].start = 'N';
rfa>node[i][j].value = '$';
1/ $ denote initial value and it is not in the sigma
}
FuncO(rfa, sigma, input, strlen(sigma»;
DFS(rfa);
int flag = 0;
for(int IT = 0; rr < rfa>size; rr++)
{
if((rfa>node[rr][rr].color != 'w') && (rfa>node[rr][rr].final = 'Y'»
flag = 1;
}
if (flag = 1)
cout« "the extended regular expression is not empty" « '\il";
else
cout « "the extended regular expression is empty" « '\n';
}
pFA>FAO;
delete pFA;
pFA=O;
return 0;
}
11'=========================
I/Function GetUserInput is to get the alphabet and extended regular
Ilexpressions
1/'========================
void GetUserInput(char* sigma, char* input)
{
char buffI80] = {O};
cout « "please enter an alphabet" « end!;ı
cin.getline(buff,80,'\n');ı
strcpy(sigma, buff);ı
cout « "please enter an extended regular expression" « endl;ı
cin.getline(buff, 80, '\n');ı
strcpy(input, buff);ı
}
/1'=========================
II Function Get_Array is to decide the highest priority of operators
II in the extended regular expression
1/'====:======================,
void Get_ArraY(int* array1, char* sentencel)
{
int len = strlen(sentencel);
for (int index =0; index < len; index++)
{
if (sentence I [index] = '(')
arrayI [index] = arrayI [indexI] + 1;
else if (sentenceI [index] = '}')
arrayl[index] = arrayI [indexI]  1;ı
elseı
arrayI [index] = array1[indexI];ı
}ı
}
//==========================
II Function Get_Max is to gettthe maximum in an integer array
II . .
int Get_Max(int* array2, char* sentence2)
{
int len = strlen(sentence2);ı
int max_value = 1;ı
for(int i=O; i<len; i++)ı
{ı
if(array2[i] > max_value)ı
max_value = array2[i];ı
}ı
return max_value;
}
//==========================
// Function Get_Left is to get the left hand side ofthe k elements
II in a given sentence ofsize 3
11====:=====================
void Get_Left(char* LHS. char* sentence2. int k)
{
char Ieft[80] = {O};
for (int m=O; m<=k2; m++)
left[m] = sentence2[m+l];
for (int rom = kl; mm<80; mm++)
left[mm] = '\0';
strcpy(LHS, left);
}
11=========================
II Function Get_Right is to get the right hand side ofthe k elements
II in a given sentence ofsize 3
11====:=====================
void Get_Right(char* RHS, char* sentence3, int k)
{
int len = (int)strlen(sentence3);
char right[80] = {O};
for(int n=k+1; n<lenl; n++)
right[nkl] = sentence3[n];
for(int nn = lenk2; nn <80; nn++)
rigbt[nn] = '\0';
strcpy(RHS, right); .f'
}
11==========:===============
II Function Get_Left_Op is to get the left hand side of the k elements
II in a given sentence
11'=======================
void Get_Left_Op(char* LHS, int* array, char* sentence2, int k)
{
char left[80] = {O};
int i=k;
int index = 0;
if(array[i] >= array[kJ)
i;
else
exit(O);
}
index = i;
for(int j= index+1; j<k; j++)
left[jindexI] =sentence2[H;
for(intjj =kindexl;jj<80;jj++)
left[jj] = '\0';
strcpy(LHS, left);
}
11'=====:==:::==================
II Function Get_Right_Op is to get the right hand side oillie k elements
II in a given sentence
/1===========================
void Get_Right_Op(char* RHS, int* array, char* sentence2, int k)
{
charright[80]= {OJ;
int i=k;
int index = 0;
while{i< (int)strlen{sentence2»
{
if(array[i] >= array[k])
i++;
else
break;
}
index = i; ,r'
for(int j= k+1; j<:index; j++)
rightfjkl] = sentence2fj];
forCint jj = index k; jj<80; jj++)
right[jj] = '\0';
strcpy(RHS, right);
}
11'========================
/I Function ls_In_Sigma is to check a character is in the alphabet or not
1/===============
for (int i=O; i<len; i++)
if(Sigma[i] = c )\1 (c = 'e'»
return true;
return false;
}
//'==================:=======
// Function Transfer_Basic is to transfer the three basic languages
/1 into an fa structure
//
void Transfer~asic(fa*pM, char *Sigma, char c)
{
for (int i=O; i<2; i++)
for (intj=O;j<2; j++)
{
pM>node[i][j].start = 'N';
pM>node[i][j].value = '$';
pM>node[i)[j].final = 'N';
}
if(c = 'e') {//empty string which is epsilon
pM>node[O][O].start = 'Y';
pM>node[O] [l].value = 'e';
pM>node[l][l].final = 'Y';
}
else if(c = '0') /1"0" denotes empty set
pM>node[O][O].start = 'Y';
else if(Is_In_Sigma(Sigma, c» {
pM>node[O][O].start = 'Y';
pM>node[O][l).value = c;
pM>node[l][J..,].final = 'Y';
}
pM>size = 2;
}
//=========================
// Function Union is to do the union operator on two fa structures
//==========================
void Union (fa* pM3, fa* pMl, fa* pM2)
{
int rr=O, cc=O;
for(rr= 1; rr<rl+l; rr++)ı
for (ee = 1; ec < rl +1; cc++)ı
pM3>node[rr],[ec].value = pMl>node[rrI] [ecI].value;
pM3>node[O][I].value = 'e';
pM3>node[O][r1+I].value = 'e';
for(rr = rl+ 1; rr < rl +r2+1; rr++)
for(ec =r1+1; ce <r1+r2+1; cc++)
pM3>node[rr], [ee].value = pM2>node[rrr11] [ecrll ].value;
pM3>node[rl][rl +r2+1].value = 'e';ı
pM3>node[r1+r2][r1+r2+1].value = 'e';ı
pM3>node[O][O].start = 'V';ı
pM3>node[rl +r2+1][r1+r2+l].final = 'Y';ı
pM3>node[1][1].start = 'Nt;ı
pM3>node[rl+ I ][rl+1].start = 'N';ı
pM3>node[r1],[rl].final = 'N';ı
pM3>node[r1+r2Url+r2].final = 'N';ı
pM3>size = pM1>size + pM2>size + 2;ı
}
11===========================
II Function Concatenation is to do the concatenation operator on
II two fa structures
If
void Concatenation(fa* pM3, fa* pMl, fa* pM2)
{
int rr, cc;
int r 1 = pMl>size;
int r2 = pM2>size;
for (rr = I; IT < rl +1; H++)ı
for(cc= 1; cc <r1+1; cc++)ı
pM3>node[IT], [cc],.value = pMI>node[rrI] [ccl ].value;
pM3>node[rl][rl +l).value = 'e';
for(rr = rl +1; IT < rl+r2+ 1; rr++)
forecc = Jj1+1; cc < rI +r2+1; cc++)
pM3>node[rr],[cc].value = pM2>node[rrrl1 ],[ccrlI].value;
pM3>node[rl+r2][rl+r2+1],.value = te';ı
pM3>node[O)[I].va1ue = 'e';ı
pM3>node[O][O).start = 'Y\
pM3>node[rl +r2+1][rl +r2+1].final = 'Y';
pM3>node[1][1].start = 'N';ı
pM3>node[rl +1][rl +1].start = 'N';ı
pM3>node[rl][rl].final = 'N';ı
pM3>node[rl +r2][rl +r2].final = 'N';ı
pM3>size = pM l>size + pM2>size + 2;
}
11'==
II Function Star is to do the Kleene star operator on one fa structure
11'====
void Star(fa* pM3, fa* pMl)
{
int IT, cc;ı
int rl = pMl>size;ı
pM3>node[O][1].value = 'e';
pM3>node[rl][l].value = 'e';ı
for(IT = 1; IT < rl +1; IT++)ı
forecc = 1; cc < r1+1; cc++)ı
pM3>node[IT] [CC].value = pM l>node[IT1] [cc1].value;
pM3>node[r1][rl +1].value = 'e';
pM3>node[O][rl+1].value = 'e';
pM3>node[O][O].start = 'Y';
pM3>node[1][1].start = 'N';
pM3>node[rl+l][rl+1].final = 'Y';ı
pM3>node[rl][rl].final = 'N';ı
pM3>size = pMl>size + 2;ı
.,.
}
11'====================
II Function Complementation is to do the complementation opertor on
Ilone fa structure
11'===
void Complementation(fa* pM3, fa* pMl)
{
struct fa* pM2 = new fa;
int rl = pMl>size;
for(int rr = 0; rr < r1; rr++)
for(int cc = 0; cc < r1; cc++) {
if(pM 1>node[rr] [cc]. final = 'Y')
pM2>node[rr][cc].final ='N';
else if(pM l>node[rr] [cc] .final = 'N')
pM2>node[rr] [cc].final = 'Y';
pM2>node[rr][cc].start = pMl>node[rr][cc].start;
pM2>node[rr] [cc].value = pMl>node[rr] [cc].value;
}
for(int r = 0; r < rl; r++)
for(int cc' = 0; cc < r1 ; cc++)
{
pM3>node[rr+1][cc+l].value =pM2>node[rr][cc].value;
pM3>node[rr+1][cc+1].final = pM2>node[rr][cc].final;
pM3>node[rr+1][cc+l].start = 'N';
if(pM2>node[rr][cc].start = 'Y')
pM3>node[0][rr+l].value = 'e';
if(pM2>node[rr][cc].final = 'Y')
pM3>node[rr+l][pMl>size+l].value = 'e';
}
pM3>node[O][0].start = 'Y';
pM3>node[pMl>size+ 1][pM1>size+1] .final ='Y';
pM3>size = pMl>size + 2;
delete pM2;
}
11==============
II Function Intersection is to do the intersection operation on two fa
II structures by applying De Morgan rule .
II ,~.r=='===:=== ====void
Intersection(fa* pM3, fa* pMl, fa* pM2)
{
struct fa *M3 = new fa;
struct fa *CM1 = new fa;
struct fa *CM2 = new fa;
Complementation(CMl,pMl);
Complementation(CM2,pM2);
Union(M3, CMl, CM2);
Complementation(pM3, M3 );
}
II
II Function Sort_Array is to sort one array in increasing order
II
void Sort_Array(int* p, int size)
{
int t=O;
for(int i=O; i<size  1; i++)
{
if(p[i] > p[i+1J)
{ı
t = p[i];ı
p[iJ = p[i+ IJ;ı
p[i+l] = t;ı
}
}
}
11===:==
// Function Get_C_Value is to get choice number which is the number of
1/ ways of picking m unordered outcomes from n possibilities.
II
int Get_C_Value(int n, int m)
{
intNum=l;ı
int Den=l;ı
int Value = 1;ı
if((n=O) II (m=O))
return Value;ı
else{ ,r'ı
for(int ii=n; ii > nm;'ii)ı
Num = Num*ii;ı
for(int j=nl; j>O; j)ı
Den = Den*j;ı
Value = (int)(NumlDen);ı
return Value;ı
}
}
/1===========================
// Function Get_C_Sum_N is to get the sum of choice numbers
II from lin_begin choose mil to lin_end choose mil
II
int Get_C_Sum_N(int n_begin, int n_end, int m)
{
int sum=O;
for(int i=n_begin; i> n_endI; i)
sum = sum + Get_C_Value(i, m);
return sum;
}
II
II Function Get_C_Sum_Mis to get the sum of choice numbers
II from "n choose m_begin" to "n choose m_end"
II
int Get_C_Sum_M(int n, int m_begin, int m_end)
{
int sum=O;
for(int i=m_begin; i>m_endl; i)
sum = sum + Get C Value(n, i);
return sum;
}
11·====
II Function Get_Index is to get the index in dfa corresponding to
II the nfa
11====
int Get_Index(int* p, int size, int n)
{
int index=O;
int sl=O;
int s2=0;
ints3=0;"
if(size = 0)
index = 0;
else if(size = 1)
index = p[O]+1;
else{
sl = Get_C_Sum_M(n, 1, sizeI);
s3 = p[sizel]  p[size2]; lito get s2
for(int k=O; k<= size2; k++)
s2 = s2 + Get_C_Sum_N(nkl, np[k]+l, size2k);
index = sl+s2+s3;
}
return index;
}
II
II Funciton NFA_To_DFA is to transfer an NFA to a DFA by using the
II subset construction. It is used in doing the complementation operation
II ==
void NFA_To_DFA (fa* pMd, fa* pMn, char *Sigma) {
int Mn_num = pMn>size;
int Md_num = (int)pow(2,Mn_num);
int rr, cc, jj;
int k=O;
if(Md_num >= 65536)
{
cout« "out of space! " «'m';
exit(O);
}
for(rr = 0; rr < Md_num; rr++)
for(cc = 0; cc < Md_num; cc++)
{
pMd>node[rr][cc].start = 'N';
pMd>node[rr][cc].final = 'N';
pMd>node[rr] [cc].value = '$';
}
int count = 0;
int Ccount = 0;
int v_count = 0;
int u_count =0;
int p[20] = {O};
int fl20] = {O};//final
int v[lOO] = {O}; Ilvalne
int u[20] = {O};
int x[lOO] = {O};
int w[20][20] = {O};
int w_size[20] = {O};
int r_count = 0;
int c_count = 0;
int x_count = 0;
pMd>size = Md_num;
for(rr = 0; rr < Mn_num; rr++)
if(pMn>node[rr] [rr].start = 'V')
{
p[count++] = IT;
for(jj = 0; jj < Mn_num; jj++)
{
if(pMn>node[rr][jj].value = 'e')
{
if(jj != p[countI])
{
p[count] = jj;
count++;
}
IT = JJ;
jj=O;
} l/if
} Ilfor jj
Sort_Array(p, count);
int index_dfa= Get_Index(p, count, Mn_num);
pMd>node[index_dfa][index_dfa].start = 'Y';
} l/if
} Ilfor IT
for(rr = 0; rr < Mn_num; rr++)
{
if(pMn>node[rr][IT].final = 'Y')
{
f[ C count++] = IT;
int index_Cdfa = Get_Index(f, Ccount, Mn_num);
pMd>node[index_Cdfa][index_Cdfa].final = 'Y';
for(k=O; k< Mn_numI; k++)
{
if(k !~)
, {
f[Ccount] = k;
Ccount++;
Sort_Array(f, Ccount);
int index_Cdfa = Get_Index(f, Ccount, Mn_num);
pMd>node[index_Cdfa][index_Cdfa].final = 'Y';
}
}
} II if
} II for rr
int S[20] = {O};
int SUb_set[20] = {O};
int index = 0;
intj =0;
int len = strlen(Sigma);
for(k = 0; k < len; k++)
{
iut flag = 0;
{
for(rr = 0; IT < Mn_num; rr++)
{
for(cc = 0; cc < MU_l1um; cc++)
{
if(pMn>node[rr][cc].value = Sigma(k])
{
flag = 1;
v[v_count++] = cc;
u[u_count++] = IT;
for(jj = 0; jj < Mn_num; jj++)
{
if(pMn>node[cc] [jj].value = 'e')
{
if(jj != v[v_countI])
{
v[v_count] = jj;
v_count++;
}
cc = JJ;
jj = 0;
or} Ilif
, }//for jj
Sort..:.Array(v, v_count);
Sort_Array(u, u_count);
int index_dfa = Get_Index(v, v_count, Mn_num);
int index_dfa_u = Get_Index(u, u_count, Mn.num);
pMd>node[index_dfa_u] [index_dfa]. value = Sigma[k];
while« v_count >= 2) && (v_count <= Mn_num»
{
for(int i=O; i<v_count; i++)
{for(int j=O; j<Mn_num; j++)
{
if(pMn>node[v[i]][j].value = Sigma[kD
{
x[x_count++] = j;
} llif
for(int m=O; m < Mn_num; m++)
{
if(pMn>node[j] [m].value = 'e')
{
if(m != x[x_countID
{
x[x_count] = m;
x_count++;
} llif
J =m;
m=O;
}/lif
}llfor m
} Ilfor j
}llfor i
Sort_Array(x, x_count);
int index_dfa_x = Get_Index(x, x_count, Mn_num);
pMd>node[index_dfa] [index_dfa_x].value = Sigma[k];
if((v == x) II (x_count = Mn_num))
return;
for(int n=O; n<x_count; n++)
v[n] = x[n];
v_count = x_count;
} Ilwhile
} llif
} Ilfor cc
if(flag == 0)
pMd>node[IT+l][O].value = Sigma[k];
} Ilfor IT .r'
} Ilfor set_size
} Ilfor k
for(int mm=O; mm < (int)strlen(Sigma); mm++)
pMd>node[O] [O].value= Sigma[mm];
for(int i= 0; i < Md_num; i++)
{
intj=O;
int flag 1 = 0;
for(j=O; j< Md_num; j++)
{
if(pMd>node[i][j].value != '$')
flagl = I;
}
if(flagl == 0)
{
for(int mm=O; mm < (int)strlen(Sigma); mm++)
pMd>node[i][O].value= Sigma[mm];
}
}
}
//============ === ===c===
// Function Func2 is to do five operators for the given extended
// regular expressions
//=======:====
void Func2(fa* pRes,char op, fa* pL, fa* pR, char *Sigma)
{
struct fa* pDL = new fa;
struct fa* pDR = new fa;
struct fa* pResl = new fa;
switch (op) {
case 'U':
Union(pRes, pL, pR);
break;
case 'C':
Concatenation(pRes, pL, pR);
break;
case '*':
Star(pRes, plJ);
break; ,
case '!':
NFA_To_DFA(pResl, pL, Sigma);
Complementation(pRes, pRes l);
break;
case'!':
NFA_To_DFA(pDL, pL, Sigma);
NFA_To_DFA(pDR, pR, Sigma);
Intersection(pRes, pDL, pDR);
break;
default:
cout« "An invalid operation!" «endl;
II
}
}
II Function Func1 is to transfer a given string to a "tree" structure,
II then it do the operation on the highest priority operator
11===='
struct fa* Func1 (char* Sigmal, char* sentence, int length)
{
int count = 0;
int array[80] = {O};
char left[80] = {O};
char right[80] = {O};
int max_op=O;
struct fa* result = new fa;
FA* left_result = new FAO;
FA* rightJesult = new FAO;
struct fa* Middle = new fa;
struct fa* F_left = new fa;
struct fa* F_right = new fa;
Get_Array(array, sentence);
int k=O;
while(k< (int)strlen(sentence))
{
booI bSymbol = false;
bSymbol = sentence[k] = 'U' II sentence[k] = '*'
II sentence[k] = 'I'
" sentence[k] ~ 'C' II sentence[k] = 'l';
if (array[k] / 1 && bSymbol)
{
Get_Left(left, sentence, k);
Get_Right(right, sentence, k);
Transfer_Basic(F_left, Sigmal, left[O]);
Transfer_Basic(F_right, Sigmal, right[O]);
Func2(result, sentence[k], F_left, F_right, Sigmal);
return result;
}
else {
k++;
}
}I/while
delete F_left;
delete F_right;
return result;
}
1/======
II Function FuncO is to transfer the extended regular expressions
II to a "tree" structure and call the corresponding subroutines
II to do the operation on five operators
1/=========
void FuncO(fa* resultO, char* SigmaO, char* sentenceO, int lengthO)
{
int count = 0;
int array[80] = {O};
char left[80] = {O};
charright[80] = {O};
int max_op=O;
FA* leftJesult = new FAO;
FA* right_result = new FAO;
struct fa* Middle = new fa;
struct fa* F_left = new fa;
struct fa* F_right = new fa;
Get_Array(array, sentenceO);
max_op = Get_Max(array, sentenceO);
int k=O;
while(k< (int)strlen(s(ffitenceO))
{ ,
bool bSymbol = false;
bSymbol = sentenceO[k] = 'U' " sentenceO[k] = '*'
" sentenceO[k] = 'I'
II sentenceO[k] = 'C' " sentenceO[k] = 'l';
if((array[k] = 1) && bSymbol)
{
Get_Left_Op(left, array, sentenceO, k);
if(strlen(left) = 1)
Transfer_Basic(F_left, SigmaO, left[O]);
else if(strlen(left) <= 5)
F_left = Func 1(SigmaO, left, lengthO);
else
FuncO(F_left, SigmaO, left, lengthO);
Get_Right_Op(right, array, sentenceO, k);
if(strlen(right) = 1)
Transfer_Basic(F_right, SigmaO, right[O]);
else if(strlen(right) <= 5)
F_right = Func1(SigmaO, right, lengthO);
else
FuncO(F_right, SigmaO, right, lengthO);
Func2(resultO, sentenceO[k], F_left, F_right, SigmaO);
return;
}
else
k++;
}llwhile
}
11====
II Function DFS_Visit is to change the color of vertices on the
II path starting from vertex u. It is used for function DFS
11===
void DFS_Visit(fa* pMM, int u)
{
int 1= pMM>size;
pMM>node[u][u].co}or = 'g';
for(int v = 0; v < 1; v++)
{
if(pMM>node[u][v].value != '$')
{
if(pMM>node[v] [v].color = 'w')
{
DFS_Visit(pMM, v);
}
}
II
}
pMM>node[u][u].color = 'b';
}
II Function DFS is to use depthfirst search to solve the
II reachbility problem
11=====:===
void DFS(fa* pMM)
{
int 1= 1;
int S = 0;
int F = 0;
int len = pMM>size;
int IT = 0;
for(IT = 0; IT < len; IT++)
{
pMM>node[IT][rr].color = 'w';
}
for(rr = 0; rr < len; IT++)
{
if((pMM>node[IT] [IT]. start = 'Y') && (pMM>node[rr][IT].color = 'w'))
DFS_Visit(pMM, IT);
}
}
.ft
}
pMM>node[u)[u).color = 'b';
}
11==========
II Function DFS is to use depthfirst search to solve the
II reachbility problem
II
void DFS(fa* pMM)
{
int 1= 1;
int S = 0;
int F = 0;
int len = pMM>size;
int rr = 0;
for(rr = 0; rr < len; rr++)
{
pMM>node[rr][rr].color ='w';
}
for(rr = 0; rr < len; rr++)
{
if((pMM>node[rr) [rr].start = 'Y') && (pMM>node[rr) [rr].color = 'w'))
DFS_Visit(pMM, rr);
}
}
VITA
Min Cai
Candidate for the Degree of
Master of Science
Thesis:ı THE EMPTINESS PROBLEM FOR EXTENDED REGULAR
EXPRESSIONS
Major Field:ı Computer Science
Biographical:
Education: Graduated from Jinshan High School, Chanzhou, Guan,gdong, China
in July 1993. Received Bachelor of Science degree in Mathematics from
Shantou University, Shantou, Guangdong, China in June 1997. Completed
the requirements for the Master of Science degree with a major in
Computer Science at Oklahoma State University in December, 2002.
Experience: Employed by PULON Computing Company as a programmer and
system analyst, Shantou, Guangdong, China in 1997. Employed by Nokia
Mobile Phone Company as a parttime programmer, HongKong, March
1999 to May 1999.