ALGORITHMS FOR THE MINIMUM-COST
SET-COVERING PROBLEM
By
CHIN-CHIEH CHIANG
Master of Science
Oklahoma State University
Stillwater, Oklahoma
2002
Submitted to the Faculty of the
Graduate College of the
Oklahoma State University
in partial fulfillment of
the requirements for
the Degree of
MASTER OF SCIENCE
August, 2002
PREFACE
The set-covering problem is an optimization problem that models many
resource-selection problems. For a small-scale set-covering problem, a greedy
solution is a good approximation to the optimal solution. However, as
the input set size gets larger, the ratio bound of the greedy solution may grow
and be bounded by a logarithm function. In this paper, we explore some other
algorithms and generalize the set-covering problem so that each set in the given
cover has an associated cost, which is an arbitrary positive integer instead of one.
Since the set-covering problem is known to be NP-complete, it is almost
impossible to determine the optimal solution for any large-sized problem.
Indeed, solving the larger-scale set-covering problem would imply an enormous
computational effort, in both time and computer memory. In addition, this
paper investigates the performance of other algorithms, namely genetic
algorithms and simulated annealing algorithms, on the minimal set-covering
problem.
ACKNOWLEDGMENTS
I wish to express my sincere appreciation to my major advisor, Dr. John
Chandler, for his intelligent supervision, constructive guidance, inspiration, and
friendship. My sincere appreciation extends to my other committee members Dr.
Hedrick and Dr. Dai, whose guidance, assistance, encouragement, and friendship
are also invaluable.
TABLE OF CONTENTS
Chapter 1. Introduction
Chapter 2. Problem Formulation
Chapter 3. Greedy Algorithm
Chapter 4. Optimal Algorithm
Chapter 5. Genetic Algorithm
  5.1 Introduction
  5.2 Population size and initial population
  5.3 Fitness of individuals
  5.4 Parent selection technique
  5.5 Crossover operator
  5.6 Mutation operator
  5.7 Population replacement
  5.8 Overview
Chapter 6. Simulated Annealing Algorithm
Chapter 7. Computational results
Chapter 8. Conclusions
Appendix
Chapter 1. Introduction
Let X be a set of m elements, and let F be a collection of subsets Sj ⊆ X,
1 ≤ j ≤ n, with X = S1 ∪ S2 ∪ ··· ∪ Sn. Set covering is the problem of finding
the smallest number of subsets in F that cover all elements.
Now, we generalize the set-covering problem so that each set Sj in the family
F has an associated cost, and the cost of a cover C is the sum of the costs
associated with the chosen subsets Sj ∈ C. Minimal-cost set covering is the
problem of finding the cover C with the smallest cost.
Because the problem of selecting a minimum number of subsets that cover all
elements in X is NP-complete, adding the constraint of minimal cost makes the
problem more complicated. In the rest of the paper, we will focus on the general
case and explore four algorithms: a greedy approximation, an optimization
algorithm using the method of stacks and splits, a genetic algorithm, and a
simulated annealing algorithm.
Chapter 2. Problem Formulation

We first introduce some notation. Let X = {1, 2, 3, ..., m} and let
F = {S1, S2, S3, ..., Sn} with X = S1 ∪ S2 ∪ ··· ∪ Sn. For each Sj, there is an
associated cost cj = cost(Sj), j = 1, ..., n. We will assume that for each i ∈ X
there is at least one Sj ∈ F covering i. Let M = [aij] be the m × n matrix whose
j-th column is the characteristic vector of Sj; in other words, aij = 1 if i ∈ Sj,
and aij = 0 otherwise. Furthermore, let the column vector c = (cj)n×1 denote the
cost vector. Finally, we have the following set-covering problem:

Minimize cᵀy
Subject to
My ≥ 1
yj ∈ {0, 1}, j = 1, ..., n

Note that the problem is similar to an integer programming problem. The
column vector y = (yj)n×1 represents an instance in the solution space, where
yj = 1 if Sj is selected, and yj = 0 otherwise.
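The formulation above can be sketched in Python; this is a minimal illustration on a small hypothetical instance (the matrix M, cost vector c, and selection vector y mirror the definitions above, but the particular sets and costs are invented for the example):

```python
# Sketch of the minimum-cost set-covering formulation.
# M is the m x n incidence matrix: M[i][j] = 1 iff element i+1 lies in set S_{j+1}.
# c is the cost vector; y is a 0/1 selection vector over the n subsets.

def is_feasible(M, y):
    """Check My >= 1: every row (element) is covered by some selected column."""
    return all(any(row[j] and y[j] for j in range(len(y))) for row in M)

def total_cost(c, y):
    """Compute c^T y, the total cost of the selected subsets."""
    return sum(cj for cj, yj in zip(c, y) if yj)

# Hypothetical instance: X = {1,2,3,4}, F = {S1={1,2}, S2={2,3}, S3={3,4}, S4={1,4}}
M = [[1, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
c = [2, 3, 2, 3]

y = [1, 0, 1, 0]          # select S1 and S3
print(is_feasible(M, y))  # True: S1 ∪ S3 = {1,2,3,4}
print(total_cost(c, y))   # 4
```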
Chapter 3. Greedy Algorithm

The technique of a greedy algorithm for the set-covering problem is as follows:
iteratively, select the set that covers the most yet uncovered elements, until all
elements are covered. As for the minimal cost set-covering problem, each set has
a cost, and one needs to find a collection of sets of smallest total cost that covers
all elements. We use a greedy algorithm for this problem, but instead of picking a
set that covers the most yet uncovered elements, we pick the set with the
smallest average cost, which is the cost of the set divided by the number of yet
uncovered elements that it contains.
Greedy Algorithm (X, F)
1. U ← X
2. C ← ∅
3. cost ← 0
4. while U ≠ ∅
5.   do select an S ∈ F that minimizes cost(S) / |S ∩ U|
6.   U ← U \ S
7.   C ← C ∪ {S}
8.   cost ← cost + cost(S)   // end while loop
9. return C, cost
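The pseudocode above can be sketched in Python as follows; the instance is a small hypothetical example (the function assumes the sets in F can cover X, as the text does):

```python
# Minimal sketch of the greedy algorithm above: repeatedly pick the set
# minimizing cost(S)/|S ∩ U| over the uncovered elements U.
def greedy_cover(X, F, cost):
    """X: set of elements; F: dict name -> set; cost: dict name -> positive cost."""
    U = set(X)            # uncovered elements
    C, total = [], 0
    while U:
        # the set with the smallest average cost over newly covered elements
        s = min((name for name in F if F[name] & U),
                key=lambda name: cost[name] / len(F[name] & U))
        U -= F[s]
        C.append(s)
        total += cost[s]
    return C, total

# Hypothetical instance
X = {1, 2, 3, 4, 5}
F = {'S1': {1, 2, 3}, 'S2': {3, 4}, 'S3': {4, 5}, 'S4': {1, 5}}
cost = {'S1': 3, 'S2': 4, 'S3': 2, 'S4': 5}
print(greedy_cover(X, F, cost))  # (['S1', 'S3'], 5)
```

Here S1 is chosen first (average cost 3/3 = 1), then S3 (2/2 = 1), giving a cover of cost 5.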
In the rest of this section, we will discuss the ratio bound of the above greedy
algorithm. Let H(d) = Σ_{i=1}^{d} 1/i be the d-th harmonic number. Let C be the
set cover obtained by the above algorithm and let C* be the minimal-cost (optimal)
set cover. Then the ratio bound ρ is cost(C) / cost(C*).
Let S1, S2, S3, ..., Sk be the sequence of sets in F chosen by the greedy
algorithm, such that ∪_{j=1}^{k} Sj = X. Therefore, cost(C) = Σ_{j=1}^{k} cost(Sj).
Using this sequence of sets, a new partition of X can be defined inductively as
follows:

S'j = Sj \ ∪_{l=1}^{j-1} Sl   (1 ≤ j ≤ k)

Let cost(x) = cost(Sj) / |S'j| for x ∈ S'j, 1 ≤ j ≤ k. Then
cost(Sj) = Σ_{x ∈ S'j} cost(x), 1 ≤ j ≤ k. Let x1, x2, x3, ..., xm be a sequence of
all elements of X in ascending cost order. To see the upper bound of cost(xj):
given xj, there is a unique S'b, for some b, with xj ∈ S'b. Observe that

cost(xj) = cost(Sb) / |Sb \ (S1 ∪ ··· ∪ Sb−1)|
         = min{ cost(S) / |S \ (S1 ∪ ··· ∪ Sb−1)| : S ∈ F \ {S1, ..., Sb−1} }
         ≤ min{ cost(S) / |S \ (S1 ∪ ··· ∪ Sb−1)| : S ∈ C* \ {S1, ..., Sb−1} }.

When Sb is chosen, at least m − j + 1 elements (namely xj, ..., xm) are still
uncovered, and the remaining sets of C* cover all of them at total cost at most
cost(C*); hence some S ∈ C* \ {S1, ..., Sb−1} has average cost at most
cost(C*) / (m − j + 1). Therefore,

cost(C) = Σ_{j=1}^{m} cost(xj) ≤ Σ_{j=1}^{m} cost(C*) / (m − j + 1)
        = cost(C*) · H(m) = cost(C*) · H(|X|).

So, the ratio bound satisfies ρ ≤ H(|X|).
Moreover, H(|X|) ≤ 1 + ∫_1^m (1/x) dx = 1 + ln|X|.
The ratio bound given in [6] is H(max{|S| : S ∈ F}), with the assumption that the
cost of each set is one. Thus, the gap between the approximation solution and
the optimal solution may get larger when the constraint of costs is considered.
Chapter 4. Optimal Algorithm

When searching for the optimal solution to a problem that has many feasible
solutions, the search may progress in one of several ways, depending on the
structure of the problem. Common approaches are exhaustive search, dynamic
programming, greedy algorithms, and branch and bound. The algorithm we
present here uses the method of splits and stacks to reach the optimal solution,
the minimal-cost selection of sets that covers the given set.
Recall from Chapter 2 that the set-covering problem can be formulated as covering,
at minimal cost, all the rows of an m × n matrix of ones and zeros by a subset of
columns. That is,

Minimize cᵀy
Subject to
My ≥ 1
yj ∈ {0, 1}, j = 1, ..., n
For convenience in developing the optimal algorithm, we may assume that the
entries in the cost vector c are non-decreasing, with c1 ≤ c2 ≤ ··· ≤ cn. In
practice, for implementing this algorithm, we can find a permutation p on
{1, 2, 3, ..., n} such that c_p(1) ≤ c_p(2) ≤ ··· ≤ c_p(n), taking time O(n ln n)
with any of the sorting methods introduced in [6].

Let Y = {0,1}^n, where n = |F|; then |Y| = 2^n. An instance y = (yj) ∈ Y
represents a selection from {S1, S2, ..., Sn} such that yj = 1 if Sj is selected,
and yj = 0 otherwise.
We say y is a feasible solution if the union of the corresponding subsets is a
cover of X. Also, y has a corresponding cost, namely
cost(y) = Σ_{j=1}^{n} yj · cost(Sj).

The search space is Y, with 2^n elements. In any set-covering problem, the
trivial solution is 1 = (1, 1, ..., 1) and the infeasible solution is
0 = (0, 0, ..., 0). Our search for the optimal solution begins with
1 = (1, 1, ..., 1). For any given parent y = (y1, y2, y3, ..., yn), flip each entry
(bit) of y to generate a child, and continue this process to generate all children
of y until flipping meets the first 0 or reaches the end.
For instance, if the parent is (11011), then the first child is (01011), obtained
by flipping the first entry of the parent, and the second child is (10011),
obtained by flipping the second entry; we then stop, since the third entry of the
parent is 0. If the parent is (11111), then there are five children, obtained by
flipping each entry of the parent. Note that the Hamming distance between the
parent and any of its children is one.
Split Algorithm (parent)
1. i ← 1
2. while (i ≤ n and parent[i] ≠ 0)
3.   do for j ← 1 to n
4.     do child[j] ← parent[j]   // end for loop
5.   child[i] ← 0
6.   i ← i + 1   // end while loop
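The child-generation rule above can be sketched as a small Python function (a minimal illustration, returning the children as tuples rather than emitting them one at a time as the pseudocode does):

```python
# Sketch of the Split Algorithm: flip each leading 1-bit of the parent, left to
# right, until the first 0 is met or the end is reached; one child per flip.
def split(parent):
    children = []
    for i, bit in enumerate(parent):
        if bit == 0:          # stop at the first 0
            break
        child = list(parent)
        child[i] = 0          # flip the i-th leading 1 to 0
        children.append(tuple(child))
    return children

print(split((1, 1, 0, 1, 1)))       # [(0, 1, 0, 1, 1), (1, 0, 0, 1, 1)]
print(len(split((1, 1, 1, 1, 1))))  # 5
```

This reproduces the text's example: parent (11011) yields exactly the children (01011) and (10011).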
In the process of generating children, the set-covering cost corresponding to
each child is non-increasing, and the set-covering cost of each child is less than
that of the parent. Moreover, the union of the subsets corresponding to each child
is contained in that of the parent. The example below shows the operation of the
Split Algorithm:
Example: n = 4 (each child is indented under its parent)

(1111)
  (0111)
  (1011)
    (0011)
  (1101)
    (0101)
    (1001)
      (0001)
  (1110)
    (0110)
    (1010)
      (0010)
    (1100)
      (0100)
      (1000)
        (0000)
From the above example, it is easy to see that the tree generated by the Split
Algorithm has the following properties:
1. The set-covering cost among siblings is non-increasing in the order of
   their ages.
2. If the parent is infeasible, then all its children are infeasible.
Property 2 removes infeasible solutions from the search space and hence
improves the convergence to the optimal solution. By virtue of property 1,
siblings are pushed onto a stack in the order of their ages, so that at run time
the top of the stack is always the cheapest item. Note that all items in the stack
are feasible solutions, and the youngest child (with minimal cost) is always on
top at that moment.
Optimal Algorithm
1. optimal ← 1
2. Push(Stack, 1)
3. while Stack ≠ ∅
4.   do y ← Pop(Stack)
5.   if y is not reducible
6.     then update optimal
7.     else while y is reducible
8.       do Push(Stack, y.child)
9.       // end inner while loop
   // end outer while loop
10. return optimal
In line 5, we define that y is reducible if it has at least one child which is a
feasible solution; thus, if z is a child of y and Mz ≥ 1, then y is reducible.
The computation of Mz ≥ 1 takes time Θ(mn), but it can be improved to Θ(m). To
see this: if we already have the result of My, we can get the result of Mz by
subtracting one column vector of M from My, since the Hamming distance
between y and z is one.
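The whole stack-based search can be sketched as follows; this is a minimal Python illustration, not the thesis's C++ implementation, and the 4 × 4 instance is hypothetical. Feasibility is My ≥ 1, and a solution is reducible when at least one of its Split-Algorithm children is still feasible:

```python
# Sketch of the stack-based optimal search above.
def children(y):
    """Children per the Split Algorithm: flip leading 1s until the first 0."""
    out = []
    for i, bit in enumerate(y):
        if bit == 0:
            break
        c = list(y)
        c[i] = 0
        out.append(tuple(c))
    return out

def feasible(M, y):
    """My >= 1: every row is covered by a selected column."""
    return all(any(row[j] and y[j] for j in range(len(y))) for row in M)

def optimal_cover(M, c):
    cost = lambda y: sum(cj for cj, yj in zip(c, y) if yj)
    best = tuple([1] * len(c))        # the trivial solution 1
    stack = [best]
    while stack:
        y = stack.pop()
        kids = [z for z in children(y) if feasible(M, z)]
        if not kids:                  # y is not reducible
            if cost(y) < cost(best):
                best = y              # update optimal
        else:
            stack.extend(kids)        # youngest (cheapest) child lands on top
    return best, cost(best)

M = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
c = [2, 3, 2, 3]
print(optimal_cover(M, c))  # ((1, 0, 1, 0), 4)
```

Since the costs are positive, any feasible child costs strictly less than its parent, so the optimum is always a non-reducible node and is found by this search.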
Chapter 5. Genetic Algorithm
5.1 Introduction
The theoretical foundations of genetic algorithms (GAs) were originally
developed by Holland, who began his work on genetic algorithms in the early
1960s. A first achievement was the publication of Adaptation in Natural and
Artificial Systems [14] in 1975. David Goldberg, a student of John Holland, is
one of the preeminent researchers in the field; his Genetic Algorithms in
Search, Optimization and Machine Learning [12] is by far the best introduction to
genetic algorithms.
The idea of GAs is based on the evolutionary process of biological organisms in
nature. During the process of evolution, natural populations evolve according to
the principles of natural selection and "survival of the fittest". Individuals which
are more successful in adapting to their environment will have a better chance of
surviving and reproducing, while individuals which are less successful will be
eliminated. This means that the genes from highly fit individuals will spread
to an increasing number of individuals in each successive generation. The
combination of good characteristics from highly adapted ancestors may produce
even more fit offspring.
A GA simulates these processes by taking an initial population of individuals and
then applying genetic operators in each reproduction. Each individual in this
population is encoded into a string or chromosome which represents a feasible
solution to a given problem. The fitness of an individual is evaluated with respect
to a given objective function. Highly fit individuals or solutions are given
opportunities to reproduce with other highly fit individuals by exchanging pieces
of their genetic information. This crossover procedure produces new offspring
solutions, which share some characteristics taken from both parents. Mutation is
often applied after crossover by altering some genes in the chromosomes. The
offspring can either replace the whole population or replace less fit individuals.
This evaluation-selection-reproduction cycle is repeated until a satisfactory
solution is found. Below are the basic steps of a simple Genetic Algorithm. The
details of each step will be discussed later.
Genetic Algorithm
1. Generate an initial population
2. Evaluate fitness of individuals in the population
3. Repeat
4. Select parents from the population
5. Recombine parents to produce children
6. Evaluate fitness of the children
7. Replace some or all of the population by the children
8. Until a satisfactory solution has been found
5.2 Population size and initial population
The size of the population and the initial population are chosen such that the
solution domain is suitably covered by the population. To determine the size of
the population, we need to consider the density of a set-covering problem, which
is the ratio of ones in the matrix. If the density of a randomly generated SCP is
p, then the average number of ones (selected columns) in each solution is about
1/p. In order for the union of the individuals in the population to cover the
entire solution domain, we need a population size of at least N such that
N × (1/p) > n, where n is the column size of the m × n matrix. Therefore, the
population size we choose here is cnp, for some positive integer c ≥ 1.
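The sizing rule above amounts to a one-line computation; the sketch below uses c = 1 as the default, which is consistent with the "GA pop" column reported in Chapter 7 (about 20 for the 200 × 1000, density ≈ 0.02 files):

```python
import math

# Sketch of the population-size rule N = c*n*p: with density p an individual
# holds about 1/p columns, so N individuals give N*(1/p) > n columns in total.
def population_size(n, p, c=1):
    return math.ceil(c * n * p)

print(population_size(1000, 0.02))  # 20
print(population_size(2000, 0.02))  # 40
```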
In the process of initializing the population, each individual is generated
randomly, but the result may not be a feasible solution. To overcome this, extra
variables recording information for each row and column are declared first, in
order to modify each randomly generated individual and make it a feasible
solution. We define those variables as:

Let I = the set of all rows
    J = the set of all columns
    ri = { j ∈ J | column j covers row i }, for each i ∈ I
    cj = { i ∈ I | column j covers row i }, for each j ∈ J
    S = { j ∈ J | column j is selected } with respect to an individual
    U = { i ∈ I | row i is not covered yet } with respect to S
    wi = |{ j ∈ S | column j covers row i }|, for each i ∈ I
Once the population size is determined, the way of generating each individual is
by stepping through each row of the matrix and selecting a column randomly
which covers that row. This would guarantee that each individual is a feasible
solution. We have the following algorithm:
Algorithm 5.1.1
1. S ← ∅
   wi ← 0 for each i ∈ I
2. for each row i ∈ I do
   a. randomly select a column j ∈ ri
   b. if j ∉ S then do
      S ← S ∪ {j}
      wi ← wi + 1 for each i ∈ cj
By the above algorithm, it is easy to see that each row is covered by at least
one column in the resultant S. On the other hand, the resultant S may contain
redundant columns, so we use the extra variables wi in the algorithm to update
the row weight (the number of columns in S that cover row i) whenever a new
column is added to S.

It is straightforward to remove redundant columns from the resultant S by
stepping through each column in S and checking the rows that it covers. If the
row weights wi for the rows that it covers are all greater than or equal to 2,
then the column is redundant and safe to remove. So, we have the following
algorithm:
Algorithm 5.1.2 (feasible S)
1. for each j ∈ S in increasing order
2.   if wi ≥ 2 for each i ∈ cj
3.   then
     a. S ← S \ {j}
     b. wi ← wi − 1 for each i ∈ cj
Without loss of generality, recall that we assume the column costs are in
non-decreasing order. Therefore, if we modify step 1 in the above algorithm to
select j ∈ S in decreasing order, the resultant S would be a cover with smaller
cost. However, when we initialize the population, we prefer that each individual
be randomly generated without redundant columns, so that the population can
cover the solution domain. The cost of each individual is not a big issue at this
moment, because other operators, like crossover and mutation, will be applied
later to produce offspring, and unfit individuals with high cost will be replaced
by their offspring. A slight modification of the above algorithm is given below:
Algorithm 5.1.3 (feasible S)
1. tmpS ← S
2. while tmpS ≠ ∅ do
3.   select j ∈ tmpS randomly
4.   tmpS ← tmpS \ {j}
5.   if wi ≥ 2 for each i ∈ cj then
     a. S ← S \ {j}
     b. wi ← wi − 1 for each i ∈ cj
Finally, we have the following algorithm to generate the initial population:
Initial Population Algorithm
1. for i ← 1 to popuSize N
   a. Algorithm 5.1.1
   b. Algorithm 5.1.3
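Algorithms 5.1.1 and 5.1.3 above can be sketched together in Python; this is a minimal illustration on a hypothetical instance, not the thesis's C++ code (sorting the sets before random choice is only for deterministic iteration order):

```python
import random

# Sketch of Algorithms 5.1.1 and 5.1.3: build a feasible individual by covering
# each row with a random column, then drop redundant columns in random order.
# r[i] is the set of columns covering row i; cols[j] the rows covered by j.
def random_individual(r, cols, rows, rng=random):
    S, w = set(), {i: 0 for i in rows}
    for i in rows:                           # Algorithm 5.1.1
        j = rng.choice(sorted(r[i]))
        if j not in S:
            S.add(j)
            for k in cols[j]:
                w[k] += 1
    for j in rng.sample(sorted(S), len(S)):  # Algorithm 5.1.3
        if all(w[i] >= 2 for i in cols[j]):  # column j is redundant
            S.remove(j)
            for i in cols[j]:
                w[i] -= 1
    return S

rows = [1, 2, 3, 4]
cols = {'S1': {1, 2}, 'S2': {2, 3}, 'S3': {3, 4}, 'S4': {1, 4}}
r = {i: {j for j, rs in cols.items() if i in rs} for i in rows}
S = random_individual(r, cols, rows)
print(all(any(i in cols[j] for j in S) for i in rows))  # True: S covers every row
```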
5.3 Fitness of individuals
Once the population is initialized, we might consider this population as a matrix
P of size N × n, where N is the population size and n the number of columns
in the set-covering problem. The binary representation of an individual's
chromosome is an n-bit string. A value of 1 for the j-th bit implies that column
j is in the individual S (solution). The fitness of an individual is directly
related to its objective function value. With the binary representation, the
fitness fi of an individual i (the i-th row of matrix P) is calculated simply by

fi = Σ_{j=1}^{n} cost(j) · P[i][j],

where cost(j) is the cost of column j and P[i][j] is the (i, j)-entry of matrix
P. A lower fitness means a lower cost.
5.4 Parent selection technique
Parent selection is the task of assigning reproductive opportunities to each
individual in the population. The method used widely is proportionate selection:
an individual with lower fitness (cost) will have a higher chance of being
selected. The probability of an individual i being selected is based on the
probability distribution

Pr(i) = (1/fi) / Σ_{k=1}^{N} (1/fk),

where fi is the fitness of the i-th individual and N is the population size. For
each mating, two different individuals are selected from the population for
reproduction via crossover and mutation.
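Proportionate (roulette-wheel) selection with inverse-fitness weights, as described above, can be sketched as follows; the fitness values here are hypothetical:

```python
import random

# Sketch of proportionate selection: individual i is chosen with probability
# (1/f_i) / sum_k (1/f_k), so lower-cost individuals are favoured.
def select_parent(fitness, rng=random):
    weights = [1.0 / f for f in fitness]
    x = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if x < acc:
            return i
    return len(fitness) - 1   # guard against floating-point round-off

fitness = [10.0, 20.0, 40.0]    # hypothetical costs: probabilities 4/7, 2/7, 1/7
counts = [0, 0, 0]
rng = random.Random(1)
for _ in range(7000):
    counts[select_parent(fitness, rng)] += 1
print(counts[0] > counts[1] > counts[2])  # True: cheaper individuals win more often
```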
5.5 Crossover operator
The crossover operator works by randomly generating one crossover point and
then swapping segments of the two parent strings to produce two child strings.
For example, let p1 = a1a2···an and p2 = b1b2···bn be two parent strings.
Generate a crossover point k, 1 ≤ k < n. Then the two child strings are

c1 = a1···ak b(k+1)···bn,   c2 = b1···bk a(k+1)···an.

The purpose of applying the crossover operator to the parent strings is to
generate two child strings and to see if the child strings have better fitness.
Since the crossover point k is generated randomly in [1, n), either one of the
child strings could be identical to one or the other of the parent strings. To
avoid this, the range of k must be restricted: k must lie strictly between the
first and the last positions at which p1 and p2 differ.
Note that the child strings might not be feasible solutions. The mutation
operator is applied first, before we make them into feasible solutions.
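The one-point crossover above can be sketched in Python; restricting k to the span where the parents differ (as discussed above) guarantees both children differ from both parents. The parent strings are hypothetical:

```python
import random

# Sketch of one-point crossover: swap tails after a random cut point k, with k
# restricted to lie inside the span of positions where the parents differ.
def crossover(p1, p2, rng=random):
    diff = [i for i in range(len(p1)) if p1[i] != p2[i]]
    if len(diff) < 2:       # parents (nearly) identical: no useful cut exists
        return list(p1), list(p2)
    k = rng.randrange(diff[0] + 1, diff[-1] + 1)  # cut strictly inside the span
    c1 = p1[:k] + p2[k:]
    c2 = p2[:k] + p1[k:]
    return c1, c2

p1 = [1, 1, 0, 1, 0]
p2 = [1, 0, 1, 1, 1]
c1, c2 = crossover(p1, p2, random.Random(3))
print(c1 != p1 and c1 != p2 and c2 != p1 and c2 != p2)  # True
```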
5.6 Mutation operator
The mutation operator is applied to each offspring after crossover. It is another
important operator that randomly changes the genes present in the
chromosome. The expected number of changes, called the mutation rate, is
either fixed or varies according to a certain criterion. Mutation is considered a
background operator that allows a small amount of random search. It also helps
to guard against the loss of important genetic information that is either not
present at all in the initial population, or has been discarded during the earlier
evolutionary process. Thus, it reintroduces valuable
genetic information into the current population and allows the genetic algorithm
to explore the new search space.
Notice that the offspring generated by the crossover and mutation operators
may violate the problem constraints, in that some rows may not be covered. We
use a repair algorithm presented by Beasley [4] to maintain feasible solutions.
This repair algorithm is responsible for covering uncovered rows. To make an
offspring feasible, additional operators are needed. The notation is the same as
before:

Let I = the set of all rows
    J = the set of all columns
    ri = { j ∈ J | column j covers row i }, for each i ∈ I
    cj = { i ∈ I | column j covers row i }, for each j ∈ J
    S = { j ∈ J | column j is selected } with respect to an individual
    U = { i ∈ I | row i is not covered yet } with respect to S
    wi = |{ j ∈ S | column j covers row i }|, for each i ∈ I

The algorithm below searches for additional columns based on the ratio between
the cost of a column and the number of uncovered rows which it covers, and
hence makes an offspring S into a feasible solution:
Algorithm 5.5.1 (offspring S)
1. wi ← |S ∩ ri|, for all i ∈ I
2. U ← { i ∈ I | wi = 0 }
3. for each i ∈ U
   a. find the column j ∈ ri such that cost(j) / |U ∩ cj| is minimum
   b. S ← S ∪ {j}
   c. wi ← wi + 1 for each i ∈ cj
   d. U ← U \ cj
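The repair step above can be sketched in Python; this is a minimal illustration on a hypothetical instance (a `while` loop over the remaining uncovered rows stands in for the pseudocode's `for each i ∈ U` over a shrinking U):

```python
# Sketch of Algorithm 5.5.1 (after Beasley's repair operator): for each
# uncovered row, add the column minimizing cost(j) / |U ∩ cols[j]|.
def repair(S, rows, cols, r, cost):
    S = set(S)
    w = {i: sum(1 for j in S if i in cols[j]) for i in rows}
    U = {i for i in rows if w[i] == 0}
    while U:
        i = next(iter(U))
        # best column covering row i, by cost per newly covered row
        j = min(r[i], key=lambda j: cost[j] / len(U & cols[j]))
        S.add(j)
        for k in cols[j]:
            w[k] += 1
        U -= cols[j]
    return S

rows = [1, 2, 3, 4]
cols = {'S1': {1, 2}, 'S2': {2, 3}, 'S3': {3, 4}, 'S4': {1, 4}}
r = {i: {j for j, rs in cols.items() if i in rs} for i in rows}
cost = {'S1': 2, 'S2': 3, 'S3': 2, 'S4': 3}
print(sorted(repair({'S1'}, rows, cols, r, cost)))  # ['S1', 'S3']
```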
It is interesting to note that once the repair process is completed and an
infeasible individual has been converted into a feasible one, a local optimization
step can be performed by the following algorithm, which removes the redundant
columns from a feasible solution:

Algorithm 5.5.2 (feasible S)
1. for each j ∈ S in decreasing order
2.   if wi ≥ 2 for each i ∈ cj
3.   then
     a. S ← S \ {j}
     b. wi ← wi − 1 for each i ∈ cj
In line 1 of algorithm 5.5.2, columns with higher costs are dropped first.
5.7 Population replacement

The replacement strategy for a genetic algorithm not only decides which
individuals shall be replaced by the new offspring, but also establishes the type
of replacement. The commonly used methods are steady-state replacement and
generational replacement. In the former, once a new feasible child solution is
generated, the child replaces a randomly chosen member of the population. In
our implementation, we use the latter method, in which the whole parent
generation is replaced by a new population of children.
5.8 Overview

The following steps are used in our GA for the SCP:
1. Generate an initial population of N random feasible solutions.
2. Select two solutions p1 and p2 randomly from the population.
3. Combine p1 and p2 to form two new solutions c1 and c2 using the
   crossover operator.
4. Mutate one randomly selected column in each ci, then make each ci
   feasible and remove redundant columns.
5. If ci is identical to any of the solutions in the population, or has a
   higher cost than pi, place pi back into the population; otherwise, ci
   replaces pi in the population.
6. Repeat steps 2-5 until a satisfactory solution has been found.
Chapter 6. Simulated Annealing Algorithm
Annealing is a technique that was adopted from the thermodynamic process.
Simulated annealing is a problem solving technique based on the way in which a
metal is annealed in order to increase its strength. When a heated metal is cooled
very slowly, it freezes into a regular crystalline structure.
The objective here is that the energy of the crystalline solid should be a
minimum as in a perfect solid. Once the temperature is increased to a very high
value, a cooling schedule is adopted which is applied to the molten solid. That is,
the temperature is decreased in a regular fashion or decremented by a certain
value that results in a new state. As the temperature is decreased, if the energy of
the new state is less than the energy of the previous state, the new state is
accepted. If the decrease in temperature increases the energy of the new state,
the temperature is increased to its previous value. This process is repeated until a
crystalline solid is formed that is in a minimum energy state.
This idea has been applied to a number of optimization problems. The basic idea
of this approach is to start with an initial solution to the problem and to improve
the solution iteratively. The more iterations, the better the solution produced.
A simulated annealing algorithm searches for the optimum solution to a given
problem in an analogous way. Specifically, it moves randomly in the solution
space looking for a solution that minimizes the value of some objective function.
Because each move is generated randomly, a given move may cause the objective
function to increase, to decrease, or to remain unchanged.
A simulated annealing algorithm always accepts moves that decrease the value of
the objective function. Moves that increase the value of the objective function
are accepted with the probability

p = e^(−Δ/T),

where Δ is the change in the value of the objective function and T is a control
parameter called the temperature. A random number generator that generates
numbers distributed uniformly on the interval [0, 1] is sampled, and if the
sample is less than p, the move is accepted.
By analogy with the physical process, the temperature T is initially high, so the
probability of accepting a move that increases the objective function is initially
high. The temperature is gradually decreased as the search progresses, and in
the end the probability of accepting a move that increases the objective function
becomes very small. In general, the temperature is lowered in accordance with
an annealing schedule; a common choice, used here, is the geometric schedule
T ← αT, with 0 < α < 1.
The choice of suitable values for α, T0, and Tf is highly problem-dependent.
However, empirical evidence suggests that a good value for α is 0.95 and that T0
should be chosen so that the initial acceptance probability is 0.8.
Finally, there is the question of selecting the initial solution from which to begin
the search. A key requirement is that it be generated quickly. Therefore, the
initial solution we select here is the result of a greedy algorithm.
Simulated Annealing Algorithm
1. Initialize T0, Tf, and α; T ← T0
2. Scurr ← greedy solution
3. while T > Tf do
   a. select the next solution Snext in the neighborhood of Scurr
   b. if cost(Snext) < cost(Scurr)
      then Scurr ← Snext
   c. else if random[0, 1] < exp{ (cost(Scurr) − cost(Snext)) / T }
      then Scurr ← Snext
   d. T ← α · T
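The loop above can be sketched in Python; this is a minimal, self-contained illustration, not the thesis's C++ implementation. The toy instance, the penalty for infeasible selections, and the bit-flip neighborhood are assumptions made for the example (the text instead starts from a feasible greedy solution):

```python
import math
import random

# Sketch of the simulated annealing loop above with geometric cooling T <- alpha*T.
def anneal(cost, neighbor, s0, T0=10.0, Tf=0.01, alpha=0.95, rng=random):
    s, T, best = s0, T0, s0
    while T > Tf:
        s_next = neighbor(s, rng)
        d = cost(s_next) - cost(s)
        # always accept improvements; accept uphill moves with prob e^(-d/T)
        if d < 0 or rng.random() < math.exp(-d / T):
            s = s_next
        if cost(s) < cost(best):
            best = s
        T *= alpha
    return best

# Toy SCP instance: covering cost plus a large penalty for infeasibility.
M = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
c = [2, 3, 2, 3]

def cost(y):
    covered = all(any(row[j] and y[j] for j in range(4)) for row in M)
    return sum(cj for cj, yj in zip(c, y) if yj) + (0 if covered else 100)

def neighbor(y, rng):
    i = rng.randrange(len(y))      # flip one random bit
    z = list(y)
    z[i] = 1 - z[i]
    return tuple(z)

best = anneal(cost, neighbor, (1, 1, 1, 1), rng=random.Random(7))
print(best, cost(best))
```

Because the best-so-far solution is tracked and the search starts from the feasible all-ones vector, the returned solution is always feasible and never worse than the starting cover.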
In step 3c, the method of generating uniformly distributed random real numbers
between 0 and 1 was introduced by Dr. Chandler [5]. D.H. Lehmer's multiplicative
congruential method is used to generate one long cycle that contains every
integer value of the seed from 1 to 2147483646 exactly once. From that method,
we can produce real numbers distributed uniformly on the desired interval.
Chapter 7. Computational results

The algorithms presented in this paper were implemented in C++ and tested on
a PC with 128 MB of RAM and a CPU speed of 700 MHz. The test problem
sets were obtained electronically from the OR-Library posted by J.E. Beasley.
The program code was tested on various sizes and densities; the set-covering
problem was solved approximately by each algorithm on instances with up to 400
rows and 4,000 columns.

The table below shows the SCP test files that we used in our implementation.
The density was computed by the Genetic Algorithm program.
No.  File Name    Size (rows * columns)  Density    Unicost
1    Scp41.txt    200 * 1000             0.020035   N
2    Scp410.txt   200 * 1000             0.01951    N
3    Scp42.txt    200 * 1000             0.019895   N
4    Scp43.txt    200 * 1000             0.019915   N
5    Scp44.txt    200 * 1000             0.020035   N
6    Scp45.txt    200 * 1000             0.019685   N
7    Scp46.txt    200 * 1000             0.0204     N
8    Scp47.txt    200 * 1000             0.019585   N
9    Scp48.txt    200 * 1000             0.020075   N
10   Scp49.txt    200 * 1000             0.019765   N
11   Scp51.txt    200 * 2000             0.0199825  N
12   Scp510.txt   200 * 2000             0.0199975  N
13   Scp52.txt    200 * 2000             0.0199875  N
14   Scp53.txt    200 * 2000             0.020035   N
15   Scp54.txt    200 * 2000             0.019835   N
16   Scp55.txt    200 * 2000             0.019635   N
17   Scp56.txt    200 * 2000             0.0199825  N
18   Scp57.txt    200 * 2000             0.0201425  N
19   Scp58.txt    200 * 2000             0.0198     N
20   Scp59.txt    200 * 2000             0.019675   N
21   Scp61.txt    200 * 1000             0.017025   N
22   Scp62.txt    200 * 1000             0.0178075  N
23   Scp63.txt    200 * 1000             0.01814    N
24   Scp64.txt    200 * 1000             0.017315   N
25   Scp65.txt    200 * 1000             0.017175   N
26   Scpa1.txt    300 * 3000             0.0201     N
27   Scpa2.txt    300 * 3000             0.0200789  N
28   Scpa3.txt    300 * 3000             0.0200833  N
29   Scpa4.txt    300 * 3000             0.02009    N
30   Scpa5.txt    300 * 3000             0.0200767  N
31   Scpb1.txt    300 * 3000             0.0499022  N
32   Scpb2.txt    300 * 3000             0.0498456  N
33   Scpb3.txt    300 * 3000             0.0498844  N
34   Scpb4.txt    300 * 3000             0.0498722  N
35   Scpb5.txt    300 * 3000             0.04986    N
36   Scpc1.txt    400 * 4000             0.020025   N
37   Scpc2.txt    400 * 4000             0.0199694  N
38   Scpc3.txt    400 * 4000             0.0199794  N
39   Scpc4.txt    400 * 4000             0.01998    N
40   Scpc5.txt    400 * 4000             0.0199706  N
41   Scpd1.txt    400 * 4000             0.0500819  N
42   Scpd2.txt    400 * 4000             0.0500581  N
43   Scpd3.txt    400 * 4000             0.050095   N
44   Scpd4.txt    400 * 4000             0.0500169  N
45   Scpd5.txt    400 * 4000             0.0500381  N
46   Scpe1.txt    50 * 500               0.19648    Y
47   Scpe2.txt    50 * 500               0.2004     Y
48   Scpe3.txt    50 * 500               0.20156    Y
49   Scpe4.txt    50 * 500               0.19796    Y
50   Scpe5.txt    50 * 500               0.20052    Y
The density of a SCP is the ratio of ones in the matrix.
24
The table below shows the cost and execution time of each algorithm on five
unicost SCPs of size 50 * 500 and density near 0.2. The initial population size
of the Genetic Algorithm was based on the density of the SCP, and the number
of iterations was set to 500.
File No.  G cost  G time  SA cost  SA time  GA cost  GA time  GA pop
46        5       0       5        0        6        0        98
47        5       0       5        0        5        0        100
48        5       0       5        0        5        0        100
49        6       0       5        0        5        0        98
50        5       0       5        0        5        0        100

Notation: G (Greedy), SA (Simulated Annealing), GA (Genetic),
pop (initial population size).
A time of 0 means the execution took less than one second.
Similarly, we have results for set sizes varying from 200*1000 to 400*4000.
Again, the number of iterations for the Genetic Algorithm was set to 500.
File No. Gcost Gtime SA cost SA time GA cost GA time GApop
1 467 1 437 0 383 0 20
2 559 1 531 0 469 0 19
3 585 2 530 0 449 0 19
4 595 2 525 0 446 0 19
5 552 2 509 0 451 0 20
6 581 1 512 0 470 0 19
7 609 1 588 0 502 1 20
8 484 1 446 0 408 0 19
9 538 1 523 1 458 0 20
10 745 1 669 0 612 0 19
11 290 2 270 0 224 0 39
12 294 3 276 0 249 0 39
13 349 2 331 0 289 1 39
14 249 2 235 0 208 1 40
15 267 2 252 0 225 0 39
16 237 3 210 0 205 0 39
17 251 3 224 0 195 0 39
18 327 3 301 0 275 0 40
19 324 3 312 0 258 0 39
20 308 2 289 0 267 0 39
21 101 0 101 0 100 0 34
22 120 0 118 0 109 1 35
23 106 0 106 0 101 1 36
24 104 0 103 0 103 0 34
25 108 0 107 0 103 0 34
26 288 6 261 0 233 2 60
27 285 6 271 0 260 1 60
28 270 6 245 0 232 1 60
29 279 5 242 1 228 1 60
30 270 6 247 0 227 1 60
31 77 3 73 0 139 2 149
32 86 4 78 0 138 2 149
33 90 3 88 1 134 2 149
34 91 3 82 0 116 2 149
35 79 3 76 0 142 2 149
36 258 12 237 0 237 3 80
37 249 12 220 0 232 2 79
38 277 11 258 0 250 2 79
39 257 11 237 0 245 3 79
40 234 11 220 0 223 2 79
41 71 5 68 0 271 4 200
42 73 5 70 0 212 4 200
43 83 6 81 0 181 4 200
44 69 6 66 1 190 4 200
45 69 5 66 0 266 4 200
From the above table, we can see that the Genetic Algorithm achieved better
costs than the other two algorithms, with competitive execution times, on test
files Nos. 1-30 and 36-40. The densities of those test files are less than or
around 0.02. For test files Nos. 31-35 and 41-45, the densities are around 0.05,
and the performance of the Genetic Algorithm was weaker. To improve this, the
table below shows the results when the number of iterations is set to 1000 and
1500, respectively.
File No. Greedy T=500 T = 1000 T = 1500
cost time cost time cost time cost time
31 77 3 139 2 71 4 68 4
32 86 4 138 2 81 4 73 5
33 90 3 134 2 78 3 75 5
34 91 3 116 2 75 3 75 5
35 79 3 142 2 77 4 68 5
41 71 5 271 4 65 6 60 9
42 73 5 212 4 77 7 65 9
43 83 6 181 4 77 7 72 10
44 69 6 190 4 72 7 61 10
45 69 5 266 4 67 7 62 9
Greedy vs. GA for each T, where T is the number of iterations in the GA.
Chapter 8. Conclusions

The approach was implemented and tested on various set sizes and densities to
compare the costs and execution times of each algorithm. We have presented
the results of our experiments with a greedy algorithm, a simulated annealing
algorithm, and a genetic algorithm. The simulated annealing algorithm takes the
solution generated by the greedy algorithm as its initial solution and then
randomly searches the solution space, improving the cost within a very short
time. The genetic algorithm is a very powerful method which can obtain very
good results in a reasonable time in comparison with the other two algorithms.
References
[1] A.O.L. Atkin, B.J. Birch, Computers in Number Theory, Academic Press Inc. (London) Ltd.,
1971
[2] T. Back, Evolutionary Algorithms in Theory and Practice, Oxford University Press, Inc., 1996
[3] J-P. Barthelemy, G. Cohen, A. Lobstein, Algorithmic Complexity, pp. 79-104, UCL Press
Limited, 1996
[4] J.E. Beasley and P.C. Chu, A genetic algorithm for the set covering problem,
European Journal of Operational Research, June 1995
[5] John P. Chandler, class notes, unpublished
[6] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, pp. 974-985, The MIT
Press, 1997
[7] W.J. Cook, W.H. Cunningham, W.R. Pulleyblank, A. Schrijver, Combinatorial Optimization,
John Wiley & Sons, Inc., 1998
[8] A. Dolan, J. Aldous, Networks and Algorithms, pp. 403-411, John Wiley & Sons, Ltd., 1993
[9] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of
NP-Completeness, W.H. Freeman and Co., San Francisco, 1979
[10] K.O. Geddes, S.R. Czapor, G. Labahn, Algorithms for Computer Algebra, Kluwer Academic
Publishers, 1993
[11] M. Gen, R. Cheng, Genetic Algorithms and Engineering Design, John Wiley & Sons, Inc., 1996
[12] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning,
Addison-Wesley Pub. Co., 1989
[13] R.L. Haupt, S.E. Haupt, Practical Genetic Algorithms, John Wiley & Sons, Inc., 1998
[14] J.H. Holland, Adaptation in Natural and Artificial Systems, Bradford Books, 1975
[15] J.H. Holland, Emergence: From Chaos to Order, Perseus Press, 1998
[16] J.H. Holland, Genetic Algorithms, Scientific American, July 1992
[17] R. Horst, P.M. Pardalos, Handbook of Global Optimization, Kluwer Academic Publishers,
1995
[18] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag,
1996
[19] M.W. Padberg, M.P. Rijal, Location, Scheduling, Design and Integer Programming, Kluwer
Academic Publishers, 1996
[20] R. Sedgewick, An Introduction to the Analysis of Algorithms, Addison-Wesley Pub. Co., 1996
[21] R. Sedgewick, Algorithms, Addison-Wesley Pub. Co., 1998
[22] W.T. Trotter, Combinatorics and Partially Ordered Sets, The Johns Hopkins University Press,
1992
[23] A. Tucker, Applied Combinatorics, John Wiley & Sons, 1980
[24] D.J.A. Welsh, Combinatorial Mathematics and its Applications, Academic Press Inc. (London)
Ltd., 1969
[25] S.G. Williamson, Combinatorics for Computer Science, Computer Science Press, Inc., 1985
Appendix
C++ code for the set-covering problem:
1. data.h
2. genetic.h
3. greedy.h
4. simuAnnealing.h
5. split.h
6. stack
7. genetic.cpp
8. greedy.cpp
9. main.cpp
10. simuAnnealing.cpp
11. split.cpp
12. stack.cpp
Note: Source code (40 KB) and test files (5.13 MB) are available upon request at
chiangc02a.cs.okstate.edu.
Example: Sample output for scpe1.txt, p. 32
Test file: .. /SCPdata/scpe1.txt
Matrix size: 50 * 500
<<< Begin at: Sun May 19 12:05:23 2002
Greedy Cost = 5
Greedy Solution
01000100000000000000010000000000000000000000000000000000000000000100000000000000
00000000000000000000000000000000010000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000
>>> End at: Sun May 19 12:05:23 2002
Program takes 0 second(s)
<<< Begin at: Sun May 19 12:05:23 2002
Simulated Annealing Cost = 5
Simulated Annealing Solution
01000100000000000000010000000000000000000000000000000000000000000100000000000000
00000000000000000000000000000000010000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000
>>> End at: Sun May 19 12:05:23 2002
Program takes 0 second(s)
<<< Begin at: Sun May 19 12:05:23 2002
Genetic Cost = 5
Density = 0.19648
Population Size = 98
Genetic Solution =
01000100000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000001000000010000000001000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000
>>> End at: Sun May 19 12:05:23 2002
Program takes 0 second(s)
VITA
CHIN-CHIEH CHIANG
Candidate for the Degree of
Master of Science
Thesis: ALGORITHMS FOR THE MINIMUM-COST SET-COVERING PROBLEM
Major Field: Computer Science
Biographical:
Personal Data: Born in Kaohsiung city, Taiwan.
Education: Received Bachelor of Science degree in Applied Mathematics from
Feng-Chia University, Taiwan in June 1987; received Master of Science
degree in Mathematics from University of Connecticut, Storrs, Connecticut
in May 1994; studied in the Ph.D. program in Mathematics at Michigan
State University from 1994 to 1998.
Completed the requirements for the Master of Science degree with a major in
Computer Science at Oklahoma State University in August 2002.
Experience: Taught at Kaohsiung Sr. High School, Taiwan as a Mathematics
teacher from 1989 to 1992.
Employed by Upperspace Corp., Oklahoma, as a software developer from
2000 to 2001.
Professional Memberships: American Mathematical Society; passed Actuarial
Exam 100.