A STATISTICAL MODELING OF AN OLFACTORY
SYSTEM FOR ELECTRONIC
IMPLEMENTATION
By
RAHUL RA TILAL SHAH
Bachelor of Engineering
University of Po on a
Pune, India
1993
Submitted to the Faculty of the
Graduate College of the
Oklahoma State University
in partial fulfilment of
the requirement for
the Degree of
MASTER OF SCIENCE
July, 1996
A STATISTICAL MODELING OF Ai\'" OLFACTORY
SYSTEM FOR ELECTRONIC
IMPLEMENTATION
Thesis approved:
r.!heSiS Adviser
rp~
~=
Dean of the Graduate College
ii
PREFACE
This thesis attempts to provide an understanding of the clustering mechanism
observed in the olfactory neural network. A new model is proposed that has potential to
overcome the problems associated with the clustering mechanism. An additional layer of
network that can extend the proposed network to L VQ network is also discussed.
I wish to express my sincere gratitude to my adviser Dr. C. Hutchens for his
invaluable guidance, inspiration and financial support. I am also thankful to Dr. M.
Hagan and Dr. R. Ramakumar for for serving on my committee. I would like to thank Dr.
P. Shoemaker for his guidance throughout this work.
I extend my thanks to my friends for their everlasting support and encouragement.
I express my deepest appreciation for my parents for their love, understanding and
moral support.
Finally I would like to quote a couplet from ShreemadBhagavadGeeta that has
inspired me at the moment of frustration:
Thy jurisdiction is in action alone
Never in its fruits at any time
Never should the fruits of action be thy motive
Never let there be attachment in thee to inaction
iii
TABLE OF CONTENTS
Chapter Page
1. ARTIFICIAL NEURAL NETWORKS AND OLFACTION ... .. ..... ...... ....... ...... .... ...... 1
1.1 INTRODUCTION ... ......... ...... .......... ...... ............... ........ ...... ..... ...... .................. . ........... 1
1.2 BIOLOGICAL INSPIRATION .................................... ......... ........................ .............. .... .2
1.3 ARTIFICIAL NEURAL NETWORKS ........................... ....... ............... ......... : ................ .. 3
1.4 OLFACTORY SySTEM ..... ....... ...... .. ..... .... .. ... . .......... ......... ......... ................................ 5
1.5 REVIEW OF SOME IMPORTANT TOPICS .. ...... ..... .................................................. ....... 8
1.5.1 Hierarchical clustering mechanism ... ... ... ..... ........ .... ... .. .. ...... ......... ...... ........ 8
1.5.2 Associative Learning : Kohonen Network ....... ...... ... .. ... ... ... ................ ......... 9
1.5.3 WinnerTakeAll (WTA) Learning Rule ...... ...... ..... .......... ........ .... ..... ... .. .... 10
1. 5.4 Learning Vector Quantization (LVQ) Network .. ....... .......... ......... ............ .. 11
2. STATISTICAL MODEL OF OLFACTORY SYSTEM ........... . ............ .. ........ .......... 14.
2.1 OVERVIEW OF THE GLA MODEL ................ .. .. ...... .. .... .................. ...................... .. .. 14
2.1.1 Block diagram and basic operation ... .... .. ... .... ... ........... .. .... .. .... ....... ........ .. 14
2.1.2 Network description and nomenclature .... .. .... .. .... .. .......... .. .. ... .. .. ..... .. ....... . 17
2.2 OVERLAP AND HAMMING DISTANCE .............................................. .. ........... .. .. ....... 1 7
2.3 PROBABILITY OF WINNING ON MULTIPLE PATTERNS IN THE NAIVE NETWORK ........ 18
2.4 PROCESS OF CAPTURE ........... .. ........... ..... ..... ........ ....................... ........................... 19
2.4.1 Capture by Precedence ..... ... ... .. ... ..... ..... .......... .... ... ..... ... .... ..... ........ ... ........ 20
2.4.2 Capture by Overtake .......................... ... .. .... .. ... .... ...... .. ...... .. ............ ... ..... .. . 27
2.5 REVIEW OF THE COUL TRIPGRANGER MODEL. ............. .. ......... ... ...................... ..... 43
2.5.1 Network Description ........... .. .... .. ... .. ........... .. ; ......... .. .. .. .... .... .. .. .... ... .... .. .... .43
2.5.2 Working of the Network .. .. .......... .... ......... .... .. .... .. .... .. .......... .......... .. .. .. ... .. .44
2.5.3 Advantages and disadvantages of the network .. ... .. .. .... .. .. .. ........... .. .. ...... .. .46
2.5.4 Similarity and difference with Kohonen network ..... .... .. .. .... .. ............. ...... .4 7
2.6 PROPOSED MODEL ... ...... ...... ......... ... ..................... .... .... ...... ....... ...... ..... ................ . 47
2.7 PROPOSED TRAINING ALGORITHM ........... .. ........................................................... .48
2.8 ADVANTAGES OF THE PROPOSED MODEL OVER THE OLD MODEL ....................... .. ... 54
2.9 SIMULATIONS .......................................................................... .. ......... . .................. 55
2.9.1 Experiment 1 ... ...... .. ... ............... .... ............. .... .... .. ..... .. ..... ....... ... ... .. .. ..... .. ... 55
2.9.1 Experiment 2 .. .... .... .. ... .. .. .. .......... ..... .... ..... .. .. .. ...... .. .. ............ ...... .. .. ............. 57
2.9.3 Conclusion .. .. .. ... ... .. ..... .. ... ........ ... ...... ... ............. .... ...... ........ ....... ............. .. . 58
2.10 RECONSTRUCTING THE TEMPLATE ................... ..... .................. .. ............. .. ............ 59
2.11 LVQ NETWORK .. ..... ........ ...... ...... ........... ............................................... .. ............ 66
3. CONCLUSIONS AND FUTURE PROSPECTS .................................. .. .................... 70
REFERENCES ........................ ..... ........... .. .................................................... ... ....... ..... ..... 74
IV
LIST OF FIGURES
Figure Page No.
1 GLA MODEL: BLOCK DIAGRAM ............................................................................... 15
2 P[SINGLE CELL WINS ON TWO PATTERNS (NAIVE NETWORK)] ..................................... 19
3 CAPTURE BY PRECEDENCE : N=960 ............ ......................................................... 2223
4 CAPTURE BY PRECEDENCE: N=1920 ..................... .............................................. 2324
5 CAPTURE BY PRECEDENCE: N=3840 ................................................................... 2526
6 CAPTURE BY OVERTAKE: N=960, m=2 ............................................................... 2930
7 CAPTURE BY OVERTAKE: N=I920, m=2 ..................................................... ........ 3031
8 CAPTURE BY OVERTAKE: N=3840, m=2 ........................................................... .. 3233
9 CAPTURE BY OVERTAKE: N=960, m=4 ... ............................................................ 3334
10 CAPTURE BY OVERTAKE: N=1920, m=4 ............................................................. 3536
11 CAPTURE BY OVERTAKE: N=3840, m=4 ................................ ............................. 3637
12 CAPTURE BY OVERTAKE: N=960, m=8 ............................................................... 3839
l3 CAPTURE BY OVERTAKE: N=1920, m=8 ............................................................. 3940
14 CAPTURE BY OVERTAKE: N=3840, m=8 ............................................................. 4142
15 COUL TRIPGRANGER MODEL: BLOCK DIAGRAM ..................................................... .44
16 PROPOSED MODEL: BLOCK DIAGRAM ...................................... ............................... .48
17 TRAINING ALGORITHM: SIMPLIFIED FLOW CHART ................. .............. .. ............. ..... 52
18 TRAINING ALGORITHM: FLOW CHART ...................................................................... 53
19 TRAINING ALGORITHM: TRAIN ROUTINE ................................................................ 54
20 EXPERIMENT 1 .. ......................................................................................................... 56
21 EXPERIMENT 2 ........................................ .............. ........................................ ............. 57
22 BACKPROPAGATION THROUGH FIXED PATHWA Y ....................................................... 60
23 L VQ LAYER: BLOCK DIAGRAM ................................................................................ 67
v
NOMENCLATURE
g number of mitral patches
n number of mitral cells per mitral patch
N number of LOT lines
r LOT activity
P overlap between two patterns
H hamming distance between two patterns
Pmin minimum overlap between the center of the cluster and vector in that cluster
P n P[ single cell wins on two patterns having overlap Pmin between them]
Pcp P [capture by precedence for Pm in]
W weight matrix
L1w weight increment
Wsat saturated weight value
q weight matrix sparsity
p number of piriform patches
h number of piriform cells per piriform patch
B output of the binary encoder
v size of the output of binary encoder
R AND array in PLA
u number of AND gates in PLA
A outputs of the AND gates in PLA
o OR array in PLA
Q output of PLA
vi
CHAPTER 1
ARTIFICIAL NEURAL NETWORKS AND OLFACTION
1.1 Introduction
From the ancient times, the goal of humankind has been to improve the quality of
life. Different sciences and arts are the products of the effort of the humankind to achieve
a goal that can never be fully achieved. Man has developed many machines such as
digital computers to get rid of the tedious tasks and enjoy a more fruitful life. He/she is
still trying to make these machines as fast and efficient as possible. Nowadays engineers
and scientists are trying to develop intelligent machines. Artificial neural systems are
presentday examples of such machines that have great potential to further improve the
quality of our life.
The recent resurgence of interest in neural networks has its roots III the
recognition that the brain performs computations III a different manner than do
conventional digital computers. Although the digital computers outperform both
biological and artificial neural systems for tasks based on precise and fast arithmetic
1
2
operations (rather the computers are meant for that), humans (or for that matter animals)
are much more efficient and faulttolerant than the fastest computers at computationally
complex tasks such as pattern recognition. A neural network's ability to perform
computations is based on the hope that we can reproduce some of the flexibility and
power of the human brain by an artificial means.[l]
1.2 Biological Inspiration
Much of the inspiration for the artificial neural systems comes from biology, or to be
more specific, from neuroscience. However, we are not directly concerned with the
biological neurons or neural networks.
The brain consists of a large number (approximately 1011) of highly connected
elements (approximately 104 connections per element) called neurons. For our purposes,
these neurons have three principal components: dendrites, cell body and axon. The dendrites
(from Greek dendron, tree) are treelike receptive networks of nerve fibers that carry
electrical signals into the cell body. The cell body effectively sums and thresholds these
incoming signals. The axon is a single long fiber which carries the signal from the cell body
out to other neurons. The point of contact of an axon of one cell with a dendrite of another
cell is called a synapse (from Greek syn + haptien, to copulate). It is the arrangement of
neurons and the strengths of individual synapses, determined by complex chemical process,
which determines the function of the neural network. [2]
3
Some of the neural structure is defined at birth. Other parts are developed through
learning. Neural networks continue to adjust throughout the life. These later changes tend to
consist principally of strengthening or weakening of synaptic junctions.
1.3 Artificial Neural Networks
Artificial neural networks (ANNs) are physical cellular systems which can acquire,
store, and utilize experiential knowledge. The neurons that we consider in artificial neural
networks are not biological. They are extremely simplified abstractions of biological
neurons, realized as elements in programs or as circuits made of silicon. Networks of these
artificial neurons do not approach the complexity of the brain. There are, however, two key
similarities between biological and artificial neural networks. First the building blocks of
both networks are simple computational devices (although artificial neurons are much
simpler than biological neurons) which are highly interconnected. Secondly, the
connections between the neurons determine the function of the network.
It should be noted that even though biological neurons are very slow when
compared to electrical circuits (103 s compared to 109 s), the brain is able to perform many
tasks much faster than any conventional computer. In contrasts to a large software program
which operates serially on a conventional computer, the brain operates with large number of
parallelly operating functional paradigms that execute in a more efficient way. The brain is
very resistant to noise and rather robust whereas most computers are fault intolerant [3].
ANNs share this parallel structure. Even though most ANNs are currently implemented on
4
conventional digital computers, their parallel structure makes them ideally suited to
implementation using custom VLSI, optical devices or parallel processors. [2]
The functional model of an artificial neuron or processing element consists of
weighted input connections, a summation function and a non linear threshold function
which transforms the input to an output. In biologically inspired artificial neural networks,
neurons are regular and offer highly interconnected topologies. ANNs are composed of
many nonlinear computational elements which operate in parallel and are arranged in a
pattern reminiscent of biological neural network in which elements are connected via
sparsely connected weights. Neural networks learn to recognize the patterns by adjusting
the weights of connection between processing elements. In its simplest form, an electronic
neuron sums weighted inputs and passes the result through a nonlinearity. A neuron is
characterized by an internal threshold or offset and by the type of nonlinearity.
Neural networks are inherently adaptive in the manner in which they derive meaning
from imprecise, ambiguous and faulty real world data. In an ANN one ofthe basic problem
is how its interconnection strengths i.e. synapses adjust their weights so the resulting neural
networks shows the modified properties of pattern recognition. That is what type of
learning algorithm is adopted? Adaptation or learning is a major area of neural network
research. An example of such adaptation is speech recognition where training data is
limited. Neural network classifiers are nonparametric and they make weaker assumptions
concerning the shapes of underlying distributions than traditional statistical classifiers do.
Such neural network adaptive systems are often described by an energy function or
probability distribution. Their adaptivity and nonlinearity make neural networks best
5
suited to predict, control and optimize such processes as industrial inspection, control,
robotics, market timing, stock picking and credit approval.
ANNs are very effective in processing complex scientific and engineering problems
such as speech processing, pattern mapping, pattern recognition and image recognition.
Neural network models exhibit a number of properties analogous to the brain, e.g.,
generalization, parallel search, learning and flexibility.
Adaptation in neural networks also provides a degree of robustness by compensating
for minor variability in characteristics of processing elements as well as sparse inputs Most
sequential computers are less fault intolerant when compared to the neural networks which
are resistant to noise and robust, meaning that a fault to an individual neuron or even a
group of neurons does not degrade the overall perfonnance of the brain whereas the loss in
one memory location of classical computer memory results in complete system failure. [3 ]
1.4 Olfactory System
In biological neural networks, the modulation of synaptic strength has long been
regarded as the likely mechanism for learning and memory [4]. The long tenn potentiation
(L TP) that is observed in the hippocampus, limbic system, and other cortical structures of
the brain uses a similar mechanism for learning [5]. The synaptic strength due to LTP
changes rather coarsely compared to graded and precise weight changes required by the
artificial neural networks such as back propagation or Kohonen's self learning networks.
How a nervous system might work within such constraints to perfonn useful computation
6
and to learn effectively is an elusive question [6]. Understanding the multiple ways in
which the brain uses neuronal system for pattern processing serves as a key to the design of
a neural network model. Elucidation of the sensory code by which information is
transduced into neural information has been the major goal of investigators for several
decades.
Of all the sensory modalities, the olfactory (from Latin olefacere, to smell) system is
one of the least understood in terms of anatomy and physiology as well as the least studied
for electronic modeling. What might be the simplest possible olfactory neural network
model that is capable of learning and recognizing odors is a question. R. Granger, G.
Lynch, and AmbrosIngerson of U. C. Irvine have reported a potentially useful model for
the investigation of the aggregate network learning and memory properties of olfaction in
behaving animals. This model referred to as the GLA model henceforth, deals with the
interacting structures of the olfactory bulb and piriform cortex that has been observed in rats
[7,8,9]. Computer simulations of this model have been reported to have attractive
computational properties, such as,
• the ability to identify clusters in the input cue environments at various levels of detail,
hence achieving a form of hierarchical clustering
• the extensibility to unsupervised learning
• the ability to detect weak odor obscured by a stronger one or identifying a the
component of a complex odor.
A central feature of this model is the periodic sampling of input odors at the theta
rhythm to which the network response is locked. The theta rhythm matches the rhythm for
7
both the hippocampal firing and the rate at which rats sample odors during learning. With
successive sniffs of inputs, hierarchical clustering and unmasking operations have been
reported to proceed sequentially. On the first sniff, a general class, let us call it as "parent
class', is identified. On the second sniff, a class (child class), derived from the parent class,
is identified. The child class inherits the properties of the parent class and has its own
additional features. On the third sniff, some child class, derived from the child class
discussed above, is identified and so on. In other words, on the first sniff, an abstract class is
identified. All the classes identified in the successive sniffs hang down to this class.
The GLA olfactory model can be nicely distinguished from traditional abstract
neural networks as the olfactory model more closely imitates the nervous systems. The
GLA olfactory model (a tightly coupled network) employs sparsely connected networks of
more elaborate neurons in which substantial information processing occurs within a single
neuron whereas most ANN are typically comprised of densely connected layered network of
simple neurons [4].
The GLA olfactory model shows great promise for the classification of unknown
signal vectors by hierarchical clustering. The clustering of the input signals takes place by
a method refereed to as "process of capture." However the capture mechanism in the
GLA olfactory model has some problems associated with it. R. Coultrip and R. Granger
have discussed a new model for sparse random networks with LTP learning rules that can
have the problems associated with the GLA model partially solved. This model is the
statistically similar to the Kohonen network. The CoultripGranger model is reviewed in
Chapter 2. A new model, which is a modified version of the CoultripGranger model, is
8
presented. The training algorithm for this model and the errors associated with the
reconstruction of the template are discussed at length. It is shown that the newly proposed
model is statistically similar to the Kohonen network. Further, for the proposed model, it
is shown that if an additional layer is added to the network, it can be converted to the
L VQ (learning vector quantization) network. The objective of this effort is to validate the
model by simulations and discuss the additional linear layer to convert the model to L VQ
while maintaining biological plausibility, qucker learning and electronic implementability
1.5 Review of some important topics
Before proceeding to Chapter 2 let us review some of the important topics in
neural networks which are used in the rest of the thesis report.
1.5.1 Hierarchical clustering mechanism
In a hierarchical classification the data is not partitioned into a particular number
of classes or clusters at a single step. Instead the classification consists of a series of
partitions which may run from a single cluster containing all individuals, to n clusters
each containing a single individual. Hierarchical clustering techniques may be subdivided
into agglomerative methods which proceed by a series of successive fusions of the n
individuals into groups, and divisive methods, which separate the n individuals
successively into finer groupings. Both types of hierarchical clusterings can be viewed as
9
attempting to find the most efficient step, in some defined sense, at each stage in the
progressive subdivision or synthesis of the data. With such methods, divisions or fusions
once made are irrevocable, so that when an agglomerative method has joined two
individuals, they cannot subsequently be separated and when a divisive algorithm has
made a split, it cannot be reunited. In other word, the hierarchical clustering suffers from
the defect that it can never repair what was done in previous steps. Since all
agglomerative hierarchical techniques ultimately reduce the data to a single cluster
containing all the individuals, and the divisive techniques will finally split the entire set
of data into n groups each containing a single individual, the investigator wishing to have
a solution with an optimal number of clusters, will need to decide on a particular stage to
stop.[ll]
1.5.2 Associative Learning: Kohonen Network
When a particular pattern (stimulus) is applied to a network, it is trained to
produce some particular pattern as response. This is called associative learning.
Association between the input and output patterns of the network is the fundamental
aspect of associative memory. There are several algorithms to implement associative
learning. Let's review the Kohonen's rule for associative learning which can be stated as
follows:
new old + [ () old] Wi = Wi a P q  Wi for i E X(q) (1.1 )
10
where ex is the learning rate, Wi is the ith column of the weight matrix, p(q) is the qth
input pattern (stimulus). It can be seen from the equation, Kohonen's rule tries to make
the ith column of the weight matrix as similar to the stimulus as possible. [2]
1.5.3 WinnerTakeAll (WT A) Learning Rule
This rule is an example of Competitive learning, and it is used for unsupervised
network training. Typically winnertakeall learning is used for learning statistical
properties of inputs. The learning is based on the premise that one of the neurons in the
layer, say the ith, has the maximum response due to input p. This neuron is declared the
winner.
Let W be the weight matrix (mxn). As we have assumed, let ith neuron be the
winner. As a result of this winning event, the weight vector Wi
is the only one adjusted in the unsupervised learning step. The learning rule can be stated as
follows
new _ old + [ Old]
Wj Wj ex p  Wj for j = i (1.2)
forj:t:. i
wherej = 1,2, 3, ... , m and ex is learning rate, typically decreasing as learning progresses.
The winner selection is based on the following criterion of maximum activation among all
m neurons participating in the competition.
11
W;p = max (wS p)
/=I,1" ... m
(1.3)
This criterion corresponds to finding the weight vector that is closest to the input p. Thus
the rule (1.2) reduces to incrementing Wi by a fraction of p  Wi' After the weight
adjustment, the weight vector corresponding to the winner neuron tends to be better
estimate of the input pattern in question. Weights are typically initialized at random values
and their values are normalized during learning in this method. [2]
1.5.4 Learning Vector Quantization (LVQ) Network
The LVQ network is a hybrid network which uses both unsupervised and supervised
learning to form classifications. It is a two layer network. The first layer of the L VQ
network is a competitive layer and the second layer is a linear layer.
As with the competitive network, each neuron in the first layer learns a protoptype
vector which allows it to classify a region of the input space. In the competitive network,
the neuron with the nonzero output indicates which class the input vector belongs to. For
the LVQ network, the winning neuron indicates a subclass. There may be several different
neurons (subclasses) which make up each class.
The second layer of the L VQ network is used to combine subclasses into a single
class. Let the weight matrix associated with the second layer be w2. The columns of w2
represent subclasses, and the rows represent classes. w2 has a single 1 in each column, with
the other elements set to zero. The row in which the 1 occurs indicates which class the
appropriate subclass belongs to.
(w2 ki = 1) => subclass i is a part of class k
12
The process of combining subclasses to form a class allows the L VQ network to create
complex class boundaries. A competitive layer alone has the limitation that it can only
create decision regions which are convex. But the actual clusters can be concave. [2]
L VQ Network Learning with the Kohonen Rule
The learning in the L VQ network combines competitive learning with supervision.
As with all supervised learning algorithms. it requires a set of examples of proper network
behavior:
Before learning can occur, each neuron in the second layer is assigned to an output neuron.
This generates the matrix w2. Typically equal number of neurons are connected to each
output neuron, so that each class can be made up of some number of convex regions.
Once w2 is defined, it will never be altered. Let a I be the output of the competitive
layer and a2 be the output of the linear layer i.e. the final output. The weight matrix
associated with the competitive layer (Wi) is trained with a variation of Kohonen rule. The
output of the winning neuron, say ith neuron, is set to 1 whereas the outputs of the
remaining neurons are reset to zero. Next al is multiplied by w2 to get a2 which also has
only one nonzero element, say kth element, indicating that p belongs to class k.
13
Kohonen's rule is used to improve the performance of the competitive layer of the
network. If the pattern p is classified correctly, then the weights of the winning neuron are
moved toward p.
I new = laId + [ ( ) _ laid ]
Wi Wi a P q Wi , (l .4a)
If p is classified incorrectly, then the weights of the winning neuron are moved away from
p.
lnew laid [ ( ) laId] ·f f lO Wi = Wi  a P q  Wi , 1 ak = :t: tk = (l.4b)
Equation (1.4) is the modified Kohonen Rule for LV Q networks. As a result of this learning
process, each neuron moves towards vectors that fall into the class for which it forms the
subclass, and away from vectors which fall into other classes.[2]
CHAPTER 2
STATISTICAL MODEL OF OLFACTORY SYSTEM
Before proceeding to the discussion of the clustering mechanism in the olfactory
system, it is worth taking an overview of the original GLA model.
2.1 Overview of the GLA model
2.1.1 Block diagram and basic operation:
A simplified block diagram of the GLA model is shown in Fig. 1.
The olfactory bulb receives excitatory input from the olfactory receptors in a
roughly topographic fashion: receptor cells of a particular type project axons to a
delimited area of the bulb termed as a glomerulus (from Latin glomer, ball). The
aggregate firing rate of receptors of each type is regarded as a component of an input
vector to the system. Each glomerulus is also subject to specific inhibitory feedback from
the piriform cortex and the total activity in the bulb is mediated by lateral inhibitory cells
which perform an effective normalization. The excitatory ("mitral") cells in the glomeruli
14
15
are regarded as twostate devices (either active or inactive) and they have differing
thresholds of excitation. The output of the glomeruli are thermometercoded versions of
the inhibited/normalized inputs, with signal intensity represented by the number of active
cells in the glomerulus. The normalization constrains the bulb so that only some fraction
(assumed to be 0.2) of the mitral cells can become active. The outputs of the mitral cells
project to the piriform cortex via the lateral olfactory tract (LOT).
Olfactory Bulb
LOT
....
System ~ Glomeruli
input /)
___ ~ i~~
~   _/
( : ./ /   . ~..
: c . . ",rn'hlbitory .
feedback
Piriform Cortex
Weight Matrix
~   . ~
/. ~ v
WT A (piriform) patches
       '\~/7      
System Output
Fig. 1 GLA Model: Block Diagram
The LOT forms excitatory synapses with cells in the piriform in a sparse random
fashion: connections appear to be made essentially at random with low probability
(assumed to be 0.1). Piriform cells are also twostate devices and are arranged winner
...
.;.i ...
16
takeall patches in which lateral inhibition insures that only one cell at a time becomes
active. The sparse patterns of piriform activity is the system output.
After stimulation, active piriform cells in tum inhibit the glomeruli in the bulb via
a feedback pathway. Connections are developed according to a correlational learning rule
so that the glomeruli most responsible for the firing of a piriform cell are inhibited most
strongly. Feedforward excitation of the bulb followed by feedback inhibition of the
piriform is repeated at the socalled theta rhythm, to which operation of the system is
synchronized. After the first sniff of the odor, the most active glomeruli are inhibited the
most strongly, allowing expression of secondary components of the input vector on the
next sniff, and so on in a hierarchical fashion during subsequent sniffs.
The scope of this thesis is restricted to the open loop (i.e. feedforward) system.
Learning in the model is based on coactivity: the weights of synapses from active
mitral onto winning piriform cells are incremented when learning is enabled. This
learning rule is called long term potentiation or L TP. Weights saturate, and when fully
potentiated are two or three times larger than the naive weights. Increments are coarse,
comprising 10%20% of the range between naive and saturated weights. There is no
weight decay. The net effect of learning is that the network tends to cluster its inputs: the
output codes for sufficiently similar input vectors become very similar or identical.
Moreover, the feedback inhibition of the bulb by the piriform tends to inhibit the
glomeruli not in proportion to their activity, but rather in relation to the expected activity
of the cluster mean. The net result is that a hierarchical clustering takes place during the
17
multisampling (sniffing) process, in which the initial codes indicate the broad class
membership and subsequent codes, subcluster or narrower class membership. [1 2]
2.1.2 Network description and nomenclature:
There are g inputs to the system. Corresponding to each input there is one mitral
patch consisting of n mitral cells (i.e. n levels in the thermometer code). Size of the
output of the olfactory bulb (LOT vector) is ngx 1. Let us denote the product ng by N. The
activity of the LOT vector (approximately 0.2) is denoted by r.
There are p piriform (WT A) patches. Each patch consists of h piriform cells in a
piriform patch. So the size of the system output will be 1 xhp.
The weight matrix is denoted by W and its size is Nxhp. The weight matrix
sparsity (approximately 0.1) is denoted by q. The weight increments are denoted by ~W.
The naive (untrained) weights are assumed to have a value of 1. The maximum value a
weight can attain is denoted by Wsat. For simulations ~W and W sat are assumed to be 0.4
and 3 respectively.
2.2 Overlap and Hamming distance
Overlap between two patterns (LOT vectors) is defined as the fraction of the total
number of active lines in any pattern active for both the patterns. It is denoted by p.
18
Hamming distance between two binary strings is the total number of mismatching
bits. It can be given by the number of the nonzero bits in the binary string obtained by
XORing the two binary strings in question. The Hamming distance is denoted by H.
It can be shown that H=2Nr(1p).
2.3 Probability of winning on multiple patterns in the naive network
Let us assume that a pattern P! is presented to the network and cell C I from a
particular patch wins. Let us further assume that the network is not yet trained. Now a
pattern P2 is presented to the network. There exists a finite probability that the same cell
CI wins on pattern P2 also. This probability is a function of overlap p between patterns PI
and P2 and the piriform patch size h.
By intuition, for p=O, the probability should be Ilh since patterns PI and P2 will
be totally different and all h cells will have equal probability to win. For p=l, obviously
the probability is 1 since the same pattern is presented again.
The theoretical expression for P[single cell wins on two patterns] are derived in
[12]. Simulations results agree with the theory. Fig. 2 shows the probability curves for
different values of patchsize h. It can be seen that this probability is a function of overlap
p and patch size h. The figure suggests that the piriform patch should have a moderate
number of piriform cells. Because for very small overlap also, there is certain probability
that one cell wins on both the patterns. But as h increases, this probability decreases.
2 0.9
(;
~
e<lJ 0.8 .
<lJ
.i> ii 0.7
c
'fj)
c Qj 0.6 .
:)::
C\J c.. 0.5 .~·
0
~
c 0.4 _
0
en
e
.~ 0.3
ai
u 0.2 ..
~
OJ .ic ii 0.1 .
i:L
o 0.1 0.2 0.3
N = 960, P = 30
number of trials = 3000
0.4 0.5 0.6 07
Overlap (p)
/
/
0.8
/
/
0.9
Fig. 2 P[single cell wins on two patterns (naive network)]
2.4 Process of Capture
19
h=2
h=4
h=8
h=16
As the GLA network starts learning, it acquires the ability to cluster the input
data, i.e., for similar inputs, the output codes become more overlapping although they are
not necessarily so in the naive network. This indicates that some cells capture patterns in
the cluster as learning progresses at the loss of other cells which won in the naive
network. Two types of the capture mechanisms have been identified and are defined
below. [12]
1. Capture by precedence;
' ..
~
20
2. Capture by overtake.
2.4.1 Capture by Precedence
Capture by precedence is defined as the capture by virtue of precedence in the
order in which the cells win during training.
Let us assume that in a particular patch, a cell, C 1, wins on a number of patterns
before some pattern, P, is presented to the network which would elicit the winner from
some other cell, C2, in the naive network. But the network is already trained on a number
of patterns on which the cell C 1 wins. When P is presented to the network, if the
coactivation of C 1 exceeds the coactivation of C2, C 1 will be the winner even if pattern P
is not very similar to the patterns on which C 1 won.
Capture by precedence is desired to some extent since it gives the network the
ability to cluster the input signals. However, this can create problems when it starts
capturing the patterns which are not members of the same cluster.
Simulations were carried out for different sets of parameters, i.e. different values
ofN, h. The general procedure is as follows:
1. Generate a random sparse weight matrix with sparsity q= 0.1.
2. Generate a random LOT vector with activity r=0.2 and present to the network.
Determine the winners. Train the network.
3. Generate a series of vectors with overlaps rangmg from 0 to 1 with the vector
generated in (2) and present to the naive network first. Determine the winners.
21
4. Compare the winners of (2) and (3). The number of different winners is the number of
"capturable" cells.
5. Present the vectors generated in (3) to the trained network. Determine the winners.
6. Compare the winners of (3) and (6). The number of different winners will give the
number of captured cells.
7. P[ capture by precedence] = (number of captured cells)/( number of "capturable" cells)
...
8. Repeat (1 )(7) several times and determine the average of the results.
The simulation results are shown in Fig. 3,4 and 5.
If we see the figures 3.1, 4.1 and 5.1, it can be seen that even for a very small
overlap (except 0) there are some cells captured by precedence. This number is a function
of overlap p, patch size h, as well as the size of LOT vector, N. As N increases (from
figure 3.1 to 5.1) the number of captured cells also increases. This fact is also reflected in
the probability curves shown in figures 3.2, 4.2 and 5.2 which show that the probability
of capture by precedence increases as N increases.
It can be seen that P [capture by precedence] increases as N. It can be shown that a
cell which wins on even a single pattern, if its weights are first ones to be incremented,
will capture any other pattern with nonzero overlap of the first, provided N is sufficiently
large.[12] This restricts the size of the LOT vector. We cannot increase the LOT size
indefinitely.
Figures 3.3, 4.3 and 5.3 show the probability curves for a single cell winning on
two patterns in the trained network. These curves show that as N increases (from figure
3.3 to 5.3) these curves shift upward.
0.9 ~
>.0
"0 0.8
Q)
a~. 0.7 .
(\)
t.>
.!!!. Q) 0.6
Qi g
t.> Q) '0 "00.5 .'
:It; ~ m Q)
 0..0.4 . .8
'0
c
o
~
~
IJ...
0.3 .
0.2 .
N = 960, P = 30
number of trials = 3000
~ .
Overlap (r)
' \
0.8 09
Fig 3.1 Capture by Precedence: N = 960
N = 960, P = 30
number of trials = 3000
1 ~. __ . _ ______ _ _ ~ .~ _.
0.9 ..
0.8 ..
Q)
t.> 0 . 7~
c
Q)
"0
~
~
06_
~ 0.5 _
.0
~
::l a.
(\)
..£
c..
0.4 .
03_
02 _
0.5 0.6 0.7 0.8
Overlap (P)
1/:
' r
(:
;."
0.9
Fig 3.2 Capture by Precedence : N = 960
.' h=2
h=4
h=8
_ h=16
~_.~~_ h=2
__ _ h=4
...... h=8
._ . _ h=16
22
,
N = 960, P = 30
number of trials = 3000
"Q)
c
0.9 .
~ 0.8 ... en _______ ~
E 07 . /
~ .~
o0. . :::::: 0.6 _____ ~  ~~
~c /"
c ~ 0.5 ./
~ Q)
C c 0.4 .
. ~
./
0.3 .. ./
/
0.2 .
0.1 .. ... .
o __ . ___ .. _ . _ __ ._. _____ _
o 0.1 0.2 0.3 0.4 0.5
Overlap (p)
/
0.6
/
/
0.7 0.8 0.9
Fig 3.3 Capture by Precedence: N = 960
1 ___ .
0.9 _
0.8 _
0.7 _
en
~ Q) 0.6 _
u u
~ c
ro Q)
.:: ~ 0.5 ..
o u
:;t: ~ ro 0.. 0.4 .,
B
'0 0.3 ':'
c
o
U
~
LL
0.2 
0.1
0.2 0.3
N=1920,p=30
number of trials = 3000
0.4 0.5 0.6 0.7
Overlap (P)
0.8 0.9
Fig. 4.1 Capture by Precedence: N = 1920
. . h=2
h=4
h=8
h=16
__._ h=2
. . _ h=4
.. h=8
_ . _ h=16
23
0.9
0.8
Q)
U 0.7 c
Ql
"0
Ql 0.6  U
Ql a. > 0.5
.0
Ql
' OA ::J a. ro
u 0.3 a::
0.2
o D.1 0.2
0.2
0.3
N = 3840, P = 30
number of trials = 3000
0.4 0.5 0.6 0.7
Overlap (p)
0.8 0.9
Fig 4.2 Capture by Precedence: N = 1920
0.3
N = 1920, P = 30
number of trials = 3000
0.4 0.5 0.6 0.7
Overlap (P)
0.8 0.9
Fig 4.3 Capture by Precedence : N = 1920
h=2
h=4
h=8
h=16
_ ___ h=2
_ _ _ h=4
____ _  h=8
_ h=16
24
..
~,
'" S.. l
i
;:I
~
~ •· · ~
~
' . ..
~..
)..
~
C
1
0.9 _
>
.0
"'0 Q) 0.8 :s
0. ro 0.7
u
en
~ Q) 0.6 _
u u m a5
.:: al 0.5 
o u
:;t: Q) ro C. OA .
B
'0 0.3 .
c
.2
13 0.2 .
~
LL
N = 3840, P = 30
number of trials = 3000
 ~~:... '
,/ '
0.1  ~
O ~~~~ 3~~~:~~~:=. _..__ __ ~ ___.~ _~ .___ _._. ._ .~ .
o 0.1 0 .2
1 __
o 0.1
0.3 04 0.5 0.6 0.7 0.8 0.9
Overlap (p)
Fig 5.1 Capture by Precedence: N = 3840
0.2 0.3
N = 3840, P = 30
number of trials = 3000
0.4 0.5 0.6 0.7
Overlap (p)
0.8 0.9
Fig 5.2 Capture by Precedence: N = 3840
__._. h=2
h=4
h=8
h=16
_ _ h=2
__ _ h=4
. . .. . h=8
_ _ __ h=16
25
.~ ...
..
Qj
u
Cll
Ol
c
'iii
CL
1):3 _
02
0.1
. ~ .
0 __ _
o 0.1
To summarize:
0.2 0.3
N = 3840, P = 30
number of trials = 3000
/
0.4 0.5 0.6 0.7
Overlap {P}
0.8
Fig 5.3 Capture by Precedence: N = 3840
0.9
__ h=2
h=4
h=8
h=16
26
• even for a very small overlap (except 0) between the input patterns, there exists a
finite probability of capture by precedence.
• capture by precedence is dependent on the size of the input vector (N)
• it is also dependent on patchsize h
We will be using these figures in the training algorithm for the proposed model.

27
2.4.2 Capture by Overtake
Let pattern P I elicit a winner from cell CI and patterns Pk (k = 2, 3, ... , m+ 1) elicit
winners from cell C2. Let's further assume that P I and P k each appear once per training
epoch during learning and there is no capture by precedence during first training epoch.
For cell C2 to capture pattern PI> the growth rate due to the input to C2 for PI must exceed
the growth rate of the input to C I for P I during training. Capture by this mechanism is
defined as capture by overtake.
Capture by overtake is desired to some extent as it gives the ability to the network
to capture the patterns that were not classiiied correctly during the first training epoch.
But it can be undesired if it allows capture of a pattern which is not very similar to the
patterns eliciting winners from the capturing cell. Capture by overtake can also become
more problematic if the training data is not fair, i.e. some clusters are overrepresented
whereas others are underrepresented.
Simulations were carried out for different sets of parameters, i.e. different values
ofN, h, m. The general procedure is as follows:
1. Generate a random sparse weight matrix with sparsity q= 0.1.
2. Generate a random LOT vector with activity r=0.2 and present to the network.
Determine the winners. Train the network m times.
3. Generate a series of vectors with overlaps ranging from 0 to 1 with the vector
generated in (2) and present to the naive network first. Determine the winners.
28
4. Compare the winners of (2) and (3). The number of different winners is the number of
"capturable" cells.
5. Present the vectors generated in (3 ) to the trained network. Determine the winners.
6. Compare the winners of (3) and (6). The number of different winners will give the
number of captured cells.
7. P[capture by overtake] = (number of captured cells)/(number of "capturable" cells)
8. Repeat (1 )(7) several times and determine the average of the results.
The simulation results are shown in figures 513 .
If we compare figures 6.1 , 9.1 and 12.1 (N constant, m is varied), it can be seen
".
that as m increases more and more cells are captured for all values of overlap p (except 0
and 1). For m= 1, these curves should be identical to capture by precedence curves. It
shows that overrepresentation of some cluster can create error in classification of the
patterns in other clusters.
If we compare figures 6.1, 7.1 and 8.1 (m consta.Tlt, N increases), it can be seen
that as N increases, a fewer number of cells are captured by overtake for overlap p < lim
and more number of cells are captured for overlap p> 11m. As N increases the P[ capture
by overtake] curves (figures 6.2, 7.2, 8.2) become more and more sharp. This is reflected
in the P[single cell wins on 2 patterns (trained network)] curves (fig. 7.3, 8.3, 9.3) also.
"0
~
:::J
D..
ro
u
en
Q)
u
'0
Qj
.c
E
:::J c
"iii .8
(!)
:..5..
0
c
0
t5
~
LL
(!)
.::.:.
ro
t
(!)
>
0
.>c
0.9
0.8
0.7
0.6
0.5
0.4
03
0.2
0.1
..
N = 960, P = 100, m = 2
number of trials = 5000
0 ___ ___ _ "_ ___
Q) 0.7 .
.::.:.
~
(!) o>
0.6 _
£ 0.5 .
~
:::J
D..
ro
u
D:'
0.4 .
0.2_
o 0.1 0.2 0.3 0.6
Overlap {P}
Fig 6.1 Capture by Overtake : N = 960, m =2
N = 960, P = 1 00, m = 2
number of trials = 5000
,,;
// o ~ ~ ____ ____ ._ ..... _ ... _ _ _ _ __ __ . ___ ___ __ . ___ _
Overlap {P}
Fig 6.2 Capture by Overtake: N = 960, m =2
h=2
h=4
h;::8
h=16
.. h=2
h=4
h=8
_ h=16
29
, '
0.9 .
"0
QJ
gC 0.8 .
'ril
c 0.7
Q)
'iii
0.. 0.6
o::=:::
~=
c 10.5
o QJ
.~ c 0.4 . :::
Qi 0.3 ()
QJ
OJ
c 0.2
(j)
CL
0.1
a
a
1 _ ..
QJ
~ 0.9 .
t
QJ 2i 0.8 _
.>c"
0 0.7 ..
~
::J
li 0.6 _
tu
()
!Q!2i 0.5 .
() o 0.4 
=It
cBo 0.3 _
'0
c 0.2 .. o
U
~ 0.1 ._
LL
a
0.1
0.1
02
0.2
0.3
N = 960, P = 100, m = 2
number of trials = 5000
. ' /
/ /
/ /
./
0.4 0.5
//' ~./;'
/// /
/ . / .
/ ./
// //
.. 'f
/
" /
0.6 0.7
Overlap (P)
,:/',"/
0.8 0.9
Fig 6,3 Capture by Overtake: N = 960, m =2
0.3
N = 1920, P = 100, m = 2
number of trials = 5000
/'
0.4 0.5 0.6 0.7
Overlap (P)
0.8 0.9
Fig 7,1 Capture by Overtake: N = 1920, m =2
h=2
h=4
h=8
h=16
_ ___ h=2
_ _ h=4
h=8
__ ~ h=16
30
1\ it
Q)
.0£
CO
t
Q)
>
0
n> 
~
::J a.
CO
u
(L
0.9
0.8
0.7
0.6
0.5
04
0.3
0.2 _
0.1
0
0 0.1 0.2 0.3
N = 1920, P = 100, m = 2
number of trials = 5000
0.4 0.5 0.6 0.7
Overlap {P}
0.8 0.9
Fig. 7.2 Capture by Overtake: N = 1920, m =2
0.3 _
0.2 ._
0.1 .:..
o ~~_. _ __ . _ . _ .
o 0.1 0.2 0.3
N = 1920, P = 100, m = 2
number of trials = 5000
/
/
/ 1
/ / , /
I
.. ... ~~~~~
0.4 0.5 0.6 0.7 0.8 0.9
Overlap {P}
Fig 7.3 Capture by Overtake: N = 1920, m =2
h=2
h=4
h=8
h=16
__ h=2
h=4
h=8
h=16
31
0.9
>
..Cl
a
(j) 0.8 :s
Q.
ro 0.7
u
(/) 1! (j) 0.6 .8g
ro (j)
.:: ~ 0.5
o u
:;:t: ~
:§ Cl.. 0.4
.8 a 03
c
o
U
~
LL
Q)
.:.:. ro
t::::
Q)
>
0
>
..Cl
~
:::J
Q.
ro
~
CL
0 .2
0.1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
02
0.1
N = 3840, P = 100, m =2
number of trials = 5000
/
'/
    { /
:;/
//1 /~//
,'/
o '" o o
o 'o"
o
~ 0 '0"
N
ci
N
0
'N" o 'o" '<"t
o
'o"
Overlap (P)
Fig 8.1 Capture by Overtake : N = 3840, m =2
'N0" '0" ''0""
N = 3840, P = 100, m =2
number of trials = 5000
....
0
~
.'."..
0
/,
/
/
!
j:
'I/'
'0" ''0""
Overlap (r)
co '" rci
'" 0
0 'r"
0
CD o U')
CD o
Fig. 8.2 Capture by Overtake: N = 3840, m =2
h=2
h=4
h=8
____ h=2
h=4
_ h=8
32
TI
OJ
C
0.9
~ 0.8
(j)
c 0.7
§
ro a. 0.6
o::=::
~c
c ~ 0.5
o OJ
~ c 04
~ .
Qi u 0.3
OJ en c 0.2
~
D
0.1 ~
0 .. __ ...... _ . .. . __ .... _
>
..Cl
TI
~
:::J
15..
(1J
u
'0
c
.Q
t5
(1J u:
o
1 .
0.9
0.8 .
0.7 ~
0.3 ~
0.2 ~
0.1 ~
0
0
'o"
o
0.1
N
o 'N"
6
N = 3840, P = 100, m = 2
number of trials = 5000
     "_ ..   ..  
'" ": '", '" '" CD '" r
'" 0 ": 0 '" 0 '" ci
ci "" ci 0
Overlap (P)
'r" c"'i "'"' '0" ci 0
Fig 8.3 Capture by Overtake: N = 3840, m =2
N = 960, P = 100, m = 4
number of trials = 5000
/
/
;, / ..
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Overlap (P)
Fig 9.1 Capture by Overtake: N = 960, m = 4
''c""i
h=2
h=4
h=8
. _. h=2
h=4
h=8
.. __ h=16
33
\
I
1
0.9
0.8
Q) 0.7
.::.::.
(U
1:: Q) 0.6
>
0
.>Q O.S
~
:::J 0.4 i5.
(U
u
CL 0.3
o 0.1
0.3 _
Cc l 0.2 _
iii
CL
0.1 _ ./
0.2
N = 960, P = 100, m = 4
number of trials = 5000
./
III ! ;"
/'
I'
f
I
0.3 0.4 O.S 0.6 OJ
Overlap {P}
0.8 0.9
Fig. 9.2 Capture by Overtake: N = 960, ill = 4
/
N = 960, P = 100, m = 4
number of trials = 5000
/
/ " , " /
/
" " / I ' /
/
/
o _ __ _ . __ , _ _ ~ _ __ . ___ __ _
o 0,1 0.2 0,3 0.4 O.S 06 0.7 0,8 0.9
Overlap {P}
Fig 9.3 Capture by Overtake: N = 960, ill = 4
h=2
h=4
h;::8
h=16
____ h=2
_ _ _ h=4
h=8
_ __ h=16
34
\
\
1
09
>
..0
"0 0.8
~
:J am. 0.7
()
(})
~ 0.6
() (]) ro.O£
Cl. ~ 0.5 o %!
'1:1: 0 ro 0.4 .
:§
'0 0.3
c
0
t5 0.2 .
~
IJ...
0.1
o _ _ ~~~/
a 0.1 0.2
I
/
N = 1920, P = 100, m = 4
number of trials = 5000
/
'..
/
/
/
/ "
!. '  
.__._ .. _
0.3 0.4 0.5 0.6 0.7
Overlap (r)
~
'.
0.8 0.9
Fig 10.1 Capture by Overtake: N = 1920, ill = 4
0.9
0.8 _
~ 0.7 _
<U
t
Q) 0.6_
o>
li' 0.5 _
~
:::l 0.4 . c..
<U
U 0.3_
i5:'
0.2 
0.1 .
o 0.1 0.2 0.3
N = 1920, P = 100, m = 4
number of trials = 5000
0.4 0.5 0.6 0.7
Overlap (p)
0.8 0.9
Fig. 10.2 Capture by Overtake: N = 1920, ill = 4
h=2
h=4
h=8
h=16
_. __ ___ h=2
h=4
h=8
h=16
35
....
"0
Q)
c
0.9
~ 0.8
rn
~ 0.7,.,
~
Q. 0.6
0:;::::: i:c
c 10.5
o Q)
~ c 0.4
3:
Q) u 0.3
Q)
OJ
c 0.2 'en
CL
0.1
o
o

/
0.1 0.2 0.3
N = 1920, P = 100, m=4
number of trials = 5000
0.4 0.5 0.6 0.7 0.8
Overlap (P)
0.9
Fig 10.3 Capture by Overtake: N = 1920, m = 4
N = 3840, P = 100, m = 4
number of trials = 5000
1 _ _ ~~_._~_ , _ _ .
0.9
>. .c
"0 0.8 ~ ~
:::J
Ci. C\l 0.7 _ u
(/)
~ 0.6 _
1il~
.;: ~ 0.5
o OJ
=l:t: ~
(ij 0.4 _
:§
'0 0.3 
c
0
U 0.2 _
C\l
U:
0.1 ~.
0
0 0.1
!
. I / /
0.2 0.3 0.4
/
~~ " ' ", ", \
~ ' \\ , ,
0.5 0.6 0.7
Overlap (P)
....':~ ~
\.:
~
 ~'  '
0.8 0.9
Fig 11.1 Capture by Overtake: N = 3840, m = 4
__ h=2
h=4
h=8
h=16
 ______ h=2
~ _ _ h=4
 h=8
_ h=16
36
m ~
CO
t
Q)
>
0
>
.0
~
::J
C.
CO
£
a.
Qj
U
~
OJ c
·iii
Q:"'
0.9
O.B
0.7
0.6
0.5 
0.4
0.3 
0.2 / i / 0.1 _ )
o ____ ~.~_ ..... _. ___ _
o 0.1 0.2 0.3
N = 3840, P = 100, m = 4
number of trials = 5000
~  
/
If
l'
l
0.4
 ;:.; 
/f
y 
0.5
Overlap (p)
0.6 0.7 O.B 0.9
Fig. 11.2 Capture by Overtake: N = 3840, m = 4
0.3 
0.2 
0.1 ..
0
0 0.1
/
/
0.2
/ / 1
. I
: I
/
0.3
N = 3840, P = 100, m =4
number of trials = 5000
0.4 0.5 06 0.7
Overlap (P)
O.B 0.9
Fig 11.3 Capture by Overtake: N = 3840, m = 4
h=2
h=4
h=8
h=16
______ h=2
___ ._ h=4
 h=8
_ h=16
37
Q)
r'o"
1::
Q)
>
0
>
.0
"0
~
::) a. ro
()
!!1
Qi
() a
'It:
ro
:§
a
C
02
U
ro u:
Q)
r'o"
1::
Q)
>
0
>
.0
(J.) :;
Q.
co
u
ii"
0.9
0.8
0.7
0.6
0 .5 /
/.
0.4 <0 '
/' ./
0.3 j/ I
:/
/. __ 0'///
0.2 II ." / y
N = 960, P =100, m = 8
number of trials = 5000
./ "
"
'.
"
,
"
" "
"
'. ~ \
0.1 / /
. / ~\
0 ~.. _.__ .. . . 
0 "' '" N "' (") "' ..,. "' "' "' <D "' r "' co 0 0 0 N 0 "' '" "' (") 0 .,. 0 "' =:; <D 0 " 0 co 0 '" 0 0 0 0 0 0 0 0 0 0
Overlap (P)
Fig 12.1 Capture by Overtake: N = 960, m = 8
1 ______ . _ . ___ .....
0.9
0.8
0.7 _
0 .6 
0 .5 
0.4 _
0.3 _
0.2 .
0.1 
,:/
1,'0
1,/
10 ;
II
I!
I~/
o ~:,;:Y ~ __ ;. _ .. ~. __ ~ __ ;.
o "o '
o
N o
N = 960, P = 100, m = 8
number of trials = 5000
Overlap {P}
'a" 6"'
Fig. 12.2 Capture by Overtake: N = 960, m = 8
~ \
h=2
h=4
h=8
h=16
. ___ h=2
_ __ h=4
h=8
_ . . h=16
38
.i. ...
"0
Q)
c
0.9
~ 0.8
(j)
c 0.7_
Q;
~
D.. 0.6.
0:::::
~~
c ~ 0.5
o Q)
.~ c OA .
~
Qi U 0.3
Q)
Ol
c 0.2
iii
0:::.
0.1
0
0
/
/

0.1
1//
/ /
/ ,'/
/ . /
. /
N = 960, p = 100, m = 8
number of trials = 5000
 ~.~~
/'
~/ / <
/ .'
. /
: /
0.2 0.3 OA 0.5 0.6 0.7 O.S 0.9
Overlap {P}
Fig 12.3 Capture by Overtake: N = 960, m = 8
N = 1920, P = 1000, m = 8
number of trials = 5000
1 ~ ... __.. ...... __ _ ~_ .. . ___ ~ __ ~ ___ ... . ~ ... ~ _ .. _ __ ... ~ __ .. ~_ .. __ ~. ~ .~~
0.9 ~
.>a
"0 O.S _
~
::J
E. C\l 0.7 ~
U
<Jl
~ 0.6 ~
B~
_~_C \t l 0.5 ~
o Q)
'It 1)
c
o
U
C\l u:
OA ~
0.3 ..
0.2 ..
o 0.1
/
. / ! / '.
\
0.2 0.3 OA 0.5 0.6 0.7 O.S 0.9
Overlap {P}
Fig 13.1 Capture by Overtake: N = 1920, m = 8
h=2
h=4
h=8
h=16
__ _.. _~ h=2
.. h=4
 h=8
_ h=16
39
0.9
0.8
(jj' 0.7
.0<: co
t a.> 0.6
>
0
> 0.5 .0
~
::l Q. 0.4
co
u
CL 0.3
0.1
a
a 0.1
I
1/ ,,'
N = 1920, P = 100, m = 100
nu mber of trials == 5000
0.2 0.3 0.4 0.5 0.6 0.7
Overlap (P)
0.8 0.9
Fig. 13.2 Capture by Overtake : N = 1920, ill = 8
N = 1920, P = 100, m = 8
number of tnals = 5000
1 . _ __ ._. __________ _ . ==,;::;~.=
..... """'/ :/
/" / ,,'
/ /j
/ / ~;:
/ / / , /
/ . .
/ " /
/ / ," /
/ I ,'.
0.. 0.6 _ / / : I
~:? . / ,I,' .'
§ O 5 / ,.1 ~O ~
0.9 _
0.8
0.7 _
CJ) a.>
c c 0.4 _ '3:
/ .
/ ," /
I ' !
• I
0.3 _
. ./
o 2 ~ , /
/
0.1 ' /
0 _ _· _ _. _ _ _
o 0.1 0.2 0.3 0.4 0.5 0.6 0.7
Overlap (P)
0.8 0.9
Fig 13.3 Capture by Overtake: N = 1920, ill = 8
h=2
h=4
h=8
h=16
_ h=2
_ h=4
__ h=8
 h=16
40
0.9
>
.0
"0 0.8
~
:::J
0.. CIl 0.7 u
en 1! 0.6.
U Ql
ro~
0.. t:: 0.5 .
o~
=It 0 m 0.4 _
:§
c
o
t5
CIl u::
0.3 _
0.2 
0 .1
o 0.1
1   _. 
0.9 _
08 
Q) 0.7 _
.:£ ro
t:: 06 Ql 
>
0
> 0.5  .0
~
:::J 0.4 _ 0..
ro
U
Ci:' 0.3 _
0.2 
/
/
0.2 0.3
N = 3840, P = 100, m = 8
number of trials = 5000
0.1 0.5 0.6 0.7
Overlap (p)
0.8 0.9
Fig 14.1 Capture by Overtake: N = 3840, m = 8
N = 3840, P = 100, m = 8
number of trials = 5000
0.5 0.6 0.7
Overlap (P)
0.8 0.9
Fig. 14.2 Capture by Overtake: N = 3840, m = 8
h=2
h=4
h=8
h=16
_ ____. h=2
h=4
h=8
h=16
41
0.9 .
g 0.8
V;
E 07 .
<J.>
~ a.. 0.6
o::=::
~= c ~ 0.5 .
o <J.>
(j) C
.S: 0.4 .
~
Q)
0 0.3
~
OJ c 02
'00
CL
0.1
o _ _ .
0
To summarize:
0.1
/
/
0.2 0.3
N = 3840, P = 100, m = 8
number of trials = 5000
,   
0.4 0.5 0.6 0.7 0.8 0.9
Overlap (P)
Fig 14.3 Capture by Overtake: N = 3840, m = 8
h=2
h=4
h=8
h=16
42
• as m increases, more and more cells are captured for all values of overlap p between
• for the same value of m,
 as N increases, capture by overtake reduces for p < 11m
 as N increases, capture by overtake increases for p > 11m
 p = 11m (approximately) is the point of intersection of these curves
In other words, the curves of probability of capture by overtake become steeper as
N increases.
 as N increases, the patch size h has less effect on capture by overtake, for very
large N, capture by overtake is almost independent ofN.
43
2.5 Review of the CoultripGranger Model
Robert L. Coultrip and Richard H. Granger [10] have suggested a model for
sparse random networks with L TP learning rules. The main feature of this model is that it
implements linear separability by statistical means.
2.5.1 Network Description
The basic network is shown in figure 15. The model consists of a single layer of
,
)
~
cells that is referred as the principal layer. The excitatory cells in the principal layer •
(principal neurons) receive sparse random innervation by two distinct afferent pathways,
a fixed pathway and a plastic pathway. Axons on the principal neurons form the outgoing
response pathway. Input pathways are assumed to have some fixed percentage of active
lines when the input is present. The inputs to the fixed and plastic pathway may not be of
the same dimensions. Although two sets of WT A patches are shown in the block
diagram, there is only one set of WT A patches. During training mode (to be discussed
shortly) it is used by fixed pathway whereas during performance mode it is used by
plastic pathway.
44
p patch I patch 2 patch 3 patch P Fixed pathway
:J(
_ .0 1 WfA
 .fh  t[. .
J
="'l=> JO',h 1 p~'h 2 1"toh 3 patchp
Plastzc pathway
Fig 15 CoultripGranger model: Block diagram
2.5.2 Working of the network
The working of the network can be divided into three modes:
1. Preset mode;
2. Training mode;
3. Performance mode.
45
1. Preset Mode
In the preset mode, the input is impressed on only the fixed pathway. WTA
competition selects one cell per patch to fire. The number of distinct input stimuli Jj is
assumed to be some finite number Nf These stimuli on the fixed pathway produce
responses rj .
Since the weights in the fixed pathway are never altered (hence the name fixed
pathway), whenever Jj is presented to the it, it will always produce the response rj . In
other words, for every Jj there is a unique rj which represents the same information as Jj
but with different dimension.
2. Training Mode
The general training algorithm for supervised learning of the CoultripGranger
model is described as following:
The network is presented with (x,y) pairs where x is training datum and y is its
class. The network learns the association. The training data (x) is presented to the plastic
path. The class information (y) is presented to the fixed pathway. So we have to train with
fixed path present, to direct the net what class the datum is supposed to be trained into.
Let there be Nj possible stimuli representing classes. These stimuli are the class
information to be presented to the fixed pathway. One of these NI stimuli is presented to
the NJ lines constituting the fixed pathway and WT A action selects one neuron from each
patch. Then the stimulus Pj (training pattern) is presented to the N2 lines constituting the
plastic pathway, and the synapses between active plastic pathway fibers and winning
46
principal neurons are strengthened via long term potentiation, L TP. That is, f j alone
decides which cell in each WT A patch can learn and Pj alone decides which synapses on
the winning cell are potentiated. This is desirable because this guarantees that the
changing plastic path synapses do not alter the winners selected for some fixed path input
fj over time. Since inputfj always produces output ri , we can think of this as training the
net on an associative inputoutput pair (Pj , r).
3. Performance Mode
In the performance mode, an input is presented to the plastic path and nothing is
presented to the fixed path. The output of the WT A patches will indicate the code of the
class to which this input belongs.
2.5.3 Advantages and disadvantages of the network
Since the fixed pathway determines the cells to be trained, there is no undesired
capture by precedence. But still there will be some undesired capture by overtake in the
performance mode if the training data set is not fair.
Also, since class information is presented to the fixed pathway, a priori
knowledge of the classes/clusters is necessary.
47
2.5.4 Similarity and difference with Kohonen network
In a Kohonen network, the weight vector becomes more and more similar to the
input vector as the learning progresses. In other words, component of the weight vector
along the input vector increases as the learning progresses thereby reducing the size of the
error or difference vector. In CoultripGranger model, the component of the weight vector
along the input vector increases as the network is trained but the size of the
error/difference vector remains same. Still the angle between the input vector and the
weight vector reduces.
Like the Kohonen network, the class information is needed before training can be
initiated. However the learning rule (L TP) is totally different from the Kohonen rule.
2.6 Proposed Model
In the proposed model, an attempt has been made to overcome the disadvantages
of the CoultripGranger model.
F or our purpose, we have proposed two changes for an electronic implementation
of a modified CoultripGranger model. The modifications in the model are as follows:
1. The fixed and plastic pathways are assumed to be exactly identical. The sparsity of
these weight matrices is assumed to be q.
2. The same input is presented to both the pathways. A fraction r of the N input lines
(i.e. Nr lines) are active at any time the input is present.
48
The network is shown in Fig. 16. In this network, unlike the CoultripGranger
model, there are two sets of WT A patches.
patch 1 r> patch 2 patch 3 patch p FlXed patln1/ay
I !
I
N I
I
I
I
I I I
I 1 patch 1
~
patch 2 patch 3 patchp
Jl~ ~ ~lJ ~~~~~ ~~
i l_ ~[/U_, {L_,_  r~~~~ WTA WTA I WTA · ...... WTA tr tfLtfL_.• ~.. 'if
Plastic pathway
Fig. 16 Proposed model: Block diagram
2.7 Proposed Training Algorithm
The input signal is presented to both, fixed and plastic, pathways. The winners of
both the pathways are determined. The outputs of the winning cells are set to one and
other outputs are set to zero. The output binary strings of both the pathways are XORed
49
and sum of the bits in the resultant string is computed. Let 'z' be the sum of the bits in the
resultant string.
We define a cluster as a set of vectors having a hamming distance less than or
equal to some predefined distance from the center of the cluster. In our network, the
vector presented first becomes the center of the cluster. Let H be the allowable hamming
distance from the center for any vector to belong to the cluster, i.e., the maximum number
of mismatching 1 's between the center of the cluster and any other vector in the same
cluster is H12. This condition sets the minimum required overlap of active lines between
the center of the cluster and any other vector in the cluster. Let the minimum required
overlap be 'Pmin'.
H
Nr
_=2=_ = 1~
Pmin = Nr 2Nr (2.1)
As we have seen earlier (section 2.3) , for the naive weight matrix, there exists a
finite probability, Pn, that a single cell wins on two different patterns having an overlap
Pm in between them. P n is a function of the patch size h as well. The larger the patch size,
the lower is P n' From this information we can determine the overlap between the outputs
produced by the two vectors when presented to the naive weight matrix.
output overlap
number of matching I' s between output strings
number of patches
number of patches x Pn
number of patches
50
The vectors producing output codes having an overlap greater than or equal to P n
with the output code produced by the center of the cluster, by definition, belong to that
cluster.
Now consider the plastic pathway. Let's assume that the network is already
trained on one of the input vectors. Now a new vector is presented to the network. There
are two possibilities: either this vector belongs to the cluster defined by the first training
vector or it does not. To detennine this, we will have to set some criterion based on the
outputs of the two weight matrices (fixed and plastic). We know that there exists a
probability that a single cell wins on two patterns after the network is trained on the first
pattern. Let this probability be Pt for the overlap between the two input patterns equal to
Pmin which is defined above. This probability is greater than P n (defined above) because
of the fact that some of the cells are captured by "capture by precedence." As we have
seen, The probability of capture by precedence is a function of the overlap between the
two patterns. Let P[capture by precedence] = Pcp for P = Pmin' So the number of captured
The network does not know the overlap between the two input vectors, but it can
calculate the number of captured cells. If the output strings of the two WT A patches are
XORed, the number of nonzero bits, Z, is twice the number of captured cells. Which
gIves
51
number of captured cells = z/2
This number is a function of the overlap P between the two input vectors, the
present input and the input on which the network was trained. If zl2 :::: pXPcp(1Pn), it
indicates that P < Pmin and the vector belongs to the cluster and training is needed. If zl2 <
pxP cp(lP n), it indicates that P mayor may not be greater than Pmin (Refer to figures 3.1,
4.1 and 5.1). This indicates that the network mayor may not have been trained on the
same or similar pattern.
To determine, if the network has already been trained on the present pattern, the
activations of the winning cells producing matching I' s should be checked as discussed
below. The winning cells producing matching 1 's are termed as "common winners".
Out of p winners, let z' winners be common. For the fixed pathway, let the
activations of the p common winners be Xjh xf2.' ... , xjz" For the plastic pathway, let the
activations of the p common winners be Xpb xp2' ... ,xpz "
The differences between the activations x pi and xfi are calculated. If the network
has already been trained on the present input or on the center of the cluster it belongs to,
all the coactive synapses must be trained at least once, i.e., if the network has been trained
on the same input,
Xpi  X{z :::: ~ W . xfi ' i= 1, 2, ... , z', (2.2)
or if the network has been trained on the center of the cluster the present input vector
belongs to,
Xpi  xfi :::: Pm in . ~W . Xfi' i= 1,2, ... , z '. (2.3)
52
If the above condition is satisfied, training is not needed. Otherwise training is needed.
On the other hand, if the sum of the bits in the ANDed string is less than PI m.
obviously, training is needed.
The training algorithm discussed above is also shown in the flowcharts (fig. 17,
18 and 19). Fig. 17 shows the general training algorithm. Fig. 18 is the expanded version
of the flowchart in fig 17. Fig. 19 is the expanded version of the function TRAIN in fig.
18.
Take the fi rst input
' 11  ,
I L. _ . . __ _
Is the
network
trained on the
present input or the
on the center of the
cluster it
belongs to?
Take next input
. ~ " A  .....
I
Yes
INo ___.. _ _L ~ __ .__ _ Train the network
Fig. 17 Training algorithm: Simplified flow chart
T
Present the input to the
fixed pathway and
determine the winners
y
Set outputs of the
winning cells to I and
reset other outputs to O.
Let this binary string be
'A'
v
1RAINING ALGORITHM
Take the first input
'Y _
C = A exor 8
. _ _ 'L __ __ .
z = no. of nonzero bits in C
 y .
Present the input to the
plastic pathway and
determine the winners
v
Set outputs of the
winning cells to I and
reset other outputs to O.
Let this binary stn ng be
'8'
v
53
Take the next input
A.
Train the
network
~ It. 
Fig. 18 Training algorithm: Flow chart
/
/ Check for all
/ k's and C; 's
TRAIN
START
y
Take the input signal
's'
~ ~ ~~ ~ ~ ~ Y ~ ~
Present the input to the fixed
pathway. Determine the
winning cells. Let these cells
be c = [c\ Cz ..... cm]
 Y .
Present the input to the
plastic pathway
k = 1,2, ... , N ..
\ i = 1, 2, ... , m /
."'~ ~ ~_/
wk.c;= wk,c;+ ~w;
if (Wk.c;>Wsat) Wk,c;= W sat;
Fig 19 Training algorithm: TRAIN routine
2.8 Advantages of the proposed model over the old model
1. Capture by Precedence
54
In the proposed model, the cells to be trained are always determined by the
winners in the fixed pathway (naive weight matrix), there will not be any undesired
capture by precedence.
55
2. Capture by Overtake
The proposed model is not affected by capture by overtake since before training
the network, we are checking if the network was trained on similar pattern. If it has been
trained, we are not training it again by rule.
3. A priori knowledge of the classes is not needed
Since we are presenting the same data to both the pathways, a priori knowledge
of the classes/clusters is not needed.
3. Control over cluster dimension
Since the cluster dimension is user defined, we have a control over the quantum
for sampling the hypersurface.
4. Extensibility to LVQ Network
An additional layer of network, can extend this model to L VQ network, thereby
enabling to form concave or linearly inseparable regions on the hypersurface. This is not
true for the CoultripGranger model since that model is not free from capture by overtake.
In performance mode it can cause errors.
2.9 Simulations
2.9.1 Experiment 1
In this experiment, it is assumed that we know the ideal data set. In other words,
we know the cluster centers. N, p, h, Pmin are set to 960, 30, 16 and 0.9 respectively and
following procedure is followed:
56
1. The network was trained on 10 different randomly generated nonorthogonal patterns.
2. Test data consists of (i) 195,000 randomly generated uniformly distributed patterns
and (ii) 500 patterns from each cluster. The picture will look somewhat like figure 20.
IX
X X
X
X
X
X
)<
i
X
X
X
X
)(
X\ \ " \ , , X·
)(
X
X
X
X
X
Fig. 20 Experiment 1
For simplicity, only three clusters are shown. The solid circles show the cluster
centers (training data set). The x's show the randomly generated and uniformly
distributed data set and the hollow circles show the vectors from that cluster.
3. The error is calculated. There can be two types of error: (i) the vector not belonging to
the cluster is classified as belonging to the cluster; and (ii) the vector belonging to the
cluster is classified as not belonging to the cluster.
Only the second type of error is found. There is 0.14% error rate.
57
2.9.1 Experiment 2
In this experiment, unlike the experiment 1, ideal traming set is not used. N, p, h,
Pmin are set to 960, 30, 16 and 0.8 respectively and following procedure is followed:
1. The network was trained on four training patterns from each of the ten clusters. But
for training Pmin is calculated as follows: Pmintrain = Pmin +( 1Pmin)/2. In other words,
the training data set is chosen from a tighter cluster within the cluster defined by Pmin.
2. Test data consists of (i) 195,000 randomly generated uniformly distributed patterns
and (ii) 500 patterns from each cluster. The picture will look somewhat like figure 21.
x x
x
x
Fig. 21 Experiment 2
Again for simplicity, only three clusters are shown. The solid circles show the
cluster centers (training data set). They belong to the shaded region within the cluster,
i.e. they have a tighter distribution within the cluster. The x's show the randomly
58
generated and uniformly distributed data set and the hollow circles show the vectors
from that cluster.
3. The error is calculated. There can be two types of error: (i) the vector not belonging to
the cluster is classified as belonging to the cluster; and (ii) the vector belonging to the
cluster is classified as not belonging to the cluster.
Only type (i) error is found. Patterns in the vicinity (p=O.78) are identified as
belonging to the cluster resulting in 0.0875% error rate.
2.9.3 Conclusion
From the simulation results, we conclude that the network clusters the input data.
Since the cells to be trained are determined by the fixed pathway, there is no undesired
capture by precedence. Also by tightening the domain of training data set and the region
of influence of each training vector, we ensure that there is no undesired capture by
overtake. Since the same information is presented to both the pathways. the class
infonnation is not required. The user is required to define the cluster size thus has a
control over it.
59
2.10 Reconstructing the template
R. Meyyappan [10] has discussed the concept of reconstructing the template by
backpropagation through the trained weight matrix. But as the network is trained on more
number of patterns, it becomes difficult to determine the threshold.
Since, in the proposed model, we have two weight matrices available, an attempt
is made to reconstruct the template by backpropagation through fixed weight matrix. As
there are no weight changes in the fixed weight matrix, it seems to be easier to determine
the threshold.
The expected number of the coactive synapses on the winner cells, j..l.w, and the
deviation <Jw are a function of patchsize h. They are not readily determined in closed
form [10,12]. But from simulations, we can determine j..l.w and <Jw for different values of h.
For N=960, following values of j..l.w and <Jw are obtained:
h j..l.w crw
4 20.55 2.9
8 22.25 2.6
12 23.05 2.5
16 23.75 2.4
20 24.10 2.3
60
Fig. 22 shows the weight matrix and the LOT vector. For simplicity, all the active
lines of the LOT vector are grouped together.
LOT
vector 1
noncoactive
reglOn
I
I
  y  
coactive region
Weight matrix
2 p
, I
I I ;i
II iI · .. . .I i,iI .
I . 11 I
I I ,I.
Q.
...  _. . . +  I:.  .   qN~w I 1 r  IT  ;  ~
:: r.           ::~l)~:_! ~w l
winner 2 wznner P
hp ~,
Fig. 22 Backpropagation through fixed pathway
The sparsity of the weight matrix q is 0.1 and it will be maintained on the winner
cells also, although it might be high in some region and low in another. Sparsity of the
weight matrix (winner cell) in both the coactive and noncoactive regions is defined as
follows:
Sparsity of the weight matrix (winner cell) in coactive region = llw = qc (2.4)
Nr
Sparsity of the weight matrix (winner cell) in noncoactive region
qN  ll w
N(1r) =qll
(2.5)
61
When the signal is backpropagated through the weight matrix, all 'p' wmner
columns get added. The distribution of the synapses is binomial. Thus the expected value
of the sum of the weights in one column of the weight matrix is readily calculated.
In coactive region,
E [sum of weights in one rowJ = flc = pqc = P flw
Nr
In noncoactive region,
E [sum of weights in one row] = fl tI = pq n = P ~1':r)
(2.6)
(2.7)
(2.8)
(2.9)
Since the network does not know which part is coactive and which region is noncoactive,
it is advisable to set some threshold and call the subthreshold sums as belonging
to the nonactive region and set them to zero and call the others as belonging to the
coctive region. In doing so, we are losing some data and adding some noise due to the
variances associated with the expected values. To minimize this error, we should choose
optimum decision boundary between fln and flc as the threshold. This decision boundary
is given by solving the equation:
(2.10)
62
where Pn is the probability of a bit belonging to the noncoactive region and Pc is the
probability of a bit belonging to the coactive region.
Let's assume that only 40% of the mitral patches are active at any time. Now in
active region Pn is equal to Pc. Now the decision boundary can be given by solving the
following equation for d.
(2.11 )
For patchsize h = 16, and number of patches p = 30, following results are
obtained analytically. Results obtained by simulations agree with the theoretical results.
)..tc = 3.71
)..tn = 2.82
d=3.61
NOTE
If)..tn and )..tc are to be separated by CJc +CJn, (this is equivalent to 3dB SNR) we can
determine the minimum number of patches p required to achieve this by solving the
following equation for p
(2.12)
If we substitute values of )..to )..tn' CJc' CJn from equations (2.6), (2.8), (2.7) and (2.9)
respectively and solve for p, we get p = 414.
1 63
If we threshold the sums by this decision boundary, we will be losing some
information at the same time removing some noise. The subthreshold sums are reset to 0
and superthreshold sums are set to 1. This results in two types of error: dropped and
residual.
The probability of losing information in the coactive region will be the probability
ofthe sum being subthreshold. Let us refer to these bits as "dropped bits".
(2.13)
where d' is the nearest integer to d.
Some noise in nonactive region is removed. Still there are some bits above
threshold which will cause error. Let us refer to these bits as "residual bits".
P [residual bits] = Pr = 1 t,(~)cq"y (1 q,,)Pi (2.14 )
If all the bits corresponding to one mitral patch are added, we will get some
distribution of the sum.
In nonactive region, there will be only residual bits. Let's denote the sum in the
noncoactive region by Sn and number oflevels in one thermometer by n. We can write,
(2.15)
Expected value and variance of Sn are given by
64
(2.16)
respectively.
Let's denote the sum in active region by Sa. We can write
(2.17)
The expected value and the variance of Sa are then given by
(2.18)
(2.19)
respectively.
N ow again we have to decide some decision boundary between ).!Sa and ).!Sn. If we
consider (JSa and (JSn approximately equal, the decision boundary is given by
ds = f.! Sa + ).! Sn + (J sn~ Sa In( P Sn )
2 ).! Sa ).! Sn PSa
(2.20)
where PSn and PSa is the probabilities of occurrence of sums in nonactive and active
regions respectively.
If we threshold the sums using this decision boundary and reset the subthreshold
sums to zero, we can calculate the expected value and the variance of the new distribution
similarly. Let's denote the expected value and the standard deviation of the sum in the
65
active region after thresholding by llsa' and GSa' respectively and those in nonactive
region by llsn' and Gsn' respectively.
Now we can write the signal to noise ratio as:
SNR = ll' Sa '
~ ,2 r 2 r 2 ll Sn +G SII +G Sa
(2.21 )
For the set of parameters described above, (p = 30, h = 16) we get following
results analytically:
flSa = 6.64 GSa = 1.97
llSn = 4.96 GSn = 1.85
llsa' = 5.62 GSa' = 3.43
flsn' = 2.58
SNR = 1.0277 = 0.2375 dB
So we conclude that backpropagation through the naive network will not produce
reliable results.
66
2.11 LVQ Network
To combine the clusters identified by the proposed model into a complex cluster
the network can be extended to LVQ network (refer section 1.5.4) by adding one more
layer of network. An LVQ network is a two layer network. The first layer of the LVQ
network is a competitive layer. The second layer of the LVQ network is used to combine
different subclasses identified by the first layer into a class. This layer can be as simple as
a programmable logic array (PLA).
Let us assume that the clusters created by our network are subclusters. We can
form clusters of different sizes and shapes if we could somehow combine these clusters.
A PLA can serve this purpose.
Fig. 23 shows the block diagram of the LVQ layer. The input to the LVQ layer is
the output of the piriform (or WTA) patches which is a binary string. Since only one bit
out of h bits corresponding to a piriform patch can be active at any time, it is advisable to
use a binary encoder that can reduce the number of bits to 10g2h where h is the number of
bits in one patch. Let the output of the binary encoder be B=[B\ B2 B3 ... BvJ, where
v=plog2h.
The output of the binary encoder is fed to the PLA. There are two programmable
arrays, AND array and OR array.
67
The input of each AND gate is the binary code of some subcluster. There can be
2v such codes. Let u be the actual number of AND gates (or the codes used) out of2v. Let
us denote the outputs of the AND gates by A=[A\ A2 A3 ... Au].
2 3
WfA patches
b. . ~. tc    ..... _ ...... ... .
\/ "\J _ V
2 3
Binary Encoder
p
n:_\;
I
p
.... _.
"
  . . _. .  '"    ~~
Arr ay of connections
 'y~
\' _ /J.: ~ j . 
. ~\~ _ Array of connections
j
, )
~/
Q,
.......... ,/"  
Output Code
Fig. 23 L VQ layer: Block diagram
Let the AND array be denoted by R. R can be written as follows:
RlvO R1v1
R2vO R2v\
68
The elements of R are either 1 or 0 but RxyQ and Rxyl both cannot be 1 at the same time.
Now the output of the xth AND gate can be written as
(2.22)
When all the inputs of the AND gate are asserted, the output of the AND gate will
enable a corresponding row in the OR array. According to the bits stored in the OR array,
network output appears at the output of the OR gates. Let the output code consist of t bits.
In other words, there are t bits stored in each row of the OR array. Let the bits stored) in
the OR array be denoted by O.
01I 0 12 °11
0= °21 °22 °Zl
°ul Ou2 Out
Let us denote output code by Q, where
Now we can write Qyas
So when output of xth AND gate goes high, xth row of the matrix 0, denoted by
Ox, will appear at the output since all other AND gates have low output.
Now the question is how do we combine different clusters. Let us assume that we
want to combine cluster x and cluster y. If they are combined they should produce the
69
same output code. This can be achieved by making row Ox and row 0 ); of the matrix 0
identical.
By extending the network to L VQ network, we can form clusters of various
shapes and sizes on the hypersurface. By forming concave surfaces, we can solve linearly
inseparable problems.
CHAPTER 3
CONCLUSIONS AND FUTURE PROSPECTS
A new model, which is a modified version of the CoultripGranger model, is
proposed. The goals while developing the new model were to retain the desirable
properties of the original model including hierarchical clustering, fast and rapid learning,
a hardware compatible implementation and to control the capture mechanism. We have
also addressed the issue of a dimensioning methodology to determine the dimensions of
the weight matrix. An attempt was made to reconstruct the template by backpropagation
of the output code through the naive weight matrix. Finally we have introduced an
additional layer to the network which converts the proposed network into an LVQ
equivalent. The L VQ network has the capability of forming concave clusters and thus
solving linearly inseparable problems.
Simulation results have shown that the proposed model clusters the input data.
Since the basic learning rule is the L TP rule with coarse weight increments, the property
of rapid learning is maintained. Also the proposed training algorithm ensures that there is
70
71
no undesired capture by precedence or capture by overtake. Unlike the CoultripGranger
model that requires the class information a priori, the proposed model does not require
the class information. Also by making the two pathways exactly identical, it becomes
easier to implement these weight matrices electronically. In the proposed model, the user
has to define the cluster size, thus has direct control over the quantization factor in
sampling the hypersurface.
It has been identified that the size of the input vector (LOT) cannot be increased
indefinitely (Refer to section 2.4.1). Still we have no control over LOT dimension since
the number of inputs is application specific whereas the number of mitral cells in each
mitral patch (or the number of levels in the thermometer code) is determined by the
fidelity requirement. The number of LOT lines is a product of the number of inputs and
the number of levels in the thermometer code. However if the dimension of LOT vector is
too large, one can adjust the weight increment (i1w) accordingly (Refer to section 2.4.1).
The number of cells in a piriform patch should be moderate in order to have less overlap
between the output codes of the patterns having less overlap with each other. A patch size
of 16 is a good choice since a further increase in patch size offers little benefit, i.e. the
expected number of active synapses on the winner cell does not increase considerably.
Also for reconstructing the template with maximum fidelity, it is necessary to have
appropriate number of piriform patches. Also a proper choice of number of piriform
patches ensures a sufficient hamming distance between the output codes of the clusters.
72
These four items (number of input signals, fidelity requirement, piriform patch size and
number of piriform patches) determine the dimensions of the weight matrix.
It has been shown that backpropagation through the naive or fixed weight matrix
will not produce reliable inhibitory feedback. But a heuristic approach, that exploits the
knowledge of the structure of the thermometer code but not mentioned in this thesis due
to lack of strong theoretical background, shows some promise in reconstructing the
template.
It has been shown that a PLA can be added to the network to achieve an LVQ
network. Thus we can form complex clusters of various shapes, including concave. The
L VQ network can solve linearly inseparable problems.
Let us consider a practical example of an ECG classifier. Let us assume that
applied time series consists of 250 samples starting at QRS minus fifty and the number
of possible ECG classes is limited to less than 30 while the fidelity requirement is 44 dB.
The 44 dB fidelity requirement sets the number of levels in the thermometer code
(number of mitral cells per patch) to 128. The number of inputs is 250, hence 250 mitral
patches are required. These two factors alone determine the number of LOT lines, i.e.
N=250xI28. From section 2.2, we know that the number of piriform cells per piriform
patch should be of moderate size with h equal 16 being an acceptable choice.
The factors to be determined are the number of piriform patches and the hamming
distance between two classes. Since the number of possible classes is limited to 30,
assuming that the clusters are distributed uniformly in the input space, we can determine
1 73
an average hamming distance between the classes. But for determining the number of
piriform patches, there is no known optimum solution. Thirty patches is a good starting
point since it ensures that the dropped bit error while reconstructing the template is
negligible. After a number of iterations we can arrive at an acceptable solution. If the
number of patches required are more than the actual number of patches fabricated on an
Ie, two or more Ies can be connected in parallel. But if the application requires larger N
than fabricated, we cannot connect Ies in order to achieve larger N. Further research is
needed to determine an optimum size of the weight matrix. The proposed model has the
following limitations and further research in those areas is needed.
Even though the simulation results show clustering properties of the network, no
attempt is made to show hierarchical clustering. Before hierarchical clustering we must
ensure that the template is reconstructed with sufficient fidelity. In the proposed training
algorithm, the user has control over the cluster size, i.e. he has control over the
quantization with which the hypersurface is sampled, although how to use this effectively
is still to be discovered. Taking into consideration the large number of possible output
codes, it is impossible to implement a PLA that includes all these combinations. Some
statistical experiments should be carried out in order to optimize the size of the AND
array. Also training algorithm for the PLA needs to be developed.
REFERENCES
1. Zurada, "Introduction to Artificial Neural Networks," West Publishing Company,
1992.
2. M. T. Hagan, M. Beale, H. B. Demuth, "Neural Network Design," ,1995.
3. M. Ramanathan, "Statistical Modeling of an Electronic Olfactory," Master Thesis,
OSU, 1995.
4. S. Patil, "VLSI Implementation of Olfactory Cortex Model," Master Thesis,
Oklahoma State University, 1992
5. J. Choi, B. L. Sheu, "A HighPrecision VLSI WinnerTakeAll Circuit for Self
Organizing Neural Networks," IEEE Journal of Solid State Circuits, Vol.
28, No.4, pp. 576583, 1993.
6. K. Wagner, T. M. Slagle, "Optical Competitive Learning with VLSI Liquid
Crystal WinnerTakeAll Modulators," Applied Optics, Vol. 32, No.8, pp.
14081435, 1993.
7. C. A. Mead, X. Arreguit, J. Lazzaro, "Analog VLSI Model of Binaural Hearing,"
IEEE Transactiona on Neural Networks, Vol. 2, pp. 230236, 1991
8. S. V. Gala. "System Level Integration of Olfactory Cortex Model," Master Thesis,
Oklahoma State University, 1993.
9. C. A. Mead. Mahowald, "A Silicon Model of Early Visual Processing," Neural
Networks, Vol. 1, pp. 9197, 1988.
10. R. Coultrip, R. Granger, "Sparse Random Networks with LTP Learning Rule
Approximate Bays Classifier Via Parzen's Method," Neural Networks,
Vol. 7, No.3, pp. 463476, 1994.
11. B. Everitt, "Clustering Analysis," Third Edition, Edward Arnold, 1993.
12. P. Shoemaker, "Analysis of Elementary Clustering Mechanisms in a Model of
Early Olfactory Processing," 1995.
74
~
75
13. J. AmbrosIngerson, R. Granger, G. Lynch, "Simulation of Paleocortex Performs
Hierarchical Clustering," Science, Vol. 247, pp. 13441348, 1990.
14. J. AmbrosIngerson, "Computational Properties and Behavioral Expression of
CortexPeripheral Interactions Suggested by a Model of the Olfactory Bulb
and Corex," Ph.D. Dissertation, University of California, Irvine, 1990.
15. G. Lynch, R. Granger, "Simulation and Analysis of a Simple Cortical Network,"
Psychology of Leaming and Motivation, Vol. 23, pp. 205241, 1989.
16. J. G. Taylor, "Model of Olfactory Learning," Department of Mathematics, King's
College, London, pp. 277280.
17. Z. Li, J. Hopfield, "Modeling the Olfactory Bulb and its Neural Oscillatory
Processing," Biological Cybernetics, Vol. 61, pp. 379392, 1989.
18. Z. Li, "A Model of Olfactory Adaptation and Sensitivity Enhancement in The
Olfactory Bulb," Biological Cybernetics, Vol. 62, pp. 349361, 1989.
19. C. Linster, C. Masson, M.Kerszberg, L. Personnaz, G. Dreyfus, "Computational
Diversity in the Formal Model of the Insect Olfactory Macroglomerulus,"
Neural Computation, Vol. 5, pp. 228241, 1993.
20. M. Schwartz, "Information Transmission Modulation, and Noise," Second
Edition, McGrawHill Book Company, 1970.
21. J. E. Freund, "Mathematical Statistics," Fifth Edition, Prentice Hall, Englewood
Cliffs, 1992.
VITA
RAHUL RA TlLAL SHAH
Candidate for the degree of
Master of Science
Thesis: A STATISTICAL MODELING OF AN OLFACTORY SYSTEM FOR
ELECTRONIC IMPLEMENTATION
Major Field: Electrical Engineering
Biographical:
Personal Data: Born in Pune, fudia, on October 19, 1971, son of Ratilal and
Bharati Shah
Education: Graduated from Sir Parashurambhau College, Pune, 1989; received
Bachelor of Engineering degree in Electronics Engineering from
University of Po on a, Pune, fudia in June 1993; completed requirements for
the Master of Science degree at Oklahoma State University in July 1996.
Experience: Research Assistant (January 1995 to present) Department of
Electrical and Computer Engineering, OSU; worked at NRaD, San Diego,
CA on Master's Thesis
Teaching Assistant (August 1995 to present) Department of Electrical and
Computer Engineering, OSU.