TIMING AND AREA OPTIMIZATION
OF CMOS BARREL SHIFTER
By
BYUNGHA JOO
Bachelor of Science
Yonsei Un.iversity
Seoul, Korea
1988
Submitted to the Faculty of the
Graduate College of the
Oklahoma State University
in partial fulfillment of
the requirements for
the Degree of
MASTER OF SCIENCE
May, 1996
TIMING AND AREA OPTIMIZATION
OF CMOS BARREL SHIFTER
Thesis Approved:
7! Th is Advisor
~~
~~c£Js
Dean of the Graduate College
ij
PREFACE
I wish to express sincere appreciation to my adviser, Dr. Louis G.
Johnson. His guidance throughout this research has been precious. My
sincere appreciation extends to my other committee members Dr. George
Scheets, and Dr. Chriswell Hutchens.
I would also like to give my special appreciation to my parents,
Jongheon Joo and Kyungja Cho; my parentsinlaw, Jaenam Lee and
Teaksung Yang for their support and encouragement. The strong
encouragement of my brotherinlaw, Sunyoung, Lee is also gratefully
appreciated. I want to share this joy with my younger sister, Hyunyee, my
brave son, Myungjoon and my pretty daughter, Yujin.
Finally, thanks also go to my wife, Jeeyoung Lee for her love, many
sacrifices, and precious care for us.
iii
Chapter
I. INTRODUCTION
TABLE OF CONTENTS
Page
1
Design Automation 1
Chapter Description 3
II. LITERATURE REViEW.......................................................... 5
Introduction 5
Switch Level CMOS Model.......................................... 5
CMOS Circuit Optimization 7
III. DELAY ESTIMATION 10
Introd uction .. ......... .... .............................. .. .... ... .. .... .. 10
Transistor Model 10
RC Delay Estimation 12
IV. TIMING AND AREA OPTIMIZATION
OF SERIES INVERTING BUFFERS 15
Introduction .. 15
Timing Estimation 15
Area Estimation 18
Simulation Results 20
V. TIMING AND AREA OPTIMIZATION OF BARREL SHIFTER ....... 26
Introduction 26
Overview of Barrel Shifter 26
Layout Parameters 28
Circuit Optimization 31
Optimization Results 37
Comparison to others work 39
IV
Chapter Page
VI. SUMMARY and FUTURE WORKS 48
LITERATURE CITED 50
APPENDIX I 53
APPENDIX II 61
v
Table
LIST OF TABLES
Page
I. MOS Channel Resistance Classification 11
II. Normalized Parameters 15
III. 3 Stage Inverter Optimization 21
IV. 7 Stage Inverter Optimization 23
V. 8 bit Barrel Shifter transize Result 37
VI. transize Process Parameters 39
VII. transize Result of 32 bit Barrel Shifter 40
VIII. transize Result of 8 bit Barrel Shifter 41
IX. iCONTRAST Result of 8 bit Barrel Shifter 42
X. Clocked Inverter Type 8 bit Barrel Shifter Result 45
XI. Computing Time to Optimize 8 bit Barrel Shifter 47
VI
LIST OF FIGURES
Figure Page
1. Series Inverters 1
2. Transistor Model 11
3. MOS Capacitance Model... 12
4. RC Tree 13
5. Capacitance Origin 16
6. Series Inverters for Optimization 16
7. Area Dependency of an Inverter Layout 18
8. Transistor Size of 3 Stage Series Inverters 22
9. Delay and Area Relationship of 3 Stage Series Inverters 23
10. Optimal Point Evaluation of 7 Stage Series Inverters 24
11. Variation of Number of Stage for Series Inverters 25
12. Layout Scheme of Barrel Shifter 27
13. Simplified Schematic of Barrel Shifter 27
14. Function of Standard Layout Blocks 28
15. RC Equivalent Circuit for Barrel Shifter 29
16. Worst Case Data Path in Barrel Shifter 31
17. Schematic of 8 bit Barrel Shifter 38
vii
Figure Page
18. Area and Timing Relationship of 32 bit Barrel Shifter
after transize Optimization 41
19. Area and Timing Relationship of 8 bit Barrel Shifter
after transize Optimization 42
20. Area and Timing Relationship of 8 bit Barrel Shifter
after iCONTRAST Optimization 43
21 . Schematic of 8 bit Barrel Shifter with iCONTRAST Result 44
22. Area and Timing Relationship of Clocked Inverter Type
Barrel Shifter after iCONTRAST Optimization 45
23. Schematic of Clocked Inverter Type 8 bit Barrel Shifter
with iCONTRAST Result 46
viii
CHAPTER I
INTRODUCTION
Design Automation
In a CMOS integrated circuit design, it is time consuming procedure to
determine the proper transistor sizes for the design with circuit simulator. A
lot of iterative simulations are needed to get the best performance of the
design. Making a design faster does not mean just making transistors bigger
to provide more current driving capability.
INVII·1) INVli) INVII+ 1)
Figure 1. Series Inverters
For a CMOS series inverter in figure 1, a bigger size of inv(i) reduces the
delay to inv(i + 1). However, from the inv(il )'s standpoint, invli1) has
bigger load. So, inv(i1) consumes more time than before.
2
Enlargement of all transistors is not an economical solution and the whole
chip size is restricted by process equipment in fabrication. Also, it may not
improve delay characteristics. The best solution for the designer is to
determine the most effective size for each transistor within a given area.
First of all, size and delay optimization is not the problem of gate level
design but the problem of transistor level. However, SPICE like full transistor
modeling is not effective, because finding the optimal size in big circuits
which have many transistors is important. Full transistor modeling takes too
much time and resource for practical use in very large circuits. Therefore, a
switch level model should be used. :In other words, among the various kinds
of transistor characteristics, the simplified model with selected
characteristics should be employed.
From the literature, the following contributions are of interest. The
transistor is regarded as a current source and its current characteristic is
piece wise linear [Ousterhout, 19851 [Hedenstierna, 1987] [Sakurai, 1990]
[Dutta, 1995]. The signal transferred among transistors is not a step input
but ramp input which has nonzero rise and fall time [Ousterhout, 1985]
[Hedenstierna, 19871 [Sakurai, 1990] [Dutta, 1995]. If capacitors of several
type are represented by a lumped capacitor, some degree of error rate shoulld
be included. A distributed capacitor model has better accuracy [Ousterhout,
19851 [Hedenstierna, 19871 [Sakurai, 19901 [Sapatnekar, 19931. Resistance
of poly gates in wide channel transistors is not negligible for wide or long
3
channel MOSFETs [Sakurai, 1985]. Series or parallel connected transistors
as in nand or nor gates also affect the performance of a circuit [Caisso,
1991].
There were a lot of efforts to get the optimal value of transistor size. The
law of diminishing returns was used [Lin, 1975]. Some people solved the
problem with lagrangian multipliers [Cirit, 1987] [Hedlund, 1987] [Marple,
1989]. Some papers dealt with a two step iteration method. First, timing is
optimized and then area is optimized with the determined timing restriction
[Chen, 1991] [Dai, 1989]. Recently, convex optimization was applied
[Sapatnekar, 1993].
Previous papers concerned general circuits which have comhinational
gates and sequential logic circuits.
The objective of this paper is to determine whether fast solution finding of
timing and area optimization with a simplified transistor model in a barrel
shifter design which has many inputs and outputs and a repeated structure
can be practical.
Chapter Description
In Chapter II, a literature review is presented which covers switch level
CMOS transistor model and CMOS circuit optimization. In Chapter III, the
basic idea of delay estimation is presented which covers transistor modeling
4
and RC circuit analysis. In Chapter IV, a series of inverters is optimized. In
Chapter V, the barrel shifter is optimized and compared to other work. In
Chapter VI, a summary and conclusion is given.
CHAPTER II
LITERATURE SURVEY
Introduction
This chapter describes the switch level CMOS model and CMOS circuit
opti mization.
Switch Level CMOS Model
Delay and area optimization of MOS circuits can be done with a swiltch
level transistor model, and delay estimation of RC circuits. Ousterhout
[Ousterhout, 1985] classified three simplified transistor models  lumped RC
model, lumped slope model, and distributed slope model. The lumped RC
model is a wellknown and simple model. All transistors are substituted with
an equivalent resistor, and all loads are replaced by a single capacitor. The
lumped RC model is the least accurate model among the three models. The
lumped slope model is basically the lumped RC model, but it takes care of
nonzero rise and fall times in the input signal. Effective equivalent
resistance is calculated through interpolation of two resistance values. One
is highpass equivalent resistance and the other is lowpass equivalent
resistance. The distributed slope model considers distributed capacitance. In
5
6
the Penfield and Rubinstein method [Rubinstein, 1983], capacitors are
considered separately.
Hedenstierna et al. [Hedenstierna, 1987] studied CMOS inverter delay in
response to a ramp input. From their paper, the input signal is defined as
follows.
r 0 , t < 0
V IN = 1s t , 0 < t < r
l1 , t > r
where r= 1/s is the rise time of input voltage ramp.
Sakurai et al. [Sakurai, 1990] introduced an apower law model. In this
model, the transistor is a piece wise linear current source.
In =
r 0 ,
( lIDO)
V 'J) 0 V /) S ,
J I I) 0 ,
eVes ~ VTff:cut  oft)
(VIJS < VDo:linear)
(VDS ~ Vno:saturation)
In this model, 1'00 and V'DO are modeled in apowered form. From this
model, they extracted propagation delay estimation. The apower law has
good accuracy according to their report.
Dutta et al. [Dutta, 1995] improved the apower law transistor model.
The apower law transistor model does not g:ive accurate delay estimation
when the input slope is very fast or slow. They also took care of negative
,
7
delay. Negative delay happens when the input slope is slow and load
capacitance is small so that the output switches faster than the input.
Rubinstein et al. [Rubinstein, 1983] proposed a method that can provide
upper and lower bounds on path delay in RC tree circuits. Lin et al. [Lin,
1985] extended Rubinstein et al.'s method to a general RC network.
Sakurai et al. [Sakurai, 1985] suggested a degradation factor that
accounts for the effect of poly gate sheet resistance on transistor
propagation delay. To make a transistor work properly, the transistor should
have a low degradation factor.
CMOS Circuit Optimization
Lin et al. [Lin, 1975J tried to find optimal transistor size values with a
normalized delay and area equation,
F=(normalized area)(normalized delayl
where k is weight factor. Optimal values are located at the point where F is
minimum.
Hedlund [Hedlund, 1987] was looking for del,ay, area and power
optimization. He used an RC transistor model and applied lagrangian
multipliers to solve the nonlinear problem. He minimized the scaling factor
between transistors in adjacent stages. Therefore, the initial PN transistor
ratio within each stage is unchanged.
8
Cirit [Cirit, 1987] used an RC transistor model and defined area as the
sum of all transistor widths. He built an equation to minimize with lagrangian
multipliers. And then, he tried to get optimized values with a bisection
method.
A two step iteration method was proposed by Dai et al. [Dai, 1989]. At
the first step, the program determines optimum timing for the target circuit,
and then, within the optimal timing, the program minimizes the circuit size.
Marple [Marple, 1989] developed an optimized Ilayout generator. His
program reads a layout database, and represents the circuit in a delay graph
to find the critical path. With lagrangian multipliers, an opt!imized circuit is
extracted. Finally, the program generates the optimized layout.
Cheng et al. [Cheng, 1991] developed iCOACH that is a poly cell based
optimized layout generator. iCOACH considers charge sharing and noise
margin. The circuit is optimized through a two step iterative process.
Sapatnekar et al. [Sapatnekar, 1993J developed iCONTRAST that used a
convex optimization method. The transistor model of the program is a nonRC
delay model. Originally, the delay model proposed by Hedenstierna et
al.'s model was applied. Later, the delay model was changed to the a.power
law. iCONTRAST minimizes area under a given timing specification.
iCONTRAST consumes a lot of time. According to their paper, to optimize a
9
32 bit adder, it takes about eight hours of execution time on a Sun
SparcStation I.
The purpose of this thesis is to determine whether a simplified automatic
optimization tool can give effective value with significantly small time
compared to iCONTRAST. Results of this study may provide an alternative
to the problem of timing and area optimization in large and repeated pattern
circuits like the barrel shifter.
CHAPTER III
DELAY ESTIMATION
Introduction
Delay calculation for a certain path in a CMOS circuit can be divlided into
two parts. One is the transistor model and the other is the RC tree delay
estimation. For fast calculation, the transistor is modeled as a fixed resistor.
The RC tree delay is considered in the method that was proposed by
Rubinstein et a!. [Rubinstein, 1983].
Transistor Model
The simplest model of MOSFET is a fixed resistor that is controlled by its
gate voltage. When the transistor is off, there is an open circuit. When the
transistor is on, there is a current path with fixed resistance value. This
model has about 25% error compared to SPICE results [Ousterhout, 1985J,
but still gives a benefit on the calculation time.
The resistance of a MOS transistor is proportional to the length and
inversely proportional to its channel width. This is one of the factors that
can make calculation fast. Generally in digital circuits, the minimum channel
length of a process is used. Therefore, the on resistance of a MOS transistor
is only inversely proportional to its width.
10
11
GI ~> '~~
5
Figure 2. Transistor Model
A MaS transistor has different resistance values depending upon the
signal value which passes between drain and source. In this paper, the on
resistance of a MaS transistor has four different cases.
Rnl low pass resistance of NMOS
Rnh high pass resistance of NMOS
Rpl low pass resistance of PMOS
Rph high pass resistance of PMOS
Table I. Mas Channel Resistance Classification
The capacitance of a MOSFET is also simplified. A MOS transistor has
capacitance on the gate, drain, and source. These capacitances have
nonlinear characteristics. For the convenience of calculation, all capacitances
are replaced by fixed capacitances.
12
The whole gate capacitance is located between gate and ground when
the transistor is off. When the transistor is on, halves of the gate
capacitance are applied between gate and drain, and gate and source.
s L _When 1r. is Oft When 1r. is On
Cg: gate capacitance
Figure 3. MaS Capacitance Model
RC Delay Estimation
A CMOS circuit can be replaced by the previous RC model. In the
equivalent circuit, the gate voltage determines the on or off state of the MaS
transistor. Drain and source voltage determine high or low pass resistance
value. From the RC equivalent circuit, delay can be produced in this way
[Chu, 19871.
Tdelay = LRkeCk
k
Figure 4 shows sample of RC tree.
13
VVV
R3
A
C3
"*" '. R4
··~·j,·A·······....
VVV .i C4
R2
p' vv~"'··········
Rl .•..•.. C2 1:. ............. =
YVI/'<
C1 ~ ~
"*" C5
INPUT
Figure 4. RC Tree
RC time constants between a certain node e and the input are all added
up at each stage. Then, from the rest of the branches, RC time constants
consist of the resistor which is directly shared between the path to e and the
branch, and the capacitor for the branch node. Total time delay for the RC
tree is the sum of all RC time constants. This method is valid only if all
nodes have same initial voltages.
T INnT," , =Rl' Cl +(R, + R2) .C2 +(RI + R2 + R4) . C4 : time constants ofRCpalh (input to e)
: time constanls ofthe rest ofthe branches
In circuits with transmission gates, both P and N type MOSFET are used
in parallel for improvement of signal transfer. But, because of this structure,
there ,is a resistor loop, so that the RC circuit is no longer a tree. When there
is only one transmission gate, capacitances from the P and N transistor can
be combined by addi,ng them together and the equivalent resistance is a
14
simple parallel combination of the two resistances. This usually makes the
equivalent circuit a tree again.
When there are two or more transmission gates, equivalent capacitance
and resistance are not easily available. Generally when there are two or
more transmission gates, for layout simplicity, there is no interconnection
between Nand P MaS transistors. In this way, the area of diffusion
contacts can be saved. To solve this case, the following assumptions are
needed [Lin, 1984]. The final capacitance can be divided for each path
according the path's current dri'vi,ng capability such that the delay timings for
two path's are the same.
For example, if portion a of the final capacitance is driven by the P type
MaS transilstor, then portion (1a) of the final capacitance is driven by the N
type MaS transistor. The delay equation for each path can be formed.
Then, since delay timings for both paths are the same, B can be obtained in
terms of Rand C. As a resu'lt, delay timing for transmission gates is
available.
CHAPTER IV
TIMING AND AREA OPTIMIZATION OF SERIES INVERTING BUFFERS
Introduction
Timing and area optimization of series CMOS inverters was performed.
The M0 S transistor is represented as a resistor and capacitor. The
optimization process used the law of diminishing returns. The law of
diminishing returns was applied to proper level finding of demand and supply
in Economics. Transistor size on both ends of the series inverters was fixed.
Parameter values from a 21lm Nwell process was used. The programming
language was MATLAB.
Timing Estimation
Timing and area optimization of series inverters in CMOS technology was
investigated. During the optimization process, resistance and capacitance
values are used in normalized form.
Rpl normalized PMOS resistance, Rp'=Rp·L
where Rp is PMOS channel sheet resistance per unit area
RN' normalized NMOS resistance, RN' = RN·L
where RN is NMOS channel sheet resistance per unit area
Co' normalized diffusion capacitance
CG' normalized gate capacitance
C1 interconnection capacitance
Table II. Normalized Parameters
J5
16
C1 is independent of channel width of MOS transistor, so it is regarded as a
constant. Capacitances are extracted from diffusion, gate, and
interconnection in layout. Figure 5 shows where each capacitance comes
from.
. 
c,
Figure 5. Capacitance Origin
Diffusion capacitances (CPOIFF, CNDlfF) are functions of the driving MOS
transistor width and Gate capacitances (CPGATE, CNGATE) are functions of the
driven MOS transistor width.
The series inverters for optimization are shown in figure 6.
II'N(Q 11'N(11 II'N(i)
"I
Vi
11'N(r'V
L
Figure 6. Series Inverters for Optimization
17
When a MOS transistor is replaced by a resistor and capacitor, the
resistor and capacitor are represented as follows.
R'
Ri=Wi
Total path delay can be represented as follows.
(1 )
R' R'
=...+ (eD',Wi  1+ C, + CG',Wi) + (CD',Wi + C, + CG',Wi + ,) +
Wil Wi
Differentiate both side by Wi to find
,0 < i < n (2)
From (2), the minimum delay point can be derived without any area
consideration.
Or
Let  =0
OWi
=
18
Wi = Wi I'(~~' + Wi + I) where on1ly a positive value of Wi has meaning.
However, this value is not economical, because this Wi does not take into
consideration how much area is used.
Area Estimation
The area of an inverter can be divided into two parts. One is constant
area that ,is essential area. The other is dependent area that varies according
to the width of the MaS transistor.
const.
I >
const. dependant
>k
const. dependant , ~
III
Figure 7. Area Dependency of an Inverter Layout
If the constant area of an inverter is always same for every inverter, the size
of each inverter can be represented as follows.
19
Ai = Ao + A',Wi
where, Ai: size of the ith inverter
Ao: constant area of the inverter
A ': inverter height
Wi: channel width of the ith inverter
The total size of the series inverters is
=~")Ao+ A',Wk)
k=O
Differentiate both sides with respect to Wi
oA
=A'
OWi
(3)
(4)
From the two differential equations for area and delay, optimization process
will be processed with the law of diminishing returns.
dr ciA
=K· r . A
r
dr=K··dA
A
Or r oA
=K··
OW; A OW;
(2),(4) => (5)
R',CG' R' A'
2(c, + Cd·Wi + I) =K'r'
WiI W; A
where K is an arbitrary weight factor.
15)
(6)
20
In 16), A is function of Wi and has L inside. For calculation convenience, (6)
is divided.
A'
M=r'A
From (1) and (3)
and,
R',CG' R'
2(O+CG'·Wi+ I) =K· M
WiJ Wi
R'·(Ct+ C(j',Wi+l)
R '·Co' + K, M
Wi  1
(7)
(8)
(7) and (8) are solved by making a guess at Wi and using the guess in (7) and
the right side of (8) to produce a new guess. This procedure is iterated unti'l
the guesses for Wi converge.
Simulation Results
Using equations (7) and (81, the optimization process was performed in a
2~m CMOS Nwell process. Typical values for the simullation are as foll'ows.
Normalized gate capacitance (CG') was 1.68 x 1O9 F/meter. Normalized
diffusion capacitance (CD') was 0.95 x 109 F/meter. Interconnection
capacitance (CI) is the sum of diffusion contact capacitance, metal
21
capacitance and poly capacitance. In this optimization, Interconnection
capacitance was assumed to be constant and typical value is 18fF.
Normalized channel resistance (R') was O.084Q·meter. In this optimization,
Nand Ptype MOS transistors were assumed to have the same channel
resistance. The height of the inverter (A') was set to 25j..tm. The essential
area for an inverter was 625 x 10 12 meter2
. The weight factor (K) for the law
of diminishing returns was set to 1. Three stage series inverters were
optimized. The MOS transistor size of first and last inverter were fixed to
10j..tm and 300j..tm each. The result is shown in table III.
Transistor Name Size (unit: j..tm)
WO 10 (fixed)
W1 48.4357
W2 300 (fixed)
Table III. 3 stage inverter optimization
The size of the MOS transistors in the middle inverter was traced with
various sizes of both end inverters. In figure 8, WO is size of tne first
inverter, W1 is size of the middle inverter, and W2 is size of the last inverter.
22
20 25 30 35 40 45 50
wO: width of first inverter
10 15
I I • • I I I I
: : : : : : : w2= 0
I It. I I • 1",'  : t ~ : ~  .. : : ~ ~ ...
I I I I I I ;~l
I I I \ I , ~ : : : ; : :,...........:
.....  { t  : t ~ ~ ~ ..
: : : :/I~: : I I • I I I ,
I I I • I : I .. _ .....  ; ......... t  ..  ....; _ .. ~ ......... ;_ ... , .... : ..... _ ... ~_ ..
·I .I • / •• I './' . . . : : % : : I ....   :  f ~ : _..  i '
: y/: : :
: //: : I :
~./.lt·l ~.. .. A·· ...;....:<~~ .~..=+=t"... +....
I l~: I I
I ..,., I I ."......"..., I I I •
.... '/~ I I I I I "'./....;: ~ .. .. : ~ ~ ~ ~ : ...
/~~l I I I I ••
/" I : : : : : : ~
f I I I I I • I
,..~  : t : ~ ~ ; ~ : ..
• I I I I I I •
I I I I I I I I
I I I I I I I I
• I I I I I I •
200
180
..... 160
Q)
1::
Q) 140 >c
"U
c 120 au
Q)
(f) ..a.... 100
..c
~
.~ 80
~
60
40
I
20
5
Figure 8. Transistor Sizes of 3 Stag,e Series Inverters (unit: Ilm)
In figure 9, the MOS transistor size of middle inverter was traced with
several MOS transistor sizes of the last inverter and one MOS transistor size
of the first inverter. ,*, marks show the optimal point for each combination.
The curves show delay and area relationship when the MOS transistor size of
the middle inverter was changed.
23
15,,;:;..;::;....,....,:,,,
6
x 10'
w1~ from 10 micro meterito the ·ze ofw2
1 2 345
Area total area of 3 inverters(unit: micro meterll2)
OL._'__JL.__' L.__'__l
o
Q)
E
i=
.t=
c:
2
~ 10
Q)
t:::
Q)
>
.!:
(T)
~
OJ
~
2~
Q) 5
Eu
Q)
(h
o
c:
~c:
Figure 9. Delay and Area Relationship for 3 Stage Series Inverters
For a seven stage series inverters, the MOS transistor size of the first
and last inverters are 10lLm and 500lLm respectively. Table IV shows the
optimization result.
Transistor Name Size
WO 10 (fixed)
W1 17.6032
W2 25.0262
W3 34.4724
W4 54.9984
W5 123.4313
W6 500.000 (fixed)
Table IV. 7 Stage Inverter Optimization (unit: ILm)
24
To verify that the result is optimal, the sizes of all middle inverters were
forced to be changed up to ±50% of the optimized value, then delay and
area relationship was observed.
2.55
x 10~
2.25 2.3 2.35 2.4 2.45 2.5
Area: total area of All inverters(unit: micro meterll2)
3.15,,,.,,,,
UQ) ..,...
en 31,  ~ _ ~ : : ~ ~ . o . \ . . . . . .
c ::::::
C1l : : : : : :
.~ 3.05 \ ·········f············l·············j·············[·············f············l············
.2. \' . . . . .
~ 3 :::~\::·:::J:::::::::::1::::::::::::r:::::::::::t::::::::::::1::::::::::::1:::::::::::: ~ 2.95
c \ : : : : : :
\ : : : : : :
<{ 2.9 ·········\~············1·············j·············t·············f············i············
12.85 ~\ ) .. ····t··).············;···········)············~········ . i 2.8 [\,,,j. \\L ,.,...., .
i 2.75 L ~." ',\..L. L L 1... .
.0... :: ••~ : : : : ..... : \: : : : m 2.7 : .. : : : : M ••••• _;. "................ .. •••. ;. :,. " •••••••••••• E . .
i= 2.65 L_i.__i...__L_i=:::::::i:::=====:J:...._.J
2.2
Figure 10. Optimal Point Evaluation of 7 Stage Series Inverters
In figure 10, all curves are focusing to one point.
The optimal number of stages was also investigated. With the same
MOS transistor sizes on the ends, number of stages was varied from 3 to
20.
25
7,.....,...r,.....,...,
~ ~ ~ ~ ! ;* ~ 6.5 ·················r·················T·················r ~..2U
~ 6 ] 1. t··············/~·)·~············
~ ~ i i * 1~ i 55 ······i(··t~17i···_·····_··_
=4:·:_::::_:I:j:::~~::Z~~_::::J:::::::·::
~ ~ ~ la ~ I 4 ·········j·········i··10·)I··f················i·················
E3.5 ·················1··················~··7··········+··················f·················
: 8/* ~ ~
~ 3 ( 7./*\ ( : .
Q.i 2.5 .3 j 6.."C ~ ~ ~ .
E ~, 5 /* ~ : ~
~ ~~ : : :
2'..;.;...:........L..''....L..'
1.5 2 2.5 3 3.5 4
Area: total area of All inverters(unit: micro meterJl.2) x 10'
Figure 11. Variation of Number of Stage for Series Inverters
When the number of stages is four, delay is minimum. And when the
number of stages is three, area is minimum. Though total area is bigger, five
stage series inverters shows faster response than three stage series
inverters.
CHAPTER V
TIMING AND AREA OPTIMIZATION OF BARREL SHIFTER
Introduction
With lagrangian multipliers, timing and area optimization of a barrel
shifter was performed. In this chapter, the layout of the barrel shifter is
divided into several standard layout blocks. That makes it easy to change
the configuration as needed. The program was named transize and written in
C language. Final results are compared to the results from the program
iCONTRAST which was developed by Sapatnekar et al.[Sapatnekar, 1993].
Overview of Barrel Shifter
The basic component of the barrel shifter is a 2 to 1 multiplexer. These
multiplexers are arrayed as shown in figure 12 and 13. In the layout, inputs
are located at the left side, outputs are at the right side, and control signals
are at the top side of the layout. All transistors are placed with their widths
in the horizontal direction, so changing the transistor size of each stage is
easy to accomplish.
26
CONTROL
27
INPUT
 0  c c
UJ UJ W UJ
Cl Cl Cl
I
Cl
 <l: <l: ........ .. c( c(
f f f f
(J) rJl (fl (fl
I I I
OUTPUT
Figure 12. Layout Scheme of Barrel Shifter
The schematic of a barrel shifter is shown in simplified form in figure 13.
A transmission gate does not have signal driving capability. It is only a
resistive switch. Therefore, the signal should be amplified with proper buffer
insertion between multiplexers.
50 s, 52
OUT?
oure
OUT5
OUT4
OUT3
OUT2
OUT'
OUTO
5
Figure 13. Simplified Schematic of Barrel Shifter
28
Layout Parameters
A 21lm CMOS Nwell process was used. Standard layout blocks are an
inverting buffer, a transmission gate multiplexer, and cross over cells which
connect multiplexers. Figure 14 shows the function of the standard layout
blocks. B(i) is the ith block of inverting buffers, X(i) is the ith block of cross
over cells, and M(il is the ith block of multiplexers.
X(i)
I  I
M(i)
X(i+ 1)

, 1
'~
M(i+ 1)
X(i +m1)
1 I
~i>
B(i+m)
M(i+m1)
m: number of multiplexers
Figure 14. Function of Standard Layout Blocks
For each multiplexer, only one transmi,ssion gate is turned on. The RC
equivalent circuit is showed in figure 15.
29
Pull up
Q
RBnl(i) ntr onresistance of inverting buffer
RBphli) ptr onresistance of inverting buffer
RMP1li) ptr onresistance of multiplexer when low data passing
RMPhli) ptr onresistance of multiplexer when high data passing
RMnl(ij ntr onresistance of multiplexer when low data passing
RMnhli) ntr onresistance of multiplexer when high data passing
Coli) total capacitance connected to inverting buffer output
CM/i) total capacitance connected to multiplexer output
Figure 15. RC Equivalent Circuit for Barrel Shifter
The delay of each case can be estimated with the Elmore delay formula,
Tdei<,v = I Rke Ck.
k
'rise = RBph(i) . CIJ(i) +(RBflh(i) + RMllh(i)1I RMflh(i)) . CM(i)+....
The capacitance driven by buffer is calculated in this equation.
CB(i) =COB(i) + Cx(i) + CIOM(i) + CllM(i)
where Cos(i) : inverting buffer output capacitance
mainly diffusion capacitance
Cx(i) cross over capacitance which is metal capacitance
between inverting buffer and multiplexer
ClOdiJ,ClIdiJ:input capacitance of multiplexer
diffusion capacitance and interconnection capacitance
difference of two parameters is interconnection
capacitance.
CJOM(i) =CIOM + C'IM/I . WMJI(i) + C'IMp .WMp(i)
CJ1M(i) = CflM + C'/MIl' WM/,(i) + C'IMp' WMp(i)
Each multiplexer drives this amount of capacitance.
CM(i) =COM(i) + Cx(i + 1) + CroM(i + 1) + CIIM(i + 1)
30
where Cadi) : output capacitance of multiplexer.
The last multiplexer of each stage drives this amount of capacitance.
CM(i +m  1) =COM(i + m  1) + CrB(i + m)
where C,B(i) : gate capacitance of inverting buffer and interconnection
capacitance
The area equation for barrel shifter is as follows
A = LA(i)
= L(Ao(i) + Ax(i) + AM(i))
where AB(i): Inverting buffer area
All(i) = Am + A',(WIIJI(i) + Wn,,(i))
AAiiJ: multiplexer area
AM(i) = AMI + A'.(WMIl(i) + WMP(i))
Ax<'i): cross over area, independent of transistor size. different
cross over cells are needed for each stage.
31
Circuit Opt'mization
There are many signal paths in the barrel shifter. Among them, the data
path is the main concern. Though data can flow through different paths, the
capacitive and resistive loads are always the same. From the low order bit to
the high order bit, the capacitive loads decrease. But, this slight difference
can be ignored. Therefore, the worst case data path can be chosen from low
order bit. The optimization results with the worst data path can be applied
to the whole circuit.
For a sample data path, equations for optimization wiu be derived. In
this optimization, the first and last inverting buffer have fixed transistor size.
For calculation simplicity, all outputs of multiplexers are amplified by an
inverting buffer. Figure 16 shows a sample worst case data path of the
barrel shifter.
X(O/ X(1/
1 r
>~>~
B(O/ ~.' JBm _ J
, ,I
X(N1J
r
.?' I;j./J
I: B(2/ :r»: 8(N!
MfO/ Mf1/ M(N1/
Figure 16. Worst Case Data Path in Barrel Shifter
There are N multiplexers and N + 1 inverting buffers. The top and bottom
transmission gates are off and the middle transmission gates are on. The top
32
and bottom transmission gates are considered as a capacitive load for their
diffusion capacitance. The sizes of the top and bottom transmission gates
are equal to size of the middle transmission gates.
Delay between the input and output of the path must be determined for
two uses.
The input rising delay is
T, =T/O + Trl + TI2 + Tr3 + TI4 + TrS + TI6 + Tr7+ .
= I Tr(i) + I T/(i)
i= odd i= evell
The input falling delay is
Tf = Tr0 + TI I + Tr2 + TI 3 + Tr4 + Tf 5 .. Tr6 + TI 7 + .
= I TI(i) + I Tr(i)
;=odd i=e,'en
For each stage, ri,se and fall delays are represented as follows.
TI(i) = RHIl/(i)· CI/(i) + (RHJlI(i) + RMII/(i)IIRMPI(i»)' CM(i)
where Cn(i) =Cun(i) +Cx(i) + CiOM(i) +CIIM(i)
CM(i) =COM(i) + Cm(i +1)
Using a lagrangian multiplier,
where 0 < a < 1 to avoid maximizing delays when minimizing f
f= I[(la)'Tr(i)+a''7(i)]+ I[a'Tr(i)+(Ia)''7(i)]
;=odd i=evell
= L [(Iq(i»),rr(i)+q(i).r/(i)]
j
33
where q(i) ={ a
Ia
i=odd
i =even
Minimizing f gives
0= Of =
OWBn(i)
0 Of   
OWBp(i)
0= if =
OWMn(i)
(1 q(i  1») , (RBph(i  1) + ]WJliI(iI)11 RMpil(i  1)) ,C' 18/1 +
q(i 1), (RR/I/(i 1) + RMII/(i l)IIRMp/(i 1)) 0 C' 1811 +
(1 q(i») , Rllpil(i  1) ° C' OR/I +
[
RB/lI(i) 2 ( )] q(i)· R8/11(i) 0 C' 18/1  , CH(i) + CM(i)
Rill
(1 q(i 1») ,( RSph(i 1) + RM1Ih(i  1)ljRMph(i  1») 0 C'IBp +
q(i 1)' (RIJnI(i 1) + RMJlI(i l)IIRMI'I(i 1»). C'IBp +
q(i) , RIJIIIU  1) . C' OBp +
(I  q(10») . flRHl'iI(0l)' C" IBpR,/lPil(i)2 (C'IJ('I) +CM (0I »)]
R ""
34
( )
(RMII/(i)IIRMp/(i))2
.~'I(i)lIlUfpl(i) . C' OMII  R' . CM(i)
III
0= if =
OWMp(i)
o = of
oa
(1 q(i)).[ R'ph(i)· (2C' IMp + C'OMp) +
r
q(i) 'l RSIlI(i)· (2C' IMp + C' OMP) +
= Tj  Tr
These equations are for delay minimization only. Area equations need to
be included. In this optimization, delay will be minimized within specified
area Ao. Using lag.rangian multipliers,
g = Tr + a, . (Tf  Tr) + aA . (A  A 0)
where
at is lagrangian multiplier for timing optimization
aA is lagrangian multiplier for area optimization
A is totall area
Ao is target area
Differentiate both sides with respect to W(i)
0; q oA
O==+aA'
&(i) bW(i) OW(i)
o og =  = T/  Tr
oaf
Area equation is
A = I A(i) = I(A8(i) + Ax(i) + AM(i)) = ACOIWClIII + A'·I Wei)
Differentiate both sides with respect to W(i)
oA 0 " OW(i) = OW(i) (Aeon,lImli + A"L..JW(i») = A'
With this equation,
35
0 0; 
OWBII(i)
0 0; 
OWBp(i)
(1 q(i 1»). (RIJPh(i 1) + RM/lh(i 1)11 RMph(i 1»)' C' iBlI +
q(i 1)· (RBII/(i 1) + RUIlI(i 1)11 RUI'I(i 1)), C' 1811 +
(1 q(i») . RHph(i  1) . C' 01]/1 +
[
RiJlll(i)2 ( )1
qU), RHIlI(i)· C'IHII  I CH(i) + eMU) J+ aA' AI Rill
(1 q(i 1)). (RiJPh(i 1) + RUllh(i 1)IIRMph(i 1))' C' IfJp+
q(i 1)· (RH/lI(i 1) + RMnl(i I}11 RMpl(i 1))' C' IHp +
q(l) . RIJIII(i  1) , C' onp +
( ,) r . , RIJph(i)2(. , )1
1 q(l) 'lRIJPh(l) ' C /HI'  , CB(l)+CU(l) J+ aA' A'
R ph
36
r (1 q(i»)'l RBph(i)' (2C' 1M" + C' OM,,) +
q(i). [RB"J(i) ·(2C' (M" +C' OM,,) +
aA' AI
r (1 q(i»)"l R,Jph(i)' (2C' IMp + C' OMp) +
aA' A'
When aA is zero, the area is not constrained and the delay is a minimum,
The fixed size for the first and last inverter keep the area finite. If aA is
negative, that means the target area is bigger than the minimum delay area.
In that case, the target area may be reduced or set aA to zero. When aA is
positive, the area is less than the minimum delay area and the delay is
37
increased. Thus, a positive aA allows a tradeoff between increased delay
and reduced area.
Optimization Results
In this optimization, the barrel shifter has full CMOS transmission gates
and inverting buffers for every two stages of transmission gates to amplify
the data signal. Layout was made with standard layout blocks. That makes
it easy to handle different barrel shifter configurations. All transistors are
oriented in one direction. Therefore, variation of transistor sizes affects the
size of only one side of barrel shifter. Figure 17 shows the schematic of an
8 bit barrel shifter.
For each group, the same value of optimized width is used. In other
words, WBp(O) and WBn(O) in TABLE V are the PMOS and NMOS transistor
widths of the B(O) group transistors.
WBp(O) 4 WBn(O) 4 WMp(1) 4 WMn(l) 5.307
WBp(l) 9.261 WBn(1) 6.712 WMp(2) 4 WMn(2l 4
WBp(2) 8.612 WBn(21 5.543 WMp(3) 4 WMn(31 4
WBp(3) 30 WBn(31 20
Total Area 6732.883789 Jlm L
Total Rising Delay 5.811049 nano seconds
Total Falling Delay 5.806156 nano seconds
Table V. 8 bit Barrel Shifter transize Result (unit:jlm)
38
M(l) ,
B(I),
M(2) \
B(2)
rM(3)
Bn)
O)~
(,~ ~~  r 1"\' I~r J "'b.. 0 Ii v ..., ~ ~O ..
Lf~ 10 ....  I:J' Lt'> .... . t>.f '::...  . W;   ~ ""i~  I'YI .... .....   liT'! 6'11 v_
. :..::.  b..f 1bJ v_ ,.... ~
I"'i~ v I .....   8:Y  , .
~ t>.F :;  . ~ tAF ~  l.f~ I . .....  ~KJ ....  r a ~~
v_
 . 1bJ= I:;>,;  v ....  ~ ~~ . KJ v_ .    t:.J' ~~ ....
IAF ,
 "."... w :hf ~
1~ .... kY  I .....   . t:.J" .j8:Y 
~ .. tAf w; .  r_ oj
.y~
~~
I .....  _.
LL  PY~ v_
""'.,j f
\....L L.... f_ t.oJ:l \L  r tlrJb \J'V ~ ~~ . .
V ~~ '\:J'
 ~. ""_ ...... 
B
SO): inverting buffer group
M(il: multiplexer group
Figure 17. Schematic of 8 bit Barrel Shifter
Within a given area, optimization process is looking for the fastest
combination of transistor sizes. Process parameters for program transize are
as follows. These values are from 2J.l.rn CMOS Nwell process parameters
and the standard layout blocks for the barrel shifter design. These
parameters depend on both the process and the layout style.
39
CIB buffer input constant cap 0.005
;1 CIBn buffer input cap coefficient for Wn 0.0018
CIBp buffer input cap coefficient for Wp 0.0018
COB buffer output constant cap 0.012
COBn buffer output cap coefficient for Wn 0.0026
COBp buffer output cap coefficient for Wp 0.0027
CIOM mux input 0 constant cap 0.012
CI1M mux input 1 constant cap 0.011
CIMn mux input cap coefficient for Wn 0.0026
CIMp mux input cap coefficient for Wp 0.0027
COM mux output constant cap 0.006
COMn mux output cap coefficient for Wn 0.0022
COMp mux output cap coefficient for Wp 0.0025
Rnl low s.ignal resistance coefficient for 1/Wn 26
Rnh high signal resistance coefficient for 1IWn 104
Rph low signal resistance coefficient for l/Wp 56
Rpl high signal resistance coefficient for 1/Wp 156
AlB buffer constant area 480
AIM mux constant area 552
H cell height 24
Cx(O) cross over cap 0
Cx(1) cross over cap 0.006
Cx(2) cross over cap 0.013
Cx(3) cross over cap 0.025
Ax(O) cross over area 0
Ax(1 ) cross over area 384
Ax(2) cross over area 576
Ax(3) cross over area 1152
Table VI. transize Process Parameters
Comparison to other work
A program developed at the University of Illinois, iCONTRAST finds critical
delay paths using PERT and the delay graph, and uses the convex
optimization method. PERT was originally developed for schedule
management of big projects. Circuit designers employed this technique to
40
delay estimation [Kirkpatrick, 1966]. iCONTRAST converts the given CMOS
circuit to undirected graph. Gate channel of each MOS transistor is replaced
by a net, and the connecti,ons of MOS transistors are considered as nodes.
Using this graph, PERT finds overall delay estimation like schedule
management. iCONTRAST is a general circuit optimization tool, but transize
is dedicated to barrel shifter optimization. Therefore, iCONTRAST gives
individual transistor sizes. In designs with the standard layout blocks,
iCONTRAST may have lower efficiency. Most of timing and area
optimization tools can not consider layout skill or variation of layout scheme,
so direct comparison between the two tools is difficult. Through the
comparisons of computing time and tendency of results, relative comparison
is shown.
32 bit barrel shifter with transize is shown in table VII and figure 18.
Area Rising Falling
15512.17 1 11.0652 10.9000
15857.31! 9.1670 9.1417
16620.21 7.6333 7.7724
17151.37 7.2467 7.3332
17303.75 7.2155 7.2070
18157.72 6.8295 7.0140
18648.52 6.8045 6.8497
19031.86 6.7091 6.8406
Table VII. transize Result of 32 bit Barrel Shifter
41
12 1 1 1 I
l' Pl=1 II
10
U:: I I I I
~ ~l 1. Rilling
!. 9 t I "H . Falling
E
1= I I
+ 1 , j : • j
I
6 J I
I j I  J
15000 15500 16000 16500 17000 17500 18000 18500 19000 19500
At~ (.nero mata,'21
Figure 18. Area and Timing Relationship of 32 bit Barrel Shifter after
transize Optimization
8 bit barrel shifter with transize is shown in table VIII and figure 19.
Area Rising Falling
6383.54 6.5415 6.5248
6383.54 6.5415 6.5248
6406.63 6.3008 6.3413
6712.27 5.1399 5.1377
6861.72 4.8605 4.8876
7050.78 4.6667 4.6477
7317.75 4.4769 4.4842
7418.19 4.4467 4.4315
7972.75 4.3000 4.3349
7942.09 4.2875 4.3517
8116.04 4.3295 4.2883
Table VIII. transize Result of 8 bit Barrel Shifter
42
7
6.5
6
u::
o
i 5.5
E.. E
i=
5
4.5
4
I
I • _._ + ..
I \ I
i
,   
II ~
~ "  .
~  ....  .... "
"  I  I    I
l ~~
Fa~
6500 7CIXJ 7500 8500
Figure 19. Area and Timing Relationship of 8 bit Barrel Shifter after
transize Optimization
The 8 bit barrel shifter with iCONTRAST is shown in table IX and figure
20.
Area Td
1E+ 10 26.90
1E+10 12.00
2227.96 12.20:
876.94 15.50
694.63 17.20
691.93 19.40
664.85 23.10
638.33 22.70
625.97 25.50
Table IX. iCONTRAST Result of 8 bit Barrel Shifter
43
• +
+j
1100 13Xl 1500 1700 1900 2100 23)) 2500
1><f!1iJ (mao rrelfr"'2J
700 !!CO
I
.  I I I II
I
I
I
I
I
~ I I I I I,
~ I
\1

I ~ I
I~
r
II rr   f r
I I
r_r,
I I
I I _l I  "   
500
10
26
14
u 22
cu
Ul
o
c:
co
c:
Qi
E
f= 18
Figure 20. Area and Timing Relationship of 8 bit Barrel Shifter after
iCONTRAST Optimization
After optimization, iCONTRAST produced irregular size like 291lm, 131lm
and 20/lm when most of the other transistor sizes are 41lm as shown in
figure 21. It is not an efficient layout. In each stage, the size of the load is
almost the same for every transistor. Therefore, almost the same sizes of
transistors for each stage are expected. One possibl'e reason this happens is
the wrong critical delay path searching on transmission gate circuit.
Actually, the transmission gate has a bidirectional characteristic, but, in this
barrel shifter, only one of two transmission gates is on at any time and data
signals flow only one direction.
OIl
a••,,"  ....
44
Other Tr Size 4
Figure 21. Schematic of 8 bit Barrel Shifter with iCONTRAST Result
45
iCONTRAST might consider backward signal passing on transmission gate.
To verify this assumption, the transmission gates are replaced by clocked
inverters.
Area Td
2738.88 7.94
1363.37 9.99
1037.46 12.00
960.39 14.00
934.85 15.90
928.89 16.80
928.89 16.80
928.89 16.80
928.89 16.80
Table X. Clocked Inverter type 8 bit Barrel Shifter Result
15 +_
I
": 13 ;I
71__
soo
I
I
1000 1500
I
+2000
2500
 .
3000
i • T~ 1
Au. (mao ",.... '2)
Figure 22. Area and Timing Relationship of Clocked Inverter Type Barrel
Shifter after iCONTRAST Opt,imization
46
From the result of clocked inverter circuit, transistor sizes are almost regular
for each stage shown in figure 23.
Figure 23. Schematic of Clocked Inverter Type 8 bit Barrel Shifter with
iCONTRAST Result
47
It happens that iCONTRAST finds impractical critical delay paths that do
not consider signal direction in transmission gates. From the result in figure
23, iCONTRAST produced the expected output with signal direction control.
A CMOS circuit is represent in graph by iCONTRAST [Sapatnekar, 1993].
PERT uses a delay graph of MOS circuit [Chen, 1991]. In the delay graph,
iCONTRAST does not consider the signal direction or complementary
relationship of the two transmission gates in a multiplexer. Therefore, the
both transmission gates of a 2 to 1 multiplexer could be turned on at the
same time. Thus, there is a signal path from an input to another input.
However, this signal path does not exist in real circuit and is very long and
large resistive path.
There is a big difference of computing time between transize and
iCONTRAST. Table XI shows computing times for a barrel shifter executed
at a Sun SparcStation 20.
transize real 0.4
user 0.0
sys 0.1
iCONTRAST real 5:30.4
user 5:28.0
sys 0.7
Table XI. Computing Time to Optimize 8 bit Barrel Shifter
CHAPTER VI
SUMMARY AND FUTURE WORKS
Area and timing optimizations of barrel shifters have been investigated
using lagrangian multipliers within reasonable computing time. In order to
find optimal solution values quickly, MOS transistors have been replaced by
resistors and capacitors. The critical delay path was chosen by a heuristic
method for simple calculation. Standard layout bilocks were used in the
layout.
Using the law of diminishing returns, basic behaviors of area and time
tradeoffs for MOS transistor were observed. With lagrang.ian multipliers, the
barrel shifter was optimized. The results were compared to iCONTRAST's.
transize computes the optimal sizes faster than iCONTRAST does. Second,
the critical delay path searching in iCONTRAST found an impractical critical
delay path. The RC transistor model, lagrangian multiplier, and circuit
simplification of repeated structure contributed to the quick response time of
transize.
The optimized barrel shifter using iCONTRAST has an abnormally big
transistor. That is because iCONTRAST is looking for critical delay paths
without consideration of signal flowing direction in the transmission gates.
The barrel shifter with clocked inverters does not have that kind of
irregularity in MOS transistor size. There were some variations of transistor
48
49
size because that transistor had a slightly different load than others. Also,
circuits with repeated structure such as the barrel shifter can be more easily
layouted and verified when simplified circuit was used for optimization.
This study only dealt with optimization of a repeated structure circuit
such as the barrell shifter. Also, these results may not be applicable for
general circuits because the standard layout block was used. The RC
transistor model has less accuracy than newly developed model that take
into account nonlinearities.
This study gives a basic idea of why dedicated optimization too~ls are
needed. With better transistor models, improvement of critical delay path
finding rules, and linking to layout editor, better optimization tools could be
developed.
REFERENCES
Caisso, J., E. Cerny and N. C. Rumin. (May 1991). "A recursive technique
for computing delays in seriesparallel MOS transistor circuits," IEEE
Trans. on ComputerAided Design, vo!.l O(no. 5): pp.589595.
Chen, H. Y. and S. M. Kang. (Jan. 1991). "iCOACH: A circuit optimization
aid for CMOS highperformance circuits,1/ Integration, vol. 10: pp.185212.
Chu, C. and M. A. Horowitz. (Nov. 1987). "Chargesharing models for
switchlevel simulation,1/ IEEE Trans. on ComputerAided Design, vol.
CAD6(no.6): pp.l0531061.
Cirit, M. A. (June 1987). "Transistor sizing in CMOS circuits," Proc. of 24th
ACM/IEEE Design Automation Conference:pp. 121124.
Dai, Z. and K. Asada. (May 1989). "MOSIZ: A twostep transistor sizing
algorithm based on optimal timing assignment method for multistage
complex gates," Proc. of IEEE 1989 Custom Integrated Circuit
Conference: pp.17 .3.117.3.4.
Dutta, S., S. S. M. Shetti and S. L. lusky. (Aug. 1995). "A comprehensive
delay model for CMOS inverters," IEEE J. of SolidState Circuits,
voI.30(no.8): pp.864871.
Kirkpatrick, T. I. and N. R. Clark. (March 1966). "PERT as an aid to logic
design," IBM Journal of Res. and Devel., vol. 10: pp.135141 .
50
51
Hedenstierna, N. and K. O. Jeppson. (March 1987). "CMOS circuit speed
and buffer optimization," IEEE Trans. on ComputerAided Design, vol.
CAD6Ino.2): pp.270281.
Hedlund, K. S. IJune 1987). "Aesop: A tool for automated transistor sizing/'
Proceedings of 24th ACM/IEEE Desig1n Automation Conf.: pp.114120.
Lin, H. C. and L. W. Linholm. IApril 1975). "An optimized output stage for
MOS integrated circuits," IEEE J. of SolidState Circuits, vol.SC
10(no.2): pp.1 06109.
Lin, T. M. and C. A. Mead. IOct. 1984). "Signal delay in general RC
networks," IEEE Trans. on ComputerAided Design, voI.CAD3InoA):
pp.331349.
Marple, D. (June 1989). I'Transistor size optimization in the taHor layout
system," Proc. of 26th ACM/IEEE Design Automation Conference:
ppA348.
Ousterhout, J. K. (July 1985). "A switchlevel timing verifier for digital MOS
VLSI," IEEE Trans. on ComputerAided Design, voI.CAD4Ino.3):
pp.336349.
Rubinstein, J., P. Penfield and M. Horowitz. IJuly 1983). "Signal delays in
RC tree network," IEEE Trans. on Computeraided Design, voI.CAD2:
pp.202211.
52
Sakurai, T. and A. R. Newton. (April 1990). "Alphapower law MOSFET
model and its applications to CMOS inverter delay and other formulas,"
IEEE J. of SolidState Circuits, voI.25(no.2): pp.584594.
Sakurai, T. and T. lizuka. (Feb. 1985). "Gate electrode RC delay effects in
VLSI's," IEEE J. of SolidState Circuits, voI.SC20(no.1): pp.290294.
Sapatnekar, S. S. et al. (Nov. 1993). "An exact solution to the transistor
sizing problem for CMOS circuits using convex optimization," IEEE
Trans. on ComputerAided Design of Integrated Circuits and Systems,
voI.12(no.11): pp.16211634.
APPENDIX I
1. MATLAB Program 1
w2 (micro) , )
%g' ,wO*le6,wI*le6,w2*le6)
%g' ,wl*le6,Wdiff*le6)
wI (micro)
%g %g
%g
%" This program finds optimal transistor size
% of 3 stage inverter series
clear
clc
%" definitions
% wO gate width of first inverter
%" wI gate width of second inverter
%" w2 gate width of third inverter
% Ci interconnection capacitance
% Cgp normalized gate capacitance Cg'
% Rp normalized gate resistance R'
% Mtaup: differatiation of tau M
% wI=sqrt( Rp*wO*(Ci+Cgp*w2)/(Rp*Cgp+taup*wO)
%" Ap height of inverter
%" AO : minimum size of inverter
% Cdp : normalized drain capacitance Cd'
% Mtaup=Ap/ (3*AO+Ap* (wO+wI+w2) ) * (Rp/wO* (Cdp*wO+Ci+Cgp*wI)
+Rp/WI* (Cdp*wI+Ci+Cgp*w2) +Rp/w2* (Cdp*w2+Ci) )
% initialization
wO=lOe6;
wI=le6
w2=300e6;
Ci=18e15;
Cgp=1.68e9;
Rp=O.084;
Ap=25e6;
AO=625e12;
Cdp=O.95e9;
Wdiff=lOO;
sprintf(' wI (micro) Wdiff(micro) ')
while abs(Wdiff»leIO,
wIbef=wI;
Mtaup=Ap/(3*AO+Ap*(wO+wI+w2) )*(Rp/wO*(Cdp*wO+Ci+Cgp*WI)
+Rp/wI* (Cdp*wI+Ci+Cgp*w2)+Rp/w2* (Cdp*w2+Ci) );
wIaft=sqrt( Rp*wO*(Ci+Cgp*w2)/(Rp*Cgp+Mtaup*wO) ) j
Wdiff=wIbefwlaft;
wI=wIaft;
sprintf ('
end
sprintf(' wO(micro)
sprintf (' %g
end
2. MATLAB Program 2
% This program shows transistor size relationship among
% 3 stage inverter series
clear
53
54
clc
clg
% definitions
% wO gate width of first inverter
% wl gate width of second inverter
% w2 gate width of third inverter
% Ci interconnection capacitance
% Cgp normalized gate capacitance Cg'
% Rp normalized gate resistance R'
% Mtaup: differatiation of tau M
% wl=sqrt( Rp*wO*(Ci+Cgp*w2)/(Rp*Cgp+Mtaup*wO)
% Ap height of inverter
% AO : minimum size of inverter
% Cdp : normalized drain capacitance Cd'
% Mtaup=Ap/(3*AO+Ap*(wO+wl+w2))*(Rp/wO*(Cdp*wO+Ci+Cgp*wl)
+Rp/Wl* (Cdp*wl+Ci+Cgp*w2)+Rp/w2* (Cdp*w2+Ci) )
% initialization
Ci=l8e15;
Cgp=l.68e9;
Rp=O.084;
Ap=25e6;
AO=625e12;
Cdp=O.95e9;
N=5;
w2i=lOOe6;
w2f=lOOOe6;
w2d=(w2fw2i)/N;
for j=l:N,
w2=w2d*j+w2i;
M=50;
wOi=5e6;
wOf=50e6;
wOd=(wOfwOi)/M;
for i=l:M,
wO(i)=wOd*i+wOi;
wlc=le6;
Wdiff=lOO;
while abs(Wdiff»lelO,
wlbef=wlc;
Mtaup=Ap/(3*AO+Ap*(wO(i)+wlc+w2))*(Rp/wO(i)*(Cdp*wO(i)
+Ci+Cgp*wlc)+Rp/wlc*(Cdp*wlc+Ci+Cgp*w2)
+Rp/w2*(Cdp*w2+Ci) );
wlaft=sqrt(Rp*wO(i)*(Ci+Cgp*w2)/(Rp*cgp+Mtaup*wO(i))) ;
Wdiff=wlbefwlaft;
wlc=wlaft;
end
wl(i,j)=wlcj
end
end
plot (wO*le6,wl*le6)
xlabel('wO: width of first inverter')
ylabel('wl: width of second inverter')
%title(' wO, wI, and w2 (unit: micron) ,)
for i=I:N,
w2=w2d*i+w2ij
mm=sprintf('w2= %g' ,w2*le6) j
text(O.9*wOf*le6,wI(M,i)*le6,mm)
end
end
3. MATLAB program 3
% This program shows optimal size of middle inverter
clear
clc
clg
% definitions
% wO gate width of first inverter
% wI gate width of second inverter
% w2 gate width of third inverter
% Ci interconnection capacitance
% Cgp normalized gate capacitance Cg'
% Rp normalized gate resistance R'
% Mtaup: differatiation of tau M
% wI=sqrt( Rp*wO*(Ci+Cgp*w2)/(Rp*Cgp+Mtaup*wO)
% Ap height of inverter
% AO : minimum size of inverter
% Cdp : normalized drain capacitance Cd'
% Mtaup=Ap/(3*AO+Ap*(wO+wl+w2»*(Rp/wO*(Cdp*wO+Ci+Cgp*wl)
+Rp/WI* (Cdp*wI+Ci+Cgp*w2)+Rp/w2* (Cdp*w2+Ci) )
% Total delay
% tau=Rp/wO* (Cdp*wO+Ci+Cgp*Wl)+Rp/Wl* (Cdp*wl+Ci+Cgp*w2)
+Rp/w2*(Cdp*w2+Ci)
% Total area
% A=3*AO+Ap*(wO+wl+w2)
% initialization
Ci:18e15;
Cgp=1.68e9j
Rp=O.084;
Ap=25e6j
AO=625e12j
Cdp=O.95e9j
wO=lOe6;
M=IOi
w2i=100e6j
w2f=1000e6j
w2d=(w2fw2i)/Mi
for j=I:M,
w2=w2d*j+w2ij
N=IOOj
wId=(w2wO)/Nj
for i:l:N,
wl=wld*i+wOj
tau(i,j)=Rp/wO*(Cdp*wO+Ci+Cgp*wl)
55
+Rp/Wl*(Cdp*Wl+Ci+Cgp*w2)+Rp/w2*(Cdp*w2+Ci) i
A(i,j)=3*AO+Ap*(wO+wl+w2) ;
end
wdiff=lOOi
while abs(Wdiff»le10,
wlbef=wli
Mtaup=Ap/(3*AO+Ap*(wO+wl+w2) )*(Rp/wO*(Cdp*wO+Ci+Cgp*wl)
+Rp/Wl* (Cdp*wl+Ci+Cgp*w2)+Rp/w2* (Cdp*w2+Ci) );
wlaft=sqrt( Rp*wO*(Ci+Cgp*w2)/(Rp*Cgp+Mtaup*wO) );
Wdiff=wlbefwlafti
wl=wlaft;
end
tauopt(j)=Rp/wO*(Cdp*wO+Ci+Cgp*wl)+Rp/wl*(Cdp*wl+Ci+Cgp*w2)
+Rp/w2*(Cdp*w2+Cil;
Aopt(j)=3*AO+Ap*(wO+wl+w2) ;
end
plot (A*le12,tau*le9,Aopt*le12, tauopt*le9, ,*,)
xlabel('Area: total area of 3 inverters (unit:micro meter~2) ')
ylabel('Time: total time through 3 inverters(unit:nano sec) ')
for i=l:M,
w2=w2d*i+w2ii
mm=sprintf('w2= %g urn' ,w2*le6) i
text(A(1,i)*le12,tau(1,i)*le9,mm)
end
mm=sprintf ( 'wo= %g micro meter' , wO*le6) ;
text(A(N,M)/3*le12,tau(N,M)*le9,mm)
mm=sprintf('wl= from %g micro meter to the size of w2' ,wO*le6) i
text(A(N,M)/3*le12,tau(N,M)*le9*9/10,mm)
grid
end
4. MATLAB Program 4
% This program finds optimal transistor size of
% multiple stages of inverter series.
clear
clc
% definitions
% W(i): gate width of ith inverter
% ci interconnection capacitance
% Cgp : normalized gate capacitance Cg'
% Rp normalized gate resistance R'
% Mtaup: differatiation of tau M
% W(i)=sqrt( Rp*W(il)*(Ci+Cgp*W(i+l))/(Rp*Cgp+Mtaup*W(il))
% Ap height of inverter
% AO : minimum size of inverter
% Cdp : normalized drain capacitance Cd'
% Mtaup=Ap/(m*AO+Ap*(wO+wl+w2))*(Rp/wO*(Cdp*wO+Ci+Cgp*Wl)
+Rp/wl*(Cdp*Wl+Ci+Cgp*w2)+Rp/w2*(Cdp*w2+Ci) )
% M : Number of Inverter Stages
% initialization
M=7i
56
57
% size of the first inverter
W(1)=lOe6i
% size of the last inverter
W(M)=500e 6 i
% set other inverter's initial value
for i=2 :Ml,
w(i)=W(M) i
end
Ci=lBe15i
Cgp=1.68e9;
Rp=O.084;
Ap=25e6;
AO=625e12i
Cdp=O.95e9;
Wdiff=lOO i
Wold=Wi
while abs(Wdiff»lelO,
Area=M*AO+Ap*sum(W) i
Mtaup=O;
for i=2:Ml,
Mtaup=Mtaup+Rp/W(il)*(Cdp*W(il)+Ci+Cgp*W(i) ;
end
Mtaup=Mtaup+Rp/W(M) * (Cdp*W(M)+Ci) ;
Mtaup=Ap/Area*Mtaup;
for i=2:Ml,
W(i)=sqrt(Rp*W(il)*(Ci+Cgp*W(i+l»/{Rp*Cgp+Mtaup*W(il)));
end
Wdiff=O;
for i=2:Ml,
Wdiff=Wdiff+abs(W(i)Wold(i))i
end
Wold=W;
W*le6
end
W*le6
end
5. MATLAB Program 5
% This program evaluates optimal transistor size.
clear
clc
clg
% definitions
% W(i) gate width of ith inverter
% ci interconnection capacitance
% Cgp normalized gate capacitance Cg'
% Rp normalized gate resistance R'
% Ap height of inverter
% AO minimum size of inverter
% Cdp normalized drain capacitance Cd'
% M Number of Inverter Stages
~ initialization
Ci=18e15i
Cgp=1.68e9i
Rp=O.08 4 i
Ap=25e 6 i
AO=625e12i
Cdp=O.95e 9 i
M=7i
W(1)=lOe6i
W(M)=500e6i
~ N : total number of forced error points
N=lOOi
% set optimized value of inverters
for i=2:Ml,
if i==2,
mm=sprintf('Size of %g nd inverter in micron 'ti);
elseif i==3,
mm=sprintf('Size of \g rd inverter in micron 'ti) i
else
mm=sprintf('Size of %g th inverter in micron ',i) i
end
xx=input (mm) i
W(i)=xx*le6i
end
Worg=Wi
tau=Oi
for i=l:Ml,
tau=tau+Rp/W(i)*(Cdp*W(i)+Ci+Cgp*W(i+l» i
end
tau=tau+Rp/W(M)* (Cdp*W(M)+Ci) i
Area=Oi
for i=l:M,
Area=Area+Ap*W(i) i
end
Area=Area+M*AOi
Topt=taui
Aopt=Areai
% set initial value of other inverter
for j=2:Ml,
W=Worgi
for i=l:N,
W(j)=O.5*Worg(j)+Worg(j)*i/Ni
tau=Oi
for k=l:Ml,
tau=tau+Rp/W(k)* (Cdp*W(k)+Ci+Cgp*W(k+l») i
end
tau=tau+Rp/W(M) * (Cdp*W(M)+Ci) i
Area=Oi
for k=l:M,
Area=Area+Ap*W (k) i
end
Area=Area+M*AOi
58
T(i,jl)=taui
A(i,jl)=Areai
end
end
plot(A*le12,T*le9,Aopt*le12,Topt*le9, '*')
xlabel('Area: total area of All inverters (unit:micro meterA 2) ')
ylabel('Time: total time through All inverters (unit:nano sec) ,)
grid
end
6. MATLAB Program 6
% This program finds optimal number of stages
clear
clc
clg
% definitions
% W(i) gate width of ith inverter
% Ci interconnection capacitance
% Cgp normalized gate capacitance Cg'
% Rp normalized gate resistance R'
% Mtaup: differatiation of tau M
% w(i)=sqrt( Rp*w(il)*(Ci+Cgp*w(i+l)/(Rp*Cgp+Mtaup*w(il))
% Ap height of inverter
% AD : minimum size of inverter
% Cdp : normalized drain capacitance Cd'
% Mtaup=Ap/(m*AD+Ap*(wD+wl+w2»*(Rp/wO*(Cdp*wO+Ci+Cgp*wl)
+Rp/wl*(Cdp*wl+Ci+Cgp*w2)+Rp/w2*(Cdp*w2+Ci) )
% M : Number of Inverter Stages
% initialization
for M=3: 20,
% size of the first inverter
W(1)=10e6;
% size of the last inverter
W(M)=500e6i
% set other inverter's initial value
for i=2:Ml,
W(i)=W{M)i
end
Ci=18e15;
Cgp=1.68e 9 i
Rp=O. 084 i
Ap=25e6;
AO=625e12;
Cdp=O.95e9;
Wdiff=lOO;
Wold=Wi
while abs(Wdiff»le10,
Area=M*AO+Ap*sum(W) ;
Mtaup=O;
for i=2:Ml,
Mtaup=Mtaup+Rp/W(il)*{Cdp*W(il)+Ci+Cgp*W(i») ;
59
end
Mtaup=Mtaup+Rp/W(M) * (Cdp*W(M) +Ci) i
Mtaup=Ap/Area*Mtaup;
for i=2:M1,
W(i)=sqrt( Rp*W(i1)
* (Ci+Cgp*W(i+1)/(Rp*Cgp+Mtaup*W(i1» );
end
Wdiff=O;
for i=l:M,
Wdiff=Wdiff+abs(W(i)Wold(i) i
end
Wold=W;
W*le6i
end
M;
W*le6;
for i=l:M,
Wopt(M2,i)=W(i) ;
end
tau=Oi
for i=1:M1,
tau=tau+Rp/W(i) * (Cdp*W(i)+Ci+Cgp*W(i+1» j
end
tau=tau+Rp/W(M)* (Cdp*W(M)+Ci) ;
Area=O;
for i=l:M,
Area=Area+Ap*W(i) ;
end
Area=Area+M*AO;
Topt(M2)=tau;
Aopt(M2)=Area;
end
plot (Aopt*le12,Topt*le9,Aopt*le12,Topt*le9, '*')
for i=1:10,
mm=sprintf('%g' ,i+2);
text(Aopt(i)*le12*29/30,Topt(i)*le9*21/20,mm) ;
end
for i=11:18,
mm=sprintf('%g' ,i+2);
text(Aopt(i)*le12,Topt(i)*le9*19/20,mm) ;
end
xlabel('Area: total area of All inverters(unit:micro meter A 2) ')
ylabel('Time: total time through All inverters (unit:nano sec) ')
grid
end
60
APPENDIX II
This program is the barrel shifter optimizer, transize.
1. Input Data File Sample
0.06 0.014 0.014
0.1 0.05 0.05
0.2 0.25 0.05 0.05
0.2 0.05 0.05
50 300
100 600
408
480
24
3
111 1
111
0.1 0.2 0.4
480 576 1152
4 8 8 20
4 8 8 20
488
488
0.5
0.1
0.0
0.0
7500
0.5
5
50
100
2. Makefile
HOME =
o FILES main.o \
readinit.o \
update.o \
neww.o \
iterate.o \
de1ay.o \
area.o
BINARY transize
LIBS 1m
I FLAGS
61
62
CFLAGS
CC cc
$ (BINARY) : $ (O_FILES)
$(CC) $ (CFLAGS) $ (IFLAGS) 0 $ (BINARY) $ (O_FILES) $ (GENLIB)
$ (LIBS)
.c.o:
$(CC) $ (CFLAGS) $ (IFLAGS) c $<
transize.h
install:
cp $ (BINARY) $(HOME)/bin
3. transize.h
#define MAXSTAGES 10
/* capacitance coefficients */
float CIBi /*buffer input constant cap*/
float CIBni /*buffer input cap coeff for Wn*/
float CIBPi /*buffer input cap coeff for Wp*/
float COBi /*buffer output constant cap*/
float COBn; /*buffer output cap coeff for Wn*/
float COBPi /*buffer output cap coeff for Wp*/
float CIOMi /*MUX input 0 constant cap*/
float CI1Mi /*MUX input 1 constant cap*/
float CIMn; /*MUX input cap coeff for Wn*/
float CIMp; /*MUX input cap coeff for Wp*/
float COM; /*MUX output constant cap*/
float COMn; /*MUX output cap coeff for Wn*/
float COMp; /*MUX output cap coeff for Wp*/
/* resistance coefficients */
float Rnl; /*low signal resistance coeff for l/Wn*/
float Rnh; /*high signal resistance coeff for l/Wn*/
float Rpl; /*low signal resistance coeff for l/Wp*/
float Rph; /*high signal resistance coeff for l/Wp*/
/* computed quantities */
float CB[MAXSTAGES]; /*buffer output cap for each stage*/
float CM[MAXSTAGES] i /*MUX output cap for each stage*/
float Cx[MAXSTAGES]; /*crossover cap for each stage*/
float RBph[MAXSTAGES]; /*buffer pFET high signal resistance for each
stage*/
float RBnl[MAXSTAGES] i /*buffer nFET low signal resistance for each
stage*/
float RMh[MAXSTAGES]; /*MUX parallel FET high signal resistance for each
stage*/
stages*/
number of inversions from input */
no inverter buffer, 1 inverter buffer present
63
float RMl[MAXSTAGES] i /*MUX parallel FET low signal resistance for each
stage*/
float alphaA; /*Lagrange multiplier for area constraint*/
float alphat; /*Lagrange multiplier for equal time delays*/
float WBn[MAXSTAGES+l]; /*nFET widths for each buffer stage*/
float WBp[MAXSTAGES+l]; /*pFET widths for each buffer stage*/
float WMn[MAXSTAGES] i /*nFET widths for each MUX stage*/
float WMp[MAXSTAGES] i /*pFET widths for each MUX stage*/
float dalphaA; /*change in Lagrange multiplier for area constraint*/
float dalphat; /*change in Lagrange multiplier for equal time delays*/
float dWBn[MAXSTAGESJ; /*change in nFET widths for each buffer stage*/
float dWBp[MAXSTAGES]; /*change in pFET widths for each buffer stage*/
float dWMn[MAXSTAGES]; /*change in nFET widths for each MUX stage*/
float dWMp[MAXSTAGES]; /*change in pFET widths for each MUX stage*/
float Altot; /* total constant area */
/* circuit structure */
int N; /*actual number of
int parity [MAXSTAGES] ; /*
int B[MAXSTAGES]; /* 0
*/
int M[MAXSTAGES]; /* 0 no MUX, 1 = MUX present */
float AO; /* desired circuit area for one bit path*/
float H; /* cell height */
/* convergence parameters */
float dWmax; /*max change in W's in an iteration*/
float Wtol; /*tolerance for W's: dWmax < Wtol */
float ttol; /*tolerance for diff between tr and tf */
float Atol; /*tolerance for area*/
int maxiter; /*maximum number of iterations allowed */
float dWs; /*convergence solving*/
int dWf; /*convergence solving flag*/
4. main.c
#include <stdio.h>
#include <math.h>
#include "transize.h"
main(argc, argv)
int argc;
char *argv[] ;
{
void readint();
void update();
void iterate();
void delay() ;
float area();
FILE *fPi
int i,iterW,itera,iterA;
64
char ans[10];
float tr,tf; /* rise and fall time delay */
float A; /* area */
float alphatnew,alphatold,delt,deltold;
float alphaAnew,alphaAold,delA,delAold;
int beenO;
float delAO;
if (argc > 2) {
fprintf(stderr, "usage:
fprintf(stderr," or
exit (1) ;
}
if (argc == 1)
fp stdini
%s \n", argv [0] ) ;
\s filename\n", argv[O]);
}
else
if (fp=fopen(argv[l],lr")) == NULL) {
fprintf(stderr, "cannot open file: %s\n",argv[l]);
exit(l) i
}
readinit (fp) i
update () ;
delay(&tr,&tf)i
A = area() i
printf{"\ninitial rise time %f, initial fall time %f\ninitial area
%f\n",tr,tf,A) ;
delA = Atol + Atol;
iterA 0;
beenO = 0;
while ( (fabs(delA) > Atol) && (iterA < maxiter) ) {
#ifdef DEBUG
printf("\niterA = %d, alphaA = %f, delA = %f, alphaAold tf,
delAold =%f\n",iterA,alphaA,delA,alphaAold,delAold);
#endif
if (iterA == 1) {
#ifdef DEBUG
printf("iterA 1 branch\n");
#endif
if (dalphaA == 0.0) exit(O);
alphaAold = alphaA;
alphaA = alphaA + dalphaA;
delAold delAi
}
if (iterA > 1) {
#ifdef DEBUG
printf("iterA > 1 branch\n");
#endif
if (delA == delAold) {
fprintf(stderr, "error: infinite alphaA!\n") i
65
exit (1) ;
}
alphaAnew = (alphaAold*delA  alphaA*delAold}/(delA  delAold);
if (alphaAnew <= 0.0) {
if (beenO) {
alphaAnew = (alphaA*delAO}/(delAO  delA);
if (alphaAnew <= 0.0) {
fprintf(stderr,"error: negative alphaA, choose smaller area
or use min delay area.\n");
exit (1) ;
}
else alphaAnew = 0.0;
}
alphaAold = alphaA;
alphaA = alphaAnew;
delAold = delA;
}
#ifde f DEBUG
printf("\niterA = %d, alphaA = %f, delA = %f, alphaAold H,
delAold =%f\n",iterA,alphaA,delA,alphaAold,delAold);
#endif
iterA++;
printf("\n*****************************************\n\nalphaA(td)
%f\n",iterA,alphaA) ;
itera = 0;
delt = ttol + ttol;
while ( (fabs ldelt) > ttol) && (itera < maxiter) ) {
#ifdef DEBUG
printf("\nitera = td, alphat = tf, delt = tf, alphatold tf,
deltoId =%f\n",itera,alphat,delt,alphatold,deltold);
#endif
if (itera == 1) {
#ifdef DEBUG
printf("itera 1 branch\n");
#endif
if (dalphat == 0.0) break;
alphatold = alphati
alphat = alphat + dalphati
deltoId delt;
}
if (itera > 1) {
#ifdef DEBUG
printf("itera > 1 branch\n") i
#endif
if (delt == deltoId) {
fprintf(stderr,"error: infinite alphat!\n") i
exit (1) ;
}
alphatnew
alphatold
(alphatold*delt  alphat*deltold)/(delt  deltoid);
alphat;
66
alphat = alphatnewi
deltoid = delti
}
#ifde f DEBUG
printf("\nitera = %d, alphat = H, delt = H, alphatold H,
deltold =%f\n",itera,alphat,delt,alphatold,deltold) i
#endif
itera++i
printf("\n*****************************************\n\nalphat(%d)
%f\n",itera,alphat) i
#ifdef DEBUG
printf("\nDo you wish to continue? (n) :") i
scanf ("%s", ansI i
if (ans[O] != 'yO) exit(O)i
#endif
if ( (alphat <= 0.0) I I
fprintf{stderr, "error:
rangel \n", alphat) i
exit (I) i
(alphat >= 1.0)
alphat = %f is out of valid
&& (iterW c maxiter) ) {
0;
Wtol + Wtol;
(dWmax > Wtol)
iterW
dWmax
while (
#ifdef DEBUG
printf (" \n buffer nFET widths are: \n") i
for (i=Oiic=Nii++) printf(" %f",WBn[i]) i
printf("\n buffer pFET widths are: \n") i
for (i=Oiic=Nii++) printf{" %f",WBp[i]);
printf("\n MUX nFET widths are: \n");
for (i=OiicNii++) printf(" %f",WMn[i]);
printf("\n MUX pFET widths are: \n") i
for (i=OiicNii++) printf(" %f",WMp[i]) i
delay(&tr, &tf);
printf("\nrising input delay = %f, falling input delay
%f\n",tr,tf) i
#endif
iterW++;
#ifdef DEBUG
printf("\niteration %d\n",iterW);
#endif
iterate() ;
update() i
#ifdef DEBUG
printf("\ndWmax %f\n" ,dWmax) ;
#endif
}
printf("\n\nFinal widths\n"} i
printf("\n buffer nFET widths are: \n") i
for (i=O i ic=N i i++) printf (" %f", WBn [i] ) ;
67
printfl"\n buffer pFET widths are: \n");
for li=O;i<=N;i++) printfl" H",WBp[i]);
printf("\n MUX nFET widths are: \n");
for (i=O;i<N;i++) printf(" tf",WMn[i]);
printfl"\n MUX pFET widths are: \n");
for (i=O;i<N;i++) printf(" \f",WMp[i]l;
delay(&tr,&tf) ;
printfl"\nrising input delay = tf, falling input delay
%f\n",tr,tf) ;
A = areal) ;
printf I "\nArea = %f\n" ,A) i
delt = tf  trj
if ldWmax > Wtol) {
printfl"\n\nerror: convergence failed. Increase iterW or
Wtol. \n") ;
exit(l) j
}
if ( (dalphat != 0.0) && lfabs(delt) > ttol) ) {
printfl"\n\nerror: convergence failed. Increase itera or
ttol. \n");
exit II) ;
}
delA = A  AO;
if ( (alphaA == 0.0) && (beenO
delAO delA;
beenO = Ii
0) ) {
}
if ( (dalphaA != 0.0) && (fabs(delA) > Atol) I (
printfl"\n\nerror: convergence failed. Increase iterA or Atol.\n") i
exit (1) ;
5. area.c
#include "transize.h"
float area()
{
float Ai
int i;
A = Altot;
for (i=O;i<N;i++) {
if (B[i]) {
A += H*(WBn[i] + WBp[i]);
}
if 1M [i]) {
A += H*(WMn[i] + WMp[i]);
}
#ifdef DEBUG
printf("\nA
#endif
return{A) ;
6. delay.c
%f\n",A);
68
#include "transize.h"
void delay(tr,tf)
float *tr, *tf;
{
float tri,tfi;
int i,j;
float f,p,q;
float Rh,Rl;
f = 0.0;
*tr = 0.0;
*tf = 0.0;
for (i=O;i<N;i++J
tri = 0.0;
tfi = 0.0;
if (B [i]) {
tri RBph[i}*CB[i};
tfi = RBnl [i] *CB [i] ;
}
Rh = 0.0;
Rl = 0.0;
for (j=i;;j)
if (M[j)J{
Rh += RMh[jJ;
Rl += RMl[j] ;
}
if (B[j)){
Rh += RBph[jJ ;
Rl += RBnl[j];
break;
}
#ifdef DEBUG
printf("i %d, Rh
#endif
if (M[iJ) {
tri += Rh*CM[i] ;
tfi += Rl*CM [i] ;
H, Rl %f\n", i,Rh,Rl) ;
}
if (parity [i]) { /* odd * /
*tr *tr + tfi;
*tf = *tf + tri;
p alphat;
q 1.0  alphati
}
else { /* even */
*tr = *tr + tri;
*tf = *tf + tfi;
P 1.0  alphati
q = alphat;
}
#ifdef DEBUG
f += p*tri + q*tfii
printf("delay partial f %"f\n" , f) i
#endif
}
#ifdef DEBUG
printf("\nf %"f, tr %C tf %f\n",f,*tr,*tf};
#endif
}
7. iterate.c
#include "transize.h"
void iterate(}
{
int i,j;
float p,q;
float a,b; /* d2f/dW2 and df/dW */
float f;
float CL,Rh,Rl;
float alphaxht;
alphaxht = alphaA*Hi
#ifdef DEBUG
f = alphat*RBph[O]*CB[O] + (1.0  alphat)*RBnl[O]*CB[O];
#endif
for (i=O;i<N;i++) {
#ifdef DEBUG
printf("\n i = %d\n",i};
#endif
if (parity{i]) { /* odd */
P = alphat;
q = 1.0  alphat;
else { /* even */
p 1.0  alphat;
q = alphat;
69
}
#ifdef DEBUG
printf ("p
#endif
H, q %f\n" ,p,q) i
/* inverter buffer width changes */
if ( (i==O) II (B [i]
dWBn[i] 0;
dWBp[i] 0;
}
else {
CL = CB[i] + CM[i];
for (j=i+l; ;j++) {
if (B[j]) break;
CL += CM[jJ;
}
Rh = 0.0;
Rl = 0.0;
for (j=il;;j)
if (M[j]) {
Rh += RMh [j] j
Rl += RMl[j] j
}
if (B[j]) {
Rh += RBph[jJ ;
Rl += RBnl[jJ;
break;
0) ) {
70
COBp) ;
+ q*RBnl[il*COBp;
}
#ifdef DEBUG
printf("Rh = %f, Rl = %f\n",Rh,Rl);
#endif
a (q+q)*RBnl[iJ*RBnl[i]/Rnl*(RBnl[i]/Rnl*CL  COBn);
b alphaxht + q*Rh*CIBn + p*Rl*CIBn +
p*RBph[i]*COBn + q*( RBnl[i]*COBn  RBnl [i]*RBnl[iJ *CL/Rnl } j
#ifdef DEBUG
printf ("df/dWBn = %f, d2f/dWBn2 =%f\n", b, a) ;
#endif
dWBn[i] = b/a;
a (p+p) *RBph[i] *RBph[i]/Rph* (RBph[iJ/Rph*CL
b = alphaxht + q*Rh*CIBp + p*Rl*CIBp +
p*( RBph[i]*COBp  RBph[iJ*RBph[i]*CL/Rph
#ifdef DEBUG
printf(rodf/dWBp = H, d2f/dWBp2 =%f\n",b,a);
#endif
dWBp[iJ = b/a;
#ifdef DEBUG
f += p*RBph[i)*CB[iJ + q*RBnl[i)*CB[i];
printf ("partial f = %f\n", f) ;
#endif
}
1* MUX width changes *1
if ( M[i) == 0 } {
dWMn [i) 0;
dWMp[i) = 0;
}
else {
CL = CM[i] ;
for (j=i+liij++) {
if (B[j)) break;
CL += CM[j);
}
Rh = 0.0;
Rl = 0.0;
for (j =i ; ; j   ) {
if ( (M[j] != 0) && (j != i) ) {
Rh += RMh[j] i
Rl += RMl[j);
}
if (B[j)) {
Rh += RBph[j) i
Rl += RBnl[j) i
break;
}
#ifdef DEBUG
printf(rrRh = %f, Rl = H\nrr,Rh,Rl);
#endif
a (p+p)*RMh[i)*RMh[i]/Rnh*(RMh[iJ/Rnh*CL  COMn) +
(q+q) *RMl[i) *RMI [i]/Rnl* (RMl[i)/Rnl*CL  COMn) i
b alphaxht + p*( Rh*(ClMn + CIMn + COMn) + RMh[i]*COMn RMh[
i]*RMh[i]/Rnh*CL) +
q*( Rl*(CIMn + ClMn + COMn) + RMl[i]*COMn RMl[
i)*RMl[i]/Rnl*CL );
#ifdef DEBUG
printf(rrdf/dWMn H, d2f/dWMn2 =H\n",b,a);
#endif
dWMn[i] = b/ai
a (p+p)*RMh[i]*RMh[iJ/Rph*(RMh[i]/Rph*CL  COMp) +
(q+q)*RMl [i]*RMl[i]/Rpl* (RMI [i]/Rpl*CL  COMp);
b alphaxht + p*( Rh*(CIMp + ClMp + COMp) + RMh[i]*COMp RMh[
i]*RMh[i)/Rph*CL ) +
q*( Rl*{CIMp + ClMp + COMp) + RMl[i]*COMp RMl[
iJ*RMl[i)/Rpl*CL );
#ifdef DEBUG
printf(rrdf/dWMp H, d2f/dWMp2 =H\n",b,a);
#endif
dWMp[il = b/a;
#ifde f DEBUG
f += p* (Rh + RMh [i] ) *CM [i] + q* (Rl + RMI [i]) *CM [i] ;
printf(rrpartial f = %f\n",f);
#endif
}
}
#ifdef DEBUG
printf (" f %f\n" I f) i
#endif
}
71
8. neww.c
#include <math.h>
#include "transize.h"
float neww(W/dW)
float W,dWi
{
float Wmin 4.0i
W = W + dWj
if (W < Wmin) {
dW = dW + Wmin Wi
W = Wmini
}
dW = fabs (dW) ;
if (dW > dWmax) dWmax dW;
return (W) i
9. readinit.c
/* reads constants and initial values for variables*/
#include <stdio.h>
#include "transize.h"
readinit(fp)
FILE *fp;
{
int i;
float AIB,AIM,Ax[MAXSTAGES] ;
printf("buffer input cap coeffs CIB, CIBn, CIBp: II);
fscanf(fp,"%f %f %f",&CIB,&CIBn,&CIBp) i
printf("ClB = %f, ClBn = %f , ClBp = tf\n",ClB,ClBn,ClBp);
printf("buffer output cap coeffs COB, COBn, COBp: II);
fscanf(fp,"%f %f tf",&COB,&COBn/&COBp) i
printf("COB = %f, COBn = tf , COBp = %f\n",COB,COBn,COBp);
printf("MUX input cap coeffs ClOM, CllM, ClMn, ClMp: ");
fscanf(fp,"%f %f %f %f",&ClOM,&CI1M,&ClMn,&ClMp) j
printf ("ClOM = H, CllM = H, ClMn = %f, CIBp =
H\n", ClOM, CIlM, ClMn, ClMp) i
printf("MUX output cap coeffs COM, COMn, COMp: II) i
fscanf(fp,"%f %f tf",&COM,&COMn,&COMp) i
printf("COM = H, COMn = H, COMp = H\n",COM,COMn,COMp)i
printf("nFET res coeffs Rnl, Rnh: II) i
fscanf(fp,"%f %f",&Rnl,&Rnh);
printf("Rnl = %£, Rnh = %f\n",Rnl,Rnh);
72
for (i=O;i<=N;i++l fscanf(fp,"%f",&WBp[i]l;
printf("WBp = "l; for (i=O;i<=N;i++l printf(" %f",WBp[i]l;
printf ("\n"l;
printf("initial widths for MUX nFETs: \n");
for (i=O;i<N;i++l fscanf(fp, "%f",&WMn[i]);
printf("WMn = "l; for (i=O;i<N;i++) printf(" %f",WMn[i]);
printf("\n") ;
printf("initial widths for MUX pFETs: \n"l i
for (i=Oii<N;i++l fscanf(fp,"%f",&WMp(ill;
printf("WMp = "l; for (i=Oii<N;i++) printf(" tf",WMp[ill;
printf("\n"l;
printf("initial lagrange multiplier for propagation time: "l i
fscanf(fp,"%f",&alphatl;
printf("alphat = tf\n",alphatl;
printf("initial change in lagrange multiplier for propagation time:
"l;
fscanf (fp, "%f", &dalphatl ;
printf("dalphat = tf\n",dalphatl;
printf("initial lagrange multiplier for area: "l;
fscanf (fp, "%f", &alphaAl ;
printf ("alphaA = %f\n", alphaAl ;
printf("initial change in lagrange multiplier for area: ");
fscanf {fp, ''If'', &dalphaAl ;
printf ("dalphaA = %f\n", dalphaAl ;
printf("desired circuit area: \n");
fscanf(fp, "%f",&AOl;
printf("AO = %f\n"/AO);
printf("W tolerance: \n ll
);
fscanf (fp, "%f" / &Wtol) ;
printf ("Wtol = %f\n", Wtol) ;
printf("tr,tf diff tolerance: \n");
fscanf(fp, "%f",&ttol) i
printf("ttol = tf\n",ttol);
printf("area tolerance: \n"l;
fscanf (fp, "%f", &Atoll;
printf("Atol = %f\n",Atoll;
printf ("max iterations: \n");
fscanf (fp, lI%d", &maxiterl ;
printf ("maxiter = %d\n" / maxiterl ;
AI tot = 0.0;
74
for (i=Oji<NjiH) {
dWBn[i] = 0.0;
dWBp[i] 0.0;
dWMn [i] 0.0;
dWMp [i] 0 . 0 ;
CB [i] = 0.0 j
CM [i] = 0.0 j
if (i) {
if (B[i]) parity[i] = 1  parity[i1] j
else parity[i] = parity[il] j
}
else parity[O] = 1;
if (B[i]) Altot += AIBj
Altot += Ax[i) j
if (M[i]) Altot += AIM;
#ifdef DEBUG
printf("parity = II); for (i=Oji<Nji++) printf(" %d",parity[i]);
printf ( "\n II) j
printf("Altot = %f\n"/Altot) j
#endif
}
10. update.c
#include "transize.h"
void update ( )
{
float neww () ;
int i;
/*alphat = alphat + dalphat;*/
dWmax = O.Oj
for (i=Oji<Nji++) {
WBn[i] neww(WBn[i] ,dWBn[i]) j
WBp[i] neww(WBp[i] ,dWBp[i]);
WMn[i] neww(WMn[i] ,dWMn[i]) j
WMp[i] neww(WMp[i] ,dWMp[i]);
}
for (i=O;i<Nji++) {
RBph[i] = Rph/WBp[i] j
RBnl[i] = Rnl/WBn[i];
RMh[i] = 1.0/(WMn[i]/Rnh + WMp[i]/Rph) j
RMl[i] = 1.0/(WMn[i]/Rnl + WMp[i)/Rpl);
if (B [i]) {
CB[i) = COB + COBn*WBn[i) + COBp*WBp[i) + Cx[i];
if (M [i]) CB [i] += ClOM + CIlM + 2. O*Cl.Mn*WMn [i] +
2.0*CIMp*WMp[i) j
else CB(i] += ClB + ClBn*WBn[i+l] + CIBp*WBp[i+1);
75
if (M[i]) {
CM[i] = COM + COMn*WMn[i] + COMp*WMp[i] i
if (B[i+l]) CM[ili += CIB + CIBn*WBn[i+l] + CIBp*WBp[i+l] i
else CM[i] += Cx[i] + CIOM + CIlM + 2.0*CIMn*WMn[i+l] +
2.0*CIMp*WMp[i+l] i
}
76
VITA ?
Byungha Joo
Candidate for the Degree of
Master of Science
Thesis: TIMING AND AREA OPTIMIZATION OF CMOS BARREL SHIFTER
Major Field: EI!ectrical Engineering
Biographical:
Personal Data: Born in Busan, Korea, November 19, 1965, son of Mr.
Jongheon Joo and Mrs. Kyungja Cho
Educational: Graduated from Posung High School, Seoul, Korea in
February 1984; received Bachelor of Science from Yonsei
University, Seoul, Korea in February 1988; completed requirements
for Master of Science at Oklahoma State University in May, 1996.
Professional Experience: Researcher, (19881989) SAMSUNG
Semiconductor and Telecommunication, KyungkiDo, Korea
Researcher, (198919911 SAMSUNG Electronics, KyungkiDo,
Korea
Associate Staff Researcher, (19911993) SAMSUNG Electronics,
KyungkiDo, Korea