REINFORCEMENT LEARNING CONTROL APPLIED
TO DIESEL ENGINES
By
ORLANDO DE JESUS
Engineer in Electronics
Universidad Simon Bolivar
Caracas, Venezuela
1985
Project Management Specialist
Universidad Simon Bolivar
Caracas, Venezuela
1992
Submitted to the Faculty of the
Graduate College of the
Oklahoma State University
in partial fulfillment of
the requirements for
the Degree of
MASTER OF SCIENCE
December, 1998
REINFORCEMENT LEARNING CONTROL APPLLED
TO DIESEL ENGINES
Thesis Approved.:
__[jJ~LIS ~ ie!
Dean of the Graduate College
/I
ACKNOWLEDGMENTS
I would like to thank my major adviser, Dr. Martin Hagan, for his guidance,
encouragement, patience and friendship.
My thanks also goes to Salim A. Jaliwala ofCummins Engine Company for his
support in this endeavor.
I would like to express my deep appreciation to my wife, Marisela, for her years of
loving support, encouragement and patience. I wish to thank my parents Manuel and Nelli
for all they have done to support all my years of study and work, especially the
encouragement for this one. Special thanks to my family Jose, Gi1berto, Pedro, Emesto,
Nelli, Juan, Genesis, Elier, Daniela, Debbi, Maria de los Angeles, Belkys, Debora, Elia,
Maria, Jorge.
Special regards to my friends from AETI c.A. 1 hope their efforts to maintain the
company will success. Thanks to my friends for your support Aitzol, Xabier, Hector,
Manuel, Magaly, Francis, Carmen Elena, Felipe, Fidelia, Jose Abraham, Jose Luis,
Yamirca, Noe, Rossany, Victor, Tamar, Gustavo, Sandra, Antonio, Carmen, Alejandro,
Adriana, Elsa, Rene, Paloma, Francisca, Hanz, Eliecer, Janice, Marta, Daniel.
iii
TABLE OF CONTENTS
CHAPTER 1
INTRODUCTION 1
CHAPTER 2
DIESEL ENGINE OPERATION .4
2. 1. Historical review 4
2.2. Classification of the diesel engines 4
CHAPTER 3
ENGINE MODEL 11
CHAPTER 4
SELFTIJNJNG REGuLATOR APPLIED TO DIESEL ENGINES 20
4.1. Introduction 20
4.2. Pole Placement Design 21
4.3. Parameter estimation 25
4.4. Parameter estimation for the diesel engine 32
4.5. Controller design .42
4.6. Adaptive control. 50
CHAPTERS
GENETIC REINFORCEMENT LEARNING FOR DIESEL ENGINES
CONTROL 57
5.1. Introduction 57
5.2. GENITOR Algorithm 57
5.3. Model Description 63
5.4. Genetic Reinforcement Learning applied to prD controller 64
5.5. Initial results 64
5.6. GENITOR applied to analog PID controller 68
5.7. Genetic Algorithms applied to digital controllers 100
5.8. GENITOR algorithm applied to digital controller and engine with
friction b1=0.1229 101
5.9. GENITOR algorithm applied to digital controller and engine with
friction bl=O.Ol 130
5.10. Summary 151
iv
CHAPTER 6
REINFORCEMENT LEARNING 152
6.1. Introduction 152
6.2. Reinforcement learning framework 153
6.3. Elements of Reinforcement Learning 156
6.4. Actions 158
6.5. Rewards and discount factor 159
6.6. Markov property 160
6.7. Markov decision process (MDP) 160
6.8. Types of expected cost functions 161
6..90pt'ImaI costtogo funct'lons 164
6.10. Elementary Solution Methods 166
6.11. Unified algorithms 186
6.12. GradientDescent Methods 191
6.13. Conclusions 214
CHAPTER 7
APPLYING REINFORCEMENT LEARNING TO THE DIESEL ENGINE. 216
7.1. Learning a speed transition 217
7.2. Learning a speed transition with lower border pena1ty 225
7.3. Learning a speed transition with lower and higher border pena1ty 232
7.4. Learning a speed transition with random initial speed 239
7.5. Leaming a speed transition with random initial speed and fueling ..247
7.6. Tracking a reference engine speed 259
7.7. Tracking a reference engine speed time reward scheme including
positive rewards 264
7.8. Tracking a reference engine speed with multiple neural networks.. 272
CHAPTER 8
CONCLUSIONS 278
8.1. Summary of Results 278
8.2. Rec~mmendations for Future Work 282
REFERENCES 284
APPENDIX A
MAlHEMATICAL MODELS FOR COMPUTER SIMULATION 287
AI. Mean Torque Model 287
A2. CylinderbyCylinder Model. 297
A.3. Watson's Model. 303
A.4. Tuken's model. 308
A5. Models evaluation 312
v
Table 3.1:
Table 4.1:
Table 4.2:
Table 4.3:
Table 4.4:
Table 4.5:
Table 5.1:
Table 5.2:
Table 5.3:
Table 5.4:
Table 5.5:
LIST OF TABLES
InputOutput range for combustion and torque production subsystem 18
Engine transfer functions for different values ofsampling time T and
engm. e ~In.ct.lOn b1 ?~4
Engine response mean square error and percent overshoot for
different size reduction of the real poles for fueling delay of 110 ms
and engine inertia of 2 Ibftsec2 52
Engine response mean square error and percent overshoot for
different size reduction of the real poles for fueling delay of 130 ms
and engine inertia of2 lbftsec2 , 53
Engine response mean square error and percent overshoot for
different size reduction ofthe real poles for fueling delay of 80 ms
and engine inertia of 2 Ibftsec2 54
Engine response mean square error and percent overshoot for
different size reduction of the real poles for fueling delay of 50 ms
and engine inertia of 2 Ibftsec2 55
Controller's results for mean square error based fitness 65
Engine with different inertia. Controller's results for mean square
error based fitness , 70
Engine with different inertia. Controller's results for percent
overshoot based fitness 78
Engines with different fueling delay. Controller's results for mean
square error based fitness 86
Engines with different fueling delay. Controller's results for percent
overshoot based fitness 93
vi
Table 5.6: Engine with different inertia. Controller's results for mean square
error based fitness. Searching step III 00. Error calculation after
t, = 1 second 103
Table 5.7: Engine with different inertia. Controller's results for mean square
error based fitness. Searching step 1/10. Error calculation after
t1 = 1 second 104
Table 5.8: Engine with different inertia. Controller's results for mean square
error based fitness. Searching step 1/10. Error calculation after
t1 = 5 seconds 105
Table 5.9: Engine with different inertia. Controller's results for percent
overshoot basedfitness. Searching step 1/100. Error calculation after
t] = 1 second 110
Table 5.10: Engine with different inertia. Controller's results for percent
overshoot based fitness. Searching step 1110. Error calculation after
t 1 = 1 second 111
Table 5.11: Engine with different inertia. Controller's results for percent
overshoot basedfitness. Searching step 1/10. Error calculation after
t] =5 seconds ]12
Table 5.12: Engines with different fueling delay. Controller's results for mean
square errorbased fitness. Searching step 1/1 00. Error calculation after
t I = 1 second 117
Table 5.13: Engines with different fueling delay. Controller's results for mean
square error basedfitness. Searching step 1/1 O. Errorcalculation after
t1 = 1 second 118
Table 5.14: Engines with different fueling delay. Controller's results for mean
square errorbased fitness. Searching step] /10. Error calculation after
t 1 = 5 seconds 119
Table 5.15: Engines with different fueling delay. Controller's results for percent
overshoot based fitness. Searching step 1/100. Error calculation
after t] = 1 second 124
Table 5.16: Engines with different fueling delay. Controller's results for percent
overshoot based fitness. Searching step J/ 1O. Error calculation after
t l = 1 second 125
vii
Table 5.17: Engines with different fueling delay. Controller's results for percent
overshoot based fitness. Searching step 1/10. Error calculation after
t I = 5 seconds )26
Table 5.18: Engine with different inertia. Controller's results for mean square
error based fitness. Searching step 1/10. Error calculation after
t I = 5 seconds 132
Table 5.19: Engine with different inertia. Controller's results for percent
overshoot basedfitness. Searching step 1/1 O. Error calculation after
t1 = 5 seconds 137
Table 5.20: Engines with different fueling delay. Controller's results for mean
square error based fitness. Searching step 1/10. Error calculation after
t) = 5 seconds 142
Table 5.21: Engines with different fueling delay. Controller's results for percent
overshoot based fitness. Searching step III O. Error calculation after
t) = 5 seconds 147
Table 8.]: Percentage of the original engine response according to the control
technique and fitness function 281
viii
LIST OF FIGURES
Figure 2.1: Fourstroke engine (turbocharged) (8)' 6
Figure 2.2: Fourstroke cycle for diesel engine (8)' 7
Figure 2.3: Air standard cycles: (a) constant pressure cycle; (b) constant volume
cycle; (c) dual combustion or composite cycle (8} 8
Figure 2.4: Phases of combustion process (8)' 9
Figure 2.5: Gas exchange fourstroke engine (8)' 10
Figure 3.1: Simple Engine model. Block diagram in sdomain 11
Figure 3.2: Simple Engine Control System 12
Figure 3.3: Engine Control System with engine friction and limited speed and
fueling 13
Figure 3.4: Basic Engine model with limits in fuel and internal losses 15
Figure 3.5: Engine response for different fueling delay using basic PIO controller..... 16
Figure 3.6: Engine response for different engine inertia using basic PIO controller. ... 16
Figure 3.7: Neural network based engine model. 17
Figure 3.8: Combustion and Torque production subsystem. " 18
Figure 3.9: Air compression subsystem 19
Figure 3.10: Engine friction subsystem 19
Figure 4.1: Block diagram ofa selftuning regulator (I ) 21
Figure 4.2: Engine model. 22
ix
Figure 4.3:
Figure 4.4:
Figure 4.5:
Figure 4.6:
Figure 4.7:
Figure 4.8:
Figure 4.9:
Simplified engine model. 23
Simulink model for selftuning control. 33
2 a::: +az+a
Parameter a2 for 2 I 0 versus fueling delay
_3 b2 4.  .:,
(T = lOOms) 35
2 az +az+a
Parameter a I for 2 3 I ') 0 versus fueling delay
z  bz
(]' = IOOms ). 35
2 az +az+a
Parameter ao for 2 3 I 2 0 versus fueling delay
:: b:::
(T = lOOms) 36
2 a:: +a::+a
Parameter b for 2 I 0 versus fueling delay
::3  bz2
(T = ]OOms ) 36
2 a::: +a:::+a
Parameter a2 for 2 I 0 versus fueling delay
_3 _ b72  
(T = 50ms) 37
2 a:: +az+a
Figure 4.10: Parameter a I for 2 I 0 versus fueling delay
ib::?
(T = 50ms ) 38
2 a:: +az+a
Figure 4. ] 1: Parameter ao for 2 3 1 2 0 versus fueling delay
z  bz
(T = 50ms ) 38
x
2 az +az+a
Figure 4.12: Parameter h for 2 3 1 2 0 versus fueling delay
z bz
(T = 50m ) 39
3 2 az +a~ +az+a
Figure 4.13: Parameter a3 for 3 ~ 3 ' 0 versus fueling delay
z hz
(T = 50ms) 40
3 2 a3z + a2z + a tZ + a Figure 4.14: Parameter a o 2 for 4 3 versus fueling delay
z  b~
(T= 50ms) 40
3 2
a~:: +a2=+a,z+ao Figure 4.15: Parameter a L for ' versus fueling delay
z4 .hz3
(T = 50ms) 41
3 2 a3z +a z +a z+a
Figure 4.16: Parameter ao for ~ 3 I 0 versus fueling delay
z  hz
(T = 50ms) .41
3 2 az +az +az+a
Figure 4.17: Parameter b for 3 ~ 3 I 0 versus fueling delay
z hz
(T= 50ms) 42
Figure 4.18: Engine response for different controllers and loads 49
Figure 4.19: Final pole location for different values in the magnitude reduction of
original real and complex poles 50
Figure 4.20: Engine response for different size reduction of the real poles for
fueling delay of 11 0 ms, engine inertia of2 Ibftsec2, friction
b] = 0.1229. Blue = Original System. Green = 90 %. Red = 80 %.
Cyan = 70 %. Magenta = 60 % 52
Figure 4.21: Engine response for different size reduction of the real poles for
fueling delay of 130 ms, engine inertia of2 Ibftsec2, friction
xi
b] = 0.1229. Blue = Original System. Green = 90 %. Red = 80 %.
Cyan = 70 %. Magenta = 60 % 53
Figure 4.22: Engine response for different size reduction of the real poles for
fueling delay of 80 ms, engine inertia of 2 Ibftsec2, friction
bl = 0.1229. Blue = Original System. Green = 90 %. Red = 85 % 54
Figure 4.23: Engine response for different size reduction of the real poles for
fueling delay of 50 ms, engine inertia of 2 Ibftsec2, friction
b l = 0.1229. Blue = Original System. Green = 90 %. Red = 85 %.
Cyan = 80 % 55
Figure 5.1 : A 3dimensional and a 4dimensional hypercube (24)'" 59
Figure 5.2: Probability density function for bias = 1.9 60
Figure 5.3: Original GEN1TOR algorithm (D. Whitley, et. al.) (25)' 62
Figure 5.4: Original engine response for Kp=2, Ki=0.5 Kd=O.05, a=l 5 63
Figure 5.5: Original fueling response for Kp=2, Ki=0.5, Kd=0.05, a=15 64
Figure 5.6: Optimal engine response for training with population = 5,
epochs =3000, resulting: Kp=1.77 I8, Ki=O.0788, Kd=0.6835,
a=20.8593 66
Figure 5.7: Detail of transition from 600 to 650 rpm. Blue = original pro
parameters, Green = Genitor optimized parameters. Population = 5,
epochs =3000 66
Figure 5.8: Detail oftransition from 650 to 600 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters. Population = 5,
epochs =3000 67
Figure 5.9: Detail oftransition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters. Population = 50,
epochs =30000 67
Figure 5.10: Detail oftransition from 650 to 600 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters. Population = 50,
epochs =30000 68
Figure 5.1 J: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (population = 5),
Red =Genitoroptimized parameters (Population=50). Epochs=1500,
Inertia = 1 Ibftsec2 , , , , 71
Figure 5.12: Learning rate for engine inertia = Ilbftsec2. Blue: Population = 5,
Green: Population = 50 , 71
Figure 5.13: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitoroptimized parameters (Population =50). Epochs =1500,
Inertia = 1.4 Ibftsec2 , 72
Figure 5.14: Learning rate for engine inertia = 1.4 Ibftsec2. Blue: Population = 5,
Green: Population = 50 72
Figure 5.15: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimjzed parameters (Population = 5),
Red=Genitoroptimized parameters (Population = 50). Epochs = 1500,
Inertia = 1.8 Ibftsec2 73
Figure 5.16: Learning rate for engine inertia = 1.81bftsec2. Blue: Population = 5,
Green: Population = 50 73
Figure 5.17: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red = Genitor optimized parameters (Populati on = 50). Epochs =1500,
Inertia = 2.2 Ibftsec2, , , 74
Figure 5.18: Learning rate for engine inertia = 2.21bftsec2. Blue: Population = 5,
Green: Population = 50 74
Figure 5.19: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitor optimized parameters (Populati on = 50). Epochs=1500,
Inertia = 2.6 Ibftsec2 75
Figure 5.20: Learning rate for engine inertia = 2.61bftsec2. Blue: Population = 5,
Green: Population = 50 75
Figure 5.21: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
xiii
Red=Genitoroptimizedparameters(Population=50). Epochs=1500,
Inertia = 3 lbftsec2 76
Figure 5.22: Learning rate for engine inertia = 3 Ibftsec2. Blue: Population = 5,
Green: Population = 50 76
Figure 5.23: Detail transition from 600 to 650 rpm. Blue = Original PlD
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimizedparameters(Population= 50). Epochs=1500,
Inertia = 1 Ibftsec2 79
Figure 5.24: Learning rate for engine inertia = Ilbftsec2 Blue: Population = 5,
Green: Population = 50 79
Figure 5.25: Detail transition from 600 to 650 rpm. Blue = Original PIO
parameters, Green = Genitor optimized parameters (Population = 5),
Red =Genitoroptimized parameters(Population =50). Epochs =1500,
Inertia = 1.4 Ibftsec2 80
Figure 5.26: Learning rate for engine inertia = 1.4Ibftsec2. Blue: Population = 5,
Green: Population = 50 80
Figure 5.27: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red =Genitoroptimized parameters (Population =50). Epochs=l 500,
Inertia = 1.8Ibftsec2 ,.. 81
Figure 5.28: Learning rate for engine inertia = 1.8Ibftsec2. Blue: Population = 5,
Green: Population = 50 , 81
Figure 5.29: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimized parameters(Population= 50). Epochs =1 500,
Inertia = 2.2 Ibftsec2 , 82
Figure 5.30: Learning rate for engine inertia = 2.2lbftsec2. Blue: Population = 5,
Green: Population = 50 82
Figure 5.31: Detail transition from 600 to 650 rpm. Blue = Original PIO
parameters, Green = Genitor optimized parameters (Population = 5),
Red =Genitoroptimizedparameters(Population=50). Epochs=1500,
Inertia = 2.6Ibftsec2 83
xiv
Figure 5.32: Learning rate for engine inertia = 2.6Ibftsec2 Blue: Population = 5,
Green: Population = 50 83
Figure 5.33: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitoroptimized parameters (PopuIation= 50). Epochs=1500,
Inertia = 3 Ibftsec2 84
Figure 5.34: Learning rate for engine inertia = 3 Ibftsec2. Blue: Population = 5,
Green: Population = 50 84
Figure 5.35: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimizedparameters (Population= 50). Epochs =1500,
Delay = 30 msec 87
Figure 5.36: Learning rate for engine delay = 30 msec. Blue: Population = 5,
Green: Population = 50 87
Figure 5.37: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimized parameters (Population= 50). Epochs=1500,
Delay = 50 msec '" 88
Figure 5.38: Learning rate for engine delay = 50 msec. Blue: Population = 5,
Green: Population = 50 88
Figure 5.39: Detail transition from 600 to 650 rpm. Blue = Original PIO
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitoroptimized parameters (Population = 50). Epochs=1500,
Delay = 70 msee 89
Figure 5.40: Learning rate for engine delay = 70 msec. Blue: Population = 5,
Green: Population = 50 89
Figure 5.41: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimizedparameters(Population=50). Epoehs=1500,
Delay = 90 msee 90
Figure 5.42: Learning rate for engine delay = 90 msec. Blue: Population = 5,
Green: Population = 50 90
xv
Figure 5.43: Detail transition from 600 to 650 rpm. Blue = Original Pill
parameters, Green = Genitor optimized parameters (population = 5),
Red= Genitoroptirnizedparameters (Population = 50). Epochs =1500,
Delay = 110 msec 91
Figure 5.44: Learning rate for engine delay = 110 msec. Blue: Population = 5,
Green: Population = 50 91
Figure 5.45: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitor optimized parameters (Population= 50). Epochs =1500,
Delay = 130 msec 92
Figure 5.46: Leaming rate for engine delay = 130 msec. Blue: Population = 5,
Green: Population = 50 92
Figure 5.47: Detail transition from 600 to 650 rpm. Blue = Original Pill
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitor opt~mizedparameters (Population= 50). Epochs = 1500,
Delay = 30 msec 94
Figure 5.48: Learning rate for engine delay = 30 msec. Blue: Population = 5,
Green: Population = 50 94
Figure 5.49: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red =Genitor optimized parameters (Population= 50). Epochs = 1500,
Delay = 50 msec 95
Figure 5.50: Learning rate for engine delay = 50 msec. Blue: Population = 5,
Green: Population = 50 95
Figure 5.51: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitoroptimized parameters (Population = 50). Epochs = 1500,
Delay = 70 msec 96
Figure 5.52: Learning rate for engine delay = 70 msec. Blue: Population = 5,
Green: Population = 50 96
figure 5.53: Detail. transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (population = 5),
Red=Genitoroptimized parameters (Population= 50). Epochs =1500,
Delay = 90 m ec 97
xvi
Figure 5.54: Learning rate for engine delay = 90 msec. Blue: Population = 5,
Green: Population = 50 97
Figure 5.55: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red =Genitoroptimizedparameters (Population= 50). Epochs =1 500,
Delay = 110 msec 98
Figure 5.56: Learning rate for engine delay = 110 msec. Blue: Population = 5,
Green: Population = 50 98
Figure 5.57: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimizedparameters(Population = 50). Epochs =1500,
Delay = 130 msec 99
Figure 5.58: Learning rate for engine delay = 130 msec. Blue: Population = 5,
Green: Population = 50 99
Figure 5.59: Detail transition from 600 to 650 rpm. Epochs = 1500,
Inertia = 1 Ibftsec2 106
Figure 5.60: Detail transition from 600 to 650 rpm. Inertia = 1.4 Ibftsec2 106
Figure 5.61: Detail transition from 600 to 650 rpm. Inertia = 1.81bftsec2 107
Figure 5.62: Detail transition from 600 to 650 rpm. Inertia = 2.2 lbftsec2 107
Figure 5.63: Detail transition from 600 to 650 rpm. Inertia = 2.6 lbftsec2 108
Figure 5.64: Detail transition from 600 to 650 rpm. Inertia = 3 Ibftsec2 108
Figure 5.65: Detail transition from 600 to 650 rpm. Inertia = 1 Ibftsec2 113
Figure 5.66: Detail transition from 600 to 650 rpm. Inertia = 1.4 Ibftsec2 113
Figure 5.67: Detail transition from 600 to 650 rpm. Inertia = 1.8 Ibftsec2 114
Figure 5.68: Detail transition from 600 to 650 rpm. Inertia = 2.2 lbftsec2 114
Figure 5.69: Detail transition from 600 to 650 rpm. Inertia = 2.6 lbftsec2 115
xvii
Figure 5.70: Detail transition from 600 to 650 rpm. fnertia = 3 Ibftsec2 115
Figure 5.71: Detail transition from 600 to 650 rpm. Fueling delay = 30 msec 120
Figure 5.72: Detail transition from 600 to 650 rpm. Fueling delay = 50 msec 120
Figure 5.73: Detail transition from 600 to 650 rpm. Fueling delay = 70 msec 121
Figure 5.74: Detail transition from 600 to 650 rpm. Fueling delay = 90 msec 12 t
Figure 5.75: Detail transition from 600 to 650 rpm. Fueling delay = 110 msec 122
Figure 5.76: Detail transition from 600 to 650 rpm. Fueling delay = 130 msec 122
Figure 5.77: Detail transition from 600 to 650 rpm. Fueling delay = 30 msec. . 127
Figure 5.78: Detail transition from 600 to 650 rpm. Fueling delay = 50 msec 127
Figure 5.79: Detail transition from 600 to 650 rpm. Fueling delay = 70 rnsec 128
Figure 5.80: Detai] transition from 600 to 650 rpm. Fueling delay = 90 msec 128
Figure 5.8J: Detail transition from 600 to 650 rpm. Fueling delay = ] 10 msec 129
Figure 5.82: Detail transition from 600 to 650 rpm. Fueling delay = 130 msec. .. ...... 129
Figure 5.83: Detail transition from 600 to 650 rpm. Blue = Original PIO
parameters, Green = Genitor optimized parameters (Population = 5),
Red = Genitoroptimized parameters(Population= 50). Epochs=1500,
Inertia = 1 Ibftsec2 133
Figure 5.84: Detail transition from 600 to 650 rpm. Blue = Original PIO
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimizedparameters(Population=50). Epochs= I500,
Inertia = 1.4 Ibftsec2 133
Figure 5.85: Detail transition from 600 to 650 rpm. Blue = Original PJD
parameters, Green = Genitor optimized parameters (Population = 5),
Red =Genitoroptimized parameters (Population = 50). Epochs=1500,
Inertia = 1.8Ibftsec2 134
Figure 5.86: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
xviii
Red.=Genitoroptimized parameters (Population= 50). Epochs=1500,
Inertia = 2.2 Ibftsec2 134
Figure 5.87: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized. parameters (Population = 5),
Red =Genitor optimizedparameters (Population = 50). Epochs = 1500,
Inertia = 2.6 Ibftsec2 135
Figure 5.88: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red =Genitor optimizedparameters (Population = 50). Epochs =1500,
Inertia = 3 Ibftsec2 135
Figure 5.89: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red = Genitor optimized parameters (Population= 50). Epochs =1500,
Inertia = 1 Ibftsec2 138
Figure 5.90: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitoroptimized parameters (Population=50). Epochs=1500,
Inertia = 1.4 Ibftsec2 138
Figure 5.9]: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitoroptimized parameters (Population =50). Epochs =1500,
Inertia = 1.8 Ibftsec2 139
Figure 5.92: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimized parametersCPopulation=50). Epochs= I500,
Inertia = 2.2 Ibftsec2 139
Figure 5.93: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red = Genitoroptimized parameters (population= 50). Epochs =) 500,
Inertia = 2.6 Ibftsec2 140
Figure 5.94: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitoroptimized parameters(Population = 50). Epochs= 1500,
Inertia = 3 Ibftsec2 140
xix
Figure 5.95: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimized parameters(Population= 50). Epochs =1500,
Delay = 30 msec 143
Figure 5.96: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimizedparameters(Population=50). Epochs= I500,
Delay = 50 msec 143
Figure 5.97: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptirnizedpararneters(Population=50). Epochs=1500,
Delay = 70 msec 144
Figure 5.98: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitor optimizedparameters (Population= 50). Epochs =1500,
Delay = 90 msec 144
Figure 5.99: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimized parameters(Population =50). Epochs =1 500,
Delay = 110 msec. 145
Figure 5.100: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimized parameters (Population = 50). Epochs = 1500,
Delay = 130 msec 145
Figure 5.101: Detail transition from 600 to 650 rpm. Blue = Original pro
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitor optimized parameters (Population =50). Epochs =1500,
Delay = 30 msec 14K
Figure 5.102: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimized parameters(Population= 50). Epochs=1500,
Delay = 50 msec 148
Figure 5.103: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptimizedparameters(Population=50). Epochs=1500,
Delay = 70 msec 149
xx
Figure 5.104: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (population = 5),
Red =Genitoroptimizedparameters(Population=50). Epochs=1500,
Delay = 90 msec " 149
Figure 5.105: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red=Genitoroptim ized parameters(Population= 50). Epochs=1500,
Delay = 110 msec , 150
Figure 5.106: Detail transition from 600 to 650 rpm. Blue = Original PID
parameters, Green = Genitor optimized parameters (Population = 5),
Red= Genitor optimized parameters (Population= 50). Epochs = 1500,
Delay = 130 msec 150
Figure 6.1: The reinforcement learning framework (12)' 154
Figure 6.2: Elements of Reinforcement Learning. Game sequence (12)' 158
Figure 6.3: Gridworld example. Original movement rules (12)' 162
Figure 6.4: Gridworld example. Costtogo values from original policy 163
Figure 6.5: Solving the Gridworld (12) 166
Figure 6.6: Iterative Policy Evaluation (12) 167
Figure 6.7: Policy iteration (12)' 169
Figure 6.8: Value Iteration (12)' 170
Figure 6.9: Generalized Policy Iteration (12)' 171
Figure 6.10: Algorithm for firstvisit Monte Carlo method for estimating.l (12)....... 172
Figure 6.11: Algorithm for Monte Carlo with exploring starts (12)' 173
Figure 6.12: TD(O) algorithm for estimating.ft (12) 175
Figure 6.13: Random Walk (12) · 175
xxi
Figure 6.14: Costtogo values learned by TD(O) 176
Figure 6.15: Costtogo values learned by MC. 177
Figure 6.16: Learning curves for TO(O) 177
Figure 6.17: Learning curves for MC 178
Figure 6.18: nstep TD prediction ([2)' 178
Figure 6.19: Sarsa Algorithm (12)' ] 80
Figure 6.20: Gridworld. Basic operation and real optimal path (12)' 18 1
Figure 6.21: Gridworld. Calculated optimal path 182
Figure 6.22: Number of completed episodes versus number of steps 183
Figure 6.23: QIeaming: Offpolicy TD control (12)' ]84
Figure 6.24: Cliff walking (12)' 185
Figure 6.25: Actorcritic method (12)' 186
Figure 6.26: Weighting given in the Arreturn to each of the nstep return (12) 188
Figure 6.27: Graphical representation of backward view of eligibility traces (12)' 189
Figure 6.28: Onhne Tabular TD(A) for estimating JTt (12)' 190
Figure 6.29: General neural network train ing 191
Figure 6.30: Neural network training for a Reinforcement Learning problem. 192
Figure 6.31: Twodimensional Tile Coding (12) 195
Figure 6.32: Twodimensional Tile Coding wi.th two tiles (green and blue) (12)'" 195
Figure 6.33: General1inear update algorithm ]97
Figure 6.34: Linear, gradientdescent Sarsa (A) (12)' 198
.reii
Figure 6.35: Linear, gradientdescent Q(A) (12)' 199
Figure 6.36: Mountain Car task (12)' 200
Figure 6.37: Approximate Action per state after 104 episodes (+(ftt), *(fir),
O(zt) 201
Figure 6.38: Approximate Action per state after 9000 episodes (+(ftf), *(fir),
O(zt)) , , , , 202
Figure 6.39: Costtogo function ( maxa Qt(s,a) ) learned during one run 203
Figure 6.40: Diagram ofan Acrobot (4)' 204
Figure 6.4l: Simple Acrobot notation (4, Il)' 205
Figure 6.42: Swingup the Acrobot (4) 206
Figure 6.43: Acrobot. Steps per Trial. Initial random weights. No exploring
actions. Minimum steps 172 (one time) 208
Figure 6.44: Acrobot. Steps per Trial. Initial random weights. Exploring actions
after 2 consecutive episodes with the same steps. Minimum 148
steps (1 time) , 209
Figure 6.45: Acrobot. Steps per Trial. Initial random weights. Exploring actions
atler two consecutive episodes with same steps, decreased by A.
Minimum steps 155 (one time) 210
Figure 6.46: Acrobot. Steps per Trial. Initial zero weights. No exploring actions.
Minimum steps 175 (one time) 211
Figure 6.47: Acrobot. Steps per Trial. Initial zero weights. Exploring actions after
two consecutive episodes with same steps, decreased by A..
Minimum steps 147 (3 7 times) 211
Figure 6.48: Acrobot movement for 172 steps (green = first link, red = second
link, b1 ue = trajectory of the Acrobot end) 212
Figure 6.49: Acrobot movement for 147 steps (green = fITst I1nk, red = second
link:, blue = trajectory of the Acrobot end) 213
Figure 7.1: Learning algorithm for speed transition basic engine model. 219
xxiii
Figure 7.2: Engine speed response for the basic model. Blue = Suboptimal
solution from the reinforcement learning algorithm. Green = Optimal
sol ution 219
Figure 7.3: Engine fueling for the basic engine mode.!. Blue = Suboptimal solution
from the reinforcement learning algorithm. Green = Optimal solution....220
Figure 7.4: Costtogo function for the speed transition experiment basic engine
model. 220
Figure 7.5: External load apphed to the Neural Network model engine 221
Figure 7.6: Learning algorithm for speed transition neural network model. 222
Figure 7.7: Engine speed for the 42 steps episode 223
Figure 7.8: Air mass (green) and fuel mass (x 100) (blue) for the 42 steps
episode 223
Figure 7.9: Costtogo function for speed transition experiment neural network
model 224
Figure 7.10: Learning algorithm for speed transition with lower border penalty
experiment using basic engine mode1.(*) = episodes where
650 ±1 rpm was not reached 227
Figure 7.11: Engine speed response for the basic model. 227
Figure 7.12: Engine fueling for the basic engine model. Blue = Suboptimal solution
from the reinforcement learning aI'gorithm. Green = Optimal solution....228
Figure 7.13: Costtogo function for speed transition with lower border penalty
experiment using basic engine model 228
Figure 7.14: Learning curve for speed transition with lower border penalty
experiment using neural network model. (*) = episodes where
1500 ±1 rpm was not reached 230
Figure 7.1.5: Engine speed for the 48 steps episode 230
Figure 7.16: Air mass (green) and fuel mass (x 100) (blue) for the 48 steps
episode 231
xxiv
Figure 7.17: Costtogo function for speed transition with lower border penalty
experiment with neural network model. 23 1
Figure 7.18: Learning algorithm for speed transition with lower and upper border
penalty experiment using basic engine model.(*) = episodes where
650 ±I rpm was not reached 234
Figure 7.19: Engine speed response for the basic model. 234
Figure 7.20: Engine fueling for the basic engine model. Blue = Suboptimal solution
from the reinforcement learning algorithm. Green = Optimal solution .... 235
Figure 7.21: Costtogo function for speed transition with lower and upper border
penalty experiment using basic engine model. 235
Figure 7.22: Learning curve for speed transition with lower and upper border
penalty experiment using neural network model. (*) = episodes
where 1500 ±1 rpm was not reached 237
Figure 7.23: Engine speed for the 51 steps episode 237
Figure 7.24: Air mass (green) and fuel mass (x 100)(blue) for the 51 steps
episode 238
Figure 7.25: Costtogo function for speed transition with lower and upper border
penalty experiment using the neural network model. 238
Figure 7.26: Learning algorithm for speed transition with random initial speed and
low fueling using the basic engine model.(.) = episodes where
650 ±1 rpm was not reached 241
Figure 7.27: Engine speed response for the basic model 241
Figure 7.28: Engine fueling for the basic engine model. Blue = Suboptimal solution
from the reinforcement learning algorithm. Green = Optimal solution .... 242
Figure 7.29: Costtogo function for speed transition with random initial speed
and low fueling using the basic engine model. 242
Figure 7.30: Learning algorithm for speed transition with random initial speed and
low fueling using the neural network model. (*) : episodes where
1500 ±1 rpm was not reached 244
Figure 7.31: Engine speed for the 67 steps episode starting with 800 rpm 245
Figure 7.32: Air mass (green) and fuel mass (x 100) (blue) for the 67 steps
episode starting with 800 rpm 245
Figure 7.33: Costtogo function for speed transition with random initial speed
and low fueling using the neural network model. 246
Figure 7.34: Learning algorithm for speed transition with random initial speed and
random initial fueling using the basic engine model.(.) = episodes
where 650 ±1 rpm was not reached 249
Figure 7.35: Engine speed response for the basic model. Blue: Initial fueling
0.0558 mm3/slroke. Green: Initial fueling 20 mm3/stroke 250
Figure 7.36: Engine fueling for the basic engine model. Blue: Initial fueling
0.0558 mm3
/ stroke. Green: Initial fueling 20 mm3/stroke 250
Figure 7.37: Costtogo function for speed transition with random initial speed
and random initial fueling using the basic engine model. 251
Figure 7.38: Learning curve for speed transition with random initial speed and
random initial fueling using the neural network model. (.) = episodes
where 1500 ±1 rpm was not reached 253
Figure 7.39: Engine speed for the 78 steps episode starting with 800 rpm 253
Figure 7.40: Air mass (green) and fuel mass (x 100) (bl ue) for the 78 steps
episode starting at 800 rpm 254
Figure 7.41: Costtogo function for speed transition with random initial speed
and random initial fueling using the neural network model. 254
Figure 7.42: Engine speed for continual operation 255
Figure 7.43: Load torque for continual operation 256
Figure 7.44: Engine speed for continual operation and zero load torque 257
Figure 7.45: Detail ofthe engine speed for continual operation and zero load
torque 257
xxvi
Figure 7.46: Detail ofthe engine speed for continual operation and zero load
torque 258
Figure 7.47: Tile description for engine speed. Case 1500 rpm 260
Figure 7.48: Tile description for engine acceleration 260
Figure 7.49: Engine speed for the 2500 steps episode starting with 576 rpm 262
Figure 7.50: Learning curve for tracking a reference speed with error based
reward. Number of steps per episode 263
Figure 7.51: Engine speed for the 14976 steps episode starting with 800 rpm 263
Figure 7.52: Learning curve for tracking a reference speed. Number of steps per
episode plus steps where the engine is inside 650 ±20 rpm 265
Figure 7.53: Engine speed for the 2500 steps episode starting with 576 rpm 266
Figure 7.54: Learning curve for tracking a reference speed with log sigmoid state
scaling for the engine speed. Number of steps per episode plus
number of steps where the engine is inside 650 ±20 rpm 267
Figure 7.55: Engine speed for the 250] steps episode starting with 560 rpm. 267
Figure 7.56: Learning curve for experiment 10. (.) = episodes where
1500 ±20 rpm was not reached 269
Figure 7.57: Engine speed for the 1715 steps episode starting with 800 rpm 269
Figure 7.58: Learning curve for experiment] 0 with log sigmoid state scaling.
(.) = episodes where 1500 ±20 rpm was not reached 270
Figure 7.59: Engine speed for the ]4976 steps episode starting with 800 rpm. 271
Figure 7.60: Engine speed for the 2500 steps episode starting with 576 rpm and
three CMAC neural networks 274
Figure 7.6 J: Engine speed for the 2500 steps episode starting with 576 rpm and
two CMAC neural networks 275
Figure 7.62: Learning algorithm for tracking a reference engine speed with multiple
neural networks 276
XXVII
Figure 7.63: Engine speed for the 14976 steps episode starting with 800 rpm 277
Figure A.]: Schematic diagram of a turbocharged diesel engine (7)' 288
Figure A.2: Block diagram interaction between submodels for turbocharged
diesel engine model (7} 289
Figure A.3: Engine temperature rise factor K (2)' 295
Figure A.4: Schematic of turbocharged engine (21, 22)' 305
Figure A. 5: Block diagram for a turbocharged diesel engine system (21)' 308
Figure A.6: Experimental apparatus for Tuken et. al. (18) experiment. 311
Figure A.7: Block diagram representation of throttletorque system (I8) 311
xxviii
a
BDC
b l
CA
CI
CMAC
CR
y
y
d
D
Dr
LIST OF SYMBOLS AND ACRONYMS
Pole location for the derivative action ofthe PID controller.
Action at time t .
Bottom dead centre.
Viscous friction for basic engine model.
Crank angle.
Compressionignition engine.
Cerebellar model articulation controller.
Specific heat for air.
Exhaust specific heat.
Specific heat ofthe exhaust manifold wall.
Crank radius.
Specific heat ratio.
Reward discount factor.
Diameter of cylinder.
Exhaust manifold diameter.
Direct injection.
xxix
DP Dynamic programming methods.
dQr Heat transfer tenn which defines the heat transfer from cylinder gas
to wall and vice versa.
E[ 1 Expected value.
~e/ Eligibility traces vector.
EVe Exhaust valve closes.
EVa Exhaust valve opens.
E Effectiveness of the intercooler.
f Engine fueling rate (mm3 /stroke).
F Equivalence ratio.
fix Rounds toward zero.
lmep Friction mean effective pressure.
is Stoichiometric'l fuel air ratio
(F/A )ocwol Actual fuel air ratio.
opr Generalized policy iteration.
h Specific enthalpy.
hoi Specific stagnation enthalpy
hi Heat transfer coefficient (kw/oK' m2
)
1D Ignition delay.
1. Stoichiometric: "pertaining to or involving substances that are in the exact proportions required
for a given reaction" (19)
xxx
rOI
IVC
IVa
J
J(s)
.1(8)
m
ma
MC
Indirect injection.
Intake valve closes.
Intake valve opens.
Engine Inertia (thllsed).
Costlago function starting at state Sf.
Varying inertia of crankshaft.
Derivative gain if PIO controller.
1ntegral gain of PID controUer.
Proportional gain ofPID controller.
Lower error gain.
Lower error load.
mass flow rate.
Air mass.
Compressor mass flow.
Monte Carlo methods.
Corrected mass flow rate.
Mass in the exhaust manifold.
Exhaust mass flow rate.
Engine fueling rate.
Burned fuel mass changing rate.
mfburn Fuel mass burning rate.
mim Mass in the intake manifold.
mwall Mass of the exhaust manifold wall.
m 3 Air mass flow after the intercooler.
m 4 Intake manifold mass flow.
m 5 Engine outlet mass flow.
N Actual engine speed (rpm).
Neorr Corrected turbocharger speed.
Nref Reference Engine Speed.
N te Turbocharger rotor speed (rpm).
11 t Turbine efficiency.
llv Volumetric efficiency.
llind Indicated efficiency (percent).
Jl Gas viscosity.
P Pressure.
Peyl Cylinder pressure.
Pd Down stream pressure.
Pstd Standard pressure.
xxxii
7t
7t(s, a)
QcooI
R
s
Probability that the system changes from the state St to the state
s, + I if the action a, is executed.
Upstream pressure.
Ambient pressure.
Intake manifold pressure.
Exhaust manifold pressure.
3.14159265358979.
Agent's policy as function of the possible states s and actions a.
Air density after the intercooler.
Reward or cost at time t .
Qfactor function or cost function starting at the state stand
executing the action at'
Heat released by combustion.
Heat rejected to charge air cooler.
Heat rejected from exhaust manifold.
Heat transfer rate to cylinder walls.
Fuel lower heating value (kJ/ kg ).
Gas constant (kJ/ kg . 0 K).
Laplace transfonn variable.
System state at time t .
xxxiii
SARSA
SI
Sp
T
TO
TOC
Tt
On policy TO method based on St' at, r" St+ 1 and a t + l'
Sparkignition engine.
Mean piston speed.
Sampling time.
Compressor Torque.
Temperature at the cylinder.
Temporal difference methods.
Top dead centre.
Engine Torque.
Friction Torque.
Indicated Torque.
Intake manifold temperature.
External load torque.
Standard temperature.
Turbine torque.
Up stream temperature.
Coolant temperature.
Wall temperature.
Ambient temperature.
xxxiv
T2 Compressor temperature.
T4 Intake manifold temperature.
Ts Engine outlet temperature.
T6 Exhaust manifold temperature.
't Fueling delay in milliseconds.
u Specific internal energy.
Veg Upper error gain.
Vel Upper error load.
V Volume.
Vd Displacement volume.
Vern Exhaust manifold volume.
Vim Intake manifold volume.
We Compressor work.
WpiS Piston work.
WI Turbine work.
(J) Ie Turbocharger speed.
z Ztransform variable.
CHAPTERl
INTRODUCTION
The principal objective ofthis research has been to investigate the potential ofusing
reinforcement learning techniques for the speed control ofdiesel engines. The first learning
technique is called genetic reinforcement learning. We will apply this technique to the
optimization of PID controller parameters. We will begin the training with baseline
controller parameters for a general engine configuration. However, each specific engine
will have different characteristics. Therefore the controller may not be suitable for all
engines. With the genetic reinforcement learning we will optimize the controller
parameters based on specific engine behavior. We will work with analog and digital
controllers and different engine configurations.
The second learning algorithm we will investigate is called reinforcement learning.
Reinforcement learning is an approximate fonn of dynamic programming, in which a
neural network controller is trained to optimize a specific performance function. At each
iteration the algorithm receives a certain reward or penalty, and attempts to control the
system so as to maximize future rewards or minimize penalties.
Let us now outline the flow of this thesis. Chapter 2 has a description of the basic
diesel engine operation. Classifications are based on the injection type and how the gas
exchange process is performed. Chapter 3 describes the implementation ofa diesel engine
1
model to be used in testing the various reinforcement learning algorithms. We will consider
a pseudolinear system and a system based on neural networks.
Chapter 4 has a discussion of a linear adaptive control technique for the diesel
engine. We will use the selftuning regulator technique. The results of this chapter will be
used as baseline to compare the results for the reinforcement learning algorithms.
Chapter 5 has a description ofthe GENITOR algorithm. This genetic reinforcement
learning algorithm will be used to optimize the parameters of PIO controllers for different
engine configurations.
Chapter 6 is a discussion of the general reinforcement learning framework.
Reinforcement learning is an approximate form ofdynamic programming. The objective is
to determine a control action which optimizes future perfonnance. Reinforcement learning
is an excellent strategy for the intelligent control of systems which are difficult to model
but easy to simulate.
Reinforcement learning involves a twostage process. First, a model must be
developed to predict future performance. Next, an appropriate action must be determined
to optimize the performance. In Chapter 6 the basic framework for reinforcement learning
is presented, and several variations ofreinforcement learning are described. Simulations
are used to illustrate the operation of the various algorithms.
Chapter 7 has different simulations based on the reinforcement learning algorithm.
The first cases demonstrate how the algorithm will learn to change the engine speed from
an initial condition to a desired speed. After that we will present cases which illustrate
engine speed tracking. We will consider reward schemes based on penalty per step on the
2
episode and instant absolute error. Other experiments will be related to the use of multiple
neural networks.
Chapter 8 will contain a summary of the main results and contri butions of this
thesis. This will be followed by recommendations for future work.
Appendix A describes different diesel engine mathematical models from different
researchers for use in simulation.
3
CHAPTER 2
DIESEL ENGINE OPERATION.
2.1. Historical review.
The history of diesel engines started in the last years of the 19th century with the
work of Dr. Rudolf Diesel. Until WWI "the diesel engine was used primarily in stationary
and ship propulsion applications in the form of relatively low speed fourstroke normally
aspirated engines"(8)' WWI spurred the use of diesel engines in transportation and WWII
increased the development ofhighly supercharged diesel engines. From that time, we have
seen a continuous process of improvement in the design of diesel engines and the
application of electronic modules and computer algorithms in their design.
2.2. Classification of the diesel engines
The first classification principle is the compressionignition principle. In contrast to
sparkignition (SI) engines, "the compressionignition (el) engine operates with a
heterogeneous charge of previously compressed air and a finely divided spray of liquid
fuel"(8)' That mix is injected into the cylinder engine, mixed with the air inside the cylinder
and compressed until combustion by the self ignition properties of the fuel. According to
the combustion process we have the following categories:
• a. Direct Injection (DI) systems. When the fuel is injected directly inside the
cylinder.
• b. Indirect Injection (101) systems. The fuel is injected in a prechamber and is
transferred at high speed to the cylinder through a narrow passage. With this
arrangement a high degree of air motion is obtained. This implies a faster air fuel
mixing.
A second division is based in the way in which the gas exchange process is
performed. We have two periods called closed and open periods, where the combustion or
power generation occurs and the exhaust gases are expelled from the combustion chamber
respectively. This division is similar to that applied to spark ignition engines. We can divide
the engines as:
• a. Twostroke engines. The combustion occurs in the region of top dead centre
(TDC) and the gas exchange is made in the region of bottom dead centre (BOC) of
each revolution. The scavenging or gas exchange process at the BOC takes from
1000 to 1500 of the crank angle (CA) period of 3600
• We can subdivide twostroke
engines into: loop scavenged engines, uniflow scavenge single piston
engines and uniflow scavenge opposed piston engines.
We can summarize the twostroke cycle as:
12 compression
23 heat release associated
with combustion
34 expansion
45 blowdown
56 scavenging
61 supercharge
}
5
Closed Period
Open Period
• b. Fourstroke engines. For this type of engines the combustion and the gas
exchange occur in alternate revolutions. As seen in Figure 2.1 the combustion
occurs in TDC region with all the valves closed. After that, the exhaust valve
opens (EYO) just before the BCD region, then the inlet valve opens (IYO) just
before the TDC region. Just after the TDC region the exhaust valve closes (EYC)
and the inlet valve closes (lYC) just after the BDC region. Here the engine starts
the closed period where the combustion occurs, continuing with the next cycle. For
this type of engine the crank angle (CA) period is 7200
•
TDC
BDC
Figure 2.1: Fourstroke engine (/urbochar~ed)(8)
6
,
We can summarize the fourstroke cycle as shown in Figure 2.2.
12 compreSSIOn } 23 heat release associated
with combustion
34 expansion
45 blowdown
56 exhaust
67 overlap
78 induction
81 precompressIOn
Closed Period
Open Period
Figure 2.2: Fourstroke cycle for diesel engine (8)
We can study the engine cycle based on air standard cycles, as shown in Figure 2.3.
The first case is the constant pressure or diesel cycle (Figure 2.3a), where the combustion
process is modeled by a constant pressure heat addition (points 23). This was the
description for "classical" diesel engines, with little relevance today. The second case is the
constant volume or Otto cycle (Figure 2.3b), where the combustion process is modeled by
a constant volume heat addition (points 23). This cycle is normally used for spark ignition
engines, but is valid for diesel engines with light load conditions. The third case is the dual
combustion or composite cycle (Figure 2.3c), where the combustion process is a
combination of the previous cases. This cycle is closer to the actual operation of diesel
engines. Other important theoretical cycles are the Atkinson cycle and the Carnot cycle.
7
p 2
:v
(a)
p
(b)
4
v
P 3
2
(c)
v
Figure 2.3: Air standard cycles: (a) constant pressure cycle; (b) constant volume cycle;
(c) dual combustion or composite cycle (8)'
The real processes of a diesel engine are different from the ideal cycles from the
previous page. The combustion process occurs in the closed period, that is similar for twostroke
and fourstroke engines. We can say that the combustion process has three periods,
as shown in Figure 2.4:
 (i) The delay period.
 (ii) The premixed burning phase (chemically controlled).
 (iii) The diffusion burning phase (controlled by mixing rate).
8
1 1
"0 s::
Q.) s: tl ti Q.) '' ;:I s:: .0
d) E0
rE! u
'
Q.) 0
:> Q.)
~ ".i<..i
;:I
E
::l
U
/ Delay
Start of
;"jection )
Ignition
Cumulative fuel injected
Crank angle 
Tail of combustion ___
\\:: Diffusioo combusHoo
Premixed combustion
Figure 2.4: Phases ofcombustion process (8)'
For the open period we have a gas exchange process as shown in Figure 2.5 for the
case of fourstroke engine. The numbers at each step are related with the fourstroke cycle
shown in Figure 2.2 and Figure 2.1.
9
p
4
5 6 7 8
EYO BDC1 Iya TDC2 EYC
l"Angle .1
overlap
BDC2 IYC
Crank angle ___
Figure 2.5: Gas exchange fourstroke engine (8)
To analyze the engine cycles in detail we can use a step by step basis, using small
crank angle increments de (usually 0.5 0 < de < 20 CA). The step could change according
to the phase in the cycle. Another important term in the calculations is the heat transfer term
dQL which defines the heat transfer from cylinder gas to wall and vice versa.
In the Appendix A the reader can see several different detailed mathematical
models for the diesel engine. Each model is intended to define mathematically the
combustion process inside the diesel engine. There models could be applied to the design
and control of diesel en,gines.
10
CHAPTER 3
ENGINE MODEL.
Initially we tried to implement the Kao and Moskwa mean torque model (7) as
shown in Figure A.2. That model requires specific parameters ofthe engine that were
unavailable at the time of this project. Initial trials were conducted using some typical
parameters found in different papers and books, but the model normally fails in its
operation.
Cummins suggested a simplified model of the diesel engine that is shown in Figure
3.1. This model has a fueling delay defined by ets where s is the Laplace transform and
't is the time delay.
Engine , , Engine
Fueling . Torque + I 60 ~ tS ~ ..  e  Iil~ 2.1t%s
L ' +j~ _
External
Load
Engine
Speed..
Figure 3.1: Simple Engine model. Block diagram in sdomain.
The model shown in Figure 3.1 has an initial PID controller proposed by Cummins
for the nominal values of fueling delay 't = 80 ms and engine inertia / = 2 Ib!tsec
2
.
The basic engine and the controller are shown in Figure 3.2. This model does not include
/l
any limitation in fueling and engine speed. Also, it does not include any friction. If the
engine is working at a given speed without load, and we set the fueling to zero, the engine
will continue at the same speed for unlimited time.
renee Engm. e
d (Nref) + error Ki Kd · 5 Fueling (f)
~ K ++ I ... P 5 (5+a) I
 J~ I I
Engine I.... Controller .1
Speed (N) I I
I.... I
I
Engine .,
I
Engine
I
I + Torque (Te) 1
I 60  ..... ts  I
2·1t·/·s   e  j +
External
Load (T1oad)
Refe
Spee
Figure 3.2: Simple Engine Control System.
For this basic system. we have the following parameters are:
s = Laplace Transform variable. 1t =3.14159265358979.
Nref = Reference Engine Speed.
N = Actual Engine Speed in rpm.
error = (Nref  N) = Speed error in rpm.
Te = Engine Torque in Ibft.
T/ oad =External load torque in Ibft.
1 =Engine Inertia = 2 Ibftsec2
.
f = fueling mm3 /stroke.
't = 80 msec delay.
Kd = 0.05.
a=10or20.
12
To obtain a more realistic engine model, we included a simple gain block that
multiplies the engine speed by bJ substracting the resulting value to the total load applied
to the engine, as shown in Figure 3.3. This block represents viscous friction. We also
included two saturation blocks to avoid negative or excessive engine speeds or fueling. The
first block limits the fueling applied to the engine. That fueling was limited between 0 and
2240 mm3
/ stroke. The second block limits the engine speed. The engine speed was
limited between 0 and a top speed of 2000 rpm. For our experiments we tested with two
friction values intended for low friction (b) = 0.01 ) and high friction (b I = 0.1229).
1)
rence Engm. e
d (Nref) + error K Ki Kd·s / ..  Fueling (  ++  p s (s+a) 
ngine ~
peed (N)
. '0 Friction / Torque 1/
A _,. Engine
60 + Torque (Te)
   ets 2·n·l·s
A~ +
. External Load (T1oad)
E
S
Refe
Spee
Figure 3.3: Engine Control System with engine friction and limited speed andfueling.
The basic PID controller has the parameter values Kp = 2, Ki = 0.5, Kd = 0.05
and two possible values for a = 10 and a = 20. We simulated the diesel engine with the
basic PID controller for different conditions of inertia and fueling delay. The friction and
13
the external load were constant and equal to bI = 0.01 and T/oad = 150 I b.ti
respectively.
If we unify the controller transfer function we obtain:
G(s) = (Kp + Kd)i + (Kp . a + Ki)s + Ki· a
2 s + a· s
(3.1)
are:
Since we have two values for a, the initial transfer functions for the pro controller
For a = 10 ~ 0(5) = 2.05s2 + 20.5s + 5
52 + lOs
For a = 20 ~ G(s) = 2.05s2 + 40.5s + 10
52 + 205
(3.1)
(3.1)
For the simulations we have assumed that the extemalload could change from 0 to
600 ftIb. The maximum fueling rate was defined as 150mm3/stroke. The reference engine
speed will change between 600 rpm and 650 rpm. We found that limitations in fueling were
found for a load of 150 Ibft. Ifwe increase the load we need more fueling. Ifwe use zero
load, we found that negative speed or negative fueling will be needed, making this model
unrealizable. Also, simulations with zero load and zero fuel will run forever for a fixed
speed. A simulink representation ofthe model is shown in Figure 3.4.
Transport
Delay
Sum1
Trans'.r Fcn
(w ~h initial outputs)
I I
To Workspace2
[J. :
DOl Product Inlegralor
'.0471s:2:.'6.278s.82856'
Il...f::~·I....~+r,~.....I fuel I
.2. '5.8625 To Workapace4
Trans'er Fcn
(w~h inilial outpuls)2
torque
To Workspaca3
Pulse
Generelor
~s:m3
SC
onSlant
Figure 3.4: Basic Engine model with limits in fuel and internal losses.
Figure 3.5 shows the speed transition from 600 rpm to 650 rpm for different
values of fueling delay and fixed engine inertia I = 2Ibftsec2
. We noticed how the
percent overshoot and the mean square error increases as the fueling delay increases. Larger
fueling delays implies that each action due to the controller takes more time to influence
the engine response. For that reason an oscillatory response is observed. Figure 3.6 shows
the same speed transition from 600 rpm to 650 rpm for different values engine inertia and
fixed fueling delay 't = 80 ms. We noticed how the oscillatory response increases as the
inertia reduces. The simulation results show how variations in the parameters affected the
final system response. The subsequent chapters will discuss different altematives to
optimize the controller or to generate a controller to reduce the overshoot or the mean
square error for the engine response.
15
700
690
680
670
660
"0 650
0&
<n 640
630
620
610
600
1 1.5 2 2.5 3 3. .. 4.5
time
Figure 3.5: Engine re ponse for different fueling delay using ba ic PID controller.
~inertia = 1 /bftsec?
710
700
690
680
670
660
"E 650 8 en
640
630
620
610
600
1 1.5 2 2.5
time
4 4.5
Figure 3.6: Engine response for different engine in rtia using basic PID con/roll r.
16
In addition to the engine model shown in Figure 3.2, we also modeled the engine
with neural networks as shown in Figure 3.7 (subsystems are shown from Figure 3.8 to
Figure 3.10). Due to all the interactions ofthe subsystems, we preferred the simulink
representation of the previously referenced figures. For the training process we used data
obtained from an engine simulation from a Cummins diesel engine, With that data we
trained two neural networks: one for combustion and torque subsystem and the other for air
mass generation subsystem, Those neural networks were based on the engine model shown
in Figure A.2 in the appendix A. The combustion and torque production subsystem has as
inputs the engine speed N, the engine fueling in! and the fuelair ratio, which is based on
the mass flow in4 and the engine fueling mf' The same block produces the indicated torque
Ti • The air compression subsystem depends of the engine speed N and the engine fueling
inf to generate the air mass flow in4 '
oTo WOrl<,plcI2
mdl~ .... NI+
Air compre'llon
I'" ""''''00' f+~1!OI(2'p"lo)
From S.hJl'lhon'
~ md.d Worhplc.1 TkNld1
I mdfi ~F mIll " ·U:Uf[JD~ F +  F
md'i ,N +1 ;um2 P'oduct Inll;'"lO' Sa~n
CombustIOn and
Sum1
T"",ue producli<>
"'To [3
N Tt .....~ To WotI<IP"C"
To WorhplclJ
1rictlon
Figure 3.7: Neural network based engine model.
17
Since the fuelair ratio and the fue ratio have a smaller dynamic range than the
engine speed (see Table 3.1), we divided the speed input by 1000 for the combustion and
torque production neural network. A similar operation was made with the torque output.
Figure 3.8 shows a representation of the combustion and torque production neural network.
Similar considerations were applied to the air compression subsystem shown in Figure 3.9.
Input or Output
..
mmlmum maximum TraininlZ range
fuelair ratio 0 0.0988 oto 0.1
fuel ratio 0 2.9132 oto 0.3
engine speed (rpm) 572.40 1972.70 oto 2
Table 3.1: InputOutput range for combustion and torque production subsystem.
Figure 3.8: Combustion and Torque production subsystem.
18
0l·1
mdl
Mux
J:::B ,~[)[!}~ ...."' '"9" T." slQmod ran sigmoid LOQ 5lQmo::J
Neuron Neuron neuron
l.~e, kAyer1 lIyer
Figure 3.9: Air compression subsystem.
The engine friction subsystem is based on the Eq. (A.21) from appendix A. For the
model of Figure 3.10, we replaced the mean piston speed Sp by the engine speed N.
Q)l.I
T,
Relational
Operator
1r+c:J.0
Product 1 Tfr
Figure 3.10: Engine friction subsystem.
We will see in the following chapters how the models previously described were
applied for adaptive control, genetic reinforcement learning and reinforcement learning.
19
CHAPTER 4
SELFTUNING REGULATOR APPLIED TO DIESEL ENGINES.
4.1. Introduction.
In order to provide a standard with which to compare the reinforcement learning
algorithms which will be presented in later chapters, this chapter will apply the selftuning
regulator (I) to diesel engine control. This is a standard linear adaptive control technique,
with block diagram as shown in Figure 4.1.
The selftuning regulator is an "indirect" method; a model of the process is
developed (in the estimation block), and this model is used to determine the controller (in
the Controller Design block).
20
Selftuning regulator
r,
Specification I Process parameters
I
I • • l I
I I
I Controller
Estimation
I
I Design ~ I
I I
I Controller I
r  _____ .J
Reference
I , parameters
I
I I
I'" Controller , Process
Input I Output
I I
L __________ .J
Figure 4.1: Block diagram ofa selftuning regulator (1)"
4.2. Pole Placement Design.
There are several different types of selftuning regulator. One is adaptive pole
placement. The idea of this method is to design a controller to meet the specified closedloop
poles specifications. Ifwe take our diesel engine model from Figure 4.2 we can model
the inputoutput relation as:
N A·S N r A·R T
= B· R + A . S· re..  B . R + A . S· load
(4.1)
The idea is to adjust the controller parameters to obtain the desired pole locations.
2/
Reference
speed ( Nref )
+
Actual
engme
speed (N)
r
) ~ Controller = SR
1
Fueling (f)
II
bl
I
II
 Te
Engine Fueling
: Inertia +r+ delay I
I I L ~
Engine = ~ External load (T'oad )
B
Figure 4.2: Engine model.
For the selftuning regulator we need to define the system structure. The engine
transfer function from fueling f to engine speed N is:
N(s)
j{s)
60/(21t1) eu
s + (60 . bI )/(2nI)
(4.2)
The simplified system model is shown in Figure 4.3, where the load is before the
engine delay. We will be using a digital controller, therefore we need to obtain the discretetime
transfer function of the diesel engine with a zeroorder hold:
2[1 eTs . 60/2nI . eu] =
s s + 60· b l /21tI
22
ze
_60b)T
27[/
(4.3)
Reference
speed ( Nref )
+
Actual
engme
speed (N)
r~ Controller = SR
Fueling (j)
r ,
60/(2nJ)
S + (60· b l )/(211/)
L _
Engine = A
B
Te Fueling
+ delay
1:S e
_ _ .J
Figure 4.3: Simplified engine model.
For our experiments we want to select an appropriate sample time T. The engine
model will change according to the fueling delay t and the engine inertia 1, among other
parameters. If we define for our experiments that the fueling delay t will change between
30 ms and 130 ms we want a sample time that will cover those variations using a
reasonable system order. We evaluated some transfer functions for different sample times
T and values of b I' as seen in Table 4.1.
23
Table 4.1: Engine transfer functions for different values ofsampling time T and engine
friction bJT
(ms) b l Transfer function
10 0.01
0.0477· z
\00'[
Z  0.9995
10 0.1229
0.0476· z100'[
z  0.9941
50 0.01
0.2384· z
20,
z0.9976
50 0.1229
0.2353 . z20,
z  0.9711
100 0.01
0.4763 . z10'[
z  0.9952
100 0.1229
0.4637 . z10'[
z  0.9430
We want to define a system structure that supports different variations in the engine
delay and inertia. From Eq. (4.3) we can see that system order variations were due to the
engine delay. We can estimate a system structure of the fonn:
or
N(z)
f(z)
N(z)
f(z)
m mI mI
amz + am IZ + am 2z + ... + a IZ + aD
(4.4)
(4.5)
We now want to define a mechanism to identify the parameters described in Eq.
(4.4) and Eq. (4.5). That mechanism is defined by the parameter estimation process of the
next section. This is the procedure which will be perfonned by the Estimation block of
Figure 4.1.
4.3. Parameter estimation.
For parameter estimation we have our linear model defined as:
Z(k) = H(k)9 + V(k) (4.6)
where for each instant k, Z is a vector with the measurements, H is the data matrix, e is
a vector with unknown parameters and V represents noise or variations in the parameters
that we cannot explain. For example, if our system is represented by Yj = f(x 1)' X 2j' x3)
we can say that the linear model is:
resulting in:
Yl XI J X21 X 31
Y2
[::]
x 12 X 22 X 32
Z(k) e H(k) =
Y3 X13 X 23 X 33
... ... ...
,Yk Xu Xu X 3k
V(k)
V( 1)
V(2)
V(3)
V(k)
(4.7)
(4.8)
A
We want to obtain a function Z(k) = H(k)9 by minimizing:
AT"
min [Z(k)  Z(k)] [Z(k)  Z(k)]
rnpecI9
Sum Squared Error (4.9)
..
where we will obtain the least squared estimate ofthe parameter vector eLS' If we have our
15
system represented by:
y(t) = <Ply(t  1) + ... + ~pY(t  p) + aCt)
then:
yeP + 1)
8 Z(k) = yeP + 2) ~ [:J
yeN)
yeP) yeP  1) ... y(l)
H(k) = yeP + 1) yeP) y(2)
(NI) y(N2) ... y(Np)
where we minimize:
a(p + 1)
V(k) = a(p + 2)
a(p + N)
(4.10)
(4.11)
ZTz= [Z(k)  H(k)e{[Z(k)  H(k)8] => ZTZ  2Z
T
H8 + e
T
H
T
H8 = J (4.12)
 where Z = Z  Z. To find the minimum we must calculate the gradient using the following
properties:
T T Vx[x Ax] = Ax+A x
The minimum of ZTZis found by the relation:
T T
VeJ =  2H Z + 2H He = 0
which implies the normal equation:
26
(4.13)
(4.14)
which is the batch form of the least squared estimate.
In general we have that:
Z(k) = H(k)8 + V(k) (4.15)
for k measurements. If we take an additional k + 1 measurement then Z grows in one
element and H grows in one row:
H(k+ 1) = [HT(k)] => HT(k+ I) = [ T l H (k) \jJk+ Ij
\V k+ I
where:
\jJ~+ I !Y(N) y(NI) ... y(Np+ I)J
then:
HT(k + I )H(k + 1) = [HT(k)'IIk+1J[HT(k)]
\jJk+l
(4.16)
T T 1 1 T
We want to find [H (k)H(k) + \jJk + 1\V k+ d where Pk = H (k)H(k) then we
would find Pk + 1 from Pk :
T T I 1 TI
[H (k)H(k)+'IIk+l'llk+d = [Pk +'IIk+l\¥k+d = Pk+ 1
then we can use the matrix inversion lemma:
27
(4.17)
(4.18)
(4.19)
If we define the scalar a k as:
then:
(4.20)
(4.21)
We need another relation to find the parameters. From Eq. (4.14) we have for the
time k that:
= P(k)HT(k)Z(k)
and for the time k + 1 that:
() (k + 1) = P(k + 1)HT(k + 1)Z(k + 1)
= P(k+ I) [HT(k) J[ Z(k) J
tVk+ I z(k+ I)
(4.22)
28
(4.23)
where the value \jJf+ )8(k) = ;(k + 1) is the prediction of the z(k + 1) value. The value
z(k + 1)  \jJf+ 18(k) is the prediction error. The gain matrix K(k) is defined as the value
UkPk\jJk+)' We can see in Eg. (4.23) that the new estimate of e(k + 1) is based on the
To initialize the algorithm, typically Po = ~l and 8 = zero. Eg. (4.21) and Eg.
(4.23) make up the recursive least squares method for parameter estimation.
If the parameter changes with time, we need a factor to forget older data, especially
k
for adaptive filtering. We are minimizing ZTZ = L z\i), but we want to weight the last
;= I
errors more than the older ones, then we can use the weighted least squares as:
k
" k; 2 2 2 2 2 ~ A . z (i) = z (k) + A . Z (k  1) + A . Z (k  2) + ...
i = I
(4.24)
r 
where 0 < A < 1. The general weighted least squares is Z WZ. For the A case W has the
form:
29
o
W(k) = A?
A
o
If we want to minimize ZTWZ then:
~ T I T eWLS = [H WH] H WZ
~ ~
As the least squares we will estimate eWLs(k + 1) from eWLs(k) . We have the
following relations:
H(k + 1) = [H;k)]
\JIk+J
Z(k+ 1) = [ Z(k) l
z(k + 1)J
Pk+l = [HT(k+l)W(k+l)H(k+l)f
l
[ [ ]]
1
_ AW(k) 0 H(k)
 [H
T
(k) 'I' (k +1)J[0 j '1';+ 1
T T I
= [AH (k)W(k)H(k)+"'k+l\JIk+d
where HT(k) W(k)H(k) = p;' .Using the matrix inversion lemma (see Eq. (4.18»:
30
where uk = T 1 . We can rewrite Eq. (4.25) as:
" + \jJk+ IPk\jJk+ I
For the parameter estimation we have that:
8(k+ 1) = Pk+IHT(k+ l)W(k+ l)Z(k+ 1)
= P [. T l [,,W(k) ol [ Z(k) l
k+ I H (k) \jJk+ Jj 0 IJ z(k+ l)J
T = Pk+I[,,H (k)W(k)Z(k)+\jJk+l z(k+l)]
(4.25)
(4.26)
By substitution of uk we obtain:
(4.27)
We have a new set of equations defined by Eq. (4.26) and Eq. (4.27) to update
8(k + 1) and Pk+ 1 from z(k + 1) and \jJk + ) . To choose " we have two options:
31
A too small ~ increases the variance of the estimate (more oscillation).
"A. too large ~ increases the bias of the estimate.
For initialization we can choose Po = a / or Po = [HTHf' for the first data set
of points. For our case we must use the ARX model for exogenous inputs:
where the data matrix and the vector of parameters is defined by:
y(p  1) ...
H = yeP)
y(l)
y(2)
u(P)
u(p + 1)
u(p m)
u(p  m + 1)
yeN) ... y(Np+2) u(N+ 1) ... u(Nm+ I)
4.4. Parameter estimation for the diesel engine.
The objective of this section is to find a generic transfer function model that could
be used in later sections in the implementation of an adaptive controller. Looking at Eq.
(4.4) and Eg. (4.5) we can note that if we use a very fast sampling time we would obtain a
system with a large order and eventually more difficult to manage. If we use a very slow
sample time we would obtain a reduced system order, but we also reduce the capability of
modeling different time delays with the same transfer function model. We will try to find
32
a tradeoff between sampling time and system order which allow us the use of a unique
transfer function for different fueling delays and inertia values.
We simulated the engine with the original PID controller for different engine delay
conditions using the model shown in Figure 4.4. We saved different data files of fueling
versus engine speed for diverse values of fueling delay and different engine load. For
identification purposes the speed reference was changed between 600 and 650 rpm. The
engine load was simulated with a normal random number generator with the mean value
equal to the desired load and variance equal to one. The engine has a friction denoted by a
block with the same description. A variable called b I could be adjusted for different
friction values.
12 I
To Wo"'spacee
II all
D,.play
I fual2
To Worl<.pace7
From
Worklpace4
Trln.por1
Delay
L }J,.' I
Clock To Wor1\.pace2 Random
Numblr
I uc
Ta Work opace3
rlr1~I.,/LI_lr .........._+1 fuel
L....;:==' Salu,"lIon To Worklpacl4
Inar1ia Gain
I ap I
To Worl<.pace6
I
I nlagralor'
N
Salural,on1
To Worl<.pace1
Figure 4.4: Simulink modelfor selftuning control.
33
For the first case we used a sample time of T = 100 ms, with friction
b I = 0.1229 and a engine transfer function model:
N(z)
I(z)
2
a 2z + a1z + aO
3 2 z  bz
(4.28)
Knowing the average engine load we applied the identification process described
by Eq. (4.26) and Eq. (4.27) for different fueling delay and engine load. Figure 4.5 to Figure
4.8 have the identified parameters for different engine load plotted versus fueling delay.
There are 16 different curves in each figure, one for each engine load (0 to 150 lbft). In
most cases the curves directly overlap.
We can see in Figure 4.5 how the parameter 02 decreases from a value close to
0.4637 for L = 0 to a value near to zero for L = 100 ms. Similar results were obtained
for the parameter a l ' that reaches its maximum value at 't = 100 ms as shown in Figure
4.6. The parameter ao increases after L = 100 ms as shown in Figure 4.7. The parameter
b oscillates between 0.9418 and 0.9430 adjusting its value for the different values in the
engine delay as shown in Figure 4.8. From the figures we can see where each parameter of
the numerator (a 2 to ao) has its maximum value with respect to the fueling delay. For
example, 02 reaches its maximum value for zero fueling delay and a I has its maximum
value for 1: = 100 ms.
0.5
o 4
.«N.:. 0.3 (l)
t)
E 0.2 l.<..l l<l
0
0.1
0
0.1
0 20 40 60 80 100 120
fueling delay (ms)
2
fi
a2z +a\z+ao
Figure 4.5: Parameter a2 or versus fueling delay (T = lOOms).
3 2 z bz
0.5
045
0.4
0.35
.l<..l 0.3 eu
t) 0.25 El.<..l l<l 0.2
0
0.15
0.1
0.05
20 40 60 80
fueling delay (ms)
100 120 140
2 a2z +a\z+ao Figure 4.6: Parameter a \ for 3 2 versus fueling delay (T = lOOms).
z bz
35
o 14
0.12
0.1
0 o 08 tI:l
l eu
'ti 0.06 E
tI:l
lctI.:.
l 0.04
0.02
0
0.02
0 20 40 60 80
fueling delay (ms)
100 120
Figure 4.7: Parameter ao
2
fi
a2z + a Iz + ao or =3=2"';::: versus fueling delay (T = lOOms).
z  bz
0.9432
0.943
0.9428
.0
leu 0.9426
'ti
E
tI:l l o 9424 c'.".
0.9422
0.942
0.9418
0 20 40 60 80
fueling delay (ms)
100 120
Figure 4.8: Parameter b
2 a2z + a)z + ao for 3 2 versus fueling delay (T = lOOms).
z  bz
36
For the next case we changed the sample time to T = SO ms maintaining the same
transfer function structure shown in Eg. (4.28). From Figure 4.9 we can see for the new
sample time that the parameter a2 has an initial lower value changing to a value close to
zero for 't = 50 ms. However we note negatives values after 't = 100 ms .The parameter
a 1 has its maximum value at 't = SO ms as seen in Figure 4.10. The parameter ao
increases after 't = 50 ms but continues increasing after 't = 100 ms as shown in Figure
4.11. From Figure 4.12 we can see that the parameter b is close to the estimated value of
0.9711 but blows up after 't = 100 ms .The values after fueling delays of 100 ms for a2 ,
aD and b are due to the lack ofan additional tenn that represents fueling delays greater than
100 ms.
o 25
0.2
o 15
". N01 u
l>
E 0.05 ".".
Q".",
0
0.05
 0 . 1
0 20 40 60 80
fueling delay (ms)
100 120 140
2 a2z +a1z+aO Figure 4.9: Parameter a2 for 3 2 versus fueling delay (T = 50ms).
z  bz
37
0.3
0.25
0.2
.t.U. o 15
Il.l
Q)
E
E o 1
l'l c..
0.05
0
0.05
0 20 40 60 80 100 120 140
fueling delay (ms)
2
fi
a2z + a1z + ao Figure 4.10: Parameter a 1 or versus fueling delay (T = 50ms).
3 2 z bz
0.3
0.25
0.2
~... o 15 Q)
Q)
E
.l.'.l 0.1 l'l c..
0.05
0
0.05
0 20 40 60 80
fueling delay (ms)
100 120 140
2 a2z +a\z+ao Figure 4.11: Parameter aD for 3 2 versus fueling delay (T = 50ms).
z bz
38
0.979
0.976
U. 9 7 7
0.976
....0. ll.> 0975 ti
E
.'".. 0.974
0'"..
0.973
0.972
0.971
0.97
0 20 40 60 60 100 120 140
fueling delay (ms)
2 az +az+a
Figure 4.12: Parameter b for 2 I 0 versus fueling delay (T = 50ms).
3 2 z  bz
For our last case we decided to increment the system order for the engine model:
N(z)
fez)
3 2
= a3z +a2z +a1z+ao
4 3 z bz
(4.29)
using the same sample time T = 50 ms. From Figure 4.13 to Figure 4.16 we can see that
the parameters a3' a2' a I and ao reach their maximum value for different engine fueling
delays 't that were proportional to the sample time T. The parameter b changes between
values 0.9704 and 0.9712 that were close to the calculated estimate of 0.9711. In Figure
4.17 we can see that b has variations for different loads after T = 100 ms .
39
0.25
02
o 15
M .'."..
<U
"0 0.1
E
«..S. «S
Q.,
0.05
0
0.05
0 20 40 60 80
fueling delay (ms)
100 120
3 Z a3z +aZz +a\z+ao Figure 4.13: Parameter a3 for 4 3 versus fueling delay
z bz
(T = 50ms).
0.3
0.25
0.2
N
«..S. <U 0.15
tj
E
t.'.<.l t'<l 0.1
Q.,
0.05
0
0.05
0 20 40 60 80
fueling delay (ms)
100 120 140
3 2 az+az+az+a
Figure 4.14: Parameter az for 3 ~ 3 I 0 versus fueling delay
z  bz
(T = 50ms).
~o
0.3
0.25
0.2
~... 0.15 Cl)
U
E
'.".. 0.1
c'"..
0.05
0
0.05
0 20 40 60 80
fueling delay (ms)
100 120 140
3 2
F fi
a3z +a2z +alz+aO igure 4.15: Parameter a I or 4 3 versus fueling delay
z  bz
(T = 50ms).
0.14
0.12
01
0.08
'."..
~ 0.06
Cl)
E
l':! 0.04 c'.".
0.02
0
0.02
0 20
fueling delay (ms)
3 2 az +az +az+a
Figure4.16: Parameter ao for 3 2 I 0 versusfuelingdelay (T = 50ms).
4 3 z  bz
41
0.9713
0.9712
0.9711
0.971
.0... .c..u. 0.9709 cu
E 0.9708 .'"..
c'."... 0.9707
0.9706
0.9705
0.9704
0 20 40 60 80
fueling delay (ms)
100 120 140
3 2 az +az +az+a
Figure 4.17: Parameter b for 3 2 J 0 versus fueling delay (T = 50ms).
4 3 z bz
4.5. Controller design.
This section presents an offline design ofa controller that could be used to control
the speed engine. The objective is develop some constraints that could be used in the selftuning
controller in the next section.
For the controller design we select the following transfer function:
S
R
(4.30)
Ifwe combine the controller transfer function ofEq. (4.30) with the system transfer
function ofEq. (4.29) to obtain the inputoutput relation of the closed loop system
described in Eq. (4.1) we obtain:
(4.31)
If we select our closed loop characteristic equation as:
A 65 4 3 +2 + + C = Z + Z Q c5 + Z Q c4 + Z Q c3 Z Q c2 zac! a co (4.32)
then we could define the following relation:
(4.33)
b a, Q2 Q 3
o ao Q 1 a2
o
o
I 0
b 1
o
o
o
o
Initially we could solve this relation by applying the relation RS = (ATA)I ATQc'
However we must include some constraints to obtain the desired system response. We can
find constraints if we apply the final value theorem to Eq. (4.1). We want the final value of
the engine speed with respect to the reference speed to be close to one. We also want the
final value ofthe engine speed with respect to the engine load to be close to zero. From both
conditions we can define:
I1·m ( A . S ~ ~ 1
Z~! B·R+A·
= errorgain (4.34)
and
lim ( A· R _1 ~ 0 = error/oad
1"+1 B·R+A·SJ
(4.35)
For practical purposes we could define a lower and upper bound for errorgain as
Leg < errorgain < Ueg, where Leg could be 0.9999999 and Ueg could be 1.00000001.
A similar relation could be applied to error/oad as LeI < error/oad < Uel, where Lei
could be 0.00000001 and Uel could be 0.00000001. Solving for Eq. (4.34):
[(lb)Leg]
[( 1 b)Leg]
[(2:a,.)(Legl)]
[(2:a)(Leg 1)]
[(La)(Legl)]
~Leg(lb) (4.36)
where Laj = a3 + a2 + a l + aD. For the upper limit of errorgain we obtained:
[(1  b)Ueg]
[(1  b)Ueg]
[(La;)(Uegl)]
[(La)(Uegl)]
[(Laj )( Ueg  1)]
~ Ueg(l  b) (4.37)
Solving for Eq. (4.35):
T
[ (La;) + (1  b)Uel] R
2
[(LaJ+(lb)Uel] R
3
[(La)(Uel)] 8, ~Uel(lb)+(La)
[(La)(Uel)] 82
[(L a;)(UeI) ] S3
T
[(LaJ(lb)Lel] R
2
[(La,.)(lb)Lel] R
3
[(Laj)(Lel)] SI ~ Lel( 1  b)  (La,.)
[(La)(Lel)] 82
[(Lai)(Lel)] 83
We must consider a case where:
lim (R) = 0
ztl
(4.38)
(4.39)
(4.40)
resulting in an equality for Eq. (4.34) and Eq. (4.35). IfEq. (4.40) is satisfied, this does not
mean a final gain equal to one with respect the reference speed or a minimization in the
influence of the external load. To avoid Eq. (4.40) we included the following condition:
R2
. R3
[1 1 0 0 OJ SI *1
82
83
(4.41)
To solve the relations given by Eq. (4.33), Eq. (4.36), Eq. (4.37), Eq. (4.38), Eq.
(4.39) and Eq. (4.41) we can use the Matlab function conls in the form:
result = conls(A, b, C, d)
45
(4.42)
where A and b correspond with the matrix and vector given in Eq. (4.33), C and d
correspond with the matrices and vectors given from Eq. (4.36) to Eq. (4.39) and Eq. (4.41).
This function solves the constrained linear leastsquares problem:
1 2 min(IIAx bll )
x 2
subject to ex::; d (4.43)
From Eq. (4.43) we notice that the condition described by Eq. (4.42) could not be
reached. Therefore we replaced Eq. (4.42) by the following expression:
R2
R3
[1 1 0 0 OJ S I ::; 1.000000001
S2
S3
that allow us to find an expression close to 1. Another possibility could be:
R2
R3
[1 1 0 0 O:Jl SI ::; 0.999999999
S2
S3
(4.44)
(4.45)
We must define the desired transfer function (desired closed loop poles) to be used
in Eq. (4.42). For different experiments we found that fixing a transfer function generally
originates an approximation that generally has one or more of the poles outside of the unit
circle. This condition produces an undesired response for the system. To reduce this
problem we first identified the closed loop transfer function for the original controller.
From that transfer function we determined if the complex poles were predominant with
46
respect the real poles. If that condition occurs then we defined the least dominant real pole
with the same size of the real part of the dominant complex pole. After that the complex
poles were reduced in size.
For example in the case ofa diesel engine with delay 't = 110 ms we identified the
system transfer function as:
N(z)
f(z)
0.0005z3
 o.oOl2i + 0.1930z + 0.0428 =
z4 _ 0.971lz3
(4.46)
where the sample time was T = 50 ms and the average load 150 Ibft. For the original PID
controller:
S
R
2 2.05z 2.575z + 0.5437
2 z 1.25z + 0.25
(4.47)
we obtained the closed loop poles:
0.9879
0.7940 + 0.4972i
0.7940  0.4972i
0.3130 + 0.0378i
0.3130  0.0378i
0.2701
We applied different combinations in the reduction ofreal poles and complex poles.
The combination which reduced the mean square error was by reducing the real poles size
by 0.8 and the complex poles by 0.98. For that reduction the desired new poles must be:
0.2701
0.3067  0.037Ii
0.3067 + 0.0371i
0.7782  0.4873i
0.7782 + 0.4873i
0.7903
By using the function conls we obtained final poles at:
0.9890
0.6838 + 0.3588i
0.6838  0.3588i
0.3087 + 0.0201i
0.3087  0.0201 i
0.3318
with errorgain = 1.00000001 and
controller obtained was:
§. = 1.4893i1.9144z + 0.4374
R i1.1005z+0.I005
error/oad 8 = 8.1349xI0 . The
(4.48)
Ifwe use the condition given by Eq. (4.45) we obtained the controller transfer
function:
§. = 1.4884i1.9150z + 0.4366
R i 1.1005z + 0.1005
48
(4.49)
We imulat d th stem 'th th original PID controll r ( q. (4.47 and both
controllers q. (4.48) and Eq. (4.49)) for th nominal load of 150 lb and a cond 10 d
of 100 lbft. From Figure 4.18 we can s that th syst m respon for the n w ontroll r
has lower 0 er hoot and is immune to load variation. ith th original PID controller
have a higher 0 ershoot and more variation in the r spons due to th engin load. or thi
system we tested various combinations of pole location. We found that the low r m an
squared error was obtained for a reducti.on of80 % in th dominant real pol and 98 % in
the complex poles. The movement in the pole location is hoWD in Figure 4.19.
alculated c nLrolled bolh I ads.
690
680
670
660
650
] 640
0. en 630
620
610
600
57 57.5 58 58.5
time
59 59.5 60 60.5
Figure 4.18: Engine r sponsefor different controller' and load.
49
0.8
0.6
.~ 0.4
>(
<'<l 0.2
C
IrI::::l 0
'00
<'<l 0.2 .§
0.4
0.6
0.8
1
1 0.5 o
real axis
0.5
Figure 4.19: Finalpole locationfor different values in the magnitude reduction oforiginal
real and complex poles.
4.6. Adaptive control.
In this section we will try to use the results of the previous section in the design of
an adaptive controller for the engine. As shown in Figure 4.1, the adaptive controller has
two blocks. A first block is dedicated to estimate the parameters of the engine as described
by Eq. (4.29). Using those parameters, the controller is defined in a second block. The
controller design block will adjust the controller parameters to achieve a desired set of pole
locations. We will use the pole locations that were developed in the previous section.
The identification block was perfonned by using Eq. (4.26) and Eq. (4.27):
50
For the identification block we used A = 0.99. We tried with lower values, but the
estimated parameters had too much variation. For initialization, we choose Po = [HT
H]1
and e = zero for the first data set of points. The system started with the original PID
controller, and after 1 second the adaptive process was started.
The first approach was defining a desired response that the closed loop system must
follow. Using that definition, we found that the engine response was normally saturated at
maximum or minimum speeds. The second approach consisted in reducing the size of the
dominant real pole by a given percentage from the original controller. Results from last
section suggested that we use a reduction in the complex poles of 98 % and a reduction in
the real pole dominant of 80 %. The controller was obtained by solving the relations given
by Eq. (4.33), Eq. (4.36), Eq. (4.37), Eq. (4.38), Eq. (4.39) and Eq. (4.41), using the Matlab
function conls.
In Figure 4.20 and Table 4.2 we can see the different responses for variations in the
size of the dominant real pole for an engine with fueling delay of 110 ms and two friction
values. We note that the best response was obtained with a reduction of 80 %. Similar
results are observed for· an engine with fueling delay of 130 ms in Figure 4.21 and Table
4.3, except for the percent overshoot that was better with a reduction of 70 %. A special
case was for a fueling delay of80 ms. As seen in Figure 4.22, the maximum reduction was
for 85 % of the dominant real pole. When we tried a larger reduction, the response of the
resulting engine was saturated at zero. However, we noted better responses for 85 % and
90 % reduction for the percent overshoot. The mean square error has an small increment.
In Figure 4.23 we can see the case for a fueling delay of 50 ms. Here we obtained a percent
5/
overshoot improvem nt for the dominant r al pole reduced to 0 %. Ifwe continu 'th the
reduction we don't ee further improv m nt and th m n squar TTor in r as .
700
690
680
670
660
650
~
~c. 640
'"
630
620
610
600
36 36.5 37 37.5 38
time
39 39.5
Figure 4.20: Engine re~pom'efor different size,., duelion ofthe r al polesfor fueling delay
of110 ms. engine inertia of2 Ib1i. ec2, friction b1 = 0.1229. Blu = Original stem.
Green = 90 %. Red = 80 %. Cyan = 70 %. Magenta = 60 %.
Size Mean square error Percent overshoot
reduction
of the real hI =0.1229 hI =0.01 hi = 0.1229 hi =0.01 I
poles
Original
]529.52 1873.36 81.87 94.31 response
90% 934.70 (61 %) 1072.66 (57 %) 45.49 (56 %) 57.85 (61 %)
80% 920.91 (60 %) 1059.53 (56 %) 24.35 (30 %) 36.91 (39 %)
70% 1088.70 (71 %) 1301.91 (69 %) 28.17 (34 %) 41.41 (44 %)
60% 1438.96 (94 %) 1881.43 (101 %) 38.75 (47 %) 55.02 (58 %)
Table 4.2: Engine re ponse mean square error and percent over hoot for different size
reduction ofthe real poles for fueling delay of'10 ms and engine inertia of 2Ibft eel.
51
710
70
690
680
670
660
"0 650
~
Q.. 640 '"
630
620
610
600
36 36.5 37 37.5 38 38.5 39 39.5
tim
Figure 4.21: Engine re ponsefor differ nt size redu tion ofth r Ipolesforfueling d ta
of130 ms, engin inertia of2 Ibfis c?, fri tion bI = 0.1229. Btu ri inal 'tern.
Green = 90 %. Red = 80 %. yan = 0 %. Mag nta 60 %.
ize Mean square error Percent overshoot
reduction
of the r I hI = .1229 hI =0.01 hi =0.1229 hI =O. 1
poles
Original
4965.88 6818.79 117.06 119. 9
response
90% 1292.81 (26 %) 1555.00 (23 %) 63.54 (54 %) 77.19 (64 %)
80% 1084.30 (22 %) 1272.48 (19 %) 37.08 (32 %) 51.67 (43 %)
70% 1118.29 (23 %) 1420.07 (21 %) 31.48 (27 %) 46.48 (39 %)
60 °/0 1394.30 (28 %) 1750.87 (26 %) 37.57 (32 Yo) 53.81 (45 %)
Table 4.3: Engine re pon e mean 'quare rr rand percent 0 ershoot for different ize
reduction ofthe real pole for fueling delay of130 m and engin mertia of2 lbfl c?.
3
670
660
650
640
"0eeuu 630 c..
'"
620
610
600
36 36.5 37 37.5
time
38 38.5 39 39.5
Figure 4.22: Engine responsefor diffi rent size reduction ofth r al pol forfuelin"7 delay
of80 ms, engine inertia of 2 lbflsec?, friction bJ = 0.1229. Blue = Original ~ tent.
Green = 90 %. Red = 5 %.
Size Mean square error Percent overshoot
reduction
of real hI = 0.1229 hI = 0.01 hI = 0.1229 hI = 0.01
poles
Original
660.74 72 .81 34.7 4 .59
response
90% 661.98 (100 %) 728.07 (100 %) 17.44 (50 %) 26.32 (60 %)
85% 689.61 (104 %) 765.93 (106 %) 16.00 (46 %) 25.40 (58 %)
Table 4.4: Engine response mean square error andpercent over. hout for different size
reduction ofthe real pole. for fueling delay of80 ms and en ine inert ia uf2 Ib:ftsec?.
54
660
650
640
630
"0 v
ll)
0..
<I> 620
610
600 II
36

365 37 375
time
38 38.5 39 39.5
Figure 4.23: Engine re.~ponsefor different si::e reduction ofthe real polesforfUeling delay
of 50 ms, engine inertia of 2 lbjisec?. friction bJ 0.1229. Blue  Original stem.
Green = 90 %. Red = 85 %. Cyan 80 %.
ize Mean square error Percent overshoot
reduction
of the real hJ =0.1229 hJ = 0.01 bl =0.1229 hi =0.01
poles
Original
445.5] 479.11 13.38 19.78
response
90% 493.81 (lIl %) 537.07 (1 12 %) 9.08 (68 %) 16.25 (82 %)
85% 515.23 (116 %) 563.17(118%) 10.2] (76 %) 17.65 (89 %)
80% 539.92 (121 %) 593.43 (124 %) 11.48 (86 %) 19.25 (93 %)
Table 4.5: Engine response mean square error and percent overshoot for different size
reduction ofthe real pole for fueling delay of50 ms and engine inertia of2 /bjisec?.
55
The objective of this chapter was to introduce a standard adaptive controller with
which to compare the reinforcement learning algorithms to be presented in later chapters.
The selftuning regulator was chosen as the baseline controller.
There are many variations of the selftuning regulator. We used the polepositioning
STR. The first stage in the development ofthis STR is to choose a set ofdesired
pole locations. If these are not chosen carefully, the resulting system may not be stable.
Through experimentation, we found that the best approach was to start with the closedloop
locations ofthe baseline PID controller. We then identified the dominant poles and
reduced them in magnitude by a specified percentage (a reduction of80% provided the best
performance for the engine speed control).
56
~   
CHAPTERS
GENETIC REINFORCEMENT LEARNING FOR DIESEL ENGINES CONTROL.
5.1. Introduction.
In this chapter we will describe the GENITOR algorithm for the optimization ofthe
parameters of a diesel engine controller. We will explain the basics ofthat algorithm and
how it can be used to adjust the controller parameters.
5.2. GENITOR Algorithm.
The GENITOR Algorithm was developed by Dr. Whitley and his students at
Colorado State University and publications are available starting in 1988 (9,23 to 32)'
Initially, GENITOR was an algorithm to solve binary genetic applications(32)' After some
updates, Whitley et. al proposed the GENITOR algorithm as an application using real
numbers for training neural networks for reinforcement learning and "neurocontrol"
applications in a term they called Genetic Reinforcement Learning (25)'
Traditional Genetic Algorithms apply biologic ideas to the solution of a problem.
We can encode a solution in a string, where each parameter solution is consider as a bit of
that string. If we manipulate that string we can obtain new solutions based on the survival
of the fittest. Researchers used manipulation methods related to chromosomal
recombination, such as crossover, mutation, etc.
57
Nonnally the initial population is generated randomly. By random selection two
members of that population are selected and we apply "crossover" to obtain a new member
of the population or offspring. We will consider the parents 0101010001010101 and
yxyyxyyxyxyxyxyx, where the bit representation (0, 1, x, y) was choosen to recognize each
parent. Ifwe apply a crossover at the 4th bit of the parents, it means the recombination:
0101\/010001010101~0101xyyxyxyxyxyx
yxyy\/xyyxyxyxyxyx~yxyyOl0001010101
After the Crossover operation, we can perform the mutation operation, where some
bits ofthe offspring are randomly selected and the values complemented. For example, if
we select the bit 2, 7 and lO of the first offspring we will obtain:
0101xyyxyxyxyxyx
i i i
OOOlxyxxyxxxyxyx
i i t
An important feature from GENITOR to obtain an improvement in the quality of
the population is the tendency to select the best parents more frequently. The difference
with gradient search methods is that genetic algorithms will search randomly in all of the
hyperplanes.
We can define a hyperplane that represents the binary encoding as seen in Figure
5.1. If we have a 3bit string the search is performed in the upper hypercube of Figure 5.1.
Ifwe have a 4bit string the search is made in the hypercube of four dimensions shown in
the lower part ofthe same figure. The difference between the subspaces is the first bit,
where 1 represents the inner cube and 0 the outer cube. The concept of implicit parallelism
58
means an efficient search in those numerous hyperplanes. This feature permits the search
of nonlinear functions without gradient calculation.
010 f+<
000 11' 11'
III
101
_.   ;'0101
0111
• •
1101
._:.'
...... ~ I .' . •
0110
.~   
.. • ". 1110 ';,',•
•
0010 ".~_. _
,, 10 1Q#+.r
•
• , # , : . . . ,•'. 0000
., .
  # 0001
Figure 5.1: A 3dimensional and a 4dimensional hypercube (24)"
The GENITOR algorithm, developed by Whitley and his students, generates an
initial population of random strings. Each member of the population is evaluated and the
population is sorted according to their fitness. Two parents are selected at random from the
population. That selection process uses a bias ranking selection algorithm allowing a higher
59
probability of selection to the best parents. The bias ranking selection is implemented with
the relation:
tix(Populationsize(bias  ,fbias2  4(bias  1)rand)) + 1
parent =
2(bias  1)
that represents the probability density function:
f(P) = bias  2(bias  l)p
where p is the parent ranking. We can see a plot ofthis function for Bias = 1.9 in Figure
5.2. We notice that parents with higher fitness (lower position in sorted population) have
higher probability to be selected.
0.02
0.0 18
0.016
0.014
0.012
0
0..e 0.01
0.008
0.006
0.004
0.002
0
0 20 40
p*IOO
60 80 100
Figure 5.2: Probability density function for bias = 1.9.
60
A crossover process permits the recombination ofthe parents. From both offspring
we select one and discard the other. We evaluate the new offspring and it is placed
according to its fitness, replacing the lowest ranked parent.
The improvements that Whitley and his students defined for the application of
GENITOR in the training of Neural Networks are:
• 1. The Neural Network problem is encoded as realvalued strings instead of binary
strings.
• 2. A different procedure for mutation is used. "Traditional genetic algorithms are
largely driven by recombination, not mutation"(25)'
• 3. The algorithm uses an small population (e.g. 50 individuals) to reduce the exploration
of dissimilar representations for the same neural network.
We can see the GENITOR implementation in Figure 5.3.
61
11. InltlalIzatlOn Phase.
 Set all the weights in the network to a random value between ± 2.5
 Set one allele representing the probability ofcrossover to a random value between
aand 1.
 Evaluate each individual and sort the population according to the fitness.
2. Iteration phase.
 Select two individuals according to relative fitness using linearbias selection.
 Crossover with probability detennined by the crossover probability allele of the
string selected as parent 1; otherwise perform mutation on parent 1.
 The offspring always inherits the crossover probability of parent 1. If parent has
higher fitness than the offspring, increment the offspring crossover probability by a
"actor of 0.10 (to maximum 0.95); otherwise decrease the crossover probability by a
lractor of 0.10 (to minimum 0.05).
 Evaluate new offspring and insert in the population according to fitness.
 Continue "iteration" until error is acceptable or MAXITERATraNS = True.
uperator.
!Mutation: Mutate all weights on the first selected individual by adding a random
tvalue with range ± 10.0.
Crossover: Perform no crossover if the parents differ by two or fewer alleles. Otherwise,
recombine the strings onepoint crossover between the first and the last posiions
at which the parents have different weight values.
Figure 5.3: Original GENITOR algorithm (D. Whitley, et. al.) (25)"
From Figure 5.3 we can see that one allele is the probability of crossover. As the
algorithm converges, the probability of crossover decreases. We can only perform
crossover or mutation for a new offspring. Also, the mutation operator creates a new
random offspring near the selected parent.
62
5.3. Model Description.
In this section we will briefly review the basic diesel engine model and baseline PID
controller which were discussed in Chapter 3. The simplified engine diesel control model
is described from Figure 3.1 to Figure 3.4.
With the model of the Figure 3.4 we made a simulation with constant load of
150 ftlb. The speed engine was changed between 600 rpm and 650 rpm every 5 seconds
for a total time of20 seconds. We can see in Figure 5.4 and Figure 5.5 how the PID
controller changes the engine speed between 600 rpm and 650 rpm, and the fueling is
maintained between 0 and 300 mmJ
/ stroke. We will make the future simulations based
on this model for the engine.
v V
:
1\ f\
670
660
650
640
'0II) 630
&. <II
620
610
600
590
580
o 5 10 time 15 20
Figure 5.4: Original engine responsefor Kp=2, Ki=O.5, Kd=O.05, a=15.
63
5 1 0 time 1 5 20
1\ III I~ , ~
fV If ~
50
o
100
300
1 50
'"0
Cl)
(l)
0
'"
200
250
Figure 5.5: Originalfueling responsefor Kp=2. Ki=O.5, Kd=O.05, a=15.
5.4. Genetic Reinforcement Learning applied to PID controller.
We applied Genetic Reinforcement Learning to the system shown in Figure 3.4.
The simulation will run with constant load of 150 ftIb changing speed between 600 rpm
and 650 rpm every 5 seconds, for a total simulation time of20 seconds. We defined as
fitness function the mean square error of the desired speed response. This parameter is
related to the rise time for the engine speed. For the original PID controller we have the
responses of Figure 5.4 and Figure 5.5. For a = 10 and a = 20 we have a mean square
error of 1000.34 and'999.34, respectively. We want to minimize the mean square error and
indirectly the rise time and the tracking error. The initial population used for training was
set around the basic PID parameters and a = 15.
5.5. Initial results.
We can see the training results in Table 5.1. For each experiment we started the
training with random values around the basic PID controller parameters. We can see an
improvement in the responses due to the PIO controller based on the mean square error. The
mean square error was calculated by direct integration of the square error in the simulink
model. We obtained approximate improvements for the mean square error from 3.63 % to
10.00 %. From the two initial rows ofTable 5.1, we notice that the best results, for the same
number of epochs, were obtained for the smaller population. After increasing the number
of epochs to 30000, we obtained best results for the population of 50, as shown in the last
row of Table 5.1, Figure 5.9 and Figure 5.10. We can see an improvement with longer
training, but good results can be obtained after a few epochs.
Table 5.1: Controller's results for mean square error basedfitness.
Best Fitness 0/0 Pop. Resulting Notes
(error)2. improvement Size Parameters
920.64 7.87 5 Kp= 1.7718 Random initial conditions
Ki= 0.0788 around basic PIO controller.
Kd=0.6835 3000 epochs.
a = 20.8593
963.01 3.63 50 Kp= 1.8662 Random initial conditions
Ki= 0.0117 around basic PIO controller.
Kd=0.1455 3000 epochs.
a = t7.9139
899.38 10.00 50 Kp= 1.7975 Random initial conditions
Ki=0.1417 around basic PIO controller.
Kd= 1.0868 30000 epochs.
a = 25.9656
65
I~ IV
/I I"
II
670
660
650
40
"0 630
~
0 ell 620
610
600
90
580
o 5 10 Lim 16 20
Figure 5.6: Optimal engine response for training with population = 5, epoch 3000,
resultin : Kp=l. 771 ,Ki=0.0788, Kd=0.6 35, a=20.8 93.
f\
t\\  'U6"
~
670
60
850
640
0
II) 630 II)
0
m
620
610
600
590
5 6 7 lime 8 9 10
Figure 5.7: Detail oftransition from 600 t 650 rpm. Blue  original PID parameters,
Green = Genitor optimizedparameters. Population  5, epochs 30 O.
66
660
650 ~
640
830
"0
~ 620
0 en
810
600 n0.
~ 590
580
10 1 1 12 time 13 14 15
Figure 5.8: Detail oftransition/rom 650 to 600 rpm. Blu = riginal PI param fer,
Gr en = Genitor optimized parameters. Population  5, ep ch 3000.
670
660 ~
6 0 
\X/
64
"2v 630 Q,
en
620
610
800 ,oJ
590
5 6 7 time 8 9 10
Figure 5.9: Detail a/transition/rom 600 to 6 0 rpm, Btu Original PID paramet rs,
Ore n = Genitor optimizedparamefe~ .', Population = 50, epoch' =30000.
67
660
650 IlIl
640
630
"'0
3$ 620
c.. en
610
600 J"I:',.
~ 590
580
'0 11 12 time 13 14 16
Figure 5.10: Detail oftransition from 650 to 600 rpm. Blue = Original PID paramet r .
Green = Genitor optimizedparameter. Population = 50, po hs =30000.
5.6. GENITOR applied to analog PID controller.
5.6.1. Results for different engine inertia.
For the next case we repeated the traini ng for differ nt val u of ngm me ia I .
For ach case th initial r pon changes according to th engine in rtia value.
Optimization i ne ded to adjust th prD controller values to match th engin inertia. W
use the mean square error a our fitne s alu . To reduce the training time, we changed the
training schem to two spe d transitions. The first transition is om 600 to 650 rpm at I
second. The second transition is from 650 to 600 at 3.5 seconds. The engine and the PID
controller are initialized with the engine conditions for 600 rpm. The imulation runs from
oto 1 second without error measurements, then the training is start d. After the fir t t of
simulations we obtained negative values 0 integral gain. Those value were due to the
limited simulation time for each speed, during which the tendency of an increased
68
accumulation of integral error could not be noticed. To overcome this, we increased the
simulation time to 8 seconds with the second transition from 650 to 600 rpm at 4.5 seconds.
To ensure the correct values of the parameters for future training, we changed the mutation
process in the GENITOR algorithm by accepting only positive values of the PID
parameters.
For the mean square error as fitness we obtained the results shown in Table 5.2 for
six different inertia values and two different population sizes. For very oscillatory engine
responses we could reduce the mean square error to 20 % of its original value for inertia
equal to 1 Ibftsec2, as seen in first row ofTable 5.2 and Figure 5.11. As inertia increases,
the mean square error was reduced to 77 % of its original value for inertia equal to 1.4 lbft
sec2 (see the second row ofTable 5.2 and Figure 5.13). In the last four rows of Table 5.2
we can see mean square error reduction ranging from 91 % to 95 % of their original values
for inertia between 1.8 Ibftsec2 and 3 Ibftsec2 (see also Figure 5.15, Figure 5.17, Figure
5.19 and Figure 5.21). If we compare the results related to the population size we noticed
small differences in the results for 1500 epochs. For lower inertia values (see Figure 5.12
and Figure 5.14) the learning process is faster with smaller population. For the remaining
cases the learning rate is similar for both of the populations used. A special case is for
inertia 2.2 Ibftsec2 (see Figure 5.18) where a good initial value in the population generated
a better response for the case of higher population. We must remember that the population
initialization and the recombination process are random in nature, therefore, for a specific
experiment we could obtain results that are not consistent with overall trends.
69
Table 5.2: Engine with different inertia. Controller's resultsfor mean square error based
fitness.
Inertia Initial Best Fitness Resulting Best Fitness Resulting
Ibftsec2. Fitness (error)2. Parameters (error)2. Parameters
(error)2. Pop. = 5 Pop. = 5 Pop. = 50 Pop. = 50
1 2939.12 600.18 Kp=0.9651 592.71 Kp=0.9964
(20.42 %) Ki=0.5399 (20.16 %) Ki=0.3470
Kd=0.0481 Kd=0.0422
a = 13.0029 a = 16.5038
1.4 761.07 591.34 Kp=I.3283 597.15 Kp= 1.4847
(77.69 %) Ki=0.4298 (78.46 %) Ki=0.4459
Kd=0.0549 Kd=0.0510
a = 15.6957 a = 13.4670
1.8 626.52 599.19 Kp= 1.9398 597.16 Kp= 1.9493
(95.63 %) Ki= 0.5016 (95.31 %) Ki= 0.4472
Kd=0.0543 Kd=0.0505
a= 14.2661 a = 14.8449
2.2 625.57 594.90 Kp=2.0588 592.23 Kp=2.0697
(95.09 %) Ki=0.4409 (94.67 %) Ki=0.4436
Kd=0.0583 Kd=0.0496
a = 13.2913 a= 15.4919
2.6 648.53 596.43 Kp=2.2861 606.42 Kp=2.1127
(91.96 %) Ki= 0.4830 (93.50 %) Ki=O.4758
Kd=O.0494 Kd=0.0476
a= 16.7868 a = 14.6991
3 665.24 621.36 Kp=2.0800 618.70 Kp = 2.0795
(93.40 %) Ki=O.4801 (93.00 %) Ki=0.4097
Kd=0.0473 Kd=O.0546
a = 14.9868 a = 14.9168
70
700
690
680
670
'"0 660
~
c. 650 ""
640
630
620
610
600
1 .5 2 2.5 time 3 3.5 4 4.5
Figure 5.11: Detail fran 'ition from 600 to 650 rpm. Blue = Original PID parameters,
Green = Genitor op/imizedparameters (Population = 5), Red = Genitor optimized
parameters (Population = 50). Epochs =1500, Inertia = Ilbft ed.
2400
2200
2000
1800
g 1600
II.l
~ 1400 ::;
C"
aell 1200
8 1000
BOO
600
400
0 600 epochs 1000 1600
Figure5.12: Learningrateforengine inertia = I/bftsec!. Blue: Population = 5,
Green: Population = 50.
71
680
670
660
650
"'0 II.J 640 8 U>
630
620
610
600
1.5 2 2.5 time 3 3.5 4 4.5
Figure 5.13: Detail transition from 600 to 650 rpm. Blue = Original PID param ler .
Green = Genitor optimizedparameters (Population = 5), Red = Genitor optimized
parameters (Population = 50). Epochs =1500, Inertia = l.4lbfi 'ed.
740
720
700
g 680
0)
~ 660
&VI
11j 640
QJ e
820
600
580
0 500 epochs 1000 1500
Figu re 5.14: Learning ratefor engine inertia = 1. 4Ibftsed. Blue: Population = 5,
Green: Population = 50.
72
670
660
650
840
~
Q. 630
'"
620
610
600
1.5 2 2.5 time 3 3.5 4 4.5
Figure 5.1 : Detail transit'onfrom 600 to 650 rpm. Blue = Original ID parameter',
reen = enitor optimiz d parameters (Population = 5), Red Genitor optimi=ed
parameter (Population  50). l:.:poch = 1500, Inertia = J.8Ibfi ed.
635
630
625
~ 620
~ 615
S en a 610 l 0)
S
605
600
5 5
0 500 epochs 1000 1500
Figure5.16: Learningrateforengine inertia  1.8 lbfi d. Blue: Population 5,
reen: Population = 50.
73
660 ~ 650
640
"'t;j
Q) Q) 630
0. en
620
610
600 .J
1.5 2 2.5 Lime 3 3.5 4 4.5
Figure 5.17: Detail transition from 600 Lo 650 rpm. Blue Original PID parameters,
Green = enitor optimized paramet r' (Population ~. Red = Genitor optimized
parameter (P pulation = 50). Epochs 1500, Inertia 2.2Ibftsec.
610
60a
606
6
04 ~
t
~ 602 ~:a g 600 <II a~
8 598
596 I
594
1
592
0 500 epochs 1000
'
15 0
Figu re .18: Learning ralefor engine inertia = 2.2Ibftsec? Blue: Population = 5,
reen: Population  50.
74
660 ~
650 ~
640
"0 vv0. 630 '"
620
610
600 fJ
1.5 2 2.5 ti e 3 3.5 4 4.5
Fig re 5.1 : Detail fran ition from 600 to 650 rpm. Blue Original PID parameter',
,., en = Genitor optimized parameter (Population  5), Red = Genitor optimized
parameters (Population = 50). Epochs = /5 O. Inertia 2.6Ibfts d.
635
630
625 .. g 620
v
~ 615 g. >.
'!"ij I 610
v E I I 
605
600 ~
595
0 500 epochs 1000 1500
Figure .20: Learnin ratefor engine inertia 2.6Ibft ed. Blue: Population 5.
reen: Population = 50.
75
660
650
640
e
4) 4) 630
0.. '"
620
610
6 0 kI
1.5 2 2.6 lime 3 3.5 4 4.5
igure .21: Detail transitionfrom 600 to 650 rpm. Blue = Onginal P1D parameters,
Green = Genitor optimized parameters (Population = 5), Red G nitor optimized
parameter' (Populati n = 50). Epochs =1500, Inertia = 3 Ibflsec2.
680
670
660
6
t:
~ 660
~
6 1Il 640 a I'L ~8
630 L
~
620 ~
610
0 500 epochs 1000 1500
FigureS.22: Leamingratefor engine inertia 3Ibfls d. Blue: Population 5,
Green: Population 50.
76
For the next case we repeated the training for different engine inertias, but we
changed the fitness function from mean square error to percent overshoot. For that fitness
function we obtained the results shown in Table 5.3. From that table we can see that the
percentage response improvement depends on the initial overshoot. More impressive
results were obtained with more initial overshoot and oscilJatory responses. Due to the
selected fitness function, we can see in the odd figures from Figure 5.23 to Figure 5.35 that
the resulting responses tend to be more flat. If we compare the training processes, we
obtained a better response for small populations, as seen in the even figures from Figure
5.24 to Figure 5.36. This must be due to the less restrictive fitness function (overshoot)
allowing faster mutation for lower populations.
77
Table 5.3: Engine with different inertia. Controller's results/or percent overshoot based
fitness.
Inertia Initial Best Fitness Resulting Best Fitness Resulting
/bftsec 2
• Fitness (overshoot). Parameters (overshoot). Parameters
(overshoot) Pop. = 5 Pop. = 5 Pop. = 50 Pop. = 50
1 103.1990 0.4311 Kp=0.5323 0.9894 Kp=0.5595
(0041 %) Ki= 0.0670 (0.95 %) Ki= 0.1045
Kd=0.0767 Kd=0.0475
a= 14.8190 a = 16.3213
1.4 57.6517 0.3047 Kp=0.6509 1.2924 Kp=0.7971
(0.52 %) Ki= 0.0628 (2.24 %) Ki=0.1319
Kd=0.0307 Kd=0.0514
a = 11.9443 a = 13.6600
1.8 36.6645 0.3939 Kp=0.7736 0.4040 Kp=0.9222
(OAt %) Ki= 0.0625 Ki= 0.0796
Kd=0.0507 Kd=0.0691
a = 13.9364 a=14.8195
2.2 22.2402 0.3906 Kp=0.9513 0.4233 Kp= 1.1102
(1.07 %) Ki=0.0675 (1.15%) Ki= 0.0830
Kd=O.0494 Kd=0.0479
a = 13.9331 a = 14.8503
2.6 12.9828 0.3604 Kp= 1.3477 0.3329 Kp= 1.2580
(2.77 %) Ki= 0.1014 (2.56 %) Ki=0.0872
Kd=0.0698 Kd=0.0561
a= 17.8313 a = 13.0939
3 7.1080 0.4237 Kp= 1.5058 0.3842 Kp= 104804
(5.96 %) Ki=O.0971 (5.40 %) Ki=0.0955
Kd=O.0440 Kd=0.0449
a = 15.1023 a = 13.6163
78
710
700
690
680
670
~~ 660
~
Q.
<I> 650
640
630
620
610
600
1.5 2 2.5 lime 3 3.5 4 4.5
Figure 5.23: Detail transitionfr m 600 to 650 rpm. Blue = Original PID parameters,
reen = enitor optimizedparameter (Population = 5), Red  Genitor ptimized
parameters (Population  50). Epochs 1500, Inertia 1 lbft ed.
1 10
100
90
80
] 70
V...l 80 u.. 0 50
i5e 40
8 30
20
10
0
0 500 epochs 1000 1600
Figure 5.24: Leamingrateforengine inertia  llbji eel. Blue: PopulatIOn 5,
Creen: Population = 50.
79
690
680
670
660
650
1$
Qcen.. 640
630
620
610
600
1.5 2 2.5 li e 3 3.5 4 4.5
Figure 5.25: Detail transition from 600 to 650 rpm. Blue Original PID parameters,
Green Genitor optimized parameter (Population 5), Red Genitor 'Ptimi;;ed
parameters (Population 50). Epochs 1500, InertlQ /.4 Ibjtsec!.
60 ,....,.,
500 epochs 1000 1500
Ol:__"':::::====:=::I=============::::r============~
o
10
20
40
30
50
Figure 5.26: Learning ratejor engine inertia /.4/bjtsec!. Blue: PopulatIOn 5,
Green: Population 50.
80
680
670
660
650
"0
Q)
Q) 0. 640 en
630
620
610
600
1.6 2 2.5 time 3 3.5 4 4 5
Fig r 5.27: Detail transitionfrom 60010650 rpm. Blue = riginal PID parameters,
reen = enilor optimi::ed parameter (Population  5), Red enit r optimized
parameters (Population = 50). Epochs = 1500, inertia  1.8Ibfisec?
35
30
25
] .e.n. 20 U
;>
0
'E 15
~
8 10
5
0
0 epochs 1600
Fi ure5.28: Learningrateforengineinertia J.8Ibftsec? Blue: Population
Green: Population = 50.
81
670
660
650
13 640 Q)
0.. ""
630
620
610
600
1 .5 2 2.5 lime 3 3.5 4 4.6
Figure 5.29: Detail transilion from 600 to 650 rpm. Blue Or; inal PID par meters,
Gr en = Genitor optimiz d parameter (Population ~. Red Genitor optimized
parameters (Population = 50). Epochs = 1500, Inertia = 2.2 Ibftsec!.
620
610
600
~
OJ ;> 590
0
~
~
8 580
570
560
0 500 epochs 1000 1500
Figure 5.30: Learn;ngrateforengine inertia 2.2Ibftsec? Blue: Population 5,
Green: Population = 50.
82
660
650
640
"'Q
~
0.
U> 630
620
610
600
1.5 2 2.5 tim 3 3.5 4 4.6
Figure 5.31: Detail tranc ifionfrom 600 to 650 rpm. Blu = Original PID param lers,
reen = Genitor optimized parameters (Population = 5), Red = Genitor optimized
parameters (Population = 50). Epochs =1500, Inertia = 2.6Ibft ed.
12
10
8
8
~t
;> 6
0
15 i 4
2
0
0 500 epochs '000 1500
Figur 5.32: Learningrateforengine inertia = 2.6Ibftsec!. Blue: Population 5,
Green: Population  50.
83
660
650
640
0
~
0en 630
620
610
600
1.5 2 2.5 Lime 3 3.5 4 4.5
Fig re 5.33: Detail transitionfrom 600 to 650 rpm. Blu = Origrnal PID parameters,
Green = Genitor optimi::ed parameters (Population 5), led  enitor uptimi::ed
parameters (Popul lion = 50). Epochs =1500, Inertia 3 lbft eel.
7
6
5
8~ 4
~0
;;.
0
i5 3
0
~
8 2
~
0
0 500 epochs 1000 1500
Figure5.34: Learnin raleforengine inertia 3/bftsee!. Blue: Population 5,
Green: Population 50.
84
5.6.2. Results for different fueling delays.
For the next case we repeated the training for different engine fueling delays 't . For
each case the initial response changed according to the fueling delay value with a fixed
engine inertia of2 Ibftsec2. An optimization is needed to adjust the PID controller values
to match the engine fueling delay.We use the mean square error as our fitness value. Like
the previous case, we changed the training scheme to two speed transitions to reduce the
training time process. The first transition is from 600 rpm to 650 rpm at 1 second. The
second transition is from 650 rpm to 600 rpm at 4.5 seconds. The engine and the PID
controller are initialized with the engine conditions for 600 rpm. The simulation runs from
oto 1 second without error measurements, then the training is started.
Using the percent overshoot as fitness, we obtained the results shown in Table 5.4
for six different fueling delay values and two different population sizes. Ifwe look at Table
5.4 and the odd figures from Figure 5.35 to Figure 5.45, we notice how the response
improves from the closest values to the delay of 80 msec to the extreme delay values. This
must be due to the fact that the original PID parameters were optimized for the delay of 80
msec. As we move far from that delay, the original PIO response needs more improvement.
Ifwe look at the training process (even figures from Figure 5.36 to Figure 5.46). we notice
a faster response for the smaller population. However, in the majority of the responses the
final values obtained for the large population were better. Figure 5.40 show a special case,
where the training for smaller population apparently arrived at a local minimal and then
future training does not improve the engine response.
85
Table 5.4: Engines with differentfueling delay. Controller's results for mean square error
based.fitness.
Delay Initial Best Fitness Resulting Best Fitness Resulting
(msec) Fitness (error)2. Parameters (error)2. Parameters
(error)2. Pop. = 5 Pop. = 5 Pop. = 50 Pop. = 5
30 355.32 257.26 Kp=3.0428 268.68 Kp=2.7465
(72.36 %) Ki= 0.2787 (75.57 %) Ki= 0.4947
Kd=0.0527 Kd=0.0425
a = 15.0048 a = 15.7296
50 432.08 385.81 Kp=2.2091 380.24 Kp=2.5555
(89.29 %) Ki=0.4515 (88.00 %) Ki= 0.4656
Kd=0.0568 Kd=0.0496
a = 14.5073 a= 14.1181
70 544.42 521.53 Kp=I.9492 517.01 Kp= 1.9840
(95.79 %) Ki=0.4857 (94.96 %) Ki=0.4314
Kd=0.0533 Kd=0.0485
a= 16.3126 a = 13.3124
90 729.64 660.49 Kp=1.7835 669.07 Kp= I .7831
(90.52 %) Ki=0.2725 (91.69 %) Ki=0.4787
Kd=0.0609 Kd=0.0496
a = 13.6584 a = 13.6043
110 1007.13 818.17 Kp= 1.5797 818.07 Kp= 1.4269
(81.23 %) Ki=0.3228 (81.22 %) Ki=0.2924
Kd=0.0523 Kd=0.0519
a = 17.3795 a = 14.7027
130 1579.99 1006.44 Kp= 1.3023 972.48 Kp= 1.2753
(63.69 %) Ki =0.5150 (61.54 %) Ki=0.2998
Kd=0.0347 Kd=0.0482
a = 13.9807 a = 14.2724
86
650 640 r
"0 630 <l)
8 <I'l
620
610
600
1.5 2 2.5 'me 3 3.5 4 4.5
Figure 5.35: Detail fran ilion from 600 to 650 rpm. Blue = Original PID parameter,
reen = Genitor ptimized parameters (Population = 5 . Red  GeniI r optimi:ed
parameters (Population = 50). Epochs = 1500, Del = 30 m ec.
350 .~_r___,
340
330
320 \
3 0
300
290
280
270
260
500 epochs 1000 1500
250 L.. ..L ' .J
o
Figure 5.36: Learning rate for engine delay  30 m ·ec. Blue: Population 5.
reen: Population  50.
87
660
650 ~
640
~
8 630
620
610
600
15 2 2.5 time 3 3.5 4 4 5
Figure .37: Delai/tran. itionfrom 600 to 650 rpm. Blue  Original PID param ters,
Green  Genitor optimized parameters (Population = 5 ,Red G nitor optimi=ed
parameters (Population = 50). Epochs =1500, Delay  0 mfJec.
420
415 h_
L
410
~ 406
~ 400 l_ 6
!'i"1 396
II) E 1 ,
390 I I
I I 385 1 38
0 500 epochs 1000 1500
igu r 5.38: Learning rale for engine delay 50 msec. Blue: Population 5,
Green: Population 50.
88
660 r\ 650
640
"8'0".. 630
620
810
600 14
1.5 2 2.5 lime 3 3.5 4.5
igur 5.39: Detail transitionfrom 600 to 650 rpm. Blue = riginal PfD parameters,
reen  fenitor optimizedparameters (Population = 5), Red = enitor optimized
parameters (Population = 50 . Epochs = 1500, Delay = 70 msec.
~
..
530
528
626
g
~ 524
~
6 <Il 522
~
~
8 520
518
516
o 500 epochs 1000 1500
Figure 5.40: L arnin rate for engine delay  70 m . c. Blue: Popul tion 5,
Green: Population 50.
670
660
650
"8 640
c.. en
630
620
610
600
1.5 2 2.5 time 3 3.5 4 4.5
Figure 5.41: Detail transitionfrom 00 to 650 rpm. Blue = Original P1D parameters,
reen = Genitor optimi=ed parameters (Population ~ 5 ,Red Genitor optimized
paramet rs (Population = 50). ·poch· =J500, Delay = 9 m ec.
710
705
700
695
5
t: 690
4) :a 685
S en
;a 680
8 675
670
665
660
0 500 epochs 1000 1500
Figu re 5.42: Learning rate for engine delay  90 msec. Blue: Population 5,
Green: Population  50.
90
680
670
660
650
]
p. til 640
630
620
610
600
1.5 2 2.5 tim 3 3.5 4 4.6
Fig re 5.43: Detail lransitionfrom 600 to 650 rpm. Blue  Original PJD par meters,
reen = Genitur uptimi=ed parameters Populat 'on  ~, Red = enilor optimized
parameters (Population = 50). Epoch = 1 00, Delay  11 m ec.
980
960
940
... 920
0
t: u 900
~a 880 <II
;j
u 860 E
840
820
800
0 500 epoc 1000 1600
Figure 5.44: L arning ralefor engine delay  110 msec. Blue: Population 5,
Green: Population 50.
91
690
680
670
660
] 650 0.
'"
640
630
620
610
600
1 .5 2 2.5 tim 3 3.5 4 4.5
Figure 5.45: Detail transition from 6 0 to 650 rpm. Blue Original ID parameters,
reen Genitor optimi::edparameter' (population  5), Red  Genitor optimized
parameters (Population  50). Ep ehs 1500. DeJa = 130 msee,
1600
1500
1400
6~
1300
~
g.
til 1200
lii <U
E 1100
1000
900
0 500 epochs 10 0 1500
Figure 5.46: Learning rale for ngine delay 130 msee. Blue: Population 5,
Green: Population  50.
92
For the next case we repeated the training for different fueling delays 't, changing
the fitness function to the percent overshoot. For that fitness function we obtained the
results shown in Table 5.5. As we can see from Figure 5.47 to Figure 5.58, the resulting
responses tend to be flatter.
Table 5.5: Engines with different fueling delay. Controller's results/or percent overshoot
basedfitness.
Delay Initial Best Fitness Resulting Best Fitness Resulting
(msec) Fitness (overshoot). Parameters (overshoot). Parameters
(overshoot). Pop. = 5 Pop. = 5 Pop. = 50 Pop. = 5
30 1.3080 0.3443 Kp= 1.9959 0.3612 Kp=2.2170
(26.32 %) Ki=0.2174 (27.61 %) Ki = 0.2454
Kd=0.0431 Kd=0.0490
a= 16.0838 a = 14.8958
50 4.5210 0.4311 Kp=I.6536 0.3864 Kp= 1.61 56
(9.53 %) Ki= 0.1636 (8.54 %) Ki=0.1564
Kd=0.0546 Kd=0.0481
a = 18.3008 a = 15.1590
70 18.7506 0.4243 Kp=I.1677 0.3374 Kp=1.2237
(2.26 %) Ki=0.0990 (1.79 %) Ki=0.1097
Kd=0.0591 Kd=O.0577
a=15.1457 a=15.0712
90 37.9684 0.3627 Kp=0.8812 0.4373 Kp=0.9625
(0.95 %) Ki=0.0698 (1.15%) Ki=0.0770
Kd=0.0526 Kd=0.0601
a= 17.7431 a = 13.8594
110 56.3349 0.3243 Kp= 0.7657 0.3109 Kp= 0.7764
(0.57 %) Ki=0.0548 (0.55 %) Ki=0.0587
Kd=0.0520 Kd=O.0601
a = 13.4557 a = 13.3248
130 76.8723 0.2631 Kp= 0.5320 1.0819 Kp= 0.6909
(0.34 %) Ki=0.0335 (1.40 %) Ki=0.0609
Kd=0.0322 Kd=0.0473
a = 14.8544 a = 15.2094
93
650
640
a4s.1 30
0
m
620
610
600
1.5 2 2.5 time 3 3.5 4 4.5
Figure S. 7: Detail transitionfrom 600 a 650 rpm. Blue  Original PI parameters,
Green = enitor optimizedparameters (Population = 5), ed = enitor optimized
parameters (Population  50). Epochs = 15 0, Dela = 30 me.
1.2
1 .1
1 0.9 \
8..r= 0.8
~
0) ;. 0.7
0
i0:) 0.6
c,)
B 0.5
0.4
0.3
0.2
0 500 epochs 1000 1500
Figure .48: Learning rale for engine delay 30 ms c. Blue: Population 5,
Green: Population  50.
94
650 V 640
~
CI) 630
0..
<I)
620
610
600 U
1.5 2 2.5 time 3 3.5 4 4.5
Figure 5.4 : Detad transition from 600 to 650 rpm Blue  Original ID p rameter ,
reen = Genitor optimi~edparameters (Population = 5), Red = Genitor optimized
parameters (Population = 50). Epa h' = 1500, Delay = 50 m 'ec.
4
3.5
3
8 2.5
.s=
ie
0 ;> 2
0
=0) 1.5 [
0.5
0
0 500 epochs 1000 1600
Figure . 0: Learning rate for engine dela  50 mse . Blue: Population 5,
r en: Population 50.
95
660 f\ 650
640 If '"0
Q,) Q,) 630
Q..
'"
620
610
600 ..J
1.5 2 2.5 time 3 3.5 4 4.5
Fig re 5.51: Detail transition from 600 to 650 rpm. Blue  Original PID parameters,
Green = Genitor optimizedparameter' Population = 5), Red Genitor optimized
parameters (Population = 50) . . poch = 1500, Dela = 70 msec.
20
18
16
14
8 12
.Q
<..I.l ~.. 10 0C
8 l 6
4
2
0
0 500 epoc 1000 1600
Figure 5.52: Learning rate for engine delay  70 msec. Blue: Populati n 5,
Green: Population  50.
96
670
660
650
"0 640
~
0. en 630
620
610
800
1.5 2 2.5 time 3 3.5 4 4.5
Figure 5.53: Detail tranu'ition from 600 to 650 rpm. Blue = Origmal PfD parameters,
re n = Genitor optimi:zedparameter' (Population = 5), Red
parameter (PopuLation = 50). Epoch =J500, Delay
enitor optimized
Om e.
40
35
30
8 25
..c
~
4J 20 ;>
0
i:: 8.. 15 8
10
5
0
0 500 epochs 1000 1500
Figure .54: Learning rate for engine delay = 90 msec. Blue: Population  5,
Green: PopuLation  O.
97
680
670
660
650
1. 640
UJ
630
620
610
600
1.5 2 2.5 tim 3 3.5 4 4.5
Figure 5.55: Detaillransitionfrom 600 I 6~0 rpm. Blue = Original PiD parameters,
Green = Genitor optimi:edparameter' (Population = 5), Red  Genitor optimized
parameters (Population = 0). Epa hs =1500, Delay = 110 m ec.
60
50
40
'8 .c
.t.i.l u 30 ;>
0cu
~ 20 8
10
0
0 500 epochs 1500
Figure .56: Learning rate jar ngine delay = no msec. Blue: Populati n 5.
reen: opula/ion 50.
98
690
680
670
660
"0 650 ~
0..
'" 64
630
620
610
600
1 .5 2 2.5 time 3 3.5 4 4.5
Figure 5.57: Detail transition from 600 to 650 rpm. Blu = Original PID parame ers,
Green = enitor optimizedparameters 'Population  5), Red = Genitor optimized
parameters (Population = 50). Epochs = 1500, Delay = 130 msec.
80
70
60
8 50
.c
~
u> 40 0 c;
u 30
~
8
20
10
0
~
0 500 epochs 1000 1500
Figu e 5.58: L arning rat for engine delay 130 msec. Blue: opulalion 5,
Green: Population 50.
99
5.7. Genetic Algorithms applied to digital controllers.
In the next sections we will apply the GENITOR algorithm to the Digital version of
a PID controller. We first converted the original analog PID controller to its digital version.
To realize that conversion we took the PID original relation given by Eq. (3.1) then
executed the conversion:
where T is the sampling time. The transformation of Eq. (3.1) by using Eq. (5.1) is:
(Kp + Kd)i + ( 2(Kp + Kd) + T(Kp· a + Ki»z
+ (Kp + Kd T(Kp· a + Ki) + T(Ki' a))
G(z) = '2''''
Z + (T· a  2)z + (1 T· a)
(5.1)
(5.2)
For Kp = 2, Ki = 0.5, Kd = 0.05 and a = 15 (middle point between 10 and
20), and T = 50 ms we obtained the transfer function:
2
G(z) = 2.05z  2.575z + 0.5437
2 z  1.25z + 0.25
(5.3)
Our first approach was to emulate the GENITOR training of the analog controller
and execute the conversion from Eq. (5.1). This approach resulted in a slow training and
generally the results were far from desired responses. Next, we parametrized the digital
controller as:
G
K (z  zo) (z  Z I)
(z) = ~_:
(zPO)(zPI)
100
(5.4)
With the implementation ofEq. (5.4) w can define the location ofth z rand
the poles without the restrictions ofthe PID controller. We defin d the use of the controller
from Eq. (5.4) where we have five parameters Po' PI' zo, zl and K. The initial alu s of
the previous parameters will be obtained from Eq. (5.3), resulting in Po = l, PI = 0.25.
Zo = 0.9875, zi = 0.2686 and K = 2.05. Starting with those values we will execute the
GENITOR algorithm defined in section 5.2 to optimize the controller parameters for
different values of fueling delay and engine inertia.
5.8. GENITOR algorithm applied to digital controller and engine with friction
bJ=O.1229.
5.8.1. Results for different engine inertia.
For