

small (250x250 max)
medium (500x500 max)
Large
Extra Large
large ( > 500x500)
Full Resolution


STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY By NAM HOAI NGUYEN Bachelor of Science in Automatic Control Hanoi University of Science and Technology Hanoi, Vietnam 2002 Master of Science in Measurement and Control Systems Hanoi University of Science and Technology Hanoi, Vietnam 2005 Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY December, 2012 COPYRIGHT ⃝c By NAM HOAI NGUYEN December, 2012 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Dissertation Approved: Dr. Martin Hagan Dissertation Advisor Dr. Carl Latino Dr. George Scheets Dr. Lisa Mantini Dr. Sheryl Tucker Dean of the Graduate College iii TABLE OF CONTENTS Chapter Page 1 INTRODUCTION 1 2 DISSIPATIVITY AND STABILITY FOR CONTINUOUSTIME SYSTEMS 4 2.1 Continuoustime dynamical systems . . . . . . . . . . . . . . . . . . . . . 4 2.2 Continuoustime dissipative dynamical systems . . . . . . . . . . . . . . . 6 2.3 Properties of supply rate function . . . . . . . . . . . . . . . . . . . . . . . 6 2.4 An example continuoustime dissipative dynamical system . . . . . . . . . 8 2.5 Stability of interconnected continuoustime dissipative dynamical systems . 9 3 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY 11 3.1 Dissipative systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.1.1 Static systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.1.2 Discretetime dynamical systems . . . . . . . . . . . . . . . . . . 12 3.1.3 Dissipative systems . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.2 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.2.1 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY . . . . . . . . . . . . . . . . . . . . . . . 14 3.2.2 Example use of dissipativity on a simple network . . . . . . . . . . 15 3.3 Stability analysis of Layered Digital Dynamic Networks (LDDNs) . . . . . 17 3.3.1 LDDNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.3.2 Transform LDDNs to a standard form . . . . . . . . . . . . . . . . 18 iv 3.3.3 Sector conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3.4 Supply rate using sector conditions  static/scalar . . . . . . . . . . 22 3.3.5 Supply rate using sector conditions  static/vector . . . . . . . . . . 23 3.3.6 Selecting the supply rate for LDDNs . . . . . . . . . . . . . . . . . 23 3.3.7 Stability analysis of other types of Recurrent Neural Networks . . . 28 3.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4 COMPARISON OF STABILITY CRITERIA ON TEST PROBLEMS 31 4.1 Sector conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.2 Existing stability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2.1 Liu’s criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2.2 Barabanov and Prokhorov’s LMI criterion . . . . . . . . . . . . . . 33 4.3 Description of test problems . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.4 Stable equilibria that can be proved stable . . . . . . . . . . . . . . . . . . 36 4.5 Stable equilibria that cannot be proved stable . . . . . . . . . . . . . . . . 39 4.6 An example of stability analysis of LDDNs . . . . . . . . . . . . . . . . . 42 4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5 TRAINING RECURRENT NETWORKS FOR STABILITY 44 5.1 Modified performance index . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2 Gradient computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.2.1 The derivative of eigenvalue . . . . . . . . . . . . . . . . . . . . . 46 5.2.2 The derivative of maximum eigenvalue . . . . . . . . . . . . . . . 47 6 SIMULATIONS AND TEST RESULTS FOR STABLE TRAINING 50 6.1 Model reference control (MRC) using recurrent neural networks . . . . . . 50 6.1.1 Plant training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 6.1.2 Controller training . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.2 Design the MRC for a linear system . . . . . . . . . . . . . . . . . . . . . 55 v 6.2.1 Plant model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 6.2.2 Model reference . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 6.2.3 Controller training . . . . . . . . . . . . . . . . . . . . . . . . . . 57 6.3 Design the MRC for a magnetic levitation system . . . . . . . . . . . . . . 59 6.3.1 Magnetic levitation system . . . . . . . . . . . . . . . . . . . . . . 59 6.3.2 Plant training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 6.3.3 Controller training . . . . . . . . . . . . . . . . . . . . . . . . . . 63 6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 7 CONCLUSIONS AND FUTUREWORK 68 7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 8 APPENDIX (Test Problems) 71 8.1 Test Problem 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 8.2 Test Problem 02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 8.3 Test Problem 03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 8.4 Test Problem 04 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 8.5 Test Problem 05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 8.6 Test Problem 06 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 8.7 Test Problem 07 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 8.8 Test Problem 08 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 8.9 Test Problem 09 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 8.10 Test Problem 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 8.11 Test Problem 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 8.12 Test Problem 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 8.13 Test problem 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 8.14 Test problem 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 vi 8.15 Test problem 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 8.16 Test problem 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 8.17 Test problem 17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 8.18 Test problem 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 8.19 Test problem 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.20 Test problem 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.21 Test problem 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.22 Test problem 22 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 8.23 Test problem 23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 BIBLIOGRAPHY 92 vii LIST OF TABLES Table Page 4.1 Eigenvalues of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Time steps N to convergence . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.3 Eigenvalues of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.4 Time steps N to convergence . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.1 MPM, MAE, MSE and λm after 1998 step ahead training . . . . . . . . . . 59 6.2 The maximum eigenvalue with different σ . . . . . . . . . . . . . . . . . . 64 6.3 The maximum absolute error with different σ . . . . . . . . . . . . . . . . 64 viii LIST OF FIGURES Figure Page 2.1 Interconnected continuoustime systems . . . . . . . . . . . . . . . . . . . 9 3.1 Interconnected systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.2 Example LDDN network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.3 The function f(u) satisfying sector conditions . . . . . . . . . . . . . . . . 22 4.1 Trajectory for test problem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Trajectory for test problem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.3 Trajectory for test problem 15 . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.4 Trajectory for test problem 22 . . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.1 Model reference control system . . . . . . . . . . . . . . . . . . . . . . . . 51 6.2 NNbased model reference control system . . . . . . . . . . . . . . . . . . 51 6.3 Neural network plant model . . . . . . . . . . . . . . . . . . . . . . . . . 52 6.4 Network architecture for onestepahead plant training . . . . . . . . . . . 53 6.5 Neural network plant model and neural network controller . . . . . . . . . 54 6.6 Open loop network for controller training . . . . . . . . . . . . . . . . . . 55 6.7 The plant network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 6.8 The reference input and the target . . . . . . . . . . . . . . . . . . . . . . 57 6.9 NNbased MRC system for the linear plant . . . . . . . . . . . . . . . . . 58 6.10 The magnetic levitation system . . . . . . . . . . . . . . . . . . . . . . . . 59 6.11 Control input and target . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 6.12 Performance Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 ix 6.13 The network output, the target and the error . . . . . . . . . . . . . . . . . 62 6.14 The reference model input and output (target) . . . . . . . . . . . . . . . . 63 6.15 The mean square error with σ = 10−6 after 10stepahead training . . . . . 65 6.16 The maximum eigenvalue with σ = 10−6 after 10stepahead training . . . 66 6.17 The combined performance index with σ = 10−6 after 10stepahead training 67 x CHAPTER 1 INTRODUCTION The objective of this research is to use dissipativity theory to analyze stability for a general class of discretetime recurrent neural networks (RNNs). Then, a new training algorithm is proposed to train RNNs for stability. Other efforts on the stability analysis of RNNs have been made in recent years. Jin and Gupta [1] found absolute stability conditions for RNNs based on Ostrowski’s theorem. The networks they dealt with contained only a single layer without biases. Tanaka [2] analyzed the stability of neural network systems by using stability conditions, based on Lyapunov theory, of linear differential inclusions. The neural network systems investigated by Tanaka include a system that combines a neural network plant model and a neural network controller. However, there are no biases included in the neural networks. Suykens [3] found stability criteria for a class of RNNs that can be represented in a form he designated as NLq. These criteria provide sufficient conditions for asymptotic stability. However, he did not deal with the case of nonzero biases. Recently, Barabanov and Prokhorov proposed an approach for the stability analysis of RNNs with sectortype nonlinearities and nonzero biases based on the theory of absolute stability [4]. They later developed a new method based on reduction of dissipativity domain [5]. This method works effectively if the system has a convex Lyapunov function. Later, Liu proposed a generic network model, which is referred to as the discretetime standard neural network model (DSNNM) [6]. The DSNNM represents a neural network model as the interconnection of a linear dynamic system and static nonlinear operators. Liu found some criteria for the globally asymptotic stability of equilibrium points of the DSNNM. 1 More recently, the method in [5] has been modified to speed up convergence by Jafari and Hagan [7]. Most recently, Kim and Braatz [8] used a modified Lure’Postnikov function to obtain stability criteria for some classes of standard nonlinear operator forms. These methods can be applied to RNNs with nonzero biases. However, they have not yet been applied to the stability analysis of layered digital dynamic networks (LDDNs) [9]. A few authors in the past have applied dissipativity theory to continuoustime neural networks [10].We now want to apply this theory to discretetime networks. In this work, a general class of discretetime recurrent neural networks (LDDNs) will be considered, and dissipativity theory will be used to analyze stability of equilibrium points for LDDNs. Dissipativity theory was first developed by Jan C. Willems [11, 12] in the 1970s. The other major authors of this theory are David Hill and Peter Moylan [13, 14]. The term ”dissipativity” was inspired by the concept of passivity. A system for which the rate of increase in its stored energy is not greater than the absorbed input power is a passive system. In dissipativity theory, the stored energy is generalized by a storage function and the input power is generalized by a supply rate function. One of the important results in passivity theory states that if two passive systems are connected in a feedback loop, then the resulting closed loop system is stable. A corresponding result from dissipativity theory states that if two dissipative systems are connected in a feedback loop, then the closed loop system is stable under certain conditions. This result will be used extensively throughout our work. Dissipativity was first defined for continuoustime dynamic systems. Later, a discretetime version of dissipativity was developed by W. Haddad [15]. However, to our knowledge, dissipativity theory has not yet been applied to discretetime RNNs. In the past, Liang Jin and Madan Gupta developed two stable dynamic backpropagation learning algorithms for a class of RNNs [16]. They used both local and global stability conditions to maintain the network stability during the training. Recently, Jason Horn, Orlando de Jesus and Martin Hagan [17] demonstrated that there are spurious valleys in 2 the error surface of RNNs. These spurious valleys occur in unstable regions of the error surfaces and can cause difficulties in training. They suggested that one might be able to use a constrained optimization process to avoid the unstable region during training, but the constraints would be extremely complex for LDDNs. In this work, several stability criteria based on dissipativity will be proposed. Then these novel criteria will be compared with those of Liu [6] and Barabanov and Prokhorov [18] on several test problems. Based on these criteria, we will propose a new training algorithm to train recurrent neural networks for stability. The new training algorithm will be tested on two examples of model reference control systems: a linear plant and a nonlinear plant. This proposal includes seven chapters. The next chapter describes dissipativity theory for continuoustime systems. Chapter 3 is about the stability analysis of discretetime recurrent neural networks using dissipativity. It presents some fundamental concepts and theorems, and gives a brief introduction to LDDNs. Next, a method is proposed to transform LDDNs into a standard interconnected system form. Then, sector conditions are introduced and some important lemmas and theorems are proposed. Finally, novel stability criteria for the equilibrium points of LDDNs, based on these lemmas and theorems, are found. In Chapter 4, existing stability criteria are reviewed and then compared with the novel criteria on a large number of test problems. The following chapter develops a framework for training recurrent neural networks for stability. A modified performance index is defined and a brief review of the first derivative of eigenvalue with respect to a matrix parameter is provided. Then, the first derivative of the maximum eigenvalue with respect to network weights is represented. In Chapter 6, we introduce neural networkbased model reference control systems. The proposed training algorithm is applied to train neural network controllers for both a linear plant and a magnetic levitation system. The final chapter provides a summary and proposes future work. 3 CHAPTER 2 DISSIPATIVITY AND STABILITY FOR CONTINUOUSTIME SYSTEMS The theory of dissipativity was first developed by Willems [11], [12] for continuoustime dynamical systems. Recently, it has been extended to discretetime dynamical systems [15], switched systems [19], and hybrid systems [20]. It has been applied to not only stability analysis [13], [14], [21], [22] but also controller synthesis [15], [20], [23], [24]. This chapter reviews dissipativity theory for continuoustime systems. There are two settings in which dissipative dynamical systems have been defined: the inputoutput setting and the inputstateoutput setting. This chapter will concentrate on the second setting. The chapter will begin by introducing continuoustime dynamical systems. Next, continuoustime dissipative dynamical systems are defined, followed by an example. Finally, stability analysis of interconnected continuoustime dynamical systems is introduced. 2.1 Continuoustime dynamical systems This section presents some definitions [11] concerning dynamical systems and dissipative systems. It also discusses the main properties of dissipative dynamical systems. We begin by introducing some notation. • R is the set of real numbers. • Rn is the n dimensional Euclidean space. • R+ is the set of nonnegative real numbers. • R2 + = { (t2, t1) ∈ R2t2 > t1 } . 4 • Shift operater σT (.): Given a function s(t), t ∈ R, then the shift operator is defined as σT (s) = s(t + T). Continuoustime dynamical systems were defined by Willems [11] as follows. Definition 2.1 A dynamical system is defined as follows: • X, U and Y are called the state space, the set of input values, and the set of output values, respectively. • U∗ is called the input space and it contains a class of Uvalued functions on R. • Y ∗ is called the output space and it contains a class of Yvalued functions on R. • Assume that U∗ and Y ∗ are closed under the shift operator. • : R2 + × X × U∗ → X is called the state transtion function. The following axioms hold: 1. Consistency: (t0, t0, x0, u) = x0 for all t0 ∈ R, x0 ∈ X, and u ∈ U∗. 2. Determinism: (t1, t0, x0, u1) = (t1, t0, x0, u2) for all (t1, t0) ∈ R2 +, x0 ∈ X, and u1, u2 ∈ U∗ satisfying u1(t) = u2(t) for t0 ≤ t ≤ t1. 3. Semigroup property: (t2, t0, x0, u) = (t2, t1, (t1, t0, x0, u), u) for t0 ≤ t1 ≤ t2, x0 ∈ X and u ∈ U∗. 4. Stationary: (t1 +T, t0 +T, x0, σT (u)) = (t1, t0, x0, u) for all (t1, t0) ∈ R2 +, T ∈ R, x0 ∈ X, and u, σT (u) ∈ U. • w : X × U → Y is called the readout function. • The Yvalued function y(t) = w( (t, t0, x0, u), u(t)) for t ≥ t0. It is assumed that a dynamical system is given together with a real valued function r(u, y) = r(u(t), y(t)) called the supply rate function. The constraint of the supply rate 5 function is that for any (t1, t0) ∈ R2 +, u ∈ U and y ∈ Y r(u, y) satisfies ∫ t1 t0 r(u, y)dt < ∞. 2.2 Continuoustime dissipative dynamical systems Continuoustime dissipative dynamical systems were defined by Willems [11] as follows. Definition 2.2 A dynamical system with the supply rate function r(u, y) is said to be dissipative if there exists a nonnegative function S : X → R+, called the storage function, such that for all (t1, t0) ∈ R2 +, x0 ∈ X and u ∈ U, S(x1) ≤ S(x0) + ∫ t1 t0 r(u, y)dt (2.1) where x1 = (t1, t0, x0, u) and r(u, y) = r(u(t), y(t)) with y(t) = w(x(t), u). The inequality (2.1) is called the dissipation inequality. 2.3 Properties of supply rate function There are some key properties of supply rate functions that are explained in the following two lemmas. Lemma 2.1 If a dynamical system is dissipative with respect to the supply rate r(u, y), then it is also dissipative with respect to the supply rate λr(u, y) where λ > 0. Proof. Assume a dynamical system is dissipative with respect to the supply rate r(u, y). Then there exists a storage function S(x) such that the dissipation inequality (2.1) holds. This means that S(x1) ≤ S(x0) + ∫ t1 t0 r(u, y)dt (2.2) Multiplying both sides of (2.2) by λ, we get λS(x1) ≤ λS(x0) + λ ∫ t1 t0 r(u, y)dt 6 Let r1(u, y) = λr(u, y) and S1(x) = λS(x). Then S1(x) is a storage function and r1(u, y) is a supply rate function. It follows that S1(x1) ≤ S1(x0) + ∫ t1 t0 r1(u, y)dt Therefore the system is dissipative with respect to the supply rate λr(u, y). Lemma 2.2 If a dynamical system is dissipative with respect to the supply rate ri(u, y) for i = 1, 2, ..., n, then it is also dissipative with respect to the supply rate r(u, y) = Σn i=1 ri(u, y). Proof. Since the system is dissipative with respect to the supply rate ri(u, y) for i = 1, 2, ..., n, there exists a storage function Si(x) such that the dissipation inequality (2.1) holds. This means that Si(x1) ≤ Si(x0) + ∫ t1 t0 ri(u, y)dt (2.3) for i = 1, 2, ..., n. Taking the summation on both sides of (2.3) where i goes from 1 to n, we get Σn i=1 Si(x1) ≤ Σn i=1 Si(x0) + Σn i=1 ∫ t1 t0 ri(u, y)dt (2.4) The inequality (2.4) is equivalent to Σn i=1 Si(x1) ≤ Σn i=1 Si(x0) + ∫ t1 t0 Σn i=1 ri(u, y)dt (2.5) Let’s define r(u, y) = Σn i=1 ri(u, y) and S(x) = Σn i=1 Si(x).Then (2.5) is equivalent to S(x1) ≤ S(x0) + ∫ t1 t0 r(u, y)dt (2.6) Therefore the system is dissipative with respect to the supply rate r(u, y). Based on these lemmas, we can find supply rate functions for continuoustime dynamical systems. 7 2.4 An example continuoustime dissipative dynamical system Consider a mass, spring and damper system [15]. The equation of motion is mx (t) + Dx_ (t) + Kx(t) = u(t) (2.7) where M is the mass, D is the damper constant, K is the spring stiffness, x is the position of the mass and u is the force acting on the mass. The initial conditions are x(0) = x0 and x_ (0) = x_ 0. It is assumed that M > 0, D ≥ 0 and K ≥ 0. The energy of the system is V (x, x_ ) = 0.5mx_ 2 + 0.5Kx2 The time derivative of the energy is _V (x, x_ ) = mx x_ + Kxx_ = ux_ − Dx_ 2 Let’s define x1 = x, x2 = x_ and y = x_ as state variables and the output of the system (2.7), respectively. Then _V (x) = uy − Dx_ 2 (2.8) where x = [x1 x2]. Integrating both sides of equation (2.8) from t = 0 to t = T gives V (x(T)) = V (x(0)) + ∫ T 0 uydt − ∫ T 0 Dx_ 2dt (2.9) where V (x(T)) is the energy at t = T and V (x(0)) is the initial energy. The second term and the last term on the right side of (2.9) are the energy supplied by the external source and the energy dissipated by the damper, respectively. From (2.9) we get V (x(T)) ≤ V (x(0)) + ∫ T 0 uydt (2.10) since ∫ T 0 Dx_ 2dt ≥ 0 (2.11) 8 Let’s define S(x) = V (x) and r(u, y) = uy. Then S(x) is a storage function and r(u, y) is a supply rate. The inequality (2.10) means that the system (2.7) is dissipative. In this example, the storage function is the stored energy and the supply rate is the absorbed input power. The dissipativity of the system says that the change in its stored energy is not greater than the absorbed input power. 2.5 Stability of interconnected continuoustime dissipative dynamical systems Consider dynamical systems 1 and 2 that are interconnected via constraints u1 = −y2 and u2 = y1, as shown in Fig. 2.1. Suppose that equilibrium points of systems 1 and 2 are located at the origin. Σ1 Σ2 u1 y2 y1 u2 Figure 2.1: Interconnected continuoustime systems Theorem 2.1 [11] If the system 1 is dissipative with respect to the supply rate r1(u1, y1), the system 2 is dissipative with respect to the supply rate r2(u1, y2) and r1(u1, y2) + r2(u1, y2) = 0 then the origin of the feedback system is stable. Proof. Since the system 1 is dissipative with respect to r1(u1, y1), there exists a storage function S1(x1) ≥ 0 such that _S 1(x1) ≤ r1(u1, y1). Since the system 2 is dissipative with respect to r2(u2, y2), there exists a storage function S2(x2) ≥ 0 such that _S 2(x2) ≤ r2(u2, y2). Thus S1(x1)+S2(x2) ≥ 0 and _S 1(x1)+ _S 2(x2) ≤ r1(u1, y1)+r2(u2, y2). Since 9 r1(u1, y1) + r2(u2, y2) = 0, _S 1(x1) + _S 2(x2) ≤ 0. Let’s define V (x) = S1(x1) + S2(x2). Then _V (x) ≤ 0. Therefore the origin of the feedback system is stable. In the next chapter, this theorem will be extended to the case where 1 is static and 2 is a discretetime dynamical system. 10 CHAPTER 3 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY The previous chapter introduced the concept of dissipativity for continuous time dynamic systems and demonstrated how dissipativity can be used to prove stability for interconnected dissipative systems. In this chapter, we extend these ideas to analyze the stability of discretetime recurrent neural networks. We begin by defining dissipativity for static and discretetime dynamic systems and update the stability theorem for interconnected dissipative systems. Then we introduce a general framework for representing RNNs and show how this general framework can be represented in a standard form. From the standard form we can apply the stability theorem for interconnected dissipative systems. We will derive three different criteria for testing the stability of RNNs through different choices of supply rate functions. 3.1 Dissipative systems 3.1.1 Static systems Consider the system y = f(u) (3.1) where u ∈ U ⊆ Rm, y ∈ Y ⊆ Rl, f : U → Y and 0 ∈ U. Without loss of generality, let f(0) = 0. Definition 3.1 The system (3.1) is called a static system if the outputs y depend only on current values of inputs u. 11 3.1.2 Discretetime dynamical systems There are several possible representations of discretetime dynamical systems. The representation used in [15] is x(k + 1) = h(x(k), u(k)) y(k) = g(x(k), u(k)) (3.2) where x(k) ∈ D ⊆ Rn, u ∈ U ⊆ Rm, y ∈ Y ⊆ Rl, h : D × U → Rn, g : D × U → Y , and D is open and contains 0. It is assumed that h and g are continuous mappings, and h has at least one equilibrium point. Without loss of generality, suppose h(0, 0) = 0 and g(0, 0) = 0. 3.1.3 Dissipative systems Definition 3.2 A function r : U×Y → R is a supply rate for a given system if the following conditions are satisfied: 1. r(u, y) = 0 if and only if u = 0. 2. Σk=k2 k=k1 r(u(k), y(k)) < ∞, for all k1, k2 ∈ Z+. Definition 3.3 The discretetime dynamical system (3.2) is said to be dissipative with respect to the supply rate r(u, y) if there exists a continuous radially unbounded, positive definite function V : D → R satisfying V (0) = 0 and the following inequality V (x(k + 1)) − V (x(k0)) ≤ Σk i=k0 r(u(i), y(i)) (3.3) holds for all k0, k ∈ Z+ such that k ≥ k0, where x(k) is the solution of system (3.2) with u ∈ U. The function V is called a storage function. If the inequality in (3.3) is strict for x(k) ̸= 0, then system (3.2) is said to be strictly dissipative. Definition 3.4 The static system (3.1) is said to be dissipative with respect to the supply rate r(u, y) if r(u, y) ≥ 0 for all u ∈ U. If the inequality is strict for u ̸= 0, then (3.1) is strictly dissipative with respect to r(u, y). 12 Definition 3.5 [15, p. 803] The system (3.2) is said to be zerostate observable if u(k) = 0 and y(k) = 0 implies x(k) = 0. Theorem 3.1 If the storage function V satisfies V (x(n + 1)) − V (x(n)) ≤ r(u(n), y(n)) for all n ∈ Z+, then (3.2) is dissipative with respect to supply rate r(u, y). Moreover, if the inequality is strict for x(n) ̸= 0, then (3.2) is strictly dissipative with respect to r(u, y). Proof. We will prove that the system is dissipative. The proof for the strictly dissipative case follows in a similar manner. Since the inequality holds for all n ∈ Z+, it holds for k0, k0 + 1, ..., k − 1, k where k0 ∈ Z+. Thus we get V (x(k0 + 1)) − V (x(k0)) ≤ r(u(k0), y(k0)) V (x(k0 + 2)) − V (x(k0 + 1)) ≤ r(u(k0 + 1), y(k0 + 1)) ... V (x(k + 1)) − V (x(k)) ≤ r(u(k), y(k)) Adding these inequalities together, we have V (x(k + 1)) − V (x(k0)) ≤ Σk i=k0 r(u(i), y(i)) Therefore, (3.2) is dissipative with respect to r(u, y). 3.2 Stability In this section we will define stability for discretetime systems and then update Theorem 2.1 on the stability of interconnected dissipative systems to include the case of static and discretetime subsystems. Consider system (3.2) with equilibrium point xe = 0. 13 Definition 3.6 [15, p. 765] The equilibrium point xe is said to be Lyapunov stable if for all ϵ > 0 there exists δ > 0 such that if x(0) < δ then x(k) < ϵ for all k ∈ Z+. In addition, it is said to be globally asymptotically stable (GAS) if it is Lyapunov stable and for all x(0) ∈ Rn, limk→∞ x(k) = 0. 3.2.1 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Consider the interconnected system shown in Fig. 3.1, where subsystem 1 is a static system of the form (3.1), subsystem 2 is a discretetime dynamical system of the form (3.2), u1(k) = −y2(k) and u2(k) = y1(k). Assume that f(0) = 0 and the system 2 has an equilibrium point xe = 0. Σ1 Σ2 u1(k) y2(k) y1(k) u2(k) Figure 3.1: Interconnected systems Theorem 3.2 If subsystem 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)), subsystem 2 is dissipative with respect to the supply rate r2(u2(k), y2(k)), and r1(u1(k), y1(k)) + r2(u2(k), y2(k)) ≤ 0, then the origin of the interconnected system is stable. Moreover, the origin of the system is globally asymptotic stable (GAS), if the system 2 is zero state observable and one of the following additional conditions holds: 1. Either 1 is strictly dissipative with respect to the supply rate r1(u1(k), y1(k)) or 2 is strictly dissipative with respect to the supply rate r2(u2(k), y2(k)). 14 2. r1(u1(k), y1(k)) + r2(u2(k), y2(k)) < 0 when the variables are not all equal to zero. Proof. Since 1 is static and dissipative with respect to the supply rate r1(u1(k), y1(k)), r1(u1(k), y1(k)) ≥ 0. Since subsystem 2 is dissipative with respect to the supply rate r2(u2(k), y2(k)), there exists a storage function V2(x2(k)) such that V2(x2(k + 1)) − V2(x2(k)) ≤ r2(u2(k), y2(k)). Since r1(u1(k), y1(k)) ≥ 0, r2(u2(k), y2(k)) ≤ −r1(u1(k), y1(k)) ≤ 0. Thus V2(x2(k + 1))−V2(x2(k)) ≤ 0. V2(x2(k)) is a Lyapunov function for the closed loop system, therefore the origin of the system is stable. If either the subsystem 1 or the subsystem 2 is strictly dissipative, then V2(x2(k + 1)) − V2(x2(k)) < 0. If both subsystem 1 and subsystem 2 are dissipative, and r1(u1(k), y1(k)) + r2(u2(k), y2(k)) < 0, then V2(x2(k + 1)) − V2(x2(k)) < 0. In either case, if V2(x2(k+1))−V2(x2(k)) = 0 then r1(u1(k), y1(k)) = r2(u2(k), y2(k)) = 0. So u1(k) = 0 and u2(k) = 0 by definition of supply rate. Thus y2(k) = 0 by constraints of interconnected systems. Since the system 2 is zero state observable, x2(k) = 0 . This shows that the origin of the system is GAS. 3.2.2 Example use of dissipativity on a simple network To introduce our proposed method, we start with an example of a recurrent neuron with a zero bias x(k + 1) = tanh(wx(k)). (3.4) 15 This system (3.4) has an equilibrium point xe = 0, and the transfer function tanh satisfies the condition 0 ≤ tanh(u) u ≤ 1. Now let’s transform the system (3.4) into two interconnected subsystems as in Fig. 3.1, where the subsystem 1 is y1(k) = tanh(u1(k)), the subsystem 2 is y2(k) = −wu2(k − 1), and u1(k) = −y2(k) and u2(k) = y1(k). Let’s choose a supply rate r1(u1(k), y1(k)) = u21 (k) − y2 1(k). Then r1(u1(k), y1(k)) = u21 (k) − tanh2(u1(k)). Thus r1(u1(k), y1(k)) ≥ 0. Therefore by definition 3.4, the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)). Let r2(u2(k), y2(k)) = −r1(u1(k), y1(k)). Then r2(u2(k), y2(k)) = −u21 (k) + y2 1(k) = u22 (k) − y2 2(k) = u22 (k) − w2u22 (k − 1). Let x2(k) = u2(k − 1). Then x2(k + 1) = u2(k) and y2(k) = −wx2(k). If u2(k) = 0 and y2(k) = 0, then x2(k) = 0. So the system 2 is zero state observable. Choose V2(x2(k)) = qx22 (k), where q > 0. Then V2(x2(k)) = qu22 (k − 1) and V2(x2(k + 1)) = qu22 (k). We claim that if w2 < 1, then the origin of (3.4) is GAS. Since w2 < 1, there exists q > 0 such that w2 < q < 1. So (1 − q)u22 (k) + (q − w2)u22 (k − 1) > 0. 16 It follows that u22 (k) − w2u22 (k − 1) > qu22 (k) − qu22 (k − 1). Thus V2(x2(k + 1)) − V2(x2(k)) < r2(u2(k), y2(k)). Therefore the system 2 is strictly dissipative with respect to the supply rate r2(u2(k), y2(k)). By Theorem 3.2, the origin is GAS for the closed loop system. In the next section, this result is generalized for LDDNs. 3.3 Stability analysis of Layered Digital Dynamic Networks (LDDNs) In this section, we introduce a general framework for representing recurrent neural networksthe Layered Digital Dynamic Network. We then show how this class of network can be represented in a standard form, and how the standard form can be represented in the interconnected system form, so that Theorem 3.2 can be used to demonstrate stability. By selecting different supply rate functions, we develop three different criteria for determining stability of LDDNs. 3.3.1 LDDNs In this section, we want to describe the LDDN framework, first introduced in [9]. An example LDDN is shown in Fig. 3.2. The net input nm(k) for layer m of an LDDN can be computed nm(k) = Σ l∈Lf m Σ d∈DLm;l LWm;l(d)al(k − d) + Σ l∈Im Σ d∈DIm;l IWm;l(d)pl(k − d) + bm (3.5) where pl(k) is the lth input to the network at time k, IWm;l is the input weight between input l and layer m, LWm;l is the layer weight between layer l and layer m, bm is the bias vector for layer m, DLm;l is the set of all delays in the tapped delay line between layer l 17 and layer m, Im is the set of indices of input vectors that connect to layer m, and Lf m is the set of indices of layers that connect directly forward to layer m. The output of layer m is am(k) = fm(nm(k)) (3.6) form = 1, 2, · · · , M. The set ofM paired equations (3.5) and (3.6) describes the LDDN. 3.3.2 Transform LDDNs to a standard form The LDDN is described by (3.5) and (3.6). Our goal in this section is to transform the LDDN into the form of (3.7), which we will call the standard form. It is assumed that the matrix W2 has the property that it can be transformed into a strictly triangular matrix through a reordering of the elements of the state vector x. This guarantees that x(k) can be solved iteratively from some initial x(k0). This is equivalent to the condition that the LDDN contains no zerodelay loops. x(k + 1) = f(W1x(k) +W2x(k + 1) + b) (3.7) Assume that inputs pl(k) are constant. Let Sm be the number of neurons in layer m. Let hm = Σ l∈Im Σ d∈DIm;l IWm;l(d)pl(k − d) + bm, b = [(h1)T 0 · · · (h2)T 0 · · · (hM)T · · · 0]T( S×1), dl = max {d ∈ DLm;lm = 1, 2, ..., M} , where S = ΣM i=1 Sidi. Consider layer m for m = 1, 2, ..., M. Let xm1 (k) = am(k − 1), xm2 (k) = am(k − 2), ..., xmd m(k) = am(k −dm). Then xm1 (k + 1) = am(k), xm2 (k + 1) = am(k −1), ..., xmd m(k + 1) = am(k − dm + 1). Let xm(k) = [xm1 (k) xm2 (k) ... xmd m(k)]T (Smdm×1). Let x(k) = [x1(k) x2(k) ... xM(k)]T( S×1). Then we get the standard form (3.7), where W1 and W2 are defined in (3.8) and (3.9), respectively. 18 S 1 x 1 S 2 x 1 S 3 x 1 S 1 x 1 S 2 x 1 S 3 x 1 S 1 x 1 S 2 x 1 S 3 x 1 R x 1 1 S 1 xR S 2 x S 1 S 3 x S 2 S1 S2 S3 n1(t) n2(t) n3(t) p1(t) a1(t) a2(t) a3(t) IW1,1 LW1,3 LW2,3 LW1,1 LW2,1 LW3,2 b1 b2 b1 1 1 3 R1 Inputs Layer 1 Layer 2 Layer 3 T D L T D L T D L T D L f 1 f 2 f 3 Figure 3.2: Example LDDN network W1 = [W1 i;j ](S×S) (3.8) where W1 i;i = LWi;i(1) LWi;i(2) · · · LWi;i(di − 1) LWi;i(di) ISi 0 · · · 0 0 0 ISi · · · 0 0 ... ... . . . ... ... 0 0 · · · ISi 0 (Sidi×Sidi) and W1 i;j = LWi;j(1) LWi;j(2) · · · LWi;j(dj − 1) LWi;j(dj) 0 0 · · · 0 0 0 0 · · · 0 0 ... ... . . . ... ... 0 0 · · · 0 0 (Sidi×Sjdj ) if i ̸= j. W2 = [W2 i;j ](S×S) (3.9) 19 where W2 i;j = [0](Sidi×Sjdj ) if i ≤ j, and W2 i;j = LWi;j(0) 0 · · · 0 0 0 · · · 0 ... ... . . . ... 0 0 · · · 0 (di×dj ) if i > j. Also, f = f1 f2 ... fm (S×1) where fi = fi idi ... idi (diSi×1) (3.10) where ISi is an identity matrix with dimensions (Si×Si), 0 is a zero matrix with appropriate dimensions, idi is a vector of identity functions, and fi is a vector of transfer functions of layer i for i = 1, 2, ..., M. For an LDDN, the order in which the individual layer outputs must be computed to obtain the correct network output is called the simulation order (see [9]). Here we have assumed that the layers are numbered so that the simulation order (which need not be unique) increases with layer number. Suppose that the LDDN of equations (3.5) and (3.6) has an equilibrium point. Then system (3.7) has an equilibrium point. Let xe be that equilibrium point. Then xe satisfies xe = f(W1xe +W2xe + b). Let z(k) = x(k) − xe. Then z(k + 1) = f(W1z(k) +W2z(k + 1) + te) − f(te) (3.11) where te = W1xe +W2xe + b. (3.12) Therefore system (3.11) has an equilibrium point ze = 0. If this equilibrium point is GAS, then the equilibrium point of system (3.7) and that of the LDDN are also GAS. The next step is to transform system (3.11) into the interconnected system form shown in Fig. 3.1. 20 Let u1(k) = −y2(k) = W1z(k) +W2z(k + 1) u2(k) = y1(k) = z(k + 1). Then subsystem 1 is y1(k) = f(u1(k) + te) − f(te) = g(u1(k)), (3.13) and subsystem 2 is y2(k) = −W1u2(k − 1) −W2u2(k), (3.14) and the constraints are u1(k) = −y2(k), u2(k) = y1(k). Define x2(k) = u2(k−1), then x2(k+1) = u2(k) and y2(k) = −W1x2(k)−W2u2(k). In this case, subsystem 1 is a static system of the form (3.1), and 2 is a dynamic system of the form (3.2). If u2(k) = 0 and y2(k) = 0 then x2(k) = 0, so the system 2 is zero state observable. Since ze = 0 is the equilibrium point of (3.11), it is also the equilibrium point of subsystems 1 and 2. Therefore, if the origin of the interconnected systems is GAS, then the equilibrium point of the LDDN is also GAS. To analyze stability of the equilibrium point, we follow Theorem 3.2. The following sections will perform this analysis and will develop several stability criteria. 3.3.3 Sector conditions Consider a scalar static system of the form (3.1), y = f(u). The function f is said to lie inside a sector [α, β] (written as f ∈ [α, β]) if α ≤ f(u) u ≤ β, ∀u ̸= 0 and f(0) = 0. This is called a sector condition. An example function f(u) (solid curve) and its bounds αu and βu (dashed lines) are shown in Fig. 3.3. 21 y = α∙u 0y y = β∙uy u= f(u) Figure 3.3: The function f(u) satisfying sector conditions 3.3.4 Supply rate using sector conditions  static/scalar It is possible to use sector conditions to design supply rate functions for static systems. In this section we consider the scalar version of the static system (3.1). Lemma 3.1 If the function f ∈ [α, β], and the supply rate is chosen as either r(u, y) = β2u2 − y2 (using the sector upper bound) or r(u, y) = (βu − y)(y − αu), (using the sector upper and lower bounds), then the scalar static system y = f(u) is dissipative with respect to the supply rate r(u, y). Proof. From the sector condition, it follows that β2u2 ≥ y2, (βu − y)(y − αu) ≥ 0. Thus r(u, y) = β2u2 −y2 ≥ 0 and r(u, y) = (βu −y)(y − αu) ≥ 0. In either case, r(u, y) ≥ 0. Thus the static system is dissipative with respect to the supply rate r(u, y) by definition 3.4. 22 3.3.5 Supply rate using sector conditions  static/vector In the case that system (3.1) is multiinput multioutput, let’s assume u = [u1 u2 · · · un]T , y = [y1 y2 · · · yn]T , and f = [f1 f2 · · · fn]T . We will consider functions f such that yi = fi(ui). Suppose that fi ∈ [αi, βi], for i = 1, 2, ..., n. Let A = diag(αi) and B = diag(βi). Lemma 3.2 If the supply rate is chosen as either r(u, y) = uTB2u − yT y or r(u, y) = (Bu − y)TT(y − Au) + uT y where T is a positive definite diagonal matrix and is a diagonal matrix with nonnegative elements, then the system y = f(u) is dissipative with respect to the supply rate r(u, y). Proof. Let T = diag(ti) and = diag(λi) where ti > 0, λi ≥ 0 for i = 1, 2, ..., n. Since fi ∈ [αi, βi], u2i β2 i − y2 i ≥ 0, so uTB2u − yT y = Σn i=1 (u2i β2 i − y2 i ) ≥ 0. Since ti > 0, λi ≥ 0 and fi ∈ [αi, βi], (βiui − yi)ti(yi − αiui) + uiλiyi ≥ 0, so (Bu − y)TT(y − Au) + uT y = Σn i=1 [(βiui − yi)ti(yi − αiui) + uiλiyi] ≥ 0. In either case, r(u, y) ≥ 0. Thus the system is dissipative with respect to the supply rate r(u, y) by definition 3.4. 3.3.6 Selecting the supply rate for LDDNs Consider an LDDN that has been put into the standard form (3.11) and then put into the interconnected systems form of (3.13) and (3.14). Assume that the function gi ∈ [αi, βi] for i = 1, 2, ..., S. Let A = diag(αi) and B = diag(βi). Choose V2(x2(k)) = xT2 (k)Qx2(k), where Q is a positive definite matrix. Supply rate using sector upper bounds Choose r1(u1(k), y1(k)) = uT1 (k)B2u1(k)−yT1 (k)y1(k). Then the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)) by Lemma 3.2. Choose r2(u2(k), y2(k)) = 23 −r1(u1(k), y1(k)). Then r1(u1(k), y1(k)) + r2(u2(k), y2(k)) = 0. Using the constraints of interconnected systems, we get r2(u2(k), y2(k)) = uT2 (k)u2(k) − yT2 (k)B2y2(k). Thus r2(u2(k), y2(k)) = uT2 (k)u2(k) − [W1u2(k − 1) +W2u2(k)]TB2[W1u2(k − 1) +W2u2(k)] = uT2 (k)u2(k) − [uT2 (k − 1)(W1)TB2W1u2(k − 1) + uT2 (k)(W2)TB2W1u2(k − 1) + uT2 (k − 1)(W1)TB2W2u2(k) + uT2 (k)(W2)TB2W2u2(k)]. Let P1 = I − Q − (W2)TB2W2 −(W2)TB2W1 −(W1)TB2W2 Q − (W1)TB2W1 (3.15) . Theorem 3.3 If a positive definite matrix Q can be found such that the matrix P1 is positive definite, then the equilibrium point of the LDDN is GAS. Proof. Since P1 is positive definite, [uT2 (k) uT2(k − 1)]P1[uT2 (k) uT2 (k − 1)]T > 0. Thus uT2 (k)[I − Q − (W2)TB2W2]u2(k) − uT2 (k)(W2)TB2W1u2(k − 1) − uT2 (k − 1)(W1)TB2W2u2(k) + uT2 (k − 1)[Q − (W1)TB2W1]u2(k − 1) > 0. It follows that V2(x2(k+1))−V2(x2(k)) < r2(u2(k), y2(k)). Thus the system 2 is strictly dissipative with respect to the supply rate r2(u2(k), y2(k)). In addition, the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)), as proved above. Therefore the origin of the interconnected system is GAS by Theorem 3.2. Consequently, the equilibrium point of the LDDN is GAS. 24 Supply rate using sector lower and upper bounds Choose r1(u1(k), y1(k)) = (y1(k)−Au1(k))TT(Bu1(k))−y1(k))+uT1 (k) y1(k), where T is a positive definite diagonal matrix and is an diagonal matrix with nonnegative elements. Then the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)) by Lemma 3.2. Choose r2(u2(k), y2(k)) = −r1(u1(k), y1(k)). Then r1(u1(k), y1(k))+r2(u2(k), y2(k)) = 0. Using the constraints of interconnected systems, we get r2(u2(k), y2(k)) = (u2(k) + Ay2(k))TT(By2(k)) + u2(k)) + yT2 (k) u2(k). Thus r2(u2(k), y2(k)) = [(I − AW2)u2(k) − AW1u2(k − 1)]T × T[(I − BW2)u2(k) − BW1u2(k − 1)] + [−W1u2(k − 1) −W2u2(k)]T u2(k) = uT2 (k)(I − AW2)TT(I − BW2)u2(k) + uT2 (k − 1)(W1)TBTAW1u2(k − 1) − uT2 (k)(I − AW2)TT(BW1)u2(k − 1) − uT2 (k − 1)(AW1)TT(I − BW2)u2(k) − uT2 (k − 1)(W1)T u2(k) − uT2(k)(W2)T u2(k). Let P2 = P11 P12 P21 P22 (3.16) 25 where P11 =T − 1 2 (W2)T (B + A)T − 1 2 T(B + A)W2 + (W2)TBTAW2 − Q − 1 2 ((W2)T + W2), P12 = − 1 2 T(A + B)W1 + (W2)TBTAW1 − 1 2 W1, P21 =P12T , P22 =Q + (W1)TBTAW1. Theorem 3.4 The equilibrium point of the LDDN is GAS if a positive definite matrix Q, a positive definite diagonal matrix T and a positive semidefinite diagonal matrix can be found such that the matrix P2 is positive definite. Proof. The proof is similar to the proof of Theorem 3.3, but the matrix P2 is used in place of the matrix P1. Supply rate using general quadratic form Choose r1(u1(k), y1(k)) = vT Fv where v = [u1(k) y1(k)]T and F = F11 F12 F21 F22 . Assume that F satisfies the condition r1(u1(k), y1(k)) ≥ 0. (3.17) Let’s define P3 = P11 P12 P21 P22 26 where P11 = − F22 − 0.5 ∗ (F21 + FT 12)W2 − 0.5(W2)T (F12 + FT 21) − (W2)TF11W2 − Q P12 = − (W2)TF11W1 − 0.5 ∗ (F21 + FT 12)W1 P21 = − (W1)TF11W2 − 0.5 ∗ (W1)T (F12 + F21) P22 =Q − (W1)TF11W1. Theorem 3.5 The equilibrium point of the LDDN is GAS if there exists a matrix F satisfying condition (3.17) and a positive definite matrix Q such that the matrix P3 is positive definite. Proof. The proof is similar to the proof of Theorem 3.3, but the matrix P3 is used in place of the matrix P1. We can see that Theorems 3.3 and 3.4 are special cases of Theorem 3.5. • If F11 = B2, F22 = −I and F12 = F21 = 0 then P3 = P1. • If F11 = −BTA, F22 = −T, F12 = BT + and F21 = TA then P3 = P2. Since the output y1(k) is a static function of u1(k), the matrix F doesn’t need to be positive definite. As we can see in Theorem 3.3, F11 = B2 is a positive definite matrix, F22 = −I is negative definite and F12 = F21 = 0. So in this case the matrix F is not positive definite. Based on this theorem we may find other stability criteria for the standard form. Conclusions In this section, we have found three new conditions for globally asymptotic stability of equilibrium points of LDDNs. The different conditions were derived by selecting different supply rate functions. In Theorem 3.3, we used the upper bound on the sector condition of the static subsystem. In Theorem 3.4, we used both the sector upper and lower bounds to define the supply rate. In Theorem 3.5, we used a general quadratic supply rate. 27 3.3.7 Stability analysis of other types of Recurrent Neural Networks Now we will apply Theorems 3.3, 3.4 and 3.5 to analyze the stability analysis of other types of RNNs. First we consider the network given by Barabanov [4, p. 292]. This network can be transformed into standard form (3.7) as follows. Let x(k) = [x1(k) x2(k) ... xn(k)]T , b = [b1 b2 ... bn]T , W1 = W1 0 0 · · · Vn 0 W2 0 · · · 0 0 0 W3 · · · 0 ... ... ... . . . ... 0 0 0 · · · Wn , and W2 = 0 0 0 · · · 0 V1 0 0 · · · 0 0 V2 0 · · · 0 ... ... ... . . . ... 0 0 0 Vn−1 0 . The next step is to apply Theorems 3.3, 3.4 and 3.5 to check stability of the equilibrium point of this network. Some examples are shown in the next chapter. Next, we consider a class of RNNs given in [1, p. 955], [25, p. 1105], [26, p. 1373] and [27, p. 130]. The network has the form z(k + 1) = Dz(k) + Eg(Wz(k) + s1) + s2 (3.18) where D, E and W are matrices with appropriate sizes, s1 and s2 are bias vectors with appropriate size and g is a vector of activation functions in the network. We can transform 28 this system into standard form (3.7) by defining new state variables as follows: x1(k + 1) = g(Wz(k) + s1) x2(k) = z(k) Then x1(k + 1) = g(Wx2(k) + s1) x2(k + 1) = Dx2(k) + Ex1(k + 1) + s2). So we have W1 = 0 W 0 D ,W2 = 0 0 E 0 , (3.19) f = [g id]T and b = [s1 s2]T , where 0 is a zero matrix with appropriate size. Thus system (3.18) is in standard form (3.7). The strategy for stability analysis of RNNs is to transform the network into the interconnected systems form and then to apply Theorem 3.2. If the network can be represented in standard form (3.7), then we can use Theorems 3.3, 3.4 and 3.5. 3.4 Conclusions This chapter has developed new criteria for analyzing the stability of recurrent neural networks. A general framework for representing recurrent networks was presented, and we demonstrated how these networks could be put into a standard form and then into an interconnected systems form. This enabled us to use Theorem 3.2 to derive stability conditions. The key to the development of the stability criteria was to establish sector conditions on the static subsystem. By using the sector bounds, we were able to define different supply rate functions. It is the flexibility in selecting the supply rate function that makes the use of dissipativity theory so attractive for stability analysis of recurrent neural networks. 29 In the next chapter we will demonstrate the performance of our new stability criteria on a variety of recurrent networks. We will also compare our new criteria with other stateofthe art methods. 30 CHAPTER 4 COMPARISON OF STABILITY CRITERIA ON TEST PROBLEMS In this chapter, we will compare our new dissipativitybased (DB) criteria with Liu’s criterion [6, p.1382] and Barabanov’s LMI criterion [18, p. 4554] on 23 test problems. The test neural networks are in the form of (5) in [4, p. 294], (3.7) or (4.3). We have chosen some networks that have been introduced in previous papers by other authors. We have also used new networks that we have developed. We have chosen networks based on the difficulty of determining the stability of their equilibrium points. 4.1 Sector conditions We will use networks with activation function a = tanh(n). Therefore, as part of the stability analysis, we will need to find upper and lower bounds of the sector of the following function f(u) = tanh(u + t) − tanh(t), (4.1) where t is a constant. In order to determine sector upper bounds, we will determine the argument that maximizes the value of function f(u) u . The following lemma can be used to determine the lower bound of the function (4.1). Lemma 4.1 [4, p. 298] Consider a function (4.1). If u + t ≤ r, then f(u) u ≥ ν = tanh(r)−tanh(t) r−t . If t = r, then ν = d(tanh(u)) du at u = r. There are several ways to find r, depending on the type of system. We will discuss this later. 31 Since we will be comparing our DB criteria with those of Liu and Barabanov, we will give a brief description of their criteria in the next two sections. 4.2 Existing stability criteria 4.2.1 Liu’s criterion Liu’s model is called the Discretetime Delayed Standard Neural Network Model (DDSNNMs) [6, p. 1378]: x(k + 1) = Alx(k) + Bp (ξ(k)) + Bpd (ξ(k − h)) ξ(k) = Cqx(k) + Dp (ξ(k)) + Dpd (ξ(k − h)) (4.2) Assume i(ξi) ∈ [qi, ui], ui > qi ≥ 0 and h is constant. Define Q = diag(qi) and U = diag(ui). Theorem 4.1 [6, p. 1382] The origin of the DDSNNM (4.2) is globally asymptotically stable, if there exist symmetric positivedefinite matrices P and , and diagonal positive semidefinite matrices and T, such that the following matrix is negative definite G = G11 G12 G13 G21 G22 G23 G31 G32 G33 32 where G11 =ATl PAl − P − 2CTq TQUCq G12 =ATl PBp + CTq − 2CTq TQUDp + CTq (Q + U)T G13 =ATl PBpd − 2CTq TQUDpd G21 = GT 12 G31 = GT 13 G32 = GT 23 G22 =BTp PBp + Dp + DTp + − 2DTp TQUDp − 2T + DTp (Q + U)T + T(Q + U)Dp G23 =BTp PBpd + Dpd − 2DTp TQUDpd + T(Q + U)Dpd G33 =BT pdPBpd − − 2DT pdTQUDpd 4.2.2 Barabanov and Prokhorov’s LMI criterion Consider the RNN [18, p. 4553] x(k + 1) = Dx(k) + Etanh(Wx(k) + s1). (4.3) Let z be the equilibrium point of (4.3). Then z = Dz + Etanh(Wz + s1) Define y = x − z and c = Wz + s1. Then y(k + 1) = Dy(k) + Eη(k) η(k) = tanh(σ(k) + c) − tanh(c) (4.4) where σ(k) = Wy(k). Assume ηi(σi) ∈ [νi, μi] for i = 1, 2, ...,m where m is the length of s1. Let N = diag(μi) and M = diag(νi). 33 Lemma 4.2 [18, p. 4554] If ci ≥ cj  then βij = 1 − tanh(αsign(ci)cj) 1 − tanh(αci) otherwise βij = 1 − tanh(αsign(cj)ci) 1 − tanh(αcj ) where α > 0. By Lemma 4.2, ηTG(αWy − η) ≥ 0, where G = {gij} is a symmetric positive definite matrix that satisfies gij < 0 for i ̸= j, gii + Σm j=1;j̸=i βijgij > 0 (4.5) for all i = 1, ...,m. Theorem 4.2 [18, p. 4554] Consider the system (4.3). Assume D + EMW is stable. If there exists diagonal positive definite matrix and symmetric positive definite matrices H and G, such that G satisfies the condition (4.5) and = 11 12 21 22 is negative definite, then the equilibrium point is GAS, where 11 =DTHD − H − WTNMW 12 =WT ((M + N) + Gα)/2 + DTHD 21 = T 12 22 =ETHE − − G To compute the lower bound matrix M, we will use Lemma 4.1. Thus we have to find rj such that σj + cj  ≤ rj for j = 1, 2, ...,m. In the general case for D ̸= 0 we can choose M = 0. If D = 0, then yi ≤ Σm j=1 Eij  for i = 1, 2, ..., n. Let’s define γi = Σm j=1 Eij  and rj = Σn i=1 Wjiγi. Therefore σj + cj  ≤ rj . 34 4.3 Description of test problems We will compare our new DB stability criteria with those of Liu and Barabanov on 22 different test networks. All of the networks have a GAS equilibrium point. For the first 12 networks, the DBbased criteria are able to prove stability. For the next 10 networks, DB criteria are not able to detect stability. The final test network (the 23rd test) is an LDDN network that cannot be put into Liu’s or Barabanov’s form, so their methods cannot be applied to this network. Even in those cases where the methods are not able to prove stability, we would like to measure how close they come. Each of the criteria involves finding matrices that are definite. For the three DB criteria, we attempt to find matrices P1, P2 and P3 that are positive definite. (For the test problems, we will use only P2, from Theorem 3.4, which has produced the best results.) For Liu’s and Barabanov’s methods, we are looking for G and matrices that are negative definite. To measure how close we come to finding matrix P2 that is positive definite, we will find its minimum eigenvalue, which we will label p2. If p2 is positive, then GAS is proved. If it is negative, then we cannot prove stability. However, even in this case, the closer the value is to zero, the closer the algorithm has come to identifying stability. This concept applies also to the maximum eigenvalues of G and , which we will label g and φ. If these values are negative, then GAS is proved. If they are positive, we have not proven stability. However, the closer g and φ are to zero, the closer the algorithm has come to identifying stability. All of the 23 test problems are described in the appendix. They are represented in either the standard form (3.7), in which case W1, W2, b and f will be provided, or in the Barabanov form (4.3), in which case W, E and s1 will be given (D = 0 for the Barabanov test problems). On the first 12 test problems, all matrices for the DB criterion, Liu and Barabanov’s criteria are also provided in the appendix. In the next section we analyze stability of neural networks with stable equilibrium points where our DB methods can prove stability. In the following section we focus on 35 neural networks with stable equilibrium points where these methods cannot prove stability. Finally, we will analyze the stability of an LDDN, where Liu’s and Barabanov’s methods cannot be applied. 4.4 Stable equilibria that can be proved stable In this section, we analyze the stability of equilibrium points for test problems 1 to 12. The first four test problems were first presented in previously published papers. Test problem 1 is a special case of standard form (3.7) where W2 = 0, test problem 2 has the form of (4.3) and test problems 3 and 4 are have the form of (5) in [4, p. 294]. The remaining 8 test problems are in the form of (4.3) with D = 0. We will use the new DB criteria, Liu’s criterion and Barabanov’s criterion to check stability of equilibria. Eigenvalues p2, g, and φ for each test problem are shown in Table 4.1. The equilibrium point is proven to be GAS if p2 > 0, g < 0 or φ < 0. Thus the criteria P2, G and all proved that the equilibrium points of test problems 1 through 12 are GAS. From this table we can see that our DB criterion is as tight as both Liu’s criterion [6, p. 1382] and Barabanov’s LMI criterion [18, 4554] on the first 12 test problems. Fig. 4.1 and Fig. 4.2 are representative of the types of responses that we have in these problems. We have oscillatory responses as well as overdamped responses. Table 4.2 gives an approximate measure of how quickly the systems converge to the equilibrium point from a random initial condition. Because the systems are nonlinear, these convergence times will change with the initial conditions, but these numbers are representative. In the next section we will investigate systems with GAS equilibria for which the DB stability criteria cannot determine stability. 36 TP. p2 g φ 1 0.34743 −0.57805 −0.49272 2 14.625 −147.31 −128.83 3 7.1617 −4.7596 −1.5471 4 34.342 −45.769 −75.599 5 5.9322 −61.345 −42.564 6 1.8062 −7.5758 −8.5027 7 1.1116 −34.491 −22.097 8 29.101 −136.46 −212.8607 9 59.747 −242.07 −393.64 10 19.376 −119.87 −156.07 11 6.8088 −30.292 −16.997 12 0.2592 −12.421 −8.8994 Table 4.1: Eigenvalues of matrices −0.49 −0.485 −0.48 −0.475 −0.47 −0.465 −0.46 −0.455 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 x 1 x 2 Figure 4.1: Trajectory for test problem 1 37 −7 −6 −5 −4 −3 −2 −1 0 1 2 −35 −30 −25 −20 −15 −10 −5 0 5 x 1 x 2 Figure 4.2: Trajectory for test problem 2 TP. N 01 15 02 14 03 28 04 35 05 06 06 07 07 05 08 10 09 07 10 07 11 03 12 08 Table 4.2: Time steps N to convergence 38 4.5 Stable equilibria that cannot be proved stable In this section, we will use our DB criterion, Liu’s criterion and Barabanov’s LMI criterion to check stability of equilibrium points of test problems 13 to 22. All of these test neural networks are in the form of (4.3) with D = 0, except test problem 13, which is a special case of (3.7) where W2 = 0. The equilibrium points here are all GAS. The best values of p2, g and φ are shown in Table 4.3. Barabanov’s LMI criterion was the only one that was TP. p2 g φ 13 −4.35 ∗ 10−8 1.16 ∗ 10−7 4.0163 14 −4.16 ∗ 10−4 6.57 ∗ 10−8 −2.8767 15 −2.52 ∗ 10−4 7.67 ∗ 10−5 −0.24913 16 −5.63 ∗ 10−4 5.63 ∗ 10−8 1.4 ∗ 10−4 17 −3.18 ∗ 10−5 3.72 ∗ 10−8 4.47 ∗ 10−5 18 −0.0033 4.67 ∗ 10−7 2.7698 19 −0.0123 3.03 ∗ 10−7 2.53 20 −1.80 ∗ 10−4 2.29 ∗ 10−6 7.98 ∗ 10−5 21 −2.45 ∗ 10−4 4.09 ∗ 10−7 3.64 ∗ 10−5 22 −1.84 ∗ 10−5 1.71 ∗ 10−8 8.57 ∗ 10−6 Table 4.3: Eigenvalues of matrices able to prove stability of equilibria in any of the test problems, and it only worked for test problems 14 and 15. In these two networks, the matrixWis an identity matrix and the bias vector s1 = 0. Fig. 4.3 and Fig. 4.4 are representative of the responses of test problems 13 through 22. All of these responses are oscillatory. Table 4.4 shows approximate convergence times from random initial conditions. We can see that, although all of the responses are oscillatory, there is a wide range of response times  some of them in the same range as the first 12 systems (see Table 4.2). It seems clear that when the system is of higher dimension, has 39 oscillatory behavior and has slow convergence, it becomes more difficult for all methods to determine stability. However, these parameters do not completely determine the success of the various methods. In terms of the eigenvalues,  g is smallest for all test problems, except 13, 14 and 15. For test problems 18 and 19, the oscillation is longer (approximately 600 time steps) and the size of matrices E andWis bigger (2 × 10). In these cases,  g < p2 < φ. The weight matrices and bias vectors of test problems 20, 21 and 22 are the same size, but N is 40, 10 and 5, respectively. In this case,  g, p2, and φ became smaller, as shown in Table 4.3. −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 x 1 x 2 Figure 4.3: Trajectory for test problem 15 40 −2.34 −2.32 −2.3 −2.28 −2.26 −2.24 −2.22 −0.65 −0.6 −0.55 −0.5 −0.45 x 1 x 2 Figure 4.4: Trajectory for test problem 22 TP. N 13 30 14 300 15 3500 16 250 17 90 18 20 19 600 20 40 21 10 22 5 Table 4.4: Time steps N to convergence In summary, for test problems in the form of (4.3), the oscillation of system trajectories, increases in the sizes of system matrices, and slower convergence times tend to increase the difficulty of determining stability. Barabanov’s LMI criterion is less conservative than our criterion and Liu’s criterion on test problems 14 and 15 where the neural network is a very 41 special case of (4.3). 4.6 An example of stability analysis of LDDNs In this section, we analyze stability of test problem 23, which is an LDDN. We found p2 = 2.6051. Therefore, this proves that the equilibrium point is GAS. Since the function f10 is linear, α10 = β10 = 1. This doesn’t satisfy Liu’s condition ui > qi ≥ 0. So we cannot use Liu’s criterion. Since we cannot represent the LDDN in the form of (4.3), Barabanov’s LMI criterion cannot be applied to analyze stability of this network. Therefore, neither Liu’s criterion nor Barabanov’s LMI criterion can be applied to check stability of the equilibrium point. 4.7 Conclusions The second DB criterion, Theorem 3.4, is as tight as both Barabanov’s LMI criterion and Liu’s criterion for the first 12 test problems, but fails to prove stability of equilibria for test problems 13 to 22. Liu’s criterion also fails to prove stability of these GAS equilibrium points. Barabanov’s LMI criterion can prove stability of the equilibrium points for test problems 14 and 15 where W = I and s1 = 0. In general, when a system has more oscillatory responses, larger system matrices and slower convergence, all of the methods described in this paper have more difficulty in determining stability. To analyze the stability of LDDNs using either Liu’s criterion or Barabanov’s LMI criterion, we need to have state transformations which can convert the standard form (3.7) into their corresponding models: (4.2) and (4.3). This is not always possible, as in test problem 23. There do exist LDDNs that cannot be analyzed with either Liu’s or Barabanov’s methods. The DB methods developed in this paper can be applied to any LDDN network. Our DB criteria can be applied to analyze the stability of equilibrium points for neural networks of forms (1) in [4, p. 292], (3.7), and (4.3). 42 Another advantage of the DB criteria when compared with Barabanov’s method for neural networks in the form of (1) in [4, p. 292] is that the dimensionality of the matrices involved in the DB method are generally smaller. This is because of the use of the state space extension method in [4], which requires that the number of states be increased substantially in these cases. The dissipativity approach has not been used before for the stability analysis of recurrent neural networks. The results shown in this chapter demonstrate the promise of this approach. In addition, with the dissipativity method there is the potential for additional stability criteria to be developed. This is because of the flexibility in choosing the supply rate function. 43 CHAPTER 5 TRAINING RECURRENT NETWORKS FOR STABILITY In this chapter, we will apply the novel stability analysis methods presented in the previous chapter to the problem of training recurrent neural networks while maintaining stability. It has been shown [17] that the error surfaces of recurrent neural networks can have spurious valleys that can cause training difficulties. These valleys are caused by network instabilities. If we can maintain network stability during training, we could avoid the spurious valleys. In this chapter, we describe a new training method for maintaining network stability. The first step is to define a new performance index, which combines mean square error with the maximum eigenvalue of the matrix −P2 from (3.16). If this maximum eigenvalue is less than zero, the network is guaranteed to be stable. By minimizing this maximum eigenvalue, we have the best chance of maintaining stability and avoiding the spurious valleys. The next section describes the modified performance index, and the following section describes how the gradient of this performance index can be computed. The gradient is needed for the training algorithm, which finds the network weights and biases that minimize the performance index. 5.1 Modified performance index Let’s consider an LDDN network, as in (3.5) and (3.6). If we represent the network in standard form, we have the system (3.7). Assume that a set of training data is provided {p1, t1}, {p2, t2}, ..., {pq, tq}, (5.1) 44 where pi is an input to the network and the corresponding target response is ti. Now we define a modified performance index as follows J = 1 q Σq i=1 (ti − aM(i))T (ti − aM(i)) + σλ (5.2) where aM is the network output, σ is a constant and λ is the maximum eigenvalue of the matrix −P2 (3.16). Next we will compute the gradient of the performance index J with respect to weights and biases of the network. 5.2 Gradient computation In this work, we use the scaled conjugate gradient algorithm [28] to update weights and biases. Thus, the command ’trainscg’ in the Neural Network Toolbox of Matlab [29] will be modified to train LDDNs with the modified performance index (5.2). Let mse = 1 q Σi=q i=1 (ti − aM(i))T (ti − aM(i)) (5.3) Then ∂J ∂x = ∂(mse) ∂x + σ ∂λ ∂x (5.4) where x is a vector of weights and biases of the network. To calculate the first derivative of J with respect to weights and biases, we separately compute the first derivative of the mean square error (mse) and the first derivative of λ. We can use the standard backpropagation algorithm [9] to compute @(mse) @x . However, to compute the derivative of λ with respect to x, we need a novel development. In the next section, we will demonstrate how to find the derivative of an eigenvalue with respect to an element of the matrix. Then, we show how to find the derivative of the eigenvalue of −P2 with respect to network weights. 45 5.2.1 The derivative of eigenvalue In this section, a method for computation of eigenvalue derivatives is reviewed [30]. Let K(p) ∈ Cn×n be an nondefective matrix [30], where p is a scalar variable and C is the set of complex numbers. Let (p) ∈ Cn×n be the eigenvalue matrix of K(p) and X(p) ∈ Cn×n be a corresponding eigenvector matrix of K(p). Then K(p)X(p) = X(p) (p) (5.5) Assume K(p), (p) and X(p) are differentiable in a neighborhood of p = p0. Taking the first derivative both sides of the equation (5.5), we obtain K′ (p)X(p) + K(p)X′ (p) = X′ (p) (p) + X(p) ′ (p) (5.6) Left multiplying both sides of (5.6) by X−1(p) results in X−1(p)K′ (p)X(p) + X−1(p)K(p)X′ (p) = X−1(p)X′ (p) (p) + X−1(p)X(p) ′ (p) (5.7) Since K is nondefective, the eigenvectors are independent. Thus, there exists a matrix C such that X′ (p) = X(p)C (5.8) Plugging (5.8) into (5.7), we get X−1(p)K′ (p)X(p) − ′ (p) = − (p)C + C (p) (5.9) Let ′ (p) = diag(λk) for k = 1, 2, ...n, X−1(p) = [y1 y2 ... yn]T and X(p) = [x1 x2 ... xn]. Then, for k = 1, 2,..., n λ ′ k = yTk K′ (p)xk. (5.10) Now we want to take the first derivative of the maximum eigenvalue of the matrix K(p) with respect to p. Let λm be the maximum eigenvalue of K(p). Then λm = max(λi) for i = 1, 2, ... n. Thus λ ′ m = yT mK′ (p)xm. (5.11) 46 where vectors ym and xm are associated with the eigenvalue λm. Therefore, λ ′ m p=p0 = yT mK′ (p)p=p0xm. (5.12) We will use this method to compute the first derivative of the maximum eigenvalue λ with respect to weights in the next section. 5.2.2 The derivative of maximum eigenvalue In this section, we compute the first derivative of the maximum eigenvalue of the matrix −P2 with respect to weights. From (3.16), we define = −P2 = 11 12 21 22 (5.13) where 11 = − T + 1 2 (W2)T (B + A)T + 1 2 T(B + A)W2 − (W2)TBTAW2 + Q + 1 2 ((W2)T + W2), 12 = 1 2 T(A + B)W1 − (W2)TBTAW1 + 1 2 W1, 21 = T 12, 22 = − Q − (W1)TBTAW1, W1 i;j = [w1 i;j ]S×S and W2 i;j = [w2 i;j ]S×S. Let λm be the maximum eigenvalue of . In this case, is a real, symmetric matrix. So λm is a real number, depending on the weights. For convenience, let’s define Zk;l = [zi;j ]S×S, where zi;j = 1, if i = k and j = l 0, if others (5.14) For example, if i = 2, j = 3 and S = 3 then Z2;3 = 0 0 0 0 0 1 0 0 0 47 First, we need to compute the first derivative of with respect to a weight w, where w is either w1 i;j or w2 i;j . Keep in mind that w2 i;j = 0 when i ≤ j, so we only consider the case w2 i;j with i < j. We have ∂ ∂w = @ 11 @w @ 12 @w @ 21 @w @ 22 @w (5.15) For w = w1 i;j , ∂ 11 ∂w1 i;j =0 ∂ 12 ∂w1 i;j = 1 2 T(A + B)Zi;j − (W2)TBTAZi;j + 1 2 Zi;j ∂ 21 ∂w1 i;j =( ∂ 12 ∂w1 i;j )T ∂ 22 ∂w1 i;j = − (Zi;j)TBTAW1 − (W1)TBTAZi;j For w = w2 i;j , ∂ 11 ∂w2 i;j = 1 2 (Zi;j)T (B + A)T + 1 2 T(B + A)Zi;j − (Zi;j)TBTAW2 − (W2)TBTAZi;j + 1 2 ((Zi;j)T + Zi;j), ∂ 12 ∂w2 i;j = − (Zi;j)TBTAW1 ∂ 21 ∂w2 i;j =( ∂ 12 ∂w1 i;j )T ∂ 22 ∂w2 i;j =0 Then, let’s assume that we want to take the first derivative of λm at a certain point W1 = W10 and W2 = W20 . Let K = (W10 ,W20 ). Let = diag(λk) for k = 1, 2, ...n be the eigenvalues and X = [x1 x2 ... xn] be the corresponding eigenvectors of the matrix K, where n = 2 ∗ S. Assume X−1 = [y1 y2 ... yn]T . Let λm be the maximum eigenvalue, with associated xm and ym. From (5.12) and (5.15), we get ∂λm ∂w1 i;j = yT m ∂ ∂w1 i;j xm (5.16) 48 ∂λm ∂w2 i;j = yT m ∂ ∂w2 i;j xm (5.17) Finally, we can calculate the first derivative of the maximum eigenvalue with respect to any weight of the LDDN. Steps for the calculation of the derivative of the maximum eigenvalue are as follows. • GivenW1 = W10 and W2 = W20 . Compute K = (W10 ,W20 ). • Find eigenvalues and eigenvectors of the matrixK: = diag(λk) and X = [x1 x2 ... xn]. • Find X−1 = [y1 y2 ... yn]T . • Find λm = max(λk). • Compute @ m @w1 i;j  W10 ;W20 = yT m @ @w1 i;j  W10 ;W20 xm. • Compute @ m @w2 i;j  W10 ;W20 = yT m @ @w2 i;j  W10 ;W20 xm if i < j We will modify the command ’trainscg’ in the Neural Network Toolbox to train the LDDN with the modified performance index. The regular trainscg already computes the derivative of mse with respect to the weights and biases. Now we will use the steps above to compute the derivative of the maximum eigenvalue with respect to weights. We will add this part to the regular trainscg with a penalty parameter σ. The modified trainscg will be used to train controller networks in next chapter. 49 CHAPTER 6 SIMULATIONS AND TEST RESULTS FOR STABLE TRAINING In the previous chapter, we proposed a modified recurrent neural network training algorithm for maintaining network stability. In this chapter, two examples will be used to demonstrate the method. These examples are both model reference control (MRC) systems. The controllers of these systems are LDDNs. We will use the modified algorithm to train the controller networks while maintaining system stability. In the next section, a brief introduction to MRC systems is given. In the following section, a controller network for a linear MRC system will be trained. In the final section, we will train a controller network for a nonlinear MRC system. 6.1 Model reference control (MRC) using recurrent neural networks In this section, we provide a brief description of neural networkbased MRC (NNbased MRC) systems. An MRC system has the general structure shown in Fig. 6.1. In this figure, the plant model is chosen such that the plant model output is as close to the plant output as possible. The reference model represents the desired response of the closedloop system. The controller is designed so that the plant output closely tracks the output of the reference model. 50 Plant Plant Model Controller Reference Model Reference Input Control Input Model Error Control Error Plant Output Figure 6.1: Model reference control system Based on this idea, NNbased MRC systems were introduced in [31]. The NNbased MRC structure is shown in Fig. 6.2, where the plant model and the controller are neural networks. In order to design the NNbased MRC controller, first we have to train the NN plant model using the data observed from the input and the output of the plant. Then we have to choose a reference model whose response represents the desired behaviour of the plant. We will collect a training data set from the reference model. Then, we train the NN controller so that the control error is small enough while the MRC system remains stable. In the next section, we will show how to do the plant training. Plant NN Plant Model NN Controller Reference Model Reference Input Control Input Model Error Control Error Plant Output Figure 6.2: NNbased model reference control system 51 6.1.1 Plant training The plant training process includes two stages. The first stage is to train the openloop network (onestepahead training) and the final stage is to train the closed loop network. First, we create a training data set. An input signal P, a series of step functions with a random magnitude and random width, will be generated. This input signal is applied to the plant. At the same time, we sample the plant output T. The sequences P and T will be used as data for plant training. The NN plant model is shown in Fig. 6.3. This network consists of two layers with tanh transfer function in the first layer and linear transfer function in the second layer. The number of neurons in the output layer depends on the number of outputs of the plant. In our examples, the plant is single input and single output. So there is one neuron in the second layer and one input to the network. f1 IW1,1 b1 T D L f2 LW2,1 b2 LW1,2 T D L a a2(t) 1(t) 1 1 p(t) Figure 6.3: Neural network plant model Next, we perform onestepahead training for the NN plant. To do this, we cut the feedback loop and use the network output as the second input to the network. The open loop network is shown in Fig. 6.4. The sequence P is applied to the first input and the sequence T is applied to the second input. The corresponding network output a2 will be compared with the target T. This model error will be used to update weights and biases. 52 f1 IW1,1 b1 T D L f2 LW2,1 b2 IW1,2 T D L a a2(t) 1(t) 1 1 P T Figure 6.4: Network architecture for onestepahead plant training After training the open loop network, we close the network by connecting the network output to the second input. The closed loop network is the original network shown in Fig. 6.3. We use the trained weights and biases from the one stepahead training as initial weights and biases for the twostepahead training. Now we need to prepare the data for training. Assume that d is the maximum number of delays in the network. The sequences P and T will be divided into subsequences with a length of d + 2. Each subsequence of P will be applied to the input of the closed loop network, and the corresponding network output will be compared to the corresponding T subsequence. The model error will be used to update the weights and biases. We will do the same thing for kstephead training with k ≥ 3, but each preceeding subsequence has d + k data points, and the initial weights and biases are taken from the preceeding k − 1 stepahead training. This training process will be ended when the subsequence has the same length as the original P sequence. The weights and biases from the final training will be used to do the controller training in the next section. 6.1.2 Controller training In this section, we will show how to train the controller network of NNbased MRC systems. First, an NN controller is created. The controller network and the plant network will be combined as in Fig. 6.5. In this figure, the NN plant includes layers 3 and 4. Its 53 weights and biases are taken from the plant training, and they are not adjusted during controller training. Layers 1 and 2 make up the NN controller. In our examples, the controller network has two inputs: the first input is the reference input and the other input is the NN plant output. Since we will use the modified performance index in (5.2) to train the controller network, which involves the matrix P2 of (3.16), we have to find matrices Q, T and . Initial weights and biases of the controller network will be chosen as small random numbers or zeros for the second layer. This will increase the chance of getting stable weights and biases. From these initial weights and biases of the controller network, with the trained weights and biases of the NN plant, we will compute the initial Q, T and matrices. These matrices will be recalculated after each kstepahead training segment. f3 LW3,2 b3 T D L f4 LW4,3 b4 LW3,4 T D L a a4(t) 3(t) 1 f1 IW1,1 b1 T D L f2 LW2,1 b2 LW1,4 T D L a2a (t) 1(t) LW1,2 T D L p(t) Figure 6.5: Neural network plant model and neural network controller Next, we will do one stepahead training. To do this, two output feedback loops will be opened. Thus the output becomes the second input of the network, as in Fig. 6.6. Therefore, LW1;4 becomes IW1;2, LW3;4 becomes IW3;2. The training data will be generated from the reference model. When we train this open loop network, the weights and biases in layers 3 and 4 are kept unchanged, and only the weights and biases in the first two layers are updated. 54 f3 LW3,2 b3 T D L f4 LW4,3 b4 IW3,2 T D L a a4(t) 3(t) 1 f1 IW1,1 b1 T D L f2 LW2,1 b2 IW1,2 T D L a2a (t) 1(t) LW1,2 T D L P T Figure 6.6: Open loop network for controller training After one stepahead training, we will do k stepahead training as we did for the plant training. We will use the network in Fig. 6.5 for k stepahead training. Thus, IW1;2 becomes LW1;4 and IW3;2 becomes LW3;4. When we train the controller network, the weights and biases of the plant network are kept constant. 6.2 Design the MRC for a linear system In this section, we design an NN controller for a linear plant using the modified trainscg. Our object here is to illustrate and to verify the proposed method. 6.2.1 Plant model The linear plant that we have chosen to demonstate the modified algorithm with is G(z) = z−1 1 − z−1 + 0.25z−2 (6.1) The NN representation for this plant is n1(k) = IW1;1u(k − 1) + LW1;1[a1(k − 1) a1(k − 2)]T 55 a1(k) = n1(k) (6.2) where u(k) is the input to the NN plant, IW1;1 = 1, LW1;1 = [1 − 0.25] and a1 is the NN plant output. The plant network is shown in Fig. 6.7. It has one neuron and the activation function is linear. p(t) f1 IW1,1 1 LW 1 1,1 2 a1(t) Figure 6.7: The plant network In this example, we don’t need to train the NN plant because it is known. The next step is to choose a reference model. 6.2.2 Model reference We choose the following continuoustime reference model: G(s) = 144 s2 + 24s + 144 (6.3) We sample this model every 0.01 sec to generate a training data set. The reference input and the output are shown in Fig. 6.8. The next step is to create an NN controller and train it. 56 0 200 400 600 800 1000 1200 1400 1600 1800 2000 1 0.5 0 0.5 1 Reference Input P 0 200 400 600 800 1000 1200 1400 1600 1800 2000 1 0.5 0 0.5 1 Targer Output T Figure 6.8: The reference input and the target 6.2.3 Controller training In this section, we will train a controller network using the modified trainscg. The NN controller has one neuron with a linear activation function, delays 1 and 2 from the neuron output, delay 1 from the input, and delays 1 and 2 from the network output. The NNbased MRC system is shown in Fig. 6.9. 57 f1 IW 1 1,1 LW1 1,2 2 a1p(t) (t) f2 LW2,1 1 LW1 2,2 2 a2(t) LW1 1,1 2 Figure 6.9: NNbased MRC system for the linear plant The NN controller will be trained with different values for the penalty term coefficient σ. By increasing σ, we can increase the weight on λ and force the system to be stable. We use random weights in the stable area as initial weights of the controller network. First, small random weights are chosen. Then we check the stability of the network. If it is unstable, we multiply all these weights by a number less than 1 and check stability again. We keep doing this until the system is stable. From these initial stable weights and the weights of the NN plant, we find the matrices Q, T and for onestepahead training. After each k stepahead training stage, we recalculate these matrices from the current weights of the network. Table 6.1 shows the maximum closed loop pole magnitude (MPM), the maximum eigenvalue (λm) of −P2, the mean square error (MSE) and the maximum absolute error (MAE) after 1998 stepahead training for different values of σ. It can be seen that when σ increases, the maximum pole magnitude goes down, which indicates greater stability margin, and the error as well as the mean square error goes up. For all cases, MPM < 1 and λm < 0, which means that the system is stable. In conclusion, the modified trainscg works well for the linear NNbased MRC system. In the next section, we will demonstrate the algorithm for a nonlinear physical system. 58 σ MPM MAE MSE λm 10−5 0.8489 0.1247 4.1 ∗ 10−5 −3.6 ∗ 10−5 10−4 0.8428 0.1247 5.5 ∗ 10−5 −2.4 ∗ 10−5 10−3 0.8279 0.1247 9.3 ∗ 10−5 −2.8 ∗ 10−5 10−2 0.3560 0.2640 7.1 ∗ 10−4 −6.9 ∗ 10−5 10−1 0.3751 0.3980 1.5 ∗ 10−3 −2.7 ∗ 10−5 Table 6.1: MPM, MAE, MSE and λm after 1998 step ahead training 6.3 Design the MRC for a magnetic levitation system In this section, we design an NN controller for a magnetic levitation system using the modified trainscg. First, we introduce a magnetic levitation system. Then we train the plant using the LevenbergMarquardt algorithm (trainlm in [29]). Finally, we train the NN controller network using the modified trainscg. 6.3.1 Magnetic levitation system The magnetic levitation system is shown in Fig. 6.10. This is a simplified version of the MAGLEV train system [32]. y(t) i(t) S N Figure 6.10: The magnetic levitation system 59 In this system, the magnet is placed above the electromagnet. It can only move in the vertical direction. y(t) is the distance of the magnet from the electromagnet. i(t) is the current flowing in the electromagnet. The equation of motion can be represented as follows [29] d2y(t) dt = −g + α M i2(t) dt − β M dy(t) dt (6.4) where M is the mass of the magnet, g is the gravitational constant, β is a viscous friction coefficient and α is a field strength constant. This is a nonlinear system with one input and one output. The input is the current and the output is the position of the magnet. Our goal is to control the position of the magnet such that it tracks a target. 6.3.2 Plant training We use Equation (6.4) to generate a training data set for plant training. The parameters are M = 3, g = 9.8, β = 12 and α = 15. The data training includes 4000 data points. It is shown in Fig. 6.11, where P is the control input, which is applied to the plant and the target T is the corresponding output of the plant. 60 0 500 1000 1500 2000 2500 3000 3500 4000 2 0 2 4 Control Input P 0 500 1000 1500 2000 2500 3000 3500 4000 0 2 4 6 Target T Figure 6.11: Control input and target The NN plant model is shown in Fig. 6.3. It includes 10 neurons in the first layer, three delays in the input, and two delays in the feedback output. We use trainlm [29] to consecutively do from onestepahead training up to 3997stepahead training. After the final training, we have the performance index as in Fig. 6.12, the network output and the error as in Fig. 6.13. 61 0 50 100 150 200 250 300 350 400 450 500 2.265 2.27 2.275 2.28 2.285 2.29 2.295 2.3 2.305 2.31 x 10 7 Number of iterations Performance Index Performance Index Figure 6.12: Performance Index 0 500 1000 1500 2000 2500 3000 3500 4000 0 1 2 3 4 5 6 Net Output and Target 0 500 1000 1500 2000 2500 3000 3500 4000 2 1 0 1 2 3 x 10 3 Error Network output Target Figure 6.13: The network output, the target and the error In summary, the model error is very small. The maximum error is less than 3 ∗ 10−3, so the trained NN plant is accurate enough. It will be used during training of the NN controller in the next section. During the controller training, the weights and biases of the plant network are kept constant. 62 6.3.3 Controller training First, we have to choose a reference model. In this example, the reference model is chosen as follows G(s) = 9 s2 + 6s + 9 (6.5) An input sequence of step functions with random magnitude and random width is generated. This input P is applied to the input of the reference model (6.5). Then we sample the system output with a sampling time Ts = 0.01. The sampled output T is used as the target. The reference input and the target are shown in Fig. 6.14. 0 200 400 600 800 1000 1200 1400 1600 1800 2000 2 2.5 3 3.5 4 Reference input P 0 200 400 600 800 1000 1200 1400 1600 1800 2000 2 2.5 3 3.5 4 Target T Figure 6.14: The reference model input and output (target) This training data P and T will be used to train an NN controller. In this example, the NN controller is chosen as in Fig. 6.5. We choose 10 neurons and tanh as the activation function in the first layer, with two delays from the network output, two delays from the second layer and two delays in the input. The second layer has one neuron with linear activation function. We use modified trainscg to train the controller network, with different values for the penalty parameter σ. Small initial random weights and biases are used for the NN controller 63 for onestepahead training. Using these initial weights and biases, and the trained weights and biases of the plant network, we compute the matrices Q, T and . After each kstepahead training stage, these matrices will be updated. The maximum eigenvalue is shown in Table 6.2, and the maximum absolute error is shown in Table 6.3, after kstepahead training with different σ. σ 10−2 10−4 10−6 10−8 10−10 100 stepahead 0.0559 0.1006 0.3188 0.5182 0.3313 200 stepahead 0.0494 0.1412 0.6069 0.7375 0.4175 300 stepahead 0.0413 0.1245 0.7575 0.8497 0.4645 400 stepahead 0.0413 0.1424 0.8380 0.9247 0.5063 Table 6.2: The maximum eigenvalue with different σ σ 10−2 10−4 10−6 10−8 10−10 100 stepahead 0.1116 0.0790 0.0421 0.0403 0.0399 200 stepahead 0.0645 0.0465 0.0334 0.0353 0.0244 300 stepahead 0.0769 0.0396 0.0286 0.0286 0.0281 400 stepahead 0.0476 0.0202 0.0165 0.0174 0.0141 Table 6.3: The maximum absolute error with different σ We would expect that the error would increase as σ increases, because more weight is being placed on the maximum eigenvalue, and therefore relatively less weight is being placed on the error. This is clearly shown in Table 6.3. We would also expect that the maximum eigen value of −P2 would decrease as σ increases. This general trend is seen in Table 6.2. This pattern is not as clear, because the maximum eigenvalues in each entry in Table 6.2 are not exactly comparable. In each case, the weights and biases were different, and so the Q, T and matrices were also different. The following figures demonstrate that the modified trainscg algorithm works effectively. Figure 6.15 shows a typical plot of mean square error versus iteration. Figure 6.16 64 shows the maximum eigenvalue versus iteration, and Figure 6.17 shows the combined performance index versus iteration. Although MSE and maximum eigenvalue may sometimes increase, the combined performance index always decreased in all of our test cases. 0 5 10 15 20 25 30 35 40 45 50 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5 x 10 6 Mean square error Iterations Figure 6.15: The mean square error with σ = 10−6 after 10stepahead training 65 0 5 10 15 20 25 30 35 40 45 50 0.2 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 Maximum eigenvalue Iterations Figure 6.16: The maximum eigenvalue with σ = 10−6 after 10stepahead training 66 0 5 10 15 20 25 30 35 40 45 50 4.65 4.7 4.75 4.8 4.85 4.9 4.95 5 5.05 5.1 x 10 6 Iterations Performance Index Performance Index Figure 6.17: The combined performance index with σ = 10−6 after 10stepahead training 6.4 Conclusions We proposed a novel training algorithm for maintaining system stability. It is demonstrated through two examples: the linear plant and the magnetic levitation system. The results show that it is possible to train recurrent neural networks for stability using the new training algorithm. This has been only a demonstration of the potential use of our novel dissipativity criteria for stable training of RNNs. Future work will be needed to refine the algorithm to achieve consistent stable training. 67 CHAPTER 7 CONCLUSIONS AND FUTUREWORK 7.1 Conclusions In this work, we have used dissipativity theory to analyze the stability of a general class of discretetime dynamic neural networks, called Layered Digital Dynamic Networks (LDDNs). To our knowledge, this is the first time that dissipativity theory has been applied to the analysis of stability in discretetime neural networks. The application of dissipativity theory requires the selection of supply rate functions. The flexibility in choosing the supply rate allows the development of a variety of stability criteria. In this report, we have developed three different sets of stability criteria, based on three different choices for the supply rate function. We do not claim that these sets are the best that can be obtained. However, the use of dissipativity theory for the analysis of LDDNs opens up the possibility for additional criteria to be developed. We have tested our dissipativity based (DB) criteria on a wide variety of recurrent neural networks and have compared the results with two other stateoftheart methods. We have analyzed the performance of the various criteria on cases where they perform well and also on cases where they fail to perform. All of the methods tend to perform worse as the network responses oscillate more, have larger system matrices and take longer to converge. Our DB criterion performed at least as well as Liu’s criterion [6, p. 1382] on all of the networks that we tested. In two of 22 cases, Barabanov’s method [18, p. 4554] was able to determine stability when the DB criterion was not able to. These two cases represented systems that were in a form upon which the Barabanov method was designed. The DB methods described in this work were derived for a general recurrent network 68 structure  the LDDN. There are LDDN architectures to which the Liu and Barabanov methods cannot be applied (test problem 23, for example). In these cases, only our DB methods are appropriate (of the three methods analyzed for this report). However, the same can also be said of the DDSNNM architecture of Liu. There are certain DDSNNM structures that cannot be represented in the LDDN format or in the recurrent network structure used by Barabonov [18, p. 4553]. Each method is best suited to the architecture for which it was designed. We have proposed a new training method using the DB criterion to train recurrent neural networks for stability. The standard performance index is modified with an additional term consisting of the maximum eigenvalue of the matrix −P2 multiplied by a constant σ. The important thing is to compute the first derivative of the modified performance index with respect to weights and biases. We use the standard backpropagation algorithm to compute the gradient of the mean square error. Then, we show how to compute the gradient of the maximum eigenvalue with respect to the network weights. By combining these two results, we have the gradient of the modified performance index with respect to the network weights. The weights can be updated by using any gradientbased learning algorithm. In this work, we use the scaled conjugate gradient algorithm, which is already implemented in the Neural Network Toolbox. The modified algorithm was tested on two examples of NNbased MRC systems. The tests demonstrated the potential of the modified algorithm to produce stable training. 7.2 FutureWork One area where the stability analysis of recurrent networks is very important is neural network control. After a neural network controller has been designed, it is important to verify that the closed loop control system is stable. Also, it would be desirable to maintain the stability of the closed loop system throughout the training process. This is because there exist spurious valleys in the error surfaces of recurrent networks in regions where the network 69 is unstable. We can avoid these valleys by maintaining stability during training. So, our future work will focus on improving the proposed training method for nonlinear systems and developing new DBbased stability criteria, which are less conservative. We will also use other gradientbased learning algorithms to implement the stable training method. 70 CHAPTER 8 APPENDIX (Test Problems) 8.1 Test Problem 01 Consider the one layer network given in [4, p. 300], which can be put in standard form with W1 = 0.5 0.0333 −0.0355 1.1882 −2.2687 ,W2 = 0.5 0 0 0 0 , b = 0.5[−1.0092 3.5970]T and f = [tanh tanh]T . The DB criterion: A = diag(0.76323, 0.12656), B = diag(0.93757, 0.87661), T = diag(1000, 122.55), = diag(100, 0) and Q = 933.72 −16.855 −16.855 51.898 . Liu’s criterion: Q = diag(0.76323, 0.12656), U = diag(0.93757, 0.87661), = diag(1000, 0), T = diag(1000, 154.6), P = 41.699 −67.822 −67.822 130.97 and G = 57.374 0.36672 0.36672 0.58042 . Barabanov’s criterion: M = diag(0.76323, 0.12656), N = diag(0.93757, 0.87661), = diag(1000, 207.55), H = 190.56 −43.735 −43.735 87.827 ,G = 1000 −10−10 −10−10 4.2315 ∗ 10−10 , and β = 1 4.2315 4.2315 1 . 71 8.2 Test Problem 02 Consider the network in [6, p. 1388]. This network can be put into the standard form with W1 = 0 0 −0.5 −1 0 0 −0.01 −0.5 0 0 0.2 0 0 0 0 0.8 ,W2 = 0 0 0 0 0 0 0 0 1 0.1 0 0 0.1 1 0 0 b = [−7 7 0 0]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0, 0, 1, 1), B = diag(1, 1, 1, 1), T = diag(109.78, 941.44, 979.18, 1000), = diag(0, 0, 10−6, 10−6) and Q = 18.554 −4.6636 −4.2217 −0.43615 −4.6636 441.78 −38.317 −15.767 −4.2217 −38.317 42.684 17.189 −0.43615 −15.767 17.189 224.46 . Liu’s criterion: Q = diag(0, 0), U = diag(1, 1), = diag(3.5715 ∗ 10−8, 228.76), T = diag(504.19, 1000), P = 302.39 185.76 185.76 1000 and G = 147.31 −0.023467 −0.023467 149.78 . Barabanov’s criterion: M = diag(0, 0), N = diag(1, 1), = diag(333.07, 1000), H = 201.44 73.981 73.981 724.64 ,G = 162.79 −10−10 −10−10 552.05 , and β = 1 3.1009 ∗ 109 3.1009 ∗ 109 1 . 72 8.3 Test Problem 03 Consider the two layer network given in [4, p. 301]. The standard form weight matrices are W1 = 0.5 −1.3482 −1.8825 1.5 0.5 −0.7464 −0.5695 1.2 −0.1 0 0 0.4904 −0.7599 0 0 −1.4697 −1.4608 , W2 = 0.5 0 0 0 0 0 0 0 0 2 0.2 0 0 −0.5 1.40 0 0 , b = 0.5 0.3 −0.5 −1 1 and f = [tanh tanh tanh tanh]T . The DB criterion: A = diag(0.32231, 0.41816, 0.25498, 0.2636), B = diag(0.99439, 0.93639, 0.90631, 0.9752), T = diag(290.19, 1000, 418.16, 380.73), = diag(0, 0, 10−6, 0.00039036) and Q = 134.2 3.3473 −100 24.895 3.3473 363.83 −100 −9.6501 −100 −100 171.86 34.766 24.895 −9.6501 34.766 120.92 . Liu’s criterion: Q = diag(0.32231, 0.41816, 0.25498, 0.2636), U = diag(0.99439, 0.93639, 0.90631, 0.9752), = diag(0, 0, 0, 0), T = diag(142, 564.36, 211.17, 189.88), P = 108.4 59.729 −99.995 19.375 59.729 220.9 −100 2.8786 −99.995 −100 170.43 35.055 19.375 2.8786 35.055 119.51 , 73 and G = 20.459 −29.246 −8.954 4.4233 −29.246 106.92 −24.559 2.0016 −8.954 −24.559 45.595 −11.396 4.4233 2.0016 −11.396 8.211 . Barabanov’s criterion: M = diag(0.32231, 0.41816, 0.25498, 0.2636), N = diag(0.99439, 0.93639, 0.90631, 0.9752), = diag(636.85, 1000, 1000, 165.85), H = 363.39 −98.097 −16.095 43.689 −18.712 −16.793 105.52 −106.22 −98.097 951.22 42.764 −63.609 −1.0098 −72.852 −229.65 25.213 −16.095 42.764 374.06 65.911 −421.84 86.45 −38.807 −26.445 43.689 −63.609 65.911 758.61 −208.24 −57.617 34.07 −71.827 −18.712 −1.0098 −421.84 −208.24 861.48 −55.45 45.91 17.913 −16.793 −72.852 86.45 −57.617 −55.45 287.95 11.896 −4.9001 105.52 −229.65 −38.807 34.07 45.91 11.896 309.59 26.527 −106.22 25.213 −26.445 −71.827 17.913 −4.9001 26.527 269.63 , G = 103.55 −33.567 −1.3562 −16.466 −33.567 628.26 −1.7904 −135.58 −1.3562 −1.7904 208.28 −1e − 010 −16.466 −135.58 −1e − 010 381.65 , and β = 1 2.2319 2.7202 1.2334 2.2319 1 1.2188 2.5446 2.7202 1.2188 1 3.1013 1.2334 2.5446 3.1013 1 . 74 8.4 Test Problem 04 Consider the example given in [33, p. 1778]. Representing this system in standard form we get W1 = 0.2753 −0.0306 0.2967 −0.2277 0.1844 −0.3387 0.1676 −0.0663 0 0 0.6428 0.2309 0 0 −0.1106 0.5839 ,W2 = 0 0 0 0 0 0 0 0 0.3064 −0.0631 0 0 0.2937 0.2769 0 0 , and f = [tanh tanh tanh tanh]T . The DB criterion: A = diag(0.81975, 0.84453, 0.6808, 0.67383), B = diag(1, 1, 1, 1), T = diag(1000, 756.64, 1000, 1000), = diag(8.4132 ∗ 10−9, 1.3655 ∗ 10−7, 3.9446 ∗ 10−8) and Q = 274.24 91.103 −68.509 −68.402 91.103 251.03 −99.277 15.481 −68.509 −99.277 280.07 43.601 −68.402 15.481 43.601 205.53 . Liu’s criterion: Q = diag(0.81975, 0.84453, 0.6808, 0.67383), U = diag(1, 1, 1, 1), T = diag(1000, 770.69, 1000, 1000), = diag(0, 0, 0, 0), P = 265.47 −28.351 21.634 −100 −28.351 403.35 −80.933 46.384 21.634 −80.933 434.49 75.265 −100 46.384 75.265 362.42 , and G = 167.45 −70.26 −55.17 14.998 −70.26 280.63 21.786 −95.94 −55.17 21.786 72.125 −5.1197 14.998 −95.94 −5.1197 96.704 . 75 Barabanov’s criterion: M = diag(0.81975, 0.84453, 0.6808, 0.67383), N = diag(1, 1, 1, 1), = diag(1000, 954.94, 1000, 998.52), H = 1000 −190.37 −1.7249 −125.4 51.001 −65.832 −53.517 173.12 −190.37 805.55 143.72 −44.463 −60.447 −33.642 −90.638 193.26 −1.7249 143.72 686.94 −208.69 −198.63 85.043 22.886 −102.38 −125.4 −44.463 −208.69 482.96 51.316 −88.32 93.608 −59.962 51.001 −60.447 −198.63 51.316 908.23 −108.71 −50.821 44.838 −65.832 −33.642 85.043 −88.32 −108.71 884.85 27.148 24.569 −53.517 −90.638 22.886 93.608 −50.821 27.148 623.88 −74.082 173.12 193.26 −102.38 −59.962 44.838 24.569 −74.082 763.18 , G = 1000 −10−6 −4.2819 −10−6 −10−6 625.22 −35.158 −12.784 −4.2819 −35.158 1000 −10−6 −10−6 −12.784 −10−6 810.79 , and β = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 8.5 Test Problem 05 E = 0.4498 −1.3460 0.6169 0.3715 ,W = −1.1407 −0.4336 −1.0933 −0.1685 , s1 = [−0.2185 0.5413]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.21589, 0.073411, 1, 1), B = diag(0.91974, 0.68839, 1, 1), T = diag(198.75, 614.03, 1000, 1000), = diag(0, 0, 10−6, 10−6) and Q = 104.97 −51.538 −52.084 −95.512 −51.538 632.6 394.96 −99.792 −52.084 394.96 400.31 −10.052 −95.512 −99.792 −10.052 175.29 . 76 Liu’s criterion: Q = diag(0.21589, 0.073411), U = diag(0.91974, 0.68839), = diag(1.2472 ∗ 10−6, 0), T = diag(1000, 770.69, 1000, 1000), P = 580.65 99.456 99.456 167.37 and G = 66.297 −1.183 −1.183 61.627 . Barabanov’s criterion: M = diag(0.21589, 0.073411), N = diag(0.91974, 0.68839), = diag(388.59, 1000), H = 310.09 42.138 42.138 68.108 ,G = 4.803 ∗ 10−6 −10−6 −10−6 0.00093555 , and β = 1 4.803 4.803 1 . 8.6 Test Problem 06 E = −2.1382 −1.0618 0.0389 −0.2408 −0.2108 −1.0018 0.7125 −0.3998 ,WT = −0.0544 −2.7524 −0.0989 −1.2134 −0.1434 1.0171 0.2366 −0.4306 , s1 = [0.6501 − 0.4852]T and f = [tanh tanh id id id id]T . The DB criterion: A = diag(0.33079, 2.3982 ∗ 10−6, 1, 1, 1, 1), B = diag(0.81069, 0.25584, 1, 1, 1, 1), T = diag(99.97, 125.78, 1000, 956.55, 998.82, 961.05), 77 = diag(0.00034784, 0, 10−6, 10−6, 10−6, 0.00077597) and Q = 2461.1 1109.6 1011.2 29.791 14.364 144.84 1109.6 2117.3 743.8 231.45 926.58 609.29 1011.2 743.8 567.01 47.012 29.205 328.99 29.791 231.45 47.012 265.26 121.42 −35.132 14.364 926.58 29.205 121.42 812.93 −43.762 144.84 609.29 328.99 −35.132 −43.762 695.36 . Liu’s criterion: Q = diag(0.33079, 2.3982 ∗ 10−6), U = diag(0.81069, 0.25584), = diag(0.36978, 0), T = diag(1000, 164.49), P = 155.88 −20.026 −91.828 −95.885 −20.026 128.2 67.815 32.503 −91.828 67.815 113.02 71.445 −95.885 32.503 71.445 236.97 and G = 20.837 −0.55989 −0.55989 7.5994 . Barabanov’s criterion: M = diag(0.33079, 2.3982∗10−6), N = diag(0.81069, 0.25584), = diag(1000, 339.64), H = 317.03 133.95 −381.52 −54.939 133.95 957.13 −299.75 −241.94 −381.52 −299.75 674.12 −27.545 −54.939 −241.94 −27.545 714.63 ,G = 1000 −10−6 −10−6 0.017865 , and β = 1 17865 17865 1 . 8.7 Test Problem 07 E = −0.5367 −0.8914 1.1566 1.0866 1.1402 −0.1332 ,WT = 0.1399 −1.3558 −0.2022 −0.6691 1.3142 1.0448 , 78 s1 = [−0.7165 − 0.8795]T and f = [tanh tanh id id id]T . The DB criterion: A = diag(0.08737, 0.007409, 1, 1, 1), B = diag(0.76153, 0.55182, 1, 1, 1), T = diag(210.36, 99.142, 987.86, 959.92, 1000), = diag(0, 0.06456, 10−6, 10−6, 10−6, 0.00077597) and Q = 429.74 600.34 587.49 −33.662 −65.315 600.34 886.98 824.87 −79.6 −60.587 587.49 824.87 873.36 −22.622 −74.159 −33.662 −79.6 −22.622 41.952 −31.658 −65.315 −60.587 −74.159 −31.658 127.32 . Liu’s criterion: Q = diag(0.08737, 0.007409), U = diag(0.76153, 0.55182), = diag(0, 3.7044 ∗ 10−7), T = diag(1000, 443.53), P = 994.17 288.8 242.55 288.8 154.79 −99.992 242.55 −99.992 1000 and G = 35.827 −9.5597 −9.5597 102.88 . Barabanov’s criterion: M = diag(0.08737, 0.007409), N = diag(0.76153, 0.55182), = diag(1000, 461.85), H = 797.85 408.86 −4.1171 408.86 320.72 −171.01 −4.1171 −171.01 592.92 ,G = 10−6 6.4083 −1.0002 −1.0002 6.4146 , and β = 1 6.4073 6.4073 1 . 8.8 Test Problem 08 E = 0.4498 −1.3460 0.6169 0.3715 ,W = 0.2692 0.7177 −0.0074 0.8009 , 79 s1 = [−0.2586 0.0537]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.45842, 0.74239, 1, 1), B = diag(0.94416, 0.98544, 1, 1), T = diag(1000, 1000, 994.81, 1000), = diag(0, 0, 10−6, 10−6) and Q = 45.541 −83.784 −65.824 2.8525 −83.784 819.24 585.23 −100 −65.824 585.23 518.02 29.31 2.8525 −100 29.31 411.68 . Liu’s criterion: Q = diag(0.45842, 0.74239), U = diag(0.94416, 0.98544), = diag(0, 0), T = diag(1000, 1000), P = 210.56 160.82 160.82 1000 and G = 299.36 −78.169 −78.169 173.97 . Barabanov’s criterion: M = diag(0.45842, 0.74239), M = diag(0.94416, 0.98544), = diag(1000, 1000), H = 262.69 125.53 125.53 1000 ,G = 800.06 −10−6 −10−6 1000 , and β = 1 1.4003 1.4003 1 . 8.9 Test Problem 09 E = −0.4376 0.6711 0.2333 −0.3892 −0.1496 0.3565 0.8825 0.0271 ,WT = 0.8775 0.2470 0.3328 0.7495 0.2635 0.7885 0.2241 0.1221 , s1 = 0 and f = [tanh tanh id id id id]T . 80 The DB criterion: A = diag(0.5986, 0.67847, 1, 1, 1, 1), B = diag(1, 1, 1, 1, 1, 1), T = diag(949.76, 1000, 1000, 1000, 1000, 1000), = diag(0, 0, 10−6, 100, 10−6, 10−6) and Q = 111.28 13.247 25.708 21.974 −28.664 −59.065 13.247 188.85 −36.728 163.94 −70.283 −95.483 25.708 −36.728 325.53 70.29 48.782 50.854 21.974 163.94 70.29 620.19 161.65 −92.585 −28.664 −70.283 48.782 161.65 435.83 114.96 −59.065 −95.483 50.854 −92.585 114.96 195.98 . Liu’s criterion: Q = diag(0.5986, 0.67847), U = diag(1, 1), = diag(4.9935 ∗ 10−5, 0), T = diag(1000, 1000), P = 978.57 335.7 167.82 171.9 335.7 999.77 677.6 122.11 167.82 677.6 1000 105.3 171.9 122.11 105.3 289.55 and G = 248.26 −6.0496 −6.0496 247.97 . Barabanov’s criterion: M = diag(0.5986, 0.67847), M = diag(1, 1), = diag(1000, 991.51), H = 1000 241.22 194.23 155.82 241.22 970.92 577.69 97.549 194.23 577.69 991.47 91.865 155.82 97.549 91.865 437.93 ,G = 1000 −240.56 −240.56 912.39 , and β = 1 1 1 1 . 81 8.10 Test Problem 10 E = 0.1820 −1.5142 −0.0995 0.0070 −0.3681 1.2971 −0.1194 1.1440 ,WT = 0.2361 0.5642 0.6413 0.5412 0.2559 0.7715 0.1664 0.2860 , s1 = 0 and f = [tanh tanh id id id id]T . The DB criterion: A = diag(0.726, 0.3722, 1, 1, 1, 1), B = diag(1, 1, 1, 1, 1, 1), T = diag(1000, 1000, 994.43, 1000, 1000, 1000), = diag(0, 0, 10−6, 100, 10−6, 10−6) and Q = 273.56 −73.734 −29.437 93.543 −58.562 48.331 −73.734 1766.5 1130.1 36.435 −69.631 −83.295 −29.437 1130.1 897.18 105.85 97.635 61.019 93.543 36.435 105.85 650.02 87.074 6.3594 −58.562 −69.631 97.635 87.074 313.06 2.5765 48.331 −83.295 61.019 6.3594 2.5765 208.29 . Liu’s criterion: Q = diag(0.726, 0.3722), U = diag(1, 1), = diag(0.036388, 0), T = diag(1000, 1000), P = 652.64 286.48 620.53 339.77 286.48 951.01 455.72 103.38 620.53 455.72 999.99 292.86 339.77 103.38 292.86 403.84 and G = 462.6 −37.436 −37.436 123.96 . Barabanov’s criterion: M = diag(0.726, 0.3722),M = diag(1, 1), = diag(942.82, 1000), H = 816.33 578.16 598.65 522.47 578.16 825.05 476.38 489.74 598.65 476.38 1000 286.54 522.47 489.74 286.54 685.9 ,G = 415.61 −363.06 −363.06 1000 , 82 and β = 1 1 1 1 . 8.11 Test Problem 11 E = −1.5369 −1.4479 2.0182 0.6986 ,W = 0.5301 1.1132 −0.7163 0.3181 , s1 = [−2.0962 − 0.3127]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.00053451, 0.0053027, 1, 1), B = diag(0.38817, 0.40828, 1, 1), T = diag(1000, 350.31, 821.55, 1000), = diag(0, 0, 10−6, 10−6) and Q = 1673.5 793.22 676.56 −91.493 793.22 1189.6 891.01 227.33 676.56 891.01 737.18 221.63 −91.493 227.33 221.63 252.43 . Liu’s criterion: Q = diag(0.00053451, 0.0053027), U = diag(0.38817, 0.40828), = diag(0, 0), T = diag(1000, 457.88), P = 87.837 73.387 73.387 288.62 and G = 113.99 −59.337 −59.337 72.36 . Barabanov’s criterion: M = diag(0.00053451, 0.0053027), N = diag(0.38817, 0.40828), = diag(1000, 435.69), H = 229.27 258.81 258.81 412.7 ,G = 10−6 1.575501 −1 −1 1.575501 , and β = 1 1.5755 1.5755 1 . 83 8.12 Test Problem 12 E = −1.7083 −0.3869 1.2134 −1.0272 ,W = 0.0337 0.3472 −1.2111 −1.0742 , s1 = [0.4811 − 0.0876]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.58581, 0.026454, 1, 1), B = diag(0.98733, 0.68489, 1, 1), T = diag(860.8, 173.26, 997.37, 1000), = diag(0, 0, 10−6, 10−6) and Q = 1855.5 362.01 939.2 −93.541 362.01 1028.2 635.64 661.26 939.2 635.64 776.97 322.28 −93.541 661.26 322.28 527.61 . Liu’s criterion: Q = diag(0.58581, 0.026454), U = diag(0.98733, 0.68489), = diag(0, 0), T = diag(995.2, 611.75), P = 513.77 295.37 295.37 262.31 and G = 16.938 1.7133 1.7133 13.071 . Barabanov’s criterion: M = diag(0.58581, 0.026454), N = diag(0.98733, 0.68489), = diag(1000, 563.99), H = 256.06 141.33 141.33 130.33 ,G = 0.0049901 −10−6 −10−6 8.3245 ∗ 10−6 , and β = 1 8.3245 8.3245 1 . 84 8.13 Test problem 13 Consider the network given in [8] x(k + 1) = tanh(Wx(k)) where W1 = 0.5893 −0.4047 0.3142 0.3133 −0.5308 1.0074 −0.7935 0.7659 0.2278 0.0204 −1.0197 −0.0221 0.1484 0.1643 0.8982 1.1161 −0.7743 0.4514 −0.8473 −0.0883 0.6870 −1.0181 0.0379 −0.5418 −0.6798 W2 = 0, b = 0 and f = [tanh tanh tanh tanh tanh]T . 8.14 Test problem 14 Given a network (4.3) whereW = diag(1, 1), s1 = [0 0]T and E = −1.0510 1.6516 −1.3141 1.1566 8.15 Test problem 15 Given a network (4.3) whereW = diag(1, 1), s1 = [0 0]T and E = 0.4498 −1.3460 0.6169 0.3715 . 8.16 Test problem 16 Given a network (4.3) where E = −0.0176 1.4660 −1.9825 −0.3525 ,W = 0.7133 0.5571 0.7637 0.5651 , and s1 = [−0.1768 1.5514]T . 85 8.17 Test problem 17 Consider a network (4.3) where E = 0.4889 −0.1699 1.9743 1.4636 ,W = 0.3325 −1.1325 −0.6563 1.9766 , and s1 = [−0.6537 1.722]T . 8.18 Test problem 18 Given a network (4.3) where E (Column(1 : 5)) = −0.1442 −0.1923 −0.2243 0.5341 0.2312 1.2841 0.7068 0.6862 −0.8575 1.2335 , E (Column(6 : 10)) = 0.8103 0.2148 −0.3047 0.7268 0.2490 −1.1181 −0.6498 −0.8863 0.5849 1.3329 , WT (Column(1 : 5)) = −0.0822 −0.6546 0.1519 −0.6173 −1.1329 −0.5581 −0.3620 −1.4393 −0.4941 −0.5955 , WT (Column(6 : 10)) = 0.51552 0.5686 −2.8238 −0.066408 0.26577 0.37745 −0.40056 −1.1589 0.16272 0.52101 , and s1 = [0.8594 − 0.1774 − 0.6582 0.2043 1.4457 − 0.4587 0.2136 1.5275 0.1615 − 0.2330]T . 86 8.19 Test problem 19 Given a network (4.3) where E = −1.2676 −0.0239 −1.1729 −0.5952 0.9547 −0.9055 −1.7919 −1.0372 −1.3526 0.3340 , WT = −0.2504 −2.6603 1.3109 0.4479 1.1745 −0.0003 −1.6884 −0.3583 0.4195 −0.4836 , and s1 = [−0.5025 − 1.6517 1.0859 − 0.4030 0.5661]T . 8.20 Test problem 20 Given a network (4.3) where E = 0.1526 0.9272 −2.0829 0.3752 ,W = 1.8267 0.4274 −0.2127 0.2953 , and s1 = [−0.4759 1.1288]T . 8.21 Test problem 21 Given a network (4.3) where E = 1.1618 1.5530 −1.2990 −0.6542 ,W = −0.4199 0.0067 1.0924 −2.5687 , and s1 = [0.3535 0.5172]T . 87 8.22 Test problem 22 Given a network (4.3) where E = 0.0207 −2.3595 −0.8107 0.0597 ,W = −0.9855 0.9763 −0.0560 −1.2719 , and s1 = [−0.2746 1.2021]T . 8.23 Test problem 23 W1 (Column(1 : 5)) = 0.1465 −0.1059 0.1594 0.2925 0.2751 −0.3257 0.1052 0.1820 −0.2141 −0.2609 −0.3717 0.1741 −0.0173 0.2439 −0.2634 0.0899 0.1541 0.0439 0.3267 0.3954 0.0868 −0.3327 −0.3032 −0.2145 −0.0482 −0.3874 −0.0365 −0.0394 −0.2085 −0.1280 −0.3869 −0.0465 0.1727 −0.3602 −0.1486 −0.2479 −0.1174 0.3143 −0.3373 −0.1079 0.0695 −0.2771 −0.1815 0.1127 −0.0854 −0.3539 0.1405 −0.1962 −0.2473 0.0732 88 W1 (Column(6 : 10)) = −0.3042 0.3735 −0.1231 −0.0394 −0.0799 −0.3695 0.1319 −0.2672 −0.0702 −0.2410 −0.0331 0.2963 −0.2755 0.3213 0.1002 0.2959 −0.3921 −0.2471 −0.3955 0.1867 0.3474 −0.2904 −0.0620 −0.1621 −0.0993 −0.1884 0.2550 0.2848 −0.3607 −0.3921 −0.2718 −0.0559 −0.0078 0.1545 −0.0641 0.2983 0.3123 0.2527 0.1201 0.2029 −0.2097 0.1879 −0.0314 0.3864 0.2351 0.1167 0.1499 −0.0341 0.0421 0.3360 W2 (Column(1 : 5)) = 0 0 0 0 0 0.2758 0 0 0 0 −0.1058 0.0966 0 0 0 0.1850 −0.2449 0.3238 0 0 0.0554 0.1054 −0.2125 0.0390 0 0.3453 −0.1318 0.1244 −0.0865 0.1019 0.1593 −0.0823 −0.0691 0.1242 0.2701 −0.0598 0.0757 0.0526 0.1732 0.0090 −0.2513 0.1605 0.3862 0.2453 0.1629 −0.1077 −0.2880 0.0534 0.2584 0.1392 89 W2 (Column(6 : 10)) = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 −0.1027 0 0 0 0 0.2211 −0.0085 0 0 0 −0.0120 −0.3083 0.1319 0 0 0.3996 0.3693 −0.3529 −0.1118 0 b = 0(10×1) and f = [tanh tanh tanh tanh tanh tanh tanh tanh tanh id]T . A = diag(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), B = diag(1, 1, 1, 1, 1, 1, 1, 1, 1), Q (Column(1 : 5)) = 409.2071 −60.7473 9.4582 22.6040 104.9155 −60.7473 318.8918 −69.5846 146.0832 −4.6166 9.4582 −69.5846 478.7037 −48.1748 131.9870 22.6040 146.0832 −48.1748 445.2587 −5.3633 104.9155 −4.6166 131.9870 −5.3633 375.6876 0.7304 −9.4876 −66.3357 79.2393 −16.4141 −34.6683 −62.3295 39.5300 33.3255 −38.3926 66.2225 16.2987 6.8482 −21.4796 −27.1391 −8.1199 −24.9707 −150.3311 −35.8364 −121.1125 −26.8860 94.6296 −51.7739 −57.8724 −16.2736 90 Q (Column(6 : 10)) = 0.7304 −34.6683 66.2225 −8.1199 −26.8860 −9.4876 −62.3295 16.2987 −24.9707 94.6296 −66.3357 39.5300 6.8482 −150.3311 −51.7739 79.2393 33.3255 −21.4796 −35.8364 −57.8724 −16.4141 −38.3926 −27.1391 −121.1125 −16.2736 362.5196 −43.3246 −14.7978 63.5450 −86.0946 −43.3246 449.1086 47.0527 125.4525 −71.5817 −14.7978 47.0527 346.4056 −22.5932 −67.1660 63.5450 125.4525 −22.5932 505.1451 70.8772 −86.0946 −71.5817 −67.1660 70.8772 370.0100 = diag(1, 1, 1, 1, 1, 1, 1, 1, 1, 10−5) and T = 103diag(1, 0.4131, 0.7992, 0.6997, 0.7026, 0.6775, 0.6711, 0.9937, 0.7843, 0.4969). 91 BIBLIOGRAPHY [1] L. Jin, P. N. Nikiforuk, and M. M. Gupta, “Absolute stability conditions for discretetime recurrent neural networks,” IEEE Trans. on Neural Networks, 1994. [2] K. Tanaka, “An approach to stability criteria of neural network control systems,” IEEE Trans. on Neural Networks, 2002. [3] J. A. K. Suykens, J. Vandewalle, and B. L. R. D. Moor, “Nlq theory: Checking and imposing stability of recurrent neural networks for nonlinear modelling,” IEEE Trans. Signal Processing, 1997. [4] N. E. Barabanov and D. V. Prokhorov, “Stability analysis of discretetime recurrent neural networks,” IEEE Trans. on Neural Networks, 2002. [5] N. E. Barabanov and D. V. Prokhorov, “A new method for stability analysis of nonlinear discretetime systems,” IEEE Trans. on Automatic Control, 2003. [6] M. Liu, “Delayed standard neural network models for control systems,” IEEE Trans. Neural Networks, 2007. [7] R. Jafari and M. T. Hagan, “Global stability analysis using the method of reduction of dissipativity domain,” Proceedings of international joint conference on neural networks, 2011. [8] K. Kim, E. R. Patron, and R. D. Braatz, “Standard representation and stability analysis of dynamic artificial neural networks: A unified approach,” Proceedings of IEEE symposium computeraided control system design, 2011. 92 [9] O. D. Jesus and M. T. Hagan, “Backpropagation algorithms for a broad class of dynamic networks,” IEEE Trans. Neural Networks, 2007. [10] J. A. K. Suykens, J. Vandewalle, and B. L. R. D. Moor, “Lure systems with multilayer perceptron and recurrent neural networks: Absolute stability and dissipativity,” IEEE Trans. Automatic Control, 1997. [11] J. C. Williem, “Dissipative dynamical systems part i: General theory,” Arch. Rational Mech. Anal., 1972. [12] J. C. Williem, “Dissipative dynamical systems part ii: Linear systems with quadratic supply rates,” Arch. Rational Mech. Anal., 1972. [13] D. J. Hill and P. J. Moylan, “The stability of nonlinear dissipative systems,” IEEE Trans. Automatic Control, 1976. [14] D. J. Hill and P. J. Moylan, “Stability results for nonlinear feedback systems,” Automatica, 1977. [15] W. M. Haddad and V. Chellaboina, Nonlinear dynamical systems and control: A Lyapunovbased approach. Princeton University Press, 1st ed., 2008. [16] L. Jin and M. M. Gupta, “Stable dynamic backpropagation learning in recurrent neural networks,” International journal of robust and nonlinear control, 1999. [17] J. Horn, O. D. Jesus, and M. T. Hagan, “Spurious valleys in the error surface of recurrent networks  analysis and avoidance,” IEEE Trans. Neural Networks, 2007. [18] N. E. Barabanov and D. V. Prokhorov, “Global stability analysis of discretetime recurrent neural networks,” Proceedings of the American Control Conference, 2001. [19] J. Zhao and D. J. Hill, “Dissipativity theory for switched systems,” IEEE Trans. on Automatic Control, 2008. 93 [20] W. M. Haddad and Q. Hui, “Dissipativity theory for discontinuous dynamical systems: Basic input, state, and output properties, and finitetime stability of feedback interconnections,” Journal of Nonlinear Analysis: Hybrid systems, 2009. [21] V. Chellaboina and W. Haddad, “Stability margins of discretetime nonlinear nonquadratic optimal regulators,” International Journal of System Science, 2002. [22] V. Chellaboina, W. Haddad, and A. Kamath, “A dissipative dynamical system to stability analysis of time delay systems,” International Journal of Robust and Nonlinear Control, 2004. [23] C. A. Jacobson, A. M. Stankovic, G. Tadmor, and M. A. Stevens, “Towards a dissipativity framework for power system stabilizer design,” IEEE Trans. on Power Systems, 1996. [24] B. Brogliato, R. Lozano, B. Maschke, and O. Egeland, Dissipative systems analysis and control: Theory and applications. Springer, 2nd ed., 2007. [25] S. Hu and J. Wang, “Global stability of a class of discretetime recurrent neural networks,” IEEE Trans. Circuits Syst. I: Fundam. Theory Appl., 2002. [26] L.Wang and Z. Xu, “Sufficient and necessary conditions for global exponential stability of discretetime recurrent neural networks,” IEEE Trans. on Circuits and Systems I: Regular Papers, 2006. [27] S. Hu and J. Wang, “Global robust stability of a class of discretetime interval neural networks,” IEEE Trans. Circuits Syst. I: Reg. Papers, 2006. [28] M. F. Moller, “A scaled conjugate gradient algorithm for fast supervised learning,” Neural Networks, vol. 6, pp. 525533, 1993. [29] M. H. Beal, M. T. Hagan, and H. B. Demuth, “Neural network toolbox user’s guide,” Matlab, 2012. 94 [30] N. P. V. D. AA, H. G. T. Morsche, and R. R. M. Mattheij, “Computation of eigenvalue and eigenvector derivatives for a general complexvalued eigensystem,” A publication of the International Linear Algebra Society, vol. 16, pp. 300314, 2007. [31] M. T. Hagan, H. B. Demuth, and O. D. Jesus, “An introduction to the use of neural networks in control systems,” International journal of robust and nonlinear control, 2002. [32] R. M. Goodall, “The theory of electromagnetic levitation,” Physics in Technology, 1985. [33] N. E. Barabanov and D. V. Prokhorov, “Two alternative stability criteria for discretetime rmlp,” Proceedings of the 41st IEEE conference on Decision and Control, 2002. 95 Name: Nam Hoai Nguyen Date of Degree: December, 2012 Institution: Oklahoma State University Location: Stillwater, Oklahoma Title of Study: STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Pages in Study: 95 Candidate for the Degree of Doctor of Philosophy Major Field: Electrical Engineering The purpose of this work is to describe how dissipativity theory can be used for the stability analysis of discretetime recurrent neural networks and to propose a training algorithm for producing stable networks. Using dissipativity theory, we have found conditions for the globally asymptotic stability of equilibrium points of Layered Digital Dynamic Networks (LDDNs), a very general class of recurrent neural networks. The LDDNs are transformed into a standard interconnected system structure, and a fundamental theorem describing the stability of interconnected dissipative systems is applied. The theorem leads to several new sufficient conditions for the stability of equilibrium points for LDDNs. These conditions are demonstrated on several test problems and compared to previously proposed stability conditions. From these novel stability criteria, we propose a new algorithm to train stable recurrent neural networks. The standard mean square error performance index is modified to include stability criteria. This requires computation of the derivative of the maximum eigenvalue of a matrix with respect to neural network weights. The new training algorithm is tested on two examples of neural networkbased model reference control systems, including a magnetic levitation system. ADVISOR’S APPROVAL: Dr. Martin Hagan VITA Nam Hoai Nguyen Candidate for the Degree of Doctor of Philosophy Dissertation: STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Major Field: Electrical Engineering Biographical: Education: Received the B.S. degree from Hanoi University of Science and Technology, Hanoi, Vietnam, 2002, in Electrical Engineering. Received the M.S. degree from Hanoi University of Science and Technology, Hanoi, Vietnam, 2005, in Electrical Engineering. Completed the requirements for the degree of Doctor of Philosophy in Electrical Engineering at Oklahoma State University, Stillwater, Oklahoma in December, 2012. Experience: Worked for Thai Nguyen University of Technology, Thai Nguyen, Vietnam as a lecturer from 2002 to 2009.
Click tabs to swap between content that is broken into logical sections.
Rating  
Title  Stability Analysis of Recurrent Neural Networks Using Dissipativity 
Date  20121201 
Author  Nguyen, Nam Hoai 
Keywords  discretetime systems, dissipativity, model reference control systems, recurrent neural networks, stability analysis, training algorithms 
Department  Electrical Engineering 
Document Type  
Full Text Type  Open Access 
Abstract  The purpose of this work is to describe how dissipativity theory can be used for the stability analysis of discretetime recurrent neural networks and to propose a training algorithm for producing stable networks. Using dissipativity theory, we have found conditions for the globally asymptotic stability of equilibrium points of Layered Digital Dynamic Networks (LDDNs), a very general class of recurrent neural networks. The LDDNs are transformed into a standard interconnected system structure, and a fundamental theorem describing the stability of interconnected dissipative systems is applied. The theorem leads to several new sufficient conditions for the stability of equilibrium points for LDDNs. These conditions are demonstrated on several test problems and compared to previously proposed stability conditions. From these novel stability criteria, we propose a new algorithm to train stable recurrent neural networks. The standard mean square error performance index is modified to include stability criteria. This requires computation of the derivative of the maximum eigenvalue of a matrix with respect to neural network weights. The new training algorithm is tested on two examples of neural networkbased model reference control systems, including a magnetic levitation system. 
Note  Dissertation 
Rights  � Oklahoma Agricultural and Mechanical Board of Regents 
Transcript  STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY By NAM HOAI NGUYEN Bachelor of Science in Automatic Control Hanoi University of Science and Technology Hanoi, Vietnam 2002 Master of Science in Measurement and Control Systems Hanoi University of Science and Technology Hanoi, Vietnam 2005 Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY December, 2012 COPYRIGHT ⃝c By NAM HOAI NGUYEN December, 2012 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Dissertation Approved: Dr. Martin Hagan Dissertation Advisor Dr. Carl Latino Dr. George Scheets Dr. Lisa Mantini Dr. Sheryl Tucker Dean of the Graduate College iii TABLE OF CONTENTS Chapter Page 1 INTRODUCTION 1 2 DISSIPATIVITY AND STABILITY FOR CONTINUOUSTIME SYSTEMS 4 2.1 Continuoustime dynamical systems . . . . . . . . . . . . . . . . . . . . . 4 2.2 Continuoustime dissipative dynamical systems . . . . . . . . . . . . . . . 6 2.3 Properties of supply rate function . . . . . . . . . . . . . . . . . . . . . . . 6 2.4 An example continuoustime dissipative dynamical system . . . . . . . . . 8 2.5 Stability of interconnected continuoustime dissipative dynamical systems . 9 3 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY 11 3.1 Dissipative systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.1.1 Static systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.1.2 Discretetime dynamical systems . . . . . . . . . . . . . . . . . . 12 3.1.3 Dissipative systems . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.2 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.2.1 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY . . . . . . . . . . . . . . . . . . . . . . . 14 3.2.2 Example use of dissipativity on a simple network . . . . . . . . . . 15 3.3 Stability analysis of Layered Digital Dynamic Networks (LDDNs) . . . . . 17 3.3.1 LDDNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.3.2 Transform LDDNs to a standard form . . . . . . . . . . . . . . . . 18 iv 3.3.3 Sector conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3.4 Supply rate using sector conditions  static/scalar . . . . . . . . . . 22 3.3.5 Supply rate using sector conditions  static/vector . . . . . . . . . . 23 3.3.6 Selecting the supply rate for LDDNs . . . . . . . . . . . . . . . . . 23 3.3.7 Stability analysis of other types of Recurrent Neural Networks . . . 28 3.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4 COMPARISON OF STABILITY CRITERIA ON TEST PROBLEMS 31 4.1 Sector conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.2 Existing stability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2.1 Liu’s criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2.2 Barabanov and Prokhorov’s LMI criterion . . . . . . . . . . . . . . 33 4.3 Description of test problems . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.4 Stable equilibria that can be proved stable . . . . . . . . . . . . . . . . . . 36 4.5 Stable equilibria that cannot be proved stable . . . . . . . . . . . . . . . . 39 4.6 An example of stability analysis of LDDNs . . . . . . . . . . . . . . . . . 42 4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5 TRAINING RECURRENT NETWORKS FOR STABILITY 44 5.1 Modified performance index . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2 Gradient computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.2.1 The derivative of eigenvalue . . . . . . . . . . . . . . . . . . . . . 46 5.2.2 The derivative of maximum eigenvalue . . . . . . . . . . . . . . . 47 6 SIMULATIONS AND TEST RESULTS FOR STABLE TRAINING 50 6.1 Model reference control (MRC) using recurrent neural networks . . . . . . 50 6.1.1 Plant training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 6.1.2 Controller training . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.2 Design the MRC for a linear system . . . . . . . . . . . . . . . . . . . . . 55 v 6.2.1 Plant model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 6.2.2 Model reference . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 6.2.3 Controller training . . . . . . . . . . . . . . . . . . . . . . . . . . 57 6.3 Design the MRC for a magnetic levitation system . . . . . . . . . . . . . . 59 6.3.1 Magnetic levitation system . . . . . . . . . . . . . . . . . . . . . . 59 6.3.2 Plant training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 6.3.3 Controller training . . . . . . . . . . . . . . . . . . . . . . . . . . 63 6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 7 CONCLUSIONS AND FUTUREWORK 68 7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 8 APPENDIX (Test Problems) 71 8.1 Test Problem 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 8.2 Test Problem 02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 8.3 Test Problem 03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 8.4 Test Problem 04 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 8.5 Test Problem 05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 8.6 Test Problem 06 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 8.7 Test Problem 07 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 8.8 Test Problem 08 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 8.9 Test Problem 09 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 8.10 Test Problem 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 8.11 Test Problem 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 8.12 Test Problem 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 8.13 Test problem 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 8.14 Test problem 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 vi 8.15 Test problem 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 8.16 Test problem 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 8.17 Test problem 17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 8.18 Test problem 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 8.19 Test problem 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.20 Test problem 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.21 Test problem 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.22 Test problem 22 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 8.23 Test problem 23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 BIBLIOGRAPHY 92 vii LIST OF TABLES Table Page 4.1 Eigenvalues of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Time steps N to convergence . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.3 Eigenvalues of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.4 Time steps N to convergence . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.1 MPM, MAE, MSE and λm after 1998 step ahead training . . . . . . . . . . 59 6.2 The maximum eigenvalue with different σ . . . . . . . . . . . . . . . . . . 64 6.3 The maximum absolute error with different σ . . . . . . . . . . . . . . . . 64 viii LIST OF FIGURES Figure Page 2.1 Interconnected continuoustime systems . . . . . . . . . . . . . . . . . . . 9 3.1 Interconnected systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.2 Example LDDN network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.3 The function f(u) satisfying sector conditions . . . . . . . . . . . . . . . . 22 4.1 Trajectory for test problem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Trajectory for test problem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.3 Trajectory for test problem 15 . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.4 Trajectory for test problem 22 . . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.1 Model reference control system . . . . . . . . . . . . . . . . . . . . . . . . 51 6.2 NNbased model reference control system . . . . . . . . . . . . . . . . . . 51 6.3 Neural network plant model . . . . . . . . . . . . . . . . . . . . . . . . . 52 6.4 Network architecture for onestepahead plant training . . . . . . . . . . . 53 6.5 Neural network plant model and neural network controller . . . . . . . . . 54 6.6 Open loop network for controller training . . . . . . . . . . . . . . . . . . 55 6.7 The plant network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 6.8 The reference input and the target . . . . . . . . . . . . . . . . . . . . . . 57 6.9 NNbased MRC system for the linear plant . . . . . . . . . . . . . . . . . 58 6.10 The magnetic levitation system . . . . . . . . . . . . . . . . . . . . . . . . 59 6.11 Control input and target . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 6.12 Performance Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 ix 6.13 The network output, the target and the error . . . . . . . . . . . . . . . . . 62 6.14 The reference model input and output (target) . . . . . . . . . . . . . . . . 63 6.15 The mean square error with σ = 10−6 after 10stepahead training . . . . . 65 6.16 The maximum eigenvalue with σ = 10−6 after 10stepahead training . . . 66 6.17 The combined performance index with σ = 10−6 after 10stepahead training 67 x CHAPTER 1 INTRODUCTION The objective of this research is to use dissipativity theory to analyze stability for a general class of discretetime recurrent neural networks (RNNs). Then, a new training algorithm is proposed to train RNNs for stability. Other efforts on the stability analysis of RNNs have been made in recent years. Jin and Gupta [1] found absolute stability conditions for RNNs based on Ostrowski’s theorem. The networks they dealt with contained only a single layer without biases. Tanaka [2] analyzed the stability of neural network systems by using stability conditions, based on Lyapunov theory, of linear differential inclusions. The neural network systems investigated by Tanaka include a system that combines a neural network plant model and a neural network controller. However, there are no biases included in the neural networks. Suykens [3] found stability criteria for a class of RNNs that can be represented in a form he designated as NLq. These criteria provide sufficient conditions for asymptotic stability. However, he did not deal with the case of nonzero biases. Recently, Barabanov and Prokhorov proposed an approach for the stability analysis of RNNs with sectortype nonlinearities and nonzero biases based on the theory of absolute stability [4]. They later developed a new method based on reduction of dissipativity domain [5]. This method works effectively if the system has a convex Lyapunov function. Later, Liu proposed a generic network model, which is referred to as the discretetime standard neural network model (DSNNM) [6]. The DSNNM represents a neural network model as the interconnection of a linear dynamic system and static nonlinear operators. Liu found some criteria for the globally asymptotic stability of equilibrium points of the DSNNM. 1 More recently, the method in [5] has been modified to speed up convergence by Jafari and Hagan [7]. Most recently, Kim and Braatz [8] used a modified Lure’Postnikov function to obtain stability criteria for some classes of standard nonlinear operator forms. These methods can be applied to RNNs with nonzero biases. However, they have not yet been applied to the stability analysis of layered digital dynamic networks (LDDNs) [9]. A few authors in the past have applied dissipativity theory to continuoustime neural networks [10].We now want to apply this theory to discretetime networks. In this work, a general class of discretetime recurrent neural networks (LDDNs) will be considered, and dissipativity theory will be used to analyze stability of equilibrium points for LDDNs. Dissipativity theory was first developed by Jan C. Willems [11, 12] in the 1970s. The other major authors of this theory are David Hill and Peter Moylan [13, 14]. The term ”dissipativity” was inspired by the concept of passivity. A system for which the rate of increase in its stored energy is not greater than the absorbed input power is a passive system. In dissipativity theory, the stored energy is generalized by a storage function and the input power is generalized by a supply rate function. One of the important results in passivity theory states that if two passive systems are connected in a feedback loop, then the resulting closed loop system is stable. A corresponding result from dissipativity theory states that if two dissipative systems are connected in a feedback loop, then the closed loop system is stable under certain conditions. This result will be used extensively throughout our work. Dissipativity was first defined for continuoustime dynamic systems. Later, a discretetime version of dissipativity was developed by W. Haddad [15]. However, to our knowledge, dissipativity theory has not yet been applied to discretetime RNNs. In the past, Liang Jin and Madan Gupta developed two stable dynamic backpropagation learning algorithms for a class of RNNs [16]. They used both local and global stability conditions to maintain the network stability during the training. Recently, Jason Horn, Orlando de Jesus and Martin Hagan [17] demonstrated that there are spurious valleys in 2 the error surface of RNNs. These spurious valleys occur in unstable regions of the error surfaces and can cause difficulties in training. They suggested that one might be able to use a constrained optimization process to avoid the unstable region during training, but the constraints would be extremely complex for LDDNs. In this work, several stability criteria based on dissipativity will be proposed. Then these novel criteria will be compared with those of Liu [6] and Barabanov and Prokhorov [18] on several test problems. Based on these criteria, we will propose a new training algorithm to train recurrent neural networks for stability. The new training algorithm will be tested on two examples of model reference control systems: a linear plant and a nonlinear plant. This proposal includes seven chapters. The next chapter describes dissipativity theory for continuoustime systems. Chapter 3 is about the stability analysis of discretetime recurrent neural networks using dissipativity. It presents some fundamental concepts and theorems, and gives a brief introduction to LDDNs. Next, a method is proposed to transform LDDNs into a standard interconnected system form. Then, sector conditions are introduced and some important lemmas and theorems are proposed. Finally, novel stability criteria for the equilibrium points of LDDNs, based on these lemmas and theorems, are found. In Chapter 4, existing stability criteria are reviewed and then compared with the novel criteria on a large number of test problems. The following chapter develops a framework for training recurrent neural networks for stability. A modified performance index is defined and a brief review of the first derivative of eigenvalue with respect to a matrix parameter is provided. Then, the first derivative of the maximum eigenvalue with respect to network weights is represented. In Chapter 6, we introduce neural networkbased model reference control systems. The proposed training algorithm is applied to train neural network controllers for both a linear plant and a magnetic levitation system. The final chapter provides a summary and proposes future work. 3 CHAPTER 2 DISSIPATIVITY AND STABILITY FOR CONTINUOUSTIME SYSTEMS The theory of dissipativity was first developed by Willems [11], [12] for continuoustime dynamical systems. Recently, it has been extended to discretetime dynamical systems [15], switched systems [19], and hybrid systems [20]. It has been applied to not only stability analysis [13], [14], [21], [22] but also controller synthesis [15], [20], [23], [24]. This chapter reviews dissipativity theory for continuoustime systems. There are two settings in which dissipative dynamical systems have been defined: the inputoutput setting and the inputstateoutput setting. This chapter will concentrate on the second setting. The chapter will begin by introducing continuoustime dynamical systems. Next, continuoustime dissipative dynamical systems are defined, followed by an example. Finally, stability analysis of interconnected continuoustime dynamical systems is introduced. 2.1 Continuoustime dynamical systems This section presents some definitions [11] concerning dynamical systems and dissipative systems. It also discusses the main properties of dissipative dynamical systems. We begin by introducing some notation. • R is the set of real numbers. • Rn is the n dimensional Euclidean space. • R+ is the set of nonnegative real numbers. • R2 + = { (t2, t1) ∈ R2t2 > t1 } . 4 • Shift operater σT (.): Given a function s(t), t ∈ R, then the shift operator is defined as σT (s) = s(t + T). Continuoustime dynamical systems were defined by Willems [11] as follows. Definition 2.1 A dynamical system is defined as follows: • X, U and Y are called the state space, the set of input values, and the set of output values, respectively. • U∗ is called the input space and it contains a class of Uvalued functions on R. • Y ∗ is called the output space and it contains a class of Yvalued functions on R. • Assume that U∗ and Y ∗ are closed under the shift operator. • : R2 + × X × U∗ → X is called the state transtion function. The following axioms hold: 1. Consistency: (t0, t0, x0, u) = x0 for all t0 ∈ R, x0 ∈ X, and u ∈ U∗. 2. Determinism: (t1, t0, x0, u1) = (t1, t0, x0, u2) for all (t1, t0) ∈ R2 +, x0 ∈ X, and u1, u2 ∈ U∗ satisfying u1(t) = u2(t) for t0 ≤ t ≤ t1. 3. Semigroup property: (t2, t0, x0, u) = (t2, t1, (t1, t0, x0, u), u) for t0 ≤ t1 ≤ t2, x0 ∈ X and u ∈ U∗. 4. Stationary: (t1 +T, t0 +T, x0, σT (u)) = (t1, t0, x0, u) for all (t1, t0) ∈ R2 +, T ∈ R, x0 ∈ X, and u, σT (u) ∈ U. • w : X × U → Y is called the readout function. • The Yvalued function y(t) = w( (t, t0, x0, u), u(t)) for t ≥ t0. It is assumed that a dynamical system is given together with a real valued function r(u, y) = r(u(t), y(t)) called the supply rate function. The constraint of the supply rate 5 function is that for any (t1, t0) ∈ R2 +, u ∈ U and y ∈ Y r(u, y) satisfies ∫ t1 t0 r(u, y)dt < ∞. 2.2 Continuoustime dissipative dynamical systems Continuoustime dissipative dynamical systems were defined by Willems [11] as follows. Definition 2.2 A dynamical system with the supply rate function r(u, y) is said to be dissipative if there exists a nonnegative function S : X → R+, called the storage function, such that for all (t1, t0) ∈ R2 +, x0 ∈ X and u ∈ U, S(x1) ≤ S(x0) + ∫ t1 t0 r(u, y)dt (2.1) where x1 = (t1, t0, x0, u) and r(u, y) = r(u(t), y(t)) with y(t) = w(x(t), u). The inequality (2.1) is called the dissipation inequality. 2.3 Properties of supply rate function There are some key properties of supply rate functions that are explained in the following two lemmas. Lemma 2.1 If a dynamical system is dissipative with respect to the supply rate r(u, y), then it is also dissipative with respect to the supply rate λr(u, y) where λ > 0. Proof. Assume a dynamical system is dissipative with respect to the supply rate r(u, y). Then there exists a storage function S(x) such that the dissipation inequality (2.1) holds. This means that S(x1) ≤ S(x0) + ∫ t1 t0 r(u, y)dt (2.2) Multiplying both sides of (2.2) by λ, we get λS(x1) ≤ λS(x0) + λ ∫ t1 t0 r(u, y)dt 6 Let r1(u, y) = λr(u, y) and S1(x) = λS(x). Then S1(x) is a storage function and r1(u, y) is a supply rate function. It follows that S1(x1) ≤ S1(x0) + ∫ t1 t0 r1(u, y)dt Therefore the system is dissipative with respect to the supply rate λr(u, y). Lemma 2.2 If a dynamical system is dissipative with respect to the supply rate ri(u, y) for i = 1, 2, ..., n, then it is also dissipative with respect to the supply rate r(u, y) = Σn i=1 ri(u, y). Proof. Since the system is dissipative with respect to the supply rate ri(u, y) for i = 1, 2, ..., n, there exists a storage function Si(x) such that the dissipation inequality (2.1) holds. This means that Si(x1) ≤ Si(x0) + ∫ t1 t0 ri(u, y)dt (2.3) for i = 1, 2, ..., n. Taking the summation on both sides of (2.3) where i goes from 1 to n, we get Σn i=1 Si(x1) ≤ Σn i=1 Si(x0) + Σn i=1 ∫ t1 t0 ri(u, y)dt (2.4) The inequality (2.4) is equivalent to Σn i=1 Si(x1) ≤ Σn i=1 Si(x0) + ∫ t1 t0 Σn i=1 ri(u, y)dt (2.5) Let’s define r(u, y) = Σn i=1 ri(u, y) and S(x) = Σn i=1 Si(x).Then (2.5) is equivalent to S(x1) ≤ S(x0) + ∫ t1 t0 r(u, y)dt (2.6) Therefore the system is dissipative with respect to the supply rate r(u, y). Based on these lemmas, we can find supply rate functions for continuoustime dynamical systems. 7 2.4 An example continuoustime dissipative dynamical system Consider a mass, spring and damper system [15]. The equation of motion is mx (t) + Dx_ (t) + Kx(t) = u(t) (2.7) where M is the mass, D is the damper constant, K is the spring stiffness, x is the position of the mass and u is the force acting on the mass. The initial conditions are x(0) = x0 and x_ (0) = x_ 0. It is assumed that M > 0, D ≥ 0 and K ≥ 0. The energy of the system is V (x, x_ ) = 0.5mx_ 2 + 0.5Kx2 The time derivative of the energy is _V (x, x_ ) = mx x_ + Kxx_ = ux_ − Dx_ 2 Let’s define x1 = x, x2 = x_ and y = x_ as state variables and the output of the system (2.7), respectively. Then _V (x) = uy − Dx_ 2 (2.8) where x = [x1 x2]. Integrating both sides of equation (2.8) from t = 0 to t = T gives V (x(T)) = V (x(0)) + ∫ T 0 uydt − ∫ T 0 Dx_ 2dt (2.9) where V (x(T)) is the energy at t = T and V (x(0)) is the initial energy. The second term and the last term on the right side of (2.9) are the energy supplied by the external source and the energy dissipated by the damper, respectively. From (2.9) we get V (x(T)) ≤ V (x(0)) + ∫ T 0 uydt (2.10) since ∫ T 0 Dx_ 2dt ≥ 0 (2.11) 8 Let’s define S(x) = V (x) and r(u, y) = uy. Then S(x) is a storage function and r(u, y) is a supply rate. The inequality (2.10) means that the system (2.7) is dissipative. In this example, the storage function is the stored energy and the supply rate is the absorbed input power. The dissipativity of the system says that the change in its stored energy is not greater than the absorbed input power. 2.5 Stability of interconnected continuoustime dissipative dynamical systems Consider dynamical systems 1 and 2 that are interconnected via constraints u1 = −y2 and u2 = y1, as shown in Fig. 2.1. Suppose that equilibrium points of systems 1 and 2 are located at the origin. Σ1 Σ2 u1 y2 y1 u2 Figure 2.1: Interconnected continuoustime systems Theorem 2.1 [11] If the system 1 is dissipative with respect to the supply rate r1(u1, y1), the system 2 is dissipative with respect to the supply rate r2(u1, y2) and r1(u1, y2) + r2(u1, y2) = 0 then the origin of the feedback system is stable. Proof. Since the system 1 is dissipative with respect to r1(u1, y1), there exists a storage function S1(x1) ≥ 0 such that _S 1(x1) ≤ r1(u1, y1). Since the system 2 is dissipative with respect to r2(u2, y2), there exists a storage function S2(x2) ≥ 0 such that _S 2(x2) ≤ r2(u2, y2). Thus S1(x1)+S2(x2) ≥ 0 and _S 1(x1)+ _S 2(x2) ≤ r1(u1, y1)+r2(u2, y2). Since 9 r1(u1, y1) + r2(u2, y2) = 0, _S 1(x1) + _S 2(x2) ≤ 0. Let’s define V (x) = S1(x1) + S2(x2). Then _V (x) ≤ 0. Therefore the origin of the feedback system is stable. In the next chapter, this theorem will be extended to the case where 1 is static and 2 is a discretetime dynamical system. 10 CHAPTER 3 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY The previous chapter introduced the concept of dissipativity for continuous time dynamic systems and demonstrated how dissipativity can be used to prove stability for interconnected dissipative systems. In this chapter, we extend these ideas to analyze the stability of discretetime recurrent neural networks. We begin by defining dissipativity for static and discretetime dynamic systems and update the stability theorem for interconnected dissipative systems. Then we introduce a general framework for representing RNNs and show how this general framework can be represented in a standard form. From the standard form we can apply the stability theorem for interconnected dissipative systems. We will derive three different criteria for testing the stability of RNNs through different choices of supply rate functions. 3.1 Dissipative systems 3.1.1 Static systems Consider the system y = f(u) (3.1) where u ∈ U ⊆ Rm, y ∈ Y ⊆ Rl, f : U → Y and 0 ∈ U. Without loss of generality, let f(0) = 0. Definition 3.1 The system (3.1) is called a static system if the outputs y depend only on current values of inputs u. 11 3.1.2 Discretetime dynamical systems There are several possible representations of discretetime dynamical systems. The representation used in [15] is x(k + 1) = h(x(k), u(k)) y(k) = g(x(k), u(k)) (3.2) where x(k) ∈ D ⊆ Rn, u ∈ U ⊆ Rm, y ∈ Y ⊆ Rl, h : D × U → Rn, g : D × U → Y , and D is open and contains 0. It is assumed that h and g are continuous mappings, and h has at least one equilibrium point. Without loss of generality, suppose h(0, 0) = 0 and g(0, 0) = 0. 3.1.3 Dissipative systems Definition 3.2 A function r : U×Y → R is a supply rate for a given system if the following conditions are satisfied: 1. r(u, y) = 0 if and only if u = 0. 2. Σk=k2 k=k1 r(u(k), y(k)) < ∞, for all k1, k2 ∈ Z+. Definition 3.3 The discretetime dynamical system (3.2) is said to be dissipative with respect to the supply rate r(u, y) if there exists a continuous radially unbounded, positive definite function V : D → R satisfying V (0) = 0 and the following inequality V (x(k + 1)) − V (x(k0)) ≤ Σk i=k0 r(u(i), y(i)) (3.3) holds for all k0, k ∈ Z+ such that k ≥ k0, where x(k) is the solution of system (3.2) with u ∈ U. The function V is called a storage function. If the inequality in (3.3) is strict for x(k) ̸= 0, then system (3.2) is said to be strictly dissipative. Definition 3.4 The static system (3.1) is said to be dissipative with respect to the supply rate r(u, y) if r(u, y) ≥ 0 for all u ∈ U. If the inequality is strict for u ̸= 0, then (3.1) is strictly dissipative with respect to r(u, y). 12 Definition 3.5 [15, p. 803] The system (3.2) is said to be zerostate observable if u(k) = 0 and y(k) = 0 implies x(k) = 0. Theorem 3.1 If the storage function V satisfies V (x(n + 1)) − V (x(n)) ≤ r(u(n), y(n)) for all n ∈ Z+, then (3.2) is dissipative with respect to supply rate r(u, y). Moreover, if the inequality is strict for x(n) ̸= 0, then (3.2) is strictly dissipative with respect to r(u, y). Proof. We will prove that the system is dissipative. The proof for the strictly dissipative case follows in a similar manner. Since the inequality holds for all n ∈ Z+, it holds for k0, k0 + 1, ..., k − 1, k where k0 ∈ Z+. Thus we get V (x(k0 + 1)) − V (x(k0)) ≤ r(u(k0), y(k0)) V (x(k0 + 2)) − V (x(k0 + 1)) ≤ r(u(k0 + 1), y(k0 + 1)) ... V (x(k + 1)) − V (x(k)) ≤ r(u(k), y(k)) Adding these inequalities together, we have V (x(k + 1)) − V (x(k0)) ≤ Σk i=k0 r(u(i), y(i)) Therefore, (3.2) is dissipative with respect to r(u, y). 3.2 Stability In this section we will define stability for discretetime systems and then update Theorem 2.1 on the stability of interconnected dissipative systems to include the case of static and discretetime subsystems. Consider system (3.2) with equilibrium point xe = 0. 13 Definition 3.6 [15, p. 765] The equilibrium point xe is said to be Lyapunov stable if for all ϵ > 0 there exists δ > 0 such that if x(0) < δ then x(k) < ϵ for all k ∈ Z+. In addition, it is said to be globally asymptotically stable (GAS) if it is Lyapunov stable and for all x(0) ∈ Rn, limk→∞ x(k) = 0. 3.2.1 STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Consider the interconnected system shown in Fig. 3.1, where subsystem 1 is a static system of the form (3.1), subsystem 2 is a discretetime dynamical system of the form (3.2), u1(k) = −y2(k) and u2(k) = y1(k). Assume that f(0) = 0 and the system 2 has an equilibrium point xe = 0. Σ1 Σ2 u1(k) y2(k) y1(k) u2(k) Figure 3.1: Interconnected systems Theorem 3.2 If subsystem 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)), subsystem 2 is dissipative with respect to the supply rate r2(u2(k), y2(k)), and r1(u1(k), y1(k)) + r2(u2(k), y2(k)) ≤ 0, then the origin of the interconnected system is stable. Moreover, the origin of the system is globally asymptotic stable (GAS), if the system 2 is zero state observable and one of the following additional conditions holds: 1. Either 1 is strictly dissipative with respect to the supply rate r1(u1(k), y1(k)) or 2 is strictly dissipative with respect to the supply rate r2(u2(k), y2(k)). 14 2. r1(u1(k), y1(k)) + r2(u2(k), y2(k)) < 0 when the variables are not all equal to zero. Proof. Since 1 is static and dissipative with respect to the supply rate r1(u1(k), y1(k)), r1(u1(k), y1(k)) ≥ 0. Since subsystem 2 is dissipative with respect to the supply rate r2(u2(k), y2(k)), there exists a storage function V2(x2(k)) such that V2(x2(k + 1)) − V2(x2(k)) ≤ r2(u2(k), y2(k)). Since r1(u1(k), y1(k)) ≥ 0, r2(u2(k), y2(k)) ≤ −r1(u1(k), y1(k)) ≤ 0. Thus V2(x2(k + 1))−V2(x2(k)) ≤ 0. V2(x2(k)) is a Lyapunov function for the closed loop system, therefore the origin of the system is stable. If either the subsystem 1 or the subsystem 2 is strictly dissipative, then V2(x2(k + 1)) − V2(x2(k)) < 0. If both subsystem 1 and subsystem 2 are dissipative, and r1(u1(k), y1(k)) + r2(u2(k), y2(k)) < 0, then V2(x2(k + 1)) − V2(x2(k)) < 0. In either case, if V2(x2(k+1))−V2(x2(k)) = 0 then r1(u1(k), y1(k)) = r2(u2(k), y2(k)) = 0. So u1(k) = 0 and u2(k) = 0 by definition of supply rate. Thus y2(k) = 0 by constraints of interconnected systems. Since the system 2 is zero state observable, x2(k) = 0 . This shows that the origin of the system is GAS. 3.2.2 Example use of dissipativity on a simple network To introduce our proposed method, we start with an example of a recurrent neuron with a zero bias x(k + 1) = tanh(wx(k)). (3.4) 15 This system (3.4) has an equilibrium point xe = 0, and the transfer function tanh satisfies the condition 0 ≤ tanh(u) u ≤ 1. Now let’s transform the system (3.4) into two interconnected subsystems as in Fig. 3.1, where the subsystem 1 is y1(k) = tanh(u1(k)), the subsystem 2 is y2(k) = −wu2(k − 1), and u1(k) = −y2(k) and u2(k) = y1(k). Let’s choose a supply rate r1(u1(k), y1(k)) = u21 (k) − y2 1(k). Then r1(u1(k), y1(k)) = u21 (k) − tanh2(u1(k)). Thus r1(u1(k), y1(k)) ≥ 0. Therefore by definition 3.4, the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)). Let r2(u2(k), y2(k)) = −r1(u1(k), y1(k)). Then r2(u2(k), y2(k)) = −u21 (k) + y2 1(k) = u22 (k) − y2 2(k) = u22 (k) − w2u22 (k − 1). Let x2(k) = u2(k − 1). Then x2(k + 1) = u2(k) and y2(k) = −wx2(k). If u2(k) = 0 and y2(k) = 0, then x2(k) = 0. So the system 2 is zero state observable. Choose V2(x2(k)) = qx22 (k), where q > 0. Then V2(x2(k)) = qu22 (k − 1) and V2(x2(k + 1)) = qu22 (k). We claim that if w2 < 1, then the origin of (3.4) is GAS. Since w2 < 1, there exists q > 0 such that w2 < q < 1. So (1 − q)u22 (k) + (q − w2)u22 (k − 1) > 0. 16 It follows that u22 (k) − w2u22 (k − 1) > qu22 (k) − qu22 (k − 1). Thus V2(x2(k + 1)) − V2(x2(k)) < r2(u2(k), y2(k)). Therefore the system 2 is strictly dissipative with respect to the supply rate r2(u2(k), y2(k)). By Theorem 3.2, the origin is GAS for the closed loop system. In the next section, this result is generalized for LDDNs. 3.3 Stability analysis of Layered Digital Dynamic Networks (LDDNs) In this section, we introduce a general framework for representing recurrent neural networksthe Layered Digital Dynamic Network. We then show how this class of network can be represented in a standard form, and how the standard form can be represented in the interconnected system form, so that Theorem 3.2 can be used to demonstrate stability. By selecting different supply rate functions, we develop three different criteria for determining stability of LDDNs. 3.3.1 LDDNs In this section, we want to describe the LDDN framework, first introduced in [9]. An example LDDN is shown in Fig. 3.2. The net input nm(k) for layer m of an LDDN can be computed nm(k) = Σ l∈Lf m Σ d∈DLm;l LWm;l(d)al(k − d) + Σ l∈Im Σ d∈DIm;l IWm;l(d)pl(k − d) + bm (3.5) where pl(k) is the lth input to the network at time k, IWm;l is the input weight between input l and layer m, LWm;l is the layer weight between layer l and layer m, bm is the bias vector for layer m, DLm;l is the set of all delays in the tapped delay line between layer l 17 and layer m, Im is the set of indices of input vectors that connect to layer m, and Lf m is the set of indices of layers that connect directly forward to layer m. The output of layer m is am(k) = fm(nm(k)) (3.6) form = 1, 2, · · · , M. The set ofM paired equations (3.5) and (3.6) describes the LDDN. 3.3.2 Transform LDDNs to a standard form The LDDN is described by (3.5) and (3.6). Our goal in this section is to transform the LDDN into the form of (3.7), which we will call the standard form. It is assumed that the matrix W2 has the property that it can be transformed into a strictly triangular matrix through a reordering of the elements of the state vector x. This guarantees that x(k) can be solved iteratively from some initial x(k0). This is equivalent to the condition that the LDDN contains no zerodelay loops. x(k + 1) = f(W1x(k) +W2x(k + 1) + b) (3.7) Assume that inputs pl(k) are constant. Let Sm be the number of neurons in layer m. Let hm = Σ l∈Im Σ d∈DIm;l IWm;l(d)pl(k − d) + bm, b = [(h1)T 0 · · · (h2)T 0 · · · (hM)T · · · 0]T( S×1), dl = max {d ∈ DLm;lm = 1, 2, ..., M} , where S = ΣM i=1 Sidi. Consider layer m for m = 1, 2, ..., M. Let xm1 (k) = am(k − 1), xm2 (k) = am(k − 2), ..., xmd m(k) = am(k −dm). Then xm1 (k + 1) = am(k), xm2 (k + 1) = am(k −1), ..., xmd m(k + 1) = am(k − dm + 1). Let xm(k) = [xm1 (k) xm2 (k) ... xmd m(k)]T (Smdm×1). Let x(k) = [x1(k) x2(k) ... xM(k)]T( S×1). Then we get the standard form (3.7), where W1 and W2 are defined in (3.8) and (3.9), respectively. 18 S 1 x 1 S 2 x 1 S 3 x 1 S 1 x 1 S 2 x 1 S 3 x 1 S 1 x 1 S 2 x 1 S 3 x 1 R x 1 1 S 1 xR S 2 x S 1 S 3 x S 2 S1 S2 S3 n1(t) n2(t) n3(t) p1(t) a1(t) a2(t) a3(t) IW1,1 LW1,3 LW2,3 LW1,1 LW2,1 LW3,2 b1 b2 b1 1 1 3 R1 Inputs Layer 1 Layer 2 Layer 3 T D L T D L T D L T D L f 1 f 2 f 3 Figure 3.2: Example LDDN network W1 = [W1 i;j ](S×S) (3.8) where W1 i;i = LWi;i(1) LWi;i(2) · · · LWi;i(di − 1) LWi;i(di) ISi 0 · · · 0 0 0 ISi · · · 0 0 ... ... . . . ... ... 0 0 · · · ISi 0 (Sidi×Sidi) and W1 i;j = LWi;j(1) LWi;j(2) · · · LWi;j(dj − 1) LWi;j(dj) 0 0 · · · 0 0 0 0 · · · 0 0 ... ... . . . ... ... 0 0 · · · 0 0 (Sidi×Sjdj ) if i ̸= j. W2 = [W2 i;j ](S×S) (3.9) 19 where W2 i;j = [0](Sidi×Sjdj ) if i ≤ j, and W2 i;j = LWi;j(0) 0 · · · 0 0 0 · · · 0 ... ... . . . ... 0 0 · · · 0 (di×dj ) if i > j. Also, f = f1 f2 ... fm (S×1) where fi = fi idi ... idi (diSi×1) (3.10) where ISi is an identity matrix with dimensions (Si×Si), 0 is a zero matrix with appropriate dimensions, idi is a vector of identity functions, and fi is a vector of transfer functions of layer i for i = 1, 2, ..., M. For an LDDN, the order in which the individual layer outputs must be computed to obtain the correct network output is called the simulation order (see [9]). Here we have assumed that the layers are numbered so that the simulation order (which need not be unique) increases with layer number. Suppose that the LDDN of equations (3.5) and (3.6) has an equilibrium point. Then system (3.7) has an equilibrium point. Let xe be that equilibrium point. Then xe satisfies xe = f(W1xe +W2xe + b). Let z(k) = x(k) − xe. Then z(k + 1) = f(W1z(k) +W2z(k + 1) + te) − f(te) (3.11) where te = W1xe +W2xe + b. (3.12) Therefore system (3.11) has an equilibrium point ze = 0. If this equilibrium point is GAS, then the equilibrium point of system (3.7) and that of the LDDN are also GAS. The next step is to transform system (3.11) into the interconnected system form shown in Fig. 3.1. 20 Let u1(k) = −y2(k) = W1z(k) +W2z(k + 1) u2(k) = y1(k) = z(k + 1). Then subsystem 1 is y1(k) = f(u1(k) + te) − f(te) = g(u1(k)), (3.13) and subsystem 2 is y2(k) = −W1u2(k − 1) −W2u2(k), (3.14) and the constraints are u1(k) = −y2(k), u2(k) = y1(k). Define x2(k) = u2(k−1), then x2(k+1) = u2(k) and y2(k) = −W1x2(k)−W2u2(k). In this case, subsystem 1 is a static system of the form (3.1), and 2 is a dynamic system of the form (3.2). If u2(k) = 0 and y2(k) = 0 then x2(k) = 0, so the system 2 is zero state observable. Since ze = 0 is the equilibrium point of (3.11), it is also the equilibrium point of subsystems 1 and 2. Therefore, if the origin of the interconnected systems is GAS, then the equilibrium point of the LDDN is also GAS. To analyze stability of the equilibrium point, we follow Theorem 3.2. The following sections will perform this analysis and will develop several stability criteria. 3.3.3 Sector conditions Consider a scalar static system of the form (3.1), y = f(u). The function f is said to lie inside a sector [α, β] (written as f ∈ [α, β]) if α ≤ f(u) u ≤ β, ∀u ̸= 0 and f(0) = 0. This is called a sector condition. An example function f(u) (solid curve) and its bounds αu and βu (dashed lines) are shown in Fig. 3.3. 21 y = α∙u 0y y = β∙uy u= f(u) Figure 3.3: The function f(u) satisfying sector conditions 3.3.4 Supply rate using sector conditions  static/scalar It is possible to use sector conditions to design supply rate functions for static systems. In this section we consider the scalar version of the static system (3.1). Lemma 3.1 If the function f ∈ [α, β], and the supply rate is chosen as either r(u, y) = β2u2 − y2 (using the sector upper bound) or r(u, y) = (βu − y)(y − αu), (using the sector upper and lower bounds), then the scalar static system y = f(u) is dissipative with respect to the supply rate r(u, y). Proof. From the sector condition, it follows that β2u2 ≥ y2, (βu − y)(y − αu) ≥ 0. Thus r(u, y) = β2u2 −y2 ≥ 0 and r(u, y) = (βu −y)(y − αu) ≥ 0. In either case, r(u, y) ≥ 0. Thus the static system is dissipative with respect to the supply rate r(u, y) by definition 3.4. 22 3.3.5 Supply rate using sector conditions  static/vector In the case that system (3.1) is multiinput multioutput, let’s assume u = [u1 u2 · · · un]T , y = [y1 y2 · · · yn]T , and f = [f1 f2 · · · fn]T . We will consider functions f such that yi = fi(ui). Suppose that fi ∈ [αi, βi], for i = 1, 2, ..., n. Let A = diag(αi) and B = diag(βi). Lemma 3.2 If the supply rate is chosen as either r(u, y) = uTB2u − yT y or r(u, y) = (Bu − y)TT(y − Au) + uT y where T is a positive definite diagonal matrix and is a diagonal matrix with nonnegative elements, then the system y = f(u) is dissipative with respect to the supply rate r(u, y). Proof. Let T = diag(ti) and = diag(λi) where ti > 0, λi ≥ 0 for i = 1, 2, ..., n. Since fi ∈ [αi, βi], u2i β2 i − y2 i ≥ 0, so uTB2u − yT y = Σn i=1 (u2i β2 i − y2 i ) ≥ 0. Since ti > 0, λi ≥ 0 and fi ∈ [αi, βi], (βiui − yi)ti(yi − αiui) + uiλiyi ≥ 0, so (Bu − y)TT(y − Au) + uT y = Σn i=1 [(βiui − yi)ti(yi − αiui) + uiλiyi] ≥ 0. In either case, r(u, y) ≥ 0. Thus the system is dissipative with respect to the supply rate r(u, y) by definition 3.4. 3.3.6 Selecting the supply rate for LDDNs Consider an LDDN that has been put into the standard form (3.11) and then put into the interconnected systems form of (3.13) and (3.14). Assume that the function gi ∈ [αi, βi] for i = 1, 2, ..., S. Let A = diag(αi) and B = diag(βi). Choose V2(x2(k)) = xT2 (k)Qx2(k), where Q is a positive definite matrix. Supply rate using sector upper bounds Choose r1(u1(k), y1(k)) = uT1 (k)B2u1(k)−yT1 (k)y1(k). Then the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)) by Lemma 3.2. Choose r2(u2(k), y2(k)) = 23 −r1(u1(k), y1(k)). Then r1(u1(k), y1(k)) + r2(u2(k), y2(k)) = 0. Using the constraints of interconnected systems, we get r2(u2(k), y2(k)) = uT2 (k)u2(k) − yT2 (k)B2y2(k). Thus r2(u2(k), y2(k)) = uT2 (k)u2(k) − [W1u2(k − 1) +W2u2(k)]TB2[W1u2(k − 1) +W2u2(k)] = uT2 (k)u2(k) − [uT2 (k − 1)(W1)TB2W1u2(k − 1) + uT2 (k)(W2)TB2W1u2(k − 1) + uT2 (k − 1)(W1)TB2W2u2(k) + uT2 (k)(W2)TB2W2u2(k)]. Let P1 = I − Q − (W2)TB2W2 −(W2)TB2W1 −(W1)TB2W2 Q − (W1)TB2W1 (3.15) . Theorem 3.3 If a positive definite matrix Q can be found such that the matrix P1 is positive definite, then the equilibrium point of the LDDN is GAS. Proof. Since P1 is positive definite, [uT2 (k) uT2(k − 1)]P1[uT2 (k) uT2 (k − 1)]T > 0. Thus uT2 (k)[I − Q − (W2)TB2W2]u2(k) − uT2 (k)(W2)TB2W1u2(k − 1) − uT2 (k − 1)(W1)TB2W2u2(k) + uT2 (k − 1)[Q − (W1)TB2W1]u2(k − 1) > 0. It follows that V2(x2(k+1))−V2(x2(k)) < r2(u2(k), y2(k)). Thus the system 2 is strictly dissipative with respect to the supply rate r2(u2(k), y2(k)). In addition, the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)), as proved above. Therefore the origin of the interconnected system is GAS by Theorem 3.2. Consequently, the equilibrium point of the LDDN is GAS. 24 Supply rate using sector lower and upper bounds Choose r1(u1(k), y1(k)) = (y1(k)−Au1(k))TT(Bu1(k))−y1(k))+uT1 (k) y1(k), where T is a positive definite diagonal matrix and is an diagonal matrix with nonnegative elements. Then the system 1 is dissipative with respect to the supply rate r1(u1(k), y1(k)) by Lemma 3.2. Choose r2(u2(k), y2(k)) = −r1(u1(k), y1(k)). Then r1(u1(k), y1(k))+r2(u2(k), y2(k)) = 0. Using the constraints of interconnected systems, we get r2(u2(k), y2(k)) = (u2(k) + Ay2(k))TT(By2(k)) + u2(k)) + yT2 (k) u2(k). Thus r2(u2(k), y2(k)) = [(I − AW2)u2(k) − AW1u2(k − 1)]T × T[(I − BW2)u2(k) − BW1u2(k − 1)] + [−W1u2(k − 1) −W2u2(k)]T u2(k) = uT2 (k)(I − AW2)TT(I − BW2)u2(k) + uT2 (k − 1)(W1)TBTAW1u2(k − 1) − uT2 (k)(I − AW2)TT(BW1)u2(k − 1) − uT2 (k − 1)(AW1)TT(I − BW2)u2(k) − uT2 (k − 1)(W1)T u2(k) − uT2(k)(W2)T u2(k). Let P2 = P11 P12 P21 P22 (3.16) 25 where P11 =T − 1 2 (W2)T (B + A)T − 1 2 T(B + A)W2 + (W2)TBTAW2 − Q − 1 2 ((W2)T + W2), P12 = − 1 2 T(A + B)W1 + (W2)TBTAW1 − 1 2 W1, P21 =P12T , P22 =Q + (W1)TBTAW1. Theorem 3.4 The equilibrium point of the LDDN is GAS if a positive definite matrix Q, a positive definite diagonal matrix T and a positive semidefinite diagonal matrix can be found such that the matrix P2 is positive definite. Proof. The proof is similar to the proof of Theorem 3.3, but the matrix P2 is used in place of the matrix P1. Supply rate using general quadratic form Choose r1(u1(k), y1(k)) = vT Fv where v = [u1(k) y1(k)]T and F = F11 F12 F21 F22 . Assume that F satisfies the condition r1(u1(k), y1(k)) ≥ 0. (3.17) Let’s define P3 = P11 P12 P21 P22 26 where P11 = − F22 − 0.5 ∗ (F21 + FT 12)W2 − 0.5(W2)T (F12 + FT 21) − (W2)TF11W2 − Q P12 = − (W2)TF11W1 − 0.5 ∗ (F21 + FT 12)W1 P21 = − (W1)TF11W2 − 0.5 ∗ (W1)T (F12 + F21) P22 =Q − (W1)TF11W1. Theorem 3.5 The equilibrium point of the LDDN is GAS if there exists a matrix F satisfying condition (3.17) and a positive definite matrix Q such that the matrix P3 is positive definite. Proof. The proof is similar to the proof of Theorem 3.3, but the matrix P3 is used in place of the matrix P1. We can see that Theorems 3.3 and 3.4 are special cases of Theorem 3.5. • If F11 = B2, F22 = −I and F12 = F21 = 0 then P3 = P1. • If F11 = −BTA, F22 = −T, F12 = BT + and F21 = TA then P3 = P2. Since the output y1(k) is a static function of u1(k), the matrix F doesn’t need to be positive definite. As we can see in Theorem 3.3, F11 = B2 is a positive definite matrix, F22 = −I is negative definite and F12 = F21 = 0. So in this case the matrix F is not positive definite. Based on this theorem we may find other stability criteria for the standard form. Conclusions In this section, we have found three new conditions for globally asymptotic stability of equilibrium points of LDDNs. The different conditions were derived by selecting different supply rate functions. In Theorem 3.3, we used the upper bound on the sector condition of the static subsystem. In Theorem 3.4, we used both the sector upper and lower bounds to define the supply rate. In Theorem 3.5, we used a general quadratic supply rate. 27 3.3.7 Stability analysis of other types of Recurrent Neural Networks Now we will apply Theorems 3.3, 3.4 and 3.5 to analyze the stability analysis of other types of RNNs. First we consider the network given by Barabanov [4, p. 292]. This network can be transformed into standard form (3.7) as follows. Let x(k) = [x1(k) x2(k) ... xn(k)]T , b = [b1 b2 ... bn]T , W1 = W1 0 0 · · · Vn 0 W2 0 · · · 0 0 0 W3 · · · 0 ... ... ... . . . ... 0 0 0 · · · Wn , and W2 = 0 0 0 · · · 0 V1 0 0 · · · 0 0 V2 0 · · · 0 ... ... ... . . . ... 0 0 0 Vn−1 0 . The next step is to apply Theorems 3.3, 3.4 and 3.5 to check stability of the equilibrium point of this network. Some examples are shown in the next chapter. Next, we consider a class of RNNs given in [1, p. 955], [25, p. 1105], [26, p. 1373] and [27, p. 130]. The network has the form z(k + 1) = Dz(k) + Eg(Wz(k) + s1) + s2 (3.18) where D, E and W are matrices with appropriate sizes, s1 and s2 are bias vectors with appropriate size and g is a vector of activation functions in the network. We can transform 28 this system into standard form (3.7) by defining new state variables as follows: x1(k + 1) = g(Wz(k) + s1) x2(k) = z(k) Then x1(k + 1) = g(Wx2(k) + s1) x2(k + 1) = Dx2(k) + Ex1(k + 1) + s2). So we have W1 = 0 W 0 D ,W2 = 0 0 E 0 , (3.19) f = [g id]T and b = [s1 s2]T , where 0 is a zero matrix with appropriate size. Thus system (3.18) is in standard form (3.7). The strategy for stability analysis of RNNs is to transform the network into the interconnected systems form and then to apply Theorem 3.2. If the network can be represented in standard form (3.7), then we can use Theorems 3.3, 3.4 and 3.5. 3.4 Conclusions This chapter has developed new criteria for analyzing the stability of recurrent neural networks. A general framework for representing recurrent networks was presented, and we demonstrated how these networks could be put into a standard form and then into an interconnected systems form. This enabled us to use Theorem 3.2 to derive stability conditions. The key to the development of the stability criteria was to establish sector conditions on the static subsystem. By using the sector bounds, we were able to define different supply rate functions. It is the flexibility in selecting the supply rate function that makes the use of dissipativity theory so attractive for stability analysis of recurrent neural networks. 29 In the next chapter we will demonstrate the performance of our new stability criteria on a variety of recurrent networks. We will also compare our new criteria with other stateofthe art methods. 30 CHAPTER 4 COMPARISON OF STABILITY CRITERIA ON TEST PROBLEMS In this chapter, we will compare our new dissipativitybased (DB) criteria with Liu’s criterion [6, p.1382] and Barabanov’s LMI criterion [18, p. 4554] on 23 test problems. The test neural networks are in the form of (5) in [4, p. 294], (3.7) or (4.3). We have chosen some networks that have been introduced in previous papers by other authors. We have also used new networks that we have developed. We have chosen networks based on the difficulty of determining the stability of their equilibrium points. 4.1 Sector conditions We will use networks with activation function a = tanh(n). Therefore, as part of the stability analysis, we will need to find upper and lower bounds of the sector of the following function f(u) = tanh(u + t) − tanh(t), (4.1) where t is a constant. In order to determine sector upper bounds, we will determine the argument that maximizes the value of function f(u) u . The following lemma can be used to determine the lower bound of the function (4.1). Lemma 4.1 [4, p. 298] Consider a function (4.1). If u + t ≤ r, then f(u) u ≥ ν = tanh(r)−tanh(t) r−t . If t = r, then ν = d(tanh(u)) du at u = r. There are several ways to find r, depending on the type of system. We will discuss this later. 31 Since we will be comparing our DB criteria with those of Liu and Barabanov, we will give a brief description of their criteria in the next two sections. 4.2 Existing stability criteria 4.2.1 Liu’s criterion Liu’s model is called the Discretetime Delayed Standard Neural Network Model (DDSNNMs) [6, p. 1378]: x(k + 1) = Alx(k) + Bp (ξ(k)) + Bpd (ξ(k − h)) ξ(k) = Cqx(k) + Dp (ξ(k)) + Dpd (ξ(k − h)) (4.2) Assume i(ξi) ∈ [qi, ui], ui > qi ≥ 0 and h is constant. Define Q = diag(qi) and U = diag(ui). Theorem 4.1 [6, p. 1382] The origin of the DDSNNM (4.2) is globally asymptotically stable, if there exist symmetric positivedefinite matrices P and , and diagonal positive semidefinite matrices and T, such that the following matrix is negative definite G = G11 G12 G13 G21 G22 G23 G31 G32 G33 32 where G11 =ATl PAl − P − 2CTq TQUCq G12 =ATl PBp + CTq − 2CTq TQUDp + CTq (Q + U)T G13 =ATl PBpd − 2CTq TQUDpd G21 = GT 12 G31 = GT 13 G32 = GT 23 G22 =BTp PBp + Dp + DTp + − 2DTp TQUDp − 2T + DTp (Q + U)T + T(Q + U)Dp G23 =BTp PBpd + Dpd − 2DTp TQUDpd + T(Q + U)Dpd G33 =BT pdPBpd − − 2DT pdTQUDpd 4.2.2 Barabanov and Prokhorov’s LMI criterion Consider the RNN [18, p. 4553] x(k + 1) = Dx(k) + Etanh(Wx(k) + s1). (4.3) Let z be the equilibrium point of (4.3). Then z = Dz + Etanh(Wz + s1) Define y = x − z and c = Wz + s1. Then y(k + 1) = Dy(k) + Eη(k) η(k) = tanh(σ(k) + c) − tanh(c) (4.4) where σ(k) = Wy(k). Assume ηi(σi) ∈ [νi, μi] for i = 1, 2, ...,m where m is the length of s1. Let N = diag(μi) and M = diag(νi). 33 Lemma 4.2 [18, p. 4554] If ci ≥ cj  then βij = 1 − tanh(αsign(ci)cj) 1 − tanh(αci) otherwise βij = 1 − tanh(αsign(cj)ci) 1 − tanh(αcj ) where α > 0. By Lemma 4.2, ηTG(αWy − η) ≥ 0, where G = {gij} is a symmetric positive definite matrix that satisfies gij < 0 for i ̸= j, gii + Σm j=1;j̸=i βijgij > 0 (4.5) for all i = 1, ...,m. Theorem 4.2 [18, p. 4554] Consider the system (4.3). Assume D + EMW is stable. If there exists diagonal positive definite matrix and symmetric positive definite matrices H and G, such that G satisfies the condition (4.5) and = 11 12 21 22 is negative definite, then the equilibrium point is GAS, where 11 =DTHD − H − WTNMW 12 =WT ((M + N) + Gα)/2 + DTHD 21 = T 12 22 =ETHE − − G To compute the lower bound matrix M, we will use Lemma 4.1. Thus we have to find rj such that σj + cj  ≤ rj for j = 1, 2, ...,m. In the general case for D ̸= 0 we can choose M = 0. If D = 0, then yi ≤ Σm j=1 Eij  for i = 1, 2, ..., n. Let’s define γi = Σm j=1 Eij  and rj = Σn i=1 Wjiγi. Therefore σj + cj  ≤ rj . 34 4.3 Description of test problems We will compare our new DB stability criteria with those of Liu and Barabanov on 22 different test networks. All of the networks have a GAS equilibrium point. For the first 12 networks, the DBbased criteria are able to prove stability. For the next 10 networks, DB criteria are not able to detect stability. The final test network (the 23rd test) is an LDDN network that cannot be put into Liu’s or Barabanov’s form, so their methods cannot be applied to this network. Even in those cases where the methods are not able to prove stability, we would like to measure how close they come. Each of the criteria involves finding matrices that are definite. For the three DB criteria, we attempt to find matrices P1, P2 and P3 that are positive definite. (For the test problems, we will use only P2, from Theorem 3.4, which has produced the best results.) For Liu’s and Barabanov’s methods, we are looking for G and matrices that are negative definite. To measure how close we come to finding matrix P2 that is positive definite, we will find its minimum eigenvalue, which we will label p2. If p2 is positive, then GAS is proved. If it is negative, then we cannot prove stability. However, even in this case, the closer the value is to zero, the closer the algorithm has come to identifying stability. This concept applies also to the maximum eigenvalues of G and , which we will label g and φ. If these values are negative, then GAS is proved. If they are positive, we have not proven stability. However, the closer g and φ are to zero, the closer the algorithm has come to identifying stability. All of the 23 test problems are described in the appendix. They are represented in either the standard form (3.7), in which case W1, W2, b and f will be provided, or in the Barabanov form (4.3), in which case W, E and s1 will be given (D = 0 for the Barabanov test problems). On the first 12 test problems, all matrices for the DB criterion, Liu and Barabanov’s criteria are also provided in the appendix. In the next section we analyze stability of neural networks with stable equilibrium points where our DB methods can prove stability. In the following section we focus on 35 neural networks with stable equilibrium points where these methods cannot prove stability. Finally, we will analyze the stability of an LDDN, where Liu’s and Barabanov’s methods cannot be applied. 4.4 Stable equilibria that can be proved stable In this section, we analyze the stability of equilibrium points for test problems 1 to 12. The first four test problems were first presented in previously published papers. Test problem 1 is a special case of standard form (3.7) where W2 = 0, test problem 2 has the form of (4.3) and test problems 3 and 4 are have the form of (5) in [4, p. 294]. The remaining 8 test problems are in the form of (4.3) with D = 0. We will use the new DB criteria, Liu’s criterion and Barabanov’s criterion to check stability of equilibria. Eigenvalues p2, g, and φ for each test problem are shown in Table 4.1. The equilibrium point is proven to be GAS if p2 > 0, g < 0 or φ < 0. Thus the criteria P2, G and all proved that the equilibrium points of test problems 1 through 12 are GAS. From this table we can see that our DB criterion is as tight as both Liu’s criterion [6, p. 1382] and Barabanov’s LMI criterion [18, 4554] on the first 12 test problems. Fig. 4.1 and Fig. 4.2 are representative of the types of responses that we have in these problems. We have oscillatory responses as well as overdamped responses. Table 4.2 gives an approximate measure of how quickly the systems converge to the equilibrium point from a random initial condition. Because the systems are nonlinear, these convergence times will change with the initial conditions, but these numbers are representative. In the next section we will investigate systems with GAS equilibria for which the DB stability criteria cannot determine stability. 36 TP. p2 g φ 1 0.34743 −0.57805 −0.49272 2 14.625 −147.31 −128.83 3 7.1617 −4.7596 −1.5471 4 34.342 −45.769 −75.599 5 5.9322 −61.345 −42.564 6 1.8062 −7.5758 −8.5027 7 1.1116 −34.491 −22.097 8 29.101 −136.46 −212.8607 9 59.747 −242.07 −393.64 10 19.376 −119.87 −156.07 11 6.8088 −30.292 −16.997 12 0.2592 −12.421 −8.8994 Table 4.1: Eigenvalues of matrices −0.49 −0.485 −0.48 −0.475 −0.47 −0.465 −0.46 −0.455 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 x 1 x 2 Figure 4.1: Trajectory for test problem 1 37 −7 −6 −5 −4 −3 −2 −1 0 1 2 −35 −30 −25 −20 −15 −10 −5 0 5 x 1 x 2 Figure 4.2: Trajectory for test problem 2 TP. N 01 15 02 14 03 28 04 35 05 06 06 07 07 05 08 10 09 07 10 07 11 03 12 08 Table 4.2: Time steps N to convergence 38 4.5 Stable equilibria that cannot be proved stable In this section, we will use our DB criterion, Liu’s criterion and Barabanov’s LMI criterion to check stability of equilibrium points of test problems 13 to 22. All of these test neural networks are in the form of (4.3) with D = 0, except test problem 13, which is a special case of (3.7) where W2 = 0. The equilibrium points here are all GAS. The best values of p2, g and φ are shown in Table 4.3. Barabanov’s LMI criterion was the only one that was TP. p2 g φ 13 −4.35 ∗ 10−8 1.16 ∗ 10−7 4.0163 14 −4.16 ∗ 10−4 6.57 ∗ 10−8 −2.8767 15 −2.52 ∗ 10−4 7.67 ∗ 10−5 −0.24913 16 −5.63 ∗ 10−4 5.63 ∗ 10−8 1.4 ∗ 10−4 17 −3.18 ∗ 10−5 3.72 ∗ 10−8 4.47 ∗ 10−5 18 −0.0033 4.67 ∗ 10−7 2.7698 19 −0.0123 3.03 ∗ 10−7 2.53 20 −1.80 ∗ 10−4 2.29 ∗ 10−6 7.98 ∗ 10−5 21 −2.45 ∗ 10−4 4.09 ∗ 10−7 3.64 ∗ 10−5 22 −1.84 ∗ 10−5 1.71 ∗ 10−8 8.57 ∗ 10−6 Table 4.3: Eigenvalues of matrices able to prove stability of equilibria in any of the test problems, and it only worked for test problems 14 and 15. In these two networks, the matrixWis an identity matrix and the bias vector s1 = 0. Fig. 4.3 and Fig. 4.4 are representative of the responses of test problems 13 through 22. All of these responses are oscillatory. Table 4.4 shows approximate convergence times from random initial conditions. We can see that, although all of the responses are oscillatory, there is a wide range of response times  some of them in the same range as the first 12 systems (see Table 4.2). It seems clear that when the system is of higher dimension, has 39 oscillatory behavior and has slow convergence, it becomes more difficult for all methods to determine stability. However, these parameters do not completely determine the success of the various methods. In terms of the eigenvalues,  g is smallest for all test problems, except 13, 14 and 15. For test problems 18 and 19, the oscillation is longer (approximately 600 time steps) and the size of matrices E andWis bigger (2 × 10). In these cases,  g < p2 < φ. The weight matrices and bias vectors of test problems 20, 21 and 22 are the same size, but N is 40, 10 and 5, respectively. In this case,  g, p2, and φ became smaller, as shown in Table 4.3. −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 x 1 x 2 Figure 4.3: Trajectory for test problem 15 40 −2.34 −2.32 −2.3 −2.28 −2.26 −2.24 −2.22 −0.65 −0.6 −0.55 −0.5 −0.45 x 1 x 2 Figure 4.4: Trajectory for test problem 22 TP. N 13 30 14 300 15 3500 16 250 17 90 18 20 19 600 20 40 21 10 22 5 Table 4.4: Time steps N to convergence In summary, for test problems in the form of (4.3), the oscillation of system trajectories, increases in the sizes of system matrices, and slower convergence times tend to increase the difficulty of determining stability. Barabanov’s LMI criterion is less conservative than our criterion and Liu’s criterion on test problems 14 and 15 where the neural network is a very 41 special case of (4.3). 4.6 An example of stability analysis of LDDNs In this section, we analyze stability of test problem 23, which is an LDDN. We found p2 = 2.6051. Therefore, this proves that the equilibrium point is GAS. Since the function f10 is linear, α10 = β10 = 1. This doesn’t satisfy Liu’s condition ui > qi ≥ 0. So we cannot use Liu’s criterion. Since we cannot represent the LDDN in the form of (4.3), Barabanov’s LMI criterion cannot be applied to analyze stability of this network. Therefore, neither Liu’s criterion nor Barabanov’s LMI criterion can be applied to check stability of the equilibrium point. 4.7 Conclusions The second DB criterion, Theorem 3.4, is as tight as both Barabanov’s LMI criterion and Liu’s criterion for the first 12 test problems, but fails to prove stability of equilibria for test problems 13 to 22. Liu’s criterion also fails to prove stability of these GAS equilibrium points. Barabanov’s LMI criterion can prove stability of the equilibrium points for test problems 14 and 15 where W = I and s1 = 0. In general, when a system has more oscillatory responses, larger system matrices and slower convergence, all of the methods described in this paper have more difficulty in determining stability. To analyze the stability of LDDNs using either Liu’s criterion or Barabanov’s LMI criterion, we need to have state transformations which can convert the standard form (3.7) into their corresponding models: (4.2) and (4.3). This is not always possible, as in test problem 23. There do exist LDDNs that cannot be analyzed with either Liu’s or Barabanov’s methods. The DB methods developed in this paper can be applied to any LDDN network. Our DB criteria can be applied to analyze the stability of equilibrium points for neural networks of forms (1) in [4, p. 292], (3.7), and (4.3). 42 Another advantage of the DB criteria when compared with Barabanov’s method for neural networks in the form of (1) in [4, p. 292] is that the dimensionality of the matrices involved in the DB method are generally smaller. This is because of the use of the state space extension method in [4], which requires that the number of states be increased substantially in these cases. The dissipativity approach has not been used before for the stability analysis of recurrent neural networks. The results shown in this chapter demonstrate the promise of this approach. In addition, with the dissipativity method there is the potential for additional stability criteria to be developed. This is because of the flexibility in choosing the supply rate function. 43 CHAPTER 5 TRAINING RECURRENT NETWORKS FOR STABILITY In this chapter, we will apply the novel stability analysis methods presented in the previous chapter to the problem of training recurrent neural networks while maintaining stability. It has been shown [17] that the error surfaces of recurrent neural networks can have spurious valleys that can cause training difficulties. These valleys are caused by network instabilities. If we can maintain network stability during training, we could avoid the spurious valleys. In this chapter, we describe a new training method for maintaining network stability. The first step is to define a new performance index, which combines mean square error with the maximum eigenvalue of the matrix −P2 from (3.16). If this maximum eigenvalue is less than zero, the network is guaranteed to be stable. By minimizing this maximum eigenvalue, we have the best chance of maintaining stability and avoiding the spurious valleys. The next section describes the modified performance index, and the following section describes how the gradient of this performance index can be computed. The gradient is needed for the training algorithm, which finds the network weights and biases that minimize the performance index. 5.1 Modified performance index Let’s consider an LDDN network, as in (3.5) and (3.6). If we represent the network in standard form, we have the system (3.7). Assume that a set of training data is provided {p1, t1}, {p2, t2}, ..., {pq, tq}, (5.1) 44 where pi is an input to the network and the corresponding target response is ti. Now we define a modified performance index as follows J = 1 q Σq i=1 (ti − aM(i))T (ti − aM(i)) + σλ (5.2) where aM is the network output, σ is a constant and λ is the maximum eigenvalue of the matrix −P2 (3.16). Next we will compute the gradient of the performance index J with respect to weights and biases of the network. 5.2 Gradient computation In this work, we use the scaled conjugate gradient algorithm [28] to update weights and biases. Thus, the command ’trainscg’ in the Neural Network Toolbox of Matlab [29] will be modified to train LDDNs with the modified performance index (5.2). Let mse = 1 q Σi=q i=1 (ti − aM(i))T (ti − aM(i)) (5.3) Then ∂J ∂x = ∂(mse) ∂x + σ ∂λ ∂x (5.4) where x is a vector of weights and biases of the network. To calculate the first derivative of J with respect to weights and biases, we separately compute the first derivative of the mean square error (mse) and the first derivative of λ. We can use the standard backpropagation algorithm [9] to compute @(mse) @x . However, to compute the derivative of λ with respect to x, we need a novel development. In the next section, we will demonstrate how to find the derivative of an eigenvalue with respect to an element of the matrix. Then, we show how to find the derivative of the eigenvalue of −P2 with respect to network weights. 45 5.2.1 The derivative of eigenvalue In this section, a method for computation of eigenvalue derivatives is reviewed [30]. Let K(p) ∈ Cn×n be an nondefective matrix [30], where p is a scalar variable and C is the set of complex numbers. Let (p) ∈ Cn×n be the eigenvalue matrix of K(p) and X(p) ∈ Cn×n be a corresponding eigenvector matrix of K(p). Then K(p)X(p) = X(p) (p) (5.5) Assume K(p), (p) and X(p) are differentiable in a neighborhood of p = p0. Taking the first derivative both sides of the equation (5.5), we obtain K′ (p)X(p) + K(p)X′ (p) = X′ (p) (p) + X(p) ′ (p) (5.6) Left multiplying both sides of (5.6) by X−1(p) results in X−1(p)K′ (p)X(p) + X−1(p)K(p)X′ (p) = X−1(p)X′ (p) (p) + X−1(p)X(p) ′ (p) (5.7) Since K is nondefective, the eigenvectors are independent. Thus, there exists a matrix C such that X′ (p) = X(p)C (5.8) Plugging (5.8) into (5.7), we get X−1(p)K′ (p)X(p) − ′ (p) = − (p)C + C (p) (5.9) Let ′ (p) = diag(λk) for k = 1, 2, ...n, X−1(p) = [y1 y2 ... yn]T and X(p) = [x1 x2 ... xn]. Then, for k = 1, 2,..., n λ ′ k = yTk K′ (p)xk. (5.10) Now we want to take the first derivative of the maximum eigenvalue of the matrix K(p) with respect to p. Let λm be the maximum eigenvalue of K(p). Then λm = max(λi) for i = 1, 2, ... n. Thus λ ′ m = yT mK′ (p)xm. (5.11) 46 where vectors ym and xm are associated with the eigenvalue λm. Therefore, λ ′ m p=p0 = yT mK′ (p)p=p0xm. (5.12) We will use this method to compute the first derivative of the maximum eigenvalue λ with respect to weights in the next section. 5.2.2 The derivative of maximum eigenvalue In this section, we compute the first derivative of the maximum eigenvalue of the matrix −P2 with respect to weights. From (3.16), we define = −P2 = 11 12 21 22 (5.13) where 11 = − T + 1 2 (W2)T (B + A)T + 1 2 T(B + A)W2 − (W2)TBTAW2 + Q + 1 2 ((W2)T + W2), 12 = 1 2 T(A + B)W1 − (W2)TBTAW1 + 1 2 W1, 21 = T 12, 22 = − Q − (W1)TBTAW1, W1 i;j = [w1 i;j ]S×S and W2 i;j = [w2 i;j ]S×S. Let λm be the maximum eigenvalue of . In this case, is a real, symmetric matrix. So λm is a real number, depending on the weights. For convenience, let’s define Zk;l = [zi;j ]S×S, where zi;j = 1, if i = k and j = l 0, if others (5.14) For example, if i = 2, j = 3 and S = 3 then Z2;3 = 0 0 0 0 0 1 0 0 0 47 First, we need to compute the first derivative of with respect to a weight w, where w is either w1 i;j or w2 i;j . Keep in mind that w2 i;j = 0 when i ≤ j, so we only consider the case w2 i;j with i < j. We have ∂ ∂w = @ 11 @w @ 12 @w @ 21 @w @ 22 @w (5.15) For w = w1 i;j , ∂ 11 ∂w1 i;j =0 ∂ 12 ∂w1 i;j = 1 2 T(A + B)Zi;j − (W2)TBTAZi;j + 1 2 Zi;j ∂ 21 ∂w1 i;j =( ∂ 12 ∂w1 i;j )T ∂ 22 ∂w1 i;j = − (Zi;j)TBTAW1 − (W1)TBTAZi;j For w = w2 i;j , ∂ 11 ∂w2 i;j = 1 2 (Zi;j)T (B + A)T + 1 2 T(B + A)Zi;j − (Zi;j)TBTAW2 − (W2)TBTAZi;j + 1 2 ((Zi;j)T + Zi;j), ∂ 12 ∂w2 i;j = − (Zi;j)TBTAW1 ∂ 21 ∂w2 i;j =( ∂ 12 ∂w1 i;j )T ∂ 22 ∂w2 i;j =0 Then, let’s assume that we want to take the first derivative of λm at a certain point W1 = W10 and W2 = W20 . Let K = (W10 ,W20 ). Let = diag(λk) for k = 1, 2, ...n be the eigenvalues and X = [x1 x2 ... xn] be the corresponding eigenvectors of the matrix K, where n = 2 ∗ S. Assume X−1 = [y1 y2 ... yn]T . Let λm be the maximum eigenvalue, with associated xm and ym. From (5.12) and (5.15), we get ∂λm ∂w1 i;j = yT m ∂ ∂w1 i;j xm (5.16) 48 ∂λm ∂w2 i;j = yT m ∂ ∂w2 i;j xm (5.17) Finally, we can calculate the first derivative of the maximum eigenvalue with respect to any weight of the LDDN. Steps for the calculation of the derivative of the maximum eigenvalue are as follows. • GivenW1 = W10 and W2 = W20 . Compute K = (W10 ,W20 ). • Find eigenvalues and eigenvectors of the matrixK: = diag(λk) and X = [x1 x2 ... xn]. • Find X−1 = [y1 y2 ... yn]T . • Find λm = max(λk). • Compute @ m @w1 i;j  W10 ;W20 = yT m @ @w1 i;j  W10 ;W20 xm. • Compute @ m @w2 i;j  W10 ;W20 = yT m @ @w2 i;j  W10 ;W20 xm if i < j We will modify the command ’trainscg’ in the Neural Network Toolbox to train the LDDN with the modified performance index. The regular trainscg already computes the derivative of mse with respect to the weights and biases. Now we will use the steps above to compute the derivative of the maximum eigenvalue with respect to weights. We will add this part to the regular trainscg with a penalty parameter σ. The modified trainscg will be used to train controller networks in next chapter. 49 CHAPTER 6 SIMULATIONS AND TEST RESULTS FOR STABLE TRAINING In the previous chapter, we proposed a modified recurrent neural network training algorithm for maintaining network stability. In this chapter, two examples will be used to demonstrate the method. These examples are both model reference control (MRC) systems. The controllers of these systems are LDDNs. We will use the modified algorithm to train the controller networks while maintaining system stability. In the next section, a brief introduction to MRC systems is given. In the following section, a controller network for a linear MRC system will be trained. In the final section, we will train a controller network for a nonlinear MRC system. 6.1 Model reference control (MRC) using recurrent neural networks In this section, we provide a brief description of neural networkbased MRC (NNbased MRC) systems. An MRC system has the general structure shown in Fig. 6.1. In this figure, the plant model is chosen such that the plant model output is as close to the plant output as possible. The reference model represents the desired response of the closedloop system. The controller is designed so that the plant output closely tracks the output of the reference model. 50 Plant Plant Model Controller Reference Model Reference Input Control Input Model Error Control Error Plant Output Figure 6.1: Model reference control system Based on this idea, NNbased MRC systems were introduced in [31]. The NNbased MRC structure is shown in Fig. 6.2, where the plant model and the controller are neural networks. In order to design the NNbased MRC controller, first we have to train the NN plant model using the data observed from the input and the output of the plant. Then we have to choose a reference model whose response represents the desired behaviour of the plant. We will collect a training data set from the reference model. Then, we train the NN controller so that the control error is small enough while the MRC system remains stable. In the next section, we will show how to do the plant training. Plant NN Plant Model NN Controller Reference Model Reference Input Control Input Model Error Control Error Plant Output Figure 6.2: NNbased model reference control system 51 6.1.1 Plant training The plant training process includes two stages. The first stage is to train the openloop network (onestepahead training) and the final stage is to train the closed loop network. First, we create a training data set. An input signal P, a series of step functions with a random magnitude and random width, will be generated. This input signal is applied to the plant. At the same time, we sample the plant output T. The sequences P and T will be used as data for plant training. The NN plant model is shown in Fig. 6.3. This network consists of two layers with tanh transfer function in the first layer and linear transfer function in the second layer. The number of neurons in the output layer depends on the number of outputs of the plant. In our examples, the plant is single input and single output. So there is one neuron in the second layer and one input to the network. f1 IW1,1 b1 T D L f2 LW2,1 b2 LW1,2 T D L a a2(t) 1(t) 1 1 p(t) Figure 6.3: Neural network plant model Next, we perform onestepahead training for the NN plant. To do this, we cut the feedback loop and use the network output as the second input to the network. The open loop network is shown in Fig. 6.4. The sequence P is applied to the first input and the sequence T is applied to the second input. The corresponding network output a2 will be compared with the target T. This model error will be used to update weights and biases. 52 f1 IW1,1 b1 T D L f2 LW2,1 b2 IW1,2 T D L a a2(t) 1(t) 1 1 P T Figure 6.4: Network architecture for onestepahead plant training After training the open loop network, we close the network by connecting the network output to the second input. The closed loop network is the original network shown in Fig. 6.3. We use the trained weights and biases from the one stepahead training as initial weights and biases for the twostepahead training. Now we need to prepare the data for training. Assume that d is the maximum number of delays in the network. The sequences P and T will be divided into subsequences with a length of d + 2. Each subsequence of P will be applied to the input of the closed loop network, and the corresponding network output will be compared to the corresponding T subsequence. The model error will be used to update the weights and biases. We will do the same thing for kstephead training with k ≥ 3, but each preceeding subsequence has d + k data points, and the initial weights and biases are taken from the preceeding k − 1 stepahead training. This training process will be ended when the subsequence has the same length as the original P sequence. The weights and biases from the final training will be used to do the controller training in the next section. 6.1.2 Controller training In this section, we will show how to train the controller network of NNbased MRC systems. First, an NN controller is created. The controller network and the plant network will be combined as in Fig. 6.5. In this figure, the NN plant includes layers 3 and 4. Its 53 weights and biases are taken from the plant training, and they are not adjusted during controller training. Layers 1 and 2 make up the NN controller. In our examples, the controller network has two inputs: the first input is the reference input and the other input is the NN plant output. Since we will use the modified performance index in (5.2) to train the controller network, which involves the matrix P2 of (3.16), we have to find matrices Q, T and . Initial weights and biases of the controller network will be chosen as small random numbers or zeros for the second layer. This will increase the chance of getting stable weights and biases. From these initial weights and biases of the controller network, with the trained weights and biases of the NN plant, we will compute the initial Q, T and matrices. These matrices will be recalculated after each kstepahead training segment. f3 LW3,2 b3 T D L f4 LW4,3 b4 LW3,4 T D L a a4(t) 3(t) 1 f1 IW1,1 b1 T D L f2 LW2,1 b2 LW1,4 T D L a2a (t) 1(t) LW1,2 T D L p(t) Figure 6.5: Neural network plant model and neural network controller Next, we will do one stepahead training. To do this, two output feedback loops will be opened. Thus the output becomes the second input of the network, as in Fig. 6.6. Therefore, LW1;4 becomes IW1;2, LW3;4 becomes IW3;2. The training data will be generated from the reference model. When we train this open loop network, the weights and biases in layers 3 and 4 are kept unchanged, and only the weights and biases in the first two layers are updated. 54 f3 LW3,2 b3 T D L f4 LW4,3 b4 IW3,2 T D L a a4(t) 3(t) 1 f1 IW1,1 b1 T D L f2 LW2,1 b2 IW1,2 T D L a2a (t) 1(t) LW1,2 T D L P T Figure 6.6: Open loop network for controller training After one stepahead training, we will do k stepahead training as we did for the plant training. We will use the network in Fig. 6.5 for k stepahead training. Thus, IW1;2 becomes LW1;4 and IW3;2 becomes LW3;4. When we train the controller network, the weights and biases of the plant network are kept constant. 6.2 Design the MRC for a linear system In this section, we design an NN controller for a linear plant using the modified trainscg. Our object here is to illustrate and to verify the proposed method. 6.2.1 Plant model The linear plant that we have chosen to demonstate the modified algorithm with is G(z) = z−1 1 − z−1 + 0.25z−2 (6.1) The NN representation for this plant is n1(k) = IW1;1u(k − 1) + LW1;1[a1(k − 1) a1(k − 2)]T 55 a1(k) = n1(k) (6.2) where u(k) is the input to the NN plant, IW1;1 = 1, LW1;1 = [1 − 0.25] and a1 is the NN plant output. The plant network is shown in Fig. 6.7. It has one neuron and the activation function is linear. p(t) f1 IW1,1 1 LW 1 1,1 2 a1(t) Figure 6.7: The plant network In this example, we don’t need to train the NN plant because it is known. The next step is to choose a reference model. 6.2.2 Model reference We choose the following continuoustime reference model: G(s) = 144 s2 + 24s + 144 (6.3) We sample this model every 0.01 sec to generate a training data set. The reference input and the output are shown in Fig. 6.8. The next step is to create an NN controller and train it. 56 0 200 400 600 800 1000 1200 1400 1600 1800 2000 1 0.5 0 0.5 1 Reference Input P 0 200 400 600 800 1000 1200 1400 1600 1800 2000 1 0.5 0 0.5 1 Targer Output T Figure 6.8: The reference input and the target 6.2.3 Controller training In this section, we will train a controller network using the modified trainscg. The NN controller has one neuron with a linear activation function, delays 1 and 2 from the neuron output, delay 1 from the input, and delays 1 and 2 from the network output. The NNbased MRC system is shown in Fig. 6.9. 57 f1 IW 1 1,1 LW1 1,2 2 a1p(t) (t) f2 LW2,1 1 LW1 2,2 2 a2(t) LW1 1,1 2 Figure 6.9: NNbased MRC system for the linear plant The NN controller will be trained with different values for the penalty term coefficient σ. By increasing σ, we can increase the weight on λ and force the system to be stable. We use random weights in the stable area as initial weights of the controller network. First, small random weights are chosen. Then we check the stability of the network. If it is unstable, we multiply all these weights by a number less than 1 and check stability again. We keep doing this until the system is stable. From these initial stable weights and the weights of the NN plant, we find the matrices Q, T and for onestepahead training. After each k stepahead training stage, we recalculate these matrices from the current weights of the network. Table 6.1 shows the maximum closed loop pole magnitude (MPM), the maximum eigenvalue (λm) of −P2, the mean square error (MSE) and the maximum absolute error (MAE) after 1998 stepahead training for different values of σ. It can be seen that when σ increases, the maximum pole magnitude goes down, which indicates greater stability margin, and the error as well as the mean square error goes up. For all cases, MPM < 1 and λm < 0, which means that the system is stable. In conclusion, the modified trainscg works well for the linear NNbased MRC system. In the next section, we will demonstrate the algorithm for a nonlinear physical system. 58 σ MPM MAE MSE λm 10−5 0.8489 0.1247 4.1 ∗ 10−5 −3.6 ∗ 10−5 10−4 0.8428 0.1247 5.5 ∗ 10−5 −2.4 ∗ 10−5 10−3 0.8279 0.1247 9.3 ∗ 10−5 −2.8 ∗ 10−5 10−2 0.3560 0.2640 7.1 ∗ 10−4 −6.9 ∗ 10−5 10−1 0.3751 0.3980 1.5 ∗ 10−3 −2.7 ∗ 10−5 Table 6.1: MPM, MAE, MSE and λm after 1998 step ahead training 6.3 Design the MRC for a magnetic levitation system In this section, we design an NN controller for a magnetic levitation system using the modified trainscg. First, we introduce a magnetic levitation system. Then we train the plant using the LevenbergMarquardt algorithm (trainlm in [29]). Finally, we train the NN controller network using the modified trainscg. 6.3.1 Magnetic levitation system The magnetic levitation system is shown in Fig. 6.10. This is a simplified version of the MAGLEV train system [32]. y(t) i(t) S N Figure 6.10: The magnetic levitation system 59 In this system, the magnet is placed above the electromagnet. It can only move in the vertical direction. y(t) is the distance of the magnet from the electromagnet. i(t) is the current flowing in the electromagnet. The equation of motion can be represented as follows [29] d2y(t) dt = −g + α M i2(t) dt − β M dy(t) dt (6.4) where M is the mass of the magnet, g is the gravitational constant, β is a viscous friction coefficient and α is a field strength constant. This is a nonlinear system with one input and one output. The input is the current and the output is the position of the magnet. Our goal is to control the position of the magnet such that it tracks a target. 6.3.2 Plant training We use Equation (6.4) to generate a training data set for plant training. The parameters are M = 3, g = 9.8, β = 12 and α = 15. The data training includes 4000 data points. It is shown in Fig. 6.11, where P is the control input, which is applied to the plant and the target T is the corresponding output of the plant. 60 0 500 1000 1500 2000 2500 3000 3500 4000 2 0 2 4 Control Input P 0 500 1000 1500 2000 2500 3000 3500 4000 0 2 4 6 Target T Figure 6.11: Control input and target The NN plant model is shown in Fig. 6.3. It includes 10 neurons in the first layer, three delays in the input, and two delays in the feedback output. We use trainlm [29] to consecutively do from onestepahead training up to 3997stepahead training. After the final training, we have the performance index as in Fig. 6.12, the network output and the error as in Fig. 6.13. 61 0 50 100 150 200 250 300 350 400 450 500 2.265 2.27 2.275 2.28 2.285 2.29 2.295 2.3 2.305 2.31 x 10 7 Number of iterations Performance Index Performance Index Figure 6.12: Performance Index 0 500 1000 1500 2000 2500 3000 3500 4000 0 1 2 3 4 5 6 Net Output and Target 0 500 1000 1500 2000 2500 3000 3500 4000 2 1 0 1 2 3 x 10 3 Error Network output Target Figure 6.13: The network output, the target and the error In summary, the model error is very small. The maximum error is less than 3 ∗ 10−3, so the trained NN plant is accurate enough. It will be used during training of the NN controller in the next section. During the controller training, the weights and biases of the plant network are kept constant. 62 6.3.3 Controller training First, we have to choose a reference model. In this example, the reference model is chosen as follows G(s) = 9 s2 + 6s + 9 (6.5) An input sequence of step functions with random magnitude and random width is generated. This input P is applied to the input of the reference model (6.5). Then we sample the system output with a sampling time Ts = 0.01. The sampled output T is used as the target. The reference input and the target are shown in Fig. 6.14. 0 200 400 600 800 1000 1200 1400 1600 1800 2000 2 2.5 3 3.5 4 Reference input P 0 200 400 600 800 1000 1200 1400 1600 1800 2000 2 2.5 3 3.5 4 Target T Figure 6.14: The reference model input and output (target) This training data P and T will be used to train an NN controller. In this example, the NN controller is chosen as in Fig. 6.5. We choose 10 neurons and tanh as the activation function in the first layer, with two delays from the network output, two delays from the second layer and two delays in the input. The second layer has one neuron with linear activation function. We use modified trainscg to train the controller network, with different values for the penalty parameter σ. Small initial random weights and biases are used for the NN controller 63 for onestepahead training. Using these initial weights and biases, and the trained weights and biases of the plant network, we compute the matrices Q, T and . After each kstepahead training stage, these matrices will be updated. The maximum eigenvalue is shown in Table 6.2, and the maximum absolute error is shown in Table 6.3, after kstepahead training with different σ. σ 10−2 10−4 10−6 10−8 10−10 100 stepahead 0.0559 0.1006 0.3188 0.5182 0.3313 200 stepahead 0.0494 0.1412 0.6069 0.7375 0.4175 300 stepahead 0.0413 0.1245 0.7575 0.8497 0.4645 400 stepahead 0.0413 0.1424 0.8380 0.9247 0.5063 Table 6.2: The maximum eigenvalue with different σ σ 10−2 10−4 10−6 10−8 10−10 100 stepahead 0.1116 0.0790 0.0421 0.0403 0.0399 200 stepahead 0.0645 0.0465 0.0334 0.0353 0.0244 300 stepahead 0.0769 0.0396 0.0286 0.0286 0.0281 400 stepahead 0.0476 0.0202 0.0165 0.0174 0.0141 Table 6.3: The maximum absolute error with different σ We would expect that the error would increase as σ increases, because more weight is being placed on the maximum eigenvalue, and therefore relatively less weight is being placed on the error. This is clearly shown in Table 6.3. We would also expect that the maximum eigen value of −P2 would decrease as σ increases. This general trend is seen in Table 6.2. This pattern is not as clear, because the maximum eigenvalues in each entry in Table 6.2 are not exactly comparable. In each case, the weights and biases were different, and so the Q, T and matrices were also different. The following figures demonstrate that the modified trainscg algorithm works effectively. Figure 6.15 shows a typical plot of mean square error versus iteration. Figure 6.16 64 shows the maximum eigenvalue versus iteration, and Figure 6.17 shows the combined performance index versus iteration. Although MSE and maximum eigenvalue may sometimes increase, the combined performance index always decreased in all of our test cases. 0 5 10 15 20 25 30 35 40 45 50 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5 x 10 6 Mean square error Iterations Figure 6.15: The mean square error with σ = 10−6 after 10stepahead training 65 0 5 10 15 20 25 30 35 40 45 50 0.2 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 Maximum eigenvalue Iterations Figure 6.16: The maximum eigenvalue with σ = 10−6 after 10stepahead training 66 0 5 10 15 20 25 30 35 40 45 50 4.65 4.7 4.75 4.8 4.85 4.9 4.95 5 5.05 5.1 x 10 6 Iterations Performance Index Performance Index Figure 6.17: The combined performance index with σ = 10−6 after 10stepahead training 6.4 Conclusions We proposed a novel training algorithm for maintaining system stability. It is demonstrated through two examples: the linear plant and the magnetic levitation system. The results show that it is possible to train recurrent neural networks for stability using the new training algorithm. This has been only a demonstration of the potential use of our novel dissipativity criteria for stable training of RNNs. Future work will be needed to refine the algorithm to achieve consistent stable training. 67 CHAPTER 7 CONCLUSIONS AND FUTUREWORK 7.1 Conclusions In this work, we have used dissipativity theory to analyze the stability of a general class of discretetime dynamic neural networks, called Layered Digital Dynamic Networks (LDDNs). To our knowledge, this is the first time that dissipativity theory has been applied to the analysis of stability in discretetime neural networks. The application of dissipativity theory requires the selection of supply rate functions. The flexibility in choosing the supply rate allows the development of a variety of stability criteria. In this report, we have developed three different sets of stability criteria, based on three different choices for the supply rate function. We do not claim that these sets are the best that can be obtained. However, the use of dissipativity theory for the analysis of LDDNs opens up the possibility for additional criteria to be developed. We have tested our dissipativity based (DB) criteria on a wide variety of recurrent neural networks and have compared the results with two other stateoftheart methods. We have analyzed the performance of the various criteria on cases where they perform well and also on cases where they fail to perform. All of the methods tend to perform worse as the network responses oscillate more, have larger system matrices and take longer to converge. Our DB criterion performed at least as well as Liu’s criterion [6, p. 1382] on all of the networks that we tested. In two of 22 cases, Barabanov’s method [18, p. 4554] was able to determine stability when the DB criterion was not able to. These two cases represented systems that were in a form upon which the Barabanov method was designed. The DB methods described in this work were derived for a general recurrent network 68 structure  the LDDN. There are LDDN architectures to which the Liu and Barabanov methods cannot be applied (test problem 23, for example). In these cases, only our DB methods are appropriate (of the three methods analyzed for this report). However, the same can also be said of the DDSNNM architecture of Liu. There are certain DDSNNM structures that cannot be represented in the LDDN format or in the recurrent network structure used by Barabonov [18, p. 4553]. Each method is best suited to the architecture for which it was designed. We have proposed a new training method using the DB criterion to train recurrent neural networks for stability. The standard performance index is modified with an additional term consisting of the maximum eigenvalue of the matrix −P2 multiplied by a constant σ. The important thing is to compute the first derivative of the modified performance index with respect to weights and biases. We use the standard backpropagation algorithm to compute the gradient of the mean square error. Then, we show how to compute the gradient of the maximum eigenvalue with respect to the network weights. By combining these two results, we have the gradient of the modified performance index with respect to the network weights. The weights can be updated by using any gradientbased learning algorithm. In this work, we use the scaled conjugate gradient algorithm, which is already implemented in the Neural Network Toolbox. The modified algorithm was tested on two examples of NNbased MRC systems. The tests demonstrated the potential of the modified algorithm to produce stable training. 7.2 FutureWork One area where the stability analysis of recurrent networks is very important is neural network control. After a neural network controller has been designed, it is important to verify that the closed loop control system is stable. Also, it would be desirable to maintain the stability of the closed loop system throughout the training process. This is because there exist spurious valleys in the error surfaces of recurrent networks in regions where the network 69 is unstable. We can avoid these valleys by maintaining stability during training. So, our future work will focus on improving the proposed training method for nonlinear systems and developing new DBbased stability criteria, which are less conservative. We will also use other gradientbased learning algorithms to implement the stable training method. 70 CHAPTER 8 APPENDIX (Test Problems) 8.1 Test Problem 01 Consider the one layer network given in [4, p. 300], which can be put in standard form with W1 = 0.5 0.0333 −0.0355 1.1882 −2.2687 ,W2 = 0.5 0 0 0 0 , b = 0.5[−1.0092 3.5970]T and f = [tanh tanh]T . The DB criterion: A = diag(0.76323, 0.12656), B = diag(0.93757, 0.87661), T = diag(1000, 122.55), = diag(100, 0) and Q = 933.72 −16.855 −16.855 51.898 . Liu’s criterion: Q = diag(0.76323, 0.12656), U = diag(0.93757, 0.87661), = diag(1000, 0), T = diag(1000, 154.6), P = 41.699 −67.822 −67.822 130.97 and G = 57.374 0.36672 0.36672 0.58042 . Barabanov’s criterion: M = diag(0.76323, 0.12656), N = diag(0.93757, 0.87661), = diag(1000, 207.55), H = 190.56 −43.735 −43.735 87.827 ,G = 1000 −10−10 −10−10 4.2315 ∗ 10−10 , and β = 1 4.2315 4.2315 1 . 71 8.2 Test Problem 02 Consider the network in [6, p. 1388]. This network can be put into the standard form with W1 = 0 0 −0.5 −1 0 0 −0.01 −0.5 0 0 0.2 0 0 0 0 0.8 ,W2 = 0 0 0 0 0 0 0 0 1 0.1 0 0 0.1 1 0 0 b = [−7 7 0 0]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0, 0, 1, 1), B = diag(1, 1, 1, 1), T = diag(109.78, 941.44, 979.18, 1000), = diag(0, 0, 10−6, 10−6) and Q = 18.554 −4.6636 −4.2217 −0.43615 −4.6636 441.78 −38.317 −15.767 −4.2217 −38.317 42.684 17.189 −0.43615 −15.767 17.189 224.46 . Liu’s criterion: Q = diag(0, 0), U = diag(1, 1), = diag(3.5715 ∗ 10−8, 228.76), T = diag(504.19, 1000), P = 302.39 185.76 185.76 1000 and G = 147.31 −0.023467 −0.023467 149.78 . Barabanov’s criterion: M = diag(0, 0), N = diag(1, 1), = diag(333.07, 1000), H = 201.44 73.981 73.981 724.64 ,G = 162.79 −10−10 −10−10 552.05 , and β = 1 3.1009 ∗ 109 3.1009 ∗ 109 1 . 72 8.3 Test Problem 03 Consider the two layer network given in [4, p. 301]. The standard form weight matrices are W1 = 0.5 −1.3482 −1.8825 1.5 0.5 −0.7464 −0.5695 1.2 −0.1 0 0 0.4904 −0.7599 0 0 −1.4697 −1.4608 , W2 = 0.5 0 0 0 0 0 0 0 0 2 0.2 0 0 −0.5 1.40 0 0 , b = 0.5 0.3 −0.5 −1 1 and f = [tanh tanh tanh tanh]T . The DB criterion: A = diag(0.32231, 0.41816, 0.25498, 0.2636), B = diag(0.99439, 0.93639, 0.90631, 0.9752), T = diag(290.19, 1000, 418.16, 380.73), = diag(0, 0, 10−6, 0.00039036) and Q = 134.2 3.3473 −100 24.895 3.3473 363.83 −100 −9.6501 −100 −100 171.86 34.766 24.895 −9.6501 34.766 120.92 . Liu’s criterion: Q = diag(0.32231, 0.41816, 0.25498, 0.2636), U = diag(0.99439, 0.93639, 0.90631, 0.9752), = diag(0, 0, 0, 0), T = diag(142, 564.36, 211.17, 189.88), P = 108.4 59.729 −99.995 19.375 59.729 220.9 −100 2.8786 −99.995 −100 170.43 35.055 19.375 2.8786 35.055 119.51 , 73 and G = 20.459 −29.246 −8.954 4.4233 −29.246 106.92 −24.559 2.0016 −8.954 −24.559 45.595 −11.396 4.4233 2.0016 −11.396 8.211 . Barabanov’s criterion: M = diag(0.32231, 0.41816, 0.25498, 0.2636), N = diag(0.99439, 0.93639, 0.90631, 0.9752), = diag(636.85, 1000, 1000, 165.85), H = 363.39 −98.097 −16.095 43.689 −18.712 −16.793 105.52 −106.22 −98.097 951.22 42.764 −63.609 −1.0098 −72.852 −229.65 25.213 −16.095 42.764 374.06 65.911 −421.84 86.45 −38.807 −26.445 43.689 −63.609 65.911 758.61 −208.24 −57.617 34.07 −71.827 −18.712 −1.0098 −421.84 −208.24 861.48 −55.45 45.91 17.913 −16.793 −72.852 86.45 −57.617 −55.45 287.95 11.896 −4.9001 105.52 −229.65 −38.807 34.07 45.91 11.896 309.59 26.527 −106.22 25.213 −26.445 −71.827 17.913 −4.9001 26.527 269.63 , G = 103.55 −33.567 −1.3562 −16.466 −33.567 628.26 −1.7904 −135.58 −1.3562 −1.7904 208.28 −1e − 010 −16.466 −135.58 −1e − 010 381.65 , and β = 1 2.2319 2.7202 1.2334 2.2319 1 1.2188 2.5446 2.7202 1.2188 1 3.1013 1.2334 2.5446 3.1013 1 . 74 8.4 Test Problem 04 Consider the example given in [33, p. 1778]. Representing this system in standard form we get W1 = 0.2753 −0.0306 0.2967 −0.2277 0.1844 −0.3387 0.1676 −0.0663 0 0 0.6428 0.2309 0 0 −0.1106 0.5839 ,W2 = 0 0 0 0 0 0 0 0 0.3064 −0.0631 0 0 0.2937 0.2769 0 0 , and f = [tanh tanh tanh tanh]T . The DB criterion: A = diag(0.81975, 0.84453, 0.6808, 0.67383), B = diag(1, 1, 1, 1), T = diag(1000, 756.64, 1000, 1000), = diag(8.4132 ∗ 10−9, 1.3655 ∗ 10−7, 3.9446 ∗ 10−8) and Q = 274.24 91.103 −68.509 −68.402 91.103 251.03 −99.277 15.481 −68.509 −99.277 280.07 43.601 −68.402 15.481 43.601 205.53 . Liu’s criterion: Q = diag(0.81975, 0.84453, 0.6808, 0.67383), U = diag(1, 1, 1, 1), T = diag(1000, 770.69, 1000, 1000), = diag(0, 0, 0, 0), P = 265.47 −28.351 21.634 −100 −28.351 403.35 −80.933 46.384 21.634 −80.933 434.49 75.265 −100 46.384 75.265 362.42 , and G = 167.45 −70.26 −55.17 14.998 −70.26 280.63 21.786 −95.94 −55.17 21.786 72.125 −5.1197 14.998 −95.94 −5.1197 96.704 . 75 Barabanov’s criterion: M = diag(0.81975, 0.84453, 0.6808, 0.67383), N = diag(1, 1, 1, 1), = diag(1000, 954.94, 1000, 998.52), H = 1000 −190.37 −1.7249 −125.4 51.001 −65.832 −53.517 173.12 −190.37 805.55 143.72 −44.463 −60.447 −33.642 −90.638 193.26 −1.7249 143.72 686.94 −208.69 −198.63 85.043 22.886 −102.38 −125.4 −44.463 −208.69 482.96 51.316 −88.32 93.608 −59.962 51.001 −60.447 −198.63 51.316 908.23 −108.71 −50.821 44.838 −65.832 −33.642 85.043 −88.32 −108.71 884.85 27.148 24.569 −53.517 −90.638 22.886 93.608 −50.821 27.148 623.88 −74.082 173.12 193.26 −102.38 −59.962 44.838 24.569 −74.082 763.18 , G = 1000 −10−6 −4.2819 −10−6 −10−6 625.22 −35.158 −12.784 −4.2819 −35.158 1000 −10−6 −10−6 −12.784 −10−6 810.79 , and β = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 8.5 Test Problem 05 E = 0.4498 −1.3460 0.6169 0.3715 ,W = −1.1407 −0.4336 −1.0933 −0.1685 , s1 = [−0.2185 0.5413]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.21589, 0.073411, 1, 1), B = diag(0.91974, 0.68839, 1, 1), T = diag(198.75, 614.03, 1000, 1000), = diag(0, 0, 10−6, 10−6) and Q = 104.97 −51.538 −52.084 −95.512 −51.538 632.6 394.96 −99.792 −52.084 394.96 400.31 −10.052 −95.512 −99.792 −10.052 175.29 . 76 Liu’s criterion: Q = diag(0.21589, 0.073411), U = diag(0.91974, 0.68839), = diag(1.2472 ∗ 10−6, 0), T = diag(1000, 770.69, 1000, 1000), P = 580.65 99.456 99.456 167.37 and G = 66.297 −1.183 −1.183 61.627 . Barabanov’s criterion: M = diag(0.21589, 0.073411), N = diag(0.91974, 0.68839), = diag(388.59, 1000), H = 310.09 42.138 42.138 68.108 ,G = 4.803 ∗ 10−6 −10−6 −10−6 0.00093555 , and β = 1 4.803 4.803 1 . 8.6 Test Problem 06 E = −2.1382 −1.0618 0.0389 −0.2408 −0.2108 −1.0018 0.7125 −0.3998 ,WT = −0.0544 −2.7524 −0.0989 −1.2134 −0.1434 1.0171 0.2366 −0.4306 , s1 = [0.6501 − 0.4852]T and f = [tanh tanh id id id id]T . The DB criterion: A = diag(0.33079, 2.3982 ∗ 10−6, 1, 1, 1, 1), B = diag(0.81069, 0.25584, 1, 1, 1, 1), T = diag(99.97, 125.78, 1000, 956.55, 998.82, 961.05), 77 = diag(0.00034784, 0, 10−6, 10−6, 10−6, 0.00077597) and Q = 2461.1 1109.6 1011.2 29.791 14.364 144.84 1109.6 2117.3 743.8 231.45 926.58 609.29 1011.2 743.8 567.01 47.012 29.205 328.99 29.791 231.45 47.012 265.26 121.42 −35.132 14.364 926.58 29.205 121.42 812.93 −43.762 144.84 609.29 328.99 −35.132 −43.762 695.36 . Liu’s criterion: Q = diag(0.33079, 2.3982 ∗ 10−6), U = diag(0.81069, 0.25584), = diag(0.36978, 0), T = diag(1000, 164.49), P = 155.88 −20.026 −91.828 −95.885 −20.026 128.2 67.815 32.503 −91.828 67.815 113.02 71.445 −95.885 32.503 71.445 236.97 and G = 20.837 −0.55989 −0.55989 7.5994 . Barabanov’s criterion: M = diag(0.33079, 2.3982∗10−6), N = diag(0.81069, 0.25584), = diag(1000, 339.64), H = 317.03 133.95 −381.52 −54.939 133.95 957.13 −299.75 −241.94 −381.52 −299.75 674.12 −27.545 −54.939 −241.94 −27.545 714.63 ,G = 1000 −10−6 −10−6 0.017865 , and β = 1 17865 17865 1 . 8.7 Test Problem 07 E = −0.5367 −0.8914 1.1566 1.0866 1.1402 −0.1332 ,WT = 0.1399 −1.3558 −0.2022 −0.6691 1.3142 1.0448 , 78 s1 = [−0.7165 − 0.8795]T and f = [tanh tanh id id id]T . The DB criterion: A = diag(0.08737, 0.007409, 1, 1, 1), B = diag(0.76153, 0.55182, 1, 1, 1), T = diag(210.36, 99.142, 987.86, 959.92, 1000), = diag(0, 0.06456, 10−6, 10−6, 10−6, 0.00077597) and Q = 429.74 600.34 587.49 −33.662 −65.315 600.34 886.98 824.87 −79.6 −60.587 587.49 824.87 873.36 −22.622 −74.159 −33.662 −79.6 −22.622 41.952 −31.658 −65.315 −60.587 −74.159 −31.658 127.32 . Liu’s criterion: Q = diag(0.08737, 0.007409), U = diag(0.76153, 0.55182), = diag(0, 3.7044 ∗ 10−7), T = diag(1000, 443.53), P = 994.17 288.8 242.55 288.8 154.79 −99.992 242.55 −99.992 1000 and G = 35.827 −9.5597 −9.5597 102.88 . Barabanov’s criterion: M = diag(0.08737, 0.007409), N = diag(0.76153, 0.55182), = diag(1000, 461.85), H = 797.85 408.86 −4.1171 408.86 320.72 −171.01 −4.1171 −171.01 592.92 ,G = 10−6 6.4083 −1.0002 −1.0002 6.4146 , and β = 1 6.4073 6.4073 1 . 8.8 Test Problem 08 E = 0.4498 −1.3460 0.6169 0.3715 ,W = 0.2692 0.7177 −0.0074 0.8009 , 79 s1 = [−0.2586 0.0537]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.45842, 0.74239, 1, 1), B = diag(0.94416, 0.98544, 1, 1), T = diag(1000, 1000, 994.81, 1000), = diag(0, 0, 10−6, 10−6) and Q = 45.541 −83.784 −65.824 2.8525 −83.784 819.24 585.23 −100 −65.824 585.23 518.02 29.31 2.8525 −100 29.31 411.68 . Liu’s criterion: Q = diag(0.45842, 0.74239), U = diag(0.94416, 0.98544), = diag(0, 0), T = diag(1000, 1000), P = 210.56 160.82 160.82 1000 and G = 299.36 −78.169 −78.169 173.97 . Barabanov’s criterion: M = diag(0.45842, 0.74239), M = diag(0.94416, 0.98544), = diag(1000, 1000), H = 262.69 125.53 125.53 1000 ,G = 800.06 −10−6 −10−6 1000 , and β = 1 1.4003 1.4003 1 . 8.9 Test Problem 09 E = −0.4376 0.6711 0.2333 −0.3892 −0.1496 0.3565 0.8825 0.0271 ,WT = 0.8775 0.2470 0.3328 0.7495 0.2635 0.7885 0.2241 0.1221 , s1 = 0 and f = [tanh tanh id id id id]T . 80 The DB criterion: A = diag(0.5986, 0.67847, 1, 1, 1, 1), B = diag(1, 1, 1, 1, 1, 1), T = diag(949.76, 1000, 1000, 1000, 1000, 1000), = diag(0, 0, 10−6, 100, 10−6, 10−6) and Q = 111.28 13.247 25.708 21.974 −28.664 −59.065 13.247 188.85 −36.728 163.94 −70.283 −95.483 25.708 −36.728 325.53 70.29 48.782 50.854 21.974 163.94 70.29 620.19 161.65 −92.585 −28.664 −70.283 48.782 161.65 435.83 114.96 −59.065 −95.483 50.854 −92.585 114.96 195.98 . Liu’s criterion: Q = diag(0.5986, 0.67847), U = diag(1, 1), = diag(4.9935 ∗ 10−5, 0), T = diag(1000, 1000), P = 978.57 335.7 167.82 171.9 335.7 999.77 677.6 122.11 167.82 677.6 1000 105.3 171.9 122.11 105.3 289.55 and G = 248.26 −6.0496 −6.0496 247.97 . Barabanov’s criterion: M = diag(0.5986, 0.67847), M = diag(1, 1), = diag(1000, 991.51), H = 1000 241.22 194.23 155.82 241.22 970.92 577.69 97.549 194.23 577.69 991.47 91.865 155.82 97.549 91.865 437.93 ,G = 1000 −240.56 −240.56 912.39 , and β = 1 1 1 1 . 81 8.10 Test Problem 10 E = 0.1820 −1.5142 −0.0995 0.0070 −0.3681 1.2971 −0.1194 1.1440 ,WT = 0.2361 0.5642 0.6413 0.5412 0.2559 0.7715 0.1664 0.2860 , s1 = 0 and f = [tanh tanh id id id id]T . The DB criterion: A = diag(0.726, 0.3722, 1, 1, 1, 1), B = diag(1, 1, 1, 1, 1, 1), T = diag(1000, 1000, 994.43, 1000, 1000, 1000), = diag(0, 0, 10−6, 100, 10−6, 10−6) and Q = 273.56 −73.734 −29.437 93.543 −58.562 48.331 −73.734 1766.5 1130.1 36.435 −69.631 −83.295 −29.437 1130.1 897.18 105.85 97.635 61.019 93.543 36.435 105.85 650.02 87.074 6.3594 −58.562 −69.631 97.635 87.074 313.06 2.5765 48.331 −83.295 61.019 6.3594 2.5765 208.29 . Liu’s criterion: Q = diag(0.726, 0.3722), U = diag(1, 1), = diag(0.036388, 0), T = diag(1000, 1000), P = 652.64 286.48 620.53 339.77 286.48 951.01 455.72 103.38 620.53 455.72 999.99 292.86 339.77 103.38 292.86 403.84 and G = 462.6 −37.436 −37.436 123.96 . Barabanov’s criterion: M = diag(0.726, 0.3722),M = diag(1, 1), = diag(942.82, 1000), H = 816.33 578.16 598.65 522.47 578.16 825.05 476.38 489.74 598.65 476.38 1000 286.54 522.47 489.74 286.54 685.9 ,G = 415.61 −363.06 −363.06 1000 , 82 and β = 1 1 1 1 . 8.11 Test Problem 11 E = −1.5369 −1.4479 2.0182 0.6986 ,W = 0.5301 1.1132 −0.7163 0.3181 , s1 = [−2.0962 − 0.3127]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.00053451, 0.0053027, 1, 1), B = diag(0.38817, 0.40828, 1, 1), T = diag(1000, 350.31, 821.55, 1000), = diag(0, 0, 10−6, 10−6) and Q = 1673.5 793.22 676.56 −91.493 793.22 1189.6 891.01 227.33 676.56 891.01 737.18 221.63 −91.493 227.33 221.63 252.43 . Liu’s criterion: Q = diag(0.00053451, 0.0053027), U = diag(0.38817, 0.40828), = diag(0, 0), T = diag(1000, 457.88), P = 87.837 73.387 73.387 288.62 and G = 113.99 −59.337 −59.337 72.36 . Barabanov’s criterion: M = diag(0.00053451, 0.0053027), N = diag(0.38817, 0.40828), = diag(1000, 435.69), H = 229.27 258.81 258.81 412.7 ,G = 10−6 1.575501 −1 −1 1.575501 , and β = 1 1.5755 1.5755 1 . 83 8.12 Test Problem 12 E = −1.7083 −0.3869 1.2134 −1.0272 ,W = 0.0337 0.3472 −1.2111 −1.0742 , s1 = [0.4811 − 0.0876]T and f = [tanh tanh id id]T . The DB criterion: A = diag(0.58581, 0.026454, 1, 1), B = diag(0.98733, 0.68489, 1, 1), T = diag(860.8, 173.26, 997.37, 1000), = diag(0, 0, 10−6, 10−6) and Q = 1855.5 362.01 939.2 −93.541 362.01 1028.2 635.64 661.26 939.2 635.64 776.97 322.28 −93.541 661.26 322.28 527.61 . Liu’s criterion: Q = diag(0.58581, 0.026454), U = diag(0.98733, 0.68489), = diag(0, 0), T = diag(995.2, 611.75), P = 513.77 295.37 295.37 262.31 and G = 16.938 1.7133 1.7133 13.071 . Barabanov’s criterion: M = diag(0.58581, 0.026454), N = diag(0.98733, 0.68489), = diag(1000, 563.99), H = 256.06 141.33 141.33 130.33 ,G = 0.0049901 −10−6 −10−6 8.3245 ∗ 10−6 , and β = 1 8.3245 8.3245 1 . 84 8.13 Test problem 13 Consider the network given in [8] x(k + 1) = tanh(Wx(k)) where W1 = 0.5893 −0.4047 0.3142 0.3133 −0.5308 1.0074 −0.7935 0.7659 0.2278 0.0204 −1.0197 −0.0221 0.1484 0.1643 0.8982 1.1161 −0.7743 0.4514 −0.8473 −0.0883 0.6870 −1.0181 0.0379 −0.5418 −0.6798 W2 = 0, b = 0 and f = [tanh tanh tanh tanh tanh]T . 8.14 Test problem 14 Given a network (4.3) whereW = diag(1, 1), s1 = [0 0]T and E = −1.0510 1.6516 −1.3141 1.1566 8.15 Test problem 15 Given a network (4.3) whereW = diag(1, 1), s1 = [0 0]T and E = 0.4498 −1.3460 0.6169 0.3715 . 8.16 Test problem 16 Given a network (4.3) where E = −0.0176 1.4660 −1.9825 −0.3525 ,W = 0.7133 0.5571 0.7637 0.5651 , and s1 = [−0.1768 1.5514]T . 85 8.17 Test problem 17 Consider a network (4.3) where E = 0.4889 −0.1699 1.9743 1.4636 ,W = 0.3325 −1.1325 −0.6563 1.9766 , and s1 = [−0.6537 1.722]T . 8.18 Test problem 18 Given a network (4.3) where E (Column(1 : 5)) = −0.1442 −0.1923 −0.2243 0.5341 0.2312 1.2841 0.7068 0.6862 −0.8575 1.2335 , E (Column(6 : 10)) = 0.8103 0.2148 −0.3047 0.7268 0.2490 −1.1181 −0.6498 −0.8863 0.5849 1.3329 , WT (Column(1 : 5)) = −0.0822 −0.6546 0.1519 −0.6173 −1.1329 −0.5581 −0.3620 −1.4393 −0.4941 −0.5955 , WT (Column(6 : 10)) = 0.51552 0.5686 −2.8238 −0.066408 0.26577 0.37745 −0.40056 −1.1589 0.16272 0.52101 , and s1 = [0.8594 − 0.1774 − 0.6582 0.2043 1.4457 − 0.4587 0.2136 1.5275 0.1615 − 0.2330]T . 86 8.19 Test problem 19 Given a network (4.3) where E = −1.2676 −0.0239 −1.1729 −0.5952 0.9547 −0.9055 −1.7919 −1.0372 −1.3526 0.3340 , WT = −0.2504 −2.6603 1.3109 0.4479 1.1745 −0.0003 −1.6884 −0.3583 0.4195 −0.4836 , and s1 = [−0.5025 − 1.6517 1.0859 − 0.4030 0.5661]T . 8.20 Test problem 20 Given a network (4.3) where E = 0.1526 0.9272 −2.0829 0.3752 ,W = 1.8267 0.4274 −0.2127 0.2953 , and s1 = [−0.4759 1.1288]T . 8.21 Test problem 21 Given a network (4.3) where E = 1.1618 1.5530 −1.2990 −0.6542 ,W = −0.4199 0.0067 1.0924 −2.5687 , and s1 = [0.3535 0.5172]T . 87 8.22 Test problem 22 Given a network (4.3) where E = 0.0207 −2.3595 −0.8107 0.0597 ,W = −0.9855 0.9763 −0.0560 −1.2719 , and s1 = [−0.2746 1.2021]T . 8.23 Test problem 23 W1 (Column(1 : 5)) = 0.1465 −0.1059 0.1594 0.2925 0.2751 −0.3257 0.1052 0.1820 −0.2141 −0.2609 −0.3717 0.1741 −0.0173 0.2439 −0.2634 0.0899 0.1541 0.0439 0.3267 0.3954 0.0868 −0.3327 −0.3032 −0.2145 −0.0482 −0.3874 −0.0365 −0.0394 −0.2085 −0.1280 −0.3869 −0.0465 0.1727 −0.3602 −0.1486 −0.2479 −0.1174 0.3143 −0.3373 −0.1079 0.0695 −0.2771 −0.1815 0.1127 −0.0854 −0.3539 0.1405 −0.1962 −0.2473 0.0732 88 W1 (Column(6 : 10)) = −0.3042 0.3735 −0.1231 −0.0394 −0.0799 −0.3695 0.1319 −0.2672 −0.0702 −0.2410 −0.0331 0.2963 −0.2755 0.3213 0.1002 0.2959 −0.3921 −0.2471 −0.3955 0.1867 0.3474 −0.2904 −0.0620 −0.1621 −0.0993 −0.1884 0.2550 0.2848 −0.3607 −0.3921 −0.2718 −0.0559 −0.0078 0.1545 −0.0641 0.2983 0.3123 0.2527 0.1201 0.2029 −0.2097 0.1879 −0.0314 0.3864 0.2351 0.1167 0.1499 −0.0341 0.0421 0.3360 W2 (Column(1 : 5)) = 0 0 0 0 0 0.2758 0 0 0 0 −0.1058 0.0966 0 0 0 0.1850 −0.2449 0.3238 0 0 0.0554 0.1054 −0.2125 0.0390 0 0.3453 −0.1318 0.1244 −0.0865 0.1019 0.1593 −0.0823 −0.0691 0.1242 0.2701 −0.0598 0.0757 0.0526 0.1732 0.0090 −0.2513 0.1605 0.3862 0.2453 0.1629 −0.1077 −0.2880 0.0534 0.2584 0.1392 89 W2 (Column(6 : 10)) = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 −0.1027 0 0 0 0 0.2211 −0.0085 0 0 0 −0.0120 −0.3083 0.1319 0 0 0.3996 0.3693 −0.3529 −0.1118 0 b = 0(10×1) and f = [tanh tanh tanh tanh tanh tanh tanh tanh tanh id]T . A = diag(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), B = diag(1, 1, 1, 1, 1, 1, 1, 1, 1), Q (Column(1 : 5)) = 409.2071 −60.7473 9.4582 22.6040 104.9155 −60.7473 318.8918 −69.5846 146.0832 −4.6166 9.4582 −69.5846 478.7037 −48.1748 131.9870 22.6040 146.0832 −48.1748 445.2587 −5.3633 104.9155 −4.6166 131.9870 −5.3633 375.6876 0.7304 −9.4876 −66.3357 79.2393 −16.4141 −34.6683 −62.3295 39.5300 33.3255 −38.3926 66.2225 16.2987 6.8482 −21.4796 −27.1391 −8.1199 −24.9707 −150.3311 −35.8364 −121.1125 −26.8860 94.6296 −51.7739 −57.8724 −16.2736 90 Q (Column(6 : 10)) = 0.7304 −34.6683 66.2225 −8.1199 −26.8860 −9.4876 −62.3295 16.2987 −24.9707 94.6296 −66.3357 39.5300 6.8482 −150.3311 −51.7739 79.2393 33.3255 −21.4796 −35.8364 −57.8724 −16.4141 −38.3926 −27.1391 −121.1125 −16.2736 362.5196 −43.3246 −14.7978 63.5450 −86.0946 −43.3246 449.1086 47.0527 125.4525 −71.5817 −14.7978 47.0527 346.4056 −22.5932 −67.1660 63.5450 125.4525 −22.5932 505.1451 70.8772 −86.0946 −71.5817 −67.1660 70.8772 370.0100 = diag(1, 1, 1, 1, 1, 1, 1, 1, 1, 10−5) and T = 103diag(1, 0.4131, 0.7992, 0.6997, 0.7026, 0.6775, 0.6711, 0.9937, 0.7843, 0.4969). 91 BIBLIOGRAPHY [1] L. Jin, P. N. Nikiforuk, and M. M. Gupta, “Absolute stability conditions for discretetime recurrent neural networks,” IEEE Trans. on Neural Networks, 1994. [2] K. Tanaka, “An approach to stability criteria of neural network control systems,” IEEE Trans. on Neural Networks, 2002. [3] J. A. K. Suykens, J. Vandewalle, and B. L. R. D. Moor, “Nlq theory: Checking and imposing stability of recurrent neural networks for nonlinear modelling,” IEEE Trans. Signal Processing, 1997. [4] N. E. Barabanov and D. V. Prokhorov, “Stability analysis of discretetime recurrent neural networks,” IEEE Trans. on Neural Networks, 2002. [5] N. E. Barabanov and D. V. Prokhorov, “A new method for stability analysis of nonlinear discretetime systems,” IEEE Trans. on Automatic Control, 2003. [6] M. Liu, “Delayed standard neural network models for control systems,” IEEE Trans. Neural Networks, 2007. [7] R. Jafari and M. T. Hagan, “Global stability analysis using the method of reduction of dissipativity domain,” Proceedings of international joint conference on neural networks, 2011. [8] K. Kim, E. R. Patron, and R. D. Braatz, “Standard representation and stability analysis of dynamic artificial neural networks: A unified approach,” Proceedings of IEEE symposium computeraided control system design, 2011. 92 [9] O. D. Jesus and M. T. Hagan, “Backpropagation algorithms for a broad class of dynamic networks,” IEEE Trans. Neural Networks, 2007. [10] J. A. K. Suykens, J. Vandewalle, and B. L. R. D. Moor, “Lure systems with multilayer perceptron and recurrent neural networks: Absolute stability and dissipativity,” IEEE Trans. Automatic Control, 1997. [11] J. C. Williem, “Dissipative dynamical systems part i: General theory,” Arch. Rational Mech. Anal., 1972. [12] J. C. Williem, “Dissipative dynamical systems part ii: Linear systems with quadratic supply rates,” Arch. Rational Mech. Anal., 1972. [13] D. J. Hill and P. J. Moylan, “The stability of nonlinear dissipative systems,” IEEE Trans. Automatic Control, 1976. [14] D. J. Hill and P. J. Moylan, “Stability results for nonlinear feedback systems,” Automatica, 1977. [15] W. M. Haddad and V. Chellaboina, Nonlinear dynamical systems and control: A Lyapunovbased approach. Princeton University Press, 1st ed., 2008. [16] L. Jin and M. M. Gupta, “Stable dynamic backpropagation learning in recurrent neural networks,” International journal of robust and nonlinear control, 1999. [17] J. Horn, O. D. Jesus, and M. T. Hagan, “Spurious valleys in the error surface of recurrent networks  analysis and avoidance,” IEEE Trans. Neural Networks, 2007. [18] N. E. Barabanov and D. V. Prokhorov, “Global stability analysis of discretetime recurrent neural networks,” Proceedings of the American Control Conference, 2001. [19] J. Zhao and D. J. Hill, “Dissipativity theory for switched systems,” IEEE Trans. on Automatic Control, 2008. 93 [20] W. M. Haddad and Q. Hui, “Dissipativity theory for discontinuous dynamical systems: Basic input, state, and output properties, and finitetime stability of feedback interconnections,” Journal of Nonlinear Analysis: Hybrid systems, 2009. [21] V. Chellaboina and W. Haddad, “Stability margins of discretetime nonlinear nonquadratic optimal regulators,” International Journal of System Science, 2002. [22] V. Chellaboina, W. Haddad, and A. Kamath, “A dissipative dynamical system to stability analysis of time delay systems,” International Journal of Robust and Nonlinear Control, 2004. [23] C. A. Jacobson, A. M. Stankovic, G. Tadmor, and M. A. Stevens, “Towards a dissipativity framework for power system stabilizer design,” IEEE Trans. on Power Systems, 1996. [24] B. Brogliato, R. Lozano, B. Maschke, and O. Egeland, Dissipative systems analysis and control: Theory and applications. Springer, 2nd ed., 2007. [25] S. Hu and J. Wang, “Global stability of a class of discretetime recurrent neural networks,” IEEE Trans. Circuits Syst. I: Fundam. Theory Appl., 2002. [26] L.Wang and Z. Xu, “Sufficient and necessary conditions for global exponential stability of discretetime recurrent neural networks,” IEEE Trans. on Circuits and Systems I: Regular Papers, 2006. [27] S. Hu and J. Wang, “Global robust stability of a class of discretetime interval neural networks,” IEEE Trans. Circuits Syst. I: Reg. Papers, 2006. [28] M. F. Moller, “A scaled conjugate gradient algorithm for fast supervised learning,” Neural Networks, vol. 6, pp. 525533, 1993. [29] M. H. Beal, M. T. Hagan, and H. B. Demuth, “Neural network toolbox user’s guide,” Matlab, 2012. 94 [30] N. P. V. D. AA, H. G. T. Morsche, and R. R. M. Mattheij, “Computation of eigenvalue and eigenvector derivatives for a general complexvalued eigensystem,” A publication of the International Linear Algebra Society, vol. 16, pp. 300314, 2007. [31] M. T. Hagan, H. B. Demuth, and O. D. Jesus, “An introduction to the use of neural networks in control systems,” International journal of robust and nonlinear control, 2002. [32] R. M. Goodall, “The theory of electromagnetic levitation,” Physics in Technology, 1985. [33] N. E. Barabanov and D. V. Prokhorov, “Two alternative stability criteria for discretetime rmlp,” Proceedings of the 41st IEEE conference on Decision and Control, 2002. 95 Name: Nam Hoai Nguyen Date of Degree: December, 2012 Institution: Oklahoma State University Location: Stillwater, Oklahoma Title of Study: STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Pages in Study: 95 Candidate for the Degree of Doctor of Philosophy Major Field: Electrical Engineering The purpose of this work is to describe how dissipativity theory can be used for the stability analysis of discretetime recurrent neural networks and to propose a training algorithm for producing stable networks. Using dissipativity theory, we have found conditions for the globally asymptotic stability of equilibrium points of Layered Digital Dynamic Networks (LDDNs), a very general class of recurrent neural networks. The LDDNs are transformed into a standard interconnected system structure, and a fundamental theorem describing the stability of interconnected dissipative systems is applied. The theorem leads to several new sufficient conditions for the stability of equilibrium points for LDDNs. These conditions are demonstrated on several test problems and compared to previously proposed stability conditions. From these novel stability criteria, we propose a new algorithm to train stable recurrent neural networks. The standard mean square error performance index is modified to include stability criteria. This requires computation of the derivative of the maximum eigenvalue of a matrix with respect to neural network weights. The new training algorithm is tested on two examples of neural networkbased model reference control systems, including a magnetic levitation system. ADVISOR’S APPROVAL: Dr. Martin Hagan VITA Nam Hoai Nguyen Candidate for the Degree of Doctor of Philosophy Dissertation: STABILITY ANALYSIS OF RECURRENT NEURAL NETWORKS USING DISSIPATIVITY Major Field: Electrical Engineering Biographical: Education: Received the B.S. degree from Hanoi University of Science and Technology, Hanoi, Vietnam, 2002, in Electrical Engineering. Received the M.S. degree from Hanoi University of Science and Technology, Hanoi, Vietnam, 2005, in Electrical Engineering. Completed the requirements for the degree of Doctor of Philosophy in Electrical Engineering at Oklahoma State University, Stillwater, Oklahoma in December, 2012. Experience: Worked for Thai Nguyen University of Technology, Thai Nguyen, Vietnam as a lecturer from 2002 to 2009. 



A 

B 

C 

D 

E 

F 

I 

J 

K 

L 

O 

P 

R 

S 

T 

U 

V 

W 


