



A GENERALIZED RULE ANTECEDENT STRUCTURE FOR TSK TYPE OF DYNAMIC MODELS: STRUCTURE IDENTIFICATION AND PARAMETER ESTIMATION

By MING SU

Bachelor of Science, East China University of Science and Technology, Shanghai, People's Republic of China, 2000
Master of Science, East China University of Science and Technology, Shanghai, People's Republic of China, 2003

Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY, December, 2009

Dissertation Approved:
Dr. R. Russell Rhinehart, Dissertation Adviser
Dr. Karen A. High
Dr. James R. Whiteley
Dr. Martin T. Hagan
Dr. Gary G. Yen
Dr. A. Gordon Emslie, Dean of the Graduate College

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my research advisor, Dr. R. Russell Rhinehart, for his guidance, instruction, knowledge sharing, and encouragement over the years. He is always ready to discuss research problems and give advice. The conversations with Dr. Rhinehart have always been inspirational and joyful. His knowledge, analytical skills, creative thinking, and sharp questions broadened my view on many problems. His advising is not limited to research problems. He taught me how to ask questions and sharpened my conversation and writing skills. He guided me in approaching a problem from scratch and organizing time effectively to do research. I would also like to thank my committee members, Dr. James R. Whiteley, Dr. Karen A. High, Dr. Gary G. Yen, and Dr. Martin T. Hagan, for their insightful comments and suggestions as well as their warm support and encouragement. I thank Dr. Yen for the time and effort that he spent helping me prepare academic papers. I thank Dr. Hagan for his help and time guiding me in studying Neural Networks and System Identification. I want to thank Dr. A. H.
Johannes from Chemical Engineering for his help over the years in improving my presentation and teaching skills. I want to acknowledge Mr. Nittin Sharma, Mr. Pedro de Lima, Ms. Preetica Kumar, and Mr. Garov Aurora for their help and contribution to this project. I wish to extend my thanks to the Edward E. and Helen Turner Bartlett Foundation for financial support. Last, but not least, I am thankful to my wife, Yumiao, for her unconditional patience and support, and her absurd confidence in me. I am also thankful to my 2-year-old daughter, Serena, who has always been a comfort to me when I feel frustrated. My special appreciation goes to my parents, Wenxin Su and Minghua Du, for their understanding, caring, and love.

TABLE OF CONTENTS

I. INTRODUCTION

II. REVIEW OF LITERATURE
2.1 Literature Survey for Dynamic Order Determination
2.2 Literature Survey for Fuzzy Model Structure
2.3 Literature Survey for Fuzzy Model Identification
2.3.1 Variable Selection
2.3.2 Fuzzy Model Identification

III.
A GENERALIZED RULE ANTECEDENT STRUCTURE
3.1 Model Complexity
3.2 Antecedent Dimension
3.3 Antecedent Structure
3.3.1 A Generalized Antecedent Structure
3.3.2 Interpretation of the Proposed Structure
3.4 SISO Models
3.4.1 Model Parameters
3.4.2 Model Computation
3.5 Extension to MIMO Models

IV.
DYNAMIC ORDER DETERMINATION AND NONLINEAR COMPONENT DETECTION
4.1 Dynamic Order Determination
4.1.1 Nonlinearity Representation
4.1.2 Recursive Estimation for Time Varying Parameters
4.1.3 Sequential Nearest Neighbor Rearrangement
4.1.4 Model Comparison Criterion
4.1.5 Regressor Selection Procedure
4.2 Nonlinear Component Detection
4.3 Extension to MIMO Processes
4.4 Simulations and Discussions
4.4.1 Testing Models and Processes
4.4.2 Testing on Dynamic Order Determination
4.4.3 Testing on Nonlinear Component Detection

V.
PARAMETER ESTIMATION FOR GTSK MODELS
5.1 Parameter Estimation by Newton's Method
5.1.1 A Constrained Optimization Problem
5.1.2 Interpretation of Local Optimal Solutions
5.1.3 Random Parameter Initialization
5.2 Parameter Estimation for MIMO GTSK Models
5.3 Overview of the Proposed Parameter Initialization
5.4 A Splitting and Regression Problem
5.4.1 Description of the Splitting and Regression Problem
5.4.2 SRP is Not a Clustering Problem
5.4.3 Analysis of the Splitting and Regression Problem
5.5 Solving of the Splitting and Regression Problem
5.5.1 Initialization of Data Segregation
5.5.2 Solving for a Linear Boundary
5.5.3 Boundary Refinement
5.5.4 Testing and Demonstration
5.5.5 Comparison to Other Methods
5.6 Extension to Multiple-Output Processes
5.7 Recursive Partition by Growing a Binary Tree
5.8 Removal of Insignificant Partitions by Trimming a Tree
5.9 Rule Antecedent Parameter Estimation
VI. RESULTS FOR TESTING PROBLEMS
6.1 Function Approximation
6.2 Dynamic Fuzzy Modeling

VII. SUMMARY, CONCLUSIONS AND FUTURE RESEARCH RECOMMENDATIONS
7.1 Summary
7.2 Conclusions
7.3 Future Research Recommendations

REFERENCES

LIST OF TABLES

4.1 A segment of 10 data samples in time sequence
4.2 SNNR rearranged data for the time-sequence data in Table 4.1
4.3 MSE for a recursive estimation
4.4 Regressor forward selection for data in Figure 4.22
4.5 Exhaustive search on nu with ny = 2, d = 1 for data in Figure 4.22
4.6 Regressor forward selection for data in Figure 4.22 using time-sequence data
4.7 Regressors determined for deterministic versions of Models 1–5
4.8 Regressors determined for Models 1–5
4.9 Regressor selection for Model 7
4.10 Regressor selection for Model 8
4.11 Regressors determined for Models 10–13
4.12 Regressors determined for the two-phase flow process
4.13 Results of order determination for the distillation column
4.14 Exhaustive search on ny for y2
4.15 Exhaustive search for nonlinear components for Model 1
4.16 Results for nonlinear component detection for Models 1–5
4.17 Exhaustive search for nonlinear components for Model 4
4.18 Details of nonlinear component detection for Model 9
4.19 Nonlinear components detected for the two-phase flow process
4.20 Choices of nonlinear components for the distillation column
5.1 Solution for a SRP using different τ values
5.2 The number of rules and SSE resulting from different M
5.3 The value αc for branch nodes shown in Figure 5.39
6.1 Trials of antecedent variables for Model 1
6.2 Trials of a GTSK model for Model 1
6.3 Comparison of the GTSK with RB and FFNN for Model 1
6.4 Trials of a GTSK model for Model 3
6.5 Trials of antecedent variables for Model 3
6.6 Trials of a GTSK model for Model 4
6.7 Trials of a GTSK model for Model 4 with all regressors included
6.8 Trials of a GTSK model for the two-phase process
6.9 Training and validation results for the GTSK model on y1
6.10 Training and validation results for the GTSK model on y2

LIST OF FIGURES

3.1 The ellipsoid contour of TA
3.2 Antecedent space partition and representation
3.3 A rotated local region covered by a horizontal or vertical ellipsoid
3.4 A rotated local region covered by many small ellipsoids
3.5 A rotated local region covered by a rotated ellipsoid
3.6 Model parameters for a 4-rule GTSK model
4.1 Exponential weighting with α = 0.95
4.2 Data generated from the model in Equation (4.25)
4.3 Time varying parameters a1(t) and b0(t) in Equation (4.4)
4.4 SNNR rearranged regressors from the time-sequence data in Figure 4.1
4.5 Varying parameters for the SNNR rearranged data
4.6 α vs. α^4/(1 − α)
4.7 Exhaustive dynamic order search
4.8 Input-output data generated for Model 1 in Equation (4.40)
4.9 Input-output data generated for Model 2 in Equation (4.41)
4.10 Input-output data generated for Model 3 in Equation (4.42)
4.11 Input-output data generated for Model 4 in Equation (4.43)
4.12 Input-output data generated for Model 5 in Equation (4.44)
4.13 Data generated for Model 10 in Equation (4.50)
4.14 Data generated for Model 12 in Equation (4.52)
4.15 The two-phase flow experiment setup
4.16 The schematic diagram for the two-phase flow experiment
4.17 Water flowrate measurements with set point at 20 lbmol/hr
4.18 A choice of input and output channels
4.19 A choice of input and output channels
4.20 Reflux flowrate (solid line) and boiler heat rate (dash line)
4.21 The xD and xB in distillation column experiments
4.22 Data generated for the deterministic version of the model in Equation (4.42)
5.1 Antecedent space defined by antecedent variables u(t−9) and y(t−3)
5.2 Evaluation of the function in Equation (5.28)
5.3 An antecedent space partitioned by three linear boundaries
5.4 An iterative procedure to partition an antecedent space
5.5 Antecedent space partition by a regression tree
5.6 Parameters to be estimated in solving a SRP
5.7 Data samples for Equation (5.41)
5.8 Data samples in antecedent space for Equation (5.41)
5.9 A linear boundary based on data distribution
5.10 A linear boundary according to function nonlinearity
5.11 Illustration of Equation (5.42) with different τ
5.12 A linear boundary generated for linearly separable data
5.13 A linearly non-separable case
5.14 A linearly non-separable example with equally mixed points
5.15 Clusters found by LVQ for data in Figure 5.14
5.16 Flowchart for solving a SRP
5.17 a) Initialization of data segregation for Equation (5.95); b) A linear separation boundary found for the initial data segregation
5.18 a) Initialization of data segregation for Equation (5.96); b) Initial linear boundary and its variation over iterations
5.19 a) Initialization of data segregation for Equation (5.97); b) Initial linear boundary and its variation over iterations
5.20 An initial data segregation for Equation (5.97) fails a SVM solver
5.21 Clusters recognized using LVQ for the initial segregation in Figure 5.20
5.22 Initial boundary from clusters in Figure 5.21 and its variation over iterations
5.23 Linear boundary solved for a three-piece piecewise function
5.24 Linear boundary solved for a quadratic function
5.25 SSE with respect to the separation locations for the quadratic function
5.26 Initial linear boundary and its variation over iterations
5.27 SSE with respect to the separation locations for sin(x)
5.28 Objective function convergence using Newton's method to solve a SRP
5.29 Converged objective function value with respect to τ
5.30 Nelder-Mead algorithm to solve a SRP
5.31 SSE with respect to the separation locations for Equation (5.97)
5.32 Separation locations for Equation (5.97) by the Nelder-Mead method
5.33 Illustration of the function in Equation (5.98)
5.34 SSE with respect to separation locations for Equation (5.98)
5.35 Separation locations for Equation (5.98) by the Nelder-Mead method
5.36 Separation locations for Equation (5.98) by the proposed SRP solver
5.37 Antecedent partition using different M
5.38 The branch Bt3 from Figure 5.5(a)
5.39 The tree structure for the antecedent partition in Figure 5.37(c)
5.40 Antecedent space partition after removing splits under branch nodes t5, t6 and t7 in Figure 5.39
5.41 Antecedent space partitions after removing some unimportant splits
5.42 A local region in an antecedent space
6.1 Values of αc for antecedent space partition for Equation (6.1)
6.2 Antecedent space partition and TAs based on Equation (6.1)
6.3 Function approximation by the 8-rule GTSK model in Figure 6.2
6.4 Normalized TAs for those in Figure 6.2
6.5 Optimized TAs from initialization in Figure 6.2
6.6 Function approximation by the optimized 8-rule GTSK model
6.7 Normalized TAs for those in Figure 6.5
6.8 Optimized TAs starting from random initialization
6.9 Function approximation by the 8-rule GTSK model in Figure 6.8
6.10 Antecedent space partition and TAs on Equation (6.3)
6.11 Values of αc for antecedent space partition for Equation (6.4)
6.12 a) Antecedent space partition by αc > 117; b) Ellipsoids (TA = 0.05)
6.13 Normalized TAs for those in Figure 6.12
6.14 Normalized TAs for the left-front rule in Figure 6.13
6.15 Quadratic function approximation by the fuzzy model in Figure 6.12
6.16 Antecedent space partition by αc > 130
6.17 A portion of αc in Figure 6.11 with values less than 118
6.18 a) Antecedent space partition by αc > 3; b) Ellipsoids (TA = 0.05)
6.19 Quadratic function approximation by the fuzzy model in Figure 6.18
6.20 A portion of αc shown in Figure 6.11 with values less than 3
6.21 Quadratic function approximation by the GTSK model in Figure 6.22
6.22 Optimized TAs for a 16-rule GTSK model from random initialization
6.23 Illustration of the function in Equation (6.5) and its contour plot
6.24 Values of αc for antecedent space partition on Equation (6.5)
6.25 a) Antecedent space partition by αc > 0.1; b) Ellipsoids (TA = 0.05)
6.26 Optimized TAs of an 8-rule GTSK model for Equation (6.5)
6.27 Approximation of Equation (6.5) by the GTSK model in Figure 6.26
6.28 Illustration of the function in Equation (6.6) and its contour plot
6.29 Values of αc for antecedent space partition on Equation (6.6)
6.30 a) Antecedent space partition by αc > 0.81; b) Ellipsoids (TA = 0.05)
6.31 Function approximation by the model in Figure 6.30 and the contour
6.32 Values of αc for antecedent space partition for Equation (6.6) with quadratic local models
6.33 a) Antecedent space partition by αc > 1.5; b) Ellipsoids (TA = 0.05)
6.34 Function approximation by the model in Figure 6.33 and the contour
6.35 Antecedent space partition and TAs for Model 1
6.36 The separation boundaries shown for the nonlinear part in Model 1
6.37 Coefficients for local models in the GTSK model in Figure 6.35
6.38 Two-dimensional antecedent space for Model 3
6.39 a) Antecedent space partition by αc > 10; b) Ellipsoids (TA = 0.05)
6.40 Coefficients for local models in the fuzzy model in Figure 6.39
6.41 Antecedent space partition and TAs for Model 4
6.42 Coefficients for local models in the GTSK model in Figure 6.41
6.43 Two-dimensional antecedent space for the two-phase process
6.44 Validation data set for the two-phase flow process
6.45 a) Antecedent space for the two-phase flow process; b) Ellipsoids (TA = 0.05)
6.46 Coefficients for local models in the GTSK model in Figure 6.45
6.47 Two-dimensional antecedent space for y1 of the distillation column
6.48 a) Antecedent space partition for output y1 of the distillation column; b) Ellipsoids (TA = 0.05)
6.49 Coefficients for local models in the GTSK model in Figure 6.48
6.50 Antecedent space partition and TAs for y1 of the distillation column
6.51 Coefficients for local models in the GTSK model in Figure 6.50
6.52 Two-dimensional antecedent space for y2 of the distillation column
6.53 a) Antecedent space partition for output y2 of the distillation column; b) Ellipsoids (TA = 0.05)
6.54 Antecedent space partition and TAs for y2 of the distillation column
6.55 Coefficients for local models in the GTSK model in Figure 6.50
6.56 Two-dimensional antecedent space for the MIMO(2,2) GTSK model
6.57 Antecedent space partition for the MIMO(2,2) GTSK model

CHAPTER I

INTRODUCTION

Efforts to describe chemical processes exist in various forms. Preferentially, first-principles models are developed, based on an idealized and simplified understanding of the underlying mechanism. Many of these models have been standardized in commercial software such as ChemCAD for education or AspenPlus for prototyping process design. However, an idealized first-principles model rarely finds direct application in practice, because artificial factors (such as tray efficiency in a distillation column) often have to be introduced to augment an ideal model and improve modeling accuracy via tuning against experimental data. Moreover, first-principles models are expensive to develop. It takes time for researchers to acquire sufficient knowledge to describe a new process mathematically and comprehensively. An ultimate goal of first-principles modeling is to understand the fundamental physics. However, in practice, partial or empirical understanding is often sufficient for certain practical applications. For instance, a modestly accurate input-output dynamic model makes controller design possible. In contrast to first-principles modeling, another effort is black-box modeling by system identification. Black-box modeling tends to overlook mechanistic details and focuses instead on the input-output behavior of a process.
For instance, the input-output description via first-order-plus-time-delay models is often adequate for process control engineers to tune PID controllers. There are many choices of model structure, including Finite Impulse Response, Autoregressive with Exogenous Inputs, Output Error, Autoregressive Moving Average with Exogenous Inputs, and Box-Jenkins. For each structure, the simplest version is a linear model. Surprisingly, many chemical processes can be described quite well using linear models, because most chemical processes are operated around a steady-state operating point. The linear model can be interpreted as a local linearization of the truly nonlinear chemical process. Although linear models have been used successfully in many chemical processes, efforts have been devoted to describing nonlinear dynamical chemical processes in a more compact or unified way. It is also expected that nonlinear modeling can provide a more accurate description. If a nonlinear model is desired, users have options for representing a nonlinear function mapping. These options include, but are not limited to, polynomial models, piecewise models, basis function models, network models, and fuzzy models. Interestingly, experience-based knowledge also exists for chemical processes. These rules are familiar to us in various forms, including process operating instructions and manuals, handbooks, and rules of thumb. Some rules are derived from prior knowledge, which could be either an understanding of fundamentals or experts' experience. For instance, our knowledge of distillation behavior might produce the following two rules expressing steady-state relations:

IF Reflux (R) is Fast THEN Overhead Purity (xd) is High
IF Reflux (R) is Slow THEN Overhead Purity (xd) is Low

where the linguistic terms 'Fast' and 'Slow' specify Reflux (R) while 'High' and 'Low' specify xd.
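The two rules above can be made quantitative by attaching membership functions to the linguistic terms and blending the rule conclusions. The sketch below is illustrative only, not from this dissertation: the sigmoid membership shape, the breakpoints 2.0 and 8.0, and the purity values 0.90 and 0.99 are all assumed numbers for the example.

```python
import math

def mu_fast(R, center=8.0, slope=1.0):
    """Degree to which reflux R is 'Fast' (rising sigmoid membership)."""
    return 1.0 / (1.0 + math.exp(-slope * (R - center)))

def mu_slow(R, center=2.0, slope=1.0):
    """Degree to which reflux R is 'Slow' (falling sigmoid membership)."""
    return 1.0 / (1.0 + math.exp(slope * (R - center)))

def overhead_purity(R, xd_high=0.99, xd_low=0.90):
    """Blend the two rule conclusions, weighted by their firing strengths."""
    w_fast, w_slow = mu_fast(R), mu_slow(R)
    return (w_fast * xd_high + w_slow * xd_low) / (w_fast + w_slow)
```

At large R the model returns nearly xd_high, at small R nearly xd_low, and values in between interpolate smoothly; choosing the breakpoints is exactly the quantitative tuning that the linguistic rules by themselves leave open.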
Knowledge expressed in logical rules is easy to understand but often difficult to use. Linguistic terms such as Fast, Slow, High, and Low are often not clearly defined. Moreover, human knowledge might be incomplete or outdated. In this work, one focus is to describe the input-output behavior of a nonlinear dynamic process. We choose TSK (Takagi-Sugeno-Kang) fuzzy models (Sugeno & Kang, 1986; Takagi & Sugeno, 1985) to approximate nonlinearity. The choice is motivated by the simplicity, interpretability, modularity, and flexibility of a fuzzy model. The concept of a fuzzy set was introduced by Zadeh (Zadeh, 1965) to express degrees of membership of elements to sets, which can be viewed as a generalization of the classical notion of a set defined on a two-valued (0 and 1, or True and False) membership value. Subsequently, fuzzy logic was invented to handle reasoning based on fuzzy sets. There are many ways to define fuzzy logic. An interesting application of fuzzy logic in engineering fields (fuzzy logic in the broad sense) is fuzzy modeling, which uses fuzzy models to represent a nonlinear function. A fundamental result, which justifies fuzzy modeling, shows that a fuzzy model is a universal approximator (Kosko, 1994). It simply means that fuzzy models can theoretically approximate almost any nonlinear function. Although a fuzzy model is not the only universal approximator, it is preferable over other modeling approaches because of its simplicity, interpretability, modularity, and flexibility. One aspect of simplicity is modeling simplicity. One merit of fuzzy modeling is that it allows users to first translate their intuition and knowledge into a qualitative model description, by a fuzzy model, and leave the quantitative description to a later tuning phase.
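As background for the TSK models used throughout, inference in a standard first-order TSK model can be sketched as follows: each rule fires with a strength given by the product of per-variable memberships, and the model output is the firing-strength-weighted average of the rules' local linear models. The Gaussian membership shape and all numeric values below are invented for illustration, not parameters from this work.

```python
import math

def gauss(x, c, s):
    """Gaussian membership value of x for a set centered at c with width s."""
    return math.exp(-0.5 * ((x - c) / s) ** 2)

# Each rule: (centers, widths, consequent coefficients [a1, a2, b]).
rules = [
    ((0.0, 0.0), (1.0, 1.0), (1.0, 0.0, 0.0)),  # near the origin: y ≈ x1
    ((3.0, 3.0), (1.0, 1.0), (0.0, 1.0, 2.0)),  # far region:      y ≈ x2 + 2
]

def tsk(x1, x2):
    num = den = 0.0
    for (c, s, (a1, a2, b)) in rules:
        # Product t-norm over per-variable memberships = rule firing strength.
        w = gauss(x1, c[0], s[0]) * gauss(x2, c[1], s[1])
        num += w * (a1 * x1 + a2 * x2 + b)  # rule's local linear model
        den += w
    return num / den                         # normalized weighted average
```

Near each rule's center the output follows that rule's local linear model, and between centers the model blends the two smoothly.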
For instance, an experienced operator can quickly provide a model with several rules to describe a distillation column, as shown above; subsequently, the break points defining the linguistic categories can be fine-tuned. Because fuzzy models are strongly connected to human knowledge, they are often credited with interpretability. The use of linguistic terms seems to be an 'obvious' reason. Certainly, the involvement of linguistic terms makes a fuzzy model appear friendly to users. More fundamentally, the interpretability is due to the fact that a fuzzy model is expressed in an IF…THEN structure, which matches the reasoning procedure of humans and makes a fuzzy model appear "intelligent". Another important aspect of interpretability is knowledge transparency, which is due to the modularity of a fuzzy model. Fuzzy models are made of rules. Regardless of how 'big' a fuzzy model is, each rule in the fuzzy model is relatively simple. A fuzzy model as a whole, with thousands of rules, is by no means interpretable, no matter how many linguistic terms are used. However, the modularity of a fuzzy model allows users to look at it in a different way by shifting focus onto individual rules. In each rule, knowledge of the local behavior of a nonlinear process becomes clear, and interpretability is possible. Modularity is also aligned with the concept of divide-and-conquer in dealing with complex problems. In fuzzy model identification, modularity can be exploited to convert the identification of a fuzzy model into a number of smaller and simpler identification problems, each of which focuses on one rule. In applications, for instance designing a fuzzy-model-based controller, modularity is used to translate the controller design for a fuzzy model into a number of smaller and simpler controller design problems. Modularity also leads to flexibility in fuzzy models. A fuzzy model can be viewed as an interface rather than a model.
It serves as a common gateway to connect different types of models and allow communication among them. Shown below is one possibility for letting a fuzzy model incorporate different types of models:

IF x is High THEN use a first-principles model
IF x is Low THEN use a Neural Network model
IF x is Medium THEN y is High

The flexibility and modularity also simplify model management and maintenance. In addition to adapting model parameters to compensate for model-plant mismatch, fuzzy models also allow insertion and deletion operations on rules to incorporate newly discovered events and eliminate obsolete behavior. Different from most black-box modeling approaches, in our view, fuzzy models explicitly separate the nonlinear components of a model from its linear components. This work will exploit this feature to simplify the model structure.

However, the applications of fuzzy models are limited by their inability to handle high-dimension problems, due to the well-known curse of dimensionality. With this restriction, fuzzy models can hardly have any significant practical impact. Even for a single-input-single-output (SISO) dynamic process, fuzzy models struggle if the process has high dynamic orders. Many successful academic examples of using fuzzy models are demonstrated on dynamic processes with low dynamic orders, often not exceeding four.

In this work, fuzzy models, particularly TSK fuzzy models, are chosen to describe nonlinear dynamics because of the potential benefits mentioned above. The TSK model used in this work features a generalized rule structure, which is proposed to overcome the insufficiency of TSK models in dealing with high-dimensional problems. The new structure has different dimensions in the rule antecedent and consequent.
Usually, in this work, the antecedent dimension is lower than the consequent dimension and contains only ‘nonlinear variables’, which directly addresses the curse of dimensionality by including fewer variables in antecedents. Additionally, the new structure replaces the combinatorial antecedent structure with a more flexible one, in which an extra degree of freedom is introduced to ‘rotate’ the coverage of a rule. This addition reduces the number of rules needed in a TSK model by improving the covering efficiency of each rule. With the generalized rule antecedent structure, the TSK model in this work is referred to as a GTSK (generalized TSK) model.

The structure of a GTSK model includes the overall model dimension and the antecedent dimension. Since the primary modeling target in this work is nonlinear dynamic processes, determining the overall dimension of a GTSK model translates to discovering the dynamic orders from measured input-output data. The antecedent dimension of a GTSK model is determined by finding the nonlinear components in the model. Parameter estimation of the GTSK model is automated heuristically by recognizing rules from an iteratively partitioned space. The heuristic procedure is followed by fine tuning of the fuzzy model parameters through solving a nonlinear optimization problem with matrix inequality constraints.

This work aims to provide a unified and systematic procedure to obtain a GTSK model with the new rule structure from input-output data for a nonlinear dynamic process. The procedure is demonstrated on several theoretical benchmark problems, which are drawn from published research and are used primarily for illustrating ideas, comparing methods, and verifying results. The procedure is also tested on a distillation column simulator, which has been successfully used in past research (Ou, 2001).
Additionally, the procedure is tested on a pilot-scale chemical process, two-phase flow, which exhibits nonlinear dynamics, time delay, and measurement noise. The innovations of this work are: the design of a new rule antecedent structure, which has a reduced antecedent dimension and a more flexible antecedent form; the design of a systematic approach to determine dynamic orders and detect nonlinear variables; and the design of a heuristic procedure that iteratively partitions an antecedent space to generate regions within which a linear relation is valid.

CHAPTER II

LITERATURE SURVEY

2.1 Literature Survey for Dynamic Order Determination

TSK fuzzy models are used in this work to describe a nonlinear dynamic process. Several potential benefits that users might expect from a fuzzy model have been listed in the Introduction. The modeling procedure proposed in this work is capable of dealing with multiple-input-multiple-output (MIMO) processes. However, for simplicity of presentation, the majority of the technical elaboration will be based on single-input-single-output (SISO) models as described in Equation (2.1). The extension to MIMO models will be addressed accordingly.

y(t) = f(y(t−1), …, y(t−ny), u(t−d), …, u(t−nu−d)) + e(t)   (2.1)

Equation (2.1) is a nonlinear autoregressive with exogenous input (NARX) model. The term NARX is chosen to be consistent with its linear counterpart, the ARX model. The terminology is, however, not unique in the literature: in (Seborg & Henson, 1996), the structure in Equation (2.1) is named a nonlinear autoregressive and moving average model (NARMA). In this work, the ARX structure is chosen for its simplicity. More importantly, the function arguments (lagged y and u) in Equation (2.1) include only input and output measurements. Some operations and treatments of raw data in this work are currently limited to model structures that have only measurable function arguments.
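As an aside, the NARX structure of Equation (2.1) is easy to exercise numerically. Below is a minimal Python sketch of generating input-output data from a NARX process; the particular nonlinear function, input signal, and orders are illustrative assumptions, not taken from this work.

```python
import random

def simulate_narx(n_samples, f, ny, nu, d, noise_sd=0.0, seed=0):
    """Simulate y(t) = f(y(t-1..t-ny), u(t-d..t-nu-d)) + e(t), Equation (2.1) style."""
    rng = random.Random(seed)
    lag = max(ny, nu + d) + 1                      # longest lag needed
    u = [rng.uniform(-1.0, 1.0) for _ in range(n_samples + lag)]
    y = [0.0] * (n_samples + lag)
    for t in range(lag, n_samples + lag):
        y_lags = [y[t - k] for k in range(1, ny + 1)]          # y(t-1)..y(t-ny)
        u_lags = [u[t - d - k] for k in range(0, nu + 1)]      # u(t-d)..u(t-nu-d)
        y[t] = f(y_lags, u_lags) + rng.gauss(0.0, noise_sd)
    return y[lag:], u[lag:]

# illustrative nonlinear function (an assumption, not from the dissertation)
f = lambda yl, ul: 0.5 * yl[0] / (1.0 + yl[0] ** 2) + 0.8 * ul[0]
y, u = simulate_narx(200, f, ny=1, nu=0, d=1)
```

Such simulated data sets are what the order-determination and identification methods surveyed below operate on.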
More complex structures could be used to describe nonlinear dynamics if necessary. A nonlinear NARMAX model is described in (Johansen & Foss, 1993); its structure information is retrieved from its linear counterpart, ARMAX. As commented in (Nelles, 2001), more advanced structures are often not worth their additional complexity in describing nonlinear dynamics. On the other hand, NARX models, as the simpler models, should often be tried first for any nonlinear dynamic process of unknown structure. The obtained NARX model can then be the basis for further structure variation or complication. In (Fischer, Nelles & Isermann, 1998), a NARX model is first identified and then converted to a nonlinear output error (NOE) model by regressor replacements (for instance, y(t−1) is replaced by its prediction ŷ(t−1)) followed by model parameter retuning.

Additionally, we assume in this work that ny, nu, and d in Equation (2.1) are time-invariant. This simplification may be contrary to the nature of some realistic processes. For instance, a time-varying delay is often encountered in chemical processes, where a transportation delay strongly relates to a flow rate that is time-varying in nature. On the other hand, a constant delay is often a good enough approximation in practice, especially in a relatively steady working condition.

The first step in system identification is to determine the orders of the model. For the SISO model in Equation (2.1), the problem is to discover ny, nu, and d. There are well-developed order-determination methods for linear systems: a preliminary analysis using autocorrelation and partial autocorrelation (Box, Jenkins & Reinsel, 1994) is able to identify dynamic orders. The result is often a set of candidate orders to be tried and validated further against data. Dynamic order determination can also be translated into problems of regressor analysis.
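For the linear case mentioned above, the sample autocorrelation function is straightforward to compute. The following is a minimal sketch on illustrative AR(1) data; a full Box-Jenkins analysis would also examine the partial autocorrelation.

```python
import random

def autocorrelation(x, max_lag):
    """Sample autocorrelation r(k) = sum((x_t - m)(x_{t+k} - m)) / sum((x_t - m)^2)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# illustrative AR(1) data: y(t) = 0.9 y(t-1) + e(t); the ACF decays roughly as 0.9^k
rng = random.Random(1)
y = [0.0]
for _ in range(2000):
    y.append(0.9 * y[-1] + rng.gauss(0, 1))
r = autocorrelation(y, 5)
```

A slowly decaying ACF like this one suggests autoregressive structure; the cut-off pattern of the partial autocorrelation would then point to the order.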
Regressor analysis does not yield the dynamic orders and time delay directly; however, it is trivial to draw ny, nu, and d from its result. One method is subset selection (Miller, 1990), which has different versions including forward selection, backward elimination, cycling replacement, and exhaustive search. Among them, only exhaustive search, the most expensive one, is guaranteed to find the globally optimal solution, the best set of regressors. The other methods are heuristically motivated, aiming at a suboptimal solution with improved search speed or efficiency. Analysis of variance (ANOVA), a tool for finding influential experimental factors, can also be used to find influential regressors (Lind & Ljung, 2008). The ANOVA method suffers from the curse of dimensionality, and the evaluation of interacting influence among factors requires a combinatorial number of trials. In addition, a conventional ANOVA procedure takes finite levels of experimental factors rather than continuous (‘infinite-level’) values; extra computation is required to prepare the raw data for ANOVA analysis (for instance, by clustering).

For nonlinear dynamic models, even for NARX models, there is no general method such as autocorrelation or partial autocorrelation analysis available for dynamic order determination. Rigorous analysis based on nonlinear correlation is possible if the nonlinear structure of f is known or presumed (Haber & Unbehauen, 1990); there are a variety of predefined nonlinear structures to choose from, such as bilinear, Wiener, and Hammerstein models or their combinations. Another family of approaches aims at a general target and does not depend on a predefined nonlinear structure. The geometric method (Molina, Sampson, Fitzgerald & Niranjan, 1996) is proposed to determine the embedded dimension in deterministic nonlinear autoregressive systems.
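A minimal sketch of forward selection, one of the subset-selection variants mentioned above, is given below. The data, coefficients, and stopping rule (a fixed number of terms) are illustrative assumptions; a practical implementation would stop using a validation criterion rather than a preset term count.

```python
import numpy as np

def forward_selection(X, y, max_terms):
    """Greedy forward selection: repeatedly add the regressor (column of X)
    that most reduces the residual sum of squares of a least-squares fit."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    for _ in range(max_terms):
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            theta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ theta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# toy data: y depends only on columns 0 and 2 of five candidate regressors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.01 * rng.normal(size=200)
picked = forward_selection(X, y, max_terms=2)
```

When the candidate columns are lagged y and u values, the chosen subset translates directly into estimates of ny, nu, and d.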
Following the same concept, an extension to deterministic ARX models that includes inputs is proposed in (Rhodes & Morari, 1995), based on the False Nearest Neighbor idea. Both methods are intuitively motivated rather than rigorously derived, and can be roughly argued from a first-order Taylor expansion. Another method based on the first-order Taylor expansion argument is the Lipschitz Quotient (He & Asada, 1993), aimed at deterministic NARX dynamic processes.

The difficulty in determining the orders in Equation (2.1) is the unknown nonlinear function f. Even if f is known to be nonlinear, the richness of nonlinearity keeps users from exhausting all possible nonlinear forms, making it difficult to find ny, nu, and d. If the nonlinearity is known, it is possible to transform a nonlinear problem into a linear one. If the nonlinearity is unknown, users can resort to any of the ‘big’ models, such as a neural network or any other structure proved to be a universal approximator. These complex structures are able to capture almost any nonlinearity given enough flexibility. With nonlinearity no longer a problem, users can then experiment with and compare different sets of orders in these ‘big’ models. The drawback of using ‘big’ models is the high computational burden. Additionally, as we will present later, experimentation with dynamic orders in ‘big’ models is not suitable for another objective of this work, nonlinear component detection. In our work, a unified approach is proposed for both dynamic order determination and nonlinear component detection.

2.2 Literature Survey for Fuzzy Model Structure

There are several different types of fuzzy models. One of them is the Mamdani fuzzy model (Mamdani, 1974).
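The Lipschitz Quotient idea can be sketched compactly: for a candidate order, form regression vectors from lagged outputs, compute the quotients |Δy| / ||Δx|| over pairs of points, and summarize the largest ones. When the candidate order is too small, nearly identical regression vectors can map to very different outputs, and the index blows up. The code below is an illustrative sketch for an output-only autoregressive case; the test signal (a Hénon-type second-order map), the number of retained quotients, and the geometric-mean summary are assumptions in the spirit of He & Asada, not their exact algorithm.

```python
import math

def lipschitz_index(y, order, n_largest=10):
    """Geometric mean of the largest Lipschitz quotients |y_i - y_j| / ||x_i - x_j||
    for regression vectors x built from `order` lagged outputs."""
    pts = [([y[t - k] for k in range(1, order + 1)], y[t])
           for t in range(order, len(y))]
    quotients = []
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dx = math.dist(pts[i][0], pts[j][0])
            if dx > 1e-12:
                quotients.append(abs(pts[i][1] - pts[j][1]) / dx)
    quotients.sort(reverse=True)
    top = quotients[:n_largest]
    return math.exp(sum(math.log(q) for q in top) / len(top))

# deterministic second-order (Hénon-type) series: y(t) = 1 - 1.4 y(t-1)^2 + 0.3 y(t-2)
y = [0.1, 0.1]
for _ in range(300):
    y.append(1.0 - 1.4 * y[-1] ** 2 + 0.3 * y[-2])

idx1 = lipschitz_index(y, order=1)   # order too small: index inflated
idx2 = lipschitz_index(y, order=2)   # sufficient order: index bounded
```

The sharp drop of the index from order 1 to order 2 is the signature used to pick the dynamic order.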
For the nonlinear dynamic process in Equation (2.1), a Mamdani fuzzy model may be defined by rules of the form

IF y(t−1) is A_1^r AND … AND u(t−nu−d) is A_(ny+nu+1)^r THEN y(t) is C^r   (2.2)

where the expression "y(t−1) is A_1^r AND … AND u(t−nu−d) is A_(ny+nu+1)^r" is the antecedent of the rule, and the expression "y(t) is C^r" is the consequent. The variables y(t−1), …, y(t−ny), u(t−d), …, u(t−nu−d) are antecedent variables, and A_1^r is the fuzzy subset for y(t−1) in rule r; the notation of the fuzzy subsets for the other variables should be clear from context. A Mamdani fuzzy model has perhaps the simplest consequent models.

An extension of the Mamdani fuzzy model is the Takagi-Sugeno-Kang (TSK) fuzzy model (Sugeno & Kang, 1986; Takagi & Sugeno, 1985). The generalization applies to the rule consequent. For the nonlinear dynamic process in Equation (2.1), a rule in a TSK fuzzy model can be defined by

IF y(t−1) is A_1^r AND … AND u(t−nu−d) is A_(ny+nu+1)^r
THEN A^r(z^−1) y(t) = k^r + B^r(z^−1) u(t−d) + e^r(t)
with A^r(z^−1) = 1 + a_1^r z^−1 + … + a_ny^r z^−ny
     B^r(z^−1) = b_0^r + b_1^r z^−1 + … + b_nu^r z^−nu   (2.3)

where the consequent model is A^r(z^−1) y(t) = k^r + B^r(z^−1) u(t−d) + e^r(t), with dynamic orders ny and nu, pure time delay d, and a constant k^r; z is the backshift operator. The local model can be interpreted as a linearization of the nonlinear dynamic process in Equation (2.1); the linearization explains the inclusion of the constant term k^r. As mentioned in (Leith & Leithead, 1999; Shorten, Smith, Bjorgan & Gollee, 1999), the linearization can be interpreted as being conducted around either a steady-state or a transitional working point. Inclusion of the latter is noted to improve modeling performance for transient behavior (Smith & Johansen, 1997). Mamdani and TSK represent the two major types of fuzzy models and differ in their consequents.
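For one-step prediction with first-order local models, the TSK inference of Equation (2.3) reduces to a membership-weighted average of local ARX outputs. A minimal sketch with two rules on a single antecedent variable follows; all numeric parameters are made up for illustration.

```python
import math

def gauss(x, center, sigma):
    """Gaussian membership function."""
    return math.exp(-((x - center) / sigma) ** 2)

def tsk_predict(y1, u1, rules):
    """One-step TSK prediction: each rule pairs membership parameters with
    local ARX coefficients y(t) = k + a*y(t-1) + b*u(t-1); the output is
    the membership-weighted average of the local model outputs."""
    weights, locals_ = [], []
    for (center, sigma), (k, a, b) in rules:
        weights.append(gauss(y1, center, sigma))
        locals_.append(k + a * y1 + b * u1)
    s = sum(weights)
    return sum(w * v for w, v in zip(weights, locals_)) / s

# two illustrative rules: "y(t-1) is Low" and "y(t-1) is High" (made-up numbers)
rules = [((0.0, 1.0), (0.0, 0.5, 1.0)),
         ((5.0, 1.0), (1.0, 0.9, 0.2))]
yhat = tsk_predict(2.5, 1.0, rules)
```

Near either centroid the prediction follows the corresponding local linearization; between them it blends the two, which is exactly the smooth-interpolation view of TSK models.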
In fact, a TSK fuzzy model can be further generalized by replacing its linear consequent models with other types of models. In (Mastorocostas & Theocharis, 2002), a new type of fuzzy model is proposed with neural network consequent models. Hierarchical fuzzy models (Lee, Chung & Yu, 2003; Liu & Li, 2005; Zeng & Keane, 2005) are often mentioned in the literature and can also be considered a particular type of generalization, having fuzzy models as local models. Interestingly, fuzzy models can also be compared with models originating from other disciplines. It is shown in (Andersen, Lotfi & Westphal, 1998; Roger & Sum, 1993) that a TSK fuzzy model with Gaussian membership functions and the product operator for the AND logic conjunction is functionally equivalent to a normalized radial basis function network under certain restrictions. In (Smith & Johansen, 1997), a TSK fuzzy model is addressed from a broader perspective as a multi-model network.

The above-mentioned fuzzy models represent one direction of generalizing the fuzzy model structure: making the consequent models more complex. Interestingly, not much effort has been devoted to generalizing the antecedent structure of a fuzzy model. The maneuverability in antecedents lies mainly in the choice of different types of membership functions, including triangular, trapezoidal, and Gaussian, as well as different configurations for a particular type of membership function.

Another degree of freedom in designing antecedents is the use of different logic operators. For instance, the AND conjunction in the antecedent expressions in Equations (2.2) and (2.3) can be quantitatively evaluated using either the product or the minimum operator. In addition to these two, there are in fact many other choices for evaluating the AND conjunction, defined by a variety of T-norms resulting from research on symbolic fuzzy logic (Lee & Zhu, 1995).
2.3 Literature Survey for Fuzzy Model Identification

Identifying a fuzzy model generally involves two objectives: structure identification and parameter estimation. Structure identification selects variables for the antecedent and consequent, determines the number of fuzzy subsets for each variable, and estimates the number of rules in a fuzzy model. Parameter estimation determines the values of the model parameters. As shown in the TSK rule in Equation (2.3), the model parameters include the parameters defining all fuzzy subsets (membership functions) in the antecedent and those defining the consequent models.

There are many different approaches to fuzzy model identification. They vary with the type of fuzzy model to be identified or are based on different assumptions. Very often in practice, structure identification and parameter estimation are coupled. For instance, the number of rules is related to the number of variables in the antecedent as well as the number of fuzzy subsets for each antecedent variable. Meanwhile, the addition or deletion of a fuzzy subset for a variable is expected to affect the distribution of the other fuzzy subsets, which in turn requires retuning the membership functions for an optimal result. Inevitably, any variation in antecedent parameter values should be accompanied by a corresponding change in consequent model coefficients.

2.3.1 Variable Selection

Variable selection determines the variables for the rule antecedent and consequent. Very often, it is implicitly assumed for simplicity that all rules in a fuzzy model share the same set of antecedent and consequent variables; it is therefore equivalent in practice to define the problem as antecedent and consequent variable selection for the whole fuzzy model. Variable selection is not conducted separately but is often accompanied by parameter estimation or retuning. A common explicit procedure is to try different sets of selections, evaluate their corresponding model accuracy and complexity, and pick the best.
In (Pomares, Rojas, González & Prieto, 2002), variable selection is conducted iteratively in a constructive approach to build a fuzzy model. In each iteration, the fuzzy model is augmented either by changing the number of fuzzy subsets of already selected variables or by adding a variable to the antecedent; the better option is kept. Similar to the approach widely used in classification tree identification, antecedent variable selection is conducted implicitly in (Nelles & Isermann, 1996). In each step, the augmentation of the existing fuzzy model is tried by adding a new rule for each candidate variable, and the best rule is then kept. In the end, antecedent variable selection is achieved automatically by discarding the variables that are never selected.

Variable selection becomes more complicated for a dynamic process as described in Equation (2.1), since each variable is associated with an unknown dynamic order. The variable selection problem must then be extended to determine the dynamic order for each variable. The extension can be achieved simply by including more lagged terms, which, however, greatly increases the problem dimension and makes many methods designed for low-dimension problems difficult to apply.

2.3.2 Fuzzy Model Identification

There are several ways to categorize methods of fuzzy model identification. Some identification methods are based on heuristic criteria for linguistic interpretability and knowledge transparency. Many other identification methods instead seek a more accurate fuzzy model by minimizing a quantitative performance index. The approaches that extract fuzzy rules heuristically are mainly inspired by two procedures: the Pittsburgh approach focuses on rule-set evolution, while the Michigan approach evolves individual rules independently. Both the Pittsburgh and Michigan approaches use genetic algorithms for optimization, which is consistent with the main theme being heuristic.
More importantly, it is due to the fact that heuristic criteria are unable to provide explicit search directions expressed by gradients or Hessians. Research in this field focuses primarily on inventing new heuristics by digging deeply into how humans process linguistic information, or on devising more efficient search or combinatorial optimization techniques (Cordon, Herrera, Gomide, Hoffmann & Magdalena, 2001).

Different from those heuristically inspired approaches, a modeling-error-driven approach estimates the parameter values of a fuzzy model by optimizing a performance index, for instance the sum of squared errors. In this approach, one can take either a ‘global’ procedure that tunes all parameters (antecedent and consequent) simultaneously, or a ‘local’ procedure that starts from individual rules and combines them into a fuzzy model. The ‘global’ procedure requires a good initial guess to avoid trivial solutions or poor local minima. In (Dickerson & Kosko, 1996), an initial fuzzy model is generated by recognizing piecewise patches along the SISO nonlinear function to be approximated; a steepest descent optimizer then follows. Heuristics based on clustering are also used to recognize prototype rules (Dickerson & Kosko, 1996; Vernieuwe, Baets & Verhoest, 2006; Wang & Yang, 2009). In (Nelles, 2001), rules are generated progressively by conducting an equal division along one dimension in each step. It is also possible to over-parameterize a fuzzy model and let a simplification procedure (Yen & Wang, 1999) merge redundant rules or eliminate invalid rules.

It is worth pointing out that there is a procedure that obtains a fuzzy model representation of a known nonlinear process by mathematical equivalence (Kawamoto, 1992). This approach has nothing to do with the above-mentioned fuzzy model identification from data.
The main purpose of that procedure is to represent a nonlinear model by a fuzzy model, exploit the structural features of the fuzzy model to design a controller, and investigate stability for the original nonlinear model. Additionally, heuristic-based stochastic procedures exist to obtain both model structure and parameter values simultaneously (Du & Zhang, 2008; Guenounou, Belmehdi & Dahhou, 2009; Lin, 2008; Lin & Xu, 2006), which, however, require even more computational resources.

CHAPTER III

A GENERALIZED RULE ANTECEDENT STRUCTURE

In this chapter, a generalized rule antecedent structure is proposed. The new rule antecedent uses only nonlinear variables. Additionally, one more degree of freedom is introduced in antecedent design to cover an antecedent space more efficiently. The following elaboration focuses on a single-input-single-output (SISO) model; the extension to multiple-input-multiple-output (MIMO) models is provided at the end.

3.1 Model Complexity

Equation (3.1) represents a SISO dynamic process with dynamic orders ny and nu, pure time delay d, and an additive noise e(t)

y(t) = f(y(t−1), …, y(t−ny), u(t−d), …, u(t−nu−d)) + e(t)   (3.1)

where y is the process response and u is the input. The nonlinear function f can be approximated by the TSK model in Equation (2.3), reproduced below for easy reference

IF y(t−1) is A_1^r AND … AND u(t−nu−d) is A_(ny+nu+1)^r
THEN A^r(z^−1) y(t) = k^r + B^r(z^−1) u(t−d) + e^r(t)
with A^r(z^−1) = 1 + a_1^r z^−1 + … + a_ny^r z^−ny
     B^r(z^−1) = b_0^r + b_1^r z^−1 + … + b_nu^r z^−nu   (3.2)

The complexity of a TSK model can simply be regarded as the number of rules. For the TSK model in Equation (3.2), given that each variable has 5 fuzzy subsets (which could be linguistically labeled Low, Medium-Low, Medium, Medium-High, High), there would be 5^(ny+nu+1) possible rules to be considered. The problem dimension (ny+nu+1 in this case) is an obvious cause of the complexity.
Moreover, the number of rules also depends on the number of fuzzy subsets for each variable; the illustrated number, 5, is quite conservative in practice. Simply put, the TSK model described in Equation (3.2) has difficulty dealing with high-dimension problems; it is subject to “the curse of dimensionality”. In the following, a generalized rule antecedent structure is proposed to design an efficient GTSK model using fewer rules. The new rule antecedent uses only nonlinear variables, which separates the antecedent dimension from the problem dimension; the complexity of a GTSK model is related only to the antecedent dimension. It is then possible to apply a GTSK model to a high-dimension problem as long as its antecedent dimension is acceptable. Additionally, the proposed rule antecedents are expressed as ellipsoids covering the underlying local regions and feature one more degree of freedom in design. The extra flexibility makes spatial coverage more efficient and simplifies a fuzzy model in terms of the number of rules.

3.2 Antecedent Dimension

The direct approach to reducing the number of rules is to control the problem dimension, which is unfortunately determined by the nature of the problem and not by users. However, dimension reduction in the antecedent is still possible by excluding variables that appear linearly. To illustrate, consider the following nonlinear dynamic model with three regressors, [y(t−1), y(t−2), u(t−1)]

y(t) = 2.5 y(t−1) + y(t−1) y(t−2) + y²(t−1) u(t−1)   (3.3)

Using the rule structure in Equation (3.2), the rule antecedent would be expressed as (y(t−1) is A1 AND y(t−2) is A2 AND u(t−1) is A3). The antecedent dimension is 3, the same as the problem dimension. Assuming that each variable has 5 fuzzy sets, the combinatorial construction will generate 5³ = 125 possible rules. The dynamic model in Equation (3.3) can, however, be represented in a linear format using time-varying parameters.
y(t) = a1(t) y(t−1) + a2(t) y(t−2) + b0(t) u(t−1)   (3.4)

with a1(t) = 2.5, a2(t) = y(t−1), and b0(t) = y²(t−1), where the model parameters a2 and b0 are not only time-varying but functions of the regressor y(t−1). This indicates that the model can be expressed linearly in all variables except y(t−1): the coefficient values in Equation (3.4) are independent of y(t−2) and u(t−1). Equivalently, the regressor y(t−1) is the only regressor that changes the otherwise linear model coefficient values. Therefore, y(t−1) should be the only variable included in the antecedent. The simplified rule is defined by

IF y(t−1) is A^r THEN y(t) = k^r + a_1^r y(t−1) + a_2^r y(t−2) + b_0^r u(t−1)   (3.5)

where the antecedent dimension is reduced to 1, and the possible number of rules is reduced from 125 to 5. In Equation (3.5), y(t−1) is an antecedent variable and is collected in a vector c(t); the regressors in the consequent, including y(t−1), y(t−2), and u(t−1), are collected in a vector x(t).

The concept of including only nonlinear variables in antecedents has been explicitly mentioned in (Shorten, Smith, Bjorgan & Gollee, 1999) and implicitly applied in (Nelles & Isermann, 1996; Tanaka & Wang, 2001), where fuzzy models are used to describe known nonlinear dynamic processes. However, the above-mentioned dimension reduction is practically applicable only if the antecedent variables can be found from data. The detection of antecedent variables will be addressed in Chapter 4.

3.3 Antecedent Structure

3.3.1 A Generalized Antecedent Structure

As mentioned above, the number of rules is related to the problem dimension by 5^(ny+nu+1). In Section 3.2, it is illustrated that it is possible to use an exponent smaller than ny+nu+1. However, the exponential relation between the number of rules and the (antecedent) dimension is still preserved.
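The dimension-reduction claim above is easy to check numerically: the nonlinear model and its time-varying linear rewriting agree everywhere, with coefficients that depend only on y(t−1). In the sketch below, the 2.5 term is attached to y(t−1), as implied by the coefficient definitions a1(t) = 2.5, a2(t) = y(t−1), b0(t) = y²(t−1) under Equation (3.4).

```python
def y_nonlinear(y1, y2, u1):
    # nonlinear model of Equation (3.3), with y1 = y(t-1), y2 = y(t-2), u1 = u(t-1)
    return 2.5 * y1 + y1 * y2 + y1 ** 2 * u1

def y_linear_tv(y1, y2, u1):
    # Equation (3.4): the same model, linear in the regressors, with
    # coefficients that are functions of y(t-1) only
    a1, a2, b0 = 2.5, y1, y1 ** 2
    return a1 * y1 + a2 * y2 + b0 * u1

# the two forms agree at arbitrary points, so only y(t-1) needs to enter the antecedent
samples = [(0.3, -1.2, 0.7), (1.5, 0.4, -0.2), (-0.8, 2.0, 1.1)]
diffs = [abs(y_nonlinear(*s) - y_linear_tv(*s)) for s in samples]
```

Since the coefficients never depend on y(t−2) or u(t−1), scheduling the local linear models on y(t−1) alone loses nothing.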
The underlying cause of the exponential relation is the combinatorial antecedent structure of the TSK rule in Equation (3.2), which uses the AND conjunction to connect antecedent variables. For example, given a two-dimensional antecedent (c1 is A1 AND c2 is A2), if Gaussian membership functions are assumed and the product operator is used for the AND conjunction, the antecedent is evaluated by the truth of antecedent (TA)

TA = exp(−((c1 − o1)/σ1)² − ((c2 − o2)/σ2)²)   (3.6)

which describes an ellipsoid centered at (o1, o2) with widths σ1 and σ2. A contour plot of TA is shown below.

Figure 3.1. The ellipsoid contour of TA

In Figure 3.1, the highest value, TA = 1, is reached at the centroid; the further out the contour, the smaller the TA. The value of TA can be interpreted as the belongingness of a data point to a local region.

A fuzzy model has several rules. Given a two-dimensional antecedent with an equal number of fuzzy sets for each antecedent variable, a typical combinatorial antecedent space partition, represented by horizontal and vertical ellipsoids, is shown in Figure 3.2(a)

Figure 3.2. Antecedent space partition and representation

where 9 rules result from the exhaustive combinations of 3 fuzzy sets for each antecedent variable. Users might resort to the techniques in (Yen & Wang, 1999) to reduce the redundancy in consequent models and obtain a more compact fuzzy model: the number of rules can be reduced by merging regions that exhibit similar local behavior. Figure 3.2(b) shows a possible simplified partition after merging some regions. Even the partition in Figure 3.2(b) becomes inefficient in the situation shown in Figure 3.3, where neither a horizontal nor a vertical ellipsoid provides an efficient representation of the underlying local region, represented by either a rotated ‘space’ of correlated variables or an irregular polygon.

Figure 3.3.
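Equation (3.6) can be evaluated directly. The sketch below, with an illustrative centroid and widths, shows TA peaking at 1 at the centroid and decaying along the axis-aligned ellipsoid contours.

```python
import math

def truth_axis_aligned(c1, c2, o1, o2, s1, s2):
    """Equation (3.6): the product of two Gaussian memberships yields an
    axis-aligned ellipsoidal truth-of-antecedent surface."""
    return math.exp(-((c1 - o1) / s1) ** 2 - ((c2 - o2) / s2) ** 2)

# illustrative centroid (1, 2) with widths 0.5 and 1.5
peak = truth_axis_aligned(1.0, 2.0, 1.0, 2.0, 0.5, 1.5)   # at the centroid
off = truth_axis_aligned(1.5, 2.0, 1.0, 2.0, 0.5, 1.5)    # one width away in c1
```

Note that the decay rates are tied to the coordinate axes: there is no way, with σ1 and σ2 alone, to orient the ellipsoid along a diagonal direction.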
A rotated local region covered by a horizontal or vertical ellipsoid

One possible solution for covering the space is to use many smaller ellipsoids, as shown in Figure 3.4, which, however, might result in a large number of rules.

Figure 3.4. A rotated local region covered by many small ellipsoids

Another solution is to rotate the ellipsoid, as shown in Figure 3.5.

Figure 3.5. A rotated local region covered by a rotated ellipsoid

The rotated ellipsoid proposed here, with its stretching and contraction, is flexible enough to match many geometric shapes. In order to address the rotation mathematically, the parameters σ in Equation (3.6) are replaced by a symmetric positive definite matrix P, termed the shape matrix in this work, which redefines the truth of antecedent as

TA = exp(−(c − o)^T P (c − o))   (3.7)

where o is a vector of dimension nc representing the centroid, and the shape matrix P is nc by nc. The flexibility in representing antecedent subspaces comes at the cost of nc(nc−1)/2 additional parameters in the shape matrix in Equation (3.7). This approach can be interpreted as a transition from a TSK fuzzy model with many simple rules to a GTSK fuzzy model with fewer complex rules. Clearly, simplicity and complexity in this context refer to the rule antecedents.

3.3.2 Interpretation of the Proposed Structure

Interested readers can follow the method below to convert the new antecedent structure in Equation (3.7) into a conventional antecedent of the form in Equation (3.2) with newly defined variables. Since the treatment in Section 3.3.2 is not essential, readers may also choose to skip it. The conversion is aided by representing the shape matrix in Equation (3.7) by its spectral decomposition

P = Σ_{i=1}^{nc} λ_i z_i z_i^T   (3.8)

where λ_i is an eigenvalue and z_i is the corresponding eigenvector.
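Equation (3.7) with an off-diagonal entry in P already exhibits the rotation: the quadratic form decays at different rates along the two diagonal directions. A minimal two-dimensional sketch with an illustrative shape matrix:

```python
import math

def truth_shape_matrix(c, o, P):
    """Equation (3.7): TA = exp(-(c-o)^T P (c-o)) for a symmetric positive
    definite shape matrix P (written out for the 2x2 case)."""
    d = [c[0] - o[0], c[1] - o[1]]
    quad = (P[0][0] * d[0] ** 2 + 2.0 * P[0][1] * d[0] * d[1]
            + P[1][1] * d[1] ** 2)
    return math.exp(-quad)

# the off-diagonal entry rotates the ellipsoid: this illustrative P decays
# slowly along the direction (1, 1) and quickly along (1, -1)
P = [[1.0, -0.8], [-0.8, 1.0]]     # symmetric, eigenvalues 0.2 and 1.8 > 0
o = [0.0, 0.0]
along = truth_shape_matrix([1.0, 1.0], o, P)    # quad = 1 - 1.6 + 1 = 0.4
across = truth_shape_matrix([1.0, -1.0], o, P)  # quad = 1 + 1.6 + 1 = 3.6
```

Setting the off-diagonal entry to zero recovers the axis-aligned form of Equation (3.6) with σ_i = p_ii^(−1/2).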
Substituting the spectral decomposition for P, Equation (3.7) becomes

TA = exp(−(c − o)^T (Σ_{i=1}^{nc} λ_i z_i z_i^T)(c − o))
   = Π_{i=1}^{nc} exp(−λ_i (c − o)^T z_i z_i^T (c − o))
   = Π_{i=1}^{nc} exp(−λ_i (Σ_{j=1}^{nc} (c_j − o_j) z_{i,j})²)   (3.9)

Then TA can be converted to the conventional form

TA = Π_{i=1}^{nc} exp(−((v_i − q_i)/σ_i)²)   (3.10)

with the newly introduced antecedent variable v_i, centroid q_i, and width σ_i defined by

v_i = Σ_{j=1}^{nc} c_j z_{i,j}
q_i = Σ_{j=1}^{nc} o_j z_{i,j}
σ_i = λ_i^{−1/2}   (3.11)

The rule antecedent can then be represented in the conventional format using the AND conjunction as

v_1 is A_1(q_1, σ_1) AND … AND v_nc is A_nc(q_nc, σ_nc)   (3.12)

where A_1(q_1, σ_1) denotes a Gaussian membership function with centroid q_1 and width σ_1. This interpretation might be useful for converting an existing GTSK model with the generalized antecedent structure into a conventional TSK fuzzy model, to regain the interpretability offered by antecedents using the AND conjunction. It also suggests that the antecedent structure generalization extends a conventional TSK fuzzy model architecturally by introducing an extra layer that linearly combines raw variables to form antecedent variables.

However, the above interpretation might not be helpful for estimating model parameters in general. For instance, nc(nc+1)/2 values are required to specify the shape matrix, but nc(nc+1) parameters appear in Equation (3.11): z_{i,j} (i = 1, …, nc; j = 1, …, nc) and λ_i (i = 1, …, nc). One would need to add constraints to eliminate the extra nc(nc+1)/2 degrees of freedom, for instance that the eigenvectors are orthogonal to each other and have unit length.

3.4 SISO Models

In a GTSK model, the model parameters include both antecedent and consequent parameters. Antecedent parameters specify the active region of each rule, while consequent parameters describe the local models.
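The equivalence between the rotated form of Equation (3.7) and the product form of Equation (3.10) can be checked numerically via the eigendecomposition of P, following Equations (3.8)-(3.11). A sketch using NumPy with illustrative numbers:

```python
import numpy as np

# illustrative symmetric positive definite shape matrix, centroid, and point
P = np.array([[1.0, -0.8], [-0.8, 1.0]])
o = np.array([0.2, -0.1])
c = np.array([0.9, 0.5])

# Equation (3.7): direct evaluation of the rotated-ellipsoid truth of antecedent
ta_direct = np.exp(-(c - o) @ P @ (c - o))

# Equations (3.8)-(3.11): spectral decomposition P = sum_i lambda_i z_i z_i^T,
# new variables v_i, centroids q_i, and widths sigma_i = lambda_i^(-1/2)
lam, Z = np.linalg.eigh(P)        # columns of Z are the eigenvectors z_i
v = Z.T @ c                       # v_i = sum_j c_j z_{i,j}
q = Z.T @ o                       # q_i = sum_j o_j z_{i,j}
sigma = lam ** -0.5

# Equation (3.10): product of Gaussian memberships in the new variables
ta_product = np.prod(np.exp(-((v - q) / sigma) ** 2))
```

The two values agree to machine precision, confirming that the shape matrix is just a compact parameterization of a conventional AND-conjunction antecedent in rotated coordinates.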
For simplicity of presentation, a vector x(t) is defined below to collect the input arguments in Equation (3.1)

x(t) = \begin{bmatrix} 1 & y(t-1) & \cdots & y(t-ny) & u(t-d) & \cdots & u(t-d-nu) \end{bmatrix}^T   (3.13)

where the dimension of x(t) is nx+1 with nx = ny + nu + 1. If a GTSK model is used to approximate the nonlinear function f in Equation (3.1), the fuzzy model is defined below using the proposed antecedent structure

IF c(t) is in R^1(o^1, P^1) THEN \hat{y}^1(t) = \theta^1 x(t)
  ⋮
IF c(t) is in R^M(o^M, P^M) THEN \hat{y}^M(t) = \theta^M x(t)   (3.14)

where superscript 1 indicates the first rule in the GTSK model. The antecedent representation using AND conjunction in Equation (3.2) is replaced by the statement "c(t) is in R^1(o^1, P^1)", where R^1(o^1, P^1) can be interpreted as an ellipsoidal active region for the first rule. The number of rules, M, is assumed known. The vector c(t), containing nc antecedent variables, is defined below; for nonlinear dynamic processes, the antecedent variables are obtained as the nonlinear components of Chapter 4.

c(t) = \begin{bmatrix} c_1(t) & \cdots & c_{nc}(t) \end{bmatrix}^T   (3.15)

3.4.1 Model Parameters

Figure 3.6 illustrates the model parameters to be estimated for a GTSK model with a two-dimensional antecedent structure.

Figure 3.6. Model parameters for a 4-rule GTSK model

R^i represents the active region for the ith rule. Its location and shape are specified by the antecedent parameters: a centroid vector o^i ∈ R^{nc} and a positive definite shape matrix P^i ∈ R^{nc×nc}.
They are respectively defined by

o^i = \begin{bmatrix} o_1^i & \cdots & o_{nc}^i \end{bmatrix}^T   (3.16)

P^i = \begin{bmatrix} p_1^i & p_2^i & p_4^i & \cdots \\ p_2^i & p_3^i & p_5^i & \cdots \\ p_4^i & p_5^i & p_6^i & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}   (3.17)

where the symmetric matrix P^i is specified by a vector p^i ∈ R^{nc(nc+1)/2}

p^i = \begin{bmatrix} p_1^i & p_2^i & p_3^i & \cdots & p_{nc(nc+1)/2}^i \end{bmatrix}   (3.18)

The P^i matrix can be expressed as a weighted sum of symmetric basis matrices in order to facilitate the computation later on

P^i = \sum_{j=1}^{nc(nc+1)/2} p_j^i B_j   (3.19)

where the symmetric basis matrices B_j are defined in the following manner

B_1 = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \quad
B_2 = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \quad
B_3 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \quad \ldots, \quad
B_{nc(nc+1)/2} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix}   (3.20)

The local model parameters (consequent parameters) are included in the vector θ^i ∈ R^{nx+1} defined by

θ^i = \begin{bmatrix} \theta_0^i & \theta_1^i & \cdots & \theta_{nx}^i \end{bmatrix}   (3.21)

3.4.2 Model Computation

The computation of the model in Equation (3.14) is defined by

\hat{y}(t) = \sum_{i=1}^{M} w^i(t) \, \hat{y}^i(t)   (3.22)

where ŷ^i(t) is the output from the local model in Rule i, weighted by w^i(t). The weights w^i(t) are defined as the normalized truth of the antecedent (TA)

w^i(t) = \frac{TA^i(t)}{\sum_{i=1}^{M} TA^i(t)}   (3.23)

with TA evaluated by Equation (3.7).

3.5 Extension to MIMO Models

Dealing with MIMO models is straightforward in this work: as shown below, a MIMO model is a collection of several MISO models. Interested readers can follow Section 3.5 to see how a MIMO model is equivalent to multiple MISO models, or return to the subject later when dealing with a MIMO case. For a MIMO process, a general description of its one-step predictor is

\hat{y}(t) = f(y(t-1), \ldots, y(t-ny), u(t-d), \ldots, u(t-d-nu))   (3.24)

where the MIMO model has n outputs and m inputs.
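A small sketch of the basis-matrix representation of Equations (3.19)-(3.20); the function names here are ours, and the ordering of the p_j follows the pattern of Equation (3.17).

```python
import numpy as np

def basis_matrices(nc):
    """Symmetric basis matrices B_j of Equation (3.20): one per entry of
    the lower triangle, mirrored across the diagonal."""
    Bs = []
    for i in range(nc):
        for j in range(i + 1):        # ordering matches p1, p2, p3, ... in Eq. (3.17)
            B = np.zeros((nc, nc))
            B[i, j] = 1.0
            B[j, i] = 1.0
            Bs.append(B)
    return Bs

def shape_matrix(p, nc):
    """Equation (3.19): P = sum_j p_j B_j."""
    return sum(pj * Bj for pj, Bj in zip(p, basis_matrices(nc)))
```

For nc = 3, the vector p = [p1, ..., p6] reproduces exactly the symmetric matrix pattern written out in Equation (3.17).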
The output and input vectors y(t) and u(t) are defined by

y(t) = \begin{bmatrix} y_1(t) & \cdots & y_n(t) \end{bmatrix}^T, \quad u(t) = \begin{bmatrix} u_1(t) & \cdots & u_m(t) \end{bmatrix}^T   (3.25)

The above model structure implicitly assumes that the dynamic orders in all yi (i = 1, ..., n) and uj (j = 1, ..., m) for each output yk(t) are ny and nu respectively. A universal time delay is also assumed between each pair of uj and yk. The universal-order assumption is in general not true in practice. However, a MIMO GTSK discrete-time model can be modified to have such a universal-order structure by adding zeros if necessary. A regressor vector x(t) is defined to collect all input arguments in Equation (3.24)

x(t) = \begin{bmatrix} 1 & y^T(t-1) & \cdots & y^T(t-ny) & u^T(t-d) & \cdots & u^T(t-d-nu) \end{bmatrix}^T   (3.26)

where the dimension of x(t) is nx+1 with nx = n×ny + m×(nu+1). The model is then defined as

IF c(t) is in R^1(o^1, P^1) THEN \hat{y}^1(t) = θ^1 x(t)
  ⋮
IF c(t) is in R^M(o^M, P^M) THEN \hat{y}^M(t) = θ^M x(t)   (3.27)

The model in Equation (3.27) is almost identical to that in Equation (3.14): o^i and P^i have the same meaning, and the antecedent variables are included in the vector c(t), which is again a subset of x(t). The only difference is that the local models in Equation (3.27) are multiple-output. The vector ŷ^i collects the n output predictions of the local model in the ith rule

\hat{y}^i = \begin{bmatrix} \hat{y}_1^i & \cdots & \hat{y}_n^i \end{bmatrix}^T   (3.28)

Consequently, the local model parameters are organized in a matrix θ^i ∈ R^{n×(nx+1)} defined by

θ^i = \begin{bmatrix} \theta_{1,0}^i & \theta_{1,1}^i & \cdots & \theta_{1,nx}^i \\ \vdots & \vdots & \ddots & \vdots \\ \theta_{n,0}^i & \theta_{n,1}^i & \cdots & \theta_{n,nx}^i \end{bmatrix}   (3.29)

Each row of θ^i corresponds to an output and each column of θ^i is related to a regressor. It is possible to decompose θ^i in terms of columns or rows as

θ^i = \begin{bmatrix} \theta_0^i & \cdots & \theta_{nx}^i \end{bmatrix}; \quad θ^i = \begin{bmatrix} {}_1\theta^i \\ \vdots \\ {}_n\theta^i \end{bmatrix}   (3.30)

where θ_j^i (j = 0, ..., nx) represents the jth column of matrix θ^i and ₖθ^i represents the kth row of θ^i (k = 1, ..., n). The computation in Equation (3.22) is then extended as below to deal with a multiple-output model.
\hat{y}(t) = \sum_{i=1}^{M} w^i(t) \begin{bmatrix} {}_1\theta^i \\ \vdots \\ {}_n\theta^i \end{bmatrix} x(t)   (3.31)

Equation (3.31) can be viewed as a collection of n single-output models. For instance, the computation for the kth output is

\hat{y}_k(t) = \sum_{i=1}^{M} w^i(t) \, {}_k\theta^i x(t)   (3.32)

where ₖθ^i, defined in Equation (3.30), is the kth row of matrix θ^i. It is then possible to define a single-output GTSK model for the kth output only by

IF c(t) is in R^1(o^1, P^1) THEN \hat{y}_k^1(t) = {}_k\theta^1 x(t)
  ⋮
IF c(t) is in R^M(o^M, P^M) THEN \hat{y}_k^M(t) = {}_k\theta^M x(t)   (3.33)

Comparing the single-output model for output yk with that in Equation (3.14), equivalence is established by equating ₖθ^i in Equation (3.33) to θ^i in Equation (3.14). However, the two models are different: the model in Equation (3.14) is SISO while that in Equation (3.33) is MISO, since x(t) in Equation (3.33) collects the lagged values of multiple inputs and multiple outputs. Fortunately, the difference in the contents of x(t) has no impact on the evaluation of the first- and second-order derivatives to be presented later. The computation of gradients and Hessian matrices for a SISO GTSK model can be extended directly to each MISO element of a MIMO GTSK model. A matrix ₖθ is defined to collect all local model parameters for the kth output

{}_k\theta = \begin{bmatrix} {}_k\theta^1 \\ \vdots \\ {}_k\theta^M \end{bmatrix}   (3.34)

The above decomposition can facilitate the estimation of model parameters, in terms of evaluating derivatives, if a decomposable performance index is used. Simply put, the centroids and shape matrices have global influence on a GTSK model, so their influence on all outputs should be accumulated. In contrast, the consequent parameters ₖθ have only local influence on their corresponding output yk. It can then be expected that the interaction between ₖθ and ₗθ (k ≠ l) is zero.
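The computation of Equations (3.22)-(3.23), in its multiple-output form of Equation (3.31), amounts to only a few lines of code. The sketch below uses hypothetical rule parameters; `gtsk_predict` is our own name, not part of any library.

```python
import numpy as np

def gtsk_predict(x, c, rules):
    """One-step GTSK prediction per Equations (3.22)-(3.23) and (3.31).
    x: regressor vector with a leading 1 (length nx+1)
    c: antecedent variable vector (length nc)
    rules: list of (o, P, theta) with centroid o, shape matrix P,
           and an (n x (nx+1)) local model matrix theta."""
    tas = np.array([np.exp(-(c - o) @ P @ (c - o)) for o, P, _ in rules])
    w = tas / tas.sum()                    # normalized truths, Eq. (3.23)
    return sum(wi * (theta @ x)            # weighted local models, Eq. (3.31)
               for wi, (_, _, theta) in zip(w, rules))
```

With two rules whose local models happen to agree at a given x, the prediction equals that common local output regardless of the weights, which makes a convenient sanity check.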
The representation of a MIMO GTSK model by several single-output GTSK models will be exploited in Chapter 5 to derive the first- and second-order derivatives of an objective function with respect to the model parameters.

CHAPTER IV

DYNAMIC ORDER DETERMINATION AND NONLINEAR COMPONENT DETECTION

Determination of the dynamic orders (ny, nu and d in Equation (3.1)) is the first step in system identification. Order determination is in general difficult for nonlinear system identification, owing to the interaction of the model structure (unknown orders) and the unknown nonlinearity. If the unknown nonlinearity can be attenuated, different model structures can be fairly compared. Guided by this concept, the work in this chapter uses recursive estimation to reduce the effect of the underlying nonlinearity on parameter variation, and proposes a sequential nearest neighbor rearrangement to enhance the reduction. The "best" dynamic order minimizes a final prediction error that accounts for the locality of the model parameters. In addition to determining dynamic orders, the sequential nearest neighbor rearrangement is also extended to detect nonlinear components: the regressors responsible for parameter variation when a nonlinear dynamic model is converted to a linear time-varying dynamic model. The results of Chapter 4 can be viewed as the preliminary analysis for building the GTSK model to be presented in Chapter 5. The dynamic order determination defines the overall dimension of the model; the nonlinear component detection selects the antecedent variables for the model.

4.1 Dynamic Order Determination

The dynamic orders ny, nu and the delay d are described in Equation (3.1). The difficulty in discovering the dynamic orders of a nonlinear dynamic model is caused by the unknown nonlinearities. Even if f is known to be nonlinear, the richness of nonlinearity keeps users from exhausting all possible nonlinear forms, making it difficult to find ny, nu and d.
If the unknown nonlinearity is no longer a problem, or at least not as severe as it was, it is possible to devise a procedure for dynamic order analysis of a NARX model. The objective of the following methodology is to disentangle the nonlinearity from the dynamic orders, which makes it possible to determine model orders. The methodology involves two stages: the first attenuates the unknown nonlinearity; the second searches for the dynamic orders.

4.1.1 Nonlinearity Representation

Nonlinearity can be explicitly or implicitly expressed. It is possible to transform a nonlinear dynamic model into a linear one if the nonlinear function is known. For instance, the nonlinear dynamic model

y(t) = 0.4 y(t-1)^3 + u(t-1)^3 + e(t)   (4.1)

can be redefined as a linear dynamic model by the static transformations z(t) = y(t-1)^3 and v(t) = u(t-1)^3.

Unknown nonlinearity can be addressed by using structure-rich models such as neural networks, basis function models, and fuzzy systems. These models are all universal approximators, able to capture almost any nonlinearity given enough flexibility. If a neural network model is used, proper dynamic orders can be found by the following procedure: a neural network is trained for different sets of ny, nu and d, and the best set is reported. Because a neural network is applied, the nonlinearity is presumably addressed; the only factors affecting modeling performance are then ny, nu and d, so the set with the best performance can be found. This approach is very general and applies to any nonlinear dynamic model with any universal approximator. The drawback is the computational burden of training 'big' models and the effort needed to select an appropriate architecture (number of layers and nodes per layer in neural networks; number of fuzzy subsets and rules in a fuzzy system).
If simple models such as linear models are preferred, nonlinearity can be addressed by adaptation: model parameters are recursively updated to track the parameter variation caused by the nonlinearity. Linear models with parameter adaptation require much less computation than 'big' models. The following example shows how to convert a nonlinear dynamic process to a linear format. The example uses a NARX model defined by

y(t) = \frac{y(t-1)}{1 + y(t-1)^2} + u(t-1)^3 + e(t)   (4.2)

which can be represented in a linear format

y(t) = a_1(t) y(t-1) + b_0(t) u(t-1) + e(t)   (4.3)

where a1(t) and b0(t) are time-varying model parameters, defined in Equation (4.4) as functions of y(t-1) and u(t-1) to establish a one-to-one correspondence between Equation (4.3) and Equation (4.2)

a_1(t) = \frac{1}{1 + y(t-1)^2}, \quad b_0(t) = u(t-1)^2   (4.4)

In general, the nonlinear dynamic model in Equation (3.1) can be expressed in the following linear format

y(t) = a_1(t) y(t-1) + \cdots + a_{ny}(t) y(t-ny) + b_0(t) u(t-d) + \cdots + b_{nu}(t) u(t-nu-d) + e(t)   (4.5)

The linear format can be established from a known nonlinear dynamic model by one-to-one correspondence, as shown in Equation (4.4). However, the linear format is not always unique, and one may have options.
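The one-to-one correspondence of Equation (4.4) can be verified numerically: with a1(t) and b0(t) evaluated from the current regressors, the linear format (4.3) reproduces the nonlinear model (4.2) exactly. A noise-free sketch, with an arbitrary random input sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_narx(y1, u1):
    """Deterministic part of the NARX model in Equation (4.2)."""
    return y1 / (1.0 + y1 ** 2) + u1 ** 3

# Compare the nonlinear model with its linear format, Equation (4.3)
max_err, y1 = 0.0, 0.0
for t in range(200):
    u1 = rng.uniform(-1.0, 1.0)
    a1 = 1.0 / (1.0 + y1 ** 2)       # time-varying parameters, Eq. (4.4)
    b0 = u1 ** 2
    y_nl = f_narx(y1, u1)            # nonlinear model, Eq. (4.2)
    y_lin = a1 * y1 + b0 * u1        # linear format, Eq. (4.3)
    max_err = max(max_err, abs(y_nl - y_lin))
    y1 = y_nl
```

The maximum discrepancy over the whole simulation stays at floating-point round-off, confirming the exact correspondence.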
For instance, given the NARX model defined in Equation (4.6)

y(t) = \frac{y(t-1)\, y(t-2)\, (y(t-1) + 2.5)}{1 + y(t-1)^2 + y(t-2)^2} + u(t-1) + e(t)   (4.6)

it is possible to define a linear format

y(t) = a_1(t) y(t-1) + b_0(t) u(t-1) + e(t)   (4.7)

with

a_1(t) = \frac{y(t-2)\,(y(t-1) + 2.5)}{1 + y(t-1)^2 + y(t-2)^2}, \quad b_0(t) = 1

Another possibility, defined in Equation (4.8), uses a different set of time-varying model parameters

y(t) = a_1(t) y(t-1) + a_2(t) y(t-2) + b_0(t) u(t-1) + e(t)   (4.8)

with

a_1(t) = \frac{2.5\, y(t-2)}{1 + y(t-1)^2 + y(t-2)^2}, \quad a_2(t) = \frac{y(t-1)^2}{1 + y(t-1)^2 + y(t-2)^2}, \quad b_0(t) = 1

In general, it is rather difficult (perhaps impossible) to extract the exact parameter functions, as defined in Equation (4.4), from data alone. There are a few exceptions, such as the one mentioned in (Young, 1993), where a1(t) and b0(t) are known to be linear functions of y(t-1) and u(t-1).
The nonlinear dynamic model in Equation (3.1) can also be approximately expressed by the following time-varying model

y(t) ≈ k(t) + a_1(t) y(t-1) + \cdots + a_{ny}(t) y(t-ny) + b_0(t) u(t-d) + \cdots + b_{nu}(t) u(t-nu-d) + e(t)   (4.9)

The approximation results from a first-order Taylor expansion of Equation (3.1), with the following definitions

k(t) = f\big|_{t_0} - \sum_{i=1}^{ny} \frac{\partial f}{\partial y(t-i)}\Big|_{t_0} y(t_0-i) - \sum_{j=0}^{nu} \frac{\partial f}{\partial u(t-d-j)}\Big|_{t_0} u(t_0-d-j)

a_i(t) = \frac{\partial f}{\partial y(t-i)}\Big|_{t_0}, \quad i = 1, \ldots, ny

b_j(t) = \frac{\partial f}{\partial u(t-d-j)}\Big|_{t_0}, \quad j = 0, \ldots, nu   (4.10)

where t0 represents the reference point on which the Taylor expansion is based. The representations of Equation (3.1) by Equations (4.5) and (4.9) are different, although both share the same notation for the time-varying model parameters a(t) and b(t). Equation (4.5) results from the one-to-one correspondence to Equation (3.1), while Equation (4.9) is based on one-to-one correspondence to the first-order Taylor expansion of Equation (3.1). The only difference is the additional time-varying intercept term k(t) in Equation (4.9); the order determination procedure presented below applies to both structures.

4.1.2 Recursive Estimation for Time-Varying Parameters

Equation (4.5) or (4.9) can be represented in a more compact format

y(t) = x^T(t) \theta(t) + e(t)   (4.11)

with

\theta(t) = \begin{bmatrix} k(t) & a_1(t) & \cdots & a_{ny}(t) & b_0(t) & \cdots & b_{nu}(t) \end{bmatrix}^T

x(t) = \begin{bmatrix} 1 & y(t-1) & \cdots & y(t-ny) & u(t-d) & \cdots & u(t-nu-d) \end{bmatrix}^T

where the constant regressor is dropped if the format in Equation (4.5) is used. The output prediction is then defined using the estimates of the time-varying parameters

\hat{y}(t) = x^T(t) \hat{\theta}(t)   (4.12)

There are several ways to estimate θ(t). Recursive estimation attempts to estimate local model parameters instantaneously. Another approach uses stochastic models to describe the parameter variation if its statistics are assumed known; the simplest such model is a random walk, with a Kalman filter used to estimate the time-varying parameter values as the states of the stochastic model. The second approach is not investigated in this work, since we assume no knowledge of the statistics of the parameter variation. Recursive estimation of the parameter values θ(t) is based on a time-varying weighted quadratic performance index

J(t) = \sum_{\tau=1}^{t} (y(\tau) - \hat{y}(\tau))^2 \, w(\tau, t)   (4.13)

where w(τ, t) is a weighting function. Commonly used weighting functions include rectangular-window weighting and exponential weighting (Ljung & Soderstrom, 1986). In this work, exponential weighting is used, described by

w(\tau, t) = \alpha^{t-\tau}, \quad \tau = 0, 1, \ldots, N   (4.14)

where the variable α, a scalar between 0 and 1, is termed the forgetting factor.
Figure 4.1 illustrates a particular exponential weighting with α = 0.95.

Figure 4.1. Exponential weighting with α = 0.95

Using exponential weighting, the following equations (Young, 1984) are used to update the model parameters from θ̂(t-1) to θ̂(t)

\hat{y}(t) = x^T(t) \hat{\theta}(t-1)
H(t) = \frac{P(t-1) x(t)}{\alpha + x^T(t) P(t-1) x(t)}
\hat{\theta}(t) = \hat{\theta}(t-1) + H(t)\,(y(t) - \hat{y}(t))
P(t) = \frac{1}{\alpha}\left( P(t-1) - H(t) x^T(t) P(t-1) \right)   (4.15)

The forgetting factor determines the influence of past data on the current estimate. The suggested range for α is between 0.9 and 0.99 (Young, 1984). In practice, trials for α may be needed to balance nonlinearity-adaptation speed against parameter-estimation precision.

4.1.3 Sequential Nearest Neighbor Rearrangement

In recursive estimation with exponential weighting, the tuning variable is the forgetting factor. When adjusting the forgetting factor, one should be aware of its conflicting effects on the parameter estimates. The forgetting factor relates to the rate of variation: a smaller forgetting factor is needed for faster parameter variation. On the other hand, the precision of the parameter estimates is determined by the amount of data included in an "effective" window, whose length is also a function of the forgetting factor. The smaller the forgetting factor (the shorter the window), the fewer data are included in the estimation, and in turn the higher the variance of the estimates. Therefore, a larger forgetting factor is preferred for higher estimation precision. However, a larger forgetting factor is a good choice only for slow parameter variation. This argument supports the suggested range for the forgetting factor above 0.90, along with the caution that exponential recursive estimation is at best suited to slow variation (Young, 1984).
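The exponentially weighted update of Equation (4.15) is compact in code. The sketch below uses our own function name, and initializes P(0) to a large multiple of the identity, a common but here assumed choice; it recovers the parameters of a noise-free linear example.

```python
import numpy as np

def rls_update(theta, P, x, y, alpha=0.95):
    """One exponentially weighted recursive least squares step, Eq. (4.15)."""
    y_hat = x @ theta                          # prior prediction
    H = P @ x / (alpha + x @ P @ x)            # gain vector
    theta = theta + H * (y - y_hat)            # parameter update
    P = (P - np.outer(H, x @ P)) / alpha       # covariance update
    return theta, P

# Usage sketch: recover theta = [0.7, 0.3] from a noise-free linear process
rng = np.random.default_rng(1)
theta, P, y1 = np.zeros(2), 1e4 * np.eye(2), 0.0
for _ in range(500):
    u1 = rng.uniform(-1.0, 1.0)
    x = np.array([y1, u1])
    y = 0.7 * y1 + 0.3 * u1
    theta, P = rls_update(theta, P, x, y)
    y1 = y
```

Because the example is linear and noise-free, the estimates converge to the true parameters for any forgetting factor in the suggested range; on nonlinear data the estimates would instead track the time-varying parameters of Equation (4.11).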
As a result of the conflicting influence of the forgetting factor on the parameter estimates, the nonlinear dynamic processes being dealt with are expected to have slow parameter variation. Unfortunately, the nonlinearity is inherent in the data and determined by the nature of the process under investigation. There is nothing one can do to alter the nature of the process, given only the ability to test it and generate input-output data. However, the nonlinearity itself is not really the difficulty being targeted; it is the source of the difficulty, which is the parameter variation. The nonlinearity is believed to be the cause of the parameter variation. It is therefore desirable to get around the inherent and inaccessible nonlinear nature of the process and act on the parameter variation directly. If that is possible, improvement of the recursive estimation becomes probable. Proposed below is an approach that manipulates the raw time-sequence data to create an artificial data sequence with slowed parameter variation. The elaboration starts by defining the parameter variation explicitly

\Delta a_i(t) = a_i(t) - a_i(t-1), \quad i = 1, \ldots, ny
\Delta b_j(t) = b_j(t) - b_j(t-1), \quad j = 0, \ldots, nu   (4.16)

With this definition, the following vector collects the variations of all parameters

\Delta\theta(t) = \theta(t) - \theta(t-1) = \begin{bmatrix} \Delta a_1(t) & \cdots & \Delta a_{ny}(t) & \Delta b_0(t) & \cdots & \Delta b_{nu}(t) \end{bmatrix}^T   (4.17)

The parameter variation at time t can then be quantified by ‖Δθ(t)‖, where the norm is not yet specified. The parameter variation (pv) for the entire data set is

pv = \sum_{t=2}^{N} \|\Delta\theta(t)\|   (4.18)

If pv can be minimized, the resulting data set is expected to be more suitable for recursive estimation. The optimal solution would be a permutation of the sequence of numbers (1, ..., N). Finding the right permutation is like solving a traveling salesman problem: find the shortest path that travels through all cities, visiting each city only once. The optimization problem is NP-complete.
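The parameter-variation measure of Equations (4.17)-(4.18), with the Euclidean norm used later in the chapter, can be sketched as:

```python
import numpy as np

def parameter_variation(theta_seq):
    """Equation (4.18): pv = sum over t of ||theta(t) - theta(t-1)||
    with the Euclidean norm. theta_seq is an (N x np) array of
    parameter vectors in sequence order."""
    diffs = np.diff(theta_seq, axis=0)         # Delta theta(t), Eq. (4.17)
    return np.linalg.norm(diffs, axis=1).sum()
```

A monotone parameter trajectory travels only its range, while a reordering of the same values travels farther; reducing this travel by reordering the data is exactly what the rearrangement proposed here exploits.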
In this work, a suboptimal solution is pursued rather than the exact optimum. The suboptimal solution results from a greedy procedure (Cormen, Leiserson, Rivest & Stein, 2001), in which the minimization of pv is decomposed into N-1 simpler minimization problems

\min\left( \|\Delta\theta(2)\| + \|\Delta\theta(3)\| + \cdots + \|\Delta\theta(N)\| \right) \le \min\|\Delta\theta(2)\| + \min\|\Delta\theta(3)\| + \cdots + \min\|\Delta\theta(N)\|   (4.19)

where the N-1 minimization problems are slightly dependent on each other, with a dependence between every two consecutive tasks. The greedy procedure is conducted as follows. Assuming θ(1) is known, θ(2) is found by solving min ‖θ(1) - θ(2)‖. Subsequently, ‖θ(2) - θ(3)‖ is minimized to determine θ(3). The procedure stops when θ(N) is determined. Two fundamental steps are involved in this procedure: determining θ(1), and solving min ‖θ(k-1) - θ(k)‖ to determine θ(k). With θ(k-1) known, the problem min ‖θ(k-1) - θ(k)‖ is bounded as

\min \|\theta(k-1) - \theta(k)\| \le \sum_{i=1}^{ny} \min |a_i(k-1) - a_i(k)| + \sum_{j=0}^{nu} \min |b_j(k-1) - b_j(k)|   (4.20)

The bound is due to the triangle inequality. The minimization of ‖θ(k-1) - θ(k)‖ is thus translated into minimizing ny+nu+1 smaller objectives simultaneously. Given a time-varying model, the parameters a_i(k) and b_j(k) can be expressed as functions of all regressors

a_i(k) = a_i\big( y(t-1), \ldots, y(t-ny), u(t-d), \ldots, u(t-nu-d) \big)   (4.21)

The expression for b_j(k) is similar. Note that the indices differ on the two sides of Equation (4.21); this simply means that the kth sample in the rearranged order is the tth sample in time order. If a_i(k-1) is known and its corresponding sample in time order is τ, then

a_i(k-1) = a_i\big( y(\tau-1), \ldots, y(\tau-ny), u(\tau-d), \ldots, u(\tau-nu-d) \big)

The exact functional form of a_i is unknown.
If its continuity and differentiability are assumed, and its higher-order derivatives are assumed negligible, the difference between a_i(k-1) and a_i(k) can be approximated by

a_i(k-1) - a_i(k) \approx \sum_{l=1}^{ny} \frac{\partial a_i}{\partial y(t-l)}\Big|_{t} \big( y(\tau-l) - y(t-l) \big) + \sum_{j=0}^{nu} \frac{\partial a_i}{\partial u(t-j-d)}\Big|_{t} \big( u(\tau-j-d) - u(t-j-d) \big)   (4.22)

If the first-order derivatives are bounded by a constant G_{a_i}, the minimization of |a_i(k-1) - a_i(k)| can be approached by

\min |a_i(k-1) - a_i(k)| \le \min_{t \ne \tau} G_{a_i} \|x(\tau) - x(t)\|   (4.23)

where x is defined in Equation (4.11) and t becomes the decision variable. Since the same functional form is uniformly assumed for all parameter functions, the solution of Problem (4.23) simultaneously minimizes all the upper bounds. In this work, the Euclidean norm is used, described by

\|x(\tau) - x(t)\|^2 = \sum_{i=1}^{ny} \big( y(\tau-i) - y(t-i) \big)^2 + \sum_{i=0}^{nu} \big( u(\tau-d-i) - u(t-d-i) \big)^2   (4.24)

A nearest neighbor thus defines the solution of Problem (4.20), and the solution procedure is termed Sequential Nearest Neighbor Rearrangement (SNNR). The resulting regressors and outputs are labeled xsnnr and ysnnr. The rearrangement starts by letting xsnnr(1) = x(1) and ysnnr(1) = y(1). If the nearest neighbor of xsnnr(1) is found to be x(t), then x(t) and y(t) are added to the rearranged data set by letting xsnnr(2) = x(t) and ysnnr(2) = y(t). The nearest neighbor of xsnnr(2) is then found and added to the rearranged data set, and the procedure continues until xsnnr(N) is found. By conducting the SNNR, the raw time-sequence data are reorganized in spatial order. The treatment is expected to reduce the parameter variation, which enables the choice of a larger forgetting factor α, which in turn improves the parameter estimates. The results of the SNNR procedure are the basis for the analysis in the following section on dynamic order determination.
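A brute-force O(N²) sketch of the SNNR procedure; the function name is ours. Each step appends the nearest unvisited regressor according to the squared Euclidean distance of Equation (4.24).

```python
import numpy as np

def snnr(X, y):
    """Sequential Nearest Neighbor Rearrangement (greedy).
    X: (N x nx) regressor matrix, y: (N,) output vector.
    Starts from sample 0 and repeatedly appends the nearest
    unvisited regressor, per Equation (4.24)."""
    order = [0]
    unvisited = set(range(1, len(X)))
    while unvisited:
        last = X[order[-1]]
        # nearest neighbor among the remaining samples
        nxt = min(unvisited, key=lambda i: np.sum((X[i] - last) ** 2))
        order.append(nxt)
        unvisited.remove(nxt)
    order = np.array(order)
    return X[order], y[order], order
```

For one-dimensional regressors the rearrangement simply sorts the samples outward from the starting point, which is the spatial-order behavior seen in the tables that follow.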
First, however, is a demonstration of the impact of the SNNR procedure on parameter variation as well as on recursive estimation. The demonstration is based on the deterministic nonlinear dynamic model in Equation (4.25)

y(t) = \frac{y(t-1)}{1 + y(t-1)^2} + u(t-1)^3   (4.25)

Figure 4.2 shows the first 1000 of 5000 samples generated from the deterministic model when u(t) is driven by a "skyline" function.

Figure 4.2. Data generated from the model in Equation (4.25)

The time-varying model parameters a1(t) and b0(t) are defined in Equation (4.4), and their variation over time is shown in Figure 4.3.

Figure 4.3. Time-varying parameters a1(t) and b0(t) in Equation (4.4)

The parameter variation (pv) defined in Equation (4.18) is then evaluated using the Euclidean norm,

pv = \sum_{t=2}^{5000} \sqrt{ (a_1(t) - a_1(t-1))^2 + (b_0(t) - b_0(t-1))^2 }

The obtained pv is 99.25. The mean squared error (MSE) resulting from a recursive estimation on the time-sequenced data is 0.0044. The SNNR operation is illustrated on a segment of 10 data samples. The raw data in time sequence are shown in Table 4.1, indexed by t.

Table 4.1. A segment of 10 data samples in time sequence

t       1    2       3       4       5       6       7       8       9       10
y(t-1)  0    0.2488  0.8683  0.7200  0.3775  1.1465  0.2815  0.1014  0.8542  0.1648
u(t-1)  0    0.2076  0.7200  0.7603  0.3617  0.8668  0.0913  0.3199  0.7120  0.2645

The SNNR-rearranged data are shown in Table 4.2, indexed by k. The index t in Table 4.2 tracks the rearrangement and relates the kth data sample in Table 4.2 to its original position in Table 4.1. The two regressors in Table 4.2 are denoted y1 and u1 rather than by the time-lagged notation of the original time-sequence data set. Table 4.2.
SNNR-rearranged data for the time-sequence data in Table 4.1

k    1    2       3       4       5       6       7       8       9       10
t    1    7       8       10      2       5       9       3       4       6
y1   0    0.2815  0.1014  0.1648  0.2488  0.3775  0.8542  0.8683  0.9385  1.1465
u1   0    0.0913  0.3199  0.2645  0.2076  0.3617  0.7120  0.7200  0.7603  0.8668

Figure 4.4 shows the first 1000 samples of SNNR-rearranged data for the time-sequenced data in Figure 4.2. The abrupt transitions between adjacent levels in Figure 4.2, in both u(t) and y(t), are replaced by smooth transitions in u1 and y1 in Figure 4.4.

Figure 4.4. SNNR-rearranged regressors from the time-sequence data in Figure 4.2

For the rearranged data, the varying parameters are redefined in terms of u1 and y1

a_1(k) = \frac{1}{1 + y_1(k)^2}, \quad b_0(k) = u_1(k)^2

The variation of a1(k) and b0(k) is shown in Figure 4.5; it results in a parameter variation of 32.03, only about a third of that of the time-sequence data. The mean squared error (MSE) resulting from a recursive estimation on the rearranged data is 0.0022, half of that of the time-sequence data.

Figure 4.5. Varying parameters for the SNNR-rearranged data

The same test and comparison is conducted on 6 deterministic models, whose stochastic versions are defined in Equations (4.40)-(4.45). The results are summarized in Table 4.3.

Table 4.3. MSE for a recursive estimation

Model     Time Sequence   SNNR
Model 1   0.0148          0.0112
Model 2   0.0486          0.0084
Model 3   0.0044          0.0025
Model 4   0.0022          0.0017
Model 5   0.0064          0.0034
Model 6   7.38e-8         3.57e-5

As observed in Table 4.3, SNNR reduces the MSE for Models 1-5. An increase of MSE is, however, observed in the Model 6 test, where the tested model is linear. Therefore, an increase of MSE may signal the ineffectiveness of the SNNR treatment and imply that the model is linear.
Using this feature, one might use the SNNR to tell whether a given process is linear or nonlinear.

4.1.4 Model Comparison Criterion

The methodology for determining the dynamic orders is to try different sets of ny, nu and d and find the best values. Given a set of ny, nu and d, the regressors x(t) are first determined on the original time-sequenced data. An SNNR is then conducted on x(t) and y(t), producing xsnnr(t) and ysnnr(t), to which an exponentially weighted recursive estimation is applied. The quality of the hypothesized ny, nu and d is then evaluated by a criterion that considers both fitting and generalization performance. In this work, the evaluation is based on a modified final prediction error (FPE) criterion. The original FPE (Ljung, 1999) is defined for a linear model with N samples by

FPE = \frac{N + np}{N - np} \, \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2(t, \hat{\theta}_N)   (4.26)

Equation (4.26) can be interpreted as a weighted mean squared error, where the weighting is determined by N, the size of the data set, as well as by the model complexity np, the number of parameters. The FPE criterion results from the performance index

V_N = \sum_{t=1}^{N} \varepsilon^2(t, \hat{\theta}_N)   (4.27)

In application to exponentially weighted recursive estimation, the definition of FPE is modified according to the exponentially weighted performance index

V_k = \sum_{t=1}^{k} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)   (4.28)

where V_k varies and progressively includes more data. The weighting factor α^{k-t} becomes very small for long-past data, making the remote errors inconsequential in estimating θ_k. A critical number L is hence introduced to decompose V_k as below

V_k = \sum_{t=1}^{k-L} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k) + \sum_{t=k-L+1}^{k} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k) \approx \sum_{t=k-L+1}^{k} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)   (4.29)

where V_k is approximated by its recent portion. By this approximation, the number of data points involved in V_k is a constant, L.
Subsequently, the FPE based on V_k is redefined as

FPE(k) = \frac{L + np}{L - np} \, \frac{1}{L} \sum_{t=k-L+1}^{k} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)   (4.30)

where the implicit constraints on t, k-L+1 ≥ 1 and k ≤ N, bound k between L and N. The average of FPE(k) over all k is then defined

\overline{FPE} = \frac{1}{N-L+1} \sum_{k=L}^{N} FPE(k) = \frac{L + np}{(N-L+1)\, L\, (L - np)} \sum_{k=L}^{N} \sum_{t=k-L+1}^{k} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)   (4.31)

where, after switching the order of summation, the double sum decomposes into three parts

\sum_{k=L}^{N} \sum_{t=k-L+1}^{k} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)
= \sum_{t=1}^{L-1} \sum_{k=L}^{t+L-1} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)
+ \sum_{t=L}^{N-L+1} \sum_{k=t}^{t+L-1} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)
+ \sum_{t=N-L+2}^{N} \sum_{k=t}^{N} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)   (4.32)

The recursive estimation works well if the parameter variation within a local range is small

\hat{\theta}_t \approx \hat{\theta}_{t+1} \approx \cdots \approx \hat{\theta}_{t+L-1}   (4.33)

which in turn results in the approximation

\varepsilon^2(t, \hat{\theta}_t) \approx \varepsilon^2(t, \hat{\theta}_{t+1}) \approx \cdots \approx \varepsilon^2(t, \hat{\theta}_{t+L-1})   (4.34)

The double sum then simplifies to

\sum_{k=L}^{N} \sum_{t=k-L+1}^{k} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)
\approx \sum_{t=1}^{L-1} \varepsilon^2(t, \hat{\theta}_t) \sum_{k=L}^{t+L-1} \alpha^{k-t}
+ \sum_{t=L}^{N-L+1} \varepsilon^2(t, \hat{\theta}_t) \sum_{k=t}^{t+L-1} \alpha^{k-t}
+ \sum_{t=N-L+2}^{N} \sum_{k=t}^{N} \alpha^{k-t} \varepsilon^2(t, \hat{\theta}_k)   (4.35)

If N is large, the second part dominates, which results in a further simplified average FPE

\overline{FPE} \approx \frac{\sum_{k=0}^{L-1} \alpha^{k}}{N-L+1} \, \frac{L + np}{L\,(L - np)} \sum_{t=L}^{N-L+1} \varepsilon^2(t, \hat{\theta}_t)   (4.36)

The average FPE in Equation (4.36) is similar to the original in Equation (4.26), and has the same interpretation as a weighted prediction error, except that the weighting is different. Once L is chosen, the first factor on the right-hand side of Equation (4.36) is a constant. Equations (4.26) and (4.36) are then similar, with L, the data window length, replacing N, the total number of data.
A simplified FPE, Equation (4.37), is used in this work and will continue to be denoted FPE

FPE = \frac{L + np}{L - np} \sum_{t=L}^{N-L+1} \varepsilon^2(t, \hat{\theta}_t)   (4.37)

The value of L is related to the decomposition of Equation (4.29) and is determined by requiring α^L to be small enough to be negligible. In this work, L is determined as

L = \frac{4}{1 - \alpha}   (4.38)

where (1-α)^{-1} is termed the memory time-constant (Ljung, 1999). As shown in Figure 4.6, the specification of L in Equation (4.29) ignores past data with weights less than 0.02. Additionally, the number α^{4/(1-α)} remains relatively constant, between 0.016 and 0.018, if α is over 0.9, which is a common choice for a forgetting factor.

Figure 4.6. α vs. α^{4/(1-α)} (the weight of the most remote data)

4.1.5 Regressor Selection Procedure

Given several sets of ny, nu and d, their FPEs are evaluated, and the set with the minimum FPE on the SNNR data is reported along with the determined orders. The determination procedure can be conducted as an exhaustive search over all combinations of ny, nu and d, given predefined maxima max_ny, max_nu and max_d. The pseudocode for the exhaustive search is shown in Figure 4.7.

    Exhaustive order selection
    for ny = 1 to max_ny
        for nu = 0 to max_nu
            for d = 1 to max_d
                compute FPE(ny, nu, d)
                keep the minimum FPE
            end loop
        end loop
    end loop
    return the ny, nu and d with minimum FPE

Figure 4.7. Exhaustive dynamic order search

One concern with the exhaustive search is the computational burden; the payoff of the expensive exhaustive search is the optimality of the final solution. Suboptimal search techniques are available from subset selection in linear regression. Subset selection methods include forward selection, backward elimination, cycling replacement, as well as heuristic combinatorial search (Miller, 1990).
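Whether the search is exhaustive (Figure 4.7) or a subset selection, each candidate order set is scored by the simplified FPE of Equation (4.37), with L chosen by Equation (4.38). A minimal sketch, assuming the one-step prediction errors ε(t, θ̂_t) have already been collected from the recursive estimation pass:

```python
import numpy as np

def window_length(alpha):
    """Equation (4.38): L = 4 / (1 - alpha), rounded to an integer."""
    return int(round(4.0 / (1.0 - alpha)))

def fpe(errors, n_params, alpha=0.95):
    """Simplified final prediction error, Equation (4.37).
    errors: one-step prediction errors eps(t, theta_hat_t), t = 1..N."""
    L = window_length(alpha)
    N = len(errors)
    # squared errors over t = L .. N-L+1 (1-based), i.e. indices L-1 .. N-L
    sse = np.sum(np.asarray(errors)[L - 1:N - L + 1] ** 2)
    return (L + n_params) / (L - n_params) * sse
```

Discarding the first L-1 errors removes the transient of the recursive estimator, and discarding the last L-1 honors the range of the sum in Equation (4.37).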
For linear regression problems, one could fully exploit the superposition feature of a linear model to simplify a search. This explains why subset selection methods are typically accompanied by orthogonalization. An orthogonalization procedure removes the redundant components shared by two regressors and eliminates candidate regressors that are highly correlated with already selected regressors. In nonlinear systems with unknown nonlinearity, orthogonalization is not possible. However, that does not mean that subset selection is inapplicable. In this work, a forward selection procedure combining the above-mentioned recursive estimation on spatially ordered data is proposed to find important regressors. The procedure starts with the user's input of max_ny, max_nu and max_d. Then, a number of candidate regressors are generated and denoted as [x1 x2 x3 ... xm xrandom]. xrandom is a random regressor that presumably contains no meaningful information to predict the output. At first, m+1 FPEs are computed for (y, [x1]), (y, [x2]), ..., (y, [xm]), (y, [xrandom]), where y is the output and xi in brackets is the regressor under consideration. The regressor with the minimum FPE is selected. If x2, for instance, is the first selected regressor, there will be another m FPEs to evaluate for (y, [x2, x1]), (y, [x2, x3]), ..., (y, [x2, xm]), (y, [x2, xrandom]). Each bracket contains a combination of x2 (first selected) with one of the rest. The regressor combination with the minimum FPE is then kept. The selection continues until the minimum FPE increases or xrandom is selected. The injection of a random regressor is mentioned in (Miller, 1990) as a stopping criterion. The selection of xrandom signifies that the remaining candidates are less influential on y(t) than a presumably irrelevant one. The selected regressors define values of ny, nu and d if they are consecutive, due to the implicit constraint of the model structure in Equation (3.1), which requires consecutive regressors.
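The forward selection with the random-regressor stopping rule might be sketched as follows; `fpe_of` is again an assumed callback that evaluates the FPE for a given regressor subset, and the names are illustrative.

```python
def forward_select(candidates, fpe_of, random_name="xrandom"):
    """Forward regressor selection with a random-regressor stopping rule.

    candidates: list of regressor names, the last being the random probe;
    fpe_of(subset): assumed to run the SNNR-based recursive estimation for
    that regressor subset and return its FPE.  Selection stops when the FPE
    rises or the random probe wins (Miller, 1990).
    """
    selected, best_fpe = [], float("inf")
    remaining = list(candidates)
    while remaining:
        trials = [(fpe_of(selected + [r]), r) for r in remaining]
        fpe, pick = min(trials)
        if fpe >= best_fpe or pick == random_name:
            break  # FPE increased, or a presumably irrelevant probe won
        selected.append(pick)
        remaining.remove(pick)
        best_fpe = fpe
    return selected, best_fpe
```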
For instance, a set of regressors [y(t-1), y(t-2), u(t-1), u(t-2)] defines ny=2, nu=1, and d=1. Gaps, however, could exist in the selected regressors, such as [y(t-1) y(t-4) u(t-1) u(t-3)], which does not correspond to a set of ny, nu and d. It seems unlikely in most situations that y(t-2), y(t-3) and u(t-2) should not be included. However, if there are strong correlations or recycle phenomena, those missing variables may be redundant, and the particular selection may not be unique. Another realization of excitation and noise might select another subset from the correlated variables. The inclusion of redundant variables increases the model complexity. However, for database management simplicity in this work, if such a gap occurs, a further comparison is executed on different order values. For the illustrated example, an exhaustive comparison is conducted on the possible values ny = 1, 2, 3 or 4 combined with the possible values nu = 0, 1, or 2, with d = 1. However, the extra computation would be unnecessary if the constraint on having consecutive regressors were dropped.

4.2 Nonlinear Component Detection

There is an implicit assumption made in the above SNNR operation: the time-varying parameters are functions of all regressors. The assumption is valid for the dynamic model in Equation (4.2), where the parameters are functions of the two regressors, u(t-1) and y(t-1). The model in Equation (4.6) has regressors y(t-1), y(t-2) and u(t-1). However, the parameters a1 and a2 are functions of only y(t-1) and y(t-2). The regressor u(t-1) has no impact on parameter variation. It is then expected that SNNR on [y(t-1) y(t-2)] might reduce more parameter variation than operating SNNR on [y(t-1) y(t-2) u(t-1)]. The further reduction in parameter variation should be revealed by a smaller MSE resulting from a recursive estimation. An extension of the SNNR-based order determination procedure is then used to detect the regressors that are affecting the output nonlinearly.
The detected regressors are termed nonlinear components and are used as antecedent variables in Chapter 5. The purpose of conducting SNNR is to reduce parameter variation so that the recursive estimation is able to capture the variation better, which in turn results in a smaller MSE. The SNNR mentioned above rearranges data based on all the regressors in order to compare different sets of ny, nu and d. However, it is possible that only a subset of the regressors affects the time-varying parameters. The subset is denoted by [c1, ..., cnc]. It is a subset of the selected regressors, denoted by [x1, ..., xnx]. The regressors not included in [c1, ..., cnc] have no effect on parameter variation. It is then expected that SNNR on [c1, ..., cnc] only would be able to reduce more parameter variation and produce a smaller MSE. There are in total 2^nx - 1 subsets of [x1, ..., xnx], excluding the empty one. Each subset of [x1, ..., xnx] is considered as a candidate set of nonlinear components, [c1, ..., cnc], on which SNNR is conducted and a corresponding MSE is computed. The subset with the minimum MSE is reported to contain the nonlinear components.

4.3 Extension to MIMO Processes

Extending the above technique to a MIMO(m, n) (m inputs and n outputs) process is straightforward. The SISO model in Equation (3.1) is expanded as below for the kth output by including more regressors.

y_{k}(t) = f\big( y_{k}(t-1), \ldots, y_{k}(t-ny_{kk}),\ y_{1}(t-d^{y}_{k1}), \ldots, y_{1}(t-ny_{k1}-d^{y}_{k1}), \ldots, y_{n}(t-d^{y}_{kn}), \ldots, y_{n}(t-ny_{kn}-d^{y}_{kn}),\ u_{1}(t-d^{u}_{k1}), \ldots, u_{1}(t-nu_{k1}-d^{u}_{k1}), \ldots, u_{m}(t-d^{u}_{km}), \ldots, u_{m}(t-nu_{km}-d^{u}_{km}) \big) + e_{k}(t)    (4.39)

where the dynamic orders include ny_{k1}, ..., ny_{kn} and nu_{k1}, ..., nu_{km}, the delays d^{y}_{k1}, ..., d^{y}_{kn} are between y_k and the other outputs, and the delays d^{u}_{k1}, ..., d^{u}_{km} are between y_k and all inputs. All of these numbers are to be determined using the above method for the single-output case.
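The subset search of Section 4.2 enumerates the 2^nx - 1 non-empty subsets of the selected regressors. A sketch, with `mse_after_snnr` an assumed callback that performs the SNNR rearrangement and recursive estimation for one candidate subset:

```python
from itertools import chain, combinations

def detect_nonlinear_components(regressors, mse_after_snnr):
    """Try every non-empty subset of the selected regressors as a candidate
    set of nonlinear components; keep the subset whose SNNR ordering yields
    the smallest recursive-estimation MSE (Section 4.2).

    mse_after_snnr(subset) is assumed to rearrange the data by SNNR on that
    subset and return the resulting MSE.
    """
    subsets = chain.from_iterable(
        combinations(regressors, k) for k in range(1, len(regressors) + 1))
    return min(subsets, key=mse_after_snnr)
```

Note the exponential cost: with nx selected regressors, 2^nx - 1 SNNR runs are required, which is tolerable only because nx is small after order determination.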
The nonlinear components for y_k are then selected after the orders are determined.

4.4 Simulations and Discussions

4.4.1 Testing Models and Processes

The proposed order determination and nonlinear component detection methods are tested on data generated by several nonlinear dynamic models, an experimental unit and a distillation column simulator. The first five models are nonlinear autoregressive with exogenous inputs (NARX) models. They differ in the nonlinear interactions between inputs and outputs. Model 1 has nonlinearity only in the lagged input, u(t-1). Model 2 is nonlinear in the lagged output only. Model 3 is nonlinear in both the lagged input and output, u(t-1) and y(t-1). Model 4 is also nonlinear in both the lagged input and output but has more regressors than Model 3. Like Model 1, Model 5 is another model with nonlinearity in the lagged input, u(t-1); the nonlinear function of u(t-1) is, however, different in the two models. Model 6 is a linear ARX model used only once to demonstrate the impact of SNNR on recursive estimation, with the result in Table 4.3. The input signals used in the first five models are generated by a "skyline" function and bounded between -1 and 1. The shortest and longest durations are 20 and 50 samples respectively. Output signals are initialized at zero. The noise e(t) is subject to a normal distribution, N(0, σ²). The value of σ is different in each model and specified such that e(t) has a small magnitude compared to the outputs. Below, a portion of the input-output data for the first five models is illustrated along with the model equations. A total of 5000 samples are generated and used in order determination and nonlinear component detection.

Model 1 (Narendra & Parthasarathy, 1990)

y(t) = 0.3y(t-1) + 0.6y(t-2) + 0.6\sin(\pi u(t-1)) + 0.3\sin(3\pi u(t-1)) + 0.1\sin(5\pi u(t-1)) + e(t)    (4.40)

where e(t) ~ N(0, 0.5²)

Figure 4.8. Input-output data generated for Model 1 in Equation (4.40)

Model 2 (Narendra & Parthasarathy, 1990)

y(t) = \frac{y(t-1)\, y(t-2)\, (y(t-1)+2.5)}{1 + y^{2}(t-1) + y^{2}(t-2)} + u(t-1) + e(t)    (4.41)

where e(t) ~ N(0, 0.5²)

Figure 4.9. Input-output data generated for Model 2 in Equation (4.41)

Model 3 (Narendra & Parthasarathy, 1990)

y(t) = \frac{y(t-1)}{1 + y^{2}(t-1)} + u^{3}(t-1) + e(t)    (4.42)

where e(t) ~ N(0, 0.5²)

Figure 4.10. Input-output data generated for Model 3 in Equation (4.42)

Model 4 (Narendra & Parthasarathy, 1990)

y(t) = \frac{y(t-1)\, y(t-2)\, y(t-3)\, u(t-2)\, (y(t-3)-1) + u(t-1)}{1 + y^{2}(t-2) + y^{2}(t-3)} + e(t)    (4.43)

where e(t) ~ N(0, 0.05²)

Figure 4.11. Input-output data generated for Model 4 in Equation (4.43)

Model 5 (Narendra & Parthasarathy, 1990)

y(t) = 0.8y(t-1) + (u(t-1) - 0.8)\, u(t-1)\, (u(t-1) + 0.5) + e(t)    (4.44)

where e(t) ~ N(0, 0.1²)

Figure 4.12. Input-output data generated for Model 5 in Equation (4.44)

Model 6:

y(t) = 0.8y(t-1) + 0.6y(t-2) + 0.4u(t-1)    (4.45)

Models 7 and 8 are two deterministic nonlinear dynamic models.

Model 7

y(t) = 0.8y(t-1) + u^{2}(t-1)    (4.46)

Model 8

y(t) = 0.8y(t-1) + \cos(\pi u(t-1))    (4.47)

Different from Models 1-5, Model 7 has a quadratic term, u²(t-1), where u(t) is also generated by a "skyline" function between -1 and 1. The effect of u(t) on y(t) would be missed on average.
As below, Equation (4.48) is the linear time-varying model for Model 7, with a1(t) = 0.8 and b0(t) = u(t-1). On average, the effect of u(t-1) in Equation (4.48) is reflected by E(b0(t)). In this case, E(b0(t)) is 0 since u(t) is a random signal between -1 and 1.

y(t) = a_{1}(t)\, y(t-1) + b_{0}(t)\, u(t-1)    (4.48)

Therefore, the regressor u(t-1) would be missed if a recursive estimation were conducted in time sequence, where b0(t) is a random number between -1 and 1; the recursively estimated b0(t) would wander around zero. However, the proposed SNNR is able to reveal the impact of u(t-1) on the model output. By rearrangement, the randomness in u(t-1) is eliminated. Consequently, the varying parameter b0 is no longer a random variable but gradually increases from -1 to 1. A recursive estimation on the rearranged data is then able to reflect the impact of u(t-1) on y(t). Model 8 has a quadratic-like term, cos(πu(t-1)), and will be used to test the proposed order determination technique. Model 9 in Equation (4.49) is used to demonstrate the non-uniqueness of the obtained result, as discussed in Section 4.1.1. By observing Equation (4.49), the nonlinear component could be either y(t-1) or y(t-2). A detailed test will confirm this observation.

Model 9

y(t) = 0.2y(t-1)\, y(t-2) + u(t-1)    (4.49)

Models 10 and 12 are deterministic nonlinear autoregressive (NAR) models in (Molina, Sampson, Fitzgerald & Niranjan, 1996) and are used for method comparison. Models 11 and 13 are derived from Models 10 and 12 with noise added to the output and are used to compare the influence of noise on the different methods. The noise e(t) in Models 11 and 13 has a small magnitude compared to the output signals and is subject to N(0, σ²), where σ² is set to about one thousandth of the average magnitude of the output signal in the corresponding deterministic model.
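As one possible reading of the data-generation recipe above, the "skyline" input and Model 1 of Equation (4.40) might be simulated as below. The generator details (seeding, uniform level distribution) are assumptions for illustration, not taken from the dissertation.

```python
import numpy as np

def skyline(n, shortest=20, longest=50, lo=-1.0, hi=1.0, rng=None):
    """Piecewise-constant ('skyline') input: random levels in [lo, hi]
    held for random durations between `shortest` and `longest` samples."""
    rng = rng or np.random.default_rng(0)
    u = np.empty(n)
    t = 0
    while t < n:
        dur = int(rng.integers(shortest, longest + 1))
        u[t:t + dur] = rng.uniform(lo, hi)  # slice past the end is clipped
        t += dur
    return u

def simulate_model1(n=5000, sigma=0.5, rng=None):
    """Model 1 (Eq. 4.40) driven by a skyline input, zero initial output."""
    rng = rng or np.random.default_rng(1)
    u, y = skyline(n, rng=rng), np.zeros(n)
    for t in range(2, n):
        y[t] = (0.3 * y[t - 1] + 0.6 * y[t - 2]
                + 0.6 * np.sin(np.pi * u[t - 1])
                + 0.3 * np.sin(3 * np.pi * u[t - 1])
                + 0.1 * np.sin(5 * np.pi * u[t - 1])
                + rng.normal(0.0, sigma))
    return u, y
```

The other NARX models in this section differ only in the recursion inside the loop.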
Model 10 (Molina, Sampson, Fitzgerald & Niranjan, 1996)

y(t) = 4y(t-1)\, (1 - y(t-1))    (4.50)

Model 11:

y_{o}(t) = 4y_{o}(t-1)\, (1 - y_{o}(t-1)), \quad y(t) = y_{o}(t) + e(t)    (4.51)

where e(t) ~ N(0, 0.0225²)

Figure 4.13. Data generated for Model 10 in Equation (4.50)

Model 12 (Molina, Sampson, Fitzgerald & Niranjan, 1996)

y(t) = 1 - 1.4y^{2}(t-1) + 0.3y(t-2)    (4.52)

Model 13:

y_{o}(t) = 1 - 1.4y_{o}^{2}(t-1) + 0.3y_{o}(t-2), \quad y(t) = y_{o}(t) + e(t)    (4.53)

where e(t) ~ N(0, 0.0272²)

Figure 4.14. Data generated for Model 12 in Equation (4.52)

Model 14: Two-phase flow process

Figure 4.15 shows the experimental setup of a two-phase flow process in the unit operations lab in the School of Chemical Engineering at Oklahoma State University. This unit is managed by a laboratory-scale distributed control system, Camile. The schematic diagram of the process is shown in Figure 4.16. Both the bottom and top pressures of the vertical pipe are measured. There are two air supplies, labeled 'Small air' and 'Large air' in Figure 4.16. Air from the two pipes merges and flows to a T, whose outlet end is connected to the bottom of the vertical pipe. The other inlet end of the T is connected to the water pipe, labeled 'water' in Figure 4.16. In this work, this unit is used to study the dynamics between the mixed air and water and the pressure drop across the vertical pipe. The experiment is conducted in open loop, and only the air valve opening (the 'Large air' pipe) is manually changed. The 'Small air' pipe is closed. The 'water' pipe is controlled at 20 lbmol/hr. The measurements of the water flowrate in the 'water' pipe are shown in Figure 4.17.

Figure 4.15. The two-phase flow experiment setup

Figure 4.16. The schematic diagram for the two-phase flow experiment

Figure 4.17.
Water flowrate measurements with set point at 20 lbmol/hr

The process could be defined differently by taking signals from different channels. Figure 4.18(a) shows one possible choice. The input, u, is chosen to be the measurement of the air flowrate. The output, y, is the measurement of the pressure drop, the difference between the top and bottom pressures shown in Figure 4.16. A portion of 4500 measurements is displayed in Figure 4.18(b). In total, 8830 measurements were recorded. Although the control interval was 0.1 second, the sampling rate for this data was chosen as 0.5 second.

Figure 4.18. A choice of input and output channels; the input, u, is the measurement of the air flowrate and the output, y, is the pressure drop measurement. a) The flowchart; b) The corresponding input and output data

As observed in Figure 4.18(b), the pressure drop measurement is noisier at low values of y than at high values. A first-order filter is added in the data acquisition and control device (a Camile 2000 unit) to suppress some of the noise in y for observation convenience. With the filter included, Figure 4.19(a) shows another possible process definition. The input, denoted us, is the command signal for the air valve opening, which as shown precedes the air flowrate measurement. The output becomes the filtered pressure drop measurement, denoted yf. The data is shown in Figure 4.19(b).

Figure 4.19. A choice of input and output channels; the input, us, is the signal to the air valve opening and the output, yf, is the filtered pressure drop measurement.
a) The flowchart; b) The corresponding input and output data

Model 15: Binary distillation column

Model 15 is a methanol-water binary distillation column simulator (Ou & Rhinehart, 2002) modified to have 20 trays. The distillation column simulator is a MIMO process. The two inputs are the reflux flowrate (gmol/hr), u1, and the reboiler heating percentage (TY%), u2. The sample interval is 30 seconds. The reflux flowrate varies between 50 and 90 gmol/hr and the heating percentage is between 40% and 55%. The duration of each step change varies randomly between 0.05 and 1 hour. The first 1000 samples of the inputs are illustrated in Figure 4.20.

Figure 4.20. Reflux flowrate (solid line) and reboiler heat rate (dash line) inputs to the distillation column

The two outputs, y1 and y2, are the overhead and bottom concentrations of methanol, xD and xB, in mole fraction. The first 1000 output samples are shown in Figure 4.21.

Figure 4.21. The xD (solid line, left scale) and xB (dash line, right scale) in the distillation column experiments

4.4.2 Testing on Dynamic Order Determination

An example is presented first to demonstrate the details of order determination. The example is based on the deterministic version of Model 2. The first 1000 data samples are shown in Figure 4.22, and a total of 5000 data are generated and used for the order determination.

Figure 4.22. Data generated for the deterministic version of Model 2 in Equation (4.41)

In using forward selection to select important regressors, the maximum ny, maximum nu and maximum d are set to 5, 4 and 1 respectively.
The selection procedure is collected in Table 4.4.

Table 4.4. Regressor forward selection for data in Figure 4.22

Step  y(t-1)  y(t-2)  y(t-3)  y(t-4)  y(t-5)  u(t-1)  u(t-2)   u(t-3)  u(t-4)  u(t-5)  random
1     0.0516  0.1248  0.2360  0.3572  0.4907  0.4246  0.3697   0.3243  0.3323  0.3661  3.348
2             0.0408  0.0390  0.0400  0.0429  0.0131  0.0371   0.0411  0.0510  0.0534  0.0551
3             0.0045  0.0060  0.0082  0.0146          2.1E+56  0.0117  0.0089  0.0114  0.0193
4                     0.0057  0.0073  0.0055          0.0043   0.0093  0.0035  0.0059  0.0098
5                     0.0068  0.0042  0.0062          0.0062   1.9E+10         0.0043  0.0105

In Table 4.4, there are 11 regressors: 10 time-lagged regressors and a random regressor, the last one. In the first step, all 11 regressors are tried one by one, and their corresponding FPEs are recorded in the first row. Among them, the one with the smallest FPE, 0.0516, is chosen, and the related regressor is y(t-1). In the next step, the selected regressor y(t-1) is combined with each of the remaining 10 regressors. The results of the 10 trials are in row 2, where the minimum FPE, 0.0131, is due to u(t-1). The blank for y(t-1) in row 2 only indicates that y(t-1) has already been included. Continuing this procedure, both y(t-1) and u(t-1) are included and their combinations with the remaining 9 candidate regressors are tried. The next minimum FPE is 0.0045, for y(t-2), so y(t-2) is included. The next discovery is u(t-4), with an FPE of 0.0035. At the fifth step, the minimum FPE is 0.0042, which is greater than the previous minimum FPE of 0.0035. The increase in FPE signals termination of the forward selection. The above forward selection selects the four regressors [y(t-1) y(t-2) u(t-1) u(t-4)]. In theory, one could create an arbitrary model including these regressors. In practice, however, it is unlikely to exclude u(t-2) and u(t-3) while u(t-4) is included. In addition, the objective of this work is to determine the dynamic orders ny and nu. In order to include u(t-4), nu and d should be set to 3 and 1 respectively.
This configuration, however, contains the additional regressors u(t-2) and u(t-3), which were rejected by the forward selection. In this work, a minor exhaustive search is conducted to compare several values for nu, namely 0, 1, 2 and 3, with ny fixed at 2 and d at 1. The result is collected in Table 4.5, where the best value for nu is 0, with the minimum FPE of 0.0045. Therefore, the determined regressors are [y(t-1) y(t-2) u(t-1)] with ny=2, nu=0, d=1.

Table 4.5. Exhaustive search on nu with ny=2, d=1 for data in Figure 4.22

nu    0       1       2       3
FPE   0.0045  0.0275  0.0066  0.0378

The above order determination procedure, a forward selection followed by a minor exhaustive search, uses SNNR-rearranged data in the recursive estimation. To reveal the impact of SNNR on order determination, the forward selection procedure is repeated for the same data set without using SNNR. The details of the selection are collected in Table 4.6. The selected regressors are y(t-1), u(t-1) and u(t-2); no minor exhaustive search is needed. Compared to the model definition in Equation (4.41), the result misses y(t-2) while finding u(t-2), which is not present in the deterministic model. We simply state that the result includes two 'mistakes'.

Table 4.6. Regressor forward selection for data in Figure 4.22 using time-sequence data

Step  y(t-1)  y(t-2)  y(t-3)  y(t-4)  y(t-5)  u(t-1)  u(t-2)  u(t-3)  u(t-4)  u(t-5)  random
1     0.0526  0.1260  0.2367  0.3611  0.5010  1.6116  1.5373  1.4841  1.4906  1.5271  5.7726
2             0.0556  0.0521  0.0533  0.0530  0.0462  0.0557  0.0634  0.0619  0.0588  0.0555
3             0.0515  0.0508  0.0506  0.0502          0.0301  0.0501  0.0493  0.0493  0.0484
4             0.0352  0.0325  0.0322  0.0317                  0.0402  0.0401  0.0372  0.0316

The order determination method is also applied to the other deterministic models, the deterministic versions of the models in Equations (4.40)-(4.44). The results are summarized in Table 4.7, which also includes the results using the original time-sequence data for comparison.

Table 4.7.
Regressors determined for deterministic versions of Models 1-5

Model  Time Sequence                SNNR                         Truth
1      y(t-1) y(t-2) y(t-3)         y(t-1) y(t-2) y(t-3) u(t-1)  y(t-1) y(t-2) u(t-1)
2      y(t-1) u(t-1) u(t-2)         y(t-1) y(t-2) u(t-1)         y(t-1) y(t-2) u(t-1)
3      y(t-1) u(t-1)                y(t-1) u(t-1)                y(t-1) u(t-1)
4      u(t-1)                       y(t-1) y(t-2) y(t-3) u(t-1)  y(t-1) y(t-2) y(t-3) u(t-1) u(t-2)
5      y(t-1) y(t-2)                y(t-1) u(t-1)                y(t-1) u(t-1)

As observed in Table 4.7, both approaches are tied in the Model 3 test. In the Model 1 test, 'Time Sequence' misses u(t-1) but adds y(t-3), making 2 mistakes, while 'SNNR' adds y(t-3), making 1 mistake. In the Model 2 test, 'Time Sequence' misses y(t-2) but adds u(t-2), making 2 mistakes. 'SNNR' makes 1 mistake in the Model 4 test, while 'Time Sequence' makes 4 mistakes by finding only u(t-1). In the Model 5 test, 'Time Sequence' adds y(t-2) but misses u(t-1). Over the first 5 tests, 'SNNR' makes 2 mistakes while 'Time Sequence' makes 10. As illustrated by this comparison, neither approach is perfect, but 'SNNR' outperforms 'Time Sequence' in terms of the number of mistakes made. Table 4.8 collects the comparison results using time-sequence and SNNR-rearranged data for the stochastic models in Equations (4.40)-(4.44), with example data shown in Figures 4.8-4.12.

Table 4.8. Regressors determined for Models 1-5

Model  Time Sequence                SNNR                         Truth
1      y(t-1) y(t-2) u(t-1)         y(t-1) y(t-2) u(t-1)         y(t-1) y(t-2) u(t-1)
2      y(t-1) y(t-2) y(t-3) u(t-1)  y(t-1) y(t-2) u(t-1)         y(t-1) y(t-2) u(t-1)
3      y(t-1) u(t-1)                y(t-1) u(t-1)                y(t-1) u(t-1)
4      u(t-1)                       y(t-1) y(t-2) y(t-3) u(t-1)  y(t-1) y(t-2) y(t-3) u(t-1) u(t-2)
5      y(t-1) u(t-1)                y(t-1) u(t-1)                y(t-1) u(t-1)

As observed in Table 4.8, 'SNNR' performs better, with 1 mistake, while 'Time Sequence' makes 5 mistakes. It is also observed that only the result for Model 1 differs between Tables 4.7 and 4.8 for 'SNNR', while the results for Models 1, 2 and 5 differ between the tables for 'Time Sequence'. It seems that the result from 'SNNR' is less influenced by the additional noise than that from 'Time Sequence'.
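The mistake counts quoted above amount to the size of the symmetric difference between the selected and true regressor sets, i.e. regressors missed plus regressors added. A one-line sketch (the helper name is illustrative):

```python
def count_mistakes(selected, truth):
    """Mistake count used in the Table 4.7-4.8 comparisons: regressors
    missed plus regressors added, i.e. the symmetric difference."""
    return len(set(selected) ^ set(truth))
```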
It might be difficult to draw a general conclusion from this observation. Intuitively, the noise term affects how the model parameters vary, which in turn affects the performance of the recursive estimation. Consequently, the order determination results, which are based on recursive estimation, should also be affected. On the other hand, the additional parameter variation after the noise is injected could be attenuated by 'SNNR', which reduces the influence of noise on parameter variation and subsequently on order determination. The details of regressor selection for Models 7 and 8 are shown in Tables 4.9 and 4.10, where an extra regressor, y(t-2), is found for each. This implies that the regressor y(t-2) has influence on y(t). In Equations (4.46) and (4.47), although y(t) is not directly related to y(t-2), the regressor y(t-2) is able to affect y(t) via y(t-1). More importantly, Tables 4.9 and 4.10 show that the regressor u(t-1) is found for both models.

Table 4.9. Regressor selection for Model 7

Regressors  y(t-1)  y(t-1) u(t-1)  y(t-1) u(t-1) y(t-2)  y(t-1) u(t-1) y(t-2) y(t-4)
FPE         0.0137  0.0028         0.0025                0.0032 (Stop)

Table 4.10. Regressor selection for Model 8

Regressors  y(t-1)  y(t-1) y(t-2)  y(t-1) y(t-2) u(t-1)  y(t-1) y(t-2) u(t-1) y(t-4)
FPE         0.0787  0.0293         0.0188                0.0222 (Stop)

The proposed order determination is also compared to the geometric method (Molina, Sampson, Fitzgerald & Niranjan, 1996). The testing is conducted on Models 10-13, and the results are summarized in Table 4.11. As observed, the geometric method is able to extract the correct orders for the deterministic nonlinear AR models but performs poorly in the presence of additive noise. The geometric method makes a total of four mistakes on Models 11 and 13. The proposed order determination makes one mistake, for Model 12. Table 4.11.
Regressors determined for Models 10-13

Model  SNNR                  Geometric                    Truth
10     y(t-1)                y(t-1)                       y(t-1)
11     y(t-1)                y(t-1) y(t-2) y(t-3)         y(t-1)
12     y(t-1) y(t-2) y(t-3)  y(t-1) y(t-2)                y(t-1) y(t-2)
13     y(t-1) y(t-2)         y(t-1) y(t-2) y(t-3) y(t-4)  y(t-1) y(t-2)

The dynamic order determination is app
Title  Generalized Rule Antecedent Structure for TSK Type of Dynamic Models: Structure Identification and Parameter Estimation 
Date  20091201 
Author  Su, Ming 
Keywords  dynamic models, dynamic order determination, fuzzy systems, nonlinear components, system identification, TSK 
Department  Electrical Engineering 
Full Text Type  Open Access 
Abstract  A novel rule antecedent structure is proposed to generalize the TSK type of dynamic fuzzy models, to deal with the curse of dimensionality in conventional TSK fuzzy models. The proposed antecedent structure uses only nonlinear variables, which directly reduces the number of possible rules by reducing the antecedent dimension. Additionally, one more degree of freedom is introduced to design antecedents to cover an antecedent space more efficiently, which further reduces the number of rules. The resultant GTSK model is identified in two stages. A novel recursive estimation based on spatially rearranged data is used to determine the consequent and antecedent variables. Model parameter values are obtained from the partitioned antecedent space, which is the result of solving a series of splitting and regression problems. 
Note  Dissertation 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
.........................................................135 VI. RESULTS FOR TESTING PROBLEMS ...........................................................138 6.1 Function Approximation ................................................................................138 6.2 Dynamic Fuzzy Modeling..............................................................................161 VII. SUMMARY, CONCLUSIONS AND FUTURE RESEARCH RECOMMENDATIONS .....................................................................................185 7.1 Summary ........................................................................................................185 7.2 Conclusions ....................................................................................................186 7.3 Future Research Recommendations ...............................................................188 REFERENCES ..........................................................................................................191 vi LIST OF TABLES Table Page 4.1 A segment of 10 data samples in time sequence ....................................................42 4.2 SNNR rearranged data for the timesequence data in Table 4.1 ............................42 4.3 MSE for a recursive estimation..............................................................................44 4.4 Regressor forward selection for data in Figure 4.22 ..............................................64 4.5 Exhaustive search on nu with ny=2, d=1 for data in Figure 4.22 ..........................65 4.6 Regressor forward selection for data in Figure 4.22 using timesequence data ....65 4.7 Regressors determined for deterministic versions of Models 15 .........................66 4.8 Regressors determined for Models 15 ..................................................................66 4.9 Regressors selection for Model 7 ...........................................................................67 4.10 Regressors selection for Model 8 
.........................................................................67 4.11 Regressors determined for Models 1013 ............................................................68 4.12 Regressors determined for the two phase flow process .......................................68 4.13 Results of order determination for the distillation column ..................................69 4.14 Exhaustive search on ny for y2 ............................................................................69 4.15 Exhaustive search for nonlinear components for Model 1 ..................................70 4.16 Results for nonlinear component detection for Models 1~5 ...............................70 4.17 Exhaustive search for nonlinear components for Model 4 ..................................71 4.18 Details of nonlinear component detection for Model 9 .......................................72 4.19 Nonlinear components detected for the two phase flow process .........................72 4.20 Choices of nonlinear components for the distillation column .............................72 5.1 Solution for a SRP using different τ values .........................................................120 5.2 The number of rules and SSE resulted from different M ...................................130 5.3 The value αc for branch nodes shown in Figure 5.39 ..........................................134 6.1 Trials of antecedent variables for Model 1 .........................................................162 vii Table Page 6.2 Trials of a GTSK model for Model 1...................................................................162 6.3 Comparison of the GTSK with RB and FFNN for Model 1 ................................166 6.4 Trials of a GTSK model for Model 3...................................................................167 6.5 Trials of antecedent variables for Model 3 ..........................................................169 6.6 Trials of a GTSK model for Model 
4...................................................................170 6.7 Trials of a GTSK model for Model 4 with all regressors included .....................172 6.8 Trials of a GTSK model for the two phase process .............................................174 6.9 Training and validation results for the GTSK model on y1 .................................176 6.9 Training and validation results for the GTSK model on y2 .................................180 viii LIST OF FIGURES Figure Page 3.1 The ellipsoid contour of TA .................................................................................... 18 3.2 Antecedent space partition and representation ....................................................... 19 3.3 A rotated local region covered by a horizontal or vertical ellipsoid ...................... 20 3.4 A rotated local region covered many small ellipsoids ............................................ 20 3.5 A rotated local region covered by a rotated ellipsoid ............................................. 20 3.6 Model parameters for a 4rule GTSK model .......................................................... 24 4.1 Exponential weighting with α = 0.95 ...................................................................... 36 4.2 Data generated from the model in Equation (4.25)................................................. 41 4.3 Time varying parameters a1(t) and b0(t) in Equation (4.4) ..................................... 41 4.4 SNNR Rearranged regressors from the timesequence data in Figure 4.1 ............. 43 4.5 Varying parameters for the SNNR reaaranged data ............................................... 44 4.6 α vs. α4/(1 α) .............................................................................................................. 48 4.7 Exhaustive dynamic order search ........................................................................... 
48 4.8 Inputoutput data generated for Model 1 in Equation (4.40) .................................. 52 4.9 Inputoutput data generated for Model 2 in Equation (4.41) .................................. 53 4.10 Inputoutput data generated for Model 3 in Equation (4.42) ................................. 53 4.11 Inputoutput data generated for Model 4 in Equation (4.43) ................................. 54 4.12 Inputoutput data generated for Model 5 in Equation (4.44) ................................. 54 4.13 Data generated for Model 10 in Equation (4.50) ................................................... 56 4.14 Data generated for Model 12 in Equation (4.52) ................................................... 57 4.15 The two phase flow experiment setup ................................................................... 58 4.16 The schematic diagram for the two phase flow experiment .................................. 59 4.17 Water flowrate measurements with set point at 20 lbmol/hr ................................. 59 ix Figure Page 4.18 A choice of input and output channels ................................................................... 60 4.19 A choice of input and output channels ................................................................... 61 4.20 Reflux flowrate (solid line) and boiler heat rate (dash line) .................................. 62 4.21 The xD and xB in distillation column experiments ................................................. 63 4.22 Data generated for the determinist version of model in Equation (4.42) ............... 63 5.1 Antecedent space defined by antecedent variables u(t9) and y(t3) ...................... 81 5.2 Evaluation of function in Equation (5.28) .............................................................. 82 5.3 An antecedent space partitioned by three linear boundaries ................................... 85 5.4 An iterative procedure to partition an antecedent space ......................................... 
86 5.5 Antecedent space partition by a regression tree ...................................................... 87 5.6 Parameters to be estimated in solving a SRP .......................................................... 88 5.7 Data samples for Equation (5.41) ........................................................................... 90 5.8 Data samples in antecedent space for Equation (5.41) ........................................... 91 5.9 A linear boundary based on data distribution ......................................................... 91 5.10 A linear boundary according to function nonlinearity ............................................ 92 5.11 Illustration of Equation (5.42) with different τ ...................................................... 93 5.12 A linear boundary generated for liner separable data .......................................... 107 5.13 A linear nonseparable case ................................................................................. 107 5.14 A liner nonseparable example with equally mixed points .................................. 108 5.15 Clusters found by LVQ for data in Figure 5.14 ................................................... 109 5.16 Flowchart for solving a SRP ................................................................................ 110 5.17 a) Initialization of data segregation for Equation (5.95); b) A linear separation boundary found for the initial data segregation ....................................................... 111 5.18 a) Initialization of data segregation for Equation (5.96); b) Initial linear boundary and its variation over iteration ................................................................................. 112 5.19 a) Initialization of data segregation for Equation (5.97); b) Initial linear boundary and its variation over iteration ................................................................................. 
113 5.20 An initial data segregation for Equation (5.97) fails a SVM solver .................... 114 5.21 Clusters recognized using LVQ for the initial segregation in Figure 5.20 .......... 114 5.22 Initial boundary from clusters in Figure 5.21 and its variation in iterations ........ 115 x Figure Page 5.23 Liner boundary solved for a threepiece piecewise function ............................... 115 5.24 Linear boundary solved for a quadratic function ................................................. 116 5.25 SSE with respect to the separation locations for the quadratic function .............. 116 5.26 Initial linear boundary and its variation over iteration......................................... 117 5.27 SSE with respect to the separation locations for Sin(x) ....................................... 117 5.28 Objective function converges using Newton’s method to solve a SRP ............... 119 5.29 Converged objective function value with respect to τ. ........................................ 120 5.30 NelderMead algorithm to solve a SRP ............................................................... 121 5.31 SSE with respect to the separation locations for Equation (5.97) ........................ 122 5.32 Separation locations for Equation (5.97) by NelderMead method ..................... 123 5.33 Illustration of the function in Equation (5.98) ..................................................... 123 5.34 SSE with respect to separation locations for Equation (5.98) ............................. 124 5.35 Separation locations for Equation (5.98) by NelderMead method ..................... 124 5.36 Separation locations for Equation (5.98) by the proposed SRP solver ................ 125 5.37 Antecedent partition using different M .............................................................. 129 5.38 The branch Bt3 from Figure 5.5(a) ....................................................................... 
131 5.39 The tree structure for the antecedent partition in Figure 5.37 (c) ........................ 133 5.40 Antecedent space partition after removing splits under branch nodes t5, t6 and t7 in Figure 5.39 ............................................................................................................... 134 5.41 Antecedent space partitions after remove some unimportant splits ..................... 135 5.42 A local region in an antecedent space .................................................................. 136 6.1 Values of αc for antecedent space partition for Equation (6.1) ............................. 139 6.2 Antecedent space partition and TAs based on Equation (6.1) .............................. 140 6.3 Function approximation by the 8rule GTSK model in Figure 6.2....................... 141 6.4 Normalized TAs for those in Figure 6.2. .............................................................. 142 6.5 Optimized TAs from initialization in Figure 6.2. ................................................. 143 6.6 Function approximation by the optimized 8rule GTSK model ........................... 143 6.7 Normalized TAs for those in Figure 6.5 ............................................................... 144 6.8 Optimized TAs starting from random initialization .............................................. 145 6.9 Function approximation by the 8rule GTSK model in Figure 6.8....................... 145 xi Figure Page 6.10 Antecedent space partition and TAs on Equation (6.3) ....................................... 146 6.11 Values of αc for antecedent space partition for Equation (6.4) ............................ 147 6.12 a) Antecedent space partition by αc > 117; b) Ellipsoids (TA = 0.05) ................. 147 6.13 Normalized TAs for those in Figure 6.12. ........................................................... 148 6.14 Normalized TAs for the leftfront rule in Figure 6.13 ......................................... 
149 6.15 Quadratic function approximation by the fuzzy model in Figure 6.12 ................ 149 6.16 Antecedent space partition by αc > 130 ............................................................... 150 6.17 A portion of αc in Figure 6.11 with values less than 118 .................................... 150 6.18 a) Antecedent space partition by αc > 3; b) Ellipsoids (TA = 0.05) ..................... 151 6.19 Quadratic function approximation by the fuzzy model in Figure 6.18 ................ 151 6.20 A portion of αc shown in Figure 6.11 with values less than 3 .............................. 152 6.21 Quadratic function approximation by the GTSK model in Figure 6.22 ............. 153 6.22 Optimized TAs for a 16rule GTSK model from random initialization .............. 153 6.23 Illustration of the function in Equation (6.5) and its contour plot ....................... 154 6.24 Values of αc for antecedent space partition on Equation (6.5) ............................. 155 6.25 a) Antecedent space partition by αc > 0.1;b) Ellipsoids (TA=0.05) ..................... 155 6.26 Optimized TAs of a 8rule GTSK model for Equation (6.5) ............................... 156 6.27 Approximation of Equation (6.5) by the GTSK model in Figure (6.26) ............. 156 6.28 Illustration of function in Equation (6.6) and its contour plot ............................. 157 6.29 Values of αc for antecedent space partition on Equation (6.6) ............................. 158 6.30 a) Antecedent space partition by αc>0.81; b) Ellipsoids (TA=0.05) .................... 158 6.31 Function approximation by the model in Figure 6.30 and the contour ................ 159 6.32 Values of αc for antecedent space partition for Equation (6.6) with quadratic local models ...................................................................................................................... 159 6.33 a) Antecedent space partition by αc >1.5; b) Ellipsoids (TA=0.05) ..................... 
160 6.34 Function approximation by the model in Figure 6.33 and the contour ............... 160 6.35 Antecedent space partition and TAs for Model 1 ................................................ 163 6.36 The separation boundaries shown for the nonlinear part in Model 1 .................. 163 6.37 Coefficients for local models in the GTSK model in Figure 6.35 ....................... 165 6.38 Twodimension antecedent space for Model 3 .................................................... 167 xii Figure Page 6.39 a) Antecedent space partition by αc > 10; b) Ellipsoids (TA=0.05) ..................... 168 6.40 Coefficients for local models in the fuzzy model in Figure 6.39......................... 169 6.41 Antecedent space partition and TAs for Model 4 ................................................ 171 6.42 Coefficients for local models in the GTSK model in Figure 6.41 ....................... 171 6.43 Twodimension antecedent space for the twophase process .............................. 173 6.44 Validation data set for the twophase flow process ............................................. 173 6.45 a) Antecedent space for two phase flow process; b) Ellipsoids (TA=0.05) ......... 174 6.46 Coefficients for local models in the GTSK model in Figure 6.45 ....................... 175 6.47 Twodimension antecedent space for y1 of the distillation column ..................... 177 6.48 a) Antecedent space partition output y1of the distillation column; b) Ellipsoids (TA=0.05) ................................................................................................................ 177 6.49 Coefficients for local models in the GTSK model in Figure 6.48 ....................... 178 6.50 Antecedent space partition and TAs for y1 of the distillation column ................. 179 6.51 Coefficients for local models in the GTSK model in Figure 6.50 ....................... 179 6.52 Twodimension antecedent space for y2 of the distillation column ..................... 
180
6.53 a) Antecedent space partition for output y2 of the distillation column; b) Ellipsoids (TA = 0.05) .......... 181
6.54 Antecedent space partition and TAs for y2 of the distillation column .......... 181
6.55 Coefficients for local models in the GTSK model in Figure 6.50 .......... 182
6.56 Two-dimension antecedent space for the MIMO(2,2) GTSK model .......... 183
6.57 Antecedent space partition for the MIMO(2,2) GTSK model .......... 183

CHAPTER I

INTRODUCTION

Efforts to describe chemical processes exist in various forms. Preferably, first-principles models are developed based on an idealized and simplified understanding of the underlying mechanism. Many of these models have been standardized in commercial software such as ChemCAD for education or AspenPlus for prototyping process design. However, an idealized first-principles model rarely finds direct application in practice, because artificial factors (such as tray efficiency in a distillation column) must often be introduced to augment the ideal model and improve its accuracy by tuning against experimental data. Moreover, first-principles models are expensive to develop. It takes time for researchers to acquire sufficient knowledge to describe a new process mathematically and comprehensively. An ultimate goal of first-principles modeling is to understand the fundamental physics. In practice, however, partial or empirical understanding is often sufficient for certain practical applications. For instance, a modestly accurate input-output dynamic model makes controller design possible.

In contrast to first-principles modeling, another approach is black-box modeling by system identification. Black-box modeling tends to overlook mechanistic details and focuses instead on the input-output behavior of a process.
For instance, the input-output description provided by first-order-plus-time-delay models is often adequate for process control engineers to tune PID controllers. There are many choices of model structure, including Finite Impulse Response, Autoregressive with exogenous inputs, Output Error, Autoregressive and Moving Average with exogenous inputs, and Box-Jenkins. For each structure, the simplest form is a linear model. Perhaps surprisingly, many chemical processes can be described quite well using linear models, because most chemical processes are operated around a steady-state operating point. The linear model can be interpreted as a local linearization of the truly nonlinear chemical process. Although linear models have been used successfully in many chemical processes, efforts have been devoted to describing nonlinear dynamic chemical processes in a more compact or unified way. It is also expected that nonlinear modeling can provide a more accurate description. If a nonlinear model is desired, users have options for representing a nonlinear function mapping. These options include, but are not limited to, polynomial models, piecewise models, basis function models, network models, and fuzzy models.

Interestingly, experience-based knowledge also exists for chemical processes. Such rules are familiar to us in various forms, including process operating instructions and manuals, handbooks, and rules of thumb. Some rules are derived from prior knowledge, which could be either an understanding of fundamentals or experts' experience. For instance, our knowledge of distillation behavior might produce the following two rules expressing steady-state relations:

IF Reflux (R) is Fast THEN Overhead Purity (xd) is High
IF Reflux (R) is Slow THEN Overhead Purity (xd) is Low

where the linguistic terms 'Fast' and 'Slow' specify Reflux (R), while 'High' and 'Low' specify xd.
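Rules like these become computable once membership functions are attached to the linguistic terms. The sketch below is purely illustrative: the break points (2.0 and 5.0 for the reflux rate) and the consequent purity values are hypothetical numbers, not values from this work, and a simple Sugeno-style weighted combination stands in for whatever inference scheme a practitioner might choose.

```python
def slow(r, lo=2.0, hi=5.0):
    """Membership of reflux rate r in 'Slow': 1 below lo, 0 above hi, linear between."""
    return max(0.0, min(1.0, (hi - r) / (hi - lo)))

def fast(r, lo=2.0, hi=5.0):
    """Membership of reflux rate r in 'Fast': 0 below lo, 1 above hi, linear between."""
    return max(0.0, min(1.0, (r - lo) / (hi - lo)))

def overhead_purity(r, xd_low=0.90, xd_high=0.99):
    """Blend the two rule consequents by their (normalized) rule weights."""
    w_slow, w_fast = slow(r), fast(r)
    return (w_slow * xd_low + w_fast * xd_high) / (w_slow + w_fast)
```

At a reflux rate midway between the break points, both rules fire equally and the predicted purity is simply the average of the two consequents.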
Knowledge expressed in logical rules is easy to understand but often difficult to use. Linguistic terms such as Fast, Slow, High, and Low are often not clearly defined. Moreover, human knowledge might be incomplete or outdated.

In this work, one focus is to describe the input-output behavior of a nonlinear dynamic process. We choose TSK (Takagi-Sugeno-Kang) fuzzy models (Sugeno & Kang, 1986; Takagi & Sugeno, 1985) to approximate the nonlinearity. The choice is motivated by the simplicity, interpretability, modularity, and flexibility of a fuzzy model. The concept of a fuzzy set was introduced by Zadeh (Zadeh, 1965) to express degrees of membership of elements in sets; it can be viewed as a generalization of the classical notion of a set defined on a two-value (0 and 1, or True and False) membership. Subsequently, fuzzy logic was invented to handle reasoning based on fuzzy sets. There are many ways to define fuzzy logic. An interesting application of fuzzy logic in engineering fields (fuzzy logic in the broad sense) is fuzzy modeling, which uses fuzzy models to represent a nonlinear function. A fundamental proof supporting the use of fuzzy modeling shows that a fuzzy model is a universal approximator (Kosko, 1994). This simply means that fuzzy models can theoretically approximate almost any nonlinear function. Although a fuzzy model is not the only universal approximator, it is preferable over other modeling approaches because of its simplicity, interpretability, modularity, and flexibility.

One aspect of simplicity is modeling simplicity. One merit of fuzzy modeling is that it allows users to translate their intuition and knowledge into a qualitative model description first, via a fuzzy model, and leave the quantitative description to a later tuning phase.
For instance, an experienced operator can quickly provide a model with several rules to describe a distillation column, as shown above; subsequently, the break points defining the linguistic categories can be fine tuned.

Because fuzzy models are strongly connected to human knowledge, they are often credited with interpretability. The use of linguistic terms seems to be an 'obvious' reason. Certainly, the involvement of linguistic terms makes a fuzzy model appear friendly to users. More fundamentally, the interpretability is due to the fact that a fuzzy model is expressed in the IF…THEN structure, which matches the human reasoning procedure and makes a fuzzy model appear 'intelligent'. Another important aspect of interpretability is knowledge transparency, which is due to the modularity of a fuzzy model. Fuzzy models are made of rules. Regardless of how 'big' a fuzzy model is, each rule in the model is relatively simple. A fuzzy model as a whole, with thousands of rules, looks by no means interpretable, no matter how many linguistic terms are used. However, modularity allows users to look at a fuzzy model in a different way, by shifting focus onto individual rules. Within each rule, knowledge of the local behavior of a nonlinear process becomes clear, and interpretability is possible.

Modularity is also aligned with the divide-and-conquer concept for dealing with complex problems. In fuzzy model identification, modularity can be exploited to convert the identification of a fuzzy model into a number of smaller and simpler identification problems, each of which focuses on one rule. In applications, for instance designing a fuzzy-model-based controller, modularity is used to translate the controller design for a fuzzy model into a number of smaller and simpler controller design problems. Modularity also leads to flexibility in fuzzy models. A fuzzy model can be viewed as an interface rather than a model.
It serves as a common gateway that connects different types of models and allows communication among them. Shown below is one possibility for letting a fuzzy model incorporate different types of models:

IF x is High THEN use a first-principles model
IF x is Low THEN use a Neural Network model
IF x is Medium THEN y is High

The flexibility and modularity also simplify model management and maintenance. In addition to adapting model parameters to compensate for model-plant mismatch, fuzzy models allow insertion and deletion operations on rules, to incorporate newly discovered events and eliminate obsolete behavior. Different from most black-box modeling approaches, in our view, fuzzy models explicitly separate the nonlinear components in a model from its linear components. This work will exploit this feature to simplify the model structure.

However, the applications of fuzzy models are limited by their insufficiency in handling high-dimension problems, due to the well-known curse of dimensionality. With this restriction, fuzzy models can hardly have any significant practical impact. Even for a single-input-single-output (SISO) dynamic process, fuzzy models will be embarrassed if the SISO process has high dynamic orders. Many successful academic examples of fuzzy models are demonstrated on dynamic processes with low dynamic orders, often not exceeding four.
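As a concrete reference point for the discussion, a conventional SISO TSK model evaluates each rule's antecedent membership on the regressor vector and blends the local linear consequents by a weighted average. The sketch below shows that standard computation with product-of-Gaussians (axis-aligned) antecedents; it is a generic illustration of the classical TSK form, not the generalized GTSK structure proposed later, and all rule parameters here are invented for the example.

```python
import math

def tsk_predict(x, rules):
    """Standard TSK output: membership-weighted average of local linear models.

    x     : regressor vector, e.g. [y(t-1), u(t-1)]
    rules : list of (centers, widths, coeffs) per rule, where coeffs[0] is a
            bias term and coeffs[1:] multiply the entries of x.
    """
    weights, outputs = [], []
    for centers, widths, coeffs in rules:
        # Rule firing strength: product of univariate Gaussian memberships.
        w = 1.0
        for xi, ci, si in zip(x, centers, widths):
            w *= math.exp(-0.5 * ((xi - ci) / si) ** 2)
        weights.append(w)
        # Local linear consequent: bias + coeffs . x
        outputs.append(coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x)))
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total
```

Note that every rule's antecedent is defined over the full regressor vector; with n regressors and m linguistic terms per regressor, a grid-style rule base needs m^n rules, which is precisely the combinatorial growth the text refers to as the curse of dimensionality.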
Usually, in this work, the antecedent dimension is lower than the consequent dimension and contains only 'nonlinear variables', which directly addresses the curse of dimensionality by including fewer variables in the antecedents. Additionally, the new structure replaces the combinatorial antecedent structure with a more flexible one, in which an extra degree of freedom is introduced to 'rotate' the coverage of a rule. This addition reduces the number of rules needed in a TSK model by improving the covering efficiency of each rule. With the generalized rule antecedent structure, the TSK model in this work is referred to as a GTSK (generalized TSK) model.

The structure of a GTSK model includes the overall model dimension and the antecedent dimension. Since the primary modeling target in this work is nonlinear dynamic processes, determining the overall dimension of a GTSK model translates to discovering the dynamic orders from measured input-output data. The antecedent dimension of a GTSK model is determined by finding the nonlinear components in the model. Parameter estimation of the GTSK model is automated heuristically by recognizing rules from an iteratively partitioned space. The heuristic procedure is followed by fine tuning of the fuzzy model parameters, accomplished by solving a nonlinear optimization problem with matrix inequality constraints.

This work intends to provide a unified and systematic procedure to obtain a GTSK model with the new rule structure from input-output data for a nonlinear dynamic process. The procedure is demonstrated on several theoretical benchmark problems, which are drawn from published research and are used primarily for illustrating ideas, comparing methods, and verifying results. The procedure is also tested on a distillation column simulator, which has been used successfully in past research (Ou, 2001).
Additionally, the procedure is tested on a pilot-scale chemical process, two-phase flow, which exhibits nonlinear dynamics, time delay, and measurement noise.

The innovations of this work are: the design of a new rule antecedent structure, which has a reduced antecedent dimension and a more flexible antecedent form; the design of a systematic approach to determine dynamic orders and detect nonlinear variables; and the design of a heuristic procedure that iteratively partitions an antecedent space to generate regions within which a linear relation is valid.

CHAPTER II

LITERATURE SURVEY

2.1 Literature Survey for Dynamic Order Determination

The TSK type of fuzzy model is used in this work to describe a nonlinear dynamic process. Several potential benefits that users might expect from a fuzzy model have been listed in the Introduction. The modeling procedure proposed in this work is capable of dealing with multiple-input-multiple-output (MIMO) processes. However, for simplicity of presentation, the majority of the technical elaboration will be based on single-input-single-output (SISO) models, as described in Equation (2.1). The extension to MIMO models will be addressed accordingly.

y(t) = f(y(t−1), …, y(t−ny), u(t−d), …, u(t−nu−d)) + e(t)        (2.1)

Equation (2.1) is a nonlinear autoregressive with exogenous input (NARX) model. The term NARX is chosen to be consistent with its linear counterpart, the ARX model. The terminology is, however, not unique in the literature. In (Seborg & Henson, 1996), the structure in Equation (2.1) is called a nonlinear autoregressive and moving average model (NARMA). In this work, the ARX structure is chosen for its simplicity. More importantly, the function arguments (lagged y and u) in Equation (2.1) include only input and output measurements. Some operations and treatments of raw data in this work are currently limited to model structures that have only measurable function arguments.
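Given recorded input-output data, the regressor vector of Equation (2.1) can be assembled directly, since all of its arguments are measurable. The following is a generic sketch of this NARX data preparation step (not code from this work); it follows Equation (2.1)'s convention that the input lags run from u(t−d) down to u(t−nu−d).

```python
def build_narx_regressors(y, u, ny, nu, d):
    """Form (regressor, target) pairs for the NARX model of Equation (2.1).

    y, u : equal-length sequences of output and input measurements.
    Returns a list of (x_t, y_t) pairs with
    x_t = [y(t-1), ..., y(t-ny), u(t-d), ..., u(t-nu-d)].
    """
    start = max(ny, nu + d)  # first index t for which all lags are available
    data = []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, ny + 1)]
        past_u = [u[t - d - j] for j in range(0, nu + 1)]
        data.append((past_y + past_u, y[t]))
    return data
```

The resulting pairs feed any regression scheme, linear or nonlinear; choosing ny, nu, and d well is exactly the order-determination problem this survey addresses.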
More complex structures could be used to describe nonlinear dynamics if necessary. A nonlinear NARMAX model is described in (Johansen & Foss, 1993). Its structure information is retrieved from its linear counterpart, ARMAX. As commented in (Nelles, 2001), more advanced structures are often not worth their additional complexity in describing nonlinear dynamics. On the other hand, NARX models, as simpler models, should often be tried first for a nonlinear dynamic process of unknown structure. The obtained NARX models could be the basis for further structure variation or complication. In (Fischer, Nelles & Isermann, 1998), a NARX model is first identified and then converted to a nonlinear output error (NOE) model by regressor replacements (for instance, y(t-1) is replaced by its prediction ŷ(t-1)) followed by model parameter retuning. Additionally, we assume in this work that ny, nu and d in Equation (2.1) are time invariant. This simplification may be against the nature of some realistic processes. For instance, a time-varying delay is often encountered in chemical processes, where a transportation delay relates strongly to a flow rate that is time varying in nature. On the other hand, a constant delay is often a good enough approximation in practice, especially in a relatively steady working condition. The first step in system identification is to determine the orders of the model. For the SISO model in Equation (2.1), the problem is then to discover ny, nu and d. In terms of dynamic order determination, there are well-developed methods for linear systems. For linear dynamic systems, a preliminary analysis using autocorrelation and partial autocorrelation (Box, Jenkins & Reinsel, 1994) is able to identify dynamic orders. The result is often a set of candidate orders to be tried and validated further against data. Dynamic order determination can also be translated to problems regarding regressor analysis.
Regressor analysis does not result in the dynamic orders and time delay directly. However, it is a trivial exercise to draw ny, nu and d from the result of regressor analysis. One method is subset selection (Miller, 1990), which has different versions including forward selection, backward elimination, cyclic replacement and exhaustive search. Among them, only exhaustive search, the most expensive one, is guaranteed to find the globally optimal solution, the best set of regressors. The other methods are heuristically motivated, aiming at a suboptimal solution with improved searching speed or efficiency. Analysis of variance (ANOVA), a tool to find influential experimental factors, can also be used to find influential regressors (Lind & Ljung, 2008). The ANOVA method suffers from the curse of dimensionality, and the evaluation of interacting influence among factors requires a combinatorial number of trials. In addition, a conventional ANOVA procedure takes finite levels of experimental factors rather than continuous ('infinite' levels) values. Extra computation is required to prepare the raw data for ANOVA analysis (for instance, by clustering). For nonlinear dynamic models, even for NARX models, there is no general method, such as the autocorrelation or partial autocorrelation method, available for dynamic order determination. Rigorous analysis based on nonlinear correlation is possible if the nonlinear structure of f is known or presumed (Haber & Unbehauen, 1990). There are a variety of choices of predefined nonlinear structures, such as bilinear, Wiener and Hammerstein models or their combinations. Another approach aims at a general target and does not depend on a predefined nonlinear structure. The geometric method (Molina, Sampson, Fitzgerald & Niranjan, 1996) is proposed to determine the embedded dimension of deterministic nonlinear autoregressive systems.
Following the same concept, its extension to deal with deterministic ARX structures by including inputs is proposed in (Rhodes & Morari, 1995) based on the False Nearest Neighbor method. Both methods are intuitively motivated rather than rigorously derived, and can be roughly argued based upon a first-order Taylor expansion. Another method also based on the first-order Taylor expansion argument is the Lipschitz quotient (He & Asada, 1993), aimed at deterministic NARX dynamic processes. The difficulty in determining the orders in Equation (2.1) is the unknown nonlinear function, f. Even if f is known to be nonlinear, the richness of nonlinearity would keep users from exhausting all possible nonlinear forms, making it difficult to find ny, nu and d. If the nonlinearity is known, it is possible to transform a nonlinear problem into a linear problem. If the nonlinearity is unknown, users could resort to any one of the 'big' models, such as a neural network or any other model proven to be a universal approximator. These complex structures are able to capture almost any nonlinearity given enough flexibility. With nonlinearity no longer a problem, users can then experiment with and compare different sets of orders in these 'big' models. The drawback of using 'big' models is the high computational burden. Additionally, as we will present later, experimentation with dynamic orders in 'big' models is not suitable for another objective in this work, nonlinear component detection. In our work, a unified approach is proposed for both dynamic order determination and nonlinear component detection.

2.2 Literature Survey for Fuzzy Model Structure

There are several different types of fuzzy models. One of them is the Mamdani fuzzy model (Mamdani, 1974).
For the nonlinear dynamic process in Equation (2.1), a Mamdani fuzzy model might be defined by rules of the form

IF y(t-1) is A_1^r AND … AND u(t-nu-d) is A_{ny+nu+1}^r THEN y(t) is C^r   (2.2)

where the expression y(t-1) is A_1^r AND … AND u(t-nu-d) is A_{ny+nu+1}^r is the antecedent of the rule, and the expression y(t) is C^r is the consequent of the rule. The variables y(t-1), …, y(t-ny), u(t-d), …, u(t-nu-d) are antecedent variables, and A_1^r is the fuzzy subset for y(t-1) in the rule. The notation for the fuzzy subsets of the other variables should be clear in context. A Mamdani fuzzy model has perhaps the simplest consequent models. An extension of the Mamdani fuzzy model is the Takagi-Sugeno-Kang (TSK) fuzzy model (Sugeno & Kang, 1986; Takagi & Sugeno, 1985). The generalization goes to the rule consequent. For the nonlinear dynamic process in Equation (2.1), a rule in a TSK fuzzy model could be defined by

IF y(t-1) is A_1^r AND … AND u(t-nu-d) is A_{ny+nu+1}^r THEN A^r(z^{-1}) y(t) = k^r + B^r(z^{-1}) u(t-d) + e^r(t)
A^r(z^{-1}) = 1 + a_1^r z^{-1} + … + a_{ny}^r z^{-ny}
B^r(z^{-1}) = b_0^r + b_1^r z^{-1} + … + b_{nu}^r z^{-nu}   (2.3)

where the consequent model is A^r(z^{-1}) y(t) = k^r + B^r(z^{-1}) u(t-d) + e^r(t) with dynamic orders ny and nu, pure time delay d and a constant k^r; z is the backshift operator. The local model could be interpreted as a linearization of the nonlinear dynamic process in Equation (2.1). The linearization explains the inclusion of the constant term k^r. As mentioned in (Leith & Leithead, 1999; Shorten, Smith, Bjorgan & Gollee, 1999), the linearization could be interpreted as being conducted around either a steady-state or a transitional working point. Inclusion of the latter is reported to improve modeling performance for transient behavior (Smith & Johansen, 1997). Mamdani and TSK represent two major types of fuzzy models and differ in their consequents.
In fact, a TSK fuzzy model could be further generalized by replacing its linear consequent models with other types of models. In (Mastorocostas & Theocharis, 2002), a new type of fuzzy model is proposed with neural network consequent models. Hierarchical fuzzy models (Lee, Chung & Yu, 2003; Liu & Li, 2005; Zeng & Keane, 2005) are often mentioned in the literature and could also be considered a particular type of generalization by having fuzzy models as local models. Interestingly, fuzzy models can also be compared with models originating from other disciplines. It is shown in (Andersen, Lotfi & Westphal, 1998; Roger & Sum, 1993) that a TSK fuzzy model with Gaussian membership functions and the product operator for the AND logic conjunction is functionally equivalent to a normalized radial basis function network under certain restrictions. In (Smith & Johansen, 1997), a TSK fuzzy model is addressed in a broader perspective as a multi-model network. The above-mentioned fuzzy models represent one direction of generalization of the fuzzy model structure: making consequent models more complex. Interestingly, not much effort has been devoted to generalizing the antecedent structure in a fuzzy model. The maneuverability in antecedents lies mainly in the choice among different types of membership functions, including triangular, trapezoidal and Gaussian, etc., as well as different configurations for a particular type of membership function. Another degree of freedom in designing antecedents is the use of different logic operators. For instance, the AND conjunction in the antecedent expression in Equation (2.2) or (2.3) could be quantitatively evaluated using either the product or the minimum operator. In addition to these two, there are in fact many other choices for the evaluation of the AND conjunction, which is defined by a variety of T-norms as a result of research on symbolic fuzzy logic (Lee & Zhu, 1995).
2.3 Literature Survey for Fuzzy Model Identification

Identifying a fuzzy model generally involves two objectives, structure identification and parameter estimation. Structure identification involves selecting variables for the antecedent and consequent, determining the number of fuzzy subsets for each variable, and estimating the number of rules in a fuzzy model. Parameter estimation determines the values of the model parameters. As shown in the TSK rule in Equation (2.3), model parameters include the parameters defining all fuzzy subsets (membership functions) in the antecedent and those defining the consequent models. There are many different approaches to fuzzy model identification. They vary with the different types of fuzzy models to be identified or are based on different assumptions. Very often in practice, structure identification and parameter estimation are coupled. For instance, the number of rules is related to the number of variables in the antecedent as well as the number of fuzzy subsets for each antecedent variable. Meanwhile, the addition or deletion of a fuzzy subset for a variable is expected to affect the distribution of the other fuzzy subsets, which in turn results in retuning of the membership functions for an optimal result. Inevitably, any variation in antecedent parameter values should be accompanied by a corresponding change in consequent model coefficients.

2.3.1 Variable Selection

Variable selection determines the variables for the rule antecedent and consequent. Very often, it is implicitly assumed for simplicity that all rules in a fuzzy model share the same set of antecedent and consequent variables. It is therefore equivalent in practice to define the problem as antecedent and consequent variable selection for a fuzzy model. Variable selection is not conducted separately but is often accompanied by parameter estimation or retuning. A common explicit procedure is to try different sets of selections, evaluate their corresponding model accuracy and complexity, and keep the best.
In (Pomares, Rojas, González & Prieto, 2002), variable selection is conducted iteratively in a constructive approach to build a fuzzy model. In each iteration, the fuzzy model is augmented by either changing the number of fuzzy subsets of already selected variables or adding a variable to the antecedent. The better option is kept. Similar to the approach widely used in classification tree identification, antecedent variable selection is implicitly conducted in (Nelles & Isermann, 1996). In each step, augmentation of the existing fuzzy model is tried by adding a new rule for each candidate variable. The best rule is then kept. In the end, antecedent variable selection is automatically achieved by discarding from the antecedent any variables that are never selected. Variable selection becomes more complicated for a dynamic process as described in Equation (2.1), since each variable is associated with an unknown dynamic order. The variable selection problem should then be extended to determine the dynamic order for each variable. The extension could be achieved simply by including more lagged terms, which, however, greatly increases the problem dimension and makes many methods designed for low-dimension problems difficult to apply.

2.3.2 Fuzzy Model Identification

There are several different ways to categorize methods of fuzzy model identification. Some identification methods are based on heuristic criteria for linguistic interpretability and knowledge transparency. On the other hand, many other identification methods tend to find a more accurate fuzzy model by minimizing a quantitative performance index. The approaches that extract fuzzy rules heuristically are mainly inspired by two procedures. The Pittsburgh approach focuses on rule-set evolution, while the Michigan approach evolves individual rules independently. Both the Pittsburgh and Michigan approaches use genetic algorithms for optimization, which is consistent with the main theme being heuristic.
More importantly, it is due to the fact that heuristic criteria are unable to provide explicit searching directions expressed by gradients or Hessians. Research in this field focuses primarily on inventing new heuristics by digging deeply into how humans process linguistic information, or on devising more efficient searching or combinatorial optimization techniques (Cordon, Herrera, Gomide, Hoffmann & Magdalena, 2001). Different from those heuristically inspired approaches, a modeling-error-driven approach estimates the parameter values of a fuzzy model by optimizing a performance index, for instance, the sum of squared errors. In this approach, one could take either a 'global' procedure to tune all parameters (antecedent and consequent parameters) simultaneously or a 'local' procedure starting from individual rules and combining them into a fuzzy model. The 'global' procedure requires a good initial guess to avoid trivial solutions or poor local minima. In (Dickerson & Kosko, 1996), an initial fuzzy model is generated by recognizing piecewise patches along a SISO nonlinear function to be approximated. A steepest descent optimizer then follows. Heuristics based on clustering are also used to recognize the prototype rules (Dickerson & Kosko, 1996; Vernieuwe, Baets & Verhoest, 2006; Wang & Yang, 2009). In (Nelles, 2001), rules are progressively generated by conducting an equal division along one dimension in each step. It is also possible to overparameterize a fuzzy model and let a simplification procedure (Yen & Wang, 1999) merge redundant rules or eliminate invalid rules. It is worth pointing out that there is a procedure that obtains a fuzzy model representation of a known nonlinear process by mathematical equivalence (Kawamoto, 1992). This approach is unrelated to the above-mentioned fuzzy model identification from data.
The main purpose of this procedure is to represent a nonlinear model by a fuzzy model, exploit the structural features of the fuzzy model to design controllers, and investigate stability for the original nonlinear model. Additionally, heuristic-based stochastic procedures exist to obtain both model structures and parameter values simultaneously (Du & Zhang, 2008; Guenounou, Belmehdi & Dahhou, 2009; Lin, 2008; Lin & Xu, 2006), which, however, require even more computational resources.

CHAPTER III

A GENERALIZED RULE ANTECEDENT STRUCTURE

In this chapter, a generalized rule antecedent structure is proposed. The new rule antecedent uses only nonlinear variables. Additionally, one more degree of freedom is introduced in the antecedent design to cover an antecedent space more efficiently. The following elaboration focuses on a single-input-single-output (SISO) model. The extension to multiple-input-multiple-output (MIMO) models is provided at the end.

3.1 Model Complexity

Equation (3.1) represents a SISO dynamic process with dynamic orders ny and nu, pure time delay d, and an additive noise e(t)

y(t) = f(y(t-1), …, y(t-ny), u(t-d), …, u(t-nu-d)) + e(t)   (3.1)

where y is the process response and u is the input. The nonlinear function f could be approximated by the TSK model in Equation (2.3), reproduced below for easy reference

IF y(t-1) is A_1^r AND … AND u(t-nu-d) is A_{ny+nu+1}^r THEN A^r(z^{-1}) y(t) = k^r + B^r(z^{-1}) u(t-d) + e^r(t)
A^r(z^{-1}) = 1 + a_1^r z^{-1} + … + a_{ny}^r z^{-ny}
B^r(z^{-1}) = b_0^r + b_1^r z^{-1} + … + b_{nu}^r z^{-nu}   (3.2)

The complexity of a TSK model can simply be regarded as the number of rules. For the TSK model in Equation (3.2), given that each variable has 5 fuzzy subsets (which could be linguistically labeled Low, Medium-Low, Medium, Medium-High, High), there would be 5^(ny+nu+1) possible rules to be considered. The problem dimension (ny+nu+1 in this case) is an obvious cause of the complexity.
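The exponential growth can be made concrete with a short calculation. The sketch below is a hypothetical Python illustration (the function name rule_count is ours, not part of this work) that counts the rules generated by exhaustively combining fuzzy subsets over all ny+nu+1 regressors:

```python
def rule_count(ny, nu, subsets_per_variable=5):
    # One fuzzy subset is chosen per regressor, so the exhaustive
    # combination over the ny + nu + 1 regressors of Equation (3.2)
    # yields subsets_per_variable^(ny + nu + 1) candidate rules.
    return subsets_per_variable ** (ny + nu + 1)

# ny = nu = 1 already gives 5^3 = 125 rules; ny = nu = 3 gives 5^7 = 78125.
```

Raising either order by one multiplies the rule count by 5, which is precisely the curse of dimensionality discussed above.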
Moreover, the number of rules also depends on the number of fuzzy subsets for each variable; the illustrated number, 5, is quite conservative in practice. Simply put, the TSK model described in Equation (3.2) has difficulty dealing with high-dimension problems, or it is subject to "the curse of dimensionality". In the following, a generalized rule antecedent structure is proposed to design an efficient GTSK model using fewer rules. The new rule antecedent uses only nonlinear variables, which separates the antecedent dimension from the problem dimension. The complexity of a GTSK model is related only to the antecedent dimension. It is then possible to apply a GTSK model to a high-dimension problem so long as its antecedent dimension is acceptable. Additionally, the proposed rule antecedents are expressed as ellipsoids covering the underlying local regions and feature one more degree of freedom in design. The extra flexibility makes spatial coverage more efficient and simplifies a fuzzy model in terms of the number of rules.

3.2 Antecedent Dimension

The direct approach to reduce the number of rules is to control the problem dimension, which unfortunately is determined by the nature of the problem and not by users. However, dimension reduction in the antecedent is still possible by excluding variables that appear linearly. To illustrate dimension reduction, consider the following nonlinear dynamic model with three regressors, [y(t-1) y(t-2) u(t-1)]

y(t) = y(t-1)(y(t-2) + 2.5) + y²(t-1) u(t-1)   (3.3)

Using the rule structure in Equation (3.2), the rule antecedent could be expressed as (y(t-1) is A1 AND y(t-2) is A2 AND u(t-1) is A3). The antecedent dimension is 3, which is the same as the problem dimension. Assuming that each variable has 5 fuzzy sets, the combinatorial construction will generate 5³ = 125 possible rules. The dynamic model in Equation (3.3) can be represented in a linear format using time-varying parameters.
y(t) = a1(t) y(t-1) + a2(t) y(t-2) + b0(t) u(t-1)   (3.4)

with a1(t) = 2.5, a2(t) = y(t-1) and b0(t) = y²(t-1), where the model parameters a2 and b0 are not only time-varying but functions of the regressor y(t-1). This indicates that the model can be expressed linearly in all variables except y(t-1). The coefficient values in Equation (3.4) are independent of y(t-2) and u(t-1). Equivalently, the regressor y(t-1) is the only regressor that changes the otherwise linear model coefficient values. Therefore, y(t-1) should be the only variable included in the antecedent. The simplified rule is defined by

IF y(t-1) is A^r THEN y(t) = k^r + a_1^r y(t-1) + a_2^r y(t-2) + b_0^r u(t-1)   (3.5)

where the antecedent dimension is reduced to 1. The possible number of rules is reduced from 125 to 5. In Equation (3.5), y(t-1) is then an antecedent variable and is collected in a vector c(t). The regressors in the consequent, including y(t-1), y(t-2) and u(t-1), are collected in a vector x(t). The concept of including only nonlinear variables in antecedents has been explicitly mentioned in (Shorten, Smith, Bjorgan & Gollee, 1999) and implicitly applied in (Nelles & Isermann, 1996; Tanaka & Wang, 2001), where fuzzy models are used to describe known nonlinear dynamic processes. However, the above-mentioned dimension reduction can only be made practically applicable if it is possible to find the antecedent variables from data. The detection of antecedent variables will be addressed in Chapter 4.

3.3 Antecedent Structure

3.3.1 A Generalized Antecedent Structure

As mentioned above, the number of rules is related to the problem dimension by 5^(ny+nu+1). In Section 3.2, it is illustrated that it is possible to use an exponent smaller than ny+nu+1. However, the exponential relation between the number of rules and the (antecedent) dimension is still preserved.
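The time-varying linear format of Equations (3.3)-(3.4) can be checked numerically. The sketch below is a hypothetical Python illustration (function names are ours); it evaluates the nonlinear model and its linear-in-regressors form, whose coefficients depend only on y(t-1):

```python
def process(y1, y2, u1):
    # Eq. (3.3): y(t) = y(t-1)(y(t-2) + 2.5) + y^2(t-1) u(t-1)
    return y1 * (y2 + 2.5) + y1 ** 2 * u1

def tv_coefficients(y1):
    # Eq. (3.4): a1(t) = 2.5, a2(t) = y(t-1), b0(t) = y^2(t-1).
    # Only the antecedent variable y(t-1) changes the coefficients.
    return 2.5, y1, y1 ** 2

def linear_form(y1, y2, u1):
    # Linear in [y(t-1), y(t-2), u(t-1)] given the time-varying coefficients
    a1, a2, b0 = tv_coefficients(y1)
    return a1 * y1 + a2 * y2 + b0 * u1
```

For any operating point, the two evaluations agree, confirming that y(t-1) is the only regressor that needs to enter the antecedent.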
The underlying cause of the exponential connection is the combinatorial antecedent structure expressed in the TSK rule in Equation (3.2), which uses the AND conjunction to connect antecedent variables. For example, given a two-dimensional antecedent (c1 is A1 AND c2 is A2), if Gaussian membership functions are assumed and the product operator is used for the AND conjunction, the antecedent is evaluated by the truth of antecedent (TA)

TA = exp(-((c1 - o1)/σ1)² - ((c2 - o2)/σ2)²)   (3.6)

where TA describes an ellipsoid centered at (o1, o2) with widths σ1 and σ2. A contour plot of TA is shown in Figure 3.1.

Figure 3.1. The ellipsoid contour of TA

In Figure 3.1, the highest value, TA = 1, is reached at the centroid. The further out the contour, the smaller the TA. The value of TA can be interpreted as the belongingness of a data point to a local region. A fuzzy model has several rules. Given a two-dimensional antecedent with an equal number of fuzzy sets for each antecedent variable, a typical combinatorial antecedent space partition and its representation by horizontal and vertical ellipsoids is shown in Figure 3.2(a)

Figure 3.2. Antecedent space partition and representation

where 9 rules result from the exhaustive combinations of 3 fuzzy sets for each antecedent variable. Users might resort to the techniques in (Yen & Wang, 1999) to reduce the redundancy in consequent models and obtain a more compact fuzzy model. The number of rules can be reduced by merging regions that exhibit similar local behavior. Figure 3.2(b) shows a possible simplified partition after merging some regions. The partition in Figure 3.2(b) will also become inefficient, as shown in Figure 3.3, where neither a horizontal nor a vertical ellipsoid provides an efficient representation of the underlying local region, represented by either a rotated 'space' of correlated variables or irregular polygons.
Figure 3.3. A rotated local region covered by a horizontal or vertical ellipsoid

One possible solution for covering the space is to use many smaller ellipsoids, as shown in Figure 3.4, which however might result in a lot of rules.

Figure 3.4. A rotated local region covered by many small ellipsoids

Another solution is to rotate the ellipsoid, as shown in Figure 3.5.

Figure 3.5. A rotated local region covered by a rotated ellipsoid

The rotated ellipsoid proposed here, with stretching and contraction, is flexible enough to match many geometric shapes. In order to address the rotation mathematically, the parameters σ in Equation (3.6) are replaced by a symmetric positive definite matrix P, which is termed the shape matrix in this work and redefines the truth of antecedent as

TA = exp(-(c - o)^T P (c - o))   (3.7)

where o is a vector of dimension nc representing the centroid, and the dimension of the shape matrix P is nc by nc. The flexibility in representing antecedent subspaces comes at the cost of nc(nc-1)/2 additional parameters in the shape matrix in Equation (3.7). This approach could be interpreted as a transition from a TSK fuzzy model with many simple rules to a GTSK fuzzy model with fewer complex rules. Clearly, the simplicity and complexity in this context refer to the rule antecedents.

3.3.2 Interpretation of the Proposed Structure

Interested readers can use the following method to convert the new antecedent structure in Equation (3.7) to a conventional antecedent as in Equation (3.2) with newly defined variables. Since the treatment in Section 3.3.2 is not essential, readers might also choose to skip it. The conversion is aided by representing the shape matrix in Equation (3.7) by its spectral decomposition

P = Σ_{i=1}^{nc} λ_i z_i z_i^T   (3.8)

where λ_i is an eigenvalue and z_i is the corresponding eigenvector.
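The effect of the off-diagonal entries of P can be seen numerically. The following sketch is a hypothetical Python illustration (function names and the example matrices are ours): it evaluates the axis-aligned truth of antecedent of Equation (3.6) and the shape-matrix form of Equation (3.7). With a diagonal P the two coincide, while an off-diagonal entry rotates the ellipsoid toward the correlated direction:

```python
import math

def ta_axis_aligned(c, o, sigma):
    # Eq. (3.6): product of per-variable Gaussians (axis-aligned ellipsoid)
    return math.exp(-sum(((ci - oi) / si) ** 2
                         for ci, oi, si in zip(c, o, sigma)))

def ta_shape_matrix(c, o, P):
    # Eq. (3.7): TA = exp(-(c - o)^T P (c - o))
    d = [ci - oi for ci, oi in zip(c, o)]
    quad = sum(d[i] * P[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return math.exp(-quad)

# A diagonal P with entries 1/sigma_i^2 reproduces Eq. (3.6); the
# off-diagonal term in P_rot raises the truth along the direction c1 = c2.
P_diag = [[1.0, 0.0], [0.0, 1.0]]
P_rot = [[1.0, -0.9], [-0.9, 1.0]]   # positive definite (eigenvalues 0.1, 1.9)
```

At c = (1, 1) with o = (0, 0), the rotated ellipsoid gives exp(-0.2) ≈ 0.82 versus exp(-2) ≈ 0.14 for the axis-aligned one, so a single rotated rule can cover a correlated region that would otherwise require several axis-aligned rules.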
Substituting P by its spectral decomposition, Equation (3.7) becomes

TA = exp(-(c - o)^T (Σ_{i=1}^{nc} λ_i z_i z_i^T) (c - o))
   = Π_{i=1}^{nc} exp(-λ_i ((c - o)^T z_i)²)
   = Π_{i=1}^{nc} exp(-λ_i (Σ_{j=1}^{nc} (c_j - o_j) z_{i,j})²)   (3.9)

Then TA can be converted to the conventional form

TA = Π_{i=1}^{nc} exp(-((v_i - q_i)/σ_i)²)   (3.10)

where the newly introduced antecedent variables v_i, centroids q_i and widths σ_i are defined by

v_i = Σ_{j=1}^{nc} c_j z_{i,j},   q_i = Σ_{j=1}^{nc} o_j z_{i,j},   σ_i = λ_i^{-1/2}   (3.11)

The rule antecedent can then be represented in the conventional format using the AND conjunction as

v_1 is A_1(q_1, σ_1) AND … AND v_{nc} is A_{nc}(q_{nc}, σ_{nc})   (3.12)

where A_1(q_1, σ_1) denotes a Gaussian membership function with centroid q_1 and width σ_1. The above interpretation might be useful for converting an existing GTSK model with the generalized antecedent structure to a conventional TSK fuzzy model in order to regain the interpretability offered by antecedents using the AND conjunction. It also shows that the antecedent structure generalization extends a conventional TSK fuzzy model architecturally by introducing an extra layer that linearly combines raw variables to form antecedent variables. However, the above interpretation might not be helpful in estimating model parameters in general. For instance, nc(nc+1)/2 values are required to specify the shape matrix, whereas there are nc(nc+1) parameters expressed in Equation (3.11): z_{i,j} (i=1,…,nc; j=1,…,nc) and λ_i (i=1,…,nc). One would need to add constraints to eliminate the extra nc(nc+1)/2 degrees of freedom; for instance, the eigenvectors are orthogonal to each other and have unit length.

3.4 SISO Models

In a GTSK model, the model parameters include both antecedent and consequent parameters. Antecedent parameters specify the active region of each rule, while consequent parameters describe the local models.
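The equivalence between Equations (3.7) and (3.10)-(3.11) can be verified numerically. The following sketch is a hypothetical Python illustration using NumPy (the particular matrix, centroid and point are ours): it eigendecomposes a small shape matrix and checks that the product of one-dimensional Gaussians in the rotated variables reproduces the shape-matrix truth of antecedent:

```python
import numpy as np

P = np.array([[2.0, 0.8],
              [0.8, 1.0]])         # symmetric positive definite shape matrix
o = np.array([0.5, -0.2])          # centroid
c = np.array([1.3, 0.4])           # a point in the antecedent space

# Eq. (3.8): P = sum_i lambda_i z_i z_i^T (columns of Z are eigenvectors z_i)
lam, Z = np.linalg.eigh(P)

ta_direct = np.exp(-(c - o) @ P @ (c - o))             # Eq. (3.7)

v = Z.T @ c                        # new antecedent variables, Eq. (3.11)
q = Z.T @ o                        # rotated centroid
sigma = lam ** -0.5                # widths sigma_i = lambda_i^(-1/2)
ta_product = np.prod(np.exp(-((v - q) / sigma) ** 2))  # Eq. (3.10)
```

Because the eigenvectors of a symmetric P are orthonormal, the quadratic form splits into a sum over the rotated coordinates, and the two evaluations coincide to machine precision.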
For simplicity of presentation, a vector x(t) is defined below to collect the input arguments in Equation (3.1)

x(t) = [1 y(t-1) … y(t-ny) u(t-d) … u(t-d-nu)]^T   (3.13)

where the dimension of x(t) is nx+1 with nx = ny + nu + 1. If a GTSK model is used to approximate the nonlinear function f in Equation (3.1), the fuzzy model is defined below using the proposed antecedent structure

IF c(t) is in R^1(o^1, P^1) THEN ŷ^1(t) = θ^1 x(t)
   ⋮
IF c(t) is in R^M(o^M, P^M) THEN ŷ^M(t) = θ^M x(t)   (3.14)

where superscript 1 indicates the first rule in the GTSK model. The antecedent representation using the AND conjunction in Equation (3.2) is replaced by the statement c(t) is in R^1(o^1, P^1). The expression R^1(o^1, P^1) can be interpreted as representing an ellipsoidal active region for the first rule. The number of rules, M, is assumed known. The vector c(t), containing nc antecedent variables, is defined below; for nonlinear dynamic processes, its components are obtained as the nonlinear components in Chapter 4.

c(t) = [c_1(t) … c_{nc}(t)]^T   (3.15)

3.4.1 Model Parameters

Figure 3.6 illustrates the model parameters to be estimated for a GTSK model with a two-dimensional antecedent structure.

Figure 3.6. Model parameters for a 4-rule GTSK model

R^i represents the active region for the ith rule. Its location and shape are specified by the antecedent parameters: a centroid vector o^i ∈ R^{nc} and a positive definite shape matrix P^i ∈ R^{nc×nc}.
They are respectively defined by

o^i = [o_1^i … o_{nc}^i]^T   (3.16)

P^i = [ p_1^i  p_2^i  p_4^i  …
        p_2^i  p_3^i  p_5^i  …
        p_4^i  p_5^i  p_6^i  …
        ⋮      ⋮      ⋮      ⋱ ]   (3.17)

where the symmetric matrix P^i is specified by a vector p^i ∈ R^{nc(nc+1)/2}

p^i = [p_1^i p_2^i p_3^i … p_{nc(nc+1)/2}^i]   (3.18)

The P^i matrix can be expressed as a weighted sum of symmetric basis matrices in order to facilitate the computation later on

P^i = Σ_{j=1}^{nc(nc+1)/2} p_j^i B_j   (3.19)

where the symmetric basis matrices B_j are defined in the following manner: B_1 has a single 1 in entry (1,1), B_2 has 1s in entries (1,2) and (2,1), B_3 has a single 1 in entry (2,2), and so on, up to B_{nc(nc+1)/2}, which has a single 1 in entry (nc,nc); all other entries are zero.   (3.20)

The local model parameters (consequent parameters) are included in a vector θ^i ∈ R^{nx+1} defined by

θ^i = [θ_0^i θ_1^i … θ_{nx}^i]   (3.21)

3.4.2 Model Computation

The computation of the model in Equation (3.14) is defined by

ŷ(t) = Σ_{i=1}^{M} w^i(t) ŷ^i(t)   (3.22)

where ŷ^i(t) is the output from the local model in rule i, weighted by w^i(t). The weights w^i(t) are defined as the normalized truth of antecedent (TA)

w^i(t) = TA^i(t) / Σ_{i=1}^{M} TA^i(t)   (3.23)

with TA evaluated by Equation (3.7).

3.5 Extension to MIMO Models

Dealing with MIMO models becomes simple in this work. As shown below, a MIMO model is a collection of several MISO models. Interested readers might follow Section 3.5 to see how a MIMO model is equivalent to multiple MISO models, or come back to revisit the subject when dealing with a MIMO case. For a MIMO process, a general description of its one-step predictor is defined by

ŷ(t) = f(y(t-1), …, y(t-ny), u(t-d), …, u(t-d-nu))   (3.24)

where the MIMO model has n outputs and m inputs.
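The model computation of Equations (3.22)-(3.23) can be sketched directly. The code below is a hypothetical Python illustration (not the identification procedure of this work; all names are ours): each rule carries a centroid o, a shape matrix P and a consequent vector theta, and the prediction is the normalized-truth-weighted sum of the local linear model outputs:

```python
import math

def gtsk_predict(x, c, rules):
    # rules: list of (o, P, theta); x(t) includes a leading 1 (Eq. 3.13),
    # c(t) holds the antecedent variables (Eq. 3.15).
    truths, local_outputs = [], []
    for o, P, theta in rules:
        d = [ci - oi for ci, oi in zip(c, o)]
        quad = sum(d[i] * P[i][j] * d[j]
                   for i in range(len(d)) for j in range(len(d)))
        truths.append(math.exp(-quad))                    # TA, Eq. (3.7)
        local_outputs.append(sum(t * xi for t, xi in zip(theta, x)))
    total = sum(truths)                                   # Eq. (3.23)
    return sum(ta / total * yl for ta, yl in zip(truths, local_outputs))

# Two rules with a one-dimensional antecedent c(t) = [y(t-1)] and
# consequents over x(t) = [1, y(t-1), y(t-2), u(t-1)]
rules = [((-1.0,), [[2.0]], [0.1, 0.9, -0.2, 0.5]),
         (( 1.0,), [[2.0]], [0.3, 0.6,  0.1, 1.2])]
y_hat = gtsk_predict([1.0, 0.8, -0.3, 0.4], (0.8,), rules)
```

Because the weights are normalized, the prediction is a convex combination of the local outputs; here c(t) = 0.8 lies near the second rule's centroid, so the prediction is dominated by that rule's local model.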
The output and input vectors y(t) and u(t) are defined by

y(t) = [y_1(t) … y_n(t)]^T,   u(t) = [u_1(t) … u_m(t)]^T   (3.25)

The above model structure implicitly assumes that the dynamic orders in all y_i (i=1,…,n) and u_j (j=1,…,m) for each output y_k(t) are ny and nu respectively. A universal time delay d is also assumed between each pair of u_j and y_k. The universal-order assumption is in general not true in practice. However, a MIMO discrete-time model can be modified to have such a universal-order structure by adding zeros if necessary. A regressor vector x(t) is defined to collect all input arguments in Equation (3.24)

x(t) = [1 y^T(t-1) … y^T(t-ny) u^T(t-d) … u^T(t-d-nu)]^T   (3.26)

where the dimension of x(t) is nx+1 with nx = n·ny + m·(nu+1). The model is then defined as below

IF c(t) is in R^1(o^1, P^1) THEN ŷ^1(t) = θ^1 x(t)
   ⋮
IF c(t) is in R^M(o^M, P^M) THEN ŷ^M(t) = θ^M x(t)   (3.27)

The model in Equation (3.27) is almost identical to that in Equation (3.14). o^i and P^i have the same meaning. The antecedent variables are included in the vector c(t), which is also a subset of x(t). The only difference is that the local models in Equation (3.27) are multiple-output. The vector ŷ^i collects the n output predictions of the local model in the ith rule

ŷ^i = [ŷ_1^i … ŷ_n^i]^T   (3.28)

Consequently, the local model parameters are organized in a matrix θ^i ∈ R^{n×(nx+1)} defined by

θ^i = [ θ_{1,0}^i  θ_{1,1}^i  …  θ_{1,nx}^i
        ⋮          ⋮          ⋱  ⋮
        θ_{n,0}^i  θ_{n,1}^i  …  θ_{n,nx}^i ]   (3.29)

Each row of θ^i corresponds to an output, and every column of θ^i is related to a regressor. It is possible to decompose θ^i in terms of columns or rows as below

θ^i = [θ_0^i … θ_{nx}^i] = [1θ^i; … ; nθ^i]   (3.30)

where θ_j^i (j=0,…,nx) represents the jth column of the matrix θ^i and kθ^i (k=1,…,n) represents its kth row. The computation in Equation (3.22) is then extended as below to deal with a multiple-output model.
[ŷ_1(t); \vdots; ŷ_n(t)] = \sum_{i=1}^{M} w_i(t) [θ^i_1 x(t); \vdots; θ^i_n x(t)]    (3.31)

Equation (3.31) can be viewed as a collection of n single-output models. For instance, the computation for the kth output is

ŷ_k(t) = \sum_{i=1}^{M} w_i(t) θ^i_k x(t)    (3.32)

where θ^i_k, defined in Equation (3.30), is the kth row of the matrix θ^i. It is then possible to define a single-output GTSK model for the kth output only by

IF c(t) is in R_1(o^1, P^1) THEN ŷ_k^1(t) = θ^1_k x(t)
⋮
IF c(t) is in R_M(o^M, P^M) THEN ŷ_k^M(t) = θ^M_k x(t)    (3.33)

Comparing the single-output model for output y_k with that in Equation (3.14), equivalence is established by equating θ^i_k in Equation (3.33) to θ^i in Equation (3.14). The two models nevertheless differ: the model in Equation (3.14) is SISO, while that in Equation (3.33) is MISO. The x(t) in Equation (3.33) actually collects the lagged multiple inputs and lagged multiple outputs. Fortunately, the difference in the contents of x(t) has no impact on the evaluation of the first- and second-order derivatives presented later. The computation of gradients and Hessian matrices for a SISO GTSK model extends directly to each MISO element of a MIMO GTSK model. A matrix θ_k is defined to collect all local model parameters for the kth output

θ_k = [θ^1_k; \vdots; θ^M_k]    (3.34)

The above decomposition facilitates the estimation of model parameters, in terms of the evaluation of derivatives, if a decomposable performance index is used. Simply put, the centroids and shape matrices have a global influence on a GTSK model; their influence on all outputs must be accumulated. On the contrary, the consequent parameters θ_k have only a local influence on their corresponding output y_k. It can then be expected that the interaction between θ_k and θ_l (k ≠ l) is zero.
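The MIMO computation of Equations (3.23) and (3.31) can be sketched as below with NumPy. The truth-of-antecedent evaluation of Equation (3.7) is not reproduced in this chunk, so the Gaussian-like kernel used here, exp(−(c−o)^T P (c−o)), is an assumption standing in for it; the normalization and the weighted sum of local linear models follow the equations above.

```python
import numpy as np

def gtsk_predict(x, c, centroids, P_list, thetas):
    """One-step prediction of a MIMO GTSK model (Eqs. 3.23 and 3.31).

    x        : regressor vector, shape (nx+1,), first entry 1 for the bias
    c        : antecedent vector, shape (nc,)
    centroids: list of M centroid vectors o^i
    P_list   : list of M symmetric shape matrices P^i
    thetas   : list of M consequent matrices theta^i, shape (n, nx+1)
    """
    # assumed antecedent truth: a Gaussian-like kernel per rule
    ta = np.array([np.exp(-(c - o) @ P @ (c - o))
                   for o, P in zip(centroids, P_list)])
    w = ta / ta.sum()                              # normalized weights, Eq. 3.23
    # weighted sum of local (linear) model outputs, Eq. 3.31
    return sum(wi * (th @ x) for wi, th in zip(w, thetas))
```

When all rules share the same consequent matrix, the weighted sum collapses to that single local model regardless of the weights, which is a convenient sanity check.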
The representation of a MIMO GTSK model by several single-output GTSK models will be exploited in Chapter 5 to derive the first- and second-order derivatives of an objective function with respect to the model parameters.

CHAPTER IV

DYNAMIC ORDER DETERMINATION AND NONLINEAR COMPONENT DETECTION

Determination of the dynamic orders (ny, nu and d in Equation (3.1)) is the first step in system identification. Order determination is in general difficult in nonlinear system identification because of the interaction of the model structure (unknown orders) with the unknown nonlinearity. If the unknown nonlinearity can be attenuated, different model structures can be compared fairly. Guided by this concept, the work in this chapter uses recursive estimation to reduce the effect of the underlying nonlinearity on parameter variation, and proposes a sequential nearest neighbor rearrangement to enhance the reduction. The "best" dynamic order minimizes a final prediction error that accounts for the locality of the model parameters. In addition to determining dynamic orders, the sequential nearest neighbor rearrangement is also extended to detect nonlinear components, which are the regressors responsible for parameter variation when a nonlinear dynamic model is converted to a linear time-varying dynamic model. The results of Chapter 4 can be viewed as the preliminary analysis for building the GTSK model presented in Chapter 5. The dynamic order determination defines the overall dimension of the model; the nonlinear component detection selects the antecedent variables for the model.

4.1 Dynamic Order Determination

The dynamic orders ny and nu and the delay d are described in Equation (3.1). The difficulty in discovering the dynamic orders of a nonlinear dynamic model is caused by the unknown nonlinearities. Even if f is known to be nonlinear, the richness of nonlinearity keeps users from exhausting all possible nonlinear forms, making it difficult to find ny, nu and d.
If the unknown nonlinearity is not a problem, or at least not as severe as it would otherwise be, it is possible to devise a procedure for dynamic order analysis for a NARX model. The objective of the following methodology is to disentangle the nonlinearity from the dynamic orders, which makes it possible to determine the model orders. The methodology involves two stages. The first is to attenuate the unknown nonlinearity. The second is to search for the dynamic orders.

4.1.1 Nonlinearity Representation

Nonlinearity can be expressed explicitly or implicitly. It is possible to transform a nonlinear dynamic model into a linear one if the nonlinear function is known. For instance, the nonlinear dynamic model

y(t) = 0.4 y(t−1)^3 + u(t−1)^3 + e(t)    (4.1)

could be redefined as a linear dynamic model by the static transformations z(t) = y(t−1)^3 and v(t) = u(t−1)^3.

Unknown nonlinearity can be addressed by using structure-rich models such as neural network models, basis function models and fuzzy systems. These models are all universal approximators and are able to capture almost any nonlinearity given enough flexibility. If a neural network model is used, one could employ the following procedure to find proper dynamic orders. A neural network is tried for different sets of ny, nu and d, and the best set is reported. Because a neural network is applied, the nonlinearity is presumably addressed; the only factors affecting the modeling performance are ny, nu and d. It is then possible to find the set with the best performance. This approach is very general and could be applied to any scenario and any nonlinear dynamic model, with any universal approximator. The drawback is the computational burden of training 'big' models and the effort of selecting an appropriate architecture (the number of layers and of nodes in each layer for neural networks; the number of fuzzy subsets and of rules for a fuzzy system).
If simple models such as linear models are preferred, nonlinearity can be addressed by adaptation: model parameters are recursively updated to track the parameter variation caused by the nonlinearity. Linear models with parameter adaptation require much less computation than 'big' models. The following example shows how to convert a nonlinear dynamic process to a linear format. The example uses a NARX model defined by

y(t) = y(t−1)/(1 + y(t−1)^2) + u(t−1)^3 + e(t)    (4.2)

which can be represented in a linear format

y(t) = a_1(t) y(t−1) + b_0(t) u(t−1) + e(t)    (4.3)

where a_1(t) and b_0(t) are time-varying model parameters, defined in Equation (4.4) as functions of y(t−1) and u(t−1) to establish a one-to-one correspondence between Equation (4.3) and Equation (4.2)

a_1(t) = 1/(1 + y(t−1)^2),   b_0(t) = u(t−1)^2    (4.4)

In general, the nonlinear dynamic model in Equation (3.1) can be expressed in the following linear format

y(t) = a_1(t) y(t−1) + \cdots + a_{ny}(t) y(t−ny) + b_0(t) u(t−d) + \cdots + b_{nu}(t) u(t−nu−d) + e(t)    (4.5)

The linear format can be established from a known nonlinear dynamic model by one-to-one correspondence, as shown in Equation (4.4). However, the linear format is not always unique, and one may have options.
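The one-to-one correspondence of Equations (4.2)-(4.4) can be verified numerically as below. This is a minimal sketch assuming NumPy: the noise is dropped for clarity, and the "skyline" excitation used elsewhere in the chapter is replaced by a uniform random input.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate the NARX model of Eq. (4.2), noise-free for clarity
N = 200
u = rng.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = y[t-1] / (1.0 + y[t-1]**2) + u[t-1]**3

# time-varying parameters of the linear format, Eq. (4.4)
a1 = 1.0 / (1.0 + y[:-1]**2)   # a1(t) as a function of y(t-1)
b0 = u[:-1]**2                 # b0(t) as a function of u(t-1)

# Eq. (4.3) reproduces the nonlinear model exactly
y_lin = a1 * y[:-1] + b0 * u[:-1]
```

Here `y_lin` matches `y[1:]` to machine precision, confirming that the time-varying linear format is an exact rewriting of the nonlinear model, not an approximation.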
For instance, given the NARX model defined in Equation (4.6)

y(t) = y(t−1) y(t−2) (y(t−1) + 2.5) / (1 + y(t−1)^2 + y(t−2)^2) + u(t−1) + e(t)    (4.6)

it is possible to define a linear format

y(t) = a_1(t) y(t−1) + b_0(t) u(t−1) + e(t)    (4.7)

with

a_1(t) = y(t−2)(y(t−1) + 2.5) / (1 + y(t−1)^2 + y(t−2)^2),   b_0(t) = 1

Another possibility is defined in Equation (4.8) with a different set of time-varying model parameters

y(t) = a_1(t) y(t−1) + a_2(t) y(t−2) + b_0(t) u(t−1) + e(t)    (4.8)

with

a_1(t) = y(t−1) y(t−2) / (1 + y(t−1)^2 + y(t−2)^2),   a_2(t) = 2.5 y(t−1) / (1 + y(t−1)^2 + y(t−2)^2),   b_0(t) = 1

In general, it is rather difficult (maybe impossible) to extract the exact parameter functions, as defined in Equation (4.4), from data only. There are a few exceptions, such as the one mentioned in (Young, 1993), where a_1(t) and b_0(t) are known to be linear functions of y(t−1) and u(t−1).

The nonlinear dynamic model in Equation (3.1) can also be approximately expressed by the following time-varying model

y(t) ≈ k(t) + a_1(t) y(t−1) + \cdots + a_{ny}(t) y(t−ny) + b_0(t) u(t−d) + \cdots + b_{nu}(t) u(t−nu−d) + e(t)    (4.9)

The approximation results from the first-order Taylor expansion of Equation (3.1), with the following definitions

k(t) = f|_{t_0} − \sum_{i=1}^{ny} ∂f/∂y(t−i)|_{t_0} y_0(t−i) − \sum_{j=0}^{nu} ∂f/∂u(t−j−d)|_{t_0} u_0(t−j−d)

a_i(t) = ∂f/∂y(t−i)|_{t_0},  i = 1, \ldots, ny    (4.10)

b_j(t) = ∂f/∂u(t−j−d)|_{t_0},  j = 0, \ldots, nu

where t_0 represents the reference point at which the Taylor expansion is based. The representations of Equation (3.1) by Equation (4.5) and by Equation (4.9) are different, although both share the same notation for the time-varying model parameters a(t) and b(t).
Equation (4.5) results from the one-to-one correspondence to Equation (3.1), while Equation (4.9) is based on a one-to-one correspondence to the first-order Taylor expansion of Equation (3.1). The only difference is the additional time-varying intercept term k(t) in Equation (4.9); the order determination procedure presented below is applicable to both structures.

4.1.2 Recursive Estimation for Time-Varying Parameters

Equation (4.5) or (4.9) can be represented in a more compact format

y(t) = x^T(t) θ(t) + e(t)    (4.11)

with

θ(t) = [k(t)  a_1(t) \cdots a_{ny}(t)  b_0(t) \cdots b_{nu}(t)]^T
x(t) = [1  y(t−1) \cdots y(t−ny)  u(t−d) \cdots u(t−nu−d)]^T

where the constant regressor is dropped if the format in Equation (4.5) is used. The output prediction is then defined using the estimates of the time-varying parameters

ŷ(t) = x^T(t) θ̂(t)    (4.12)

There are several ways to estimate θ(t). Recursive estimation attempts to estimate the local model parameters instantaneously. Another approach uses stochastic models to describe the parameter variation, if the statistics of the parameter variation are assumed known; the simplest such model is a random walk, and a Kalman filter is then used to estimate the time-varying parameter values as the states of the stochastic model. The second approach is not investigated in this work, since we assume no knowledge of the statistics of the parameter variation. Recursive estimation of the parameter values θ(t) is based on a time-varying weighted quadratic performance index

J(t) = \sum_{τ=1}^{t} (y(τ) − ŷ(τ))^2 w(τ, t)    (4.13)

where w(τ, t) is a weighting function. Commonly used weighting functions include rectangular window weighting and exponential weighting (Ljung & Soderstrom, 1986). In this work, the exponential weighting is used, described by

w(τ, t) = α^{t−τ},  τ = 0, 1, \ldots, N    (4.14)

where the variable α, a scalar between 0 and 1, is termed the forgetting factor.
Figure 4.1 illustrates a particular exponential weighting with α = 0.95.

Figure 4.1. Exponential weighting with α = 0.95

Using exponential weighting, the following equations (Young, 1984) are used to update the model parameters from θ̂(t−1) to θ̂(t)

ŷ(t) = x^T(t) θ̂(t−1)
H(t) = P(t−1) x(t) (α + x^T(t) P(t−1) x(t))^{−1}
θ̂(t) = θ̂(t−1) + H(t) (y(t) − ŷ(t))    (4.15)
P(t) = (P(t−1) − H(t) x^T(t) P(t−1)) / α

The forgetting factor determines the influence of past data on the current estimation. The suggested range for α is between 0.9 and 0.99 (Young, 1984). In practice, trials for α might be needed to balance nonlinearity adaptation speed against parameter estimation precision.

4.1.3 Sequential Nearest Neighbor Rearrangement

In recursive estimation with exponential weighting, the tuning variable is the forgetting factor. When adjusting the forgetting factor, one should be aware of its conflicting effects on the parameter estimates. The forgetting factor relates to the rate of variation: a smaller forgetting factor is expected for faster parameter variation. On the other hand, the precision of the parameter estimates is determined by the amount of data included in an "effective" window, whose length is also a function of the forgetting factor. The smaller the forgetting factor (the shorter the window), the fewer data are included in the estimation, and in turn the higher the variance of the estimates. Therefore, a larger forgetting factor is preferred for higher estimation precision. However, a larger forgetting factor is a good choice only for slow parameter variation. The above argument supports the suggested range for the forgetting factor above 0.90, where the precaution is also given that exponential recursive estimation is suited to slow variation at best (Young, 1984).
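The update of Equation (4.15) can be sketched as below with NumPy. The zero initial parameter vector and the large initial covariance p0·I are conventional assumptions, not prescribed by the text.

```python
import numpy as np

def rls_forgetting(X, y, alpha=0.95, p0=1e3):
    """Exponentially weighted recursive least squares, Eq. (4.15).

    X : (N, np_) regressor matrix, row t is x(t)^T
    y : (N,) observed outputs
    Returns the trajectory of parameter estimates and one-step errors.
    """
    N, np_ = X.shape
    theta = np.zeros(np_)
    P = p0 * np.eye(np_)               # large initial covariance (assumed)
    thetas = np.zeros((N, np_))
    errs = np.zeros(N)
    for t in range(N):
        x = X[t]
        errs[t] = y[t] - x @ theta              # prediction with theta(t-1)
        H = P @ x / (alpha + x @ P @ x)         # gain vector
        theta = theta + H * errs[t]             # parameter update
        P = (P - np.outer(H, x) @ P) / alpha    # covariance update
        thetas[t] = theta
    return thetas, errs
```

On noise-free data with constant parameters the estimates converge to the true values; with a forgetting factor below 1, the estimator can also track slowly varying parameters, which is the use made of it in this chapter.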
As a result of the conflicting influence of the forgetting factor on parameter estimates, the nonlinear dynamic processes being dealt with are expected to have slow parameter variation. Unfortunately, the nonlinearity is inherent in the data and determined by the nature of the process under investigation. There is nothing one can do to alter the nature of the process, given only the ability to test it and generate input-output data. However, the nonlinearity itself is not the difficulty we are aiming at; it is the source of the difficulty, the parameter variation. The nonlinearity is believed to be the cause of the parameter variation. It is desirable to get around the inherent and inaccessible nonlinear nature of a process and to change the parameter variation directly. If this is possible, improvement of the recursive estimation becomes probable. Proposed below is an approach that manipulates the raw data in time sequence to create an artificial sequence of data with slowed parameter variation. The elaboration starts by defining the parameter variation explicitly

Δa_i(t) = a_i(t) − a_i(t−1),  i = 1, \ldots, ny
Δb_j(t) = b_j(t) − b_j(t−1),  j = 0, \ldots, nu    (4.16)

With this definition, the following vector collects the variations of all parameters

Δθ(t) = θ(t) − θ(t−1) = [Δa_1(t) \cdots Δa_{ny}(t)  Δb_0(t) \cdots Δb_{nu}(t)]^T    (4.17)

The parameter variation at t can then be quantified by ‖Δθ(t)‖, where the norm is not specified. The parameter variation (pv) for the entire data set is

pv = \sum_{t=2}^{N} ‖Δθ(t)‖    (4.18)

If it is possible to minimize pv, the resulting data set is expected to be more suitable for recursive estimation. The optimal solution is a permutation of the sequence of numbers (1, ..., N). Finding the right permutation is like solving a travelling salesman problem: find the shortest path travelling through all cities, visiting each city only once. The optimization problem is NP-complete.
In this work, a suboptimal solution is pursued rather than the exact optimal solution. The suboptimal solution is the result of a greedy procedure (Cormen, Leiserson, Rivest & Stein, 2001), where the minimization of pv is decomposed into N−1 simpler minimization problems

min(‖Δθ(2)‖ + ‖Δθ(3)‖ + \cdots + ‖Δθ(N)‖) ≤ min ‖Δθ(2)‖ + min ‖Δθ(3)‖ + \cdots + min ‖Δθ(N)‖    (4.19)

where the N−1 minimization problems are slightly dependent on each other, with dependence between every two consecutive tasks. The greedy procedure is conducted as below. Assuming θ(1) is known, θ(2) is searched for through the problem min ‖θ(1) − θ(2)‖, which in turn determines θ(2). Subsequently, ‖θ(2) − θ(3)‖ is minimized and θ(3) is determined. The procedure stops when θ(N) is determined. Two fundamental steps are involved in this procedure: the determination of θ(1), and solving the problem min ‖θ(k−1) − θ(k)‖ to determine θ(k).

With θ(k−1) known, the problem min ‖θ(k−1) − θ(k)‖ is fully expanded as

min ‖θ(k−1) − θ(k)‖ = min ‖[Δa_1(k), \ldots, Δa_{ny}(k), Δb_0(k), \ldots, Δb_{nu}(k)]^T‖ ≤ \sum_{i=1}^{ny} min |Δa_i(k)| + \sum_{j=0}^{nu} min |Δb_j(k)|    (4.20)

The bound is due to the triangle inequality. The minimization of ‖θ(k−1) − θ(k)‖ is thus translated into minimizing ny+nu+1 smaller objectives simultaneously. Given a time-varying model, the parameters a_i(k) and b_j(k) can be expressed as functions of all regressors

a_i(k) = a_i(y(t−1), \ldots, y(t−ny), u(t−d), \ldots, u(t−nu−d))    (4.21)

The expression for b_j(k) is similar. Note that the indices differ between the two sides of Equation (4.21), which simply means that the kth sample in the optimal result is the tth sample in time order. If a_i(k−1) is known and its corresponding sample in time order is τ, then

a_i(k−1) = a_i(y(τ−1), \ldots, y(τ−ny), u(τ−d), \ldots, u(τ−nu−d))

The exact functional form of a_i is unknown.
If its continuity and differentiability are assumed, and its higher-order derivatives are assumed negligible, the difference between a_i(k−1) and a_i(k) can be approximated by

a_i(k−1) − a_i(k) ≈ \sum_{i'=1}^{ny} ∂a_i/∂y(t−i')|_{t_0} (y(τ−i') − y(t−i')) + \sum_{j=0}^{nu} ∂a_i/∂u(t−j−d)|_{t_0} (u(τ−j−d) − u(t−j−d))    (4.22)

If the first-order derivatives are bounded by a constant Ga_i, the minimization of |a_i(k−1) − a_i(k)| can be approached by

min_{t≠τ} |a_i(k−1) − a_i(k)| ≤ Ga_i min_{t≠τ} ‖x(τ) − x(t)‖    (4.23)

where x is defined in Equation (4.11) and t becomes the decision variable. Since the same functional form is uniformly assumed for all parameter functions, the solution of Problem (4.23) simultaneously minimizes all the upper bounds. In this work, the Euclidean norm is used, described by

‖x(τ) − x(t)‖^2 = \sum_{i=1}^{ny} (y(τ−i) − y(t−i))^2 + \sum_{i=0}^{nu} (u(τ−d−i) − u(t−d−i))^2    (4.24)

A nearest neighbor thus defines the solution of Problem (4.20). The solving procedure is therefore termed Sequential Nearest Neighbor Rearrangement (SNNR). The resulting regressors and outputs are labeled x_snnr and y_snnr. The rearrangement starts by letting x_snnr(1) = x(1) and y_snnr(1) = y(1). If the nearest neighbor of x_snnr(1) is found to be x(t), then x(t) and y(t) are added to the rearranged data set by letting x_snnr(2) = x(t) and y_snnr(2) = y(t). The nearest neighbor of x_snnr(2) is then found and added to the rearranged data set. The procedure continues until x_snnr(N) is found. By conducting the SNNR, the raw data in time sequence are reorganized in spatial order. The treatment is expected to reduce the parameter variation, which enables the choice of a larger forgetting factor α, which in turn improves the parameter estimates. The results of the SNNR procedure are the basis for the analysis in the following section for dynamic order determination.
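The SNNR procedure just described can be sketched as below. This is a straightforward O(N^2) implementation assuming NumPy; more efficient neighbor searches (e.g. spatial trees) would be an implementation choice beyond the text.

```python
import numpy as np

def snnr(X, y):
    """Sequential Nearest Neighbor Rearrangement.

    Starting from the first sample, repeatedly append the unused sample
    whose regressor vector is closest (Euclidean norm, Eq. 4.24) to the
    one just added.  Returns the rearranged data and the visiting order.
    """
    N = X.shape[0]
    unused = set(range(1, N))
    order = [0]                          # x_snnr(1) = x(1)
    while unused:
        last = X[order[-1]]
        # nearest unused neighbor of the most recently added sample
        nxt = min(unused, key=lambda t: np.sum((X[t] - last) ** 2))
        order.append(nxt)
        unused.remove(nxt)
    order = np.array(order)
    return X[order], y[order], order
```

The returned `order` is the permutation relating the spatial index k back to the time index t, the role played by the t row of Table 4.2.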
First, however, is a demonstration of the impact of the SNNR procedure on parameter variation as well as on recursive estimation. The demonstration is based on the deterministic nonlinear dynamic model in Equation (4.25)

y(t) = y(t−1)/(1 + y(t−1)^2) + u(t−1)^3    (4.25)

Figure 4.2 shows the first 1000 of the 5000 samples generated from the deterministic model when u(t) is driven by a "skyline" function.

Figure 4.2. Data generated from the model in Equation (4.25)

The time-varying model parameters a_1(t) and b_0(t) are defined in Equation (4.4) and their variation over time is shown in Figure 4.3.

Figure 4.3. Time-varying parameters a_1(t) and b_0(t) in Equation (4.4)

The parameter variation (pv) defined in Equation (4.18) is then evaluated using the Euclidean norm,

pv = \sum_{t=2}^{5000} \sqrt{(a_1(t) − a_1(t−1))^2 + (b_0(t) − b_0(t−1))^2}

The obtained pv is 99.25. The mean squared error (MSE) resulting from a recursive estimation on the time-sequenced data is 0.0044. The SNNR operation is illustrated on a segment of 10 data samples. The raw data in time sequence are shown in Table 4.1, indexed by t.

Table 4.1. A segment of 10 data samples in time sequence

t      | 1 | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      | 10
y(t−1) | 0 | 0.2488 | 0.8683 | 0.7200 | 0.3775 | 1.1465 | 0.2815 | 0.1014 | 0.8542 | 0.1648
u(t−1) | 0 | 0.2076 | 0.7200 | 0.7603 | 0.3617 | 0.8668 | 0.0913 | 0.3199 | 0.7120 | 0.2645

The SNNR rearranged data are shown in Table 4.2, indexed by k. The index t in Table 4.2 tracks the rearrangement and relates the kth data sample in Table 4.2 to its original position in Table 4.1. The two regressors in Table 4.2 are denoted y1 and u1 rather than by the time-lagged notation of the original time-sequence data set.

Table 4.2.
SNNR rearranged data for the time-sequence data in Table 4.1

k  | 1 | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      | 10
t  | 1 | 7      | 8      | 10     | 2      | 5      | 9      | 3      | 4      | 6
y1 | 0 | 0.2815 | 0.1014 | 0.1648 | 0.2488 | 0.3775 | 0.8542 | 0.8683 | 0.9385 | 1.1465
u1 | 0 | 0.0913 | 0.3199 | 0.2645 | 0.2076 | 0.3617 | 0.7120 | 0.7200 | 0.7603 | 0.8668

Figure 4.4 shows the first 1000 samples of SNNR rearranged data for the time-sequenced data in Figure 4.2. It is observed that the abrupt transitions between adjacent levels in Figure 4.2, in both u(t) and y(t), are replaced by smooth transitions in y1 and u1 in Figure 4.4.

Figure 4.4. SNNR rearranged regressors from the time-sequence data in Figure 4.2

For the rearranged data, the varying parameters are redefined in terms of y1 and u1

a_1(k) = 1/(1 + y1(k)^2),   b_0(k) = u1(k)^2

The variation of a_1(k) and b_0(k) is shown in Figure 4.5. It results in a parameter variation of 32.03, only about a third of that of the time-sequence data. The mean squared error (MSE) resulting from a recursive estimation on the rearranged data is 0.0022, half of that of the time-sequence data.

Figure 4.5. Varying parameters for the SNNR rearranged data

The same test and comparison are conducted on 6 deterministic models; their stochastic versions are defined in Equations (4.40)-(4.45). The results are summarized in Table 4.3.

Table 4.3. MSE for a recursive estimation

Model   | Time Sequence | SNNR
Model 1 | 0.0148        | 0.0112
Model 2 | 0.0486        | 0.0084
Model 3 | 0.0044        | 0.0025
Model 4 | 0.0022        | 0.0017
Model 5 | 0.0064        | 0.0034
Model 6 | 7.38e−8       | 3.57e−5

As observed in Table 4.3, SNNR is able to reduce the MSE in Models 1-5. An increase of the MSE is, however, observed in the Model 6 test, where the tested model is linear. Therefore, an increase of the MSE may signal the ineffectiveness of the SNNR treatment and imply that the model is linear.
Using this feature, one might use the SNNR to tell whether a given process is linear or nonlinear.

4.1.4 Model Comparison Criterion

The methodology for the determination of dynamic orders is to try different sets of ny, nu and d and find the best values. Given a set of ny, nu and d, the regressors x(t) are first determined on the original time-sequenced data. An SNNR is then conducted on x(t) and y(t), producing x_snnr(t) and y_snnr(t), to which an exponentially weighted recursive estimation is applied. The quality of the hypothesized ny, nu and d is then evaluated by a criterion considering both fitting and generalization performance. In this work, the evaluation is based on a modified final prediction error (FPE) criterion. The original FPE (Ljung, 1999) is defined for a linear model with N samples by

FPE = (N + np)/(N − np) · (1/N) \sum_{t=1}^{N} ε^2(t, θ̂_N)    (4.26)

Equation (4.26) can be interpreted as a weighted mean squared error, where the weighting is determined by N, the size of the data set, as well as by the model complexity np, the number of parameters. The FPE criterion results from the performance index

V_N = \sum_{t=1}^{N} ε^2(t, θ̂_N)    (4.27)

For application to exponentially weighted recursive estimation, the definition of the FPE is modified according to the exponentially weighted performance index

V_k = \sum_{t=1}^{k} α^{k−t} ε^2(t, θ̂_k)    (4.28)

where V_k varies and progressively includes more data. The weighting factor α^{k−t} becomes very small for long-past data, making the remote errors inconsequential in estimating θ_k. A critical number L is hence introduced to decompose V_k as below

V_k = \sum_{t=1}^{k−L} α^{k−t} ε^2(t, θ̂_k) + \sum_{t=k−L+1}^{k} α^{k−t} ε^2(t, θ̂_k) ≈ \sum_{t=k−L+1}^{k} α^{k−t} ε^2(t, θ̂_k)    (4.29)

where V_k is approximated by its recent portion. By this approximation, the number of data involved in V_k is a constant, L.
Subsequently, the FPE based on V_k is redefined as

FPE(k) = (L + np)/(L − np) · (1/L) \sum_{t=k−L+1}^{k} α^{k−t} ε^2(t, θ̂_k)    (4.30)

where the implicit constraints on t, namely k−L+1 ≥ 1 and k ≤ N, bound k between L and N. The average of FPE(k) over all k is then defined as

FPE = (1/(N−L+1)) \sum_{k=L}^{N} FPE(k) = (L + np)/((N−L+1) L (L − np)) \sum_{k=L}^{N} \sum_{t=k−L+1}^{k} α^{k−t} ε^2(t, θ̂_k)    (4.31)

After switching the order of summation, the double sum is decomposed into three parts

\sum_{k=L}^{N} \sum_{t=k−L+1}^{k} α^{k−t} ε^2(t, θ̂_k) = \sum_{t=1}^{L−1} \sum_{k=L}^{t+L−1} α^{k−t} ε^2(t, θ̂_k) + \sum_{t=L}^{N−L+1} \sum_{k=t}^{t+L−1} α^{k−t} ε^2(t, θ̂_k) + \sum_{t=N−L+2}^{N} \sum_{k=t}^{N} α^{k−t} ε^2(t, θ̂_k)    (4.32)

The recursive estimation works well if the parameter variation within a local range is assumed small

θ̂_t ≈ θ̂_{t+1} ≈ \cdots ≈ θ̂_{t+L−1}    (4.33)

which in turn results in the approximation

ε^2(t, θ̂_t) ≈ ε^2(t, θ̂_{t+1}) ≈ \cdots ≈ ε^2(t, θ̂_{t+L−1})    (4.34)

The middle part of the double sum then simplifies to

\sum_{t=L}^{N−L+1} \sum_{k=t}^{t+L−1} α^{k−t} ε^2(t, θ̂_k) ≈ \left(\sum_{k=0}^{L−1} α^{k}\right) \sum_{t=L}^{N−L+1} ε^2(t, θ̂_t)    (4.35)

If N is large, the second part dominates, which results in the further simplified average FPE

FPE ≈ \left(\sum_{k=0}^{L−1} α^{k}\right) (L + np)/((N−L+1) L (L − np)) \sum_{t=L}^{N−L+1} ε^2(t, θ̂_t)    (4.36)

The average FPE in Equation (4.36) is similar to the original one in Equation (4.26) and has the same interpretation as a weighted prediction error, except that the weighting is different. Once L is chosen, the first factor on the right-hand side of Equation (4.36) is a constant. Equations (4.26) and (4.36) are then similar, with L, the data window length, replacing N, the total number of data.
The simplified FPE in Equation (4.37) is used in this work and will continue to be denoted FPE

FPE = (L + np)/(L − np) \sum_{t=L}^{N−L+1} ε^2(t, θ̂_t)    (4.37)

The value of L relates to the decomposition in Equation (4.29) and is determined by requiring α^L to be small enough to be negligible. In this work, L is determined by

L = 4/(1 − α)    (4.38)

where (1−α)^{−1} is termed the memory time constant (Ljung, 1999). As shown in Figure 4.6, this specification of L ignores past data with weights less than 0.02. Additionally, the number α^{4/(1−α)} remains relatively constant, between 0.016 and 0.018, for α above 0.9, which is a common choice for a forgetting factor.

Figure 4.6. α vs. α^{4/(1−α)} (the weight of the most remote data)

4.1.5 Regressor Selection Procedure

Given several sets of ny, nu and d, their FPEs are evaluated. The set with the minimum FPE on the SNNR data is reported, including the determined orders. The determination procedure can be conducted by an exhaustive approach over all possible combinations of ny, nu and d, given predefined maxima max_ny, max_nu and max_d. The pseudocode for the exhaustive search is shown in Figure 4.7.

Figure 4.7. Exhaustive dynamic order search

Exhaustive order selection
for ny = 1 to max_ny
  for nu = 0 to max_nu
    for d = 1 to max_d
      compute FPE(ny, nu, d)
      keep the minimum FPE
    end loop
  end loop
end loop
return the ny, nu and d with the minimum FPE

One concern with the exhaustive search is the computational burden. The payoff of the expensive exhaustive search is the optimality of the final solution. Suboptimal search techniques are available from subset selection in linear regression. Subset selection methods include forward selection, backward elimination, cycling replacement, as well as heuristic combinatorial search (Miller, 1990).
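The simplified FPE of Equations (4.37)-(4.38) can be sketched as below, assuming NumPy. Rounding 4/(1−α) to the nearest integer is an implementation choice not fixed by the text; the errors passed in are the one-step prediction errors ε(t, θ̂_t) from an exponentially weighted recursive estimation such as Equation (4.15). The exhaustive search of Figure 4.7 would simply call this function once per (ny, nu, d) combination.

```python
import numpy as np

def fpe(errors, alpha, n_params):
    """Simplified final prediction error, Eqs. (4.37)-(4.38).

    errors   : one-step prediction errors eps(t, theta_hat_t), t = 1..N
    alpha    : forgetting factor (0.9-0.99 is the suggested range)
    n_params : number of model parameters np
    Requires L > n_params and N >= 2L - 1.
    """
    N = len(errors)
    L = int(round(4.0 / (1.0 - alpha)))          # memory window, Eq. (4.38)
    sse = np.sum(errors[L - 1 : N - L + 1] ** 2)  # t = L, ..., N-L+1 (1-based)
    return (L + n_params) / (L - n_params) * sse
```

Note how the complexity penalty (L + np)/(L − np) is governed by the window length L rather than by N, reflecting the locality of the recursively estimated parameters.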
For linear regression problems, one can fully exploit the superposition feature of a linear model to simplify a search. This explains why subset selection methods are always accompanied by orthogonalization. An orthogonalization procedure removes the redundant components of two regressors and eliminates candidate regressors that are highly correlated with the selected regressors. In nonlinear systems with unknown nonlinearity, orthogonalization is not possible. However, this does not mean that subset selection is inapplicable. In this work, a forward selection procedure, combined with the above-mentioned recursive estimation on spatially ordered data, is proposed to find the important regressors. The procedure starts with the user's input of max_ny, max_nu and max_d. Then, a number of candidate regressors are generated and denoted [x1 x2 x3 ... xm xrandom]. xrandom is a random regressor that presumably contains no meaningful information for predicting the output. At first, m+1 FPEs are computed for (y, [x1]), (y, [x2]), ..., (y, [xm]), (y, [xrandom]), where y is the output and the xi in brackets is the regressor under consideration. The regressor with the minimum FPE is selected. If x2, for instance, is the first selected regressor, another m FPEs are evaluated, for (y, [x2, x1]), (y, [x2, x3]), ..., (y, [x2, xm]), (y, [x2, xrandom]); each bracket combines x2 (first selected) with one of the rest. The regressor combination with the minimum FPE is then kept. The selection continues until the minimum FPE increases or xrandom is selected. The injection of a random regressor is mentioned in (Miller, 1990) as a stopping criterion: the selection of xrandom signifies that the remaining candidates are less influential on y(t) than a presumably irrelevant one. The selected regressors define values of ny, nu and d if the selected regressors are consecutive, owing to the implicit constraint in the model structure of Equation (3.1), which requires consecutive regressors.
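The forward selection with a random stopping regressor can be sketched as below. The scoring callable is a placeholder for the FPE-on-SNNR-data evaluation described above; the names `forward_select`, `score` and `x_random` are illustrative, not from the text.

```python
import numpy as np

def forward_select(candidates, score, rng=None):
    """Forward regressor selection with an injected random regressor.

    candidates : dict name -> regressor data (1-D arrays of equal length)
    score      : callable (data_dict, list_of_names) -> criterion value
                 (smaller is better; stands in for the FPE evaluation)

    Selection stops when the best achievable score stops decreasing or
    when the random regressor would be chosen (Miller, 1990).
    """
    rng = rng or np.random.default_rng(0)
    n = len(next(iter(candidates.values())))
    data = dict(candidates)
    data["x_random"] = rng.normal(size=n)   # presumably irrelevant regressor
    selected, best = [], np.inf
    remaining = list(data)
    while remaining:
        trial = {c: score(data, selected + [c]) for c in remaining}
        cand = min(trial, key=trial.get)
        if trial[cand] >= best or cand == "x_random":
            break                # no improvement, or the random one "won"
        selected.append(cand)
        best = trial[cand]
        remaining.remove(cand)
    return selected, best
```

Because the stopping rule compares against the best score so far, a complexity-penalized criterion such as the FPE is essential here; a raw fitting error would keep decreasing as regressors are added.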
For instance, a set of regressors [y(t−1), y(t−2), u(t−1), u(t−2)] defines ny = 2, nu = 1, and d = 1. Absences, however, can occur in the selected regressors, such as [y(t−1) y(t−4) u(t−1) u(t−3)], which does not correspond to a set of ny, nu and d. It seems unlikely in most situations that y(t−2), y(t−3) and u(t−2) should not be included. However, if there are strong correlations or recycle phenomena, those missing variables may be redundant, and the particular selection may not be unique: another realization of excitation and noise might select another subset from the correlated variables. The inclusion of redundant variables increases the model complexity. For database management simplicity, however, in this work, if a situation with absences occurs, a further comparison is executed on different order values. For the illustrated example, an exhaustive comparison is conducted on the possible values ny = 1, 2, 3 or 4, combined with the possible values nu = 0, 1 or 2, with d = 1. This extra computation would be unnecessary if the constraint of having consecutive regressors were dropped.

4.2 Nonlinear Component Detection

There is an implicit assumption made in the above SNNR operation: the time-varying parameters are functions of all the regressors. The assumption is valid for the dynamic model in Equation (4.2), where the parameters are functions of the two regressors u(t−1) and y(t−1). The model in Equation (4.6) has regressors y(t−1), y(t−2) and u(t−1); however, the parameters a_1 and a_2 are functions of only y(t−1) and y(t−2). The regressor u(t−1) has no effect on the parameter variation. It can then be expected that an SNNR on [y(t−1) y(t−2)] might reduce the parameter variation more than an SNNR on [y(t−1) y(t−2) u(t−1)]. The further reduction in parameter variation should be revealed by a smaller MSE resulting from a recursive estimation. An extension of the SNNR-based order determination procedure is thus used to detect the regressors that affect the output nonlinearly.
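The subset enumeration described here (and completed in the next section) can be sketched as below. The callable `mse_after_snnr` is a hypothetical hook standing in for the full pipeline of SNNR on the candidate subset followed by recursive estimation; the pieces are defined earlier in this chapter.

```python
from itertools import combinations

def detect_nonlinear_components(names, mse_after_snnr):
    """Enumerate all 2^nx - 1 non-empty subsets of the selected regressors.

    names          : list of the nx selected regressor names
    mse_after_snnr : callable mapping a tuple of names to the MSE of a
                     recursive estimation on data rearranged by SNNR
                     over those regressors only (hypothetical hook)

    The subset yielding the smallest MSE is reported as the set of
    nonlinear components.
    """
    best_subset, best_mse = None, float("inf")
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            mse = mse_after_snnr(subset)
            if mse < best_mse:
                best_subset, best_mse = subset, mse
    return best_subset, best_mse
```

The enumeration is exponential in nx, which is tolerable here because nx is already small after order determination has trimmed the regressor set.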
The detected regressors are termed nonlinear components and are used as antecedent variables in Chapter 5. The purpose of conducting SNNR is to reduce parameter variation so that the recursive estimation can capture the variation better, which in turn results in a smaller MSE. The SNNR described above rearranges data based on all the regressors in order to compare different sets of ny, nu and d. However, it is possible that only a subset of the regressors affects the time-varying parameters. The subset is denoted [c1, …, cnc]; it is a subset of the selected regressors, denoted [x1, …, xnx]. The regressors not included in [c1, …, cnc] have no effect on parameter variation. It is then expected that an SNNR on [c1, …, cnc] alone would reduce more parameter variation and produce a smaller MSE. There are in total 2^nx − 1 subsets of [x1, …, xnx], excluding the empty one. Each subset of [x1, …, xnx] is considered a candidate set of nonlinear components, [c1, …, cnc], on which the SNNR is conducted and a corresponding MSE is computed. The subset with the minimum MSE is reported to contain the nonlinear components.

4.3 Extension to MIMO Processes

Extending the above technique to a MIMO(m, n) process (m inputs and n outputs) is straightforward. The SISO model in Equation (3.1) is expanded as below for the kth output by including more regressors:

y_k(t) = f( y_1(t−d_y,k1), …, y_1(t−d_y,k1−ny_k1), …, y_n(t−d_y,kn), …, y_n(t−d_y,kn−ny_kn),
            u_1(t−d_u,k1), …, u_1(t−d_u,k1−nu_k1), …, u_m(t−d_u,km), …, u_m(t−d_u,km−nu_km) ) + e_k(t)   (4.39)

where the dynamic orders include ny_k1, …, ny_kn and nu_k1, …, nu_km, the delays d_y,k1, …, d_y,kn are between y_k and the outputs, and the delays d_u,k1, …, d_u,km are between y_k and the inputs. All of these numbers are determined using the above method for the single-output case.
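The exhaustive subset search of Section 4.2 can be sketched as follows. Here `snnr_mse` stands in for the full SNNR-rearrangement-plus-recursive-estimation scoring of a subset; it is a hypothetical callback, and the function name is illustrative.

```python
from itertools import combinations

def detect_nonlinear_components(regressors, snnr_mse):
    """Score all 2^nx - 1 non-empty regressor subsets and keep the best.

    regressors : list of selected regressor names [x1, ..., xnx]
    snnr_mse   : function(subset_tuple) -> MSE of recursive estimation on
                 data spatially reordered (SNNR) along that subset
    Returns (best_subset, best_mse); the best subset is reported to
    contain the nonlinear components.
    """
    best_subset, best_mse = None, float("inf")
    for size in range(1, len(regressors) + 1):
        for subset in combinations(regressors, size):
            mse = snnr_mse(subset)
            if mse < best_mse:
                best_subset, best_mse = subset, mse
    return best_subset, best_mse
```

With a toy scorer that rewards exactly the pair {y(t−1), y(t−2)}, the search returns that pair, mirroring the Equation (4.6) example in which u(t−1) does not drive the parameter variation.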
The nonlinear components for y_k are then selected after the orders are determined.

4.4 Simulations and Discussions

4.4.1 Testing Models and Processes

The proposed order determination and nonlinear component detection methods are tested on data generated by several nonlinear dynamic models, an experimental unit and a distillation column simulator. The first five models are nonlinear autoregressive with exogenous inputs (NARX) models. They differ in the nonlinear interactions between inputs and outputs. Model 1 has nonlinearity only in the lagged input, u(t−1). Model 2 is nonlinear in the lagged output only. Model 3 is nonlinear in both lagged input and output, u(t−1) and y(t−1). Model 4 is also nonlinear in both lagged input and output but includes more regressors than Model 3. Like Model 1, Model 5 has nonlinearity in the lagged input, u(t−1); the nonlinear function of u(t−1) is, however, different in the two models. Model 6 is a linear ARX model used only once to demonstrate the impact of SNNR on recursive estimation, with results in Table 4.3. The input signals for the first five models are generated by a skyline function and bounded between −1 and 1; the shortest and longest step durations are 20 and 50 samples, respectively. Output signals are initialized at zero. The noise e(t) follows a normal distribution, N(0, σ²). The value of σ differs for each model and is specified such that e(t) has a small magnitude compared to the output. Below, a portion of the input–output data for the first five models is illustrated along with the model equations. A total of 5000 samples are generated and used in order determination and nonlinear component detection.

Model 1 (Narendra & Parthasarathy, 1990)

y(t) = 0.3y(t−1) + 0.6y(t−2) + 0.6sin(πu(t−1)) + 0.3sin(3πu(t−1)) + 0.1sin(5πu(t−1)) + e(t)   (4.40)

where e(t)~N(0, 0.5²)

Figure 4.8.
Input–output data generated for Model 1 in Equation (4.40)

Model 2 (Narendra & Parthasarathy, 1990)

y(t) = y(t−1)y(t−2)(y(t−1) + 2.5) / (1 + y(t−1)² + y(t−2)²) + u(t−1) + e(t)   (4.41)

where e(t)~N(0, 0.5²)

Figure 4.9. Input–output data generated for Model 2 in Equation (4.41)

Model 3 (Narendra & Parthasarathy, 1990)

y(t) = y(t−1) / (1 + y(t−1)²) + u(t−1)³ + e(t)   (4.42)

where e(t)~N(0, 0.5²)

Figure 4.10. Input–output data generated for Model 3 in Equation (4.42)

Model 4 (Narendra & Parthasarathy, 1990)

y(t) = [y(t−1)y(t−2)y(t−3)u(t−2)(y(t−3) − 1) + u(t−1)] / (1 + y(t−2)² + y(t−3)²) + e(t)   (4.43)

where e(t)~N(0, 0.05²)

Figure 4.11. Input–output data generated for Model 4 in Equation (4.43)

Model 5 (Narendra & Parthasarathy, 1990)

y(t) = 0.8y(t−1) + (u(t−1) − 0.8)u(t−1)(u(t−1) + 0.5) + e(t)   (4.44)

where e(t)~N(0, 0.1²)

Figure 4.12. Input–output data generated for Model 5 in Equation (4.44)

Model 6:

y(t) = 0.8y(t−1) + 0.6y(t−2) + 0.4u(t−1)   (4.45)

Models 7 and 8 are two deterministic nonlinear dynamic models.

Model 7

y(t) = 0.8y(t−1) + u(t−1)²   (4.46)

Model 8

y(t) = 0.8y(t−1) + cos(πu(t−1))   (4.47)

Unlike Models 1–5, Model 7 has a quadratic term u(t−1)², where u(t) is again generated by a "skyline" function between −1 and 1. The effect of u(t) on y(t) would be missed on average.
As below, Equation (4.48) is the linear time-varying model for Model 7, with a1(t)=0.8 and b0(t)=u(t−1). On average, the effect of u(t−1) in Equation (4.48) is reflected by E(b0(t)). In this case, E(b0(t)) is 0, since u(t) is a random signal between −1 and 1.

y(t) = a1(t)y(t−1) + b0(t)u(t−1)   (4.48)

Therefore, the regressor u(t−1) would be missed if a recursive estimation were conducted in time sequence, where b0(t) is a random number between −1 and 1; the recursively estimated b0(t) would wander around zero. However, the proposed SNNR is able to reveal the impact of u(t−1) on the model output. By rearrangement, the randomness in u(t−1) is eliminated. Consequently, the varying parameter b0 is no longer a random variable but gradually increases from −1 to 1. A recursive estimation on the rearranged data is then able to reflect the impact of u(t−1) on y(t). Model 8 has a quadratic-like term cos(πu(t−1)) and will be used to test the proposed order determination technique. Model 9 in Equation (4.49) is used to demonstrate the non-uniqueness of the obtained result, as discussed in Section 4.1.1. By inspecting Equation (4.49), the nonlinear component could be either y(t−1) or y(t−2); a detailed test will confirm this observation.

Model 9

y(t) = 0.2y(t−1)y(t−2) + u(t−1)   (4.49)

Models 10 and 12 are deterministic nonlinear autoregressive (NAR) models from (Molina, Sampson, Fitzgerald & Niranjan, 1996) and are used for method comparison. Models 11 and 13 are derived from Models 10 and 12 with noise added to the output and are used to compare the influence of noise on the different methods. The noise e(t) in Models 11 and 13 has a small magnitude compared to the output signals and follows N(0, σ²), where σ² is set to about one thousandth of the average magnitude of the output signal in the corresponding deterministic model.
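The rearrangement argument can be checked numerically. The sketch below simulates the deterministic Model 7, with a uniform random input standing in for the skyline signal (a simplification), and shows that sorting the samples along the regressor u(t−1) turns b0(t) = u(t−1) from a random jitter into a smooth sweep from −1 toward 1.

```python
import random

random.seed(0)

# Deterministic Model 7: y(t) = 0.8*y(t-1) + u(t-1)**2
N = 500
u = [random.uniform(-1.0, 1.0) for _ in range(N)]
y = [0.0] * N
for t in range(1, N):
    y[t] = 0.8 * y[t - 1] + u[t - 1] ** 2

# Through the time-varying view of Equation (4.48),
# y(t) = a1(t)*y(t-1) + b0(t)*u(t-1), each sample carries b0(t) = u(t-1).
samples = [(u[t - 1], y[t - 1], y[t]) for t in range(1, N)]

# In time sequence, b0 is a fresh random number in [-1, 1] at every step...
b0_time = [s[0] for s in samples]

# ...but after an SNNR-style rearrangement along u(t-1), b0 increases
# monotonically, so a recursive estimator can track it.
samples_snnr = sorted(samples, key=lambda s: s[0])
b0_snnr = [s[0] for s in samples_snnr]
```

The sorted sequence `b0_snnr` is monotone by construction, which is exactly the property that lets the recursive estimation follow the parameter variation.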
Model 10 (Molina, Sampson, Fitzgerald & Niranjan, 1996)

y(t) = 4y(t−1)(1 − y(t−1))   (4.50)

Model 11:

yo(t) = 4yo(t−1)(1 − yo(t−1))
y(t) = yo(t) + e(t)   (4.51)

where e(t)~N(0, 0.0225²)

Figure 4.13. Data generated for Model 10 in Equation (4.50)

Model 12 (Molina, Sampson, Fitzgerald & Niranjan, 1996)

y(t) = 1 − 1.4y(t−1)² + 0.3y(t−2)   (4.52)

Model 13:

yo(t) = 1 − 1.4yo(t−1)² + 0.3yo(t−2)
y(t) = yo(t) + e(t)   (4.53)

where e(t)~N(0, 0.0272²)

Figure 4.14. Data generated for Model 12 in Equation (4.52)

Model 14: Two-phase flow process

Figure 4.15 shows the experimental setup of a two-phase flow process in the unit operations lab in the School of Chemical Engineering at Oklahoma State University. The unit is managed by a laboratory-scale distributed control system, Camile. The schematic diagram of the process is shown in Figure 4.16. Both the bottom and top pressures of the vertical pipe are measured. There are two air supplies, labeled 'Small air' and 'Large air' in Figure 4.16. Air from the two pipes merges and flows to a T, whose outlet end is connected to the bottom of the vertical pipe. The other inlet end of the T is connected to the water pipe labeled 'water' in Figure 4.16. In this work, the unit is used to study the dynamics between the mixed air and water flow and the pressure drop across the vertical pipe. The experiment is conducted in open loop, and only the air valve opening (on the 'Large air' pipe) is manually changed. The 'Small air' pipe is closed. The 'water' pipe is controlled at 20 lbmol/hr; the measurements of the water flowrate in the 'water' pipe are shown in Figure 4.17.

Figure 4.15. The two-phase flow experiment setup

Figure 4.16. The schematic diagram for the two-phase flow experiment

Figure 4.17.
Water flowrate measurements with set point at 20 lbmol/hr

The process can be defined differently by taking signals from different channels. Figure 4.18(a) shows one possible choice: the input, u, is the measurement of the air flowrate, and the output, y, is the measurement of the pressure drop, the difference between the top and bottom pressures shown in Figure 4.16. A portion of 4500 measurements is displayed in Figure 4.18(b); in total, 8830 measurements were recorded. Although the control interval was 0.1 second, the sampling interval for this data was chosen as 0.5 second.

Figure 4.18. A choice of input and output channels; the input, u, is the air flowrate measurement and the output, y, is the pressure drop measurement. a) The flowchart; b) The corresponding input and output data

As observed in Figure 4.18(b), the pressure drop measurement is noisier at low values of y than at high values. A first-order filter is added in the data acquisition and control device (a Camile 2000 unit) to suppress some of the noise in y for observation convenience. With the filter included, Figure 4.19(a) shows another possible process definition: the input, denoted us, is the command signal for the air valve opening, which, as shown, precedes the air flowrate measurement, and the output becomes the filtered pressure drop measurement, denoted yf. The data are shown in Figure 4.19(b).

Figure 4.19. A choice of input and output channels; the input, us, is the signal to the air valve opening and the output, yf, is the filtered pressure drop measurement.
a) The flowchart; b) The corresponding input and output data

Model 15: Binary distillation column

Model 15 is a methanol–water binary distillation column simulator (Ou & Rhinehart, 2002) modified to have 20 trays. The distillation column simulator is a MIMO process. The two inputs are the reflux flowrate (gmol/hr), u1, and the reboiler heating percentage (TY%), u2. The sample interval is 30 seconds. The reflux flowrate varies between 50 and 90 gmol/hr, and the heating percentage varies between 40% and 55%. The duration of each step change varies randomly between 0.05 and 1 hour. The first 1000 input samples are illustrated in Figure 4.20.

Figure 4.20. Reflux flowrate (solid line) and reboiler heating percentage (dash line) inputs to the distillation column

The two outputs, y1 and y2, are the overhead and bottom concentrations of methanol, xD and xB, in mole fraction. The first 1000 output samples are shown in Figure 4.21.

Figure 4.21. The xD (solid line, left scale) and xB (dash line, right scale) in the distillation column experiments

4.4.2 Testing on Dynamic Order Determination

An example is presented first to demonstrate the details of order determination. The example is based on the deterministic version of Model 2. The first 1000 data samples are shown in Figure 4.22, and a total of 5000 data points are generated and used for the order determination.

Figure 4.22. Data generated for the deterministic version of the model in Equation (4.42)

In using forward selection to select important regressors, the maximum ny, nu and maximum d are set to 5, 4 and 1, respectively.
The selection procedure is collected in Table 4.4.

Table 4.4. Regressor forward selection for data in Figure 4.22

Step  y(t−1)  y(t−2)  y(t−3)  y(t−4)  y(t−5)  u(t−1)  u(t−2)   u(t−3)   u(t−4)  u(t−5)  random
1     0.0516  0.1248  0.2360  0.3572  0.4907  0.4246  0.3697   0.3243   0.3323  0.3661  3.348
2             0.0408  0.0390  0.0400  0.0429  0.0131  0.0371   0.0411   0.0510  0.0534  0.0551
3             0.0045  0.0060  0.0082  0.0146          2.1E+56  0.0117   0.0089  0.0114  0.0193
4                     0.0057  0.0073  0.0055          0.0043   0.0093   0.0035  0.0059  0.0098
5                     0.0068  0.0042  0.0062          0.0062   1.9E+10          0.0043  0.0105

In Table 4.4 there are 11 regressors: 10 time-lagged regressors and a random regressor, the last one. In the first step, all 11 regressors are tried one by one, and their FPEs are recorded in the first row. The smallest FPE, 0.0516, belongs to y(t−1), which is selected. In the next step, the selected regressor y(t−1) is combined with each of the remaining 10 regressors; the results of the 10 trials are in row 2, where the minimum FPE of 0.0131 is due to u(t−1). The blank for y(t−1) in row 2 indicates only that y(t−1) has already been included. Continuing this procedure with both y(t−1) and u(t−1) included, their combinations with the remaining 9 candidate regressors are tried; the next minimum FPE is 0.0045, for y(t−2), which is then included. The next selection is u(t−4), with an FPE of 0.0035. At the fifth step, the minimum FPE is 0.0042, which is greater than the previous minimum FPE of 0.0035. The increase in FPE signals termination of the forward selection. The forward selection thus selects the four regressors [y(t−1) y(t−2) u(t−1) u(t−4)]. In theory, one could create an arbitrary model including exactly these regressors. In practice, however, it is unlikely that u(t−2) and u(t−3) should be excluded while u(t−4) is included. In addition, the objective of this work is to determine the dynamic orders, ny and nu. To include u(t−4), nu and d would have to be set to 3 and 1, respectively.
This configuration, however, contains the additional regressors u(t−2) and u(t−3), which were rejected by the forward selection. In this work, a minor exhaustive search is therefore conducted over several values of nu (0, 1, 2 and 3), with ny fixed at 2 and d at 1. The result is collected in Table 4.5, where the best value for nu is 0, with the minimum FPE of 0.0045. Therefore, the determined regressors are [y(t−1) y(t−2) u(t−1)], with ny=2, nu=0, d=1.

Table 4.5. Exhaustive search on nu with ny=2, d=1 for data in Figure 4.22

nu    0       1       2       3
FPE   0.0045  0.0275  0.0066  0.0378

The above order determination procedure, a forward selection followed by a minor exhaustive search, uses SNNR-rearranged data in the recursive estimation. To reveal the impact of SNNR on order determination, the forward selection procedure is repeated for the same data set without using SNNR. The details of the selection are collected in Table 4.6. The selected regressors are y(t−1), u(t−1) and u(t−2); no minor exhaustive search is needed. Compared to the model definition in Equation (4.2), the result misses y(t−2) while finding u(t−2), which is not present in the deterministic model. We simply state that the result includes two 'mistakes'.

Table 4.6. Regressor forward selection for data in Figure 4.22 using time-sequence data

Step  y(t−1)  y(t−2)  y(t−3)  y(t−4)  y(t−5)  u(t−1)  u(t−2)  u(t−3)  u(t−4)  u(t−5)  random
1     0.0526  0.1260  0.2367  0.3611  0.5010  1.6116  1.5373  1.4841  1.4906  1.5271  5.7726
2             0.0556  0.0521  0.0533  0.0530  0.0462  0.0557  0.0634  0.0619  0.0588  0.0555
3             0.0515  0.0508  0.0506  0.0502          0.0301  0.0501  0.0493  0.0493  0.0484
4             0.0352  0.0325  0.0322  0.0317                  0.0402  0.0401  0.0372  0.0316

The order determination method is also applied to the deterministic versions of the models in Equations (4.40)–(4.44). The results are summarized in Table 4.7, which also includes the results using the original time-sequence data for comparison.

Table 4.7.
Regressors determined for deterministic versions of Models 1–5

Model  Time Sequence         SNNR                         Truth
1      y(t−1) y(t−2) y(t−3)  y(t−1) y(t−2) y(t−3) u(t−1)  y(t−1) y(t−2) u(t−1)
2      y(t−1) u(t−1) u(t−2)  y(t−1) y(t−2) u(t−1)         y(t−1) y(t−2) u(t−1)
3      y(t−1) u(t−1)         y(t−1) u(t−1)                y(t−1) u(t−1)
4      u(t−1)                y(t−1) y(t−2) y(t−3) u(t−1)  y(t−1) y(t−2) y(t−3) u(t−1) u(t−2)
5      y(t−1) y(t−2)         y(t−1) u(t−1)                y(t−1) u(t−1)

As observed in Table 4.7, the two approaches are tied in the Model 3 test. In the Model 1 test, 'Time Sequence' misses u(t−1) and adds y(t−3), making 2 mistakes, while 'SNNR' adds y(t−3), making 1 mistake. In the Model 2 test, 'Time Sequence' misses y(t−2) and adds u(t−2), making 2 mistakes. 'SNNR' makes 1 mistake in the Model 4 test, while 'Time Sequence' makes 4 mistakes by finding only u(t−1). In the Model 5 test, 'Time Sequence' adds y(t−2) and misses u(t−1). Over the first 5 tests, 'SNNR' makes 2 mistakes while 'Time Sequence' makes 10. As this comparison illustrates, neither approach is perfect, but 'SNNR' outperforms 'Time Sequence' in terms of the number of mistakes made. Table 4.8 collects the comparison results using time-sequence and SNNR-rearranged data for the stochastic models in Equations (4.40)–(4.44), with example data shown in Figures 4.8–4.12.

Table 4.8. Regressors determined for Models 1–5

Model  Time Sequence                SNNR                         Truth
1      y(t−1) y(t−2) u(t−1)         y(t−1) y(t−2) u(t−1)         y(t−1) y(t−2) u(t−1)
2      y(t−1) y(t−2) y(t−3) u(t−1)  y(t−1) y(t−2) u(t−1)         y(t−1) y(t−2) u(t−1)
3      y(t−1) u(t−1)                y(t−1) u(t−1)                y(t−1) u(t−1)
4      u(t−1)                       y(t−1) y(t−2) y(t−3) u(t−1)  y(t−1) y(t−2) y(t−3) u(t−1) u(t−2)
5      y(t−1) u(t−1)                y(t−1) u(t−1)                y(t−1) u(t−1)

As observed in Table 4.8, 'SNNR' performs better, with 1 mistake, while 'Time Sequence' makes 5 mistakes. It is also observed that, between Tables 4.7 and 4.8, only the result for Model 1 differs for 'SNNR', while the results for Models 1, 2, 3 and 5 differ for 'Time Sequence'. It appears that the 'SNNR' result is less influenced by the additive noise than the 'Time Sequence' result.
It might be difficult to draw a general conclusion from this observation. Intuitively, the noise term affects how the model parameters vary, which in turn affects the performance of the recursive estimation; consequently, the order determination results, which are based on recursive estimation, should also be affected. On the other hand, the additional parameter variation after the noise is injected can be attenuated by the SNNR, which reduces the influence of noise on parameter variation and subsequently on order determination. The details of regressor selection for Models 7 and 8 are shown in Tables 4.9 and 4.10, where an extra regressor y(t−2) is found for each. This implies that the regressor y(t−2) has influence on y(t): although y(t) is not directly related to y(t−2) in Equations (4.46) and (4.47), the regressor y(t−2) can affect y(t) via y(t−1). More importantly, Tables 4.9 and 4.10 show that the regressor u(t−1) is found for both models.

Table 4.9. Regressor selection for Model 7

Regressors  y(t−1)  y(t−1) u(t−1)  y(t−1) u(t−1) y(t−2)  y(t−1) u(t−1) y(t−2) y(t−4)
FPE         0.0137  0.0028         0.0025                0.0032 (Stop)

Table 4.10. Regressor selection for Model 8

Regressors  y(t−1)  y(t−1) y(t−2)  y(t−1) y(t−2) u(t−1)  y(t−1) y(t−2) u(t−1) y(t−4)
FPE         0.0787  0.0293         0.0188                0.0222 (Stop)

The proposed order determination is also compared to the geometric method (Molina, Sampson, Fitzgerald & Niranjan, 1996). The testing is conducted on Models 10–13, and the results are summarized in Table 4.11. As observed, the geometric method is able to extract correct orders for the deterministic nonlinear AR models but performs poorly in the presence of additive noise, making a total of four mistakes over Models 11 and 13. The proposed order determination makes one mistake, for Model 12.

Table 4.11.
Regressors determined for Models 10–13

Model  SNNR                  Geometric                    Truth
10     y(t−1)                y(t−1)                       y(t−1)
11     y(t−1)                y(t−1) y(t−2) y(t−3)         y(t−1)
12     y(t−1) y(t−2) y(t−3)  y(t−1) y(t−2)                y(t−1) y(t−2)
13     y(t−1) y(t−2)         y(t−1) y(t−2) y(t−3) y(t−4)  y(t−1) y(t−2)

The dynamic order determination is app


