



CHARACTERIZATION OF CLOSED-LOOP PROCESS VARIABLE DATA

By
ANAND VENNAVELLI
Bachelor of Technology
Osmania University
Hyderabad, India
2002

Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE
December, 2006

Thesis Approved:
Dr. James R. Whiteley (Thesis Adviser)
Dr. Russell R. Rhinehart
Dr. Karen A. High
Dr. Gordon A. Emslie (Dean of the Graduate College)

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my graduate adviser Dr. Rob Whiteley for his constant support, guidance and motivation. I would also like to thank my graduate advisory committee members Dr. Russell Rhinehart and Dr. Karen High for their valuable inputs and suggestions.

Heartfelt thanks to my family members for their unconditional support and encouragement to pursue my interests, even when the interests went beyond boundaries of language and geography.

Kudos to all my friends and roomies. You have just been through your worst nightmare!

TABLE OF CONTENTS

1. INTRODUCTION
1.1 Types of controller performance assessment
1.1.1 Performance assessment objectives
1.1.2 Performance assessment input data characteristics
1.2 Contribution of this work
1.3 Thesis outline
2. BACKGROUND AND LITERATURE SURVEY
2.1 Closed-loop data
2.2 Engineering analysis methods for performance assessment
2.2.1 SISO performance measures
2.2.2 MIMO performance measures
2.3 Business analysis methods for controller performance assessment
2.4 Data characterization
2.5 Research Focus
3. DATA ANALYSIS TECHNIQUES
3.1 Unordered Analysis
3.1.1 Histograms
3.1.2 Normal probability plots
3.1.3 Quantile-Quantile plots (QQ plots)
3.2 Ordered Analysis
3.2.1 Autocorrelation function (ACF)
3.3 GUI tool for data analysis
3.4 Data analysis tools summary
4. CHARACTERIZATION OF CLOSED-LOOP DATA
4.1 Data
4.2 Variability trends in actuating error
4.3 Unordered analysis results
4.3.1 Identification of variability bands using histograms
4.3.2 Identification of variability bands using Normslope
4.3.3 Identification of variability bands using qqslope
4.4 Summary of unordered analysis results
4.5 Ordered analysis results
4.5.1 Approach to ACF
4.5.2 Identification of variability bands using the autocorrelation function (ACF)
4.6 Effect of controller mode change results
4.6.1 Controller modes
4.7 Discussion of data analysis results
5. CONCLUSIONS AND FUTURE WORK
5.1 Conclusions
5.2 Future Work
BIBLIOGRAPHY
APPENDIX

LIST OF TABLES

3.1 N − k pairs of observations for lag k. Illustration for N = 8 and k = 2.
3.2 Description of the functions used in the GUI tool plots.
4.1 Summary of the information available in the closed-loop data sets
4.2 Compression factors on PV, SP and OP for each data set.
4.3 Error variability bands determined from visual observation.
4.4 Percentage output saturation (OP > 90% or OP < 10%).
4.5 Normslope for each loop calculated monthly. The normslope is a robust measure of the variability in the time series.
4.6 The qqslope comparison of error spread: monthly to annual error. Composite annual error distribution is considered as the reference with its qqslope = 1.0
4.7 Number of appreciable ACF coefficients (with a maximum of 181) in the error time series for each data set by month.
4.8 Number of days in each controller mode for all data sets. The number of error bands are the same as listed in Table 4.3

LIST OF FIGURES

3.1 Histogram of the closed-loop error data for the dataset FC1a with superimposed bell curve.
3.2 Histogram of 250,000 normally distributed numbers with mean 0 and standard deviation 1.
3.3 Histogram of 200,000 normally distributed numbers with mean 0 and standard deviation 3.
3.4 Histogram of the composite distribution.
3.5 Normal probability plot of 5000 numbers drawn randomly from a standard normal distribution.
3.6 Normal probability plot of 5000 numbers randomly drawn from a standard uniform distribution.
3.7 QQ plot of 5000 numbers drawn randomly from a standard normal distribution.
3.8 QQ plot of 5000 numbers drawn randomly from a uniform distribution between zero and five.
3.9 Histogram of 5000 numbers drawn randomly from uniform distribution between zero and five (series X).
3.10 Histogram of 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y).
3.11 QQ plot of 5000 numbers drawn randomly from uniform distribution between zero and five (series X) and 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y).
3.12 ACF plot for 5000 numbers drawn from standard normal distribution.
3.13 Plot of x(t) = sin (5t) for t = 0 to 2 for 638 data points.
3.14 ACF plot f(x) = sin (5t) for t = 0 to 2 for 638 data points.
3.15 Screen shot of GUI tool main screen.
3.16 Screen shots of features in the GUI tool.
4.1 Description of flow loops
4.2 Description of pressure loops
4.3 Description of temperature loops
4.4 Time series of actuating error, FC2 loop from Oct 1, 2003 to Sep 30, 2004. Figure shows presence of different variability bands.
4.5 Histogram of actuating error for loop FC4 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.
4.6 Histogram of actuating error for loop PC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.
4.7 Histogram of actuating error for loop TC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.
4.8 Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 13, 2004.
4.9 Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 14, 2004 to Feb. 24, 2004.
4.10 Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 24, 2004.
4.11 Actuating error time series and histogram with superimposed bell curve for the loop PC3 from Dec. 14, 2003 to Mar. 1, 2004.
4.12 Actuating error time series and histogram with superimposed bell curve for the loop TC3 from Feb. 15, 2004 to Apr. 10, 2004.
4.13 ACF for FC4, actuating error from 10/1/03 to 9/30/04.
4.14 ACF for PC3, actuating error from 10/1/03 to 9/30/04.
4.15 ACF for TC2, actuating error from 10/1/03 to 9/30/04.
4.16 Illustration of typical controller configurations and corresponding controller modes.
4.17 Configuration of TC3. PV (°F) is an APC MV. The output from the APC controller is the set point to the TC.
4.18 TC3 cascade from Oct. 21, 2003 to Oct. 26, 2003 (blue) and auto from Oct. 1, 2003 to Oct. 06, 2003 (red). Different error variability in different controller modes.
4.19 Configuration of FC1a. PV (bpd) is an APC MV. The output from the APC controller is the set point to the FC.
4.20 FC1a auto from Oct. 1, 2003 to Oct. 9, 2003 (blue) and cascade from Oct. 14, 2003 to Oct. 19, 2003 (red). Same error variability in different controller modes.
4.21 Configuration of PC2. PV (psig) is an APC MV. The output from the APC controller is the set point to the PC.
4.22 PC2: B-cascade from Oct. 1, 2003 to Oct. 28, 2003 (blue) and cascade from Jan. 22, 2004 to Mar. 5, 2004 (red). Mixture indiscernible.
4.23 Impact of setpoint variability introduced by APC on TC3 actuating error. TC3 from Feb. 15, 2004 to Feb. 28, 2004.

NOMENCLATURE

σ    Standard Deviation
μ    Mean
ACF    (sample) Autocorrelation Function
APC    Advanced Process Control
CLPA    Closed-loop Performance Assessment
CLPM    Closed-loop Performance Metric
DTW    Dynamic Time Warping
ECDF    Empirical Cumulative Distribution Function
Error    Actuating Error
FCOR    Filtering and Correlation Algorithm
HI    Harris Index
HISTFIT    Histogram with a superimposed bell curve
IE    Integrated error
IQR    Inter Quartile Range
LQG    Linear Quadratic Gaussian
LSL    Lower Specification Limit
MIMO    Multiple Input Multiple Output
MVC    Minimum Variance Controller
Normslope    Slope of the normal probability plot
OP    Controller Output
PCA    Principal Component Analysis
PCI    Process Capability Index
PV    Process Variable
QQ plot    Quantile-Quantile plot
qqslope    Slope of the QQ plot
SISO    Single Input Single Output
SP    Set point
SPC    Statistical Process Control
USL    Upper Specification Limit

1. INTRODUCTION

There are tens of thousands of controllers employed in the process industries. Most of these are proportional-integral (PI) controllers. Estimates indicate that 66% to 88% of industrial controllers have performance problems (Harris et al. 1999). Often these problems fail to attract the attention of the personnel who could investigate and improve performance of the controller. Even a 1% improvement in controller performance represents millions of dollars in potential savings to the process industries (Chaudhary et al. 2005). In the United States alone, estimates show that losses to the petrochemical industry from poor monitoring and control exceed $20 billion per year (Venkatasubramanian 2006). Controller performance assessment therefore has significant economic incentives.

1.1 Types of controller performance assessment

1.1.1 Performance assessment objectives

For purposes of this thesis, two distinct types of controller performance assessment are identified.
The first type, which we will refer to as “engineering analysis methods,” comprises the techniques employed by control engineers to identify undesirable dynamic characteristics such as valve stiction, improper controller tuning, and excessive controller action. Engineering analysis methods focus on short-term response characteristics to individual setpoint changes and process disturbances. They calculate performance indexes from highly sampled closed-loop process data (sampling period Ts on the order of 1 second), as the data contain all the information about the performance of a controller. Commercial products that can perform the engineering analysis have evolved in recent years and are being continuously improved. These include products by ABB (Loop Performance Manager™), Honeywell (Loop Scout™), Expertune (Plant Triage™), ISC (PROBEwatch™), Matrikon (Process Doctor™), PAS (Control Wizard™), ProControl Technology (PCT Loop Optimiser™) and ASPEN (PIDWatch™).

The second type of methods used for controller performance assessment are “business analysis methods.” They address management’s view of control systems as assets to be managed. These techniques utilize a longer-term (weeks or months) view of controller performance with a business emphasis on continuous quality improvement, identification of best practices, and allocation of limited resources for control system maintenance. Business analysis methods are implemented using statistical process control (SPC) and “six-sigma” principles. The focus of this thesis is on the characterization and analysis of data used by business analysis methods for controller performance assessment.

1.1.2 Performance assessment input data characteristics

The key variable for both the engineering and the business types of performance assessment when applied to feedback control is the controller error, given by the difference between the measured process variable (PV) and the setpoint (SP).
\[ e(k) = \mathrm{PV}(k) - \mathrm{SP}(k) \tag{1.1} \]

In a well-performing single-input single-output (SISO) loop, disturbances are rejected and the process variable closely follows the setpoint. The variability in the controller error in such a case will be a direct indicator of controller performance (Thornhill et al. 1999). For a multivariable control loop, however, the characteristics of the controller error can be different due to the presence of various factors including interactions between loops and constraints on certain variables.

An important distinction between the engineering analysis methods and the business analysis methods is that they use closed-loop actuating error variability in different contexts to answer different questions. Engineering analysis techniques are very powerful tools to help the control engineer assess controller performance. Business analysis methods based on SPC techniques, on the other hand, are tools to help management assess the performance of controllers from a business perspective. SPC-based techniques provide key process performance indicators that facilitate comparison of similar controller configurations within sites, or within units at the same site, and are ultimately aimed at establishing the best practices.

Engineering analysis methods are used for continuous performance assessment on a loop-by-loop basis using high-fidelity closed-loop data collected at the same rate as the controller. Business analysis methods, on the other hand, are used for assessing the performance of complex control configurations, such as multivariable controllers. Business analysis methods use archived closed-loop data collected over an extended period of time such as a year (Herman 1989) with the intention of identifying opportunities for continuous improvement. The business analysis methods do not require high frequency sampling like the engineering analysis methods but require closed-loop data over long periods of time such as a year.
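
As a minimal sketch of the quantities discussed above, the following Python fragment computes the controller error of equation 1.1 and summarizes its variability in consecutive windows, which is the kind of long-term view a business analysis would take. The function names, the window length, and the synthetic PV and SP series are illustrative assumptions and are not taken from the thesis data or from its MATLAB tool.

```python
import random
import statistics


def actuating_error(pv, sp):
    """Equation 1.1: e(k) = PV(k) - SP(k), sample by sample."""
    if len(pv) != len(sp):
        raise ValueError("PV and SP must have the same length")
    return [p - s for p, s in zip(pv, sp)]


def windowed_std(error, window):
    """Standard deviation of the error in consecutive, non-overlapping windows."""
    return [
        statistics.pstdev(error[i:i + window])
        for i in range(0, len(error) - window + 1, window)
    ]


# Synthetic data (an assumption, not thesis data): constant setpoint,
# measurement noise whose standard deviation doubles halfway through.
rng = random.Random(0)
sp = [50.0] * 2000
pv = (
    [50.0 + rng.gauss(0.0, 1.0) for _ in range(1000)]
    + [50.0 + rng.gauss(0.0, 2.0) for _ in range(1000)]
)

error = actuating_error(pv, sp)
bands = windowed_std(error, window=500)
print([round(b, 2) for b in bands])
```

The first two windows should show a spread near 1 and the last two a spread near 2, a simple example of a change in error variability over time.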
Archived closed-loop data are best suited for business analysis, because it is impractical to set up a dedicated data collection system for the long periods over which statistical process control analysis is done.

The SPC techniques involve strong assumptions about randomness and the normality of data. Since archived closed-loop data are not collected to test any specific statistical hypothesis, they may contain unexpected features and unsuspected correlations between variables. If data historians compress the data for archival, data characteristics may be compromised upon regeneration. It is important to address all of these assumptions because SPC metrics derived from archived process data are used to make important decisions about the control system performance. Only limited work has been done so far in characterizing archived closed-loop data that are used in SPC-based performance analysis of regulatory and advanced control loops. The premise of this work is that characterization of archived closed-loop data will result in a better understanding and interpretation of quality SPC metrics that are derived from such data.

1.2 Contribution of this work

The main contribution of this work is the characterization of closed-loop archived data used for SPC-based controller performance analysis with emphasis on trends in actuating error variability over long periods of time. Since most of these performance measures are stochastic in nature, statistical tools should be used to detect statistically significant changes in controller performance. The statistical nature of actuating error distributions and their conformity to the Gaussian model were studied. To carry out this study, a graphical user interface (GUI) tool was developed in MATLAB™ to automate the analysis. Statistics to quantify closed-loop data characteristics observed in normal probability plots, quantile-quantile plots, and the autocorrelation function have been proposed.
Qualitative visual characteristics like “heavy-tailedness” and “peakedness” in histograms of error distributions are also presented.

Variability trends in actuating error time series can be identified using simple statistical techniques that are easy to automate. Trends in actuating error variability are indicative of changing controller performance. Identification of changing performance is the first step for continuous improvement. This work aims to provide a platform for further diagnosis of the causes for changing controller performances and thus opportunities for real business improvement.

1.3 Thesis outline

The organization of the rest of the thesis is as follows:

• Chapter 2 describes closed-loop performance analysis and the important role of controller error as the key closed-loop performance variable with emphasis on performance assessment using a minimum variance benchmark, and extension of SISO performance measures to MIMO loops. An introduction to the application of SPC-based analysis using closed-loop data is then presented.
• Chapter 3 presents an introduction to the data analysis tools used in this work for closed-loop data characterization. The goal of the data analysis tools is to enable detection and interpretation of the variability trends observed in closed-loop data.
• Chapter 4 presents the results of closed-loop data characterization studies on industrial data obtained from a petroleum refinery. In this chapter, statistics that describe the variability trends through histograms, normal probability plots, quantile-quantile plots and autocorrelation function have been developed. In addition, effects of controller mode changes are also discussed.
• Chapter 5 contains conclusions from this work and recommendations for future work.

2. BACKGROUND AND LITERATURE SURVEY

2.1 Closed-loop data

Almost all process industries now employ Distributed Control Systems (DCS) as regulatory control hardware.
The closed-loop data available through the DCS are usually collected and saved in a separate hardware system referred to as the plant historian. To manage the large demand for storage space, data are usually compressed for archival in the plant historian (Thornhill et al. 2004). Estimates indicate that most chemical process plants require over one hundred gigabytes of storage space to archive one year's worth of data (Huang and Shah 1999). The amount of closed-loop data available for analysis continues to increase with advances in computers and networks. Properly archived data can be a tremendous source of information. The challenge now is extracting useful information from these closed-loop data. All the information about the performance of a controller is contained in the closed-loop plant data.

A typical industrial process plant has hundreds of control loops. Instrument technicians generally maintain and service these loops, but rather infrequently. Routine maintenance of such loops at optimal settings can save millions of dollars a year (Chaudhary et al. 2005). The development of quality measures of performance for such control loops is therefore an important area of industrial interest. This type of controller performance monitoring also falls in the realm of enterprise asset management. The underlying view is that controllers, whether PID type or advanced, should be treated like other capital assets and monitored on a routine basis.

The goal of engineering analysis methods is to ensure that control systems perform according to their specifications. This means that controlled variables meet their operating targets such as specifications on output variability, effectiveness in constraint enforcement or proximity to optimal control. On the other hand, the goal of business analysis methods is to provide opportunities for real business improvement. This is achieved using key process indicators which are fueled by the back propagation of business objectives.
In order to further clarify the distinction between the engineering analysis methods and the SPC-based business analysis methods, and for the sake of completeness, a discussion of engineering analysis techniques and the types of performance problems they address is presented in section 2.2.

2.2 Engineering analysis methods for performance assessment

2.2.1 SISO performance measures

This section provides a brief review of engineering analysis methods used to identify undesirable dynamic characteristics such as valve stiction, improper controller tuning, and excessive controller action.

Controller performance is frequently characterized by comparing the actual process output variance to the output variance of an optimal controller such as the minimum variance controller. Astrom proposed the minimum variance control (MVC) principle and use of autocorrelation to characterize short-term controller performance (Astrom 1967, Harris 1989). Harris proposed the use of closed-loop data to evaluate and diagnose controller performance using the minimum variance controller as a benchmark. The Harris Index (HI) is the ratio of the variance of the actuating error signal to the minimum variance achievable by an ideal controller:

\[ \mathrm{HI} = \frac{\text{Current error variance}}{\text{Minimum achievable variance}} \tag{2.1} \]

The HI indicates best possible control when HI approaches one and no control when HI is large. A modified version of the Harris index that is normalized between 0 and 1 is given by equation 2.2.

\[ \mathrm{CLPM} = 1 - \frac{1}{\mathrm{HI}} \tag{2.2} \]

where:
CLPM = Closed-loop performance metric
HI = Harris Index

CLPM = 0 indicates optimal control; CLPM = 1 indicates no control.

The advantage of the Harris index is that it does well in indicating loops that have oscillation problems. The major disadvantage of the HI or CLPM is that the process time delay or dead time must be known for the loop.
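
The relationship between equations 2.1 and 2.2 can be checked with a short Python sketch. The function names are illustrative, and the minimum achievable variance is assumed to be supplied by the caller; estimating it from closed-loop data requires the process time delay and is not attempted here.

```python
def harris_index(current_error_variance, minimum_achievable_variance):
    """Equation 2.1: ratio of current error variance to minimum achievable variance."""
    if minimum_achievable_variance <= 0:
        raise ValueError("minimum achievable variance must be positive")
    return current_error_variance / minimum_achievable_variance


def clpm(hi):
    """Equation 2.2: CLPM = 1 - 1/HI, normalized so that 0 is optimal control."""
    if hi <= 0:
        raise ValueError("HI must be positive")
    return 1.0 - 1.0 / hi


# Illustrative numbers only: the error variance is four times the minimum.
hi = harris_index(current_error_variance=4.0, minimum_achievable_variance=1.0)
print(hi, clpm(hi))
```

With these illustrative values HI is 4 and CLPM is 0.75. When the error variance equals the minimum achievable variance, HI is 1 and CLPM is 0, matching the interpretation given in the text.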
Since processes change during operation, the need to know the time delay is a major limitation of any minimum variance control benchmark.

Hagglund has proposed a method in the time domain, which considers integrated error (IE) between all zero crossings of the signal (Hagglund 1995). If the IE is large enough, a counter is increased. An oscillation is indicated if this counter exceeds a certain threshold. In order to quantify the critical value of the counter above which the change in the counter is statistically significant, the author used the ultimate frequency of the loop in question. This method is appealing as it is able to quantify the size of the oscillation. However, it assumes that the loop oscillates at its ultimate frequency, which is not always true. Further, the ultimate frequency may not always be known for a loop.

Hagglund also proposed an idle index for detecting sluggish loops (Hagglund 1999). This idle index value depends heavily on the data pretreatment. Kuehl and Horch proposed a data pretreatment procedure using noise filtering for improving the idle index, which, however, is limited to detecting sluggishness (Kuehl and Horch 2005).

Ko and Edgar suggested an index that computes the ratio of the actual variance and the minimum achievable variance using a PI controller (Ko and Edgar 1998). This approach assumes that a process model is available. A limitation of this method is that the models need to be updated periodically.

Kadali and Huang proposed the use of the Linear Quadratic Gaussian (LQG) benchmark as a more appropriate tool for assessing the performance of controllers (Kadali and Huang 2002). Calculation of the LQG benchmark requires a complete knowledge of the process model, which is often a demanding requirement.

Li et al. proposed the use of a chi-squared goodness-of-fit statistic to compare the distribution of a performance index within a window of data to a reference run length distribution in order to determine the performance of a controller (Li et al. 2004).
A statistically significant change in any section of the distribution, not just an average value, is indicative of a significant change in controller performance.

Srinivasan and Rengaswamy proposed a qualitative pattern recognition approach for stiction diagnosis. Stiction in control valves leaves distinct qualitative shapes in the controller output (OP) and controlled process variable (PV) data. To classify the patterns that evolve due to stiction, a pattern recognition approach using the dynamic time warping (DTW) technique was proposed (Srinivasan and Rengaswamy 2005).

Thornhill and Hagglund proposed a set of procedures to detect and diagnose oscillating loops using offline data (Thornhill and Hagglund 1997, Thornhill et al. 2003). They combine techniques of controller performance assessment with operational signatures (OP-PV plots) and spectral analysis of the controller error for diagnosis. These techniques, though not completely automated, can differentiate oscillation caused by poor controller tuning, process nonlinearities, or external disturbances. Inferred loop signatures that are based on spectral analysis or on plots of controller output (OP) versus process variable (PV) have to be manually identified. Recently Paulonis and Cox of Eastman Chemical Company improved the above technique and developed a large scale system to identify and troubleshoot poorly performing control loops (Paulonis and Cox 2003).

Xia and Howell proposed the use of signal-to-noise ratio indices for the process variable and the output, their ratio R, and the variability in R to facilitate the status monitoring of PI/PID loops and isolation of the problem loop (Xia and Howell 2003). The major limitation of this statistic is that it assumes regulatory control and fails when there are frequent setpoint changes.

Horch presented a simple, practical approach to distinguish between loop oscillations caused by external disturbances and those caused by stiction (Horch 1999).
This approach is based on cross-correlation between the controller output (OP) and the process output (PV). Horch and Isaksson also proposed a technique to identify stiction using nonlinear filters (Horch and Isaksson 1998). The method assumes that information such as the mass of the stem, the diaphragm area, and so on is readily available for each valve. Since a typical process industry facility can have hundreds or thousands of control loops, it may be nearly impossible to build and maintain the required database of control valves, making this technique difficult to implement.

Chaudhary et al. used higher order statistics for detecting nonlinearity in data and extended the method to diagnose stiction by fitting an ellipse to the OP-PV plot and inferring the stiction from an assumed stiction model (Chaudhary et al. 2005). However, the success of this approach lies in correctly identifying the oscillation period and its start and end points in the OP-PV data.

Huang et al. showed that the minimum feedforward plus feedback control variance can be estimated from routine operating data, and can then be used as a benchmark for performance assessment of feedforward and/or feedback controllers (Huang et al. 2000).

Bezergianni and Georgakis proposed a relative variance index that compares actual control to both minimum-variance control and open loop control (Bezergianni and Georgakis 2000).

Jain and Lakshminarayanan proposed a novel filter-based method to address the shortcomings of the minimum variance benchmarking and to provide a realistic performance measure using closed-loop data (Jain and Lakshminarayanan 2005).

Tabe et al. presented an application of acoustic spectral PCA to the monitoring of fermentation process equipment (Tabe et al. 1998). Thornhill et al. applied principal component analysis (PCA) to the power spectra of data from chemical processes (Thornhill et al. 2002). Harris et al.
reported plant-wide control loop assessment in which they found the spectral analysis of the univariate trends to be useful (Harris et al. 1996b).

Ingimundarson and Hagglund proposed closed-loop monitoring using loop tuning and an extended horizon performance index similar to that used by Thornhill et al. (Ingimundarson and Hagglund 2005, Thornhill et al. 1999). In this method the user selects a prediction horizon and an alarm limit based on loop tuning rather than on the process characteristics.

Thornhill et al. discussed the impact of compression on data-driven process analysis (Thornhill et al. 2004). They observed that data compression using the swinging door method changes the statistical properties of the data. Orthogonality between the reconstruction error and the reconstructed signal is not maintained, because the two are strongly correlated. This is an important observation that calls into question the use of archived data for analysis. Whether archived data can be used depends very much on the method used for archival and reconstruction.

Huang and Shah developed a filtering and correlation algorithm (FCOR) to estimate the minimum variance (Huang and Shah 1999).

A summary of recent work in the area of engineering analysis methods for controller performance assessment has been published by Qin (Qin 1998).

2.2.2 MIMO performance measures

The extension of performance assessment to multivariable systems has been studied by Harris et al. (Harris et al. 1996a) and by Huang and Shah (Huang and Shah 1996). Assessment of minimum variance performance bounds arising from dead times requires the knowledge of the interactor matrix. The interactor matrix allows a multivariate transfer function to be factored into two terms, one having zeros located at infinity and another containing finite zeros. For the multivariate case, it can be shown that the multivariate minimum variance performance can be estimated from routine operating data if the interactor matrix is known (Harris et al. 1999).
It is important to note that the interactor matrix:
• is not always unique.
• cannot always be constructed from knowledge of the SISO delay structure.

Huang and Shah defined a performance index based on a multivariate extension of the FCOR algorithm (Huang and Shah 1996). The presence of process and controller interactions significantly complicates the analysis and diagnosis in multivariable situations. There has been limited work on diagnosis for the multivariate case. In many cases, multivariable controllers are used where constraints are important. The definition and computation of an appropriate multivariable performance index in these situations remains unresolved.

Process control performance assessment measures have tended to compare the total closed-loop variance to that of minimum variance control. With the exception of Desborough and Harris, and Vishnubotla et al., little has been done on understanding the decomposition of closed-loop variance (Desborough and Harris 1993, Vishnubotla et al. 1997). The multivariate performance assessment measures are nontrivial generalizations of the univariate measures. The diagnosis of multivariate systems has not been thoroughly investigated. The interactive nature of these systems means that this will be a nontrivial task (Kesavan and Lee 1998).

2.3 Business analysis methods for controller performance assessment

The closed-loop performance metrics discussed in section 2.2 are derived from short-term characteristics of the data. This section provides an introduction to the statistical process control (SPC) type of quality metrics used by business for performance assessment. SPC analysis is based on Shewhart's concept of twofold variability: ‘chance’ cause variability, which is a random variability inherent in the process, and ‘assignable’ cause variability, which is caused by an external factor. Using an appropriate control chart, for example, we can determine whether the variability observed is due to chance causes or assignable causes.
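As an illustration of how a control chart separates the two kinds of variability, the following is a minimal sketch of a Shewhart individuals (X) chart. It is written in Python rather than MATLAB, uses only the standard library, and is not taken from the thesis: the function names, the simulated baseline, and the choice of the moving-range estimate of sigma (average moving range divided by the constant 1.128 for ranges of two points) are all assumptions made for the example.

```python
import random
import statistics


def individuals_chart_limits(baseline):
    """Return (center, lcl, ucl) for an individuals chart.

    Sigma is estimated from the average moving range of consecutive
    points divided by d2 = 1.128 (the constant for ranges of two points).
    """
    center = statistics.fmean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = statistics.fmean(moving_ranges) / 1.128
    return center, center - 3.0 * sigma_hat, center + 3.0 * sigma_hat


def assignable_cause_candidates(values, lcl, ucl):
    """Return the indices of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical baseline: 200 points of chance-cause variability only.
random.seed(1)
baseline = [random.gauss(10.0, 1.0) for _ in range(200)]
center, lcl, ucl = individuals_chart_limits(baseline)

# Hypothetical new observations; the third one carries a large shift.
new_points = [10.3, 9.6, 20.0, 10.1]
flagged = assignable_cause_candidates(new_points, lcl, ucl)
print(f"center={center:.2f} lcl={lcl:.2f} ucl={ucl:.2f} flagged={flagged}")
```

Points inside the limits are treated as chance-cause variability, while a flagged point is only a candidate for an assignable cause and still has to be traced to an external factor.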
In a period that is free of assignable causes, a major function of SPC-based analysis is to use a process capability index (PCI) to compare the actual performance of a process to the specified or desired performance. The PCIs are defined as:

C_p = \frac{USL - LSL}{6\sigma}    (2.3)

and

C_{pk} = \min\left( \frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma} \right)    (2.4)

Where,
Cp : capability ratio, defined as the ratio of the spread between the specification limits to the natural process limits
Cpk : capability ratio, defined as the distance to the nearest specification limit (in sigma units) divided by 3.0
USL : upper specification limit
LSL : lower specification limit
μ : mean of the process
σ : standard deviation of the process

The capability ratios assume that the process variable follows a normal distribution, so that there is a 99.73% chance that the process variable value is within 3 sigma units on either side of the mean. Conformity to a normal distribution is an important consideration in the interpretation of the capability ratios. When Cp < 1, the process is not capable and produces some nonconforming product. An improved Cp thus indicates an improved process. From equation 2.3 it is clear that the capability is inversely proportional to the variability in the process. Therefore, the key to continuous improvement of a process devoid of assignable causes lies in reducing the variability inherent in the process.

Shunta presents the application of SPC in the following manner: “statistical metrics (process capability and process performance) derived from closed-loop data determine which of the key variables do not meet the desired performance. The statistics provide a basis to determine if the control strategy needs to be modified or the process changed to gain the improvements” (Shunta 1995).

Tucker et al. introduced an algorithmic statistical process control (ASPC) model in which SPC is used as a monitoring tool that obviates the need for APC for a polymerization application (Tucker et al. 1993).
Tucker et al., in the same paper, also point out that the ASPC analysis needs an efficient data compression algorithm that facilitates good regeneration of the closed-loop data. Lin proposed a process “incapability” index based on a large sample approach as opposed to a process capability index (Lin 2006). This technique is only applicable when the underlying distribution is assumed to be normal. Shore described a new approach that uses a family of distributions and moment-based fitting procedures to approximate an unknown source distribution and then incorporates the fitted distribution in quality metric calculations (Shore 1998). Such an approach would eliminate the need for a normal approximation but would mean that a source distribution has to be fitted for each closed-loop series. Ding proposed the use of the first four moments of the closed-loop PV data to numerically derive a cumulative distribution function that can be used for process capability index analysis (Ding 2004).

Lant and Steffens used closed-loop data from a wastewater treatment plant for benchmarking studies (Lant and Steffens 1998). The authors define benchmarking as a “measure” of process control practice relative to an absolute performance measure (world class quality). Such benchmarks can be used to answer questions like:
• How good is my process control?
• Is it worth improving the control technology?

Process capability or performance is inversely proportional to actuating error variability. This means that trends in error variability indicate varying controller performance. The SPC-based quality metrics are thus used as performance indicators of a control system. They are equally applicable to simple SISO loops and to complex multivariable loops. These techniques utilize a longer-term (weeks or months) view of controller performance with a business emphasis on continuous quality improvement, identification of best practices, and allocation of limited resources for control system maintenance.
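The capability ratios of this section can be illustrated with a short calculation. The sketch below is in Python (standard library only) rather than MATLAB and is not part of the thesis. It follows the verbal definitions given with equations 2.3 and 2.4 (specification spread over six sigma for Cp, distance to the nearest specification limit over three sigma for Cpk). The function name, the sample values, the specification limits, and the use of the sample (n − 1) standard deviation are assumptions made for the example.

```python
import statistics


def capability_indices(values, lsl, usl):
    """Return (Cp, Cpk) for a data series and its specification limits."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumption)
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min((usl - mu) / (3.0 * sigma), (mu - lsl) / (3.0 * sigma))
    return cp, cpk


# Hypothetical process variable readings with a mean of 10.0.
pv = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1, 9.9]

# Hypothetical, deliberately off-center specification limits.
cp, cpk = capability_indices(pv, lsl=9.5, usl=10.9)
print(f"Cp = {cp:.3f}, Cpk = {cpk:.3f}")
```

Because the mean sits closer to the lower limit than to the upper one, Cpk comes out smaller than Cp. The two ratios coincide only when the process is centered between the limits, and both rely on the normality assumption discussed above.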
2.4 Data characterization

Commercial software packages such as Aspen Watch™ or Loop Scout™ implement engineering analysis techniques to assess control loop performance. These techniques use the same data as the input to the controller. That is, the sample period of the data used by these methods is the same as that employed by the DCS (e.g., 1 second). The length of the analysis period is on the order of the closed-loop time constant. Consequently, the analysis may span performance over a 20 minute period and involve a data time series with 1200 measurements. Engineering analysis methods require the use of actual rather than archived process data.

Commercial products that can perform the engineering analysis are now beginning to incorporate business analysis tools. Since the data required for the SPC-based business analysis is a subset of the data already collected for engineering analysis, such an extension is possible when data are available for long periods. Among other products, AspenTech's Aspen Watch™ and Expertune's PlantTriage™ provide options for user-defined “key performance indicators” (KPIs) in addition to simple built-in KPIs such as the percent time the controller is ON or the average error variance. Expertune's PlantTriage™ also allows creating templates for benchmarking and offers an option to mark the current or historical performance as a benchmark. It should be noted that the KPIs and capability metrics generated should be interpreted on an appropriate time scale (weeks or greater), although they can be generated for any time scale using these products. This requires expensive data storage infrastructure and subsequent maintenance.

SPC-based techniques use a long time frame (days/months) to calculate quality metrics from closed-loop data. The sample period for the process data used in this type of analysis is much longer than that required for engineering analysis methods. Therefore, archived process data can be used.
Techniques for evaluating process capability and performance indexes from closed-loop data are in place. The problem of dealing with autocorrelated and non-normal data for SPC analysis, however, is a concern (Shore 1998). To date, no major efforts to characterize archived closed-loop data have been undertaken. The data are readily available and no additional cost in the form of plant tests is required. Archived data are thus an underutilized resource, particularly for performing the SPC type of controller assessment. A fundamental tenet of SPC is that the key to achieving process improvement lies in our ability to listen to the data.

2.5 Research Focus

The remainder of this thesis addresses the characterization of archived closed-loop plant data for SPC-type analysis of controller performance. Chapter 3 describes the analytical tool created to characterize closed-loop plant data. Chapter 4 describes the application of the analytical tool to numerous industrial data sets. The results from this work can be applied to identify variability bands in actuating error time series. Methods to detect and interpret error variability bands using histograms, normal probability plots, quantile-quantile plots and autocorrelation function plots are presented. Finally, the effects of controller mode changes on the error distributions are discussed.

3. DATA ANALYSIS TECHNIQUES

Time series plots of closed-loop data immediately give an idea of the center, spread, and certain patterns in the time series such as the presence of outliers or missing data. Depending on the nature of the time series, the plots may also reveal features specific to that time series, such as zero-centering in actuating error or saturation in controller output. For a more detailed study of variability, however, additional statistical tools have to be used. This chapter presents a brief review of the data analysis tools used in this work for closed-loop data characterization.
The statistical analysis techniques have been grouped into the following two categories based on their data treatment:
1. Unordered Analysis: Analysis where the order in which the data occur is ignored. Data grouping (e.g., histograms) or data sorting (e.g., normal probability plots) are examples of unordered analysis techniques.
2. Ordered Analysis: Analysis where the order of the data is not lost by grouping or sorting. The autocorrelation function is an example of an ordered statistical analysis technique.

A graphical user interface (GUI) tool using MATLAB™ (v7, R14) with the statistics toolbox was developed at OSU for performing the data analysis. The tool uses MATLAB™'s extensive plotting capabilities for visual presentation of analysis results. The functionality of the tool is discussed in section 3.3 after an introduction to the statistical analysis techniques employed by the tool.

3.1 Unordered Analysis

Run charts, histograms, probability plots, X and MR control charts, X-bar and R control charts, X-bar and s control charts, process capability analysis, and measurement systems analysis are examples of statistical process control tools used for identification of assignable causes and for continuous process improvement (Hart 2005). Stanton illustrates the use of trend plots and histograms as effective tools in the analysis of process data (Stanton 1990). Miller presents in-plant experiences using histograms and probability plots coupled with X-bar and R charts and Pareto charts for detecting assignable causes of process variation (Miller 1989).

In this section, three unordered analysis techniques are presented: histograms, normal probability plots and quantile-quantile plots. These plots can be generated in MATLAB™ using the built-in functions histfit, normplot and qqplot respectively. MATLAB™ help files for these functions are available in Appendix A.

3.1.1 Histograms

The histogram is the simplest graphical representation of the distribution of a time series.
The histogram is popular because it is uncomplicated and easy to construct. The histogram offers the advantage of consolidating large amounts of data into bins of chosen width, thus revealing the overall features of the time series. The histogram allows for a visual interpretation of many features of the distribution, including mean, standard deviation, range, symmetry and the presence of a peak or heavy tails. Data grouping in histograms is particularly attractive for comparison purposes because we do not want to compare each and every point of the time series but only the general characteristics. A histogram can be used as a powerful visual tool for comparing two distributions, whether we choose to compare the distributions to a standard distribution such as a normal distribution or to each other. In order to facilitate visual comparison of conformity to a normal distribution, a bell curve may be superimposed on the histogram as shown in Figure 3.1.

Figure 3.1: Histogram of the closed-loop error data for the dataset FC1a with superimposed bell curve.

Peakedness and Heavy-tailedness

Heavy-tailedness refers to observed frequency in the histogram beyond that predicted for three standard deviations on either side of the mean when compared to a normal distribution. Peakedness means that there is a spike or peak observed in the histogram around the mean when compared to a normal (Gaussian) distribution. Heavy-tailedness and peakedness may result when two or more distributions overlap, creating an overall composite distribution. The following simulation shows how mixtures of distributions cause heavy tails and a peak in the histogram of the composite distribution. Figure 3.2 presents the histogram for 250,000 normally distributed points with mean 0 and standard deviation 1. Figure 3.3 presents the histogram for 200,000 normally distributed points with mean 0 and standard deviation 3.
The heavy tail and the peak in the composite distribution (Figure 3.4) are a result of the overlapping of the distributions in Figure 3.2 and Figure 3.3.

Figure 3.2: Histogram of 250,000 normally distributed numbers with mean 0 and standard deviation 1.

Figure 3.3: Histogram of 200,000 normally distributed numbers with mean 0 and standard deviation 3.

Figure 3.4: Histogram of the composite distribution.

Disadvantages of histograms

The advantage that the histogram offers through grouping can also be a disadvantage when applied to time series analysis. Unlike the distribution of the heights of a class of students, time series data occur in a particular order. A histogram completely disregards this order, and valuable information could be lost in such grouping. Another disadvantage of histograms is the absence of a standard method of choosing bin size. The bin size is often chosen so as to give the best possible visual representation. While there are recommendations on choosing bin size, there is no consensus on a procedure to choose an optimum bin size. Therefore, visual comparison of the features of histograms must take this limitation into account.

3.1.2 Normal probability plots

Normal probability plots present data together with the probability of their occurrence if sampled from a normal distribution. The normal probability plot, or normplot, is plotted on probability paper for easy interpretation. The y-axis does not have a linear scale, but reflects the probabilities expected from a normal distribution for the corresponding z-scores on the x-axis. For example, the probability of a point having a z-score of −1 or less is 0.159. Similarly, the probability of a point having a z-score of 1 or less is 0.841. These values are obtained from the cumulative distribution function of the normal distribution. If the time series data are normally distributed, the plot will appear linear. Non-normal distributions will introduce curvature in the plot.
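The probabilities quoted for the z-scores can be reproduced directly from the cumulative distribution function of the standard normal distribution. The following minimal sketch uses Python's standard library (`statistics.NormalDist`) instead of MATLAB. It is an illustration added here, not part of the thesis tool, and the variable names are arbitrary.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Cumulative probability P(Z <= z) for a few z-scores; these are the
# values that label the nonlinear probability axis of a normplot.
probabilities = {z: standard_normal.cdf(z) for z in (-2, -1, 0, 1, 2)}
for z, p in probabilities.items():
    print(f"P(Z <= {z:+d}) = {p:.4f}")

# The inverse mapping gives the z-score that belongs to a probability.
z_upper_quartile = standard_normal.inv_cdf(0.75)
print(f"z at cumulative probability 0.75 = {z_upper_quartile:.4f}")
```

Because the probability axis is spaced according to these z-scores, data that are normally distributed fall on a straight line, while any other distribution bends away from it.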
Normal probability plots are used as a tool for graphical normality testing. For a homogeneous distribution, a linear normal probability plot means that the data can be modeled using a normal distribution as the underlying standard distribution.

The MATLAB™ function normplot() has been used for generating normal probability plots. The plot has the sample data displayed with the plot symbol ’+’. Superimposed on the plot is a line joining the first and third quartiles of the data. This line is extrapolated out to the ends of the sample data to help evaluate the linearity of the plot. ‘Normslope’, defined as the slope of this line, can be a useful statistic when the deviation from normality is negligible. For a truly normal distribution, normslope is the standard deviation of the data set.

Figure 3.5 shows a normal probability plot for 5000 points drawn randomly from a normal distribution with mean zero and standard deviation one. The linearity of the normplot indicates that the data come from a normal distribution. Figure 3.6 shows the normplot for a sample of 5000 points drawn from a standard uniform distribution. Notice that a curvature is introduced into the normplot, indicating that the sample is not normally distributed.

Figure 3.5: Normal probability plot of 5000 numbers drawn randomly from a standard normal distribution.

Figure 3.6: Normal probability plot of 5000 numbers randomly drawn from a standard uniform distribution.

3.1.3 Quantile-Quantile plots (QQ plots)

QQ plots are based on the same principle as normal probability plots. Using QQ plots we can compare the distribution of a time series to any reference distribution. The input to a QQ plot consists of two samples: the time series and a reference time series. If the samples come from the same distribution type (same shape), even if one distribution is shifted and rescaled relative to the other (different location and scale parameters), the plot will be linear.
The MATLAB™ function qqplot() has been used to generate the quantile-quantile plots. The plot has the sample data displayed with the plot symbol ’+’. Superimposed on the plot is a line joining the first and third quartiles of each distribution (this is a robust linear fit of the order statistics of the two samples). This line is extrapolated out to the ends of the sample to help evaluate the linearity of the data. The slope of this line, defined as ‘qqslope’, is a useful statistic for comparing the variability of any two distributions. When the qqslope is one, both distributions have the same spread (variability).

Figure 3.7 displays the quantile-quantile plot of a sample X drawn from a normal distribution and a sample Y also drawn from a standard normal distribution. The plot is linear, showing that the data sets were drawn from the same distribution. Figure 3.8 displays the quantile-quantile plot of a sample X drawn from a normal distribution and a sample Y drawn from a uniform distribution between zero and five. A curvature is introduced into the plot, showing that the samples were not drawn from the same distribution.

In a third example, sample X is 5000 points from a uniform distribution between zero and five as shown in Figure 3.9, and sample Y is 5000 points from a uniform distribution between five and fifteen as shown in Figure 3.10. Figure 3.11 displays the quantile-quantile plot of these two samples, X and Y. If the samples come from the same distribution, the plot will be linear, as shown in the graph.

Figure 3.7: QQ plot of 5000 numbers drawn randomly from a standard normal distribution.

Figure 3.8: QQ plot of 5000 numbers drawn randomly from a uniform distribution between zero and five.

Figure 3.9: Histogram of 5000 numbers drawn randomly from uniform distribution between zero and five (series X).

Figure 3.10: Histogram of 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y).
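The qqslope statistic can be illustrated with a short calculation on simulated samples similar to series X and Y. This is a sketch in Python (standard library only), not the MATLAB qqplot() implementation. The function name `qq_slope`, the random seed, and the convention that Y quantiles are plotted against X quantiles are assumptions made for the example.

```python
import random
import statistics


def qq_slope(x, y):
    """Slope of the line through the first- and third-quartile points
    of a QQ plot with quantiles of x on the horizontal axis and
    quantiles of y on the vertical axis."""
    x_q1, _, x_q3 = statistics.quantiles(x, n=4)
    y_q1, _, y_q3 = statistics.quantiles(y, n=4)
    return (y_q3 - y_q1) / (x_q3 - x_q1)


random.seed(0)

# Same shape, different location and scale: uniform on [0, 5] and [5, 15].
x = [random.uniform(0.0, 5.0) for _ in range(5000)]
y = [random.uniform(5.0, 15.0) for _ in range(5000)]
slope_uniform = qq_slope(x, y)

# Two normal samples with standard deviations 1 and 3.
z = [random.gauss(0.0, 1.0) for _ in range(5000)]
w = [random.gauss(0.0, 3.0) for _ in range(5000)]
slope_normal = qq_slope(z, w)

print(f"uniform samples: qqslope = {slope_uniform:.2f}")
print(f"normal samples:  qqslope = {slope_normal:.2f}")
```

The slope for the uniform pair is close to 2, the ratio of the two ranges (10 and 5), and the slope for the normal pair is close to 3, the ratio of the standard deviations. The shift in location between X and Y does not affect the slope.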
Notice that the QQ plot can identify whether the samples are from the same distribution type even if they do not have the same center and spread. The normplots and the QQ plots are computationally more cumbersome than histograms, but are effective visual tools. The most attractive feature of QQ plots is that they use quantiles, which are based on the median and interquartile range rather than the mean and standard deviation. They are therefore considered to be robust to extreme values.

Figure 3.11: QQ plot of 5000 numbers drawn randomly from uniform distribution between zero and five (series X) and 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y).

3.2 Ordered Analysis

3.2.1 Autocorrelation function (ACF)

An important guide to the properties of a time series is provided by a series of quantities called sample autocorrelation coefficients, which measure the correlation between observations at different intervals in the time series. The general formula used for calculating the sample ACF is given as follows (Chatfield 1989). The autocorrelation function (r_k) as a function of lag (k) is given by:

r_k = \frac{c_k}{c_0}    (3.1)

Where, c_k is the autocovariance function, given by:

c_k = \frac{1}{N-k} \sum_{j=1}^{N-k} (x_j - \bar{x})(x_{j+k} - \bar{x})    (3.2)

N = total number of points in the time series
k = lag
x_j = value at the jth point in the time series
\bar{x} = average of the time series

And, c_0 is the variance, given by:

c_0 = \frac{1}{N} \sum_{j=1}^{N} (x_j - \bar{x})^2    (3.3)

The (N−k) pairs of observations used in the calculation of the autocovariance function (equation 3.2) are selected as shown in Table 3.1. A graph of autocorrelation coefficients (r_k) vs lag (k) is known as a correlogram, which is a useful aid in interpreting the autocorrelation coefficients.

Table 3.1: N−k pairs of observations for lag k. Illustration for N = 8 and k = 2.

A correlogram for a normally
The autocorrelation coefficient is one at lag zero, which means that a point is completely correlated with itself. For randomly distributed data, as can be seen from the figure, all autocorrelation coefficients at lags greater than zero are nearly zero. The dotted lines show the 95% confidence interval on the autocorrelation coefficients. This means that for a randomly distributed data set, the probability of an autocorrelation coefficient falling outside these dotted lines is one out of twenty (5%).

Figure 3.14 shows the correlogram for the periodic function x(t) = sin(5t) shown in Figure 3.13. The series consists of 638 points. Figure 3.14 reveals the ability of the ACF to detect cycles in the data. When a time series has a periodic component, it is reflected in the ACF as an oscillation.

The correlogram is a fundamentally different analysis tool when compared to the histogram. While the histogram completely disregards the order in which the time series values occur, the basis of the autocorrelation function is the order of the time series itself. Furthermore, the correlation coefficients are normalized covariances and are, therefore, representations of variability in the time series.

Figure 3.12: ACF plot for 5000 numbers drawn from standard normal distribution.

Figure 3.13: Plot of x(t) = sin(5t) for t = 0 to 2 for 638 data points.

Figure 3.14: ACF plot of x(t) = sin(5t) for t = 0 to 2 for 638 data points.

The correlogram can be used to identify features of a time series that are difficult to capture from the raw trends. The characteristics that can be obtained from a correlogram include:
1. Randomness in series
2. Short term correlation
3. Alternating series
4. Nonstationary (or nonhomogeneous) nature
5. Periodic fluctuations
6. Outliers

While the ACF can be used to characterize the time series in all of the ways listed above, a major disadvantage of the ACF is a lack of uniqueness.
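The sample ACF of equations 3.1 to 3.3 can be sketched in a few lines. The example below is in Python (standard library only) rather than MATLAB and is not the code of the GUI tool. The function name, the test signals, and the approximate 95% limits of ±1.96/√N for a random series are assumptions made for the illustration.

```python
import math
import random


def sample_acf(x, max_lag):
    """Return [r_0, ..., r_max_lag] with c_k averaged over the N - k
    pairs (equation 3.2) and c_0 averaged over N points (equation 3.3)."""
    n = len(x)
    mean = sum(x) / n
    dev = [value - mean for value in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[j] * dev[j + k] for j in range(n - k)) / (n - k)
        acf.append(ck / c0)
    return acf


# Periodic series: a sine wave with a period of 50 samples.
periodic = [math.sin(2.0 * math.pi * j / 50.0) for j in range(500)]
r_periodic = sample_acf(periodic, 60)

# Random series: 5000 independent standard normal numbers.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]
r_noise = sample_acf(noise, 40)
bound = 1.96 / math.sqrt(len(noise))
outside = sum(1 for r in r_noise[1:] if abs(r) > bound)

print(f"periodic: r_25 = {r_periodic[25]:.3f}, r_50 = {r_periodic[50]:.3f}")
print(f"noise: {outside} of 40 coefficients outside +/-{bound:.4f}")
```

For the sine wave the coefficients oscillate with the period of the signal, reaching about −1 at half a period and about +1 at a full period. For the random series only a small fraction of the coefficients is expected to fall outside the approximate limits.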
Although a given time series has a unique ACF, it is usually possible to find many other time series with the same ACF (Jenkins and Watts 1968). Another feature of the ACF is its distortion in the presence of outliers. Every outlier in the time series will cause two extreme coefficients, which will tend to depress the sample coefficients towards zero. A comprehensive review on interpreting the correlogram is given by Chatfield (Chatfield 1989).

3.3 GUI tool for data analysis

Closed-loop data comprise the time series of set point (SP), process variable (PV), controller output (OP) and controller mode. The controller mode indicates the active controller configuration at that time. The state of a controller at any time is defined by one of the following four modes: manual, auto, cascade or B-cascade. These modes distinguish control configurations and therefore the expected data characteristics.

For the analysis of closed-loop data in a multivariable context, simultaneous comparison of SP, PV, OP and controller mode plots is necessary. In addition, comparisons between distributions, between ACFs, or between the time series are needed. For advanced control loops, the set point is changing at all times. Furthermore, for advanced control loops, PV and OP constraints come into play. When dealing with massive amounts of data, keeping track of all the trends simultaneously becomes a tedious task. The GUI tool developed at OSU is a convenient way to tackle these difficulties. It is a broad-based utility tool developed for this analysis. Figures 3.15 and 3.16 are screen shots of some of the features of the tool. The functions used to generate the GUI tool plots are listed in Table 3.2. The capabilities of the tool include:
• Six simultaneous plots, each with an input-series choice and a plot choice.
• Input-series choices include actuating error, PV, SP, OP, SP and OP.
• Plot choices include: time series, histogram with superimposed bell curve, pdf, normalized pdf, Fourier transform, power spectrum, ACF, normplot, QQ plot and boxplot.
• An overlay feature that enables easy comparison.
• Option for outlier removal at ±4 times the standard deviation of the data.
• Data cursor option to read the coordinates on any plot.
• Interactive plot edit tools such as zoom, pan and 3D rotate.

One of the powerful features of the tool is the overlay feature. The overlay feature allows plots from different periods to be superimposed on one another. For example, Figure 3.16(c) shows the ACF for three months (October, November, and December) superimposed on each other. The differences in data characteristics during multiple periods can thus be analyzed simultaneously. The overlay feature can be enabled or disabled using the overlay check box as shown in the figure.

Figure 3.15: Screen shot of GUI tool main screen.

(a) Choose period dialog
(b) Choose series listbox
(c) Overlay feature is used to superimpose ACF plot for three months (October, November, and December) on each other. Overlay feature enables simultaneous visual analysis of various plots during multiple periods.
(d) Choose plot listbox
(e) Data cursor feature
Figure 3.16: Screen shots of features in the GUI tool.

Table 3.2: Description of the functions used in the GUI tool plots.

3.4 Data analysis tools summary

In this chapter, unordered analysis using histograms with superimposed bell curves, normal probability plots and QQ plots, and ordered analysis using the autocorrelation function for assessing variability have been described. These tools have been incorporated into a GUI tool that enables simultaneous plotting and comparison. The respective advantages and limitations of each of these analysis techniques have also been presented.
The next chapter deals with the characterization of industrial closed-loop data sets using these exploratory tools, followed by discussion.

4. CHARACTERIZATION OF CLOSED-LOOP DATA

This chapter starts with a description of the industrial closed-loop data sets that were analyzed using the tool described in Chapter 3. The second part of the chapter presents representative results for some of the data sets. In particular, there is a need to identify and characterize error variability bands over sustained periods of time (days and weeks). The existence of such bands, as discussed in section 4.2, is indicative of assignable causes in SPC terms (section 2.3). Real business improvements can be achieved by eliminating assignable causes. Recognition of the existence of assignable causes is the first step in their elimination.

Methods to reveal trends in error variability are described. The presence of variability trends (error bands) is shown using the time series trends and the histograms. Results using normal probability plots, QQ plots and the ACF are presented to quantitatively identify the variability trends. Finally, the effect of mode changes on actuating error distributions is presented and discussed.

4.1 Data

Archived closed-loop data from a major refinery have been obtained for data characterization studies. These are regenerated compressed data from the plant historian at a sample frequency of one sample per minute (1 min⁻¹). Four sets each of flow, pressure and temperature control loops are available for a period of one year. Each set comprises the time series of set point, process variable, controller output and controller mode. Compression factors for the data are also available. The data sets are summarized in the matrix shown in Table 4.1. The compression factors are summarized in Table 4.2.
Figures 4.1 through 4.3 describe the closed-loop data in more detail. The PV in loops FC1, FC3, PC1, PC2, PC4, TC1, TC2 and TC3 is a manipulated variable in a multivariable controller. The PV in loops FC2, FC4, and TC4 is a controlled variable in a multivariable controller. PC3 is a regulatory loop not used for APC.

Table 4.1: Summary of the information available in the closed-loop data sets

Filename  SP   PV   OP   Mode  Comp  Period
FC1a      yes  yes  yes  yes   yes   Oct 2003 to 21 Mar 2004
FC1b      yes  yes  yes  yes   yes   22 Mar 2004 to Sep 2004
FC2       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004
FC3       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004
FC4       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004
PC1a      yes  yes  yes  yes   yes   Oct 2003 to 21 Mar 2004
PC1b      yes  yes  yes  yes   yes   22 Mar 2004 to Sep 2004
PC2       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004
PC3       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004
PC4       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004
TC1a      no   no   no   yes   yes   Oct 2003 to 21 Mar 2004
TC1b      no   no   no   yes   yes   22 Mar 2004 to Sep 2004
TC2       yes  yes  yes  no    yes   Oct 2003 to Sep 2004
TC3       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004
TC4       yes  yes  yes  yes   yes   Oct 2003 to Sep 2004

SP: Set point, PV: Process variable, OP: Controller output, Mode: Controller mode, Comp: Compression
yes: Data available as time series. no: No data available.

Table 4.2: Compression factors on PV, SP and OP for each data set.

Loop  Type             Zero  Span    PV Comp  SP Comp  OP Comp  Units
FC1   APC-MV           10    21040   50.00                      bpd
FC2   APC-CV           5     100000  500.00   500.00   0.10     bpd
FC3   APC-MV           10    20400   50.00    50.00    0.50     bpd
FC4   APC-CV           5     13510   25.00    25.00    0.50     bpd
PC1   APC-MV           5     70      0.25                       psi
PC2   APC-MV           0     60      0.30     0.10     0.10     psi
PC3   Regulatory loop  0     40      0.16     0.05     0.50     psi
PC4   APC-MV           0     60      0.10     0.30     0.50     psi
TC1   APC-MV           100   300     0.25                       F
TC2   APC-MV           400   2400    0.25     0.25     0.50     F
TC3   APC-MV           10    840     0.50     0.25     0.25     F
TC4   APC-CV           500   300     0.25     1.50     0.50     F

Figure 4.1: Description of flow loops

Figure 4.2: Description of pressure loops

Figure 4.3: Description of temperature loops

4.2 Variability trends in actuating error

The time series of actuating error (PV − SP) over a period of one year for a flow loop (data set FC2) is shown in Figure 4.4.
The deviation in the error value about a mean, often indicated by the standard deviation or the interquartile range (IQR), is also commonly referred to as ‘variability in error’ or ‘error spread’. From Figure 4.4, we can see that the variability in error from Nov. 2003 to Feb. 2004 is noticeably different from the variability in the actuating error from Mar. 2004 to May 2004. If each period of variability is interpreted as a ‘band’, at least two different bands can be identified in Figure 4.4. A similar result is obtained from plotting the time series for all the flow, pressure and temperature loops. By visual inspection, it is possible to identify the presence of multiple bands in eight of the thirteen data sets.

The error variability bands and their corresponding error variability measures (standard deviation and IQR) for all the data sets are summarized in Table 4.3. The number of bands in column two of Table 4.3 was established empirically by visual inspection. Later in the chapter, analogous tables generated analytically will be presented. The mean, standard deviation and interquartile range for each data set were calculated using MATLAB™ built-in functions. A time series that has more than one band is said to be nonstationary in nature. It will also be referred to as a ‘mixture’ when represented as a distribution.

Figure 4.4: Time series of actuating error, FC2 loop from Oct 1, 2003 to Sep 30, 2004. Figure shows presence of different variability bands.

Table 4.3: Error variability bands determined from visual observation.
Loop  #Bands  Band#  Days  Mean  STD     IQR     Units
FC1a  1       1      173   0.20  36.60   49.50   bpd
FC1b  1       1      187   0.20  34.00   45.90   bpd
FC2   2       1      126   6.60  181.50  245.00  bpd
              2      234   0.40  115.00  143.40  bpd
FC3   3       1      328   0.50  89.00   120.70  bpd
              2      17    2.20  123.10  165.00  bpd
              3      15    0.80  125.60  169.20  bpd
FC4   2       1      171   1.20  19.70   26.60   bpd
              2      189   0.00  59.95   82.50   bpd
PC1a  2       1      161   0.00  0.05    0.07    psi
              2      12    0.00  0.09    0.13    psi
PC1b  2       1      112   0.00  0.04    0.06    psi
              2      39    0.00  0.06    0.08    psi
PC2   1       1      360   0.00  0.21    0.29    psi
PC3   3       1      64    0.00  0.02    0.02    psi
              2      250   0.01  0.02    0.02    psi
              3      46    0.00  0.02    0.03    psi
PC4   1       1      360   0.00  0.18    0.24    psi
TC2   4       1      60    0.00  0.82    1.11    °F
              2      33    0.00  0.65    0.88    °F
              3      130   0.01  0.61    0.82    °F
              4      137   0.01  0.93    1.25    °F
TC3   4       1      60    0.01  0.48    0.65    °F
              2      30    0.01  0.37    0.50    °F
              3      171   0.01  0.43    0.58    °F
              4      99    0.02  0.45    0.61    °F
TC4   1       1      360   0.03  0.78    1.02    °F

The time series of actuating error indicates how well the loop keeps the process variable close to the set point. The output time series, on the other hand, indicates the effort expended by the controller. While the absolute value of the output time series does not have a context like the actuating error time series, it is essential that the output always stays within limits or remains ‘unsaturated.’ When the output, measured as a percentage, is above 90% or below 10%, the controller is said to be saturated. It is important to realize that when the output of a controller is saturated, the data are characterized as open-loop (no control) rather than closed-loop. Table 4.4 shows the percentage of the time the output is saturated for each of the loops.

Table 4.4: Percentage output saturation (OP > 90% or OP < 10%).
Loop  %OP Saturation
FC1a  2.62
FC1b  0.11
FC2   0.76
FC3   50.06
FC4   0.91
PC1a  1.72
PC1b  21.04
PC2   0.61
PC3   12.73
PC4   0.10
TC1a  7.49
TC1b  0.00
TC2   0.05
TC3   0.00
TC4   0.01

4.3 Unordered analysis results

4.3.1 Identification of variability bands using histograms

This section presents the graphical identification of error variability bands using the ‘peakedness’ and ‘heavy-tailedness’ in histograms. Figures 4.5 through 4.7 show histograms of the actuating error time series for the FC4, PC3 and TC3 data sets respectively. All three data sets span one year of plant operation (Oct. 1, 2003 to Sep. 30, 2004). In each of the three histograms, it can be seen that the center is located approximately at zero. This is because the actuating error is kept as close to zero as possible by control action. A distinct feature observed in all the histograms is the presence of ‘heavy-tailedness’ and ‘peakedness.’

Figure 4.5: Histogram of actuating error for loop FC4 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.

Figure 4.6: Histogram of actuating error for loop PC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.

Figure 4.7: Histogram of actuating error for loop TC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.

The peakedness and heavy-tailedness observed in the histograms can be explained by considering the distributions over shorter periods of time. Figure 4.8 shows the time series and the histogram for flow loop FC2 from Feb. 6, 2004 to Feb. 13, 2004. Figure 4.9 shows the time series and the histogram for the flow loop FC2 from Feb. 14, 2004 to Feb. 24, 2004. Both time series exhibit a single error band. Likewise, the histograms do not exhibit heavy-tailedness or peakedness. Furthermore, the data in both time series are well represented by the bell curve.
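One way to see why combining periods of different spread produces a peaked, heavy-tailed histogram is to compute the excess kurtosis of a mixture of normal distributions. The sketch below is illustrative only and is not part of the original analysis. It assumes each band is zero-mean and normal, and it borrows the two FC2 band standard deviations and durations from Table 4.3 purely as example inputs; the function name `mixture_excess_kurtosis` is hypothetical.

```python
def mixture_excess_kurtosis(weights, sigmas):
    """Excess kurtosis of a mixture of zero-mean normal distributions.

    weights: relative share of samples in each band (need not sum to 1).
    sigmas:  standard deviation of each band.
    A single normal distribution has excess kurtosis 0; positive values
    indicate a sharper peak and heavier tails than the bell curve.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    # Second and fourth moments of the mixture (each component is N(0, s^2),
    # whose fourth moment is 3 * s^4).
    second = sum(wi * s ** 2 for wi, s in zip(w, sigmas))
    fourth = 3.0 * sum(wi * s ** 4 for wi, s in zip(w, sigmas))
    return fourth / second ** 2 - 3.0


# One band only: behaves like a plain bell curve.
single_band = mixture_excess_kurtosis([1.0], [115.0])

# Two bands weighted by their duration in days (assumed example inputs).
two_bands = mixture_excess_kurtosis([126, 234], [181.5, 115.0])
```

Under these assumptions the single band gives zero excess kurtosis, while the two-band mixture gives a positive value, which is the numerical counterpart of the peak and heavy tails seen in the composite histograms.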
Figure 4.10 shows the time series and the histogram of the composite distribution for the flow loop FC2 from Feb. 6, 2004 to Feb. 24, 2004. It can be seen from the figure that the presence of two error bands in the composite time series translates into a mixed distribution with a peak and heavy tails in the histogram. The deviation from the superimposed bell curve on the histogram indicates that the distribution of the error time series is nonnormal.

Figure 4.11 is the composite distribution for pressure loop PC3 from Dec. 14, 2003 to Mar. 1, 2004. The mixture in the time series and the presence of a peak and heavy tails in the histogram can be observed. These results confirm the visual observation of more than one band in the error time series.

Figure 4.12 is the composite distribution for temperature loop TC3 from Feb. 15, 2004 to Apr. 10, 2004. In this case, the presence of bands in the time series is distinctly visible, and so are the peak and the heavy tails in the histogram. Significant deviations from the bell curve are also observed.

The presence of error variability bands can thus be identified from the peak and the heavy tails in the histograms. However, there is no standard way of determining either the bin size used in the histograms or the parameters used for the superimposed bell curve. This makes it difficult to quantify the peak and heavy tails. The identification using histograms is therefore limited in its utility.

Figure 4.8: Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 13, 2004.

Figure 4.9: Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 14, 2004 to Feb. 24, 2004.

Figure 4.10: Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 24, 2004.

Figure 4.11: Actuating error time series and histogram with superimposed bell curve for the loop PC3 from Dec. 14, 2003 to Mar.
1, 2004.

Figure 4.12: Actuating error time series and histogram with superimposed bell curve for the loop TC3 from Feb. 15, 2004 to Apr. 10, 2004.

4.3.2 Identification of variability bands using Normslope

Deviation from normality can also be judged from the linearity of the normal probability plot. For a distribution that satisfies the normality assumption, the probability of error values occurring within ±1σ of the mean is 68.27%. Normslope, defined as the slope of the normal probability plot, corresponds to the distance from the mean to the points on either side of it that bound 68.27% of the population. It is therefore a measure of the variability of the distribution that considers only the central 68.27% of the population. The normslope can be calculated as the slope of the best linear fit of the normal probability plot. If the data are perfectly normal, then the normslope will be equal to the standard deviation of the data.

Normslope can be used to quantify the error spread over any period. The advantage of using normslope is that the changes in the error variability can be assessed in engineering units. Since the normslope is based on the center 68.27% of the distribution, the spread estimate is not affected by extreme values.

Table 4.5 shows the normslope for all the loops on a month-by-month basis. This table can be used to identify the error variability bands. Each value of the normslope is an indication of the spread in that month. As an example, consider loop FC3. As indicated in Table 4.5, at least three bands can be identified in FC3: Oct. 2003 and Nov. 2003, where the variability in actuating error is about 130 bpd; Jan. 2004 through Apr. 2004, where the variability in the actuating error is about 80 bpd; and Jun. 2004 through Aug. 2004, where the variability in the actuating error is about 95 bpd. A similar result is also obtained from the visual observations.
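The normslope calculation described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the plotting positions `(i + 0.5) / n`, the restriction of the least-squares fit to points with standard normal quantile between -1 and +1 (one reading of "the center 68.27% of the distribution"), and the function name `normslope` are all assumptions made for the example.

```python
from statistics import NormalDist, fmean


def normslope(errors):
    """Slope of a least-squares line fitted to the central part of a
    normal probability plot (data value versus standard normal quantile)."""
    x = sorted(errors)
    n = len(x)
    std_normal = NormalDist()
    # Theoretical standard normal quantile for each ordered observation.
    z = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    # Keep only the central portion of the plot, |z| <= 1 (assumed choice).
    pairs = [(zi, xi) for zi, xi in zip(z, x) if abs(zi) <= 1.0]
    if len(pairs) < 2:
        raise ValueError("not enough points in the central region")
    z_bar = fmean(p[0] for p in pairs)
    x_bar = fmean(p[1] for p in pairs)
    s_zz = sum((zi - z_bar) ** 2 for zi, _ in pairs)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in pairs)
    return s_zx / s_zz


# Idealized normal sample with mean 5 and standard deviation 2.
ideal = [5.0 + 2.0 * NormalDist().inv_cdf((i + 0.5) / 1001) for i in range(1001)]

# Same sample with its largest value replaced by an extreme outlier.
with_outlier = ideal[:-1] + [1.0e6]
```

For the idealized sample the function returns the standard deviation of 2, and replacing the largest value with an extreme outlier leaves the result unchanged because the outlier falls outside the fitted central region. The sample standard deviation of `with_outlier`, by contrast, would be dominated by the outlier.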
The spread estimates generated in Table 4.3 based on visual band identification differ somewhat from the estimates produced by the normslope technique. This is because the normslope considers only the center of the distribution while the statistics in Table 4.3 were generated using the entire time series and may also include outliers.

The normslope is thus a single number, robust to extreme values, that allows for simultaneous comparison of variability in different periods. Furthermore, the technique lends itself to full automation.

Table 4.5: Normslope for each loop calculated monthly. The normslope is a robust measure of the variability in the time series.

4.3.3 Identification of variability bands using qqslope

The qqslope can be used to compare two distributions without any reliance on a standard model. The QQ plot characteristics can be used, for instance, to compare the error distribution for each month to the composite distribution for a full year. The idea is to evaluate the current and long-term controller performance. In this case, the annual composite distribution can be considered as the reference or benchmark distribution.

The qqslope, analogous to normslope, is the ratio of the distance between the last points on the least-squares line through the first and third quartiles of the sampled distribution to the distance between the corresponding points on the least-squares line through the first and third quartiles of the annual (or reference) distribution. This ratio will be an indication of whether the variability of the sampled distribution is greater or smaller than the annual (or reference) distribution. For example, if the qqslope is greater than 1, then the sampled distribution has greater spread than the annual distribution. The qqslope ratio thus allows for comparing the spread of the monthly distributions to the annual distribution.
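A simplified version of the qqslope idea can be sketched as the ratio of interquartile ranges, which equals the slope of the straight line through the first- and third-quartile points of a QQ plot of the sample against the reference. This is an assumed simplification for illustration, not the exact procedure used in this work; the function name `qqslope` and the `inclusive` quantile method are choices made for the example.

```python
from statistics import quantiles


def qqslope(sample, reference):
    """Ratio of the sample interquartile range to the reference
    interquartile range (slope of the quartile line in a QQ plot)."""
    q1_s, _, q3_s = quantiles(sample, n=4, method="inclusive")
    q1_r, _, q3_r = quantiles(reference, n=4, method="inclusive")
    reference_range = q3_r - q1_r
    if reference_range == 0:
        raise ValueError("reference distribution has zero interquartile range")
    return (q3_s - q1_s) / reference_range


# Hypothetical reference ("annual") data and a sample ("monthly") data set
# whose spread is 1.5 times larger.
annual = [float(v) for v in range(-100, 101)]
monthly = [1.5 * v for v in annual]

ratio = qqslope(monthly, annual)
```

Here `ratio` comes out at 1.5, indicating that the sampled period has a wider spread than the reference, and comparing a data set with itself gives 1. No normality assumption is involved, since only the quartiles of the two data sets are used.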
If the annual distribution is considered as the ‘average’ characteristic of the loop, then the monthly distribution will determine the deviation from the annual average. The annual distribution can also be replaced with another distribution if it is desired to compare the performance against a period of known good performance. The annual distribution contains all the seasonal, cyclic changes that occur during a year and is thus a natural composite measure. Another possible extension of qqslope could be a comparison of several years of performance to the current year.

Table 4.6 shows the qqslope for all loops in each month using the annual distribution as the standard. This table can be used to identify the error variability bands. Each value of the qqslope is the ratio of the spread in that month to the variability in the annual composite distribution.

As an example, consider loop FC3. As indicated in Table 4.6, at least three bands can be identified in FC3: Oct. 2003 and Nov. 2003, where the variability in actuating error is about 1.3 to 1.4 times the annual; Jan. 2004 through Apr. 2004, where the variability in the actuating error is about 0.85 times the annual; and Jun. 2004 through Aug. 2004, where the variability in the actuating error is about the same as the annual. This result is consistent with visual observation of the FC3 error time series and the normslope results presented in the previous section.

Table 4.6: The qqslope comparison of error spread: monthly to annual error. Composite annual error distribution is considered as the reference with its qqslope = 1.0

4.4 Summary of unordered analysis results

Nonnormality and the presence of multiple distributions are the two most important characteristics that can be identified from histograms of the time series of actuating error. Nonnormality is indicated by deviation from the bell curve shape. The presence of heavy tails and a peak indicates the presence of mixtures.
Furthermore, the presence of variability bands or mixtures is the cause of heavy-tailedness and peakedness in the histograms. Nonnormality and presence of mixtures are not necessarily independent characteristics. The presence of mixtures is likely to be one of the causes of nonnormality as it causes an overlap of distributions of different spreads. The disadvantage of the peakedness and heavy-tail characteristics of the histogram is that they are not quantifiable.

The normslope is a statistic that can be used to detect error variability bands. The normslope uses the normal probability density function to estimate the variability in a given period. The qqslope is analogous to the normslope but can be used to compare any two distributions without the assumption of underlying normality.

All of these methods to detect error variability bands give similar results that agree with visual observations. However, histograms, normal probability plots, and QQ plots totally disregard order in the time series. Histograms group the data into bins and sort the bins while normal probability plots and QQ plots sort the numbers themselves. As a result, the original order is lost. The next section describes ordered analysis using the autocorrelation function, in which the order of the time series is preserved.

4.5 Ordered analysis results

The autocorrelation function (ACF) uses the information contained in the order of the time series. Therefore, the analysis using the ACF is termed ordered analysis.

4.5.1 Approach to ACF

In this work, the autocorrelation function is not used in the traditional sense. Although the ACF cannot be applied to nonstationary series, it can be used to detect nonstationarity, particularly when there are trends present in the time series. A review of the ACF and its applicability as an exploratory tool to detect nonstationary series is available in the literature (Chatfield 1989). In addition, the ACF is a discrete function and has values only at integer lags.
The figures shown in this work, however, show the ACF as a continuous function. Such a representation is only for visual convenience.

4.5.2 Identification of variability bands using the autocorrelation function (ACF)

The shape of the autocorrelation function is a characteristic of the time series. Figure 4.13 shows the ACF of the FC4 error time series. The ACF shape resembles a damped oscillation. Figure 4.14 shows the ACF for the PC3 error time series, for which the ACF coefficients do not reduce to zero even for very large lag values. Figure 4.15 is the ACF for the TC2 error time series and is markedly different from the FC4 and PC3 ACFs. The above examples show that the shape of the ACF is different for different loops. Features in the time series transform into distinct shapes of the ACF.

Of the many features of the ACF that are of interest, the number of appreciable ACF coefficients is an important one. The ACF coefficients are appreciable when they are statistically different from zero (based on 95% confidence limits on the estimation of ACF coefficients). Appreciable ACF coefficients beyond a certain lag mean there are nonrandom effects in the time series on a time scale greater than the lag. The nonzero ACF coefficients mean that the error is not random but has a deterministic effect (special cause) embedded in it.

Figure 4.13: ACF for FC4, actuating error from 10/1/03 to 9/30/04.

Figure 4.14: ACF for PC3, actuating error from 10/1/03 to 9/30/04.

Figure 4.15: ACF for TC2, actuating error from 10/1/03 to 9/30/04.

The ACF lag is markedly different for the flow, pressure and temperature loops. Even within each loop, the ACF shape could be different from month to month as the process and the controller effectiveness change. Table 4.7 shows the ACF lag for each loop for each month.
The ACF lag is generated by calculating the ACF coefficients up to a lag of 180 min for each loop and then determining the number of ACF coefficients that are appreciable. The confidence limits for the ACF coefficients are given by $-\frac{1}{n} \pm \frac{2}{\sqrt{n}}$ (Chatfield 1989). All ACF coefficients that are outside these limits are nonzero (or appreciable).

Table 4.7 can be used to check the randomness in the error time series. If the number of appreciable ACF coefficients is high in any period, it means that a deterministic effect is in play. Reconsider the FC3 example. As indicated in Table 4.7, the number of appreciable ACF coefficients is unusually high for Oct. 2003 through Dec. 2003 when compared to the rest of the months. This indicates that there is a nonrandom effect in the FC3 data from Oct. 2003 to Dec. 2003. Visual observation of the error time series for FC3 from Oct. 2003 through Sep. 2004 confirms the increased variability from Oct. 2003 to Dec. 2003.

The ACF coefficients cannot distinguish the bands in error variability as the normslope or the qqslope do. This might be because the ACF considers the entire time series, including its order, whereas the unordered analysis estimates the variability based on a portion of the data.

Table 4.7: Number of appreciable ACF coefficients (with a maximum of 181) in the error time series for each data set by month.

The autocorrelation function is a measure of the randomness of a time series. When there is a change in the variability of the series, the process has changed. If this change is statistically significant, it shows up as a higher number of appreciable autocorrelation coefficients. The number of appreciable autocorrelation coefficients can be used to detect the presence of multiple distributions in a time series.
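The counting of appreciable ACF coefficients can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the sample ACF uses the usual biased estimator, lags 0 through `max_lag` are all counted (which is one way to arrive at a maximum of 181 for a 180-lag calculation), and the function names `sample_acf` and `count_appreciable` are hypothetical.

```python
from math import pi, sin, sqrt


def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def count_appreciable(x, max_lag=180):
    """Number of ACF coefficients outside the approximate 95% limits
    -1/n +/- 2/sqrt(n) (Chatfield 1989). Lag 0 is included in the count
    (an assumption made for this sketch)."""
    n = len(x)
    lower = -1.0 / n - 2.0 / sqrt(n)
    upper = -1.0 / n + 2.0 / sqrt(n)
    return sum(1 for r in sample_acf(x, max_lag) if r < lower or r > upper)


# A slow deterministic cycle: every coefficient up to lag 20 is appreciable.
cycle = [sin(2.0 * pi * t / 400.0) for t in range(2000)]
n_cycle = count_appreciable(cycle, max_lag=20)
```

For the slow cycle used here all 21 coefficients (lags 0 through 20) fall outside the limits, which is the signature of a deterministic effect. For a purely random series, only the lag-0 coefficient and an occasional chance exceedance would be expected to be counted.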
The ACF-based method, however, differs from the unordered analysis in that it considers all the data, the order in which the data occur, and whether the changes in variability are statistically significant. The next section deals with the effect of mode changes on the actuating error distributions.

4.6 Effect of controller mode change results

The identification of error variability bands using unordered and ordered techniques was presented in the previous sections. These error variability bands indicate the presence of assignable causes responsible for changes in the controller performance. This section deals with the impact of controller mode change, a known assignable cause, on error variability bands.

4.6.1 Controller modes

Operators may switch control loop configurations (by turning off an APC controller or operating in open-loop mode) when controller performance is unsatisfactory or for maintenance or tuning purposes. The control configurations available to the operator depend on the nature of the loop and the control strategy. Changes in controller configuration are known assignable causes that can produce changes in the width of error variability bands.

The controller mode indicates the active controller configuration at that time. Error variability trends in different controller modes thus indicate the performance of the controller in their respective configurations. A change in the controller mode implies a change in the way the process variable is being controlled or manipulated. This change may translate into a change in the width of error variability bands.

For the industrial closed-loop data sets used in this work, the state of a controller at any time is defined by one of the following four modes: manual, auto, cascade or B-cascade. Each of the controller modes is described by illustration in Figure 4.16. Table 4.8 shows the number of days each of the loops in the individual data sets described in Section 4.1 was in manual, auto, cascade or B-cascade mode.
The table also includes the number of error variability bands observed for each loop.

Figure 4.16: Illustration of typical controller configurations and corresponding controller modes.

Table 4.8: Number of days in each controller mode for all data sets. The number of error bands is the same as listed in Table 4.3.

Loop  Manual  Auto    Cascade  B-Cascade  Total   # of Error
      (Days)  (Days)  (Days)   (Days)     (Days)  Bands
FC1a  0.0     16.9    156.7    0.0        173.6   1
FC1b  0.3     4.2     186.9    0.0        191.4   1
FC2   1.5     0.2     363.3    0.0        365.0   2
FC3   1.2     26.2    337.6    0.0        365.0   3
FC4   0.0     0.0     365.0    0.0        365.0   2
PC1a  0.0     16.4    157.2    0.0        173.6   2
PC1b  0.0     3.9     187.5    0.0        191.4   2
PC2   2.7     2.2     321.9    38.2       365.0   1
PC3   0.0     365.0   0.0      0.0        365.0   3
PC4   2.8     0.1     362.1    0.0        365.0   1
TC1a  0.2     173.4   0.0      0.0        173.6   -
TC1b  0.0     191.4   0.0      0.0        191.4   -
TC3   0.0     142.1   222.9    0.0        365.0   4
TC4   0.0     365.0   0.0      0.0        365.0   1

Consider a controller whose process variable is configured as an APC manipulated variable. In cascade mode, the controller receives its setpoint from an APC controller, and in auto mode the controller receives its setpoint from an operator. Setpoint changes made by the operator in auto mode are relatively infrequent and the process usually completes its response to the setpoint change. As a result, the error variability is primarily due to process disturbances. In cascade mode, however, the setpoint is being changed by the APC controller. A new setpoint change occurs before the process completely responds to the previous change, which in turn affects the error variability. As a result, the error variability is attributable not only to process disturbances but also to additional variability introduced by the APC controller.

The following four cases illustrate the effect of controller mode changes from cascade (APC on) to auto (APC off) or vice versa on actuating error variability of APC-MVs.
Case 1 - Comparison of TC3 error variability in cascade and auto modes:

The process variable for data set TC3 is the outlet temperature of a furnace and is configured as a manipulated variable in an APC controller. The configuration of TC3 is shown in Figure 4.17.

Figure 4.17: Configuration of TC3. PV (°F) is an APC MV. The output from the APC controller is the set point to the TC.

Two periods, with the controller in cascade and auto modes, respectively, are selected for error variability comparison. Figure 4.18 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop TC3 from Oct. 1, 2003 to Oct. 6, 2003 (red) and from Oct. 21, 2003 to Oct. 26, 2003 (blue). The controller is in auto mode during the first period (red) and in cascade mode in the second (blue). For TC3, which is an APC-MV, the configurations in auto and cascade controller modes can be described as follows:

Cascade: Output from the APC controller is the setpoint to the TC loop.
Auto: Operator provides the setpoint to the TC loop.

As expected, the error variability in the auto mode is less than the variability in cascade mode for loop TC3. The constant setpoint changes in cascade mode, generated by the APC controller, introduce additional variability in the TC3 actuating error. This is confirmed by the histogram as well as the time series in Figure 4.18.

Figure 4.18: TC3 cascade from Oct. 21, 2003 to Oct. 26, 2003 (blue) and auto from Oct. 1, 2003 to Oct. 06, 2003 (red). Different error variability in different controller modes.

Case 2 - Comparison of FC1a error variability in cascade and auto modes:

The process variable for data set FC1a is the flow rate of a side draw from a fractionation column. FC1a is configured as a manipulated variable in an APC controller as shown in Figure 4.19. Two periods, with the controller in cascade and auto modes respectively, are selected for error variability comparison.
Figure 4.20 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop FC1a from Oct. 1, 2003 to Oct. 9, 2003 (blue) and Oct. 14, 2003 to Oct. 19, 2003 (red). The controller is in the auto mode during the first period (blue) and in cascade mode in the second (red). For FC1a, which is an APC-MV, the configurations in auto and cascade controller modes can be described as follows:

Cascade: Output from the APC controller is the setpoint to the FC loop.
Auto: Operator provides the setpoint to the FC loop.

Figure 4.19: Configuration of FC1a. PV (bpd) is an APC MV. The output from the APC controller is the set point to the FC.

Figure 4.20: FC1a auto from Oct. 1, 2003 to Oct. 9, 2003 (blue) and cascade from Oct. 14, 2003 to Oct. 19, 2003 (red). Same error variability in different controller modes.

It can be observed from the error distribution and the actuating error time series that the error variability is not very different in the auto and cascade modes. In this case, the closed-loop dynamics of the flow control loop are sufficiently fast to be completed well within the 1-min APC sample time. Therefore, a noticeable increase in variability was not introduced in cascade mode by the APC controller as in Case 1.

Case 3 - Comparison of PC2 error variability in B-cascade and cascade modes:

The process variable for data set PC2 is the fuel gas pressure in the inner loop of a cascade temperature controller TC for adjusting the furnace outlet temperature as shown in Figure 4.21. The temperature controller is an APC-MV. When the APC controller is on, both the temperature controller and PC2 are in cascade mode. When the APC controller is off, the temperature controller is in auto mode and PC2 is in B-cascade mode. For PC2, the configurations in B-cascade and cascade controller modes can be described as follows:

Cascade: Output from the APC controller is the setpoint to the TC loop. The output from the TC loop is the setpoint to PC2.
B-Cascade: APC controller is turned off. Regular cascade arrangement. Operator provides the setpoint to the TC loop. The output from the TC loop is the setpoint to PC2.

Figure 4.22 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop PC2 from Oct. 1, 2003 to Oct. 28, 2003 (blue) and from Jan. 22, 2004 to Mar. 5, 2004 (red). The controller is in B-cascade mode during the first period (blue) and in cascade mode in the second (red).

Figure 4.21: Configuration of PC2. PV (psig) is an APC MV. The output from the APC controller is the set point to the PC.

Figure 4.22: PC2: B-cascade from Oct. 1, 2003 to Oct. 28, 2003 (blue) and cascade from Jan. 22, 2004 to Mar. 5, 2004 (red). Mixture indiscernible.

It is observed from the error histogram and the actuating error time series that error variability is not very different in the B-cascade and cascade modes. In this case, there is little change in the output variability of the temperature controller (setpoint to PC2) in the cascade mode when compared to the variability in auto mode. This result means that, in this case, there was no appreciable change in the performance of the secondary controller whether the setpoint to the primary controller was set by the operator or APC.

Case 4 - Impact of setpoint variability introduced by APC on TC3 actuating error:

TC3 is an APC-MV whose configuration was described in Case 1. In this case, the TC3 controller is in the cascade mode at all times in the period selected. Figure 4.23 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop TC3 from Feb. 15, 2004 to Feb. 17, 2004 (blue) and Feb. 17, 2004 to Feb. 28, 2004 (red). The blue and red periods are chosen such that the setpoint variability in the blue period is higher than the setpoint variability in the red period.
Towards the end of the blue period, it can be seen from the time series plots of the setpoint and the actuating error that reduced setpoint variability translates into reduced error variability. This explains the presence of two error variability bands in the blue period. The red ACF has appreciable coefficients at higher lags, which indicates the presence of a nonrandom effect, which in this case is the presence of outliers. The blue ACF shows a cycle, which is evidence of a latent cycle in the data. The time series, the histogram and the appreciable ACF coefficients all indicate greater variability in the blue period. This result confirms the observations from previous cases that setpoint variability has translated into actuating error variability.

Figure 4.23: Impact of setpoint variability introduced by APC on TC3 actuating error. TC3 from Feb. 15, 2004 to Feb. 28, 2004.

Based on the four different cases presented, it is observed that setpoint variability may translate into actuating error variability. Even though setpoint variability can vary depending on APC control action (Case 4), a controller mode change generally produces a change in the setpoint variability (Cases 1, 2 and 3) due to the change in the controller configuration. Therefore, controller mode changes can have a significant impact on actuating error variability and business analysis metrics such as process capability, Cp, or process performance, Pp. In particular, the setpoint variability could be an important factor when assessing error variability of PVs that are APC-MVs.

4.7 Discussion of data analysis results

Actuating error, the key variable for SPC-based performance assessment, shows different bands of variability when considered over a long period of time. These error variability bands indicate the presence of assignable causes that are responsible for changes in controller performance.
Methods to identify error variability bands using two approaches, ordered and unordered analysis, have been presented in this chapter. Unordered analysis totally disregards the order of the time series and involves grouping or sorting of the data. Histograms, normal probability plots and QQ plots are examples of unordered analysis. Ordered analysis considers the order in which the data occur in the time series. The autocorrelation function is an example of ordered analysis. Error variability band identification using ordered and unordered analysis can be summarized as follows:

Histograms: A histogram with a superimposed bell curve can be used as a visual tool to identify error variability bands. The presence of heavy tails and a peak compared to the superimposed bell curve on the histogram indicates the presence of multiple distributions. The multiple distributions are a direct result of the presence of variability bands. The disadvantage of the histogram is that it is difficult to quantify the heavy-tail and peak characteristics. Histograms of the error distributions also reveal that the distributions can be significantly nonnormal in the presence of bands. While there are many reasons for nonnormality, the presence of mixtures itself introduces some degree of nonnormality.

Normslope: The normslope, defined as the slope of the normal probability plot, is a quantitative measure that can be used to detect error variability bands. If the data are perfectly normal, then the normslope will be equal to the standard deviation of the data. The advantage of using normslope is that the changes in the error variability can be assessed in engineering units. The normslope is based on the center 68.27% of the distribution. Therefore, the spread estimate is not affected by extreme values. The normslope is a single number, robust to extreme values, that allows for simultaneous comparison of variability in different periods.
qqslope: The qqslope, a quantitative measure analogous to normslope, is the ratio of the distance between the last points on the least-squares line through the first and third quartiles of the sampled distribution to the distance between the corresponding points on the least-squares line through the first and third quartiles of the reference distribution. This ratio will be an indication of whether the variability of the sampled distribution is greater or smaller than the reference distribution. The qqslope can be used to detect error variability bands without any reliance on a standard model.

ACF: The autocorrelation function, which is an ordered analysis technique, determines the randomness of a time series. When there is a change in the variability of the series, the process has changed. If this change is statistically significant, it shows up as a change in the number of appreciable autocorrelation coefficients. This change can be used to detect the presence of multiple distributions. This method, however, differs from the unordered analysis in that it considers all the data, the order in which the data occur, and whether the changes in variability are statistically significant.

Visual and quantitative methods to detect error variability bands have been presented in the first part of this chapter. All the unordered methods to detect error variability bands give similar results that agree with visual observations. Histograms are easy to compute but are limited in their utility since the heavy-tail and peak characteristics are difficult to automate. Normslope and qqslope are similar techniques that are capable of full automation. For the data sets considered, both normslope and qqslope are effective ways to detect error variability bands. The bands identified using normslope and qqslope are in excellent agreement with visual observations. Normslope and qqslope are also robust to outliers since they both consider the center of the distribution.
The ACF is a complementary tool for detecting error variability bands, deterministic effects in the time series or latent cycles in the time series. The calculation of the ACF can also be fully automated. The ACF, however, is not robust to outliers. The sample ACF is also used in most commercial products as an engineering analysis tool to estimate the finite impulse response of the process. The use of histograms, probability plots and the ACF for error band identification is unique to this work. Although normslope, qqslope and the ACF can be calculated for any time scale, applying these techniques to short time periods is not recommended, as it falls outside the realm of SPC analysis. Archived data are not suitable for analysis on a short time scale.

Changes in controller configuration (controller modes) are assignable causes that can change the performance of the controller. The impact of changing the controller mode on error variability bands is presented in the second part of this chapter. The controller mode indicates the controller configuration that is active at a given time. Case studies on controller mode changes show that setpoint variability may translate into actuating error variability. Even though setpoint variability can vary depending on APC control action, a controller mode change generally produces a change in the setpoint variability due to the change in the controller configuration. Therefore, controller mode changes can have a significant impact on actuating error variability and on business analysis metrics such as process capability, Cp, or process performance, Pp. In particular, setpoint variability could be an important factor when assessing the error variability of PVs that are APCMVs.

5. CONCLUSIONS AND FUTURE WORK

This work focuses on the characterization of closed-loop archived data for use in SPC-based analysis for controller performance assessment.
Twelve closed-loop industrial data sets obtained from a petroleum refinery were used in the analysis. The contributions of this work include:

1. Development of a graphical user interface (GUI) tool for data analysis. The capabilities of the GUI tool include:
• Six simultaneous plots, each with an input-series choice and a plot choice.
• Input-series choices: actuating error, process variable, setpoint, output, change in setpoint and change in output.
• Plot choices: time series, histogram with superimposed bell curve, pdf, normalized pdf, Fourier transform, power spectrum, ACF, normplot, qqplot and boxplot.
• An overlay feature, one of the most powerful features of the tool, which superimposes plots from different time periods on one another so that differences in data characteristics across periods can be analyzed simultaneously.
• Option for outlier removal at ±4 times the standard deviation of the data.
• Data cursor option to read the coordinates on any plot.
• Interactive plot edit tools such as zoom, pan and 3-D rotate.
2. Application of the GUI tool to 12 industrial data sets for characterization studies. The conclusions are summarized in Section 5.1.
3. Demonstration of the ability to identify error variability bands in the closed-loop data sets using histograms, normslope, qqslope and the sample autocorrelation function.
4. Demonstration, through case studies, of the effect of APC controllers on the error variability of APC manipulated variables.

Recommendations for future work are summarized in Section 5.2.

5.1 Conclusions

Actuating error variability is the key variable for controller performance assessment. Changes in the error variability indicate changes in controller effectiveness. Different levels of variability during different periods in the time series are termed error variability bands.
Error variability bands are common in the actuating error time series of manipulated variables in an advanced process controller (APCMVs) when considered over a long period of time. Eight of the twelve data sets analyzed are APCMVs. Of these, five APCMVs (FC3, PC1, TC1, TC2 and TC3) contain multiple error variability bands. These error variability bands imply the presence of non-homogeneity in the closed-loop data. Since SPC-based metrics involve strong assumptions about homogeneity and normality of data, the implications of the presence of error variability bands for SPC metrics cannot be ignored.

Actuating error distributions for the five APCMVs which contain error variability bands are non-normal. While there are many reasons for non-normality, the presence of variability bands causes some degree of deviation from normality. However, the actuating error series of APCMVs not containing error variability bands, and other time series in time periods devoid of bands, were well approximated by the normal distribution. This further emphasizes that the SPC performance metrics should be applied only within individual bands.

Histogram, normslope, qqslope and sample ACF are the four methods proposed in this work for the identification of error variability bands. Normslope and qqslope are similar statistics that are capable of full automation. For all the data sets considered, both normslope and qqslope are effective ways to detect error variability bands. The bands identified using normslope and qqslope are in agreement with visual observations. Normslope and qqslope are also robust to outliers since they consider the center of the distribution. The ACF is a complementary tool for detecting error variability bands, deterministic effects or latent cycles in the time series. The calculation of the ACF can also be fully automated. The ACF, however, is not robust to outliers.
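Counting appreciable autocorrelation coefficients can be automated along the following lines. This Python sketch is illustrative only: the significance bound of ±2/√N and the default maximum lag of 180 are assumptions chosen for the example, not values confirmed by this section.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def count_appreciable(x, max_lag=180, z=2.0):
    """Number of lags k >= 1 whose coefficient lies outside +/- z / sqrt(N)."""
    bound = z / math.sqrt(len(x))
    r = sample_acf(x, max_lag)
    return sum(1 for k in range(1, max_lag + 1) if abs(r[k]) > bound)


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]             # no structure
    wave = [math.sin(2 * math.pi * t / 50) for t in range(1000)]   # latent cycle
    print("appreciable lags, white noise:", count_appreciable(noise, max_lag=50))
    print("appreciable lags, sine wave  :", count_appreciable(wave, max_lag=50))
```

A near-random series yields only a few appreciable coefficients, whereas a series with a latent cycle yields many, so a change in this count from one period to the next can flag a change in the process.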
Case studies also show that setpoint variability introduced by APC controllers can be translated into actuating error variability for APCMVs which have relatively slow dynamics (temperature loops). It is therefore not sufficient to base performance metrics on actuating error variability alone. The closed-loop data should be used collectively to provide greater context. The GUI tool developed at OSU as a part of this work is an excellent tool for such simultaneous analysis and was used for all the case studies in this work.

5.2 Future Work

The scope of this work has been primarily data-driven. Therefore, the extension of these results to similar data sets is limited. A theoretical approach would facilitate the extension of the results to similar control configurations. The actuating error (process variable minus setpoint) distribution is a function of the joint distribution of process variable and setpoint and of the relationship between process variable and setpoint (through the controller output). Analytical expressions for the error probability density function (distribution model), generated from several possible process variable and setpoint probability density functions, will help in understanding the nature of the error distributions for different base-case process variable and setpoint distributions.

A valuable addition to this work would be the study of data handling procedures. Data compression is a major issue when using archived closed-loop data. Excessive data compression is a concern as it could compromise closed-loop data characteristics. More study is needed in this area to understand the effect of compression, particularly on error variability bands. Similarly, an understanding of the minimum sampling frequency required for SPC-based analysis has several potential benefits for data handling. The amount of data required would be greatly reduced with such an understanding.
For instance, one year's worth of data sampled at 1 min would be 500,000+ data points. If a sampling period of 2 min achieved the same result, only 250,000+ data points would need to be handled. With hundreds of loops over long periods of time in question, using the minimum sampling frequency greatly simplifies data handling problems.

APPENDIX

This appendix lists the MATLAB™ function help for the built-in MATLAB™ functions used in the GUI tool discussed in Section 3.3. The help files are taken from the MATLAB™ documentation.

Histfit

HISTFIT Histogram with superimposed fitted normal density.
HISTFIT(DATA,NBINS) plots a histogram of the values in the vector DATA using NBINS bars in the histogram. With one input argument, NBINS is set to the square root of the number of elements in DATA.
H = HISTFIT(DATA,NBINS) returns a vector of handles to the plotted lines. H(1) is a handle to the histogram, H(2) is a handle to the density curve.

Boxplot

BOXPLOT Display boxplots of a data sample.
BOXPLOT(X) produces a box and whisker plot with one box for each column of X. The boxes have lines at the lower quartile, median, and upper quartile values. The whiskers are lines extending from each end of the boxes to show the extent of the rest of the data. Outliers are data with values beyond the ends of the whiskers.
BOXPLOT(X,G) produces a box and whisker plot for the vector X grouped by G. G is a grouping variable defined as a vector, string matrix, or cell array of strings.
G can also be a cell array of several grouping variables (such as {G1 G2 G3}) to group the values in X by each unique combination of grouping variable values.
BOXPLOT(...,'PARAM1',val1,'PARAM2',val2,...) specifies optional parameter name/value pairs:
  'notch'        'on' to include notches (default is 'off').
  'symbol'       Symbol and color to use for all outliers (default is 'r+').
  'orientation'  Box orientation, 'vertical' (default) or 'horizontal'.
  'whisker'      Maximum whisker length (default 1.5).
  'labels'       Character array or cell array of strings containing labels for each column of X, or each group in G.
  'colors'       A string or a three-column matrix of box colors. Each box (outline, median line, and whiskers) is drawn in the corresponding color. Default is to draw all boxes with blue outline, red median, and black whiskers. Colors are recycled if necessary.
  'widths'       A numeric vector or scalar of box widths. Default is 0.5, or slightly smaller for fewer than three boxes. Widths are recycled if necessary.
  'positions'    A numeric vector of box positions. Default is 1:n.
  'grouporder'   When G is given, a character array or cell array of group names, specifying the ordering of the groups in G. Ignored when G is not given.
In a notched box plot the notches represent a robust estimate of the uncertainty about the medians for box-to-box comparison. Boxes whose notches do not overlap indicate that the medians of the two groups differ at the 5% significance level. Whiskers extend from the box out to the most extreme data value within WHIS*IQR, where WHIS is the value of the 'whisker' parameter and IQR is the interquartile range of the sample.
H = BOXPLOT(...) returns the handle H to the lines in the box plot. H has one column per box, consisting of the handles for the various parts of the box. Each column contains 7 handles for the upper whisker, lower whisker, upper adjacent value, lower adjacent value, box, median, and outliers.
Example: Box plot of car mileage grouped by country
  load carsmall
  boxplot(MPG, Origin)
  boxplot(MPG, Origin, 'sym','r*', 'colors',hsv(7))
  boxplot(MPG, Origin, 'grouporder', ...
          {'France' 'Germany' 'Italy' 'Japan' 'Sweden' 'USA'})

hist

HIST Histogram.
N = HIST(Y) bins the elements of Y into 10 equally spaced containers and returns the number of elements in each container. If Y is a matrix, HIST works down the columns.
N = HIST(Y,M), where M is a scalar, uses M bins.
N = HIST(Y,X), where X is a vector, returns the distribution of Y among bins with centers specified by X. The first bin includes data between -inf and the first center and the last bin includes data between the last bin and inf. Note: Use HISTC if it is more natural to specify bin edges instead.
[N,X] = HIST(...) also returns the position of the bin centers in X.
HIST(...) without output arguments produces a histogram bar plot of the results. The bar edges on the first and last bins may extend to cover the min and max of the data unless a matrix of data is supplied.
HIST(AX,...) plots into AX instead of GCA.
Class support for inputs Y, X: float: double, single

fft

FFT Discrete Fourier transform.
FFT(X) is the discrete Fourier transform (DFT) of vector X. For matrices, the FFT operation is applied to each column. For N-D arrays, the FFT operation operates on the first non-singleton dimension.
FFT(X,N) is the N-point FFT, padded with zeros if X has less than N points and truncated if it has more.
FFT(X,[],DIM) or FFT(X,N,DIM) applies the FFT operation across the dimension DIM.
For length N input vector x, the DFT is a length N vector X, with elements
\[ X(k) = \sum_{n=1}^{N} x(n) \exp\left( -j \, 2\pi \, \frac{(k-1)(n-1)}{N} \right), \qquad 1 \le k \le N. \]
The inverse DFT (computed by IFFT) is given by:
\[ x(n) = \frac{1}{N} \sum_{k=1}^{N} X(k) \exp\left( j \, 2\pi \, \frac{(k-1)(n-1)}{N} \right), \qquad 1 \le n \le N. \]

qqplot

QQPLOT Display an empirical quantile-quantile plot.
QQPLOT(X) makes an empirical QQ-plot of the quantiles of the data set X versus the quantiles of a standard Normal distribution.
QQPLOT(X,Y) makes an empirical QQ-plot of the quantiles of the data set X versus the quantiles of the data set Y.
H = QQPLOT(X,Y,PVEC) allows you to specify the plotted quantiles in the vector PVEC. H is a handle to the plotted lines.
When both X and Y are input, the default quantiles are those of the smaller data set.
The purpose of the quantile-quantile plot is to determine whether the sample in X is drawn from a Normal (i.e., Gaussian) distribution, or whether the samples in X and Y come from the same distribution type. If the samples do come from the same distribution (same shape), even if one distribution is shifted and rescaled from the other (different location and scale parameters), the plot will be linear.

normplot

NORMPLOT Displays a normal probability plot.
H = NORMPLOT(X) makes a normal probability plot of the data in X. For matrix X, NORMPLOT displays a plot for each column. H is a handle to the plotted lines.
The purpose of a normal probability plot is to graphically assess whether the data in X could come from a normal distribution. If the data are normal the plot will be linear. Other distribution types will introduce curvature in the plot.

VITA

Anand Vennavelli
Candidate for the Degree of Master of Science

Thesis: CHARACTERIZATION OF CLOSED-LOOP PROCESS VARIABLE DATA
Major Field: Chemical Engineering
Biographical:
Personal Data: Born in February 1981 in Hyderabad, India.
Education: Graduated with Bachelor of Technology degree in Chemical Engineering from Osmania University, Hyderabad, India, in May 2002; completed the requirements for the Master of Science degree with a major in Chemical Engineering at Oklahoma State University in December 2006.
Experience: Worked as project assistant at the Indian Institute of Chemical Technology (IICT), Hyderabad, India, 2002-2003. Employed by Oklahoma State University, School of Chemical Engineering, as a research assistant, 2003-present.
Worked as a summer intern at the ConocoPhillips refinery, San Francisco, CA as an advanced controls engineer, summer of 2006.
Professional Memberships: Student member of AIChE and ASQ.

Name: Anand Vennavelli
Date of Degree: December 2006
Institution: Oklahoma State University
Location: Stillwater, Oklahoma
Title of Study: CHARACTERIZATION OF CLOSED-LOOP PROCESS VARIABLE DATA
Pages in Study: 92
Candidate for the Degree of Master of Science
Major Field: Chemical Engineering

Scope and Method of Study: "Business analysis methods" for controller performance assessment address management's view of control systems as assets to be managed. These techniques utilize a long-term (weeks or months) view of controller performance with a business emphasis on continuous quality improvement, identification of best practices, and allocation of limited resources for control system maintenance. Business analysis methods are implemented using statistical process control (SPC) and "six-sigma" principles. The focus of this thesis is on the characterization and analysis of data used by business analysis methods for controller performance assessment. To automate the analysis of large industrial closed-loop data sets, a graphical user interface tool using MATLAB™ has been developed.

Findings and Conclusions: This work focuses on the characterization of closed-loop archived data, primarily for use in SPC-based analysis for controller performance assessment. Plots of the closed-loop data sets for the advanced process control manipulated variables (APCMVs) exhibit different levels of variability when considered over a long period of time (one year). These periods of variability are termed "error variability bands." Changes in the error variability bands are attributable to assignable causes responsible for changes in controller performance.
Automatic identification of the error variability bands provides the starting point for further diagnosis and elimination of assignable causes that can lead to real business improvement. This thesis presents four error variability band identification techniques using general-purpose statistical tools including histograms, normal probability plots, quantile-quantile plots and the sample autocorrelation function. The performance of these methods is presented using archived refinery data reconstructed on a one-minute sample period for flow, pressure, and temperature loops. The impact of setpoint variability on APC manipulated variables is also illustrated.

ADVISER'S APPROVAL: James R. Whiteley
Title  Characterization of Closed-loop Process Variable Data
Date  2006-12-01
Author  Vennavelli, Anand N. 
Department  Chemical Engineering 
Full Text Type  Open Access 
Abstract  Business analysis methods for controller performance assessment are implemented using statistical process control (SPC) and "six-sigma" principles. This work focuses on the characterization of closed-loop archived data, primarily for use in SPC-based analysis for controller performance assessment. Closed-loop data sets for the advanced process control manipulated variables (APCMVs) exhibit different levels of variability when considered over a one-year period. These periods of variability are termed "error variability bands." This thesis presents four error variability band identification techniques using general-purpose statistical tools including histograms, normal probability plots, quantile-quantile plots and the sample autocorrelation function. The performance of these methods is presented using archived refinery data reconstructed on a one-minute sample period for flow, pressure, and temperature loops. The impact of setpoint variability on APC manipulated variables is also illustrated.
Note  Thesis 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
Transcript  CHARACTERIZATION OF CLOSEDLOOP PROCESS VARIABLE DATA By ANAND VENNAVELLI Bachelor of Technology Osmania University Hyderabad, India 2002 Submitted to the Faculty of the Graduate college of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE December, 2006 CHARACTERIZATION OF CLOSEDLOOP PROCESS VARIABLE DATA Thesis Approved: Dr. James R. Whiteley (Thesis Adviser) Dr. Russell R. Rhinehart Dr. Karen A. High Dr. Gordon A. Emslie (Dean of the Graduate college) ii ACKNOWLEDGMENTS I would like to express my sincere gratitude to my graduate adviser Dr. Rob Whiteley for his constant support, guidance and motivation. I would also like to thank my graduate advisory committee members Dr. Russell Rhinehart and Dr. Karen High for their valuable inputs and suggestions. Heartfelt thanks to my family members for their unconditional support and encouragement to pursue my interests, even when the interests went beyond boundaries of language and geography. Kudos to all my friends and roomies. You have just been through your worst nightmare! iii TABLE OF CONTENTS Chapter Page 1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Types of controller performance assessment . . . . . . . . . . . . . . . . . 1 1.1.1 Performance assessment objectives . . . . . . . . . . . . . . . . . 1 1.1.2 Performance assessment input data characteristics . . . . . . . . . . 2 1.2 Contribution of this work . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. BACKGROUND AND LITERATURE SURVEY . . . . . . . . . . . . . . . . . 6 2.1 Closedloop data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 Engineering analysis methods for performance assessment . . . . . . . . . 7 2.2.1 SISO performance measures . . . . . . . . . . . . . . . . . . . . . 7 2.2.2 MIMO performance measures . . . . . . . . . . 
. . . . . . . . . . 12 2.3 Business analysis methods for controller performance assessment . . . . . 13 2.4 Data characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5 Research Focus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3. DATA ANALYSIS TECHNIQUES . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1 Unordered Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.1.1 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.1.2 Normal probability plots . . . . . . . . . . . . . . . . . . . . . . . 22 3.1.3 QuantileQuantile plots (QQ plots) . . . . . . . . . . . . . . . . . 25 iv 3.2 Ordered Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.2.1 Autocorrelation function (ACF) . . . . . . . . . . . . . . . . . . . 29 3.3 GUI tool for data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.4 Data analysis tools summary . . . . . . . . . . . . . . . . . . . . . . . . . 37 4. CHARACTERIZATION OF CLOSEDLOOP DATA . . . . . . . . . . . . . . . 38 4.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.2 Variability trends in actuating error . . . . . . . . . . . . . . . . . . . . . . 44 4.3 Unordered analysis results . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.3.1 Identification of variability bands using histograms . . . . . . . . . 48 4.3.2 Identification of variability bands using Normslope . . . . . . . . . 53 4.3.3 Identification of variability bands using qqslope . . . . . . . . . . . 56 4.4 Summary of unordered analysis results . . . . . . . . . . . . . . . . . . . . 59 4.5 Ordered analysis results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.5.1 Approach to ACF . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.5.2 Identification of variability bands using the autocorrelation function (ACF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
60 4.6 Effect of controller mode change results . . . . . . . . . . . . . . . . . . . 65 4.6.1 Controller modes . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.7 Discussion of data analysis results . . . . . . . . . . . . . . . . . . . . . . 75 5. CONCLUSIONS AND FUTURE WORK . . . . . . . . . . . . . . . . . . . . . 79 5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 v LIST OF TABLES Table Page 3.1 Nk pairs of observations for lag k. Illustration for N = 8 and k = 2. . . . . 30 3.2 Description of the functions used in the GUI tool plots. . . . . . . . . . . . 36 4.1 Summary of the information available in the closedloop data sets . . . . . 39 4.2 Compression factors on PV, SP and OP for each data set. . . . . . . . . . . 40 4.3 Error variability bands determined from visual observation. . . . . . . . . . 46 4.4 Percentage output saturation (OP > 90% or OP < 10%). . . . . . . . . . . . 47 4.5 Normslope for each loop calculated monthly. The normslope is a robust measure of the variability in the time series. . . . . . . . . . . . . . . . . . 55 4.6 The qqslope comparison of error spread: monthly to annual error. Composite annual error distribution is considered as the reference with its qqslope = 1.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.7 Number of appreciable ACF coefficients (with a maximum of 181) in the error time series for each data set by month. . . . . . . . . . . . . . . . . . 63 4.8 Number of days in each controller mode for all data sets. The number of error bands are the same as listed in Table. 4.3 . . . . . . . . . . . . . . . . 
67 vi LIST OF FIGURES Figure Page 3.1 Histogram of the closedloop error data for the dataset FC1a with superimposed bell curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2 Histogram of 250,000 normally distributed numbers with mean 0 and standard deviation 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3 Histogram of 200,000 normally distributed numbers with mean 0 and standard deviation 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.4 Histogram of the composite distribution. . . . . . . . . . . . . . . . . . . . 22 3.5 Normal probability plot of 5000 numbers drawn randomly from a standard normal distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.6 Normal probability plot of 5000 numbers randomly drawn from a standard uniform distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.7 QQ plot of 5000 numbers drawn randomly from a standard normal distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.8 QQ plot of 5000 numbers drawn randomly from a uniform distribution between zero and five. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.9 Histogram of 5000 numbers drawn randomly from uniform distribution between zero and five (series X). . . . . . . . . . . . . . . . . . . . . . . . 27 3.10 Histogram of 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y). . . . . . . . . . . . . . . . . . . . . . . 27 vii 3.11 QQ plot of 5000 numbers drawn randomly from uniform distribution between zero and five (series X) and 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y). . . . . . . . . . . . 28 3.12 ACF plot for 5000 numbers drawn from standard normal distribution. . . . 31 3.13 Plot of x(t) = sin (5t) for t = 0 to 2 for 638 data points. . . . . . . . . . . . 31 3.14 ACF plot f(x) = sin (5t) for t = 0 to 2 for 638 data points. . . . . . . . . . 
3.15 Screen shot of GUI tool main screen.
3.16 Screen shots of features in the GUI tool.
4.1 Description of flow loops.
4.2 Description of pressure loops.
4.3 Description of temperature loops.
4.4 Time series of actuating error, FC2 loop from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of different variability bands.
4.5 Histogram of actuating error for loop FC4 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.
4.6 Histogram of actuating error for loop PC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.
4.7 Histogram of actuating error for loop TC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.
4.8 Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 13, 2004.
4.9 Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 14, 2004 to Feb. 24, 2004.
4.10 Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 24, 2004.
4.11 Actuating error time series and histogram with superimposed bell curve for the loop PC3 from Dec. 14, 2003 to Mar. 1, 2004.
4.12 Actuating error time series and histogram with superimposed bell curve for the loop TC3 from Feb. 15, 2004 to Apr. 10, 2004.
4.13 ACF for FC4, actuating error from 10/1/03 to 9/30/04.
4.14 ACF for PC3, actuating error from 10/1/03 to 9/30/04.
4.15 ACF for TC2, actuating error from 10/1/03 to 9/30/04.
4.16 Illustration of typical controller configurations and corresponding controller modes.
4.17 Configuration of TC3. PV (°F) is an APC MV. The output from the APC controller is the set point to the TC.
4.18 TC3 cascade from Oct. 21, 2003 to Oct. 26, 2003 (blue) and auto from Oct. 1, 2003 to Oct. 6, 2003 (red). Different error variability in different controller modes.
4.19 Configuration of FC1a. PV (bpd) is an APC MV. The output from the APC controller is the set point to the FC.
4.20 FC1a auto from Oct. 1, 2003 to Oct. 9, 2003 (blue) and cascade from Oct. 14, 2003 to Oct. 19, 2003 (red). Same error variability in different controller modes.
4.21 Configuration of PC2. PV (psig) is an APC MV. The output from the APC controller is the set point to the PC.
4.22 PC2: Bcascade from Oct. 1, 2003 to Oct. 28, 2003 (blue) and cascade from Jan. 22, 2004 to Mar. 5, 2004 (red). Mixture indiscernible.
4.23 Impact of setpoint variability introduced by APC on TC3 actuating error. TC3 from Feb. 15, 2004 to Feb. 28, 2004.

NOMENCLATURE

σ : Standard Deviation
μ : Mean
ACF : (sample) Autocorrelation Function
APC : Advanced Process Control
CLPA : Closed-loop Performance Assessment
CLPM : Closed-loop Performance Metric
DTW : Dynamic Time Warping
ECDF : Empirical Cumulative Distribution Function
Error : Actuating Error
FCOR : Filtering and Correlation Algorithm
HI : Harris Index
HISTFIT : Histogram with a superimposed bell curve
IE : Integrated Error
IQR : Interquartile Range
LQG : Linear Quadratic Gaussian
LSL : Lower Specification Limit
MIMO : Multiple Input Multiple Output
MVC : Minimum Variance Controller
Normslope : Slope of the normal probability plot
OP : Controller Output
PCA : Principal Component Analysis
PCI : Process Capability Index
PV : Process Variable
QQ plot : Quantile-Quantile plot
qqslope : Slope of the QQ plot
SISO : Single Input Single Output
SP : Set point
SPC : Statistical Process Control
USL : Upper Specification Limit

1. INTRODUCTION

There are tens of thousands of controllers employed in the process industries. Most of these are proportional-integral (PI) controllers. Estimates indicate that 66% to 88% of industrial controllers have performance problems (Harris et al. 1999). Often these problems fail to attract the attention of the personnel who could investigate and improve the performance of the controller. Even a 1% improvement in controller performance represents millions of dollars in potential savings to the process industries (Chaudhary et al. 2005). In the United States alone, estimates show that losses to the petrochemical industry from poor monitoring and control exceed $20 billion per year (Venkatasubramanian 2006). Controller performance assessment therefore has significant economic incentives.

1.1 Types of controller performance assessment

1.1.1 Performance assessment objectives

For purposes of this thesis, two distinct types of controller performance assessment are identified.
The first type, which we will refer to as “engineering analysis methods,” comprises the techniques employed by control engineers to identify undesirable dynamic characteristics such as valve stiction, improper controller tuning, and excessive controller action. Engineering analysis methods focus on short-term response characteristics to individual setpoint changes and process disturbances. They calculate performance indexes from closed-loop process data sampled at a high rate (sample period Ts on the order of 1 second), as the data contain all the information about the performance of a controller. Commercial products that can perform the engineering analysis have evolved in recent years and are being continuously improved. These include products by ABB (Loop Performance Manager™), Honeywell (Loop Scout™), Expertune (Plant Triage™), ISC (PROBEwatch™), Matrikon (Process Doctor™), PAS (Control Wizard™), ProControl Technology (PCT Loop Optimiser™) and ASPEN (PIDWatch™).

The second type of methods used for controller performance assessment are “business analysis methods.” They address management’s view of control systems as assets to be managed. These techniques utilize a longer-term (weeks or months) view of controller performance with a business emphasis on continuous quality improvement, identification of best practices, and allocation of limited resources for control system maintenance. Business analysis methods are implemented using statistical process control (SPC) and “six-sigma” principles. The focus of this thesis is on the characterization and analysis of data used by business analysis methods for controller performance assessment.

1.1.2 Performance assessment input data characteristics

The key variable for both the engineering and the business types of performance assessment when applied to feedback control is the controller error, given by the difference between the measured process variable (PV) and the setpoint (SP).

$$ e(k) = PV(k) - SP(k) \qquad (1.1) $$

For single-input single-output (SISO) loops, a well-performing loop should reject disturbances and the process variable should closely follow the setpoint. The variability in the controller error in such a case will be a direct indicator of controller performance (Thornhill et al. 1999). For a multivariable control loop, however, the characteristics of the controller error can be different due to the presence of various factors, including interactions between loops and constraints on certain variables.

An important distinction between the engineering analysis methods and the business analysis methods is that they use closed-loop actuating error variability in different contexts to answer different questions. Engineering analysis techniques are very powerful tools that help the control engineer assess controller performance. Business analysis methods based on SPC techniques, on the other hand, are tools that help management assess the performance of controllers from a business perspective. SPC-based techniques provide key process performance indicators that facilitate comparison of similar controller configurations across sites, or across units at the same site, and are ultimately aimed at establishing best practices.

Engineering analysis methods are used for continuous performance assessment on a loop-by-loop basis using high-fidelity closed-loop data collected at the same rate as the controller. Business analysis methods, on the other hand, are used for assessing the performance of complex control configurations, such as multivariable controllers. Business analysis methods use archived closed-loop data collected over an extended period of time such as a year (Herman 1989) with the intention of identifying opportunities for continuous improvement. The business analysis methods do not require high-frequency sampling like the engineering analysis methods, but they do require closed-loop data over long periods of time such as a year.
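
As a minimal sketch of equation 1.1 and of the longer-term view taken by business analysis methods, the following Python fragment computes the actuating error from paired PV and SP samples and summarizes its spread over consecutive blocks of data (for example, one block per month). The function names, the block size, and the sample values are illustrative assumptions and are not part of the analysis tool developed in this work.

```python
import statistics


def actuating_error(pv, sp):
    """Return e(k) = PV(k) - SP(k) for paired samples (equation 1.1)."""
    if len(pv) != len(sp):
        raise ValueError("PV and SP must have the same number of samples")
    return [p - s for p, s in zip(pv, sp)]


def block_std(error, block_size):
    """Sample standard deviation of the error in consecutive blocks.

    Blocks with fewer than two samples are skipped because the sample
    standard deviation is undefined for them.
    """
    blocks = [error[i:i + block_size] for i in range(0, len(error), block_size)]
    return [statistics.stdev(b) for b in blocks if len(b) >= 2]


# Hypothetical data: the second half of the record is visibly noisier.
pv = [100.2, 99.8, 100.1, 99.9, 103.0, 97.0, 102.5, 97.5]
sp = [100.0] * 8

err = actuating_error(pv, sp)
spread = block_std(err, block_size=4)
print(spread)  # one standard deviation per block of four samples
```

A rise in the block standard deviation from one block to the next is the kind of variability trend that long records of closed-loop data are needed to reveal.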
Use of archived closed-loop data is best suited for this purpose, as it is impractical to set up a dedicated data collection system for the long periods over which statistical process control analysis is performed.

The SPC techniques involve strong assumptions about the randomness and the normality of data. Since archived closed-loop data are not collected to test any specific statistical hypothesis, they may contain unexpected features and unsuspected correlations between variables. If data historians compress the data for archival, data characteristics may be compromised upon regeneration. It is important to examine all of these assumptions because SPC metrics derived from archived process data are used to make important decisions about control system performance.

Only limited work has been done so far in characterizing archived closed-loop data that are used in SPC-based performance analysis of regulatory and advanced control loops. The premise of this work is that characterization of archived closed-loop data will result in a better understanding and interpretation of the SPC quality metrics that are derived from such data.

1.2 Contribution of this work

The main contribution of this work is the characterization of archived closed-loop data used for SPC-based controller performance analysis, with emphasis on trends in actuating error variability over long periods of time. Since most of these performance measures are stochastic in nature, statistical tools should be used to detect statistically significant changes in controller performance. The statistical nature of actuating error distributions and their conformity to the Gaussian model was studied. For this purpose, a graphical user interface (GUI) tool was developed in MATLAB™ to automate the analysis. Statistics to quantify the closed-loop data characteristics observed in normal probability plots, quantile-quantile plots, and the autocorrelation function have been proposed.
Qualitative visual characteristics like “heavy-tailedness” and “peakedness” in histograms of error distributions are also presented.

Variability trends in actuating error time series can be identified using simple statistical techniques that are easy to automate. Trends in actuating error variability are indicative of changing controller performance. Identification of changing performance is the first step toward continuous improvement. This work aims to provide a platform for further diagnosis of the causes of changing controller performance, and thus of opportunities for real business improvement.

1.3 Thesis outline

The rest of the thesis is organized as follows:

• Chapter 2 describes closed-loop performance analysis and the important role of controller error as the key closed-loop performance variable, with emphasis on performance assessment using a minimum variance benchmark and on the extension of SISO performance measures to MIMO loops. An introduction to the application of SPC-based analysis using closed-loop data is then presented.
• Chapter 3 presents an introduction to the data analysis tools used in this work for closed-loop data characterization. The goal of the data analysis tools is to enable detection and interpretation of the variability trends observed in closed-loop data.
• Chapter 4 presents the results of closed-loop data characterization studies on industrial data obtained from a petroleum refinery. In this chapter, statistics that describe the variability trends through histograms, normal probability plots, quantile-quantile plots and the autocorrelation function are developed. In addition, the effects of controller mode changes are discussed.
• Chapter 5 contains conclusions from this work and recommendations for future work.

2. BACKGROUND AND LITERATURE SURVEY

2.1 Closed-loop data

Almost all process industries now employ Distributed Control Systems (DCS) as regulatory control hardware.
The closed-loop data available through the DCS are usually collected and saved in a separate hardware system referred to as the plant historian. To manage the large demand for storage space, data are usually compressed for archival in the plant historian (Thornhill et al. 2004). Estimates indicate that most chemical process plants require over one hundred gigabytes of storage space to archive one year's worth of data (Huang and Shah 1999). The amount of closed-loop data available for analysis continues to increase with advances in computers and networks. Properly archived data can be a tremendous source of information. The challenge now is extracting useful information from these closed-loop data.

All the information about the performance of a controller is contained in the closed-loop plant data. A typical industrial process plant has hundreds of control loops. Instrument technicians generally maintain and service these loops, but rather infrequently. Routine maintenance of such loops at optimal settings can save millions of dollars a year (Chaudhary et al. 2005). The development of quality measures of performance for such control loops is therefore an important area of industrial interest. This type of controller performance monitoring also falls in the realm of enterprise asset management, from the viewpoint that controllers, whether PID type or advanced, should be treated like other capital assets and monitored on a routine basis.

The goal of engineering analysis methods is to ensure that control systems perform according to their specifications. This means that controlled variables meet their operating targets, such as specifications on output variability, effectiveness in constraint enforcement, or proximity to optimal control. On the other hand, the goal of business analysis methods is to provide opportunities for real business improvement. This is achieved using key process indicators, which are fueled by the back propagation of business objectives.
In order to further clarify the distinction between the engineering analysis methods and the SPC-based business analysis methods, and for the sake of completeness, a discussion of engineering analysis techniques and the types of performance problems addressed by those techniques is presented in section 2.2.

2.2 Engineering analysis methods for performance assessment

2.2.1 SISO performance measures

This section provides a brief review of engineering analysis methods used to identify undesirable dynamic characteristics such as valve stiction, improper controller tuning, and excessive controller action.

Controller performance is frequently characterized by comparing the actual process output variance to the output variance of an optimal controller such as the minimum variance controller. Astrom proposed the minimum variance control (MVC) principle and the use of autocorrelation to characterize short-term controller performance (Astrom 1967, Harris 1989). Harris proposed the use of closed-loop data to evaluate and diagnose controller performance using the minimum variance controller as a benchmark. The Harris Index (HI) is the ratio of the variance of the actuating error signal to the minimum variance achievable by an ideal controller:

$$ HI = \frac{\text{Current error variance}}{\text{Minimum achievable variance}} \qquad (2.1) $$

The HI indicates the best possible control when it approaches one and no control when it is large. A modified version of the Harris index that is normalized between 0 and 1 is given by equation 2.2:

$$ CLPM = 1 - \frac{1}{HI} \qquad (2.2) $$

where:
CLPM = Closed-loop performance metric
HI = Harris Index

CLPM = 0 indicates optimal control; CLPM = 1 indicates no control. The advantage of the Harris index is that it does well in indicating loops that have oscillation problems. The major disadvantage of the HI or CLPM is that the process time delay or dead time must be known for the loop.
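
Equations 2.1 and 2.2 can be illustrated with a short Python sketch. It assumes that both variances are already available; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
def harris_index(error_variance, min_variance):
    """HI = current error variance / minimum achievable variance (eq. 2.1)."""
    if error_variance <= 0 or min_variance <= 0:
        raise ValueError("variances must be positive")
    return error_variance / min_variance


def clpm(hi):
    """CLPM = 1 - 1/HI (eq. 2.2): 0 is optimal control, values near 1 are poor."""
    if hi <= 0:
        raise ValueError("HI must be positive")
    return 1.0 - 1.0 / hi


# Hypothetical loop whose error variance is four times the minimum achievable.
hi_example = harris_index(error_variance=4.0, min_variance=1.0)
clpm_example = clpm(hi_example)
print(hi_example, clpm_example)  # prints 4.0 0.75
```

In this sketch the minimum achievable variance is simply supplied as a number. In practice it has to be estimated from closed-loop data, and that estimate depends on the assumed dead time of the loop.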
Since processes change during operation, this is a major limitation of any minimum variance control benchmark.

Hagglund has proposed a time-domain method that considers the integrated error (IE) between all zero crossings of the signal (Hagglund 1995). If the IE is large enough, a counter is increased. An oscillation is indicated if this counter exceeds a certain threshold. In order to quantify the critical value of the counter above which the change in the counter is statistically significant, the author used the ultimate frequency of the loop in question. This method is appealing as it is able to quantify the size of the oscillation. However, it assumes that the loop oscillates at its ultimate frequency, which is not always true. Further, the ultimate frequency may not always be known for a loop.

Hagglund also proposed an idle index for detecting sluggish loops (Hagglund 1999). The value of this idle index depends heavily on the data pretreatment. Kuehl and Horch proposed a data pretreatment procedure using noise filtering to improve the idle index, which is, however, limited to detecting sluggishness (Kuehl and Horch 2005).

Ko and Edgar suggested an index that computes the ratio of the actual variance to the minimum variance achievable using a PI controller (Ko and Edgar 1998). This approach assumes that a process model is available. A limitation of this method is that the models need to be updated periodically.

Kadali and Huang proposed the use of the Linear Quadratic Gaussian (LQG) benchmark as a more appropriate tool for assessing the performance of controllers (Kadali and Huang 2002). Calculation of the LQG benchmark requires complete knowledge of the process model, which is often a demanding requirement.

Li et al. proposed the use of a chi-squared goodness-of-fit statistic to compare the distribution of a performance index within a window of data to a reference run length distribution in order to determine the performance of a controller (Li et al. 2004).
A statistically significant change in any section of the distribution, not just in an average value, is indicative of a significant change in controller performance.

Srinivasan and Rengaswamy proposed a qualitative pattern recognition approach for stiction diagnosis. Stiction in control valves leaves distinct qualitative shapes in the controller output (OP) and controlled process variable (PV) data. To classify the patterns that evolve due to stiction, a pattern recognition approach using the dynamic time warping (DTW) technique was proposed (Srinivasan and Rengaswamy 2005).

Thornhill and Hagglund proposed a set of procedures to detect and diagnose oscillating loops using offline data (Thornhill and Hagglund 1997, Thornhill et al. 2003). They combine techniques of controller performance assessment with operational signatures (OP-PV plots) and spectral analysis of the controller error for diagnosis. These techniques, though not completely automated, can differentiate oscillations caused by poor controller tuning, process nonlinearities, or external disturbances. Inferred loop signatures that are based on spectral analysis or on plots of controller output (OP) versus process variable (PV) have to be identified manually. Recently, Paulonis and Cox of Eastman Chemical Company improved the above technique and developed a large-scale system to identify and troubleshoot poorly performing control loops (Paulonis and Cox 2003).

Xia and Howell proposed the use of signal-to-noise ratio indices for the process variable and the output, their ratio R, and the variability in R to facilitate the status monitoring of PI/PID loops and the isolation of the problem loop (Xia and Howell 2003). The major limitation of this statistic is that it assumes regulatory control and fails when there are frequent setpoint changes.

Horch presented a simple, practical approach to distinguish oscillations caused by external disturbances from those caused by stiction (Horch 1999).
This approach is based on the cross-correlation between the controller output (OP) and the process output (PV). Horch and Isaksson also proposed a technique to identify stiction using nonlinear filters (Horch and Isaksson 1998). The method assumes that information such as the mass of the stem, the diaphragm area, and so on is readily available for each valve. Since a typical process industry facility can have hundreds or thousands of control loops, it may be nearly impossible to build and maintain the required database of control valves, making this technique difficult to implement.

Chaudhary et al. used higher order statistics for detecting nonlinearity in data and extended the method to diagnose stiction by fitting an ellipse to the OP-PV plot and inferring the stiction from an assumed stiction model (Chaudhary et al. 2005). However, the success of this approach depends on correctly identifying the oscillation period and its start and end points in the OP-PV data.

Huang et al. showed that the minimum feedforward plus feedback control variance can be estimated from routine operating data, and can then be used as a benchmark for performance assessment of feedforward and/or feedback controllers (Huang et al. 2000).

Bezergianni and Georgakis proposed a relative variance index that compares actual control to both minimum-variance control and open-loop control (Bezergianni and Georgakis 2000).

Jain and Lakshminarayanan proposed a novel filter-based method to address the shortcomings of minimum variance benchmarking and to provide a realistic performance measure using closed-loop data (Jain and Lakshminarayanan 2005).

Tabe et al. presented an application of acoustic spectral PCA to the monitoring of fermentation process equipment (Tabe et al. 1998). Thornhill et al. applied principal component analysis (PCA) to the power spectra of data from chemical processes (Thornhill et al. 2002). Harris et al.
reported a plant-wide control loop assessment in which they found the spectral analysis of the univariate trends to be useful (Harris et al. 1996b).

Ingimundarson and Hagglund proposed closed-loop monitoring using loop tuning and an extended horizon performance index similar to that used by Thornhill et al. (Ingimundarson and Hagglund 2005, Thornhill et al. 1999). In this method the user selects a prediction horizon and an alarm limit based on loop tuning rather than on the process characteristics.

Thornhill et al. discussed the impact of compression on data-driven process analysis (Thornhill et al. 2004). They observed that data compression using the swinging door method changes the statistical properties of the data. Orthogonality is not maintained because the reconstruction error is strongly correlated with the reconstructed signal. This is an important observation that calls into question the use of archived data for analysis. The usefulness of archived data depends very much on the method used for archival and reconstruction.

Huang and Shah developed a filtering and correlation algorithm (FCOR) to estimate the minimum variance (Huang and Shah 1999). A summary of recent work in the area of engineering analysis methods for controller performance assessment has been published by Qin (Qin 1998).

2.2.2 MIMO performance measures

The extension of performance assessment to multivariable systems has been studied by Harris (Harris et al. 1996a) and by Huang and Shah (Huang and Shah 1996). Assessment of minimum variance performance bounds arising from dead times requires knowledge of the interactor matrix. The interactor matrix allows a multivariate transfer function to be factored into two terms, one having zeros located at infinity and another containing finite zeros. For the multivariate case, it can be shown that the multivariate minimum variance performance can be estimated from routine operating data if the interactor matrix is known (Harris et al. 1999).
It is important to note that the interactor matrix:

• is not always unique.
• cannot always be constructed from knowledge of the SISO delay structure.

Huang and Shah used a performance index based on a multivariate extension of the FCOR algorithm (Huang and Shah 1996).

The presence of process and controller interactions significantly complicates the analysis and diagnosis in multivariable situations. There has been limited work on diagnosis for the multivariate case. In many cases, multivariable controllers are used where constraints are important. The definition and computation of an appropriate multivariable performance index in these situations remains unresolved.

Process control performance assessment measures have tended to compare the total closed-loop variance to that of minimum variance control. With the exception of Desborough and Harris, and Vishnubotla et al., little has been done on understanding the decomposition of closed-loop variance (Desborough and Harris 1993, Vishnubotla et al. 1997).

The multivariate performance assessment measures are nontrivial generalizations of the univariate measures. The diagnosis of multivariate systems has not been thoroughly investigated. The interactive nature of these systems means that this will be a nontrivial task (Kesavan and Lee 1998).

2.3 Business analysis methods for controller performance assessment

The closed-loop performance metrics discussed in section 2.2 are derived from short-term characteristics of the data. This section provides an introduction to the statistical process control type of quality metrics used by business for performance assessment.

SPC analysis is based on Shewhart’s concept of twofold variability: ‘chance’ cause variability, which is a random variability inherent in the process, and ‘assignable’ cause variability, which is caused by an external factor. Using an appropriate control chart, for example, we can determine whether the variability observed is due to chance causes or assignable causes.
In a period that is void of any assignable causes, a major function of SPC-based analysis is to use a process capability index (PCI) to compare the actual performance of a process to the specified or desired performance. The PCIs are defined as:

$$ C_p = \frac{USL - LSL}{6\sigma} \qquad (2.3) $$

and

$$ C_{pk} = \min\left( \frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma} \right) \qquad (2.4) $$

where:
Cp : capability ratio defined as the ratio of the spread between the specification limits to the natural process limits
Cpk : capability ratio defined as the distance to the nearest specification limit (in sigma units) divided by 3.0
USL : upper specification limit
LSL : lower specification limit
μ : mean of the process
σ : standard deviation of the process

The capability ratios assume that the process variable follows a normal distribution, so that there is a 99.73% chance that the process variable value is within 3 sigma units on either side of the mean. Conformity to a normal distribution is an important consideration in the interpretation of the capability ratios. When Cp < 1, the process is not capable and produces some nonconforming product. An improved Cp thus indicates an improved process.

From equation 2.3 it is evident that the capability is inversely proportional to the variability in the process. Therefore, the key to continuous improvement of a process devoid of assignable causes lies in reducing the variability inherent in the process.

Shunta presents the application of SPC in the following manner: “statistical metrics (process capability and process performance) derived from closed-loop data determine which of the key variables do not meet the desired performance. The statistics provide a basis to determine if the control strategy needs to be modified or the process changed to gain the improvements” (Shunta 1995).

Tucker et al. introduced an algorithmic statistical process control (ASPC) model in which SPC is used as a monitoring tool that obviates the need for APC for a polymerization application (Tucker et al. 1993).
Tucker et al., in the same paper, also point out that the ASPC analysis needs an efficient data compression algorithm that facilitates good regeneration of the closed-loop data.

Lin proposed a process “incapability” index based on a large-sample approach as opposed to a process capability index (Lin 2006). This technique is only applicable when the underlying distribution is assumed to be normal.

Shore described a new approach that uses a family of distributions and moment-based fitting procedures to approximate an unknown source distribution and then incorporates the fitted distribution in quality metric calculations (Shore 1998). Such an approach would eliminate the need for the normal approximation but would mean that a source distribution has to be fitted for each closed-loop series.

Ding proposed the use of the first four moments of the closed-loop PV data to numerically derive a cumulative distribution function that can be used for process capability index analysis (Ding 2004).

Lant and Steffens used closed-loop data from a wastewater treatment plant for benchmarking studies (Lant and Steffens 1998). The authors define benchmarking as a “measure” of process control practice relative to an absolute performance measure (world-class quality). Such benchmarks can be used to answer questions like:

• How good is my process control?
• Is it worth improving the control technology?

Process capability or performance is inversely proportional to actuating error variability. This means that error variability trends indicate varying controller performance. The SPC-based quality metrics are thus used as performance indicators of a control system. They are equally applicable to simple SISO loops and to complex multivariable loops. These techniques utilize a longer-term (weeks or months) view of controller performance with a business emphasis on continuous quality improvement, identification of best practices, and allocation of limited resources for control system maintenance.
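
The capability ratios of equations 2.3 and 2.4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in this work: the function name, the specification limits, and the data are hypothetical, the sample standard deviation stands in for σ, and Cpk uses the 3σ denominator implied by its definition as the distance to the nearest specification limit in sigma units divided by 3.0.

```python
import statistics


def capability_indices(values, lsl, usl):
    """Return (Cp, Cpk) for the data and specification limits (eqs. 2.3, 2.4)."""
    if usl <= lsl:
        raise ValueError("USL must be greater than LSL")
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample estimate, assumed to stand in for sigma
    if sigma == 0:
        raise ValueError("data have zero variability")
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


# Hypothetical process variable samples with mean 10.0.
samples = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0]

cp, cpk = capability_indices(samples, lsl=8.0, usl=13.0)
print(round(cp, 3), round(cpk, 3))
```

Because the mean in this example lies closer to the lower specification limit, Cpk is smaller than Cp; the two ratios coincide only when the process is centered between the limits.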

2.4 Data characterization

Commercial software packages such as Aspen Watch™ or Loop Scout™ implement engineering analysis techniques to assess control loop performance. These techniques use the same data as the input to the controller. That is, the sample period of the data used by these methods is the same as that employed by the DCS (e.g., 1 second). The length of the analysis period is on the order of the closed-loop time constant. For example, the analysis may span performance over a 20-minute period and involve a time series with 1200 measurements. Engineering analysis methods require the use of actual rather than archived process data.

Commercial products that can perform the engineering analysis are now beginning to incorporate business analysis tools. Since the data required for the SPC-based business analysis are a subset of the data already collected for engineering analysis, such an extension is possible when data are available for long periods. Among other products, AspenTech’s Aspen Watch™ and Expertune’s PlantTriage™ provide options for user-defined “key performance indicators” (KPIs) in addition to simple built-in KPIs such as the percentage of time the controller is ON or the average error variance. Expertune’s PlantTriage™ also allows the creation of templates for benchmarking and offers an option to mark the current or historical performance as a benchmark. It should be noted that the KPIs and capability metrics generated should be interpreted on an appropriate time scale (weeks or greater), although they can be generated for any time scale using these products. Retaining the data over such periods requires expensive data storage infrastructure and subsequent maintenance.

SPC-based techniques use a long time frame (days or months) to calculate quality metrics from closed-loop data. The sample period for the process data used in this type of analysis is much longer than that required for engineering analysis methods. Therefore, archived process data can be used.
Techniques for evaluating process capability and performance indexes from closed-loop data are in place. The problem of dealing with autocorrelated and non-normal data for SPC analysis, however, is a concern (Shore 1998). To date, no major efforts to characterize archived closed-loop data have been undertaken. The data are readily available and there is no additional cost required in the form of plant tests. Archived data are thus an underutilized resource, particularly for performing SPC-type controller assessment. A fundamental tenet of SPC is that the key to achieving process improvement lies in our ability to listen to the data.

2.5 Research Focus

The remainder of this thesis addresses the characterization of archived closed-loop plant data for SPC-type controller performance assessment. Chapter 3 describes the analytical tool created to characterize closed-loop plant data. Chapter 4 describes the application of the analytical tool to numerous industrial data sets. The results from this work can be applied for identifying variability bands in actuating error time series. Methods to detect and interpret error variability bands using histograms, normal probability plots, quantile-quantile plots and autocorrelation function plots are presented. Finally, the effects of controller mode changes on the error distributions are discussed.

3. DATA ANALYSIS TECHNIQUES

Time series plots of closed-loop data immediately give an idea of the center, spread, and certain patterns in the time series such as the presence of outliers or missing data. Depending on the nature of the time series, the plots may also reveal features specific to that time series, such as zero-centering in actuating error or saturation in controller output. For a more detailed study of variability, however, additional statistical tools have to be used. This chapter presents a brief review of the data analysis tools used in this work for closed-loop data characterization.
The statistical analysis techniques have been grouped into the following two categories based on their data treatment:

1. Unordered Analysis: Analysis where the order in which the data occur is ignored. Data grouping (e.g., histograms) or data sorting (e.g., normal probability plots) are examples of unordered analysis techniques.
2. Ordered Analysis: The order of the data is not lost by grouping or sorting. The autocorrelation function is an example of an ordered statistical analysis technique.

A graphical user interface (GUI) tool using MATLAB™, v7, R14 with the statistics toolbox was developed at OSU for performing the data analysis. The tool uses MATLAB™'s extensive plotting capabilities for visual presentation of analysis results. The functionality of the tool is discussed in section 3.3 after an introduction to the statistical analysis techniques employed by the tool.

3.1 Unordered Analysis

Run charts, histograms, probability plots, X and MR control charts, X-bar and R control charts, X-bar and s control charts, process capability analysis, and measurement systems analysis are examples of statistical process control tools used for identification of assignable causes and for continuous process improvement (Hart 2005). Stanton illustrates the use of trend plots and histograms as effective tools in the analysis of process data (Stanton 1990). Miller presents in-plant experiences using histograms and probability plots coupled with X-bar and R charts and Pareto charts for detecting assignable causes of process variation (Miller 1989).

In this section, three unordered analysis techniques are presented: histograms, normal probability plots and quantile-quantile plots. These plots can be generated in MATLAB™ using the built-in functions histfit, normplot and qqplot respectively. MATLAB™ help files for these functions are available in Appendix A.

3.1.1 Histograms

The histogram is the simplest graphical representation of the distribution of a time series.
The histogram is popular because it is uncomplicated and easy to construct. The histogram offers the advantage of consolidating large amounts of data into bins of chosen width, thus revealing the overall features of the time series. The histogram allows for a visual interpretation of many features of the distribution including mean, standard deviation, range, symmetry and the presence of a peak or heavy tails. Data grouping in histograms is particularly attractive for comparison purposes, as we do not want to compare each and every point of the time series but only the general characteristics. A histogram can be used as a powerful visual tool for comparing two distributions, whether we choose to compare the distributions to a standard distribution such as a normal distribution, or to each other. In order to facilitate visual comparison of conformity to a normal distribution, a bell curve may be superimposed on the histogram as shown in Figure 3.1.

Figure 3.1: Histogram of the closed-loop error data for the dataset FC1a with superimposed bell curve.

Peakedness and heavy-tailedness

Heavy-tailedness refers to an observed frequency in the histogram, beyond three standard deviations on either side of the mean, that is greater than that predicted by a normal distribution. Peakedness means that there is a spike or peak observed in the histogram around the mean when compared to a normal (Gaussian) distribution. Heavy-tailedness and peakedness may result when two or more distributions overlap, creating an overall composite distribution.

The following simulation shows how mixtures of distributions cause heavy tails and a peak in the histogram of the composite distribution. Figure 3.2 presents the histogram for 250,000 normally distributed points with mean 0 and standard deviation 1. Figure 3.3 presents the histogram for 200,000 normally distributed points with mean 0 and standard deviation 3.
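The simulation described above can be approximated numerically. The following is a minimal Python sketch, not the MATLAB code used in this work; the seed, the function names, and the use of excess kurtosis and the three-sigma tail fraction as stand-ins for "peakedness" and "heavy-tailedness" are assumptions made for illustration only.

```python
import math
import random
import statistics

def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for normal data, positive for a peaked, heavy-tailed shape."""
    mean = statistics.fmean(values)
    var = statistics.fmean((v - mean) ** 2 for v in values)
    fourth = statistics.fmean((v - mean) ** 4 for v in values)
    return fourth / var ** 2 - 3.0

def tail_fraction(values, n_sigma=3.0):
    """Fraction of points farther than n_sigma standard deviations from the mean."""
    mean = statistics.fmean(values)
    sigma = math.sqrt(statistics.fmean((v - mean) ** 2 for v in values))
    return sum(abs(v - mean) > n_sigma * sigma for v in values) / len(values)

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
narrow = [rng.gauss(0.0, 1.0) for _ in range(250_000)]  # population of Figure 3.2
wide = [rng.gauss(0.0, 3.0) for _ in range(200_000)]    # population of Figure 3.3
composite = narrow + wide                                # mixture, as in Figure 3.4

for name, data in [("narrow", narrow), ("wide", wide), ("composite", composite)]:
    print(name, round(excess_kurtosis(data), 2), round(tail_fraction(data), 4))
```

Each component on its own should show excess kurtosis near zero and roughly 0.27% of its points beyond three standard deviations, while the composite should show clearly larger values of both, which is the numerical counterpart of the peak and heavy tails seen in the histogram.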
The heavy tail and the peak in the composite distribution (Figure 3.4) are a result of the overlapping of the distributions in Figure 3.2 and Figure 3.3.

Figure 3.2: Histogram of 250,000 normally distributed numbers with mean 0 and standard deviation 1.

Figure 3.3: Histogram of 200,000 normally distributed numbers with mean 0 and standard deviation 3.

Figure 3.4: Histogram of the composite distribution.

Disadvantages of histograms

The advantage that the histogram offers through grouping can also be a disadvantage when applied to time series analysis. Unlike the distribution of the heights of a class of students, time series data occur in a particular order. A histogram completely disregards this order, and valuable information could be lost in such grouping.

Another disadvantage of histograms is the absence of a standard method of choosing bin size. The bin size is often chosen so as to give the best possible visual representation. While there are recommendations on choosing bin size, there is no consensus on a procedure to choose an optimum bin size. Therefore, visual comparison of the features of histograms requires a thorough understanding of this limitation.

3.1.2 Normal probability plots

Normal probability plots present the data against the probability of their occurrence if sampled from a normal distribution. The normal probability plot, or the normplot, is plotted on probability paper for easy interpretation. The y-axis does not have a linear scale, but reflects the probabilities expected from a normal distribution for the corresponding z-scores on the x-axis. For example, the probability of a point having a z-score of −1 or less is 0.159. Similarly, the probability of a point having a z-score of 1 or less is 0.841. These values are obtained from the cumulative distribution function of the normal distribution. If the time series data are normally distributed, the plot will appear linear. Non-normal distributions will introduce curvature in the plot.
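The probabilities quoted for the probability-paper axis follow directly from the standard normal cumulative distribution function. A short Python sketch using the standard library (an illustration only, not part of the MATLAB tool) reproduces them:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of a z-score at or below -1 and at or below +1
p_below_minus_one = standard_normal.cdf(-1.0)
p_below_plus_one = standard_normal.cdf(1.0)
print(round(p_below_minus_one, 4), round(p_below_plus_one, 4))  # 0.1587 0.8413

# Fraction of a normal population within one standard deviation of the mean
within_one_sigma = p_below_plus_one - p_below_minus_one

# Inverse mapping: z-scores at which the first and third quartiles fall
z_q1 = standard_normal.inv_cdf(0.25)
z_q3 = standard_normal.inv_cdf(0.75)
print(round(within_one_sigma, 4), round(z_q1, 4), round(z_q3, 4))
```

The difference between the two probabilities is the familiar 68.27% of the population lying within one standard deviation of the mean, and the quartiles fall at z-scores of about ±0.674.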
The normal probability plots are used as a tool for graphical normality testing. For a homogeneous distribution, a linear normal probability plot means that the data can be modeled using a normal distribution as the underlying standard distribution.

The MATLAB™ function normplot() has been used for generating normal probability plots. The plot has the sample data displayed with the plot symbol ’+’. Superimposed on the plot is a line joining the first and third quartiles of the data. This line is extrapolated out to the ends of the sample data to help evaluate the linearity of the plot. ‘Normslope’, defined as the slope of this line, can be a useful statistic when deviation from normality is negligible. For a truly normal distribution, normslope is the standard deviation of the data set.

Figure 3.5 shows a normal probability plot for 5000 points drawn randomly from a normal distribution of mean zero and standard deviation one. The linearity of the normplot indicates that the data come from a normal distribution. Figure 3.6 shows the normplot for a sample of 5000 points drawn from a standard uniform distribution. Notice that a curvature is introduced into the normplot, indicating that the sample is not normally distributed.

Figure 3.5: Normal probability plot of 5000 numbers drawn randomly from a standard normal distribution.

Figure 3.6: Normal probability plot of 5000 numbers randomly drawn from a standard uniform distribution.

3.1.3 Quantile-quantile plots (QQ plots)

QQ plots are based on the same principle as normal probability plots. Using QQ plots we can compare the distribution of a time series to any reference distribution. The input to the QQ plots consists of two samples: the time series and a reference time series. If the samples do come from the same distribution type (same shape), even if one distribution is shifted and rescaled from the other (different location and scale parameters), the plot will be linear.
The MATLAB™ function qqplot() has been used to generate the quantile-quantile plots. The plot has the sample data displayed with the plot symbol ’+’. Superimposed on the plot is a line joining the first and third quartiles of each distribution (this is a robust linear fit of the order statistics of the two samples). This line is extrapolated out to the ends of the sample to help evaluate the linearity of the data. The slope of this line, defined as ‘qqslope’, is a useful statistic for comparing the variability between any two distributions. When the qqslope is one, both distributions have the same spread (variability).

Figure 3.7 displays the quantile-quantile plot of two samples, X and Y, both drawn from a standard normal distribution. The plot is linear, showing that the data sets were drawn from the same distribution. Figure 3.8 displays the quantile-quantile plot of a sample X drawn from a normal distribution and a sample Y drawn from a uniform distribution between zero and five. A curvature is introduced into the plot, showing that the samples were not drawn from the same distribution.

In a further example, sample X is 5000 points from a uniform distribution between zero and five as shown in Figure 3.9, and sample Y is 5000 points from a uniform distribution between five and fifteen as shown in Figure 3.10. Figure 3.11 displays a quantile-quantile plot of these two samples, X and Y. If the samples do come from the same distribution, the plot will be linear, as shown in the graph.

Figure 3.7: QQ plot of 5000 numbers drawn randomly from a standard normal distribution.

Figure 3.8: QQ plot of 5000 numbers drawn randomly from a uniform distribution between zero and five.

Figure 3.9: Histogram of 5000 numbers drawn randomly from uniform distribution between zero and five (series X).

Figure 3.10: Histogram of 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y).
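The behaviour of the two-sample QQ comparison can be checked numerically without plotting. The sketch below is a Python approximation rather than MATLAB's qqplot; the helper names, the seed, and the use of the Pearson correlation of matched percentiles as a linearity measure are assumptions introduced here for illustration.

```python
import random
import statistics

def matched_quantiles(x, y, n=100):
    """Return the n-1 percentile cut points of each sample at the same probabilities."""
    return statistics.quantiles(x, n=n), statistics.quantiles(y, n=n)

def pearson_r(a, b):
    """Pearson correlation, used here as an ad hoc measure of QQ-plot linearity."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(1)  # arbitrary fixed seed
x_uniform = [rng.uniform(0.0, 5.0) for _ in range(5000)]   # like series X
y_uniform = [rng.uniform(5.0, 15.0) for _ in range(5000)]  # like series Y
z_normal = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # a differently shaped sample

qx, qy = matched_quantiles(x_uniform, y_uniform)
r_same_shape = pearson_r(qx, qy)

qz, qx_again = matched_quantiles(z_normal, x_uniform)
r_diff_shape = pearson_r(qz, qx_again)

# Slope between the quartile points (indices 24 and 74 are the 25th and 75th percentiles)
slope = (qy[74] - qy[24]) / (qx[74] - qx[24])
print(round(r_same_shape, 4), round(r_diff_shape, 4), round(slope, 2))
```

For the two uniform samples the matched percentiles should fall almost exactly on a straight line with a slope close to two, the ratio of the two ranges, even though the samples differ in both center and spread. The normal-versus-uniform pair should be measurably less linear.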
Notice that the QQ plot can identify whether the samples are from the same distribution type even if they differ in center and spread. The normplots and the QQ plots are computationally more cumbersome than histograms, but are great visual tools. The most attractive feature of QQ plots is that they use quantiles, which are based on the median and interquartile range rather than the mean and standard deviation. They are therefore considered to be robust to extreme values.

Figure 3.11: QQ plot of 5000 numbers drawn randomly from uniform distribution between zero and five (series X) and 5000 numbers drawn randomly from uniform distribution between five and fifteen (series Y).

3.2 Ordered Analysis

3.2.1 Autocorrelation function (ACF)

An important guide to the properties of a time series is provided by a series of quantities called sample autocorrelation coefficients, which measure the correlation between observations at different intervals in the time series. The general formula used for calculating the sample ACF is given as follows (Chatfield 1989). The autocorrelation function ($r_k$) as a function of lag ($k$) is given by:

$$r_k = \frac{c_k}{c_0} \tag{3.1}$$

where $c_k$ is the autocovariance function, given by:

$$c_k = \frac{1}{N-k} \sum_{j=1}^{N-k} (x_j - \bar{x})(x_{j+k} - \bar{x}) \tag{3.2}$$

$N$ = total number of points in the time series
$k$ = lag
$x_j$ = value at the $j$th point in the time series
$\bar{x}$ = average of the time series

and $c_0$ is the variance, given by:

$$c_0 = \frac{1}{N} \sum_{j=1}^{N} (x_j - \bar{x})^2 \tag{3.3}$$

The $(N-k)$ pairs of observations used in the calculation of the autocovariance function (equation 3.2) are selected as shown in Table 3.1.

Table 3.1: $N-k$ pairs of observations for lag $k$. Illustration for $N = 8$ and $k = 2$.

A graph of autocorrelation coefficients ($r_k$) vs lag ($k$) is known as a correlogram, which is a useful aid in interpreting the autocorrelation coefficients. A correlogram for a normally distributed data set is shown in Figure 3.12. The autocorrelation coefficient is one at lag zero, which means that a point is completely correlated with itself. For randomly distributed data, as can be seen from the figure, all autocorrelation coefficients at lags greater than zero are nearly zero. The dotted lines show the 95% confidence interval on the autocorrelation coefficients. This means that for a randomly distributed data set, the probability of an autocorrelation coefficient at a nonzero lag falling outside these dotted lines is one out of twenty (5%).

Figure 3.14 shows the correlogram for the periodic function x(t) = sin(5t) shown in Figure 3.13. The series consists of 638 points. Figure 3.14 reveals the ability of the ACF to detect cycles in the data. When a time series has a periodic component, it is reflected in the ACF as an oscillation.

The correlogram is a fundamentally different analysis tool when compared to the histogram. While the histogram completely disregards the order in which the observations occur, the basis of the autocorrelation function is the order of the time series itself. Furthermore, the correlation coefficients are nothing but normalized covariances and are, therefore, representations of variability in the time series.

Figure 3.12: ACF plot for 5000 numbers drawn from standard normal distribution.

Figure 3.13: Plot of x(t) = sin(5t) for t = 0 to 2 for 638 data points.

Figure 3.14: ACF plot of x(t) = sin(5t) for t = 0 to 2 for 638 data points.

The correlogram can be used to identify the features of a time series that are difficult to capture from the raw trends. The characteristics that could be obtained from a correlogram include:

1. Randomness in series
2. Short-term correlation
3. Alternating series
4. Non-stationary (or non-homogeneous) nature
5. Periodic fluctuations
6. Outliers

While the ACF can be used to characterize the time series in all the above-mentioned ways, a major disadvantage of using the ACF is a lack of uniqueness.
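The sample ACF of equations 3.1 to 3.3, its behaviour for random and periodic data, and the lack of uniqueness can be illustrated with a small Python sketch. This is not the MATLAB routine used by the GUI tool. The seed, the ±1.96/√N approximation to the 95% limits, the 50-sample period of the test sine wave, and the choice of a time-reversed copy as the "other" series are all assumptions made for this illustration.

```python
import math
import random

def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag, following equations 3.1 to 3.3."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    c0 = sum(d * d for d in dev) / n                                    # equation 3.3
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[j] * dev[j + k] for j in range(n - k)) / (n - k)   # equation 3.2
        acf.append(ck / c0)                                             # equation 3.1
    return acf

rng = random.Random(2)  # arbitrary fixed seed
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
acf_noise = sample_acf(noise, 40)
band = 1.96 / math.sqrt(len(noise))  # approximate 95% limits for random data
outside = sum(abs(r) > band for r in acf_noise[1:])
print("lags outside the band:", outside, "of 40")

# Hypothetical periodic series: 500 points with a period of 50 samples
wave = [math.sin(2.0 * math.pi * j / 50.0) for j in range(500)]
acf_wave = sample_acf(wave, 60)
print(round(acf_wave[25], 3), round(acf_wave[50], 3))  # half period, full period

# Lack of uniqueness: the time-reversed series has the same sample ACF
acf_reversed = sample_acf(noise[::-1], 40)
max_diff = max(abs(a - b) for a, b in zip(acf_noise, acf_reversed))
print("largest ACF difference after reversal:", max_diff)
```

For the random series only a small number of the 40 coefficients should fall outside the band. The sine wave gives coefficients near −1 at half a period and near +1 at a full period, which is the oscillation referred to above. Reversing the series in time changes the series but not its ACF, a simple instance of two different series sharing one correlogram.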
Although a given time series has a unique ACF, it is usually possible to find many other time series with the same ACF (Jenkins and Watts 1968).

Another feature of the ACF is its distortion in the presence of outliers. Every outlier in the time series will cause two extreme coefficients which will tend to depress the sample coefficients towards zero. A comprehensive review on interpreting the correlogram is given in Chatfield (Chatfield 1989).

3.3 GUI tool for data analysis

Closed-loop data comprise the time series of set point (SP), process variable (PV), controller output (OP) and controller mode. The controller mode indicates the active controller configuration at that time. The state of a controller at any time is defined by one of the following four modes: manual, auto, cascade or Bcascade. These modes distinguish control configurations and therefore the expected data characteristics.

For the analysis of closed-loop data in a multivariable context, simultaneous comparison of SP, PV, OP and controller mode plots is necessary. In addition, comparisons between distributions, ACFs, or between the time series are needed. For advanced control loops, the set point is changing at all times. Furthermore, for advanced control loops, PV and OP constraints come into play. When dealing with massive amounts of data, keeping track of all the trends simultaneously becomes a tedious task.

The GUI tool developed at OSU is a convenient way to tackle the above difficulties. It is a broad-based utility tool developed for this analysis. Figures 3.15 and 3.16 are screen shots of some of the features of the tool. The functions used to generate the GUI tool plots are listed in Table 3.2. The capabilities of the tool include:

• Six simultaneous plots each with an input-series choice and a plot choice.
• Input-series choices include actuating error, PV, SP, OP, SP and OP.
• Plot choices include: time series, histogram with superimposed bell curve, pdf, normalized pdf, Fourier transform, power spectrum, ACF, normplot, QQ plot and boxplot.
• An attractive feature of the tool that enables easy comparison is the overlay feature.
• Option for outlier removal at ±4 times the standard deviation of the data.
• Data cursor option to read the coordinates on any plot.
• Interactive plot edit tools such as zoom, pan and 3D rotate.

One of the powerful features of the tool is the overlay feature. The overlay feature allows plots from different periods to be superimposed on one another. For example, Figure 3.16(c) shows the ACF for three months (October, November, and December) superimposed on each other. The differences in data characteristics during multiple periods can thus be analyzed simultaneously. The overlay feature can be enabled or disabled using the overlay check box as shown in the figure.

Figure 3.15: Screen shot of GUI tool main screen.

(a) Choose period dialog
(b) Choose series listbox
(c) Overlay feature is used to superimpose ACF plot for three months (October, November, and December) on each other. Overlay feature enables simultaneous visual analysis of various plots during multiple periods.
(d) Choose plot listbox
(e) Data cursor feature

Figure 3.16: Screen shots of features in the GUI tool.

Table 3.2: Description of the functions used in the GUI tool plots.

3.4 Data analysis tools summary

In this chapter, unordered analysis using histograms with superimposed bell curves, normal probability plots and QQ plots, and ordered analysis using the autocorrelation function for assessing variability have been described. These tools have been incorporated into a GUI tool that enables simultaneous plotting and comparison. The respective advantages and limitations of each of these analysis techniques have also been presented.
The next chapter deals with the characterization of industrial closed-loop data sets using these exploratory tools, followed by discussion.

4. CHARACTERIZATION OF CLOSED-LOOP DATA

This chapter starts with a description of the industrial closed-loop data sets that were analyzed using the tool described in Chapter 3. The second part of the chapter presents representative results for some of the data sets. In particular, there is a need to identify and characterize error variability bands over sustained periods of time (days and weeks). The existence of such bands, as discussed in section 4.2, is indicative of assignable causes in SPC terms (section 2.3). Real business improvements can be achieved by eliminating assignable causes. Recognition of the existence of assignable causes is the first step in their elimination.

Methods to reveal trends in error variability are described. The presence of variability trends (error bands) is shown using the time series trends and the histograms. Results using normal probability plots, QQ plots and the ACF are presented to quantitatively identify the variability trends. Finally, the effect of mode changes on actuating error distributions is presented and discussed.

4.1 Data

Archived closed-loop data from a major refinery have been obtained for data characterization studies. These are regenerated compressed data from the plant historian at a sample frequency of one sample per minute. Four sets each of flow, pressure and temperature control loops are available for a period of one year. Each set comprises the time series of set point, process variable, controller output and controller mode. Compression factors for the data are also available. The data sets are summarized in the matrix shown in Table 4.1. The compression factors are summarized in Table 4.2.
Figures 4.1 through 4.3 describe the closed-loop data in more detail.

Table 4.1: Summary of the information available in the closed-loop data sets

Filename  SP  PV  OP  Mode  Comp  Period
FC1a      ✓   ✓   ✓   ✓     ✓     Oct 2003 to 21 Mar 2004
FC1b      ✓   ✓   ✓   ✓     ✓     22 Mar 2004 to Sep 2004
FC2       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004
FC3       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004
FC4       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004
PC1a      ✓   ✓   ✓   ✓     ✓     Oct 2003 to 21 Mar 2004
PC1b      ✓   ✓   ✓   ✓     ✓     22 Mar 2004 to Sep 2004
PC2       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004
PC3       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004
PC4       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004
TC1a      ✗   ✗   ✗   ✓     ✓     Oct 2003 to 21 Mar 2004
TC1b      ✗   ✗   ✗   ✓     ✓     22 Mar 2004 to Sep 2004
TC2       ✓   ✓   ✓   ✗     ✓     Oct 2003 to Sep 2004
TC3       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004
TC4       ✓   ✓   ✓   ✓     ✓     Oct 2003 to Sep 2004

SP: Set point, PV: Process variable, OP: Controller output, Mode: Controller mode, Comp: Compression
✓: Data available as time series. ✗: No data available.

The PV in loops FC1, FC3, PC1, PC2, PC4, TC1, TC2 and TC3 is a manipulated variable in a multivariable controller. The PV in loops FC2, FC4, and TC4 is a controlled variable in a multivariable controller. PC3 is a regulatory loop not used for APC.

Table 4.2: Compression factors on PV, SP and OP for each data set.

Loop  Type             Zero  Span    PV Comp  SP Comp  OP Comp  Units
FC1   APC-MV           10    21040   50.00                      bpd
FC2   APC-CV           5     100000  500.00   500.00   0.10     bpd
FC3   APC-MV           10    20400   50.00    50.00    0.50     bpd
FC4   APC-CV           5     13510   25.00    25.00    0.50     bpd
PC1   APC-MV           5     70      0.25                       psi
PC2   APC-MV           0     60      0.30     0.10     0.10     psi
PC3   Regulatory loop  0     40      0.16     0.05     0.50     psi
PC4   APC-MV           0     60      0.10     0.30     0.50     psi
TC1   APC-MV           100   300     0.25                       °F
TC2   APC-MV           400   2400    0.25     0.25     0.50     °F
TC3   APC-MV           10    840     0.50     0.25     0.25     °F
TC4   APC-CV           500   300     0.25     1.50     0.50     °F

Figure 4.1: Description of flow loops

Figure 4.2: Description of pressure loops

Figure 4.3: Description of temperature loops

4.2 Variability trends in actuating error

The time series of actuating error (PV − SP) over a period of one year for a flow loop (data set FC2) is shown in Figure 4.4.
The deviation in the error value about a mean, as often indicated by standard deviation or interquartile range (IQR), is also commonly referred to as ‘variability in error’ or ‘error spread’. From Figure 4.4, we can see that the variability in error from Nov. 2003 to Feb. 2004 is noticeably different from the variability in the actuating error from Mar. 2004 to May 2004. If each period of variability is interpreted as a ‘band’, at least two different bands can be identified in Figure 4.4.

A similar result is obtained from plotting the time series for all the flow, pressure and temperature loops. By visual inspection, it is possible to identify the presence of multiple bands in eight of the thirteen data sets. The error variability bands and their corresponding error variability measures (standard deviation and IQR) for all the data sets are summarized in Table 4.3. The number of bands in column two of Table 4.3 was established empirically by visual inspection. Later in the chapter, analogous tables generated analytically will be presented. The mean, standard deviation and interquartile range for each data set were calculated using MATLAB™ built-in functions.

A time series that has more than one band is said to be non-stationary in nature. It will also be referred to as a ‘mixture’ when represented as a distribution.

Figure 4.4: Time series of actuating error, FC2 loop from Oct 1, 2003 to Sep 30, 2004. Figure shows presence of different variability bands.

Table 4.3: Error variability bands determined from visual observation.
Loop  #Bands  Band#  Days  Mean  STD     IQR     Units
FC1a  1       1      173   0.20  36.60   49.50   bpd
FC1b  1       1      187   0.20  34.00   45.90   bpd
FC2   2       1      126   6.60  181.50  245.00  bpd
              2      234   0.40  115.00  143.40  bpd
FC3   3       1      328   0.50  89.00   120.70  bpd
              2      17    2.20  123.10  165.00  bpd
              3      15    0.80  125.60  169.20  bpd
FC4   2       1      171   1.20  19.70   26.60   bpd
              2      189   0.00  59.95   82.50   bpd
PC1a  2       1      161   0.00  0.05    0.07    psi
              2      12    0.00  0.09    0.13    psi
PC1b  2       1      112   0.00  0.04    0.06    psi
              2      39    0.00  0.06    0.08    psi
PC2   1       1      360   0.00  0.21    0.29    psi
PC3   3       1      64    0.00  0.02    0.02    psi
              2      250   0.01  0.02    0.02    psi
              3      46    0.00  0.02    0.03    psi
PC4   1       1      360   0.00  0.18    0.24    psi
TC2   4       1      60    0.00  0.82    1.11    °F
              2      33    0.00  0.65    0.88    °F
              3      130   0.01  0.61    0.82    °F
              4      137   0.01  0.93    1.25    °F
TC3   4       1      60    0.01  0.48    0.65    °F
              2      30    0.01  0.37    0.50    °F
              3      171   0.01  0.43    0.58    °F
              4      99    0.02  0.45    0.61    °F
TC4   1       1      360   0.03  0.78    1.02    °F

The time series of actuating error indicates how well the loop keeps the process variable close to the set point. On the other hand, the output time series indicates the effort expended by the controller. While the absolute value of the output time series does not have a context like the actuating error time series, it is essential that the output always stays within limits or remains ‘unsaturated.’ When the output, measured as a percentage, is above 90% or below 10%, the controller is said to be saturated. It is important to realize that when the output of a controller is saturated, the data are characterized as open-loop (no control) rather than closed-loop. Table 4.4 shows the percentage of the time the output is saturated for each of the loops.

Table 4.4: Percentage output saturation (OP > 90% or OP < 10%).
Loop  %OP Saturation
FC1a  2.62
FC1b  0.11
FC2   0.76
FC3   50.06
FC4   0.91
PC1a  1.72
PC1b  21.04
PC2   0.61
PC3   12.73
PC4   0.10
TC1a  7.49
TC1b  0.00
TC2   0.05
TC3   0.00
TC4   0.01

4.3 Unordered analysis results

4.3.1 Identification of variability bands using histograms

This section presents the graphical identification of error variability bands using the ‘peakedness’ and ‘heavy-tailedness’ in histograms. Figures 4.5 through 4.7 show histograms of the actuating error time series for the FC4, PC3 and TC3 data sets respectively. All three data sets span one year of plant operation (Oct. 1, 2003 to Sep. 30, 2004). In each of the three histograms, it can be seen that the center is located approximately at zero. This is because the actuating error is kept as close to zero as possible by control action. A distinct feature observed in all the histograms is the presence of “heavy-tailedness” and “peakedness.”

Figure 4.5: Histogram of actuating error for loop FC4 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.

Figure 4.6: Histogram of actuating error for loop PC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.

Figure 4.7: Histogram of actuating error for loop TC3 from Oct. 1, 2003 to Sep. 30, 2004. Figure shows the presence of a peak and heavy tails.

The peakedness and heavy-tailedness observed in the histograms can be explained by considering the distributions over shorter periods of time. Figure 4.8 shows the time series and the histogram for flow loop FC2 from Feb. 6, 2004 to Feb. 13, 2004. Figure 4.9 shows the time series and the histogram for the flow loop FC2 from Feb. 14, 2004 to Feb. 24, 2004. Both time series exhibit a single error band. Likewise, the histograms do not exhibit heavy-tailedness or peakedness. Furthermore, the data in both time series are well represented by the bell curve.
Figure 4.10 shows the time series and the histogram of the composite distribution for the flow loop FC2 from Feb. 6, 2004 to Feb. 24, 2004. It can be seen from the figure that the presence of two error bands in the composite time series translates into a mixed distribution with a peak and heavy tails in the histogram. The deviation from the superimposed bell curve on the histogram indicates that the distribution of the error time series is non-normal.

Figure 4.11 shows the composite distribution for pressure loop PC3 from Dec. 14, 2003 to Mar. 1, 2004. The mixture in the time series and the presence of a peak and heavy tails in the histogram can be observed. These results confirm the visual observation of more than one band in the error time series.

Figure 4.12 shows the composite distribution for temperature loop TC3 from Feb. 15, 2004 to Apr. 10, 2004. In this case, the presence of bands in the time series is distinctly visible, and so are the peak and the heavy tails in the histogram. Significant deviations from the bell curve are also observed.

The presence of error variability bands can thus be identified from the peak and the heavy tails in the histograms. However, there is no standard way of determining either the bin size used in the histograms or the parameters used for the superimposed bell curve. This makes it difficult to quantify the peak and heavy tails. Identification using histograms is therefore limited in its utility.

Figure 4.8: Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 13, 2004.

Figure 4.9: Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 14, 2004 to Feb. 24, 2004.

Figure 4.10: Actuating error time series and histogram with superimposed bell curve for the loop FC2 from Feb. 6, 2004 to Feb. 24, 2004.

Figure 4.11: Actuating error time series and histogram with superimposed bell curve for the loop PC3 from Dec. 14, 2003 to Mar. 1, 2004.

Figure 4.12: Actuating error time series and histogram with superimposed bell curve for the loop TC3 from Feb. 15, 2004 to Apr. 10, 2004.

4.3.2 Identification of variability bands using Normslope

Deviation from normality can also be judged from the linearity of the normal probability plots. For a distribution that satisfies the normality assumption, the probability of error values occurring within ±1 standard deviation of the mean is 68.27%. Normslope, defined as the slope of the normal probability plot, is also half the distance between the two points, one on either side of the mean, that bracket the central 68.27% of the population. Therefore, it is a measure of the variability of the distribution that considers only 68.27% of the population. The normslope can be calculated as the slope of the best linear fit of the normal probability plot. If the data are perfectly normal, then the normslope will be equal to the standard deviation of the data.

Normslope can be used to quantify the error spread over any period. The advantage of using normslope is that the changes in the error variability can be assessed in engineering units. Since the normslope is based on the center 68.27% of the distribution, the spread estimate is not affected by extreme values.

Table 4.5 shows the normslope for all the loops on a month-by-month basis. This table can be used to identify the error variability bands. Each value of the normslope is an indication of the spread in that month.

As an example, consider loop FC3. As indicated in Table 4.5, at least three bands can be identified in FC3: Oct 2003 and Nov 2003, where the variability in actuating error is about 130 bpd; Jan 2004 through Apr 2004, where the variability in the actuating error is about 80 bpd; and Jun 2004 through Aug 2004, where the variability in the actuating error is about 95 bpd. A similar result is also obtained from the visual observations.
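The robustness of the normslope to extreme values can be illustrated with a small Python sketch. It assumes, following the description of the normplot reference line in section 3.1.2, that the slope is taken through the first and third quartiles, so that normslope is the interquartile range divided by the spacing of the corresponding z-scores. The data, the outlier values and the seed are hypothetical and are not taken from the plant data sets.

```python
import random
from statistics import NormalDist, pstdev, quantiles

def normslope(data):
    """Slope of the line through the first and third quartiles on a normal probability scale.

    Assumed definition: (Q3 - Q1) / (z(0.75) - z(0.25)), in the units of the data.
    """
    q1, _median, q3 = quantiles(data, n=4)
    z1 = NormalDist().inv_cdf(0.25)
    z3 = NormalDist().inv_cdf(0.75)
    return (q3 - q1) / (z3 - z1)

rng = random.Random(3)  # arbitrary fixed seed
clean = [rng.gauss(0.0, 100.0) for _ in range(5000)]  # hypothetical error series, sigma = 100
spikes = [2000.0, -2000.0] * 25                       # 50 hypothetical outliers
contaminated = clean + spikes

print("clean:        std", round(pstdev(clean), 1), "normslope", round(normslope(clean), 1))
print("contaminated: std", round(pstdev(contaminated), 1),
      "normslope", round(normslope(contaminated), 1))
```

For the clean series both statistics should be close to the true standard deviation of 100. With one percent of the points replaced by large spikes, the standard deviation should roughly double while the normslope should change only slightly, which is the sense in which the normslope is robust.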
The spread estimates generated in Table 4.3 based on visual band identification differ somewhat from the estimates produced by the normslope technique. This is because the normslope considers only the center of the distribution, while the statistics in Table 4.3 were generated using the entire time series and may also include outliers.

The normslope is thus a single number, robust to extreme values, that allows for simultaneous comparison of variability in different periods. Furthermore, the technique lends itself to full automation.

Table 4.5: Normslope for each loop calculated monthly. The normslope is a robust measure of the variability in the time series.

4.3.3 Identification of variability bands using qqslope

The qqslope can be used to compare two distributions without any reliance on a standard model. The QQ plot characteristics can be used, for instance, to compare the error distribution for each month to the composite distribution for a full year. The idea is to evaluate the current and long-term controller performance. In this case, the annual composite distribution can be considered as the reference or benchmark distribution.

The qqslope, analogous to normslope, is the ratio of the distance between the last points on the least-squares line through the first and third quartiles of the sampled distribution to the distance between the corresponding points on the least-squares line through the first and third quartiles of the annual (or reference) distribution. This ratio indicates whether the variability of the sampled distribution is greater or smaller than that of the annual (or reference) distribution. For example, if the qqslope is greater than 1, then the sampled distribution has greater spread than the annual distribution. The qqslope ratio thus allows for comparing the spread of the monthly distributions to the annual distribution.
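One simple reading of the qqslope definition above is the slope of the straight line drawn through the first-quartile and third-quartile points of the QQ plot, which reduces to the ratio of the two interquartile ranges. The Python sketch below implements that reading only; it is an assumption about the definition rather than the exact calculation used in this work, and the function name `qqslope`, the "inclusive" quantile method and the synthetic data are hypothetical.

```python
from statistics import quantiles


def qqslope(sample, reference):
    """Ratio of the interquartile range of `sample` to that of `reference`.

    Assumption: the qqslope is taken as the slope of the line through the
    (Q1, Q1) and (Q3, Q3) points of the QQ plot of sample vs. reference.
    """
    q1_s, _, q3_s = quantiles(sample, n=4, method="inclusive")
    q1_r, _, q3_r = quantiles(reference, n=4, method="inclusive")
    if q3_r == q1_r:
        raise ValueError("reference distribution has zero interquartile range")
    return (q3_s - q1_s) / (q3_r - q1_r)


# Hypothetical errors: the "month" is the "annual" series scaled by 1.3.
annual = [((7 * k) % 101) - 50.0 for k in range(1010)]
month = [1.3 * e for e in annual]

print(qqslope(month, annual))   # spread of the month relative to the year
print(qqslope(annual, annual))  # the reference compared with itself
```

A value above 1 indicates a month with more spread than the reference, and the reference compared with itself gives 1, matching the convention used in Table 4.6.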
If the annual distribution is considered as the ‘average’ characteristic of the loop, then the monthly distribution shows the deviation from the annual average. The annual distribution can also be replaced with another distribution if it is desired to compare the performance against a period of known good performance. The annual distribution contains all the seasonal and cyclic changes that occur during a year and is thus a natural composite measure. Another possible extension of qqslope could be a comparison of several years of performance to the current year.

Table 4.6 shows the qqslope for all loops in each month using the annual distribution as the standard. This table can be used to identify the error variability bands. Each value of the qqslope is the ratio of the spread in that month to the variability in the annual composite distribution.

As an example, consider loop FC3. As indicated in Table 4.6, at least three bands can be identified in FC3: Oct 2003 and Nov 2003, where the variability in actuating error is about 1.3 to 1.4 times the annual; Jan 2004 through Apr 2004, where the variability in the actuating error is about 0.85 times the annual; and Jun 2004 through Aug 2004, where the variability in the actuating error is about the same as the annual. This result is consistent with visual observation of the FC3 error time series and the normslope results presented in the previous section.

Table 4.6: The qqslope comparison of error spread: monthly to annual error. Composite annual error distribution is considered as the reference with its qqslope = 1.0.

4.4 Summary of unordered analysis results

Nonnormality and the presence of multiple distributions are the two most important characteristics that can be identified from histograms of the time series of actuating error. Nonnormality is indicated by deviation from the bell curve shape. The presence of heavy tails and a peak indicates the presence of mixtures.
Furthermore, the presence of variability bands or mixtures is the cause of heavy-tailedness and peakedness in the histograms. Nonnormality and the presence of mixtures are not necessarily independent characteristics. The presence of mixtures is likely to be one of the causes of nonnormality, as it causes an overlap of distributions of different spreads. The disadvantage of the peakedness and heavy-tail characteristics of the histogram is that they are not quantifiable.

The normslope is a statistic that can be used to detect error variability bands. The normslope uses the normal probability density function to estimate the variability in a given period. The qqslope is analogous to the normslope but can be used to compare any two distributions without the assumption of underlying normality.

All of these methods to detect error variability bands give similar results that agree with visual observations. However, histograms, normal probability plots, and QQ plots totally disregard order in the time series. Histograms group the data into bins and sort the bins, while normal probability plots and QQ plots sort the numbers themselves. As a result, the original order is lost. The next section describes ordered analysis using the autocorrelation function, in which the order of the time series is preserved.

4.5 Ordered analysis results

The autocorrelation function (ACF) uses the information contained in the order of the time series. Therefore, analysis using the ACF is termed ordered analysis.

4.5.1 Approach to ACF

In this work, the autocorrelation function is not used in the traditional sense. Although the ACF cannot be applied to nonstationary series, it can be used to detect nonstationarity, particularly when there are trends present in the time series. A review of the ACF and its applicability as an exploratory tool to detect nonstationary series is available in the literature (Chatfield 1989). In addition, the ACF is a discrete function and has values only at integer lags.
The figures shown in this work, however, show the ACF as a continuous function. Such a representation is only for visual convenience.

4.5.2 Identification of variability bands using the autocorrelation function (ACF)

The shape of the autocorrelation function is a characteristic of the time series. Figure 4.13 shows the ACF of the FC4 error time series. The ACF shape resembles a damped oscillation. Figure 4.14 shows the ACF for the PC3 error time series, for which the ACF coefficients do not reduce to zero even for very large lag values. Figure 4.15 is the ACF for the TC2 error time series and is markedly different from those of FC4 and PC3.

The above examples show that the shape of the ACF is different for different loops. Features in the time series transform into distinct shapes of the ACF. Of the many features of the ACF that are of interest, the number of appreciable ACF coefficients is an important one. The ACF coefficients are appreciable when they are statistically different from zero (based on 95% confidence limits on the estimation of ACF coefficients). Appreciable ACF coefficients beyond a certain lag mean there are nonrandom effects in the time series on a time scale greater than the lag. Nonzero ACF coefficients mean that the error is not random but has a deterministic effect (special cause) embedded in it.

Figure 4.13: ACF for FC4, actuating error from 10/1/03 to 9/30/04.

Figure 4.14: ACF for PC3, actuating error from 10/1/03 to 9/30/04.

Figure 4.15: ACF for TC2, actuating error from 10/1/03 to 9/30/04.

The ACF lag is markedly different for the flow, pressure and temperature loops. Even within each loop, the ACF shape could be different from month to month as the process and the controller effectiveness change. Table 4.7 shows the ACF lag for each loop for each month.
The ACF lag is generated by calculating the ACF coefficients up to a lag of 180 min for each loop and then counting the ACF coefficients that are appreciable. The confidence limits for the ACF coefficients are given by $-1/n \pm 2/\sqrt{n}$, where $n$ is the number of observations (Chatfield 1989). All ACF coefficients that are outside these limits are nonzero (or appreciable).

Table 4.7 can be used to check the randomness in the error time series. If the number of appreciable ACF coefficients is high in any period, it means that a deterministic effect is in play.

Reconsider the FC3 example. As indicated in Table 4.7, the number of appreciable ACF coefficients is unusually high for Oct. 2003 through Dec. 2003 when compared to the rest of the months. This indicates that there is a nonrandom effect in the FC3 data from Oct. 2003 to Dec. 2003. Visual observation of the error time series for FC3 from Oct. 2003 through Sep. 2004 confirms the increased variability from Oct. 2003 to Dec. 2003.

The results from the ACF coefficients cannot distinguish the bands in error variability as the normslope or the qqslope do. This may be because the ACF considers the entire time series, including its order, whereas the unordered analyses estimate the variability from a portion of the data.

Table 4.7: Number of appreciable ACF coefficients (with a maximum of 181) in the error time series for each data set by month.

The autocorrelation function is a measure of the randomness of a time series. When there is a change in the variability of the series, the process has changed. If this change is statistically significant, it shows up as a higher number of appreciable autocorrelation coefficients. The number of appreciable autocorrelation coefficients can be used to detect the presence of multiple distributions in a time series.
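The counting of appreciable ACF coefficients can be sketched as follows. This is a minimal Python illustration assuming the standard sample ACF estimator and the approximate 95% limits of -1/n ± 2/√n; the function names `acf` and `count_appreciable` are hypothetical, and lag 0 (which is always 1) is excluded from the count here, so the maximum count for a 180-lag analysis is 180 rather than 181.

```python
from math import sqrt


def acf(series, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; the ACF is undefined")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def count_appreciable(series, max_lag=180):
    """Count lags 1..max_lag whose coefficient lies outside -1/n +/- 2/sqrt(n)."""
    n = len(series)
    lower = -1.0 / n - 2.0 / sqrt(n)
    upper = -1.0 / n + 2.0 / sqrt(n)
    coefficients = acf(series, max_lag)
    # Lag 0 is skipped: it equals 1 by construction and carries no information.
    return sum(1 for r in coefficients[1:] if r < lower or r > upper)


# Hypothetical series with a strong trend: every low lag is appreciable.
ramp = [float(t) for t in range(500)]
print(count_appreciable(ramp, max_lag=20))
```

For one-minute data, `max_lag=180` corresponds to the 180 min window used for Table 4.7. A trending or drifting series yields a high count, whereas a purely random series would be expected to place only about 5% of its coefficients outside the limits.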
This method, however, is different from the unordered analysis, as it considers all the data, the order in which the data occur, and whether the changes in variability are statistically significant. The next section deals with the effect of mode changes on the actuating error distributions.

4.6 Effect of controller mode change results

The identification of error variability bands using unordered and ordered techniques was presented in the previous sections. These error variability bands indicate the presence of assignable causes responsible for changes in the controller performance. This section deals with the impact of controller mode change, a known assignable cause, on error variability bands.

4.6.1 Controller modes

Operators may switch control loop configurations (by turning off an APC controller or operating in open-loop mode) when controller performance is unsatisfactory or for maintenance or tuning purposes. The control configurations available to the operator depend on the nature of the loop and the control strategy. Changes in controller configuration are known assignable causes that can produce changes in the width of error variability bands.

The controller mode indicates the active controller configuration at that time. Error variability trends in different controller modes thus indicate the performance of the controller in the respective configurations. A change in the controller mode implies a change in the way the process variable is being controlled or manipulated. This change may translate into a change in the width of error variability bands.

For the industrial closed-loop data sets used in this work, the state of a controller at any time is defined by one of the following four modes: manual, auto, cascade or Bcascade. Each of the controller modes is illustrated in Figure 4.16. Table 4.8 shows the number of days each of the loops in the individual data sets described in Section 4.1 was in manual, auto, cascade or Bcascade mode.
The table also includes the number of error variability bands observed for each loop.

Figure 4.16: Illustration of typical controller configurations and corresponding controller modes.

Table 4.8: Number of days in each controller mode for all data sets. The number of error bands is the same as listed in Table 4.3.

Loop    Manual   Auto     Cascade  BCascade  Total    # of Error
        (Days)   (Days)   (Days)   (Days)    (Days)   Bands
FC1a    0.0      16.9     156.7    0.0       173.6    1
FC1b    0.3      4.2      186.9    0.0       191.4    1
FC2     1.5      0.2      363.3    0.0       365.0    2
FC3     1.2      26.2     337.6    0.0       365.0    3
FC4     0.0      0.0      365.0    0.0       365.0    2
PC1a    0.0      16.4     157.2    0.0       173.6    2
PC1b    0.0      3.9      187.5    0.0       191.4    2
PC2     2.7      2.2      321.9    38.2      365.0    1
PC3     0.0      365.0    0.0      0.0       365.0    3
PC4     2.8      0.1      362.1    0.0       365.0    1
TC1a    0.2      173.4    0.0      0.0       173.6
TC1b    0.0      191.4    0.0      0.0       191.4
TC3     0.0      142.1    222.9    0.0       365.0    4
TC4     0.0      365.0    0.0      0.0       365.0    1

Consider a controller whose process variable is configured as an APC manipulated variable. In cascade mode, the controller receives its setpoint from an APC controller, and in auto mode the controller receives its setpoint from an operator. Setpoint changes made by the operator in auto mode are relatively infrequent, and the process usually completes its response to the setpoint change. As a result, the error variability is primarily due to process disturbances. In cascade mode, however, the setpoint is being changed by the APC controller. A new setpoint change occurs before the process completely responds to the previous change, which in turn affects the error variability. As a result, the error variability is attributable not only to process disturbances but also to additional variability introduced by the APC controller.

The following four cases illustrate the effect of controller mode changes from cascade (APC on) to auto (APC off) or vice versa on actuating error variability of APCMVs.
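Before turning to the cases, the mechanism described above can be illustrated with a small simulation. The Python sketch below is not derived from the refinery data: it assumes that the closed-loop response of the process variable to its setpoint behaves like a first-order lag, that the actuating error is sampled once per minute just before the next setpoint move, and that process disturbances are absent. The time constants (10 min and 0.2 min), the sinusoidal setpoint standing in for continual APC moves, and the function name `error_spread` are all hypothetical.

```python
from math import exp, sin
from statistics import pstdev


def error_spread(setpoints, tau_min, sample_min=1.0):
    """Standard deviation of (PV - SP) for an assumed first-order closed loop.

    Assumption: during each sample interval the PV moves toward the current
    setpoint with time constant tau_min; the error is recorded at the end
    of the interval.
    """
    alpha = 1.0 - exp(-sample_min / tau_min)
    pv = setpoints[0]
    errors = []
    for sp in setpoints:
        pv += alpha * (sp - pv)   # response over one sample interval
        errors.append(pv - sp)    # actuating error seen at the sample
    return pstdev(errors)


minutes = range(600)
auto_sp = [100.0 for _ in minutes]                       # operator leaves SP fixed
cascade_sp = [100.0 + 2.0 * sin(0.3 * k) for k in minutes]  # SP moved every minute

print("auto, slow loop:   ", error_spread(auto_sp, tau_min=10.0))
print("cascade, slow loop:", error_spread(cascade_sp, tau_min=10.0))
print("cascade, fast loop:", error_spread(cascade_sp, tau_min=0.2))
```

Under these assumptions a fixed setpoint produces no setpoint-induced error, a slow loop that cannot finish responding between moves shows a clearly wider error band, and a loop that settles well within one sample interval shows almost none.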
Case 1 - Comparison of TC3 error variability in cascade and auto modes:

The process variable for data set TC3 is the outlet temperature of a furnace and is configured as a manipulated variable in an APC controller. The configuration of TC3 is shown in Figure 4.17.

Two periods, with the controller in cascade and auto modes, respectively, are selected for error variability comparison. Figure 4.18 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop TC3 from Oct. 1, 2003 to Oct. 6, 2003 (red) and from Oct. 21, 2003 to Oct. 26, 2003 (blue). The controller is in auto mode during the first period (red) and in cascade mode in the second (blue).

Figure 4.17: Configuration of TC3. PV (°F) is an APC MV. The output from the APC controller is the set point to the TC.

For TC3, which is an APCMV, the configurations in auto and cascade controller modes can be described as follows:

Cascade: Output from the APC controller is the setpoint to the TC loop.
Auto: Operator provides the setpoint to the TC loop.

As expected, the error variability in auto mode is less than the variability in cascade mode for loop TC3. The continual setpoint changes in cascade mode, generated by the APC controller, introduce additional variability in the TC3 actuating error. This is confirmed by the histogram as well as the time series in Figure 4.18.

Figure 4.18: TC3 cascade from Oct. 21, 2003 to Oct. 26, 2003 (blue) and auto from Oct. 1, 2003 to Oct. 06, 2003 (red). Different error variability in different controller modes.

Case 2 - Comparison of FC1a error variability in cascade and auto modes:

The process variable for data set FC1a is the flow rate of a side draw from a fractionation column. FC1a is configured as a manipulated variable in an APC controller as shown in Figure 4.19.

Two periods, with the controller in cascade and auto modes respectively, are selected for error variability comparison.
Figure 4.20 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop FC1a from Oct. 1, 2003 to Oct. 9, 2003 (blue) and Oct. 14, 2003 to Oct. 19, 2003 (red). The controller is in auto mode during the first period (blue) and in cascade mode in the second (red).

For FC1a, which is an APCMV, the configurations in auto and cascade controller modes can be described as follows:

Cascade: Output from the APC controller is the setpoint to the FC loop.
Auto: Operator provides the setpoint to the FC loop.

Figure 4.19: Configuration of FC1a. PV (bpd) is an APC MV. The output from the APC controller is the set point to the FC.

It can be observed from the error distribution and the actuating error time series that the error variability is not very different in the auto and cascade modes. In this case, the closed-loop dynamics of the flow control loop are sufficiently fast to be completed well within the 1 min APC sample time. Therefore, unlike in Case 1, no noticeable increase in variability was introduced in cascade mode by the APC controller.

Figure 4.20: FC1a auto from Oct. 1, 2003 to Oct. 9, 2003 (blue) and cascade from Oct. 14, 2003 to Oct. 19, 2003 (red). Same error variability in different controller modes.

Case 3 - Comparison of PC2 error variability in Bcascade and cascade modes:

The process variable for data set PC2 is the fuel gas pressure in the inner loop of a cascade temperature controller TC for adjusting the furnace outlet temperature, as shown in Figure 4.21. The temperature controller is an APCMV. When the APC controller is on, both the temperature controller and PC2 are in cascade mode. When the APC controller is off, the temperature controller is in auto mode and PC2 is in Bcascade mode.

For PC2, the configurations in Bcascade and cascade controller modes can be described as follows:

Cascade: Output from the APC controller is the setpoint to the TC loop. The output from the TC loop is the setpoint to PC2.
BCascade: APC controller is turned off. Regular cascade arrangement. Operator provides the setpoint to the TC loop. The output from the TC loop is the setpoint to PC2.

Figure 4.22 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop PC2 from Oct. 1, 2003 to Oct. 28, 2003 (blue) and from Jan. 22, 2004 to Mar. 5, 2004 (red). The controller is in Bcascade mode during the first period (blue) and in cascade mode in the second (red).

Figure 4.21: Configuration of PC2. PV (psig) is an APC MV. The output from the APC controller is the set point to the PC.

Figure 4.22: PC2: Bcascade from Oct. 1, 2003 to Oct. 28, 2003 (blue) and cascade from Jan. 22, 2004 to Mar. 5, 2004 (red). Mixture indiscernible.

It is observed from the error histogram and the actuating error time series that error variability is not very different in the Bcascade and cascade modes. In this case, there is little change in the output variability of the temperature controller (setpoint to PC2) in cascade mode when compared to the variability in auto mode. This result means that, in this case, there was no appreciable change in the performance of the secondary controller whether the setpoint to the primary controller was set by the operator or by the APC.

Case 4 - Impact of setpoint variability introduced by APC on TC3 actuating error:

TC3 is an APCMV whose configuration was described in Case 1. In this case, the TC3 controller is in cascade mode at all times in the period selected.

Figure 4.23 shows the actuating error time series, ACF, error histogram, and the setpoint time series for loop TC3 from Feb. 15, 2004 to Feb. 17, 2004 (blue) and Feb. 17, 2004 to Feb. 28, 2004 (red). The blue and red periods are chosen such that the setpoint variability in the blue period is higher than the setpoint variability in the red period.
Towards the end of the blue period, it can be seen from the time series plots of the setpoint and the actuating error that reduced setpoint variability translates into reduced error variability. This explains the presence of two error variability bands in the blue period.

The red ACF has appreciable coefficients at higher lags, which indicates the presence of a nonrandom effect, in this case the presence of outliers. The blue ACF shows a cycle, which is evidence of a latent cycle in the data. The time series, the histogram and the appreciable ACF coefficients all indicate greater variability in the blue period. This result confirms the observations from previous cases that setpoint variability has translated into actuating error variability.

Figure 4.23: Impact of setpoint variability introduced by APC on TC3 actuating error. TC3 from Feb. 15, 2004 to Feb. 28, 2004.

Based on the four cases presented, it is observed that setpoint variability may translate into actuating error variability. Even though setpoint variability can vary depending on APC control action (Case 4), a controller mode change generally produces a change in the setpoint variability (Cases 1, 2 and 3) due to the change in the controller configuration. Therefore, controller mode changes can have a significant impact on actuating error variability and on business analysis metrics such as process capability, Cp, or process performance, Pp. In particular, the setpoint variability could be an important factor when assessing error variability of PVs that are APCMVs.

4.7 Discussion of data analysis results

Actuating error, the key variable for SPC-based performance assessment, shows different bands of variability when considered over a long period of time. These error variability bands indicate the presence of assignable causes that are responsible for changes in controller performance.
Methods to identify error variability bands using two approaches, ordered and unordered analysis, have been presented in this chapter. Unordered analysis totally disregards the order of the time series and involves grouping or sorting of the data. Histograms, normal probability plots and QQ plots are examples of unordered analysis. Ordered analysis considers the order in which the values of the time series occur. The autocorrelation function is an example of ordered analysis. Error variability band identification using ordered and unordered analysis can be summarized as follows:

Histograms: A histogram with a superimposed bell curve can be used as a visual tool to identify error variability bands. The presence of heavy tails and a peak compared to the superimposed bell curve on the histogram indicates the presence of multiple distributions. The multiple distributions are a direct result of the presence of variability bands. The disadvantage of the histogram is that it is difficult to quantify the heavy-tail and peak characteristics. Histograms of the error distributions also reveal that the distributions can be significantly nonnormal in the presence of bands. While there are many reasons for nonnormality, the presence of mixtures itself introduces some degree of nonnormality.

Normslope: The normslope, defined as the slope of the normal probability plot, is a quantitative measure that can be used to detect error variability bands. If the data are perfectly normal, then the normslope will be equal to the standard deviation of the data. The advantage of using normslope is that the changes in the error variability can be assessed in engineering units. The normslope is based on the center 68.27% of the distribution. Therefore, the spread estimate is not affected by extreme values. The normslope is a single number, robust to extreme values, that allows for simultaneous comparison of variability in different periods.
qqslope: The qqslope, a quantitative measure analogous to normslope, is the ratio of the distance between the last points on the least-squares line through the first and third quartiles of the sampled distribution to the distance between the corresponding points on the least-squares line through the first and third quartiles of the reference distribution. This ratio indicates whether the variability of the sampled distribution is greater or smaller than that of the reference distribution. The qqslope can be used to detect error variability bands without any reliance on a standard model.

ACF: The autocorrelation function, which is an ordered analysis technique, determines the randomness of a time series. When there is a change in the variability of the series, the process has changed. If this change is statistically significant, it shows up as a change in the number of appreciable autocorrelation coefficients. This change can be used to detect the presence of multiple distributions. This method, however, is different from the unordered analysis, as it considers all the data, the order in which the data occur, and whether the changes in variability are statistically significant.

Visual and quantitative methods to detect error variability bands have been presented in the first part of this chapter. All the unordered methods to detect error variability bands give similar results that agree with visual observations. Histograms are easy to compute but are limited in their utility since the heavy-tail and peak characteristics are difficult to automate. Normslope and qqslope are similar techniques that are capable of full automation. For the data sets considered, both normslope and qqslope are effective ways to detect error variability bands. The bands identified using normslope and qqslope are in excellent agreement with visual observations. Normslope and qqslope are also robust to outliers since they both consider the center of the distribution.
The ACF is a complementary tool for detecting error variability bands, deterministic effects in the time series or latent cycles in the time series. The calculation of the ACF can also be fully automated. The ACF, however, is not robust to outliers. The sample ACF is also used in most commercial products as an engineering analysis tool to estimate the finite impulse response of the process. The use of histograms, probability plots and the ACF for error band identification is unique to this work.

Although normslope, qqslope and the ACF can be calculated for any time scale, the application of these techniques to smaller time periods is not recommended, as it falls outside the realm of SPC analysis. Archived data are not suitable for analysis on a short time scale.

Changes in controller configuration (controller modes) are assignable causes that can cause the performance of the controller to change. The impact of changing the controller mode on error variability bands is presented in the second part of this chapter. The controller mode indicates the active controller configuration at that time. Case studies on controller mode changes show that setpoint variability may translate into actuating error variability. Even though setpoint variability can vary depending on APC control action, a controller mode change generally produces a change in the setpoint variability due to the change in the controller configuration. Therefore, controller mode changes can have a significant impact on actuating error variability and on business analysis metrics such as process capability, Cp, or process performance, Pp. In particular, the setpoint variability could be an important factor when assessing error variability of PVs that are APCMVs.

5. CONCLUSIONS AND FUTURE WORK

This work focuses on the characterization of closed-loop archived data for use in SPC-based analysis for controller performance assessment.
Twelve closed-loop industrial data sets obtained from a petroleum refinery were used in the analysis. The contributions of this work include:

1. Development of a graphical user interface (GUI) tool for data analysis. The capabilities of the GUI tool include:
• Six simultaneous plots, each with an input-series choice and a plot choice.
• Input-series choices include actuating error, process variable, setpoint, output, change in setpoint and change in output.
• Plot choices include time series, histogram with superimposed bell curve, pdf, normalized pdf, Fourier transform, power spectrum, ACF, normplot, qqplot and boxplot.
• An overlay feature, one of the most powerful features of the tool, which allows plots from different periods to be superimposed on one another. The differences in data characteristics during multiple periods can thus be analyzed simultaneously.
• Option for outlier removal at ±4 times the standard deviation of the data.
• Data cursor option to read the coordinates on any plot.
• Interactive plot edit tools such as zoom, pan and 3D rotate.

2. Application of the GUI tool to 12 industrial data sets for characterization studies. The conclusions are summarized in Section 5.1.

3. Demonstration of the ability to identify error variability bands in the closed-loop data sets using histograms, normslope, qqslope and the sample autocorrelation function.

4. Demonstration, through case studies, of the effect of APC controllers on the error variability of APC manipulated variables.

Recommendations for future work are summarized in Section 5.2.

5.1 Conclusions

Actuating error variability is the key variable for controller performance assessment. Changes in the error variability indicate changes in controller effectiveness. Different levels of variability during different periods in the time series are termed error variability bands.
Error variability bands are common in the actuating error time series of manipulated variables in an advanced process controller (APCMVs) when considered over a long period of time. Eight of the twelve data sets analyzed are APCMVs. Of these, five APCMVs (FC3, PC1, TC1, TC2 and TC3) contain multiple error variability bands. These error variability bands imply the presence of nonhomogeneity in the closed-loop data. Since SPC-based metrics involve strong assumptions about homogeneity and normality of data, the implications of the presence of error variability bands for SPC metrics cannot be ignored.

Actuating error distributions for the five APCMVs which contain error variability bands are nonnormal. While there are many reasons for nonnormality, the presence of variability bands causes some degree of deviation from normality. However, actuating error series of APCMVs not containing error variability bands, and other time series in time periods devoid of bands, were well approximated by the normal distribution. This further emphasizes that the SPC performance metrics should be limited to the bands.

Histogram, normslope, qqslope and sample ACF are the four methods proposed in this work for the identification of error variability bands. Normslope and qqslope are similar statistics that are capable of full automation. For all the data sets considered, both normslope and qqslope are effective ways to detect error variability bands. The bands identified using normslope and qqslope are in agreement with visual observations. Normslope and qqslope are also robust to outliers since they consider the center of the distribution. The ACF is a complementary tool for detecting error variability bands, deterministic effects or latent cycles in the time series. The calculation of the ACF can also be fully automated. The ACF, however, is not robust to outliers.
Case studies also show that setpoint variability resulting from APC controllers can be translated into actuating error variability for APCMVs which have relatively slow dynamics (temperature loops). It is therefore not sufficient to base performance metrics on actuating error variability alone. The closed-loop data should be used collectively to provide greater context. The GUI tool developed at OSU as a part of this work is an excellent tool for such simultaneous analysis and was used for all the case studies in this work.

5.2 Future Work

The scope of this work has been primarily data-driven. Therefore, the extension of these results to similar data sets is limited. A theoretical approach would facilitate the extension of the results to similar control configurations. The actuating error (process variable minus setpoint) distribution is a function of the joint distribution of process variable and setpoint and the relationship between process variable and setpoint (through the controller output). Analytical expressions for the error probability density function (distribution model), generated from several possible process variable and setpoint probability density functions, will help in understanding the nature of the error distributions for different base case process variable and setpoint distributions.

A valuable addition to this work would be the study of data handling procedures. Data compression is a major issue when using archived closed-loop data. Excessive data compression is a concern as it could compromise closed-loop data characteristics. More study is needed in this area to understand the effect of compression, particularly on error variability bands. Similarly, an understanding of the minimum sampling frequency required for SPC-based analysis has several potential benefits for data handling. The amount of data required would be greatly reduced with such an understanding.
For instance, one year's worth of data sampled at 1 min intervals would be 500,000+ data points. If a sampling interval of 2 min achieved the same result, only 250,000+ data points would need to be handled. With hundreds of loops over long periods of time in question, using the minimum sampling frequency greatly simplifies data handling problems.

APPENDIX

This appendix lists MATLAB™ function help for the built-in MATLAB™ functions used in the GUI tool discussed in section 3.3. The help files are taken from MATLAB™ documentation.

Histfit

HISTFIT Histogram with superimposed fitted normal density.

HISTFIT(DATA,NBINS) plots a histogram of the values in the vector DATA, using NBINS bars in the histogram. With one input argument, NBINS is set to the square root of the number of elements in DATA.

H = HISTFIT(DATA,NBINS) returns a vector of handles to the plotted lines. H(1) is a handle to the histogram, H(2) is a handle to the density curve.

Boxplot

BOXPLOT Display boxplots of a data sample.

BOXPLOT(X) produces a box and whisker plot with one box for each column of X. The boxes have lines at the lower quartile, median, and upper quartile values. The whiskers are lines extending from each end of the boxes to show the extent of the rest of the data. Outliers are data with values beyond the ends of the whiskers.

BOXPLOT(X,G) produces a box and whisker plot for the vector X grouped by G. G is a grouping variable defined as a vector, string matrix, or cell array of strings.
G can also be a cell array of several grouping variables (such as {G1 G2 G3}) to group the values in X by each unique combination of grouping variable values.

BOXPLOT(...,'PARAM1',val1,'PARAM2',val2,...) specifies optional parameter name/value pairs:

  'notch'        'on' to include notches (default is 'off').
  'symbol'       Symbol and color to use for all outliers (default is 'r+').
  'orientation'  Box orientation, 'vertical' (default) or 'horizontal'.
  'whisker'      Maximum whisker length (default 1.5).
  'labels'       Character array or cell array of strings containing labels for each column of X, or each group in G.
  'colors'       A string or a three-column matrix of box colors. Each box (outline, median line, and whiskers) is drawn in the corresponding color. Default is to draw all boxes with blue outline, red median, and black whiskers. Colors are recycled if necessary.
  'widths'       A numeric vector or scalar of box widths. Default is 0.5, or slightly smaller for fewer than three boxes. Widths are recycled if necessary.
  'positions'    A numeric vector of box positions. Default is 1:n.
  'grouporder'   When G is given, a character array or cell array of group names, specifying the ordering of the groups in G. Ignored when G is not given.

In a notched box plot the notches represent a robust estimate of the uncertainty about the medians for box-to-box comparison. Boxes whose notches do not overlap indicate that the medians of the two groups differ at the 5% significance level. Whiskers extend from the box out to the most extreme data value within WHIS*IQR, where WHIS is the value of the 'whisker' parameter and IQR is the interquartile range of the sample.

H = BOXPLOT(...) returns the handle H to the lines in the box plot. H has one column per box, consisting of the handles for the various parts of the box. Each column contains 7 handles for the upper whisker, lower whisker, upper adjacent value, lower adjacent value, box, median, and outliers.
Example: Box plot of car mileage grouped by country

    load carsmall
    boxplot(MPG, Origin)
    boxplot(MPG, Origin, 'sym','r*', 'colors',hsv(7))
    boxplot(MPG, Origin, 'grouporder', ...
            {'France' 'Germany' 'Italy' 'Japan' 'Sweden' 'USA'})

hist

HIST Histogram.

N = HIST(Y) bins the elements of Y into 10 equally spaced containers and returns the number of elements in each container. If Y is a matrix, HIST works down the columns.

N = HIST(Y,M), where M is a scalar, uses M bins.

N = HIST(Y,X), where X is a vector, returns the distribution of Y among bins with centers specified by X. The first bin includes data between -inf and the first center and the last bin includes data between the last bin and inf. Note: Use HISTC if it is more natural to specify bin edges instead.

[N,X] = HIST(...) also returns the position of the bin centers in X.

HIST(...) without output arguments produces a histogram bar plot of the results. The bar edges on the first and last bins may extend to cover the min and max of the data unless a matrix of data is supplied.

HIST(AX,...) plots into AX instead of GCA.

Class support for inputs Y, X: float: double, single

fft

FFT Discrete Fourier transform.

FFT(X) is the discrete Fourier transform (DFT) of vector X. For matrices, the FFT operation is applied to each column. For N-D arrays, the FFT operation operates on the first non-singleton dimension.

FFT(X,N) is the N-point FFT, padded with zeros if X has less than N points and truncated if it has more.

FFT(X,[],DIM) or FFT(X,N,DIM) applies the FFT operation across the dimension DIM.

For length N input vector x, the DFT is a length N vector X, with elements

\[ X(k) = \sum_{n=1}^{N} x(n)\, \exp\!\left(-j\,2\pi\,\frac{(k-1)(n-1)}{N}\right), \qquad 1 \le k \le N. \]

The inverse DFT (computed by IFFT) is given by:

\[ x(n) = \frac{1}{N} \sum_{k=1}^{N} X(k)\, \exp\!\left(j\,2\pi\,\frac{(k-1)(n-1)}{N}\right), \qquad 1 \le n \le N. \]

qqplot

QQPLOT Display an empirical quantile-quantile plot.

QQPLOT(X) makes an empirical QQ-plot of the quantiles of the data set X versus the quantiles of a standard Normal distribution.
QQPLOT(X,Y) makes an empirical QQ-plot of the quantiles of the data set X versus the quantiles of the data set Y.

H = QQPLOT(X,Y,PVEC) allows you to specify the plotted quantiles in the vector PVEC. H is a handle to the plotted lines.

When both X and Y are input, the default quantiles are those of the smaller data set.

The purpose of the quantile-quantile plot is to determine whether the sample in X is drawn from a Normal (i.e., Gaussian) distribution, or whether the samples in X and Y come from the same distribution type. If the samples do come from the same distribution (same shape), even if one distribution is shifted and rescaled from the other (different location and scale parameters), the plot will be linear.

normplot

NORMPLOT Displays a normal probability plot.

H = NORMPLOT(X) makes a normal probability plot of the data in X. For matrix X, NORMPLOT displays a plot for each column. H is a handle to the plotted lines.

The purpose of a normal probability plot is to graphically assess whether the data in X could come from a normal distribution. If the data are normal the plot will be linear. Other distribution types will introduce curvature in the plot.

VITA

Anand Vennavelli

Candidate for the Degree of Master of Science

Thesis: CHARACTERIZATION OF CLOSED-LOOP PROCESS VARIABLE DATA

Major Field: Chemical Engineering

Biographical:

Personal Data: Born in February 1981 in Hyderabad, India.

Education: Graduated with Bachelor of Technology degree in Chemical Engineering from Osmania University, Hyderabad, India, in May 2002; completed the requirements for the Master of Science degree with a major in Chemical Engineering at Oklahoma State University in December 2006.

Experience: Worked as project assistant at the Indian Institute of Chemical Technology (IICT), Hyderabad, India, 2002–2003. Employed by Oklahoma State University, School of Chemical Engineering, as a research assistant, 2003–present.
Worked as a summer intern at the ConocoPhillips refinery, San Francisco, CA, as an advanced controls engineer, summer of 2006.

Professional Memberships: Student member of AIChE and ASQ.

Name: Anand Vennavelli
Date of Degree: December 2006
Institution: Oklahoma State University
Location: Stillwater, Oklahoma
Title of Study: CHARACTERIZATION OF CLOSED-LOOP PROCESS VARIABLE DATA
Pages in Study: 92
Candidate for the Degree of Master of Science
Major Field: Chemical Engineering

Scope and Method of Study: “Business analysis methods” for controller performance assessment address management’s view of control systems as assets to be managed. These techniques utilize a long-term (weeks or months) view of controller performance with a business emphasis on continuous quality improvement, identification of best practices, and allocation of limited resources for control system maintenance. Business analysis methods are implemented using statistical process control (SPC) and “six-sigma” principles. The focus of this thesis is on the characterization and analysis of data used by business analysis methods for controller performance assessment. To automate the analysis of large industrial closed-loop data sets, a graphical user interface tool using MATLAB™ has been developed.

Findings and Conclusions: This work focuses on the characterization of closed-loop archived data, primarily for use in SPC-based analysis for controller performance assessment. Plots of the closed-loop data sets for the advanced process control manipulated variables (APCMVs) exhibit different levels of variability when considered over a long period of time (one year). These periods of variability are termed “error variability bands.” Changes in the error variability bands are attributable to assignable causes responsible for changes in controller performance.
Automatic identification of the error variability bands provides the starting point for further diagnosis and elimination of assignable causes that can lead to real business improvement. This thesis presents four error variability band identification techniques using general purpose statistical tools including histograms, normal probability plots, quantile-quantile plots and the sample autocorrelation function. The performance of these methods is presented using archived refinery data reconstructed on a one-minute sample period for flow, pressure, and temperature loops. The impact of setpoint variability on APC manipulated variables is also illustrated.

ADVISER’S APPROVAL: James R. Whiteley





