

small (250x250 max)
medium (500x500 max)
Large
Extra Large
large ( > 500x500)
Full Resolution


MIXEDEFFECTS MODELING OF SHORTLEAF PINE (PINUS ECHINATA MILL.) GROWTH DATA By CHAKRA BAHADUR BUDHATHOKI Bachelor of Science Tribhuvan University, Nepal 1992 Diploma in Statistics The University of Reading, England 1996 Master of Science in Biometry The University of Reading, England 1997 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY July, 2006 ii MIXEDEFFECTS MODELING OF SHORTLEAF PINE (PINUS ECHINATA MILL.) GROWTH DATA Dissertation Approved: Dr. Thomas B. Lynch ________________________________________________ Thesis Advisor Dr. David K. Lewis ________________________________________________ Dr. Robert F. Wittwer ________________________________________________ Dr. Mark E. Payton ________________________________________________ Dr. A. Gordon Emslie ________________________________________________ Dean of the Graduate College iii PREFACE The objectives of this dissertation are to review mixedeffects models in forestry literature, and to use these analysis tools for modeling shortleaf pine (Pinus echinata Mill.) growth variables. Data were available from over 200 permanent plots established in naturallyregenerated shortleaf pine stands of eastern Oklahoma and western Arkansas. It is evident from the review of literature that the mixedeffects models have been increasingly used in forestry since the 1990’s. However, these tools have never previously been used in shortleaf pine growth and yield modeling. This dissertation project was successful in expanding two of the major components of growth and yield modeling for shortleaf pine. The distanceindependent individualtree model for annual basal area growth model of Lynch et al. (1999) was improved to incorporate randomeffects for plots in a potentialmodifier framework with stand level and tree level explanatory variables. Furthermore, the individualtree model for total height of Lynch et al. (1999) was also expanded to have plotspecific random parameters for the relationship between total height and diameter at breast height in which dominant height is also a predictor. The fitted mixedeffects models for annual basal area growth and total height were found to fit the data, and to predict the responses better than the previous models reported by Lynch et al. (1999). Possible correlated and/or heterogeneous withinplot errors were also investigated. It was found that within iv plot errors did not appear to be significantly correlated in the presence of plot randomeffects; however, there was some evidence of heterogeneous errors. These model estimates can be utilized in the Shortleaf Pine Stand Simulator developed at Oklahoma State University after fitting other components such as mortality functions in mixed modeling framework. Due to voluminous data, complexity of the models attempted and some degree of correlations among model parameters, computing limitations were experienced at times, resulting in model convergence issues. v ACKNOWLEDGMENTS First and the foremost, I am grateful to my major advisor Dr. Tom Lynch for his guidance, encouragement and cooperation throughout my graduate study, both coursework and research at Oklahoma State University. Thank you Dr. Lynch! Thanks are also due to Drs. Mark Payton, David Lewis and Robert Wittwer for helping and encouraging me as advisory committee members. I also would like to thank the Department of Forestry at OSU and its faculty and staff for providing all the facilities, including a Graduate Research Assistantship and Dr. Michel Afanasiev Graduate Fellowship that made my study possible. The assistance of the USDA Forest Service Southern Research Station and the Ozark and Ouachita National Forests in data collection and financial support for the study is much appreciated. I also appreciate the friendship of all graduate students in the Department of Forestry and Environmental Science graduate program at OSU. I cannot say enough in words how much love and help I have received from my wife Reena and son Saubhagya. Thanks for understanding me as a husband and as a father. I am thankful to my parents and brother Milan for their understanding of my study pursuit for such a long period of about four years. Our stay in Stillwater would not have been so comfortable, had there not been a small but coherent community of Nepalese friends and families. Thank you all for your time and help that we received. vi TABLE OF CONTENTS Chapter Page I. INTRODUCTION......................................................................................................1 Forest Growth and Yield Modeling .........................................................................3 Objectives ................................................................................................................6 II. REVIEW OF LITERATURE………………………………………………………7 Background..............................................................................................................7 Growth Data........................................................................................................7 Modeling Biological Growth ..............................................................................8 Types of Forest Growth Models ............................................................................13 DensityFree StandLevel Models ....................................................................13 VariableDensity StandLevel Models .............................................................13 Diameter Class Models .....................................................................................17 IndividualTree Models ....................................................................................18 DistanceDependent IndividualTree Model ...............................................19 DistanceIndependent IndividualTree Model.............................................21 UnevenAged Stand Modeling .........................................................................30 Forest Growth and Yield Simulation Systems.......................................................32 Mixed Models ........................................................................................................34 Fixed vs. RandomEffects.................................................................................34 Longitudinal Data/Repeated Measures .............................................................35 Correlated and/or Heterogeneous Errors ..........................................................38 Application of Mixed Models in Forestry .............................................................39 Estimation/Computational Methods ......................................................................49 III. DATA AND DATA MANAGEMENT.................................................................50 Study Area and Establishment of Plots..................................................................50 Measurements/Observations ..................................................................................54 Repeated Measurements ........................................................................................55 Data Management ..................................................................................................57 vii IV. METHODS FOR DATA ANALYSIS AND MODELING ..................................64 MixedEffects Models ...........................................................................................65 Linear Mixed Model ..........................................................................................67 Nonlinear Mixed Model.....................................................................................68 Extended Mixed Models ....................................................................................70 Estimation and Computing ....................................................................................71 Model Comparison Criteria ...................................................................................74 Error Modeling.......................................................................................................75 Correlated Errors.................................................................................................75 Spatial Correlation Structure for WithinGroup Errors ......................................76 Heterogeneous Errors..........................................................................................78 Model Checking.....................................................................................................80 An Approach to Mixed Model Development ........................................................80 Exploratory Data Analysis.....................................................................................82 Fitting the Growth Models.....................................................................................83 Basal Area Growth Model ..................................................................................83 Total Height Model.............................................................................................86 Fitting the Extended Models..................................................................................87 Calibration and Validation.....................................................................................90 Summary Statistics for Calibration Data Set ......................................................90 Summary Statistics for Validation Data Set .......................................................92 V. RESULTS ...............................................................................................................94 Basal Area Growth Model .....................................................................................94 Calibration Results..............................................................................................94 Extended Models with Period and Plot RandomEffects ...............................98 Extended Models with Power Variance Function ..........................................99 Extended Models with Spatial Correlation Function for WithinPlot Errors ...................................................................................104 Extended Models with Variance and Spatial Correlation Functions ............105 Validation Results.............................................................................................105 Residuals vs. Predicted Values .....................................................................106 Residuals vs. Tree Basal Area ......................................................................109 Residuals by Design Criteria ........................................................................111 Total Height Model..............................................................................................119 Calibration Results............................................................................................119 Validation Results.............................................................................................120 VI. DISCUSSION AND CONCLUSIONS...............................................................125 Basal Area Growth Model ...................................................................................125 Extended Model with Period and Plot RandomEffects ..................................128 viii Extended Model Accounting for Variance Heterogeneity...............................129 Extended Model with RandomEffects for Growth Period and Plots Accounting for Variance Heterogeneity .................................................130 Residual Analysis for Basal Area Growth Models..........................................131 Total Height Model..............................................................................................136 Residual Analysis for Height Model II............................................................137 Discussion............................................................................................................139 Conclusions..........................................................................................................145 LITERATURE CITED..............................................................................................147 APPENDIX I .............................................................................................................163 APPENDIX II ............................................................................................................167 ix LIST OF TABLES Table Page 2.1 Some selected individual tree growth and yield simulation models/systems available for the United States (Adapted from Davis et al. 2001) .................................................................................................33 3.1 Study design for naturally regenerated evenaged shortleaf plots installed from 1985 to 1987 with thinning and herbicide treatment in eastern Oklahoma and western Arkansas (Lynch et al. 1999)...............................................................................................52 3.2 Summary statistics for 208 plots (Studies 48 and 58 combined) for plot/stand level variables................................................................................61 3.3 Summary statistics for 208 plots (Studies 48 and 58 combined) for individual tree variables .................................................................................62 4.1 List of fitted basal area growth models................................................................89 4.2 Summary statistics of plot level variables for calibration data set with 139 plots.......................................................................................................91 4.3 Summary statistics of tree level variables for calibration data set with 139 plots.......................................................................................................91 4.4 Summary statistics of plot level variables for validation data set with 69 plots.........................................................................................................92 4.5 Summary statistics of tree level variables for validation data set with 69 plots.........................................................................................................93 5.1 Parameter estimates and other associated statistics for BAG Model I for calibration dataset (total degrees of freedom (df) =10268, residual standard deviation (SD) = 0.006839469, residual df = 10261).............................................................................................95 5.2 Summary statistics for fitted basal area growth models from calibration dataset ................................................................................................96 x 5.3 Parameter estimates and other associated statistics for BAG Model II for calibration dataset (number of plots = 139, residual df = 10124, and total observations = 10269)..........................................97 5.4 Fixedeffect parameter estimates and related statistics for BAG Model VI for calibration dataset (number of growth period=2, plots within period = 278, residual df = 9986, and total observations = 10269)..........................................................................................98 5.5 Parameter estimates and other associated statistics for BAG Model VII for calibration dataset (number of plots = 139, residual df = 10124, and total observations = 10269) .....................................................101 5.6 Parameter estimates and other associated statistics for BAG Model VIII for calibration dataset (number of plots = 139, residual df = 10124, and total observations = 10269)........................................102 5.7 95% confidence intervals for the parameters in BAG Model VIII....................103 5.8 Parameter estimates and other associated statistics for BAG Model X for calibration dataset (number of periods = 2, plots within period = 278, residual df = 9986, and total observations = 10269) ..............................103 5.9 Summary statistics for candidate basal area growth models using validation dataset ...............................................................................................106 5.10 Summary of results for Height Model I without randomeffects for plots for calibration dataset (total observations = 5968, residual df = 5964, residual variance=22.8269) ..............................................................119 5.11 Summary of results for Height Model II with randomeffects for plots for calibration dataset (total observations = 5968, group df = 138)..........120 6.1 Summary statistics for fitted basal area growth models using complete dataset (total observations = 15,669, and number of plots = 208)........................................................................................125 6.2 Parameter estimates and other associated statistics for BAG Model I for complete dataset (number of plots = 208, total observations=15,669, and residual df = 15,661).................................................126 6.3 Parameter estimates and other associated statistics for BAG Model II for complete dataset (number of plots = 208, total observations = 15,669, and residual df = 15,455)...............................................127 xi 6.4 Parameter estimates and other associated statistics for BAG Model VI for complete dataset (number of periods = 2, plots within period = 416, residual df = 15248, and total observations = 15,669)........................................................................................128 6.5 Parameter estimates and other associated statistics for BAG Model VII for complete dataset (number of plots = 208, total observations = 15,669, and residual df = 15,455)...............................................129 6.6 Parameter estimates and other associated statistics for BAG Model X for complete dataset (number of plots = 208, total observations = 15,669, and residual df = 15,248)...............................................130 6.7 Parameter estimates from SAS PROC NLIN for complete dataset (total observations = 8964, and residual variance = 21.5688) ............................136 6.8 Parameter estimates from PROC NLMIXED for complete dataset (total observations = 8964, and number of groups = 208)..................................136 xii LIST OF FIGURES Figure Page 1.1 General distribution of shortleaf pine the US .........................................................2 3.1 Shortleaf pine growth study locations in eastern Oklahoma and western Arkansas (Lynch et al. 1991)...................................................................51 3.2 A representation of 0.2 acres measurement plot with two 5milacre plots (USDA Forest Service 1995) .......................................................................53 3.3 Histogram of midDBH for the first growth period..............................................63 3.4 Histogram of midDBH for the second growth period .........................................63 5.1 Plot of residuals vs. tree basal area for BAG Model II.......................................100 5.2 Plot of standardized residuals vs. fitted values for BAG Model II.....................107 5.3 Plot of standardized residuals vs. fitted values for BAG Model VI ...................107 5.4 Plot of standardized residuals vs. fitted values for BAG Model VII ..................108 5.5 Plot of standardized residuals vs. fitted values for BAG Model X.....................108 5.6 Plot of standardized residuals vs. tree basal area for BAG Model II..................109 5.7 Plot of standardized residuals vs. tree basal area for BAG Model VI ................109 5.8 Plot of standardized residuals vs. tree basal area for BAG Model VII...............110 5.9 Plot of standardized residuals vs. tree basal area for BAG Model X .................110 5.10 Boxplots of standardized residuals by age class for BAG Model II.................111 5.11 Boxplots of standardized residuals by age class for BAG Model VI ...............112 5.12 Boxplots of standardized residuals by age class for BAG Model VII..............112 5.13 Boxplots of standardized residuals by age class for BAG Model X.................113 xiii 5.14 Boxplots of standardized residuals by site index class for BAG Model II.......113 5.15 Boxplots of standardized residuals by site index class for BAG Model VI .....114 5.16 Boxplots of standardized residuals by site index class for BAG Model VII....114 5.17 Boxplots of standardized residuals by site index class for BAG Model X.......115 5.18 Boxplots of standardized residuals by basal area class for BAG Model II ......116 5.19 Boxplots of standardized residuals by basal area class for BAG Model VI.....116 5.20 Boxplots of standardized residuals by basal area class for BAG Model VII....117 5.21 Boxplots of standardized residuals by basal area class for BAG Model X ......117 5.22 Plot of standardized residuals vs. predicted values for total height model.......121 5.23 Plot of standardized residuals vs. diameter at breast height for total height model........................................................................................121 5.24 Plot of standardized residuals vs. dominant height for total height model.......122 5.25 Boxplots of standardized residuals for Height Model II by age classes....................................................................................................122 5.26 Boxplots of standardized residuals for Height Model II by site index classes ..........................................................................................123 5.27 Boxplots of standardized residuals for Height Model II by stand basal area classes ................................................................................124 6.1 Standardized residuals vs. fitted values for BAG Model II (complete dataset) .............................................................................................131 6.2 Standardized residuals vs. fitted values for BAG Model VI (complete dataset) .............................................................................................132 6.3 Standardized residuals vs. fitted values for BAG Model VII (complete dataset) .............................................................................................132 6.4 Standardized residuals vs. fitted values for BAG Model X (complete dataset) .............................................................................................133 6.5 Boxplots of standardized residuals for BAG Model II (complete dataset) .............................................................................................133 xiv 6.6 Boxplots of standardized residuals for BAG Model VI (complete dataset) .............................................................................................134 6.7 Boxplots of standardized residuals for BAG Model VII (complete dataset) .............................................................................................135 6.8 Boxplots of standardized residuals for BAG Model X (complete dataset) .............................................................................................135 6.9 Plot of standardized residuals vs. predicted values for total height model from complete dataset .................................................................138 6.10 Plot of standardized residuals vs. DBH for total height model from complete dataset.......................................................................................138 1 CHAPTER I INTRODUCTION Forests have been managed since time immemorial for direct benefit of human kind, although some forests are kept untouched for several indirect (or even unknown) reasons. Forest management involves one or more than one objective such as timber production, biodiversity and water supply. With increasing understanding of basic sciences and increasing demand for resources, forest management is becoming more and more important (Davis et al. 2001). Progressive forest management requires a good understanding of how individual trees and stands grow over time under different environmental conditions. Therefore, forest growth and yield modeling is an integral part of forest management, at least in developed countries. Shortleaf pine (Pinus echinata Mill.) is second to loblolly pine (Pinus taeda L.) in terms of volume of the southern pines in the United States. It has the widest range of any southern yellow pine in the US. It grows in 22 states over more than 440,000 square miles (1,139,600 km²), ranging from southeastern New York to eastern Texas. Despite its importance, there has been relatively little research and management effort compared to other southern pines (Willet 1986). The overall distribution of shortleaf pine in the US is presented in Figure 1.1. 2 Figure 1.1. General distribution of shortleaf pine in the US (Source: Western North Carolina Nature Center, http://wildwnc.org/trees/Pinus_echinata.html March 30, 2006) Shortleaf pine is adapted to a range of geography, soils, topography and habitats; however, individual trees grow best on deep welldrained soils of the Upper Coastal Plain, and the most prominent shortleaf communities are found in the Ouachita Highlands (Guldin 1986). Natural stands of shortleaf pine occur from almost sea level to 3,300 feet (1,006 m). Favorable average temperature for growth ranges from 480 to 700F, with minimums of 220F and maximums of 1020F. A rainfall range of 40 to 55 inches per year is common in areas where shortleaf stands occur naturally (Williston and Balmer 1980). It is believed that the Ouachita Mountains of Arkansas and Oklahoma originally contained the largest shortleaf pine forest in the world. According to an estimate, shortleaf and shortleafhardwood stands must have covered approximately 5,000 square 3 miles, which is approximately 50% of the Ouachita Mountains, until the middle of the twentieth century (Smith 1986). Shortleaf pine is common on nonindustrial private ownerships in Oklahoma and Arkansas, but is important in industrial ownerships also. It is native to 20 counties of Oklahoma. Shortleaf pine is also the state tree of Arkansas (ODAFS 2000). It is a common observation that loblolly pine is preferred over shortleaf pine for commercial reasons due to higher growth rates at younger ages. However, shortleaf pine continues to be an important pine species in the south, especially in naturally regenerated forests (McWilliams et al. 1986). Forest Growth and Yield Modeling Forest resource management goals can be achieved only after proper assessment of available resources. Forest resource assessment is becoming increasingly quantitative, and more reliable estimate of resources scattered in a wider area can be made. The estimation procedure can be improved both in the field during data collection and in the office during data management and analysis. This can be achieved by following rigorous scientific techniques and methods. For resource assessment, standard methods in existence can be followed, although it is a subject of continuous research. With the advent of technologies in other areas (e.g. computing) research methods in forestry are also changing over time. Researchers are using techniques like remote sensing and geographic information systems for forest resource assessments at a larger scale. 4 We can recognize the following two extremes in forest growth modeling in terms of the scale of unit of sampling and observation. 1. Models based on individual plant level information for biological processes that attempt to generalize to growth models that could be used for economically important practical applications. These are generally called physiological process models. 2. Models based on large scale geographic information systems such as models utilizing satellite imagery data that attempt to estimate or predict stand or forest level information that could be used for practical applications. A complete issue (number 3) of Volume 49 of Forest Science (2003) has been devoted to research publication for this aspect, although a wide array of literature are available from elsewhere. Forest growth models in general have often been regressiontype statistical models utilizing site index, tree and stand measurements. These traditionally popular growth models often ignore environmental variables such as climatic factors. Such typical forest growth models can possibly be considered to fall somewhere in between the two extremes mentioned above. Forest growth data are typically longitudinal in nature resulting from repeated measurements in permanent plots or approximately longitudinal in nature resulting from crosssectional data from tree measurements in stands of different ages. Traditional forest growth and yield models have been used to predict the present as well as future states of forest conditions. It has been common to use data from temporary 5 plots, however permanent plot studies are becoming more popular. Modeling techniques are also being refined to make use of new developments in the areas of statistics and computing. Forest growth and yield models can be developed either for natural stands or for plantations. The models for natural stands could be either for evenaged or unevenaged stands. Similarly, the models for plantations could be either for thinned or unthinned stands (Clutter et al. 1983). According to Davis et al. (2001), forest growth and yield models can be classified as follows. (1) Whole stand models: (a) densityfree, and (b) variabledensity (2) Diameter class models (3) Individual tree models: (a) distancedependent, and (b) distanceindependent These models are reviewed, and a short description is presented in Chapter II (Review of Literature). This dissertation presents the work of analysis of shortleaf pine growth measurements data from eastern Oklahoma and western Arkansas. There have been considerable growth studies in the past on shortleaf pine including Murphy (1982, 1986), Lynch et al. (1991, 1999), Murphy et al. (1992), and Lynch and Murphy (1995). Moreover, this project attempts to make an improvement over the previous work and to revise the parameter estimates, especially on the work reported by Lynch et al. (1999). Previous growth model parameters for shortleaf pine have generally been fitted using ordinary least squares methods. Shortleaf pine individualtree models fitted by ordinary least squares have not accounted for plotlevel grouping of individual observations. Mixedeffects models can use random plot effects to account for grouping in this data structure. Thus, it is expected that better parameter estimates can be obtained using more 6 advanced statistical tools of mixed modeling. Furthermore, this work incorporates a third measurement of permanent plots that was not used by Lynch et al. (1999). It is expected that the resulting estimates could be used for better prediction of the shortleaf pine growth for Oklahoma and Arkansas areas. Objectives This dissertation concerns development of individualtree mixedeffects models for evenaged shortleaf pine. It improves on what has been done in the past on shortleaf pine growth and yield modeling. Specifically, the following are the objectives of this project: (1) Review the past work in shortleaf pine growth and yield modeling, and mixedeffects modeling work in forestry in general (2) Improve on past work of individualtree growth model for evenaged shortleaf pine by developing a nonlinear mixedeffects growth model with plotlevel randomeffects that takes into account possible spatially correlated and heterogeneous errors using a sample (calibration) of complete data set, and with minimal explanatory variables that are easy to measure in the field (3) Make further assessments of the model and verification using an independently selected (validation) data set (4) Carry out model checking and diagnostics, and arrive with final estimates from the complete data set (5) Write and present the work with recommendations in a format that is common for the scientific audience in the field of forestry 7 CHAPTER II REVIEW OF LITERATURE Background The general objective of this chapter is to review aspects of forest growth and yield modeling relevant to development of shortleaf pine growth models. Moreover, specific objectives are to review shortleaf pine modeling in general, and mixedeffects modeling in forestry. The following areas will be reviewed, but emphasis will remain in application of mixed models in forestry. • Forest growth and yield models in general • Shortleaf pine growth and yield modeling • Mixedeffects models in general • Application of mixedeffects models in forestry Growth Data In the terminology of Moser and Hall (1969), forestry data can be classified as: (1) real growth series, (2) abstract growth series, and (3) approximated real growth series. When a complete chronological data series from establishment (e.g. planting) to harvest for a tree or stand is available, then it is called a “real growth series.” On the other hand, 8 when data are available from different temporary plots representing different age groups and sites, then the data are called an “abstract growth series.” Ideally, one would prefer data from real growth series, but due to time and expense factors, real growth series data are not common. On the other hand, abstract growth series data may not be of high quality. Therefore, it is common to have an “approximated real growth series.” An approximated real growth series is in between real and abstract growth series in which remeasurements are utilized from several age groups and sites. In this method, data are collected from a wide range (in terms of age and site) of permanent plots, but a particular plot would not have data from a complete life history of trees. Data for this study are of approximated real growth series nature, which will be discussed in Chapter III. Repeated measurements from permanent plots have different sources of variation. The error component in a linear model from such measurements can be regarded to consist of plot, time, and residual random variations (Gregoire 1987). Modeling Biological Growth A model is a description of a system. A system can be defined as any collection of interrelated objects or units in which observations are made. Therefore, a description is a signal that can be interpreted by humans. That is, a system is anything humans wish to understand and models are one tool that facilitates the understanding (Haefner 1996). Mathematical models not only should fit the reallife data well, but also should provide some practically useful interpretation. Irrespective of the nature of the model, whether empirical or theoretical, the modeler should be careful in selecting explanatory variables to provide realistic and robust predictions (Vanclay 1994). Typically, nonlinear 9 regression models of biological growth should have parameters that can be interpreted in terms of biological significance. In growth modeling, one would be mostly interested at the rates of growth over time, and at the time point when growth is stopped. It is also important to be able to understand the process under which different factors (genetic and environmental) influence the growth rate. This information might help in practice to manipulate the growth artificially to some extent by altering the growth rate. As biological growth is achieved due to cell division, it is in theory considered an exponential process (Zeide 1989), which is constrained by several factors. Biological systems inherently have two common components of gain and loss in metabolism commonly called anabolism and catabolism. Such processes have been considered in modeling biological systems, such as early work by Von Bertalanffy (Pienaar and Turnbull 1973). According to Pienaar and Turnbull (1973), an allometric model can be presented as: P = cQa (1) where P = one dimension of an organism, e.g. length of the femur Q = another dimension of the same organism, e.g. width of the skull a = allometric constant, which characterizes the particular kind of organism and environment c = parameter depending on initial conditions 10 This model assumes that the specific growth rate of P is constant proportional to the specific growth rate of Q, i.e. ⎟ ⎟⎠ ⎞ ⎜ ⎜⎝ ⎛ = Qdt a dQ Pdt dP (2) Von Bertalanffy’s model can be written as (Pienaar and Turnbull 1973): Y Y dt dY =α 2 / 3 −γ (3) where Y = size of the organism or population t = time α, γ = constants (α>0, γ>0), and 2/3 is the allometric constant. The ChapmanRichards function is a flexible model which conceptualizes the growth rate of an organism or a population as a resultant of an anabolic growth rate, which is a positive term, and a catabolic growth rate, which is a negative term (Clutter et al. 1983). This function introduces additional parameter in Bertalanffy’s function replacing the 2/3 allometric constant. A standard ChapmanRichards function is written as: Y Y dt dY =α β −γ (4) 11 where Y = size of the organism or population t = time α, β, γ = constants (α>0, 0<β<1, γ>0) This model attempts to explain how an organism or population grows over time in presence of environmental stress or pressure that tends to reduce growth. The ChapmanRichards function is popular in forest growth and yield modeling, which is empirical in nature. The model is derived as a generalization of the Bertalanffy’s growth model (Pienaar and Turnbull 1973, Yang et al. 1978). According to Yang et al. (1978), Weibull function can be modified to model different biological growth processes due to flexible nature of the function. Zeide (1989) described growth equations as a combination of power and exponential functions by taking diameter growth as an example. He compared (tested) five different forms of diameter growth models (ChapmanRichards, Gompertz, Logistic, Power decline and Weibull) in integral form, and found that power decline model, which he derived, performed better in predicting growth for various species and site qualities. He presented an elementary power function called power decline 1 as: at b Ydt dY = − (5) where Y = a measure of plant size 12 t = plant age a, b = constants (a, b > 0) The parameters a and b are interpreted as the initial relative growth rate and the rate of aging, respectively. Many biological systems are modeled using standard regressiontype techniques as described in Draper and Smith (1998), and Neter et al. (1996). Nonlinear regression techniques are more popular than standard linear regression to model reallife biological data (Bates and Watts 1988, Ratkowsky 1983). Interpretability, parsimony and validity beyond the observed range of the data are the main reasons why nonlinear models are often more appropriate than linear models for biological data (Pinheiro and Bates 2000). Haefner (1996) is an example of how complex biological systems can be modeled through computer simulation, instead of using empirical methods. Simulation is a popular method in systems modeling. However, many forest growth and yield modeling exercises are not systems modeling, so statistical models supported by measurements are still popular methods. However, process modeling is also gaining popularity in forestry (Baldwin et al. 2001, Johnsen et al. 2001), especially in organizations where strong multidisciplinary teams of forest biometricians, computer scientists and physiologists exist. There has been considerable work in forest growth and yield modeling including southern pines, but there is much less work in shortleaf pine growth studies despite its relative importance. Murphy (1986) is a good source on shortleaf pine growth studies prior to 1986. Later work will be described in the following sections. 13 Types of Forest Growth Models DensityFree StandLevel Models These models are based on full stocking assumptions. Full stocking, often called “normal”, is the density thought to maximize standing tree volume. However, the concept of normal density is subjective, and it is rarely observed in practice. “Normal” yield tables are developed from temporary plot measurements. For example as cited by Rose (1998), Sylvester (1938) developed yield tables based on data from 240 plots of loblolly pine in Louisiana and Arkansas, and compared the results with the yield tables of Miscellaneous Publication 50 (USDA Forest Service 1929). It was concluded that the yield tables of Miscellaneous Publication 50 were not very correct. But the conclusions drew comments from others such as F.X. Schumacher (1939), again as cited by Rose (1998) that the normality concept was subjective, which led to the discrepancy. The yield tables developed from average density instead of maximum density are called empirical yield tables. Another example of densityfree yield table is the work of Schumacher and Coile (1960), which is quoted by Lynch et al. (1999) for shortleaf pine. Schumacher and Coile (1960) developed “normal” yield tables based on data from 74 temporary plots in the Piedmont region of North Carolina. VariableDensity StandLevel Models In this category of growth and yield models, stand density is used as an explicit explanatory variable. These are considered an improvement over densityfree models, as growth and yield information can be obtained at varying levels of stand density. 14 According to Clutter et al. (1983), Schumacher (1939) developed the following variabledensity model: ln( ) ( ) ( ) 2 3 1 0 1 s V = β + β A− + β f S + β g D (6) where V = per unit area volume/yield, A = stand age, f(S) = function of site index, g(Ds) = function of stand density, and β0, β1, β2, β3 are model parameters This framework can be used to fit a standlevel variabledensity model. Clutter (1963) developed a compatible growth and yield model for loblolly pine. Compatible growth and yield models are developed with mathematical properties that permit yield models to be derived by integration of growth models (Davis et al. 2001). Murphy and Beltz (1981) developed the first variabledensity models for natural evenaged shortleaf pine. They used permanent plot measurements from Arkansas, Louisiana, Oklahoma and Texas. Murphy (1982) used the same set of data and basal area equation to predict sawtimber volume. Lynch et al. (1991) developed variabledensity stand volume equations for natural evenaged shortleaf pine of eastern Oklahoma and western Arkansas. They used the Schumachertype yield model with data from 191 permanent plots. The plots were established by the USDA Forest Service and Oklahoma State University in the period 15 between 1985 and 1987. The models are available for estimating merchantable cubicfoot, sawtimber cubicfoot, and boardfoot volumes per acre for natural evenaged shortleaf pine. The volume equations have the following general form: 1 2 0 V = β Bβ Hβ (7) where V = volume per unit area, B = basal area per unit area, H = average total height of dominant and codominant trees, and β0, β1, β2 are model parameters This equation predicts current volume. Future volume can be predicted by first estimating the future basal area and total height, and then by using the projected basal area and total height in the volume equation. The volume growth prediction is obtained by using the basal area growth projection equation along with the stand volume equation. The future basal area growth per unit area can be projected as a function of stand density and age (Lynch et al. 1991). The following shortleaf pine site index relationship for Ouachita Mountains has been developed by Graney and Burkhart (1973): [ ][1 exp( ( ) )] 4 0 1 2 3 H = a + a SI − − a + a SI × A a (8) 16 where H = average total height (ft) of dominant and codominant trees, SI = site index (ft at base age 50 years), A = stand age (years), and a0, a1, a2, a3, a4 are model parameters Since the parameters in this equation are intrinsically nonlinear, they must be fitted by nonlinear regression methods. A numerical method as explained in Gerald and Wheatley (1994) must be used to find a solution for site index (SI). Similarly, some other site index (height prediction) equations have been suggested for southern pines, including shortleaf, by Farrar (1973). Alternative methods of site index estimation for shortleaf as well as other southern pines are mentioned in Miscellaneous Publication 50 (USDA Forest Service 1929), as cited by Lynch et al. (1999). Zeide (2002) also developed techniques for growth modeling for evenaged stands in which stand density (number of trees per unit area and their average diameter) is explicitly included in the model. The modeling framework combines two sets of processes; one that models the growth of trees and the other that makes adjustments to the growth process as a result of competition among trees. Stand density is an important variable that is part of tree dynamics in a stand, which can be used to manage the stands with a specific objective. 17 Diameter Class Models Diameter class models provide growth and yield information by specified diameter classes, so these are more detailed than wholestand models. Stand volume can be computed by summing the volumes over diameter classes. Diameter class models often use a probability density function to allocate individual trees to diameter classes. The Weibull distribution is the probability distribution most widely used to model diameter distributions, because it is a flexible distribution that can accommodate a range of shapes and scales of distribution. Bailey and Dell (1973) first used the Weibull distribution to model tree diameter distribution. They found it to be very useful in modeling diameter distribution, although other distributions such as beta and exponential functions have also been used. The Weibull probability density can be presented as: ⎥ ⎥⎦ ⎤ ⎢ ⎢⎣ ⎡ ⎟⎠ ⎞ ⎜⎝ ⎛ − − ⎟⎠ ⎞ ⎜⎝ = ⎛ − c− c b X a b X a b f (X ) c exp 1 (a ≤ X < ∞) (9) = 0, otherwise where X = random variable such as diameter at breast height a, b, and c are parameters of the distribution Parameter a is called the location parameter, which is nonnegative (a ≥ 0) for diameter distribution, although it can take any real value theoretically. Parameters b and c are respectively called scale and shape parameters, and must always be positive (b, c > 0). 18 The Weibull is a very flexible probability distribution. Certain other distributions can be represented as special cases of the Weibull. For example, if a=0 and c=1, it is actually an exponential distribution. These results are found in standard statistical references such as Mood et al. (1974) and Wackerly et al. (2002). Use of the Weibull distribution in forestry has been well documented by Clutter et al. (1983) and Bailey and Dell (1973). To use the Weibull distribution for diameter class modeling, the model parameters need to be estimated, which is typically achieved by using maximumlikelihood method. Then the number of trees in a stand can be multiplied by appropriate diameter class probability to assign number of trees in diameter classes accordingly. Smalley and Bailey (1974) used the Weibull distribution to develop a diameter class model for shortleaf pine plantations. They used tree measurements from 104 shortleaf pine plantation plots in Tennessee, Alabama and Georgia. IndividualTree Models Models which use individual trees as the basic unit to predict stand growth and yield are called individualtree models. Such models can be either distanceindependent or distancedependent, depending on whether or not the spatial arrangement of individual trees is taken into account. Distanceindependent models predict tree growth either individually or by size classes. Growth prediction is made as a function of present size and standlevel variables such as stand age, site index, and stand basal area (Avery and Burkhart 2002). Individual tree data are aggregated after the model grows each tree under individualtree model, whereas the stand level model aggregates individual tree data into 19 stand variables before applying the growth model (Davis et al. 2001). So growth rate estimates are on an individual tree basis and stand level basis, respectively. That is, no individual tree growth estimate can be made from a stand level model, whereas stand level growth can be estimated from an individualtree level model. Therefore, diameter class and stand level information can be derived from individualtree models. DistanceDependent IndividualTree Model This model takes the spatial relationship between trees into account using a coordinate system. Initial stand conditions are either userspecified or computergenerated, and the growth of each tree is simulated as a function of its attributes, the site quality, and a measure of competition from neighbors as per the assigned tree locations (coordinate system). Yield is estimated by summing the individualtree values (either from tree volume or a taper equation), and by adjusting for some expansion factors (Avery and Burkhart 2002). This modeling approach requires extensive computer programming. PTAEDA2 is an example of distancedependent individualtree model developed for loblolly pine. This simulation system has two main subsystems: (1) one dealing with the generation of an initial precompetitive stand, and (2) another with the growth and dynamics of that stand. Various management scenarios for competition, fertilization and thinning options have been incorporated in the system (Burkhart et al. 1987, Avery and Burkhart 2002). Different diameter growth and height growth models were evaluated by Martin and Ek (1984) for red pine plantations. They observed some benefit of incorporating a competition index in the model for diameter growth model, but not in 20 height growth model. Similarly, Tome and Burkhart (1989) compared different measures of competition for Eucalyptus plantations under a spacing study in Portugal. Spurr (1962) developed a method of quantifying competition as point density by using a sampling angle gauge (prism). Another measure of point density as the potentially available area was developed by Brown (1965), as cited by Rose (1998). Crown competition factor (CCF) as defined by Krajicek et al. (1961) is a popular competition index (Avery and Burkhart 2002). Bullock and Burkhart (2005) evaluated spatial dependence in juvenile loblolly pine plantations using a simultaneous autoregressive model, and they found a significant dependence among neighboring stems. Distancedependent models are based on the concept of spatial variation. The literature of spatial statistics is huge (e.g. Cressie 1993, Ripley 1981, Webster and Oliver 1990). Spatial statistical methods are popular in geography, geology and ecology, but have started to be used in other fields. Units that are physically close tend to be similar, and correlation between the units decreases as the distance between the units increases. However, the plants or animals too close together might also compete for resources. For example, Mead (1967) has developed a mathematical model for estimating interplant competition. This kind of interplant competition might lead more dissimilar plants that are physically close. Therefore, the utilization of concept of spatial variability or competition measures depends on purpose and the scale of consideration. In summary, distancedependent models are more expensive and computationally intensive than distanceindependent models. Distanceindependent models are popular among forest professionals (Tome and Burkhart 1989, Davis et al. 2001). 21 DistanceIndependent IndividualTree Model Distanceindependent models do not require data on spatial location for each tree. That is, they ignore actual spatial distribution of trees to estimate individual tree growth, and assume that the trees are uniformly distributed over space. Typically, a distanceindependent model consists of the following components (Avery and Burkhart 2002): (1) a model for tree diameter (or basal area) growth (2) a model for tree height growth (or heightdiameter relationship to predict height as diameter is easily measured) (3) a tree mortality estimate Combining the three model components into a simulation system would provide different output scenarios with different input or management variables such as thinning/density regimes and site quality. These outputs might be useful to a forest manager in decision making. Distanceindependent models have comparatively less data demand than distancedependent models. In distanceindependent modeling, measures of competition often relate individual tree size to average tree size or to the size distribution. Two most commonly used competition measures are: (1) ratio of the quadratic mean diameter to diameter of an individual tree at breast height, and (2) cumulative basal area of trees larger than the subject tree. Basal area growth is a common response variable modeled in practice, because it is less sensitive to competition indices than diameter growth (Bella 1971). West (1980) compared different models using both basal area growth and diameter growth as response variables, and found that basal area growth was more 22 strongly related with explanatory variables than diameter growth. Depending on the type of model, different stand level as well as individualtree level variables can be used as explanatory variables. For example, Lynch et al. (1999) used two stand level variables (stand age and stand basal area), and two tree level variables (ratio of quadratic mean diameter to DBH of individual tree and tree basal area) to model annual basal area growth of natural evenaged shortleaf pine in Oklahoma and Arkansas. Distanceindependent models can be either (1) composite models of tree growth as a function of tree measurements, site properties and stand characteristics or (2) potentialmodifier growth functions. The potentialmodifier function models tree growth as a theoretically possible maximum growth, which is expressed as a function of tree characteristics. The potential is then adjusted for the modifier, which is expressed as a function of a stand and tree characteristics including competition (Rose 1998). The Forest Vegetation Simulator (FVS), previously called PROGNOSIS, is a simulation framework for composite distanceindependent modeling (Davis et al. 2001, see also www.fs.fed.us/fmsc/fvs). The composite growth model is appropriate when it is difficult to obtain dominantage and site relationships for mixedspecies stands. In such cases, a potentialmodifier growth function is difficult to apply (Wykoff 1990). FVS can be used to simulate growth for either basal area or diameter from such models, and modeling could be either composite or potentialmodifier type in nature. The following composite model was fitted by Wykoff (1990) to inventory and regeneration study data sets for 11 conifer species in the northern Rocky Mountains. b SL b EL b EL b CR b CR b BL dbh b CCF dds b b dbh b dbh b SL ASP b SL ASP b SL 11 12 2 9 10 2 7 8 2 6 3 4 5 2 0 1 2 / ln( 1) ln( ) ln( ) [cos( )] [sin( )] + + + + + + + + = + + + + + (10) 23 where ln(dds) = natural logarithm of 10year periodic change in squared diameter (inches) dbh = tree diameter outside bark at breast height (in.) BL = total basal area in trees with larger dbh than the subject tree (ft2/ac) CR = ratio of live crown length to total tree height CCF = crown competition factor SL = average slope percent for the stand ASP = average aspect for the stand (radians) EL = average elevation for the stand (100 ft) bi = estimated regression coefficients; b0 dependent on habitat type and location, b2 dependent on location, and b12 dependent on habitat type This model incorporated the site characteristics (EL, SL and ASP for each combination of habitat and forest type), competition indices (CR, CCF and BL) and tree diameter. The model is specific to a habitat and forest type (location) as quantified by a separate intercept for each combination of habitat and forest type. The model was also validated using data based on permanentplot measurements. On the other hand, a potentialmodifier function is suitable when we can easily obtain dominantage and site relationships, for example evenaged stand of a single species, in which siteindex is often used. This is one of the reasons why evenaged natural shortleaf pine has previously been modeled based on this approach. There are two components in this framework, a potential and a modifier. The potential quantifies the 24 theoretical maximum growth (diameter or basal area) of a tree which is grown without any competition for resources, e.g. an opengrown tree. The modifier component makes adjustment to the potential in light of the competition factors that exist in the field, e.g. competition for light based on crown size, spacing etc. Since a potentialmodifier model has two components, parameter estimation can be achieved in one of the two ways: (1) fit potential function and then estimate modifier by fixing the potential to a constant, or (2) fit both potential and modifier functions simultaneously. The second approach may be a better one as its estimation procedure considers both the functions together, which is what happens in tree growth. Tree growth is dynamic interplay between a potential to grow and slowing down of the growth due to environmental restrictions. Shifley and Brand (1984) provide a good example of how a potential function can be constrained to follow a biological principle. They used a modified ChapmanRichards function (Pienaar and Turnbull 1973) to restrict the tree growth to a biologically possible total size. When a standard ChapmanRichards equation is set to zero, and solved for organism/tree size (to find the maximum as the ChapmanRichards equation is the first derivative of size with respect to time), the resulting tree size would be a maximum biologically potential size (Shifley and Brand 1984). Potential/Asymptote (A) = β λ α − ⎟⎠ ⎜ ⎞ ⎝ ⎛ 1 1 (11) 25 Any measure of tree size, such as basal area, could be used. The derivative form of ChapmanRichards function presents the variable in growth form, so if tree size is represented by basal area, then the response variable would be basal area growth. The modified ChapmanRichards model obtained by Shifley and Brand (1984) was of the following form: =αY β −αY / A(1−β ) dt dY (12) One needs to insert an appropriate value for A (e.g. maximum possible tree basal area) to revise the estimates of α and β. According to Shifley and Brand (1984), such a value for A can be obtained from the National Register of Big Trees (American Forestry Association 1982 as cited by Rose 1998). Murphy and Shelton (1996) also used a ChapmanRichards function for potential of the form Potential = [1 exp( )] 1 2 t β − β B (13) where Bt = average tree basal area during the growth period βi = parameters, i = 1,2 They experienced a severe convergence problem with the nonlinear leastsquares estimation. Some other examples of potential functions are found in Amateis et al. (1989) and Smith et al. (1992). Amateis et al. (1989) proposed the idea of estimating potential 26 growth from open grown trees without any competition effects. This idea motivated Smith et al. (1992) to estimate maximum potential growth for shortleaf, longleaf and loblolly pines. A modifier function takes the growing conditions into account to adjust the potential to estimate net growth. Several examples of modifier functions are available in forestry literature. One that is most relevant to shortleaf pine growth modeling is that of Hitch (1994). He considered two modifier functions: (1) A variant of Shifley’s (1987) functions for TWIGS and STEMS for the Central States, and (2) A modified logistic function developed by Murphy and Shelton (1996). The logistic function is a useful modifier, since it is restricted between 0 and 1, and can include several independent variables. The logistic modifier has a form: Modifier = 1 exp( ) 1 3 1 4 s 5 q + β B +β B +β D (14) where B1 = basal area in trees larger than or equal to the diameter of the subject tree Bs = stand basal area Dq = quadratic mean diameter βi = parameters, i = 3,4,5 One advantage of the logistic modifier function is that we can add more explanatory variables to improve the fit index of a model, but still it is within the range of 0 and 1. So a complete growth model would take the form 27 Predicted growth = Potential growth × Modifier That is, the Murphy and Shelton (1996) model would be: [ ] 1 exp( ) 1 exp( ) 3 1 4 5 1 2 s q t B B D B β β β β β + + + − (15) The model terms are defined as before. They used this equation to fit an individualtree basal area growth model for loblolly pine A distanceindependent individualtree model for shortleaf pine for the Ouachita Mountains was first developed by Hitch (1994) using the potentialmodifier function approach. He used a data set for natural evenaged shortleaf pine in Oklahoma and Arkansas from a cooperative study between OSU Department of Forestry and USDA Forest Service. Bitoki et al. (1997) used data from a study for unevenaged continuous forest inventory (CFI) plots for shortleaf pine from the same Oklahoma and Arkansas region to develop a basal area growth model. They also used a potentialmodifier function approach. They first estimated the potential growth separately by excluding a parameter from Hitch’s (1994) equation. Then the parameters in the modifier function were estimated by keeping the potential constant, so all parameters were not estimated simultaneously. A simultaneous estimation technique for estimating the parameters in a potentialmodifier form has been used by Murphy and Shelton (1996) for loblolly pine. Hasenauer et al. (1998) used simultaneous regression methods to analyze individual tree data for Norway spruce from Australia. They found strong crossequation 28 correlations, and the simultaneous estimation method was more efficient than separate estimation by ordinary least squares. Similar estimation method in a nonlinear system was examined for white spruce data by Huang and Titus (1999). An example of simultaneous estimation for heightdiameter model fitting is Omule and MacDonald (1991). Rose (1998), and Rose and Lynch (2001) used a seemingly unrelated regression technique to fit a basal area growth model for shortleaf pine as the technique would help model correlated errors. Lynch and Murphy (1995) also used seemingly unrelated regression to fit a heightdiameter model for natural evenaged shortleaf pine. Lynch and Murphy (1995) used the following general framework for modeling height growth in natural, evenaged shortleaf pine stands: ˆ ( , ˆ , , , ,β ) i i Di i Di Li h = f A H D S D (16) where i hˆ = predicted individual tree height at time i i A = stand age at time i Di Hˆ = predicted average height of dominants and codominants at time i (could be obtained from site index curves in evenaged stands) i D = individual tree dbh at time i Li D = dbh distribution location identifier at time i such as maximum dbh or quadratic mean dbh Di S = an expression of stand density at time i β = vector of parameters 29 Lynch and Murphy (1995) also provide a comprehensive review of models relating tree height to DBH and age (or time). The Hitch’s (1994) model was later slightly modified by Lynch et al. (1999) by including additional data available from later plot measurements. They developed a complete growth and yield prediction system including models for basal area growth, heightdiameter relationship, crown characteristics and mortality functions. The estimates are now being used in the shortleaf pine stand simulator, SLPSS (Huebschmann et al. 1998). They fitted the following model: ( ) i s i i b i M b i i b b B b A b R b B b B b B B G +ε + + + + + − = − 1 exp( ) / 3 4 5 6 7 1 1 1 2 2 (17) where Gi = annual basal area growth (ft2) of tree i Bi = basal area (ft2) of tree i A = stand age (years) Ri = ratio of quadratic mean stand diameter to the dbh of tree i Bs = stand basal area (ft2/ac) BM = 7.068384 ft2 (the maximum expected basal area for a shortleaf pine in managed stands) b1, b2, ….., b7 = parameters i ε = random error 30 Lynch et al. (1999) used the following nonlinear model for individual tree height given by Lynch and Murphy (1995): i ( D ) ( i ) i H − = β H − β1 −β D −β 3 +ε 0 2 ( 4.5) 4.5 exp (18) where i H = total height (ft) of tree i i D = dbh (in) of tree i D H = dominant height as per Graney and Burkhart (1973) 0 β , 1 β , 2 β , 3 β = model parameters i ε = random error Beaumont et al. (1999) evaluated the generalized method of moments for modeling dominant height through site index in black spruce. They found that the new method of simultaneous estimation to be better than traditional twoequation approach. In the traditional method, site index is first estimated as the average height of dominant trees at base age, and then dominant height is predicted as a function of age and site index as done by Graney and Burkhart (1973). UnevenAged Stand Modeling Several basic ideas of evenaged stand modeling are relevant to unevenaged stand modeling also. Furthermore, this dissertation deals with data from evenaged shortleaf stands only. Therefore, only a brief review indicating the differences is presented. 31 The most important difference in unevenaged stands modeling is that the concept of site index and stand age cannot be easily used since there are different ageclasses in these stands. And at least in principle, modeling techniques for unevenaged stands can also be applied to evenaged stands (Avery and Burkhart 2002). Key ideas on unevenaged stands modeling are presented in a classic work by Moser and Hall (1969), and Moser (1972). They developed a modeling framework in which yield is expressed as a differential function of elapsed time from a given initial condition. A Markov Chain approach for predicting diameter distributions in unevenaged stands is provided by Bruner and Moser (1973). In such a framework, future diameter distributions, number of surviving trees, number of dead trees, and number of harvested trees are predicted using inventory data under a Markov process, which is a stochastic process for modeling an uncertain event. Lynch and Moser (1986) developed a technique of predicting stand tables for two species groups based on a differential equations approach. A matrix model approach to modeling unevenaged forest management has been reported by Buongiorno and Michie (1980). In this approach, a matrix model of forest growth similar to the Markov model of Brunner and Moser (1973) is combined with linear programming techniques to answer some economic questions. Following the idea of Buongiorno and Michie (1980), Schulte and Buongiorno (2004) developed a growth and yield model for naturallyregenerated shortleaf pine forests including hardwoods from southern US. Their work was based on repeated measurements on forestry inventory and analysis plots. According to Smith (1986), industrial forestry organizations and USDA Forest Service began a basic shift from uneven to evenaged stands in about 1970. Many 32 commercial activities involve evenaged stand management, but some argue against the ideas of evenaged management because of ecological and environmental reasons. Murphy and Farrar (1988) developed a standlevel growth and yield model for unevenaged loblollyshortleaf pine using inventory data. Guldin and Baker (1988) carried out yield comparisons for both evenaged and unevenaged loblollyshortleaf pine stands. They found that total merchantable cubicfoot yields were highest for conventionally managed evenaged plantations. AttaBoateng and Moser (2000) developed a compatible growth and yield modeling system for managed mixed forests of the tropical rain forest region. An individualtree growth and yield model for unevenaged shortleaf pine stands was developed by Huebschmann et al. (2000). Forest Growth and Yield Simulation Systems Different models/systems have been reported in the literature for simulating forest growth and yield. The USDA Forest Service has developed a comprehensive system of vegetation simulation called Forest Vegetation Simulator (FVS). The FVS system simulates vegetation change based on forest inventory data or stand examination data about the forest, stand, and trees (USDA 2002). A brief summary of some simulation systems is presented in Table 2.1. 33 Table 2.1. Some selected individual tree growth and yield simulation models/systems available for the United States (Adapted from Davis et al. 2001) Simulation System Description The Forest Vegetation Simulator (FVS) PTAEDA2 ORGANON CRYPTOS, CACTOS SLPSS Previously called PROGNOSIS, developed by USDA Forest Service for nationally supported framework for forest growth and yield modeling, more than 20 variants nationwide, many models, developed by universities and research institutions, such as TWIGS, STEMS, or GENGEM are now part of FVS (www.fs.fed.us/fmsc/fvs) Distancedependent system capable of modeling the growth of loblolly pine plantations, has also been linked with MAESTRO, a process model, developed and supported by Virginia Tech Developed for western Oregon conifer and hardwood types, developed through a public and industry cooperative and is supported by Oregon State University CRYPTOS developed for the California coastal redwood and fir forests, and CACTOS covers the California mixed conifer forests, both developed through a public and industry cooperative and are supported by University of California Berkeley Shortleaf Pine Stand Simulator, which simulates the growth of natural shortleaf pine stands in western Arkansas and eastern Oklahoma, developed from a cooperative study by Oklahoma State University (OSU), the US Forest Service Southern Station, and the Ouachita and Ozark National Forests, supported by the Department of Forestry at OSU. The following sections present a review of mixed modeling techniques that form the foundation of this dissertation work. 34 Mixed Models Fixed vs. RandomEffects When we are interested only in the levels or classes observed in the study sample, then the estimates are fixedeffects. On the other hand, if the classes or levels observed are a sample from a population, and we are not primarily interested in the particular levels observed, then this prompts the randomeffects approach. In this approach, we are interested in the variability pattern in the population itself from which the sample was observed (Snedecor and Cochran 1980, Laird and Ware 1982). For example, when a forest growth model is developed from a fixedeffects approach for the stands observed only, then the model is fixedeffects model, and the results are applicable only to the stands observed in the sample. If the forest growth model needs to be generalized to a forest from which a sample of stands was observed, plot randomeffects should be considered. Basically, we are not interested in the particular stands. Therefore, a growth model in which stands/plots have randomeffects would be desirable. However, many traditional/regular forest growth models have been based on a fixedeffects approach. Observations classified/grouped/clustered by a certain categorical variable are often analyzed by using analysis of variance (ANOVA) and analysis of covariance (ANCOVA) techniques. However, these techniques are relevant only when the number of categories is relatively small, and we are interested in comparisons of the observed categories only. This represents a case of fixedeffects. In many cases, we are interested in considering the categories as a representative random sample from a population. A fixedeffects analysis would not be appropriate in such cases, instead a randomeffects 35 approach should be used that estimates the variance components for different levels of a population (Longford 1993). In random coefficient models, the coefficients for each class (intercept and slope for each combination of different levels) quantify deviation from a defined population regression model (Tao 2002). A mixed model is an extension of a randomcoefficient regression model in which fixedeffect parameters are also included. In other words, when a model has both fixed and randomeffects, then the model is called a mixed model. A mixed model typically has more than one error level, so such a model is also called a multilevel model. If classification factors form a hierarchy (nested structure), then the model is also called a hierarchical model. Gumpertz and Pantula (1989) provide a simple way of making inference based on mean of the individual coefficients in random coefficient regression models. They interestingly contrast the longitudinal data in biological science studies with those of economic studies data. Their appreciation of agricultural and biomedical studies with a small number of repeated measures on large number of experimental units/subjects is important in the context of our project. This is different in nature compared to economic and meteorological data in which there are often multiple measurements/observations for a long period of time (time series) on relatively small number of subjects. The mathematical details for mixedmodeling will be addressed in Chapter IV. Longitudinal Data/Repeated Measures Data resulting from repeated observations/measurements on the same unit/subject over time are often called longitudinal data. Such data are common in practice, especially 36 in medicine, biology and economics (Diggle et al. 1994, Gregoire et al. 1995, Davis 2002). Mixed models are an increasingly popular method for analysis of longitudinal data because of their flexibility in handling different data structures (irregularity and unbalancedness) and in satisfying model assumptions. There are other alternative analysis approaches such as a multivariate ANOVA (MANOVA), which are not as flexible and powerful as the mixed modeling approach. MANOVA can not handle irregular and unbalanced data properly, and unbalanced data are very common in practice. Another alternative is to follow a splitplot in time approach (Kuehl 2000). The splitplot approach to repeated measures analysis would assume that observations/treatments within a unit are randomized. That is, different observations over time within the same subject/unit would be assumed random, which is not a very realistic assumption (Diggle et al. 1994). Instead, we would prefer to model withinunit correlation (serial correlation in repeated measurements, and spatial correlation among trees within a plot). Therefore a model that recognizes a proper covariance structure for errors would be a better approach to analyze such data (Gregoire et al. 1995). Laird and Ware (1982) and Ware (1985) developed a randomeffects model approach for analyzing longitudinal data that would model serial correlation on the same subject. A twolevel model was developed in which grouping of observations would be made by subjects. Such models could be fitted using either maximum likelihood or residual maximum likelihood (empirical Bayes) methods. Linear mixedeffects models for longitudinal data are described by Chi and Reinsel (1989), and Verbeke and Molenberghs (2000). They assessed maximum likelihood and restricted maximum likelihood methods of estimation. They also used a firstorder autoregressive model to 37 account for withinindividual errors resulting from longitudinal measurements on the same individuals. Crowder and Hand (1990) is a standard reference for repeated measures analysis, including responses of categorical nature. They deal with standard splitplot analysis, MANOVA and antedependence analysis to account for consecutive measurements on the same unit. Diggle et al. (1994) described principles and methods for analysis of longitudinal data. They dealt with both quantitative as well as qualitative data types, and provided a significant amount of information on randomeffects models. Major theoretical and computational contributions to the nonlinear mixedmodeling approach were made by Vonesh and Carter (1992), and Davidian and Giltinan (1995). Similarly, researchers such as Lindstrom and Bates (1990), and Pinheiro and Bates (2000) developed algorithms and software that allowed practical usage of the models. Reiczigel (1999) reviewed methods for analyzing repeated measurements from designed experiments in which the main objective is to compare treatments as precisely as possible. VanLeeuwen et al. (1996) presented a mixed model that incorporated random trends through time, and also allowed correlations among observations at the same time. Pinheiro and Bates (2000) developed the nlme library in SPlus that has made a substantial impact, especially on use of nonlinear mixed models in practice. They followed the approach of Laird and Ware (1982) to develop these computational tools. SAS PROC NLMIXED has been considered limited for nonlinear mixed modeling (Tao 2002, Littell et al. 1996), although a randomeffect with limited options for covariance structure can be used. Littell (2002) compared analysis of variance (ANOVA) versus residual maximum likelihood (REML)/generalized least squares (GLS) methods for 38 analyzing unbalanced mixed model data, and found that ANOVA was popular method prior to the early 1990s because of its simplicity and lack of easily available computing tools for likelihood based methods. However, more powerful likelihood based methods such as REML and GLS have started gaining popularity due to the increasing availability of software since the early 1990s. Davidian and Giltinan (2003) provide an overview of nonlinear models for repeated measurements data. Correlated and/or Heterogeneous Errors Correlation among successive observations over time on the same unit is discussed in a framework of time series analysis by Diggle (1990). Similarly, spatial correlation among observations is dealt with by Cressie (1993), Ripley (1981), and Webster and Oliver (1990). However, these references do not work in the context of mixed modeling. Both inter and intraindividual variations in nonlinear mixed modeling for repeatedly measured observations are discussed and reviewed by Davidian and Giltinan (1995, 2003). Wolfinger (1996) reviewed and evaluated different covariance structures for modeling heterogeneity in repeated measures data. The variancecovariance structures included heterogeneous versions of the compound symmetry and firstorder autoregressive structures, the HuynhFeldt structure, the independentincrements structure, correlated random coefficients models, the firstorder antedependence model, and a simplified factoranalytic construction. Use of an appropriate variancecovariance structure would avoid data transformation allowing parameter interpretability and more accurate inference. Lin et al. (1997) discuss linear mixed models with heterogeneous 39 withincluster variances. Methods are shown to predict clusterspecific random effects variances. Application of Mixed Models in Forestry Ferguson and Leech (1978) and West et al. (1984) were among the earlier workers to recognize the problem of temporal correlation in forestry data due to multiple measurements from individual sampling units. They mentioned the difficulty in regression analysis with regular assumptions for such data, and have discussed and recommended alternative approaches for data analysis. They discussed the availability of statistical theory to solve these problems. However, these discussions were mostly from a designed experiments perspective, so their direct application was not appropriate to many forest growth modeling datasets. Ferguson and Leech (1978) developed twostage yield models. In the first stage, they used ordinary least squares, whereas a generalized least squares method was used in the second stage to improve the estimation since the errors in the second stage violated the assumptions of ordinary least squares. West et al. (1984) also found a twostage regression model to be the most suitable method to analyze repeated measurements data common in forestry problems. They demonstrated the problem with a study on Eucalyptus in which plot was considered a sampling unit. In this study, a sample of trees was selected from each plot, and measurements such as diameter at breast height were taken. With motivation from the work of Ferguson and Leech (1978), Gregoire (1987) evaluated four different covariance structures for basal area yield models. He compared 40 the following covariance structures: (a) uncorrelated and homoscedastic plot and time effects; (b) autoregressive time effects; (c) uncorrelated and heteroscedastic plot effects and autoregressive time effects; and (d) correlated and heteroscedastic plot effects and autoregressive time effects. It was found that method of ordinary least squares was nearly always best; so further work was recommended. A random stand and tree parameters approach was used to model height in slash pine (Pinus eilliottii Engelm.) by Lappi and Bailey (1988). This was presented as a good alternative to the conventional site index approach to height prediction. They fitted the following model: hki(t) = μ(t) + bk(t) + eki(t) (19) where hki(t) = dominant height for tree i in stand k at age t μ(t) = population mean height at age t over all stands bk(t) = random stand effect at age t eki(t) = random tree effect and is uncorrelated with bk(t) They used the well known Richards’ (1959) equation to model the mean height, and then estimated the parameters of the covariance structure using observed residuals as if they were the true residuals. Gregoire et al. (1995) is an excellent reference on mixed modeling issues in forestry. They deal mostly with linear mixed models, but discuss the issues of importance of nonlinear mixed models in forestry research. They have fitted standlevel mixed 41 effects models with two data sets consisting of eastern white pine and Douglasfir with a mixed model approach having plot randomeffects. Due to the availability of repeated measurements, randomeffects for stands were fitted even though the response was a stand level variable. Errors that are temporally and/or spatially correlated have also been modeled. This project will follow basic ideas from this publication. Lack of easily available and userfriendly software was mentioned by Gregoire et al. (1995) to be an important reason that there was still not much application of nonlinear mixedeffects models in forestry. Gregoire and Schabenberger (1996a) used a nonlinear mixedeffects approach for modeling individualtree cumulative bole volume of sweetgum from east Texas. They later modeled cumulative bole volume by taking spatial correlation between sections of a bole into account (Gregoire and Schabenberger 1996b). The objective was to express cumulative bole volume as a smooth curve while allowing for fluctuations within and among trees as much as possible, yet allowing the curve’s applicability to all trees of the species with similar morphological characteristics. They used a nonlinear mixedeffects model by comparing restricted maximum likelihood (REML) and generalized estimating equations (GEE), and found that GEE to be simpler (computationally less intensive) than REML. REML is a normaldistribution based estimation method, whereas GEE is a semiparametric approach for multiple measurements per subject (repeated measurements). They used SAS PROC IML and MACRO for the modeling work. Candy (1997) estimated parameters in a sigmoidal forest yield model using composite link functions with random plot effects in a generalized linear mixedeffects 42 model framework. A generalized linear mixedeffects model can be considered as a special case of nonlinear mixedeffects models (Tao 2002). Lappi (1997) analyzed two jack pine data sets, one from plantations and the other from naturally regenerated stands. Both the data sets were longitudinal in nature, and they were analyzed using the same model structure. The height/diameter curve parameters were partitioned into an agedependent trend (population mean), a random stand effect, and a random time effect. A good practical use of such a model would be calibration with which the random stand and time effects could be predicted given some additional measurements without needing to make detailed observations as would normally be done for new stands. Tasissa and Burkhart (1998) used stem analysis data from permanent plots for loblolly pine to evaluate thinning effects on form exponent, a measure of stem form. They modeled form exponent in response to thinning and by accounting for correlation among withintree observations with the firstorder autoregressive method. Apiolaza et al. (2000) used variance components estimation to analyze genetic data for Pinus radiata from a progeny test. Fang and Bailey (2001) modeled dominant height growth of slash pine from a study with several silvicultural treatments installed in Georgia and north Florida. They reparameterized the threeparameter Richards model to predict dominant height growth in presence of silvicultural treatments, such as chopping, fertilization and burning, using a nonlinear mixedeffects model approach. The study design was a splitplot, with soil type as mainplot factor and silvicultural treatments as subplot factor. They claim that this would be a better approach to predict dominant height growth rather than using the site index approach. Fang et al. (2001) used a simultaneous equation system to develop stand 43 level mixed models for growth and yield of slash pine with two components: basal area model and total volume model. Another important example of mixed modeling in forestry is Hall and Bailey (2001). They modeled forest growth variables using multilevel nonlinear mixed models with data from a loblolly pine spacing study in Georgia. They used a linearization approach of estimating equation rather than maximum likelihood or restricted maximum likelihood method. They included randomeffects for both plots and trees. They found that the multilevel nonlinear mixed model approach provided several advantages over traditional forest growth modeling methods. Guilley et al. (2004) modeled averaged ring density with individual tree random effects for sessile oak in France. It was concluded that there was no evident effect on wood density due to changing environment and forest management type when ring width and cambial age were constant. Garber and Maguire (2003), and Leites and Robinson (2004) have applied the mixed modeling method to develop taper equations. A nonlinear mixed model with autoregressive error structures was used to model stem taper of ponderosa pine (Pinus ponderosa Dougl. ex Laws.), lodgepole pine (Pinus contorta Dougl. ex Loud.), and grand fir (Abies grandis Dougl. ex D. Don) by Garber and Maguire (2003). Leites and Robinson (2004) improved the Max and Burkhart’s (1976) taper equation with crown dimensions and individual tree randomeffects for loblolly pine. Mehtätalo (2004) used a mixed model with height and diameter data of longitudinal nature for Norway spruce (Picea abies (L.) Karst.). The Korf growth curve was used as a basic growth function for the relationship. The model could be calibrated for a new stand with substantially less observations than one would require for fitting a 44 completely new model. It was concluded that the growth pattern of a stand was dependent on mean tree size in the stand but not on stand age. This may or may not be true in other species. Zhang and Borders (2004) used a mixedmodel to estimate tree component biomass for managed loblolly pines with an allometric approach. They found that the percent of stem biomass increased with age while the opposite was the case for foliage and branches. It was also found that cultural treatments affected the proportional allocation among various tree compartments. Calama and Montero (2004) used a mixed model approach to model individualtree heightdiameter relationship for stone pine (Pinus pinea L.) in Spain. Lynch et al. (2005) used a randomparameter (mixed model) approach to analyze heightDBH data for cherrybark oak (Quercus pagoda Raf.) from east Texas. They fitted a similar model as reported by Lappi (1991). That is, natural logarithm of difference between total height and breast height was modeled. They fitted a linear model with randomeffects for stands, which could be used for calibration with minimum observations from a new stand. The fitted model is as follows: ki ki k k ki ki H − bh = + D− + + D−0.9 + e 0 1 0.9 0 1 ln( ) β β α α (20) where ki H = total height of tree i in stand k bh = breast height ki D = dbh of tree i in stand k 0 β , 1 β = fixed effect parameters 45 0k α , k 1 α = random effect parameters with 0 mean for stand k ki e = random residual error for tree i in stand k They also present an example of how this model could be calibrated for new stands. Uzoh and Oliver (2006) used a composite approach (as described by Wykoff 1990) for height increment modeling for managed evenaged stands of ponderosa pine (Pinus ponderosa Dougl.). They used permanentplot measurements from the western US. Random effects for locations, plots and trees were used in the model, and an autoregressive covariance structure was used to model the repeated measurements. They used a linear form of mixed model after transforming the periodic annual height increment to a logarithmic scale. They selected the following model: E[ln(PAIH)] = b0 + b1 ln(DBH) + b2(DBH)2 + b3SIM + b4SL[cos(ASP)] + b5ELEVA+ b6SDI + b7BAL + ( ) ( ) ˆ l j l ik jl h + e + e (21) where E[ln(PAIH)] = expectation of the natural logarithm of 5year periodic annual height increment (m) DBH = initial diameter at breast height (cm) SIM = Meyer’s site index (m) SL = average slope for the stand (%) ASP = average aspect for the stand (radians) ELEVA = elevation for the stand (m) SDI = stand density index (trees/ha) 46 BAL = basal area in larger trees (m2/ha) divided by dbh of subject tree l hˆ = fixedeffect of the lth location j(l ) e = random error for plot j within location l (assumed to have mean 0 and variance 2 l σ ) ik ( jl ) e = random error for measurement k on tree i within plot j and location l (assumed to have mean 0 and varianceσ 2 ) bi = parameter estimates They found that the height increment model displayed a unimodal and positively skewed shape for tree growth process, which was as per expectation. Site index (SIM) was found to have more effect on height growth than other variables. Vázquez and Pereira (2005) used a linear mixed model to explain variation in cork weight of cork oak (Quercus suber L.) from Portugal. They decomposed the random effects in regional, plot and treelevel that would allow better estimates for fixed effects. Budhathoki et al. (2005) also used a linear mixed model approach for preliminary analysis of basal area growth of shortleaf pine using a portion of data reported by Lynch et al. (1999). The natural logarithm of annual basal area growth was modeled with a dataset consisting of plot measurements at three points in time. In the analysis, they used a compound symmetry covariance structure and plot randomeffects. Their fixedeffects parameter estimates were similar in magnitude to those reported by Lynch et al. (1999), but the standard errors were reduced as a result of using mixed model. The linear mixed model approach discussed above for Lynch et al. (2005), Uzoh and Oliver (2005), Budhathoki et al. (2005), and others was partly for computational 47 convenience that would be available in SAS PROC MIXED, although the inherent nature of such responses would be nonlinear. With the increasing access to userfriendly software such as SPlus nlme library (Pinheiro and Bates 2000), nonlinear mixedmodeling is gaining popularity. Hall and Bailey (2001) and Fang and Bailey (2001) as indicated above also used nonlinear mixed modeling. A nonlinear mixedeffects model was developed for height growth for Eucalyptus plantations in Brazil by Calegario et al. (2005). They used the nlme library available in SPlus to model the dominant height. They modeled the dominant height as a logistic function of age and with plot randomeffects. The fitted logistic function was: [ ] ij ij i i i ij a HD ε φ φ φ + + − − = 2 3 1 1 exp ( ) / (22) where HDij = dominant height (m) for ith plot on time j aij = age (years) for ith plot on time j εij = random error; εij ~ N (0, σ2) Фi = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ i i i 3 2 1 φ φ φ = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ 3 2 1 β β β + ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ i i i b b b 3 2 1 = β + bi (23) bi ~ N (0, ψ) 48 They used repeated measurements from 115 permanent plots measured three to 10 times between 1992 and 2001. The three parameter logistic function along with plot randomeffect was found to fit the data well. A multilevel nonlinear mixedeffects model was fitted for modeling stand volume growth on loblolly pine, and for modeling four silvicultural treatments by Zhao et al. (2005). They found that vegetation control and fertilization resulted in the largest volume growth. The nonlinear mixed model approach was reported to be useful to handle the unbalanced and incomplete repeated measurements. A similar methodology was used by Jordan et al. (2005) to analyze earlywood and latewood microfibril angle data on loblolly pine. Rose et al. (2006) used a multilevel approach to estimate individual tree survival on loblolly pine using complementary loglog link function. They estimated randomeffects at both plot and tree level, and compared effects of four cultural treatments on tree survival. Overall, if we fit a model for each plot separately, it would often result in overparameterization, whereas ignoring the grouping by plot we would lose much of the variability. A mixed model provides a balance between these two extremes (Pinheiro and Bates 2000, Calegario et al. 2005). This review shows that mixed models have been increasingly used in forest growth and yield modeling. However, no peerreviewed published work on shortleaf pine growth modeling applying mixed models has yet been done. This provides an opportunity to apply this new technique in shortleaf pine, and to determine whether there is an advantage over the modeling techniques that are currently in use. 49 Estimation/Computational Methods Corbeil and Searle (1976) describe the foundation of estimation with restricted or residual maximum likelihood (REML) procedure for mixed modeling. They also developed an algorithm to implement a mixed model. Although mixed modeling literature in statistics existed earlier than Corbeil and Searle (1976) (for example they cite Hartley and Rao (1967)), application in forestry appears to have begun only in late 1980’s and early 1990’s. Generalized estimating equations (GEE) were used by Gregoire and Schabenberger (1996b), and Hall and Bailey (2001). In SPlus nlme library, the default estimation method for nonlinear mixedeffects modeling is maximum likelihood (ML), whereas the default method for linear mixedeffects modeling is REML. The library enables to fit mixed models with correlated and/or heterogeneous errors (Pinheiro and Bates 2000). 50 CHAPTER III DATA AND DATA MANAGEMENT This chapter describes the data used for this dissertation. The details on plot establishment, sampling, tree measurements, data management and summary statistics for important variables are presented. Study Area and Establishment of Plots The Department of Forestry at Oklahoma State University cooperated with the USDA Forest Service Southern Research Station, and the Ouachita and Ozark National Forests to establish the study plots for shortleaf pine. The objective was to study growth and yield of shortleaf pine in evenaged natural stands. The study was called Study 48. The plots were established from fall 1985 to fall 1987 in eastern Oklahoma and western Arkansas. The study area is presented in Figure 3.1. To quantify the species distribution within a stand in terms of proportionate coverage of shortleaf pine and other species, basal area was assessed using a 10factor prism. Only five shortleaf trees that were classified either dominant or codominant as per the definition of Avery and Burkhart (2002) were selected for height and age 51 determination. This information was used to adopt the design criteria of agedensitysite index combinations as given in Table 3.1 for selecting plots (Lynch et al. 1999). Figure 3.1. Shortleaf pine growth study locations in eastern Oklahoma and western Arkansas (Lynch et al. 1991) The original plots were established based on the design criteria, and the following stand properties (Rose 1998): (1) naturally regenerated stand with at least 70% shortleaf pine basal area with trees having DBH 0.6 inches and larger, 52 (2) stand with dominant and codominant trees having maximum age range of 10 years or less, (3) stand having site index variation of less than 10 feet, (4) evenaged stand with no clumping and no more than two age classes per plot, and (5) reasonably free from insect, disease or fire damage and no harvesting in the last five years. Table 3.1. Study design for naturally regenerated evenaged shortleaf plots installed from 1985 to 1987 with thinning and herbicide treatment in eastern Oklahoma and western Arkansas (Lynch et al. 1999) Variable Class range Class midpoint Basal area (ft2/ac) 1645 4675 76105 106135 30 60 90 120 Site index (ft at base age 50) <56 5665 6675 >75 50 60 70 80 Stand age (yr) <31 3150 5170 7190 20 40 60 80 Circular fixedarea plots of size 0.2 acres (52.7foot radius) were established. The plots were surrounded by a 33foot isolation boundary, which was painted white. A buffer boundary of 33foot was marked that surrounded each contiguous group of plots. The buffer boundary was painted blue. The study plots were selected using aerial photographs. A typical net plot with two 5milacre subplots is given in Figure 3.2. The 5milacre subplots were used for understory measurements. Plot centers were posted 53 with an 18inch orange plastic surveyor’s stake or steel reinforcing rod. Each plot number was permanently marked on the center stake. During establishment, each plot was thinned to a predetermined basal area as per the study design. Hardwood control was achieved by using a chemical herbicide. The buffer boundary also received the same treatments for thinning and hardwood control as the study plot. A total of 192 plots were planned with three plots in each combination of design variables (basal area, site index and age), but only 191 were established. Of the established plots, eight were either damaged due to windstorms or not thinned as per the design, so only 183 plots and 7740 trees were available for analysis for the first growth period from Study 48 (Lynch et al. 1999). Figure 3.2. A representation of 0.2 acres measurement plot with two 5milacre plots (USDA Forest Service 1995) 54 Study 58 was established in 1963–1964 by Frank Freese to study thinning effects in naturally occurring shortleaf pine stands. Study 58 was modified in 1988 to conform to the same design criteria used for Study 48 (Lynch et al. 1999). Additional plots from Study 58 were also available for this dissertation. Only the relevant data for the two growth periods from 1985 to 1996 have been used, although more data exist prior to 1985 for Study 58. Although Freese’s study was established with 35 plots, only 25 plots remained after 1987. Study 58 has only 664 trees available for analysis. Therefore, measurements for 208 plots and 8404 trees from the two studies through three time points are available for the purpose of this dissertation. Measurements/Observations Initial tree measurements were taken at the time of plot establishment. The azimuth and distance from plot center were recorded for each tree in a plot. Each tree was also labeled with a tree number and diameter measurement mark. Measurements of diameter at breast height and assignment of tree crown class were made on each tree. Tree diameter was measured at breast height (4.5 feet) to nearest 0.1inch for each tree in the plot. Tree status and crown characteristics were also measured on all the trees. Crown class was recorded as (1) Dominant, (2) Codominant, (3) Intermediate, or (4) Suppressed following the definitions of Avery and Burkhart (2002). Trees with crowns extending above the general level of the crowns and being in a position to receive full light from above and partially from sides were classified as dominants. The trees classified as codominants were at the general crown level, which were in a position to receive full 55 light from above but little from the sides. Trees that were shorter than dominants and codominants receiving little direct light from above and none from the sides were classified as intermediates. Trees classified as suppressed were completely below the general level of the crown receiving no direct light. Total tree height, crown height and crown length were recorded to nearest foot for representative dominant and codominant trees. Ring count data of the representative dominant and codominant trees was recorded to determine tree age. The ring count data were collected by means of increment cores taken at 4.0 feet height on the uphill side of the tree for the sample trees. Five years were added to annual ring count to determine the total age in years. At least four trees were measured for total height and crown height from plots having the biggest trees, whereas a maximum of 39 height measurement trees were selected from plots having the smallest trees. Repeated Measurements Second measurements (first remeasurements) were made between fall 1990 and fall 1992. Third measurements (second remeasurements) were taken between fall 1995 and fall 1997. Remeasurements were made at an interval of either four or five years. For remeasurements, the plots were located with the help of aerial photos, which indicated the location of the plots. If the plot center stake was missing during remeasurements, it was located by taping from previously known tree numbers. If a tree number was not found during remeasurement, it was located with the help of azimuth and distance records of previous measurements. If previously recorded diameter appeared to be incorrect, it 56 was checked with the help of increment core taken just below breast height. For remeasurement, only new heightsample trees were used for increment cores since ring count data for previous heightsample trees were already available, although not all heightsample trees were bored for the ring count. A significant change was made in status codes for third measurements. The details on field instructions including the changes can be found in the following field guides. (1) Field Instructions FSSO410648 Growth and Yield of Thinned Natural Shortleaf Stands in the Ozark and Ouachita National Forests (August 1991) for first and second measurements (2) Field Instructions Shortleaf Growth and Yield Study FSSO410648 (July 1995) for third measurement Each tree was assigned a twodigit status code for third measurement. However, the status code for the first two measurements was only a single digit. Status codes for Study 58 were slightly different. The details of status codes for both the studies are provided in Appendix I. No damage code was recorded for first measurement in Study 48, because all healthy and sound trees were selected. On the other hand, damage code was available from all the three measurements for Study 58, because it was a thinning study previously established in 1963–64. A slightly different status code, i.e. doubledigit code was used for Study 58 (Appendix I) 57 Data Management This section presents the issues relating to quality check, corrections and intermediate calculations carried out for preliminary analysis as well as for preparing data for more formal analysis. There were some plots with the same identification number across studies 48 and 58, e.g. plots 9 and 10; so the duplicate plot numbers were resolved by adding 48000 and 58000 respectively to make the plot identification unique so that random ploteffects could be estimated properly. The records with DBH growth over the period outside the range of 0 and 2 inches were listed. Similarly the trees with total height growth outside the range of 0 and 9 feet were also listed. Original plot sheets for such cases were thoroughly checked. Necessary corrections and adjustments were made where appropriate. Data entry errors were corrected accordingly, e.g. wrongly entered status codes, DBH values etc. Moreover, the cases in which DBH data were correctly entered in computer from plot sheets, but growth values appearing unusually large were compared to the expected maximum growth of an open grown tree (Smith et al. 1992). Almost all of such cases were found reasonable based on the analysis. Thus, such cases were left unaltered as there was no basis to make adjustments otherwise. Missing and suspicious values for variables such as DBH, height and status code were checked in plot sheets, and corresponding corrections were made as appropriate. Diameter at breast height was estimated for those trees which were in the plot but were missed for measurements. The estimate was used to calculate stand basal area only, but 58 such a record would not be included in modeling. Interpolation was used if middle DBH value was missing. Similarly, constant rate of growth was assumed for two periods when either first or third DBH was missing unless the third missing observation was due to mortality. In general, it was likely that first missing measurement could be due to the field worker missing the tree, but the third missing measurement could be due to either mortality or observation error by the recorders. For example, the values of DBH for a tree in Study 58 for different years are as follows: Year 1987 1992 1997 DBH (in) 12.4 (D1) 13.0 (D2) ? (D3) When the tree was alive in 1997, then the DBH was estimated by assuming the same growth rate during the two periods. That is: Estimate of D3 = D2 + growth rate x D2 where growth rate (for period 1) = ⎟ ⎟⎠ ⎞ ⎜ ⎜⎝ ⎛ − 1 2 1 D D D So, estimate of D3 = 13.0 + ⎟⎠ ⎞ ⎜⎝ ⎛ − 12.4 13.0 12.4 x 13.0 = 13.63 (which rounds to 13.6 with 0.1 inch rounding) Site index was calculated based on all three measurements using the method of Graney and Burkhart (1973). Since their site index equation could not be solved analytically, a numerical procedure called the secant method as described in Gerald and Wheatley (1994) was used. Trees with possible height measurement errors were not 59 included in site index calculation. The ratio of quadratic mean diameter to individual tree DBH was calculated. Stand level variables such as basal area per acre and stand age were computed from individual tree measurements. Stand basal area was calculated by summing the individual tree basal areas in the plot, and by adjusting to a per acre value. Stand age was assumed to be the average age of the representative dominant and codominant trees in the plot. Study 48 data for the first two measurements (first period) were available as a SAS ver. 7 data file from the work of Lynch et al. (1999), which was converted to a SAS ver. 9 data file. Moreover, if DBH correction from increment core data during the third measurement was available, the master file was updated by running the programs again. A separate Excel file for the third measurement for Study 48 was available, and it was thoroughly checked for such errors as discussed before. Since the file was only for the third measurement, records for dead trees were not entered in the Excel file. But it was necessary to keep track of what happened to a specific tree in previous measurements. So, an appropriate death code, including appropriate code to distinguish if a tree was a height sample tree or not, was entered for such records. This would allow the users to keep track of individual trees during the fourth measurement. For Study 48, growth calculations for the first period were performed by Hitch (1994) and Lynch et al. (1999). These same computations were done for the second period. The previous SAS programs were revised for the second period, and new programs were also written to handle the added complexity due to additional growth period. For example, site index calculation is now based on all three measurements. Growth calculations were also completed for both the 60 periods for Study 58. Thus, a master file was created with both the studies and two growth periods. Conversion of azimuth and distance data to Cartesian coordinate system was achieved in three steps: (1) Transformation (translation) of azimuth angle by subtracting 900 from azimuth in quadrants II, III and IV, but adding 2700 to azimuth in quadrant I. The resulting angle is similar to that used for polar coordinates but measured in a clockwise direction (2) Subtracting the value of step (1) above from 3600 to obtain the correct angle for the polar coordinates system (3) Calculation of Cartesian (xy) coordinates from polar coordinates by using the following formulas: xcoordinate = r cosine(θ) (r= distance, θ=angle for polar coordinate) ycoordinate = r sine(θ). This was modified to our situation to calculate in SAS using appropriate variable names. SAS uses radian measure but azimuth was in degrees. The Cartesian coordinates will be used for analysis of possible spatial dependence of trees within a plot. Plot establishment, measurements and the type of data used in growth and yield modeling from the first two measurements are given in Hitch (1994), Lynch et al. (1999) and Rose (1998). Data from fourth measurements made between fall 2000 and fall 2001 will not be utilized in this project because of significant icestorm damage. 61 The data for this project is “approximated real growth series” type in the terminology of Moser and Hall (1969). The results from analysis of these data should apply to naturallyregenerated shortleaf pine stands of eastern Oklahoma and western Arkansas. The summary statistics of stand or plot variables as well as tree variables are presented in Tables 3.2 and 3.3. Table 3.2. Summary statistics for 208 plots (Studies 48 and 58 combined) for plot/stand level variables Variable Mean Standard Deviation Minimum Maximum Basal area (ft2/ac) First measurement Mid (first) Second measurement Mid (second) Third measurement 92.9 102.6 112.3 121.3 130.7 29.1 30.2 32.2 34.1 36.8 27.3 22.5 14.9 16.1 17.3 129.0 142.5 156.5 178.4 200.4 Stand age (yr) First measurement Second measurement Third measurement 41.8 46.5 51.6 19.7 19.7 19.7 18.0 23.0 28.0 93.0 98.0 103.0 Site index (ft at age 50 yr) 57.4 9.5 39.9 87.4 Quadratic mean diameter, QMD (in) 8.3 3.5 3.3 19.9 Dominant height (ft) 54 19 27 106 Overall, a negative tree growth value is not impossible in calculations due to measurement/observation error or, more importantly, due to loss of some bark thickness, especially with old trees. Therefore, analyses including negative growth as well as setting negative growth to zero will be discussed. 62 Table 3.3. Summary statistics for 208 plots (Studies 48 and 58 combined) for individual tree variables Variable Mean Standard Deviation Minimum Maximum Diameter at breast height (in) First measurement Second measurement Third measurement 7.4 8.2 9.1 3.9 3.9 4.0 1.1 1.2 1.5 24.4 25.4 26.6 Tree basal area (ft2) First measurement Second measurement Third measurement 0.3825 0.4469 0.5397 0.3835 0.4128 0.4526 0.0066 0.0079 0.0123 3.2472 3.5188 3.8591 Annual average basal area growth (ft2/tree) Overall First period Second period 0.01378 0.01303 0.01459 0.01127 0.01044 0.01205 0.01669 0.01669 0.01396 0.09964 0.07176 0.09965 Ratio of QMD to DBH (R) 1.1397 0.4423 0.3559 7.3621 Total height (ft) First measurement Second measurement Third measurement 57 61 65 22 21 20 10 10 13 112 113 119 Figures 3.3 and 3.4 provide some idea about how DBH distribution changed slightly over the periods. The mid value of DBH for two measurements was computed for use as an independent variable in modeling tree basal area growth. 63 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 2 4 6 8 10 12 Per c ent DBH_MID Figure 3.3. Histogram of midDBH for the first growth period 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 0 2 4 6 8 10 Per c ent DBH_MID Figure 3.4. Histogram of midDBH for the second growth period 64 CHAPTER IV METHODS FOR DATA ANALYSIS AND MODELING This chapter presents some background on statistical methods used in data analysis, and their application to the shortleaf pine growth study data set. Basic statistics (summary statistics and graphical techniques) were used for exploratory analysis. Two major tree attributes were considered for modeling: average annual basal area growth, and total height. For both of these attributes, models were first fitted with a calibration data set, and validation was carried out with a different data set, although the validation data set in this study would not be completely independent of the calibration data set. Ideally, a model would have good properties both statistically and biologically. On statistical grounds, some models might fit the data well, but they might not necessarily have good biological properties. For example, a high degree polynomial model may fit sample data very well, but such a model often does not have much biological significance. Thus an empirical modeling approach can be useful, but interpreting individual coefficients in the resulting model may be difficult. However, overall data fit and good predictability of response for a given value of explanatory variable(s) are important, and an empirical model may be able to provide these characteristics. 65 Since biological growth is considered intrinsically nonlinear (Zeide 1993), a nonlinear statistical model is naturally a good choice for forest modelers. Thus, attempts were made to fit different nonlinear models for shortleaf pine growth data. This approach is consistent with typical forest growth and yield modeling practices (e.g. Wykoff 1990, Vanclay 1994). A variety of nonlinear regression models for shortleaf pine growth variables have been reported by Murphy (1982, 1986), Lynch et al. (1991, 1999), Murphy et al. (1992), and Lynch and Murphy (1995). None of the shortleaf pine growth researchers took the random ploteffects and possible dependence of withinplot errors into account. These factors could influence the resulting parameter estimates and their precision, although tools of mixed modeling that could handle such an analysis had already appeared in forestry literature since late 1980’s (see Review of Literature). Therefore, this project carried out analyses that would make improvements on the two major components of shortleaf pine growth models developed by Lynch et al. (1999): basal area growth and total height models. Randomeffects for plots (or stands assuming a plot typically represents a stand) were incorporated in a modeling framework to obtain better parameter estimates. Possible correlated and heterogeneous withinplot errors were also evaluated. Furthermore, this modeling exercise utilized additional data resulting from the third measurement on these permanently established plots. MixedEffects Models As briefly discussed in Chapter II, when all the levels of a factor are present in the data for analysis, then the factor has a fixedeffect. That is, the whole population of the 66 factor levels is represented. The analyst is basically interested in comparing the effects of the factors on the response variable for those levels included in the study only. A model containing only fixed effects is called a fixedeffects model. On the other hand, a factor may have a large number of levels in some cases, and the direct testing or representation of all the levels are not often economical, practical, or necessary. So only a random sample of the factor levels is tested or studied. Thus inference from such a study can be applied to the population of levels from which the sample was drawn. Such effects are called randomeffects. For example, use of random plot effects in a model would allow valid generalization of the results to a population of plots from which the study plots were sampled. A model in which all the effects are random is called a randomeffects model. Variances associated with randomeffects are known as variance components (Graybill 1976, Snedecor and Cochran 1980). Data analysts are interested in the variance components rather than in the individual random coefficients themselves. Models in which some effects are fixed and the others are random are called mixed models. Each randomeffect is associated with a particular fixedeffect, but all the fixedeffects in the model may or may not have associated randomeffects. In the mixedeffects approach also, the objectives of fixed effects testing are as in the fixedeffects modeling. On the other hand, one would test if the variance components associated with the random effects equal zero rather than testing or interpreting the individual random coefficients themselves, although best linear unbiased predictors (BLUP) of the randomeffects are also estimated or in the commonly used terminology, predicted. In fact, randomeffects help in accounting for sources of variation, which enables accurate estimation and testing of fixedeffects (Tao 2002). Mixedmodels are also defined to be 67 models with errors at more than one level. So these models are also called multilevel models or hierarchical models. In mixed models, effects can be either nested or crossed. Depending on how a response is modeled as a function of explanatory variables, typically mixedeffects models are classified as: (a) Linear mixed models (b) Nonlinear mixed models Linear Mixed Model A basic linear mixedeffects model, as described by authors such as Laird and Ware (1982), Gregoire et al. (1995), and Pinheiro and Bates (2000) is given below. In matrix notation the model can be written as: yi = Xi β + Zibi + εi, i=1,2, ……M, i.e. M = number of groups (24) where, yi is the nidimensional response vector for the ith group (the number of observations could be different in different groups) β is the pdimensional parameter vector for fixedeffects bi is the qdimensional group specific vector of parameters for randomeffects (scalar if only one random effect) Xi is the ni × p design matrix for fixedeffects variables Zi is the ni × q design matrix for randomeffect variables (randomeffect of ith group) that are often chosen as a subset of fixedeffects variables εi is the nidimensional withingroup (or groupspecific) error vector 68 It is assumed that bi ~ N(0, Ψ ), and εi ~ N(0, σ2I), where Ψ is a variancecovariance matrix for random effects and I is an identity matrix with random withingroup error variance σ2. Moreover, it is also assumed that cov(bi, εi) = 0, i.e. randomeffects and withingroup errors are assumed to be independent for different groups, and to be independent of each other in the same group. The columns of Zi are often a subset of the columns of Xi. Nonlinear Mixed Model A basic nonlinear mixedeffects model can be written as follows (Pinheiro and Bates 2000). yij = f (φ ij,ν ij )+ε ij , i =1,…,M, j =1,…,ni, (25) where M = the number of groups (e.g. number of plots if data are grouped by plots) ni = the number of observations on the ith group (e.g. number of trees in a plot at one time point) f = a general, realvalued, differentiable function, which is nonlinear in at least one component of the groupspecific parameter vector φ ij = a groupspecific parameter vector ν ij = a covariate vector ε ij = a normally distributed withingroup error term independent of bi (random effects), i.e. ε ij ~ N(0, σ2) 69 Moreover, φ ij is modeled as φ ij = Aijβ + Bijbi where bi ~ N(0,Ψ ) β = pdimensional vector of fixedeffects parameters bi = qdimensional vector of randomeffects parameters associated with the ith group, and bi ~ N(0, Ψ ). Note thatΨ is a scalar, if there is only one randomeffect. Aij = design matrix for fixedeffects parameters Bij = design matrix for randomeffects parameters It is assumed that randomeffects and withingroup errors are independent. All of the above for the nonlinear mixed model can be summarized in matrixnotation as: ( ) yi = f i φ i,ν i + ε i , i =1,…,M (26) where i φ is modeled as φ i = Aiβ + Bi bi , and yi = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i y y M 1 , ( )f i φ i,ν i = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ( , ) ( , ) ini ini i i f f φ ν φ ν M 1 1 , i φ = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i φ φ M 1 , ν i = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i ν ν M 1 ε i = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i ε ε M 1 , Ai = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i A A M 1 , Bi = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i B B M 1 For this work, only nonlinear mixedeffects models are used, and are therefore the focus of all discussions. 70 Extended Mixed Models The mixed models considered in this section have two aspects, viz. extension to more than one level of randomeffects, and incorporation of correlated and/or heterogeneous errors. The basic models can be extended to more than one level. A typical twolevel nonlinear mixed model can be written as given below. yijk = f (φ ijk,ν ijk ) + ε ijk , i =1,…,M, j =1,…,Mi, k =1, …,nij (27) where ijk = Aijk + Bi jk bi + Bijk bij , φ β bi ~ ( ) N 0,ψ 1 , bij ~ ( ) N 0,ψ 2 Mixed models with any level of randomeffects can be further extended to incorporate correlated and/or heterogeneous errors, although the complexity increases quickly since a regular mixedeffects model in itself is computationally very intensive. It is not always necessary to assume that the errors are normally and independently distributed. Thus in the singlelevel models above, we can assume that εi ~ N(0, σ2Λi), where Λi are positivedefinite matrices. The withingroup errors are again assumed to be independent for different groups indexed by i, and independent of random effects bi . The structure of Λi can accommodate correlated and/or heterogeneous withingroup error structures. That is, the variancecovariance structure of the withingroup errors can be partitioned into two components: a correlation structure and a variance structure. The error distribution would extend to εik ~ N(0, σ2Λik) for twolevel model (Pinheiro and 71 Bates 2000). Details on correlated and/or heterogeneous errors are presented in a separate section called Error Modeling. Estimation and Computing Two major parameter estimation methods are maximum likelihood (ML) and restricted/residual maximum likelihood (REML). The theory of estimation for these methods can be found in Pinheiro and Bates (2000). The basic mixedeffects models described above can be fit by popular software packages such as SAS and SPlus. A linear mixed model or a linearized form of nonlinear model can be fit using either SAS PROC MIXED or SPlus lme command. A nonlinear mixed model can be fit by PROC NLMIXED or SPlus nlme command. Both lme and nlme commands in SPlus and PROC MIXED have options for both ML and REML methods, whereas PROC NLMIXED has only ML option available since the REML method for nonlinear mixed model is a very complicated estimation procedure. Therefore, SPlus nlme and PROC NLMIXED have ML as default estimation method, whereas SPlus lme and PROC MIXED have REML method as default method (Pinheiro and Bates 2000, Tao 2002). The extended linear mixed models can be fit by both PROC MIXED and SPlus lme. However SPlus nlme could accommodate more complicated error structures for correlated and heterogeneous errors, but PROC NLMIXED can fit only the basic nonlinear mixedeffects model assuming independently and normally distributed errors. SPlus gls and gnls commands can be used to fit equivalent basic as well as extended 72 linear and nonlinear models without randomeffects using generalized leastsquares methods. Overall, SPlus has more facilities for handling the extended models, especially for nonlinear mixed models (Pinheiro and Bates 2000, Tao 2002). Since intrinsically nonlinear mixed models are considered in this dissertation, typical syntax and command structure for SPlus and SAS procedures are given below. SPlus nlme Syntax (Pinheiro and Bates 2000, Venables and Ripley 2002): nlme(model, data, fixed, random, groups, start, method) where model is a two sided nonlinear formula separating response on the left and an expression involving parameters and explanatory variables on the right; data specifies the name of data file; fixed and random specify fixed and random components of the model; the groups option declares the grouping structure of the data if the data file is not already grouped; the start option is used to provide starting values for the estimates unless a selfstarting function is used; and method can be used to choose between ML and REML methods for parameter estimation. The output from this command can be stored in an appropriate file that could be specified as: outfile<nlme(.), where outfile is a name given to an output file. I
Click tabs to swap between content that is broken into logical sections.
Rating  
Title  Mixedeffects Modeling of Shortleaf Pine (Pinus Echinata Mill.) Growth Data 
Date  20060701 
Author  Budhathoki, Chakra Bahadur 
Department  Environmental Sciences 
Document Type  
Full Text Type  Open Access 
Abstract  The objective of this study was to develop individualtree mixedeffects models for basal area growth and the diameterheight relationship of shortleaf pine ( Pinus echinata/ Mill.). Repeated measurements for attributes including diameter at breast height and total height from over 200 permanent plots were available from eastern Oklahoma and western Arkansas. Models with plot randomeffects were fitted using the SPlus <tfont>nlme</tfont> library and SAS PROC NLMIXED utilizing a calibration dataset. Models with independently and normally distributed errors were fitted first. Then possible spatially correlated and/or heterogeneous withinplot errors were modeled for basal area growth. The most promising models were tested using an independently selected dataset from the same study. Results and conclusions/. Though increasingly popular in forestry, mixedeffects modeling technique has never previously been used in shortleaf pine growth modeling. Nonlinear mixed models with plot randomeffects were found to fit the data better than the models fitted with a complete random sample assumption (the ordinary leastsquares method) as reported in Lynch et al. (1999) for both a basal area growth model and a model for diameterheight relationship. Because data were grouped by plots, a mixedeffects model with plotlevel randomeffects was a more realistic representation of the data structure than ordinary least squares. Spatial correlation among tree measurements within a plot did not appear to be important in presence of plot randomeffects. However, variance modeling using a variance function with tree basal area as a covariate accounted for heterogeneity of withinplot errors better than the modeling approach in which constant variance was assumed. 
Note  Dissertation 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
Transcript  MIXEDEFFECTS MODELING OF SHORTLEAF PINE (PINUS ECHINATA MILL.) GROWTH DATA By CHAKRA BAHADUR BUDHATHOKI Bachelor of Science Tribhuvan University, Nepal 1992 Diploma in Statistics The University of Reading, England 1996 Master of Science in Biometry The University of Reading, England 1997 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY July, 2006 ii MIXEDEFFECTS MODELING OF SHORTLEAF PINE (PINUS ECHINATA MILL.) GROWTH DATA Dissertation Approved: Dr. Thomas B. Lynch ________________________________________________ Thesis Advisor Dr. David K. Lewis ________________________________________________ Dr. Robert F. Wittwer ________________________________________________ Dr. Mark E. Payton ________________________________________________ Dr. A. Gordon Emslie ________________________________________________ Dean of the Graduate College iii PREFACE The objectives of this dissertation are to review mixedeffects models in forestry literature, and to use these analysis tools for modeling shortleaf pine (Pinus echinata Mill.) growth variables. Data were available from over 200 permanent plots established in naturallyregenerated shortleaf pine stands of eastern Oklahoma and western Arkansas. It is evident from the review of literature that the mixedeffects models have been increasingly used in forestry since the 1990’s. However, these tools have never previously been used in shortleaf pine growth and yield modeling. This dissertation project was successful in expanding two of the major components of growth and yield modeling for shortleaf pine. The distanceindependent individualtree model for annual basal area growth model of Lynch et al. (1999) was improved to incorporate randomeffects for plots in a potentialmodifier framework with stand level and tree level explanatory variables. Furthermore, the individualtree model for total height of Lynch et al. (1999) was also expanded to have plotspecific random parameters for the relationship between total height and diameter at breast height in which dominant height is also a predictor. The fitted mixedeffects models for annual basal area growth and total height were found to fit the data, and to predict the responses better than the previous models reported by Lynch et al. (1999). Possible correlated and/or heterogeneous withinplot errors were also investigated. It was found that within iv plot errors did not appear to be significantly correlated in the presence of plot randomeffects; however, there was some evidence of heterogeneous errors. These model estimates can be utilized in the Shortleaf Pine Stand Simulator developed at Oklahoma State University after fitting other components such as mortality functions in mixed modeling framework. Due to voluminous data, complexity of the models attempted and some degree of correlations among model parameters, computing limitations were experienced at times, resulting in model convergence issues. v ACKNOWLEDGMENTS First and the foremost, I am grateful to my major advisor Dr. Tom Lynch for his guidance, encouragement and cooperation throughout my graduate study, both coursework and research at Oklahoma State University. Thank you Dr. Lynch! Thanks are also due to Drs. Mark Payton, David Lewis and Robert Wittwer for helping and encouraging me as advisory committee members. I also would like to thank the Department of Forestry at OSU and its faculty and staff for providing all the facilities, including a Graduate Research Assistantship and Dr. Michel Afanasiev Graduate Fellowship that made my study possible. The assistance of the USDA Forest Service Southern Research Station and the Ozark and Ouachita National Forests in data collection and financial support for the study is much appreciated. I also appreciate the friendship of all graduate students in the Department of Forestry and Environmental Science graduate program at OSU. I cannot say enough in words how much love and help I have received from my wife Reena and son Saubhagya. Thanks for understanding me as a husband and as a father. I am thankful to my parents and brother Milan for their understanding of my study pursuit for such a long period of about four years. Our stay in Stillwater would not have been so comfortable, had there not been a small but coherent community of Nepalese friends and families. Thank you all for your time and help that we received. vi TABLE OF CONTENTS Chapter Page I. INTRODUCTION......................................................................................................1 Forest Growth and Yield Modeling .........................................................................3 Objectives ................................................................................................................6 II. REVIEW OF LITERATURE………………………………………………………7 Background..............................................................................................................7 Growth Data........................................................................................................7 Modeling Biological Growth ..............................................................................8 Types of Forest Growth Models ............................................................................13 DensityFree StandLevel Models ....................................................................13 VariableDensity StandLevel Models .............................................................13 Diameter Class Models .....................................................................................17 IndividualTree Models ....................................................................................18 DistanceDependent IndividualTree Model ...............................................19 DistanceIndependent IndividualTree Model.............................................21 UnevenAged Stand Modeling .........................................................................30 Forest Growth and Yield Simulation Systems.......................................................32 Mixed Models ........................................................................................................34 Fixed vs. RandomEffects.................................................................................34 Longitudinal Data/Repeated Measures .............................................................35 Correlated and/or Heterogeneous Errors ..........................................................38 Application of Mixed Models in Forestry .............................................................39 Estimation/Computational Methods ......................................................................49 III. DATA AND DATA MANAGEMENT.................................................................50 Study Area and Establishment of Plots..................................................................50 Measurements/Observations ..................................................................................54 Repeated Measurements ........................................................................................55 Data Management ..................................................................................................57 vii IV. METHODS FOR DATA ANALYSIS AND MODELING ..................................64 MixedEffects Models ...........................................................................................65 Linear Mixed Model ..........................................................................................67 Nonlinear Mixed Model.....................................................................................68 Extended Mixed Models ....................................................................................70 Estimation and Computing ....................................................................................71 Model Comparison Criteria ...................................................................................74 Error Modeling.......................................................................................................75 Correlated Errors.................................................................................................75 Spatial Correlation Structure for WithinGroup Errors ......................................76 Heterogeneous Errors..........................................................................................78 Model Checking.....................................................................................................80 An Approach to Mixed Model Development ........................................................80 Exploratory Data Analysis.....................................................................................82 Fitting the Growth Models.....................................................................................83 Basal Area Growth Model ..................................................................................83 Total Height Model.............................................................................................86 Fitting the Extended Models..................................................................................87 Calibration and Validation.....................................................................................90 Summary Statistics for Calibration Data Set ......................................................90 Summary Statistics for Validation Data Set .......................................................92 V. RESULTS ...............................................................................................................94 Basal Area Growth Model .....................................................................................94 Calibration Results..............................................................................................94 Extended Models with Period and Plot RandomEffects ...............................98 Extended Models with Power Variance Function ..........................................99 Extended Models with Spatial Correlation Function for WithinPlot Errors ...................................................................................104 Extended Models with Variance and Spatial Correlation Functions ............105 Validation Results.............................................................................................105 Residuals vs. Predicted Values .....................................................................106 Residuals vs. Tree Basal Area ......................................................................109 Residuals by Design Criteria ........................................................................111 Total Height Model..............................................................................................119 Calibration Results............................................................................................119 Validation Results.............................................................................................120 VI. DISCUSSION AND CONCLUSIONS...............................................................125 Basal Area Growth Model ...................................................................................125 Extended Model with Period and Plot RandomEffects ..................................128 viii Extended Model Accounting for Variance Heterogeneity...............................129 Extended Model with RandomEffects for Growth Period and Plots Accounting for Variance Heterogeneity .................................................130 Residual Analysis for Basal Area Growth Models..........................................131 Total Height Model..............................................................................................136 Residual Analysis for Height Model II............................................................137 Discussion............................................................................................................139 Conclusions..........................................................................................................145 LITERATURE CITED..............................................................................................147 APPENDIX I .............................................................................................................163 APPENDIX II ............................................................................................................167 ix LIST OF TABLES Table Page 2.1 Some selected individual tree growth and yield simulation models/systems available for the United States (Adapted from Davis et al. 2001) .................................................................................................33 3.1 Study design for naturally regenerated evenaged shortleaf plots installed from 1985 to 1987 with thinning and herbicide treatment in eastern Oklahoma and western Arkansas (Lynch et al. 1999)...............................................................................................52 3.2 Summary statistics for 208 plots (Studies 48 and 58 combined) for plot/stand level variables................................................................................61 3.3 Summary statistics for 208 plots (Studies 48 and 58 combined) for individual tree variables .................................................................................62 4.1 List of fitted basal area growth models................................................................89 4.2 Summary statistics of plot level variables for calibration data set with 139 plots.......................................................................................................91 4.3 Summary statistics of tree level variables for calibration data set with 139 plots.......................................................................................................91 4.4 Summary statistics of plot level variables for validation data set with 69 plots.........................................................................................................92 4.5 Summary statistics of tree level variables for validation data set with 69 plots.........................................................................................................93 5.1 Parameter estimates and other associated statistics for BAG Model I for calibration dataset (total degrees of freedom (df) =10268, residual standard deviation (SD) = 0.006839469, residual df = 10261).............................................................................................95 5.2 Summary statistics for fitted basal area growth models from calibration dataset ................................................................................................96 x 5.3 Parameter estimates and other associated statistics for BAG Model II for calibration dataset (number of plots = 139, residual df = 10124, and total observations = 10269)..........................................97 5.4 Fixedeffect parameter estimates and related statistics for BAG Model VI for calibration dataset (number of growth period=2, plots within period = 278, residual df = 9986, and total observations = 10269)..........................................................................................98 5.5 Parameter estimates and other associated statistics for BAG Model VII for calibration dataset (number of plots = 139, residual df = 10124, and total observations = 10269) .....................................................101 5.6 Parameter estimates and other associated statistics for BAG Model VIII for calibration dataset (number of plots = 139, residual df = 10124, and total observations = 10269)........................................102 5.7 95% confidence intervals for the parameters in BAG Model VIII....................103 5.8 Parameter estimates and other associated statistics for BAG Model X for calibration dataset (number of periods = 2, plots within period = 278, residual df = 9986, and total observations = 10269) ..............................103 5.9 Summary statistics for candidate basal area growth models using validation dataset ...............................................................................................106 5.10 Summary of results for Height Model I without randomeffects for plots for calibration dataset (total observations = 5968, residual df = 5964, residual variance=22.8269) ..............................................................119 5.11 Summary of results for Height Model II with randomeffects for plots for calibration dataset (total observations = 5968, group df = 138)..........120 6.1 Summary statistics for fitted basal area growth models using complete dataset (total observations = 15,669, and number of plots = 208)........................................................................................125 6.2 Parameter estimates and other associated statistics for BAG Model I for complete dataset (number of plots = 208, total observations=15,669, and residual df = 15,661).................................................126 6.3 Parameter estimates and other associated statistics for BAG Model II for complete dataset (number of plots = 208, total observations = 15,669, and residual df = 15,455)...............................................127 xi 6.4 Parameter estimates and other associated statistics for BAG Model VI for complete dataset (number of periods = 2, plots within period = 416, residual df = 15248, and total observations = 15,669)........................................................................................128 6.5 Parameter estimates and other associated statistics for BAG Model VII for complete dataset (number of plots = 208, total observations = 15,669, and residual df = 15,455)...............................................129 6.6 Parameter estimates and other associated statistics for BAG Model X for complete dataset (number of plots = 208, total observations = 15,669, and residual df = 15,248)...............................................130 6.7 Parameter estimates from SAS PROC NLIN for complete dataset (total observations = 8964, and residual variance = 21.5688) ............................136 6.8 Parameter estimates from PROC NLMIXED for complete dataset (total observations = 8964, and number of groups = 208)..................................136 xii LIST OF FIGURES Figure Page 1.1 General distribution of shortleaf pine the US .........................................................2 3.1 Shortleaf pine growth study locations in eastern Oklahoma and western Arkansas (Lynch et al. 1991)...................................................................51 3.2 A representation of 0.2 acres measurement plot with two 5milacre plots (USDA Forest Service 1995) .......................................................................53 3.3 Histogram of midDBH for the first growth period..............................................63 3.4 Histogram of midDBH for the second growth period .........................................63 5.1 Plot of residuals vs. tree basal area for BAG Model II.......................................100 5.2 Plot of standardized residuals vs. fitted values for BAG Model II.....................107 5.3 Plot of standardized residuals vs. fitted values for BAG Model VI ...................107 5.4 Plot of standardized residuals vs. fitted values for BAG Model VII ..................108 5.5 Plot of standardized residuals vs. fitted values for BAG Model X.....................108 5.6 Plot of standardized residuals vs. tree basal area for BAG Model II..................109 5.7 Plot of standardized residuals vs. tree basal area for BAG Model VI ................109 5.8 Plot of standardized residuals vs. tree basal area for BAG Model VII...............110 5.9 Plot of standardized residuals vs. tree basal area for BAG Model X .................110 5.10 Boxplots of standardized residuals by age class for BAG Model II.................111 5.11 Boxplots of standardized residuals by age class for BAG Model VI ...............112 5.12 Boxplots of standardized residuals by age class for BAG Model VII..............112 5.13 Boxplots of standardized residuals by age class for BAG Model X.................113 xiii 5.14 Boxplots of standardized residuals by site index class for BAG Model II.......113 5.15 Boxplots of standardized residuals by site index class for BAG Model VI .....114 5.16 Boxplots of standardized residuals by site index class for BAG Model VII....114 5.17 Boxplots of standardized residuals by site index class for BAG Model X.......115 5.18 Boxplots of standardized residuals by basal area class for BAG Model II ......116 5.19 Boxplots of standardized residuals by basal area class for BAG Model VI.....116 5.20 Boxplots of standardized residuals by basal area class for BAG Model VII....117 5.21 Boxplots of standardized residuals by basal area class for BAG Model X ......117 5.22 Plot of standardized residuals vs. predicted values for total height model.......121 5.23 Plot of standardized residuals vs. diameter at breast height for total height model........................................................................................121 5.24 Plot of standardized residuals vs. dominant height for total height model.......122 5.25 Boxplots of standardized residuals for Height Model II by age classes....................................................................................................122 5.26 Boxplots of standardized residuals for Height Model II by site index classes ..........................................................................................123 5.27 Boxplots of standardized residuals for Height Model II by stand basal area classes ................................................................................124 6.1 Standardized residuals vs. fitted values for BAG Model II (complete dataset) .............................................................................................131 6.2 Standardized residuals vs. fitted values for BAG Model VI (complete dataset) .............................................................................................132 6.3 Standardized residuals vs. fitted values for BAG Model VII (complete dataset) .............................................................................................132 6.4 Standardized residuals vs. fitted values for BAG Model X (complete dataset) .............................................................................................133 6.5 Boxplots of standardized residuals for BAG Model II (complete dataset) .............................................................................................133 xiv 6.6 Boxplots of standardized residuals for BAG Model VI (complete dataset) .............................................................................................134 6.7 Boxplots of standardized residuals for BAG Model VII (complete dataset) .............................................................................................135 6.8 Boxplots of standardized residuals for BAG Model X (complete dataset) .............................................................................................135 6.9 Plot of standardized residuals vs. predicted values for total height model from complete dataset .................................................................138 6.10 Plot of standardized residuals vs. DBH for total height model from complete dataset.......................................................................................138 1 CHAPTER I INTRODUCTION Forests have been managed since time immemorial for direct benefit of human kind, although some forests are kept untouched for several indirect (or even unknown) reasons. Forest management involves one or more than one objective such as timber production, biodiversity and water supply. With increasing understanding of basic sciences and increasing demand for resources, forest management is becoming more and more important (Davis et al. 2001). Progressive forest management requires a good understanding of how individual trees and stands grow over time under different environmental conditions. Therefore, forest growth and yield modeling is an integral part of forest management, at least in developed countries. Shortleaf pine (Pinus echinata Mill.) is second to loblolly pine (Pinus taeda L.) in terms of volume of the southern pines in the United States. It has the widest range of any southern yellow pine in the US. It grows in 22 states over more than 440,000 square miles (1,139,600 km²), ranging from southeastern New York to eastern Texas. Despite its importance, there has been relatively little research and management effort compared to other southern pines (Willet 1986). The overall distribution of shortleaf pine in the US is presented in Figure 1.1. 2 Figure 1.1. General distribution of shortleaf pine in the US (Source: Western North Carolina Nature Center, http://wildwnc.org/trees/Pinus_echinata.html March 30, 2006) Shortleaf pine is adapted to a range of geography, soils, topography and habitats; however, individual trees grow best on deep welldrained soils of the Upper Coastal Plain, and the most prominent shortleaf communities are found in the Ouachita Highlands (Guldin 1986). Natural stands of shortleaf pine occur from almost sea level to 3,300 feet (1,006 m). Favorable average temperature for growth ranges from 480 to 700F, with minimums of 220F and maximums of 1020F. A rainfall range of 40 to 55 inches per year is common in areas where shortleaf stands occur naturally (Williston and Balmer 1980). It is believed that the Ouachita Mountains of Arkansas and Oklahoma originally contained the largest shortleaf pine forest in the world. According to an estimate, shortleaf and shortleafhardwood stands must have covered approximately 5,000 square 3 miles, which is approximately 50% of the Ouachita Mountains, until the middle of the twentieth century (Smith 1986). Shortleaf pine is common on nonindustrial private ownerships in Oklahoma and Arkansas, but is important in industrial ownerships also. It is native to 20 counties of Oklahoma. Shortleaf pine is also the state tree of Arkansas (ODAFS 2000). It is a common observation that loblolly pine is preferred over shortleaf pine for commercial reasons due to higher growth rates at younger ages. However, shortleaf pine continues to be an important pine species in the south, especially in naturally regenerated forests (McWilliams et al. 1986). Forest Growth and Yield Modeling Forest resource management goals can be achieved only after proper assessment of available resources. Forest resource assessment is becoming increasingly quantitative, and more reliable estimate of resources scattered in a wider area can be made. The estimation procedure can be improved both in the field during data collection and in the office during data management and analysis. This can be achieved by following rigorous scientific techniques and methods. For resource assessment, standard methods in existence can be followed, although it is a subject of continuous research. With the advent of technologies in other areas (e.g. computing) research methods in forestry are also changing over time. Researchers are using techniques like remote sensing and geographic information systems for forest resource assessments at a larger scale. 4 We can recognize the following two extremes in forest growth modeling in terms of the scale of unit of sampling and observation. 1. Models based on individual plant level information for biological processes that attempt to generalize to growth models that could be used for economically important practical applications. These are generally called physiological process models. 2. Models based on large scale geographic information systems such as models utilizing satellite imagery data that attempt to estimate or predict stand or forest level information that could be used for practical applications. A complete issue (number 3) of Volume 49 of Forest Science (2003) has been devoted to research publication for this aspect, although a wide array of literature are available from elsewhere. Forest growth models in general have often been regressiontype statistical models utilizing site index, tree and stand measurements. These traditionally popular growth models often ignore environmental variables such as climatic factors. Such typical forest growth models can possibly be considered to fall somewhere in between the two extremes mentioned above. Forest growth data are typically longitudinal in nature resulting from repeated measurements in permanent plots or approximately longitudinal in nature resulting from crosssectional data from tree measurements in stands of different ages. Traditional forest growth and yield models have been used to predict the present as well as future states of forest conditions. It has been common to use data from temporary 5 plots, however permanent plot studies are becoming more popular. Modeling techniques are also being refined to make use of new developments in the areas of statistics and computing. Forest growth and yield models can be developed either for natural stands or for plantations. The models for natural stands could be either for evenaged or unevenaged stands. Similarly, the models for plantations could be either for thinned or unthinned stands (Clutter et al. 1983). According to Davis et al. (2001), forest growth and yield models can be classified as follows. (1) Whole stand models: (a) densityfree, and (b) variabledensity (2) Diameter class models (3) Individual tree models: (a) distancedependent, and (b) distanceindependent These models are reviewed, and a short description is presented in Chapter II (Review of Literature). This dissertation presents the work of analysis of shortleaf pine growth measurements data from eastern Oklahoma and western Arkansas. There have been considerable growth studies in the past on shortleaf pine including Murphy (1982, 1986), Lynch et al. (1991, 1999), Murphy et al. (1992), and Lynch and Murphy (1995). Moreover, this project attempts to make an improvement over the previous work and to revise the parameter estimates, especially on the work reported by Lynch et al. (1999). Previous growth model parameters for shortleaf pine have generally been fitted using ordinary least squares methods. Shortleaf pine individualtree models fitted by ordinary least squares have not accounted for plotlevel grouping of individual observations. Mixedeffects models can use random plot effects to account for grouping in this data structure. Thus, it is expected that better parameter estimates can be obtained using more 6 advanced statistical tools of mixed modeling. Furthermore, this work incorporates a third measurement of permanent plots that was not used by Lynch et al. (1999). It is expected that the resulting estimates could be used for better prediction of the shortleaf pine growth for Oklahoma and Arkansas areas. Objectives This dissertation concerns development of individualtree mixedeffects models for evenaged shortleaf pine. It improves on what has been done in the past on shortleaf pine growth and yield modeling. Specifically, the following are the objectives of this project: (1) Review the past work in shortleaf pine growth and yield modeling, and mixedeffects modeling work in forestry in general (2) Improve on past work of individualtree growth model for evenaged shortleaf pine by developing a nonlinear mixedeffects growth model with plotlevel randomeffects that takes into account possible spatially correlated and heterogeneous errors using a sample (calibration) of complete data set, and with minimal explanatory variables that are easy to measure in the field (3) Make further assessments of the model and verification using an independently selected (validation) data set (4) Carry out model checking and diagnostics, and arrive with final estimates from the complete data set (5) Write and present the work with recommendations in a format that is common for the scientific audience in the field of forestry 7 CHAPTER II REVIEW OF LITERATURE Background The general objective of this chapter is to review aspects of forest growth and yield modeling relevant to development of shortleaf pine growth models. Moreover, specific objectives are to review shortleaf pine modeling in general, and mixedeffects modeling in forestry. The following areas will be reviewed, but emphasis will remain in application of mixed models in forestry. • Forest growth and yield models in general • Shortleaf pine growth and yield modeling • Mixedeffects models in general • Application of mixedeffects models in forestry Growth Data In the terminology of Moser and Hall (1969), forestry data can be classified as: (1) real growth series, (2) abstract growth series, and (3) approximated real growth series. When a complete chronological data series from establishment (e.g. planting) to harvest for a tree or stand is available, then it is called a “real growth series.” On the other hand, 8 when data are available from different temporary plots representing different age groups and sites, then the data are called an “abstract growth series.” Ideally, one would prefer data from real growth series, but due to time and expense factors, real growth series data are not common. On the other hand, abstract growth series data may not be of high quality. Therefore, it is common to have an “approximated real growth series.” An approximated real growth series is in between real and abstract growth series in which remeasurements are utilized from several age groups and sites. In this method, data are collected from a wide range (in terms of age and site) of permanent plots, but a particular plot would not have data from a complete life history of trees. Data for this study are of approximated real growth series nature, which will be discussed in Chapter III. Repeated measurements from permanent plots have different sources of variation. The error component in a linear model from such measurements can be regarded to consist of plot, time, and residual random variations (Gregoire 1987). Modeling Biological Growth A model is a description of a system. A system can be defined as any collection of interrelated objects or units in which observations are made. Therefore, a description is a signal that can be interpreted by humans. That is, a system is anything humans wish to understand and models are one tool that facilitates the understanding (Haefner 1996). Mathematical models not only should fit the reallife data well, but also should provide some practically useful interpretation. Irrespective of the nature of the model, whether empirical or theoretical, the modeler should be careful in selecting explanatory variables to provide realistic and robust predictions (Vanclay 1994). Typically, nonlinear 9 regression models of biological growth should have parameters that can be interpreted in terms of biological significance. In growth modeling, one would be mostly interested at the rates of growth over time, and at the time point when growth is stopped. It is also important to be able to understand the process under which different factors (genetic and environmental) influence the growth rate. This information might help in practice to manipulate the growth artificially to some extent by altering the growth rate. As biological growth is achieved due to cell division, it is in theory considered an exponential process (Zeide 1989), which is constrained by several factors. Biological systems inherently have two common components of gain and loss in metabolism commonly called anabolism and catabolism. Such processes have been considered in modeling biological systems, such as early work by Von Bertalanffy (Pienaar and Turnbull 1973). According to Pienaar and Turnbull (1973), an allometric model can be presented as: P = cQa (1) where P = one dimension of an organism, e.g. length of the femur Q = another dimension of the same organism, e.g. width of the skull a = allometric constant, which characterizes the particular kind of organism and environment c = parameter depending on initial conditions 10 This model assumes that the specific growth rate of P is constant proportional to the specific growth rate of Q, i.e. ⎟ ⎟⎠ ⎞ ⎜ ⎜⎝ ⎛ = Qdt a dQ Pdt dP (2) Von Bertalanffy’s model can be written as (Pienaar and Turnbull 1973): Y Y dt dY =α 2 / 3 −γ (3) where Y = size of the organism or population t = time α, γ = constants (α>0, γ>0), and 2/3 is the allometric constant. The ChapmanRichards function is a flexible model which conceptualizes the growth rate of an organism or a population as a resultant of an anabolic growth rate, which is a positive term, and a catabolic growth rate, which is a negative term (Clutter et al. 1983). This function introduces additional parameter in Bertalanffy’s function replacing the 2/3 allometric constant. A standard ChapmanRichards function is written as: Y Y dt dY =α β −γ (4) 11 where Y = size of the organism or population t = time α, β, γ = constants (α>0, 0<β<1, γ>0) This model attempts to explain how an organism or population grows over time in presence of environmental stress or pressure that tends to reduce growth. The ChapmanRichards function is popular in forest growth and yield modeling, which is empirical in nature. The model is derived as a generalization of the Bertalanffy’s growth model (Pienaar and Turnbull 1973, Yang et al. 1978). According to Yang et al. (1978), Weibull function can be modified to model different biological growth processes due to flexible nature of the function. Zeide (1989) described growth equations as a combination of power and exponential functions by taking diameter growth as an example. He compared (tested) five different forms of diameter growth models (ChapmanRichards, Gompertz, Logistic, Power decline and Weibull) in integral form, and found that power decline model, which he derived, performed better in predicting growth for various species and site qualities. He presented an elementary power function called power decline 1 as: at b Ydt dY = − (5) where Y = a measure of plant size 12 t = plant age a, b = constants (a, b > 0) The parameters a and b are interpreted as the initial relative growth rate and the rate of aging, respectively. Many biological systems are modeled using standard regressiontype techniques as described in Draper and Smith (1998), and Neter et al. (1996). Nonlinear regression techniques are more popular than standard linear regression to model reallife biological data (Bates and Watts 1988, Ratkowsky 1983). Interpretability, parsimony and validity beyond the observed range of the data are the main reasons why nonlinear models are often more appropriate than linear models for biological data (Pinheiro and Bates 2000). Haefner (1996) is an example of how complex biological systems can be modeled through computer simulation, instead of using empirical methods. Simulation is a popular method in systems modeling. However, many forest growth and yield modeling exercises are not systems modeling, so statistical models supported by measurements are still popular methods. However, process modeling is also gaining popularity in forestry (Baldwin et al. 2001, Johnsen et al. 2001), especially in organizations where strong multidisciplinary teams of forest biometricians, computer scientists and physiologists exist. There has been considerable work in forest growth and yield modeling including southern pines, but there is much less work in shortleaf pine growth studies despite its relative importance. Murphy (1986) is a good source on shortleaf pine growth studies prior to 1986. Later work will be described in the following sections. 13 Types of Forest Growth Models DensityFree StandLevel Models These models are based on full stocking assumptions. Full stocking, often called “normal”, is the density thought to maximize standing tree volume. However, the concept of normal density is subjective, and it is rarely observed in practice. “Normal” yield tables are developed from temporary plot measurements. For example as cited by Rose (1998), Sylvester (1938) developed yield tables based on data from 240 plots of loblolly pine in Louisiana and Arkansas, and compared the results with the yield tables of Miscellaneous Publication 50 (USDA Forest Service 1929). It was concluded that the yield tables of Miscellaneous Publication 50 were not very correct. But the conclusions drew comments from others such as F.X. Schumacher (1939), again as cited by Rose (1998) that the normality concept was subjective, which led to the discrepancy. The yield tables developed from average density instead of maximum density are called empirical yield tables. Another example of densityfree yield table is the work of Schumacher and Coile (1960), which is quoted by Lynch et al. (1999) for shortleaf pine. Schumacher and Coile (1960) developed “normal” yield tables based on data from 74 temporary plots in the Piedmont region of North Carolina. VariableDensity StandLevel Models In this category of growth and yield models, stand density is used as an explicit explanatory variable. These are considered an improvement over densityfree models, as growth and yield information can be obtained at varying levels of stand density. 14 According to Clutter et al. (1983), Schumacher (1939) developed the following variabledensity model: ln( ) ( ) ( ) 2 3 1 0 1 s V = β + β A− + β f S + β g D (6) where V = per unit area volume/yield, A = stand age, f(S) = function of site index, g(Ds) = function of stand density, and β0, β1, β2, β3 are model parameters This framework can be used to fit a standlevel variabledensity model. Clutter (1963) developed a compatible growth and yield model for loblolly pine. Compatible growth and yield models are developed with mathematical properties that permit yield models to be derived by integration of growth models (Davis et al. 2001). Murphy and Beltz (1981) developed the first variabledensity models for natural evenaged shortleaf pine. They used permanent plot measurements from Arkansas, Louisiana, Oklahoma and Texas. Murphy (1982) used the same set of data and basal area equation to predict sawtimber volume. Lynch et al. (1991) developed variabledensity stand volume equations for natural evenaged shortleaf pine of eastern Oklahoma and western Arkansas. They used the Schumachertype yield model with data from 191 permanent plots. The plots were established by the USDA Forest Service and Oklahoma State University in the period 15 between 1985 and 1987. The models are available for estimating merchantable cubicfoot, sawtimber cubicfoot, and boardfoot volumes per acre for natural evenaged shortleaf pine. The volume equations have the following general form: 1 2 0 V = β Bβ Hβ (7) where V = volume per unit area, B = basal area per unit area, H = average total height of dominant and codominant trees, and β0, β1, β2 are model parameters This equation predicts current volume. Future volume can be predicted by first estimating the future basal area and total height, and then by using the projected basal area and total height in the volume equation. The volume growth prediction is obtained by using the basal area growth projection equation along with the stand volume equation. The future basal area growth per unit area can be projected as a function of stand density and age (Lynch et al. 1991). The following shortleaf pine site index relationship for Ouachita Mountains has been developed by Graney and Burkhart (1973): [ ][1 exp( ( ) )] 4 0 1 2 3 H = a + a SI − − a + a SI × A a (8) 16 where H = average total height (ft) of dominant and codominant trees, SI = site index (ft at base age 50 years), A = stand age (years), and a0, a1, a2, a3, a4 are model parameters Since the parameters in this equation are intrinsically nonlinear, they must be fitted by nonlinear regression methods. A numerical method as explained in Gerald and Wheatley (1994) must be used to find a solution for site index (SI). Similarly, some other site index (height prediction) equations have been suggested for southern pines, including shortleaf, by Farrar (1973). Alternative methods of site index estimation for shortleaf as well as other southern pines are mentioned in Miscellaneous Publication 50 (USDA Forest Service 1929), as cited by Lynch et al. (1999). Zeide (2002) also developed techniques for growth modeling for evenaged stands in which stand density (number of trees per unit area and their average diameter) is explicitly included in the model. The modeling framework combines two sets of processes; one that models the growth of trees and the other that makes adjustments to the growth process as a result of competition among trees. Stand density is an important variable that is part of tree dynamics in a stand, which can be used to manage the stands with a specific objective. 17 Diameter Class Models Diameter class models provide growth and yield information by specified diameter classes, so these are more detailed than wholestand models. Stand volume can be computed by summing the volumes over diameter classes. Diameter class models often use a probability density function to allocate individual trees to diameter classes. The Weibull distribution is the probability distribution most widely used to model diameter distributions, because it is a flexible distribution that can accommodate a range of shapes and scales of distribution. Bailey and Dell (1973) first used the Weibull distribution to model tree diameter distribution. They found it to be very useful in modeling diameter distribution, although other distributions such as beta and exponential functions have also been used. The Weibull probability density can be presented as: ⎥ ⎥⎦ ⎤ ⎢ ⎢⎣ ⎡ ⎟⎠ ⎞ ⎜⎝ ⎛ − − ⎟⎠ ⎞ ⎜⎝ = ⎛ − c− c b X a b X a b f (X ) c exp 1 (a ≤ X < ∞) (9) = 0, otherwise where X = random variable such as diameter at breast height a, b, and c are parameters of the distribution Parameter a is called the location parameter, which is nonnegative (a ≥ 0) for diameter distribution, although it can take any real value theoretically. Parameters b and c are respectively called scale and shape parameters, and must always be positive (b, c > 0). 18 The Weibull is a very flexible probability distribution. Certain other distributions can be represented as special cases of the Weibull. For example, if a=0 and c=1, it is actually an exponential distribution. These results are found in standard statistical references such as Mood et al. (1974) and Wackerly et al. (2002). Use of the Weibull distribution in forestry has been well documented by Clutter et al. (1983) and Bailey and Dell (1973). To use the Weibull distribution for diameter class modeling, the model parameters need to be estimated, which is typically achieved by using maximumlikelihood method. Then the number of trees in a stand can be multiplied by appropriate diameter class probability to assign number of trees in diameter classes accordingly. Smalley and Bailey (1974) used the Weibull distribution to develop a diameter class model for shortleaf pine plantations. They used tree measurements from 104 shortleaf pine plantation plots in Tennessee, Alabama and Georgia. IndividualTree Models Models which use individual trees as the basic unit to predict stand growth and yield are called individualtree models. Such models can be either distanceindependent or distancedependent, depending on whether or not the spatial arrangement of individual trees is taken into account. Distanceindependent models predict tree growth either individually or by size classes. Growth prediction is made as a function of present size and standlevel variables such as stand age, site index, and stand basal area (Avery and Burkhart 2002). Individual tree data are aggregated after the model grows each tree under individualtree model, whereas the stand level model aggregates individual tree data into 19 stand variables before applying the growth model (Davis et al. 2001). So growth rate estimates are on an individual tree basis and stand level basis, respectively. That is, no individual tree growth estimate can be made from a stand level model, whereas stand level growth can be estimated from an individualtree level model. Therefore, diameter class and stand level information can be derived from individualtree models. DistanceDependent IndividualTree Model This model takes the spatial relationship between trees into account using a coordinate system. Initial stand conditions are either userspecified or computergenerated, and the growth of each tree is simulated as a function of its attributes, the site quality, and a measure of competition from neighbors as per the assigned tree locations (coordinate system). Yield is estimated by summing the individualtree values (either from tree volume or a taper equation), and by adjusting for some expansion factors (Avery and Burkhart 2002). This modeling approach requires extensive computer programming. PTAEDA2 is an example of distancedependent individualtree model developed for loblolly pine. This simulation system has two main subsystems: (1) one dealing with the generation of an initial precompetitive stand, and (2) another with the growth and dynamics of that stand. Various management scenarios for competition, fertilization and thinning options have been incorporated in the system (Burkhart et al. 1987, Avery and Burkhart 2002). Different diameter growth and height growth models were evaluated by Martin and Ek (1984) for red pine plantations. They observed some benefit of incorporating a competition index in the model for diameter growth model, but not in 20 height growth model. Similarly, Tome and Burkhart (1989) compared different measures of competition for Eucalyptus plantations under a spacing study in Portugal. Spurr (1962) developed a method of quantifying competition as point density by using a sampling angle gauge (prism). Another measure of point density as the potentially available area was developed by Brown (1965), as cited by Rose (1998). Crown competition factor (CCF) as defined by Krajicek et al. (1961) is a popular competition index (Avery and Burkhart 2002). Bullock and Burkhart (2005) evaluated spatial dependence in juvenile loblolly pine plantations using a simultaneous autoregressive model, and they found a significant dependence among neighboring stems. Distancedependent models are based on the concept of spatial variation. The literature of spatial statistics is huge (e.g. Cressie 1993, Ripley 1981, Webster and Oliver 1990). Spatial statistical methods are popular in geography, geology and ecology, but have started to be used in other fields. Units that are physically close tend to be similar, and correlation between the units decreases as the distance between the units increases. However, the plants or animals too close together might also compete for resources. For example, Mead (1967) has developed a mathematical model for estimating interplant competition. This kind of interplant competition might lead more dissimilar plants that are physically close. Therefore, the utilization of concept of spatial variability or competition measures depends on purpose and the scale of consideration. In summary, distancedependent models are more expensive and computationally intensive than distanceindependent models. Distanceindependent models are popular among forest professionals (Tome and Burkhart 1989, Davis et al. 2001). 21 DistanceIndependent IndividualTree Model Distanceindependent models do not require data on spatial location for each tree. That is, they ignore actual spatial distribution of trees to estimate individual tree growth, and assume that the trees are uniformly distributed over space. Typically, a distanceindependent model consists of the following components (Avery and Burkhart 2002): (1) a model for tree diameter (or basal area) growth (2) a model for tree height growth (or heightdiameter relationship to predict height as diameter is easily measured) (3) a tree mortality estimate Combining the three model components into a simulation system would provide different output scenarios with different input or management variables such as thinning/density regimes and site quality. These outputs might be useful to a forest manager in decision making. Distanceindependent models have comparatively less data demand than distancedependent models. In distanceindependent modeling, measures of competition often relate individual tree size to average tree size or to the size distribution. Two most commonly used competition measures are: (1) ratio of the quadratic mean diameter to diameter of an individual tree at breast height, and (2) cumulative basal area of trees larger than the subject tree. Basal area growth is a common response variable modeled in practice, because it is less sensitive to competition indices than diameter growth (Bella 1971). West (1980) compared different models using both basal area growth and diameter growth as response variables, and found that basal area growth was more 22 strongly related with explanatory variables than diameter growth. Depending on the type of model, different stand level as well as individualtree level variables can be used as explanatory variables. For example, Lynch et al. (1999) used two stand level variables (stand age and stand basal area), and two tree level variables (ratio of quadratic mean diameter to DBH of individual tree and tree basal area) to model annual basal area growth of natural evenaged shortleaf pine in Oklahoma and Arkansas. Distanceindependent models can be either (1) composite models of tree growth as a function of tree measurements, site properties and stand characteristics or (2) potentialmodifier growth functions. The potentialmodifier function models tree growth as a theoretically possible maximum growth, which is expressed as a function of tree characteristics. The potential is then adjusted for the modifier, which is expressed as a function of a stand and tree characteristics including competition (Rose 1998). The Forest Vegetation Simulator (FVS), previously called PROGNOSIS, is a simulation framework for composite distanceindependent modeling (Davis et al. 2001, see also www.fs.fed.us/fmsc/fvs). The composite growth model is appropriate when it is difficult to obtain dominantage and site relationships for mixedspecies stands. In such cases, a potentialmodifier growth function is difficult to apply (Wykoff 1990). FVS can be used to simulate growth for either basal area or diameter from such models, and modeling could be either composite or potentialmodifier type in nature. The following composite model was fitted by Wykoff (1990) to inventory and regeneration study data sets for 11 conifer species in the northern Rocky Mountains. b SL b EL b EL b CR b CR b BL dbh b CCF dds b b dbh b dbh b SL ASP b SL ASP b SL 11 12 2 9 10 2 7 8 2 6 3 4 5 2 0 1 2 / ln( 1) ln( ) ln( ) [cos( )] [sin( )] + + + + + + + + = + + + + + (10) 23 where ln(dds) = natural logarithm of 10year periodic change in squared diameter (inches) dbh = tree diameter outside bark at breast height (in.) BL = total basal area in trees with larger dbh than the subject tree (ft2/ac) CR = ratio of live crown length to total tree height CCF = crown competition factor SL = average slope percent for the stand ASP = average aspect for the stand (radians) EL = average elevation for the stand (100 ft) bi = estimated regression coefficients; b0 dependent on habitat type and location, b2 dependent on location, and b12 dependent on habitat type This model incorporated the site characteristics (EL, SL and ASP for each combination of habitat and forest type), competition indices (CR, CCF and BL) and tree diameter. The model is specific to a habitat and forest type (location) as quantified by a separate intercept for each combination of habitat and forest type. The model was also validated using data based on permanentplot measurements. On the other hand, a potentialmodifier function is suitable when we can easily obtain dominantage and site relationships, for example evenaged stand of a single species, in which siteindex is often used. This is one of the reasons why evenaged natural shortleaf pine has previously been modeled based on this approach. There are two components in this framework, a potential and a modifier. The potential quantifies the 24 theoretical maximum growth (diameter or basal area) of a tree which is grown without any competition for resources, e.g. an opengrown tree. The modifier component makes adjustment to the potential in light of the competition factors that exist in the field, e.g. competition for light based on crown size, spacing etc. Since a potentialmodifier model has two components, parameter estimation can be achieved in one of the two ways: (1) fit potential function and then estimate modifier by fixing the potential to a constant, or (2) fit both potential and modifier functions simultaneously. The second approach may be a better one as its estimation procedure considers both the functions together, which is what happens in tree growth. Tree growth is dynamic interplay between a potential to grow and slowing down of the growth due to environmental restrictions. Shifley and Brand (1984) provide a good example of how a potential function can be constrained to follow a biological principle. They used a modified ChapmanRichards function (Pienaar and Turnbull 1973) to restrict the tree growth to a biologically possible total size. When a standard ChapmanRichards equation is set to zero, and solved for organism/tree size (to find the maximum as the ChapmanRichards equation is the first derivative of size with respect to time), the resulting tree size would be a maximum biologically potential size (Shifley and Brand 1984). Potential/Asymptote (A) = β λ α − ⎟⎠ ⎜ ⎞ ⎝ ⎛ 1 1 (11) 25 Any measure of tree size, such as basal area, could be used. The derivative form of ChapmanRichards function presents the variable in growth form, so if tree size is represented by basal area, then the response variable would be basal area growth. The modified ChapmanRichards model obtained by Shifley and Brand (1984) was of the following form: =αY β −αY / A(1−β ) dt dY (12) One needs to insert an appropriate value for A (e.g. maximum possible tree basal area) to revise the estimates of α and β. According to Shifley and Brand (1984), such a value for A can be obtained from the National Register of Big Trees (American Forestry Association 1982 as cited by Rose 1998). Murphy and Shelton (1996) also used a ChapmanRichards function for potential of the form Potential = [1 exp( )] 1 2 t β − β B (13) where Bt = average tree basal area during the growth period βi = parameters, i = 1,2 They experienced a severe convergence problem with the nonlinear leastsquares estimation. Some other examples of potential functions are found in Amateis et al. (1989) and Smith et al. (1992). Amateis et al. (1989) proposed the idea of estimating potential 26 growth from open grown trees without any competition effects. This idea motivated Smith et al. (1992) to estimate maximum potential growth for shortleaf, longleaf and loblolly pines. A modifier function takes the growing conditions into account to adjust the potential to estimate net growth. Several examples of modifier functions are available in forestry literature. One that is most relevant to shortleaf pine growth modeling is that of Hitch (1994). He considered two modifier functions: (1) A variant of Shifley’s (1987) functions for TWIGS and STEMS for the Central States, and (2) A modified logistic function developed by Murphy and Shelton (1996). The logistic function is a useful modifier, since it is restricted between 0 and 1, and can include several independent variables. The logistic modifier has a form: Modifier = 1 exp( ) 1 3 1 4 s 5 q + β B +β B +β D (14) where B1 = basal area in trees larger than or equal to the diameter of the subject tree Bs = stand basal area Dq = quadratic mean diameter βi = parameters, i = 3,4,5 One advantage of the logistic modifier function is that we can add more explanatory variables to improve the fit index of a model, but still it is within the range of 0 and 1. So a complete growth model would take the form 27 Predicted growth = Potential growth × Modifier That is, the Murphy and Shelton (1996) model would be: [ ] 1 exp( ) 1 exp( ) 3 1 4 5 1 2 s q t B B D B β β β β β + + + − (15) The model terms are defined as before. They used this equation to fit an individualtree basal area growth model for loblolly pine A distanceindependent individualtree model for shortleaf pine for the Ouachita Mountains was first developed by Hitch (1994) using the potentialmodifier function approach. He used a data set for natural evenaged shortleaf pine in Oklahoma and Arkansas from a cooperative study between OSU Department of Forestry and USDA Forest Service. Bitoki et al. (1997) used data from a study for unevenaged continuous forest inventory (CFI) plots for shortleaf pine from the same Oklahoma and Arkansas region to develop a basal area growth model. They also used a potentialmodifier function approach. They first estimated the potential growth separately by excluding a parameter from Hitch’s (1994) equation. Then the parameters in the modifier function were estimated by keeping the potential constant, so all parameters were not estimated simultaneously. A simultaneous estimation technique for estimating the parameters in a potentialmodifier form has been used by Murphy and Shelton (1996) for loblolly pine. Hasenauer et al. (1998) used simultaneous regression methods to analyze individual tree data for Norway spruce from Australia. They found strong crossequation 28 correlations, and the simultaneous estimation method was more efficient than separate estimation by ordinary least squares. Similar estimation method in a nonlinear system was examined for white spruce data by Huang and Titus (1999). An example of simultaneous estimation for heightdiameter model fitting is Omule and MacDonald (1991). Rose (1998), and Rose and Lynch (2001) used a seemingly unrelated regression technique to fit a basal area growth model for shortleaf pine as the technique would help model correlated errors. Lynch and Murphy (1995) also used seemingly unrelated regression to fit a heightdiameter model for natural evenaged shortleaf pine. Lynch and Murphy (1995) used the following general framework for modeling height growth in natural, evenaged shortleaf pine stands: ˆ ( , ˆ , , , ,β ) i i Di i Di Li h = f A H D S D (16) where i hˆ = predicted individual tree height at time i i A = stand age at time i Di Hˆ = predicted average height of dominants and codominants at time i (could be obtained from site index curves in evenaged stands) i D = individual tree dbh at time i Li D = dbh distribution location identifier at time i such as maximum dbh or quadratic mean dbh Di S = an expression of stand density at time i β = vector of parameters 29 Lynch and Murphy (1995) also provide a comprehensive review of models relating tree height to DBH and age (or time). The Hitch’s (1994) model was later slightly modified by Lynch et al. (1999) by including additional data available from later plot measurements. They developed a complete growth and yield prediction system including models for basal area growth, heightdiameter relationship, crown characteristics and mortality functions. The estimates are now being used in the shortleaf pine stand simulator, SLPSS (Huebschmann et al. 1998). They fitted the following model: ( ) i s i i b i M b i i b b B b A b R b B b B b B B G +ε + + + + + − = − 1 exp( ) / 3 4 5 6 7 1 1 1 2 2 (17) where Gi = annual basal area growth (ft2) of tree i Bi = basal area (ft2) of tree i A = stand age (years) Ri = ratio of quadratic mean stand diameter to the dbh of tree i Bs = stand basal area (ft2/ac) BM = 7.068384 ft2 (the maximum expected basal area for a shortleaf pine in managed stands) b1, b2, ….., b7 = parameters i ε = random error 30 Lynch et al. (1999) used the following nonlinear model for individual tree height given by Lynch and Murphy (1995): i ( D ) ( i ) i H − = β H − β1 −β D −β 3 +ε 0 2 ( 4.5) 4.5 exp (18) where i H = total height (ft) of tree i i D = dbh (in) of tree i D H = dominant height as per Graney and Burkhart (1973) 0 β , 1 β , 2 β , 3 β = model parameters i ε = random error Beaumont et al. (1999) evaluated the generalized method of moments for modeling dominant height through site index in black spruce. They found that the new method of simultaneous estimation to be better than traditional twoequation approach. In the traditional method, site index is first estimated as the average height of dominant trees at base age, and then dominant height is predicted as a function of age and site index as done by Graney and Burkhart (1973). UnevenAged Stand Modeling Several basic ideas of evenaged stand modeling are relevant to unevenaged stand modeling also. Furthermore, this dissertation deals with data from evenaged shortleaf stands only. Therefore, only a brief review indicating the differences is presented. 31 The most important difference in unevenaged stands modeling is that the concept of site index and stand age cannot be easily used since there are different ageclasses in these stands. And at least in principle, modeling techniques for unevenaged stands can also be applied to evenaged stands (Avery and Burkhart 2002). Key ideas on unevenaged stands modeling are presented in a classic work by Moser and Hall (1969), and Moser (1972). They developed a modeling framework in which yield is expressed as a differential function of elapsed time from a given initial condition. A Markov Chain approach for predicting diameter distributions in unevenaged stands is provided by Bruner and Moser (1973). In such a framework, future diameter distributions, number of surviving trees, number of dead trees, and number of harvested trees are predicted using inventory data under a Markov process, which is a stochastic process for modeling an uncertain event. Lynch and Moser (1986) developed a technique of predicting stand tables for two species groups based on a differential equations approach. A matrix model approach to modeling unevenaged forest management has been reported by Buongiorno and Michie (1980). In this approach, a matrix model of forest growth similar to the Markov model of Brunner and Moser (1973) is combined with linear programming techniques to answer some economic questions. Following the idea of Buongiorno and Michie (1980), Schulte and Buongiorno (2004) developed a growth and yield model for naturallyregenerated shortleaf pine forests including hardwoods from southern US. Their work was based on repeated measurements on forestry inventory and analysis plots. According to Smith (1986), industrial forestry organizations and USDA Forest Service began a basic shift from uneven to evenaged stands in about 1970. Many 32 commercial activities involve evenaged stand management, but some argue against the ideas of evenaged management because of ecological and environmental reasons. Murphy and Farrar (1988) developed a standlevel growth and yield model for unevenaged loblollyshortleaf pine using inventory data. Guldin and Baker (1988) carried out yield comparisons for both evenaged and unevenaged loblollyshortleaf pine stands. They found that total merchantable cubicfoot yields were highest for conventionally managed evenaged plantations. AttaBoateng and Moser (2000) developed a compatible growth and yield modeling system for managed mixed forests of the tropical rain forest region. An individualtree growth and yield model for unevenaged shortleaf pine stands was developed by Huebschmann et al. (2000). Forest Growth and Yield Simulation Systems Different models/systems have been reported in the literature for simulating forest growth and yield. The USDA Forest Service has developed a comprehensive system of vegetation simulation called Forest Vegetation Simulator (FVS). The FVS system simulates vegetation change based on forest inventory data or stand examination data about the forest, stand, and trees (USDA 2002). A brief summary of some simulation systems is presented in Table 2.1. 33 Table 2.1. Some selected individual tree growth and yield simulation models/systems available for the United States (Adapted from Davis et al. 2001) Simulation System Description The Forest Vegetation Simulator (FVS) PTAEDA2 ORGANON CRYPTOS, CACTOS SLPSS Previously called PROGNOSIS, developed by USDA Forest Service for nationally supported framework for forest growth and yield modeling, more than 20 variants nationwide, many models, developed by universities and research institutions, such as TWIGS, STEMS, or GENGEM are now part of FVS (www.fs.fed.us/fmsc/fvs) Distancedependent system capable of modeling the growth of loblolly pine plantations, has also been linked with MAESTRO, a process model, developed and supported by Virginia Tech Developed for western Oregon conifer and hardwood types, developed through a public and industry cooperative and is supported by Oregon State University CRYPTOS developed for the California coastal redwood and fir forests, and CACTOS covers the California mixed conifer forests, both developed through a public and industry cooperative and are supported by University of California Berkeley Shortleaf Pine Stand Simulator, which simulates the growth of natural shortleaf pine stands in western Arkansas and eastern Oklahoma, developed from a cooperative study by Oklahoma State University (OSU), the US Forest Service Southern Station, and the Ouachita and Ozark National Forests, supported by the Department of Forestry at OSU. The following sections present a review of mixed modeling techniques that form the foundation of this dissertation work. 34 Mixed Models Fixed vs. RandomEffects When we are interested only in the levels or classes observed in the study sample, then the estimates are fixedeffects. On the other hand, if the classes or levels observed are a sample from a population, and we are not primarily interested in the particular levels observed, then this prompts the randomeffects approach. In this approach, we are interested in the variability pattern in the population itself from which the sample was observed (Snedecor and Cochran 1980, Laird and Ware 1982). For example, when a forest growth model is developed from a fixedeffects approach for the stands observed only, then the model is fixedeffects model, and the results are applicable only to the stands observed in the sample. If the forest growth model needs to be generalized to a forest from which a sample of stands was observed, plot randomeffects should be considered. Basically, we are not interested in the particular stands. Therefore, a growth model in which stands/plots have randomeffects would be desirable. However, many traditional/regular forest growth models have been based on a fixedeffects approach. Observations classified/grouped/clustered by a certain categorical variable are often analyzed by using analysis of variance (ANOVA) and analysis of covariance (ANCOVA) techniques. However, these techniques are relevant only when the number of categories is relatively small, and we are interested in comparisons of the observed categories only. This represents a case of fixedeffects. In many cases, we are interested in considering the categories as a representative random sample from a population. A fixedeffects analysis would not be appropriate in such cases, instead a randomeffects 35 approach should be used that estimates the variance components for different levels of a population (Longford 1993). In random coefficient models, the coefficients for each class (intercept and slope for each combination of different levels) quantify deviation from a defined population regression model (Tao 2002). A mixed model is an extension of a randomcoefficient regression model in which fixedeffect parameters are also included. In other words, when a model has both fixed and randomeffects, then the model is called a mixed model. A mixed model typically has more than one error level, so such a model is also called a multilevel model. If classification factors form a hierarchy (nested structure), then the model is also called a hierarchical model. Gumpertz and Pantula (1989) provide a simple way of making inference based on mean of the individual coefficients in random coefficient regression models. They interestingly contrast the longitudinal data in biological science studies with those of economic studies data. Their appreciation of agricultural and biomedical studies with a small number of repeated measures on large number of experimental units/subjects is important in the context of our project. This is different in nature compared to economic and meteorological data in which there are often multiple measurements/observations for a long period of time (time series) on relatively small number of subjects. The mathematical details for mixedmodeling will be addressed in Chapter IV. Longitudinal Data/Repeated Measures Data resulting from repeated observations/measurements on the same unit/subject over time are often called longitudinal data. Such data are common in practice, especially 36 in medicine, biology and economics (Diggle et al. 1994, Gregoire et al. 1995, Davis 2002). Mixed models are an increasingly popular method for analysis of longitudinal data because of their flexibility in handling different data structures (irregularity and unbalancedness) and in satisfying model assumptions. There are other alternative analysis approaches such as a multivariate ANOVA (MANOVA), which are not as flexible and powerful as the mixed modeling approach. MANOVA can not handle irregular and unbalanced data properly, and unbalanced data are very common in practice. Another alternative is to follow a splitplot in time approach (Kuehl 2000). The splitplot approach to repeated measures analysis would assume that observations/treatments within a unit are randomized. That is, different observations over time within the same subject/unit would be assumed random, which is not a very realistic assumption (Diggle et al. 1994). Instead, we would prefer to model withinunit correlation (serial correlation in repeated measurements, and spatial correlation among trees within a plot). Therefore a model that recognizes a proper covariance structure for errors would be a better approach to analyze such data (Gregoire et al. 1995). Laird and Ware (1982) and Ware (1985) developed a randomeffects model approach for analyzing longitudinal data that would model serial correlation on the same subject. A twolevel model was developed in which grouping of observations would be made by subjects. Such models could be fitted using either maximum likelihood or residual maximum likelihood (empirical Bayes) methods. Linear mixedeffects models for longitudinal data are described by Chi and Reinsel (1989), and Verbeke and Molenberghs (2000). They assessed maximum likelihood and restricted maximum likelihood methods of estimation. They also used a firstorder autoregressive model to 37 account for withinindividual errors resulting from longitudinal measurements on the same individuals. Crowder and Hand (1990) is a standard reference for repeated measures analysis, including responses of categorical nature. They deal with standard splitplot analysis, MANOVA and antedependence analysis to account for consecutive measurements on the same unit. Diggle et al. (1994) described principles and methods for analysis of longitudinal data. They dealt with both quantitative as well as qualitative data types, and provided a significant amount of information on randomeffects models. Major theoretical and computational contributions to the nonlinear mixedmodeling approach were made by Vonesh and Carter (1992), and Davidian and Giltinan (1995). Similarly, researchers such as Lindstrom and Bates (1990), and Pinheiro and Bates (2000) developed algorithms and software that allowed practical usage of the models. Reiczigel (1999) reviewed methods for analyzing repeated measurements from designed experiments in which the main objective is to compare treatments as precisely as possible. VanLeeuwen et al. (1996) presented a mixed model that incorporated random trends through time, and also allowed correlations among observations at the same time. Pinheiro and Bates (2000) developed the nlme library in SPlus that has made a substantial impact, especially on use of nonlinear mixed models in practice. They followed the approach of Laird and Ware (1982) to develop these computational tools. SAS PROC NLMIXED has been considered limited for nonlinear mixed modeling (Tao 2002, Littell et al. 1996), although a randomeffect with limited options for covariance structure can be used. Littell (2002) compared analysis of variance (ANOVA) versus residual maximum likelihood (REML)/generalized least squares (GLS) methods for 38 analyzing unbalanced mixed model data, and found that ANOVA was popular method prior to the early 1990s because of its simplicity and lack of easily available computing tools for likelihood based methods. However, more powerful likelihood based methods such as REML and GLS have started gaining popularity due to the increasing availability of software since the early 1990s. Davidian and Giltinan (2003) provide an overview of nonlinear models for repeated measurements data. Correlated and/or Heterogeneous Errors Correlation among successive observations over time on the same unit is discussed in a framework of time series analysis by Diggle (1990). Similarly, spatial correlation among observations is dealt with by Cressie (1993), Ripley (1981), and Webster and Oliver (1990). However, these references do not work in the context of mixed modeling. Both inter and intraindividual variations in nonlinear mixed modeling for repeatedly measured observations are discussed and reviewed by Davidian and Giltinan (1995, 2003). Wolfinger (1996) reviewed and evaluated different covariance structures for modeling heterogeneity in repeated measures data. The variancecovariance structures included heterogeneous versions of the compound symmetry and firstorder autoregressive structures, the HuynhFeldt structure, the independentincrements structure, correlated random coefficients models, the firstorder antedependence model, and a simplified factoranalytic construction. Use of an appropriate variancecovariance structure would avoid data transformation allowing parameter interpretability and more accurate inference. Lin et al. (1997) discuss linear mixed models with heterogeneous 39 withincluster variances. Methods are shown to predict clusterspecific random effects variances. Application of Mixed Models in Forestry Ferguson and Leech (1978) and West et al. (1984) were among the earlier workers to recognize the problem of temporal correlation in forestry data due to multiple measurements from individual sampling units. They mentioned the difficulty in regression analysis with regular assumptions for such data, and have discussed and recommended alternative approaches for data analysis. They discussed the availability of statistical theory to solve these problems. However, these discussions were mostly from a designed experiments perspective, so their direct application was not appropriate to many forest growth modeling datasets. Ferguson and Leech (1978) developed twostage yield models. In the first stage, they used ordinary least squares, whereas a generalized least squares method was used in the second stage to improve the estimation since the errors in the second stage violated the assumptions of ordinary least squares. West et al. (1984) also found a twostage regression model to be the most suitable method to analyze repeated measurements data common in forestry problems. They demonstrated the problem with a study on Eucalyptus in which plot was considered a sampling unit. In this study, a sample of trees was selected from each plot, and measurements such as diameter at breast height were taken. With motivation from the work of Ferguson and Leech (1978), Gregoire (1987) evaluated four different covariance structures for basal area yield models. He compared 40 the following covariance structures: (a) uncorrelated and homoscedastic plot and time effects; (b) autoregressive time effects; (c) uncorrelated and heteroscedastic plot effects and autoregressive time effects; and (d) correlated and heteroscedastic plot effects and autoregressive time effects. It was found that method of ordinary least squares was nearly always best; so further work was recommended. A random stand and tree parameters approach was used to model height in slash pine (Pinus eilliottii Engelm.) by Lappi and Bailey (1988). This was presented as a good alternative to the conventional site index approach to height prediction. They fitted the following model: hki(t) = μ(t) + bk(t) + eki(t) (19) where hki(t) = dominant height for tree i in stand k at age t μ(t) = population mean height at age t over all stands bk(t) = random stand effect at age t eki(t) = random tree effect and is uncorrelated with bk(t) They used the well known Richards’ (1959) equation to model the mean height, and then estimated the parameters of the covariance structure using observed residuals as if they were the true residuals. Gregoire et al. (1995) is an excellent reference on mixed modeling issues in forestry. They deal mostly with linear mixed models, but discuss the issues of importance of nonlinear mixed models in forestry research. They have fitted standlevel mixed 41 effects models with two data sets consisting of eastern white pine and Douglasfir with a mixed model approach having plot randomeffects. Due to the availability of repeated measurements, randomeffects for stands were fitted even though the response was a stand level variable. Errors that are temporally and/or spatially correlated have also been modeled. This project will follow basic ideas from this publication. Lack of easily available and userfriendly software was mentioned by Gregoire et al. (1995) to be an important reason that there was still not much application of nonlinear mixedeffects models in forestry. Gregoire and Schabenberger (1996a) used a nonlinear mixedeffects approach for modeling individualtree cumulative bole volume of sweetgum from east Texas. They later modeled cumulative bole volume by taking spatial correlation between sections of a bole into account (Gregoire and Schabenberger 1996b). The objective was to express cumulative bole volume as a smooth curve while allowing for fluctuations within and among trees as much as possible, yet allowing the curve’s applicability to all trees of the species with similar morphological characteristics. They used a nonlinear mixedeffects model by comparing restricted maximum likelihood (REML) and generalized estimating equations (GEE), and found that GEE to be simpler (computationally less intensive) than REML. REML is a normaldistribution based estimation method, whereas GEE is a semiparametric approach for multiple measurements per subject (repeated measurements). They used SAS PROC IML and MACRO for the modeling work. Candy (1997) estimated parameters in a sigmoidal forest yield model using composite link functions with random plot effects in a generalized linear mixedeffects 42 model framework. A generalized linear mixedeffects model can be considered as a special case of nonlinear mixedeffects models (Tao 2002). Lappi (1997) analyzed two jack pine data sets, one from plantations and the other from naturally regenerated stands. Both the data sets were longitudinal in nature, and they were analyzed using the same model structure. The height/diameter curve parameters were partitioned into an agedependent trend (population mean), a random stand effect, and a random time effect. A good practical use of such a model would be calibration with which the random stand and time effects could be predicted given some additional measurements without needing to make detailed observations as would normally be done for new stands. Tasissa and Burkhart (1998) used stem analysis data from permanent plots for loblolly pine to evaluate thinning effects on form exponent, a measure of stem form. They modeled form exponent in response to thinning and by accounting for correlation among withintree observations with the firstorder autoregressive method. Apiolaza et al. (2000) used variance components estimation to analyze genetic data for Pinus radiata from a progeny test. Fang and Bailey (2001) modeled dominant height growth of slash pine from a study with several silvicultural treatments installed in Georgia and north Florida. They reparameterized the threeparameter Richards model to predict dominant height growth in presence of silvicultural treatments, such as chopping, fertilization and burning, using a nonlinear mixedeffects model approach. The study design was a splitplot, with soil type as mainplot factor and silvicultural treatments as subplot factor. They claim that this would be a better approach to predict dominant height growth rather than using the site index approach. Fang et al. (2001) used a simultaneous equation system to develop stand 43 level mixed models for growth and yield of slash pine with two components: basal area model and total volume model. Another important example of mixed modeling in forestry is Hall and Bailey (2001). They modeled forest growth variables using multilevel nonlinear mixed models with data from a loblolly pine spacing study in Georgia. They used a linearization approach of estimating equation rather than maximum likelihood or restricted maximum likelihood method. They included randomeffects for both plots and trees. They found that the multilevel nonlinear mixed model approach provided several advantages over traditional forest growth modeling methods. Guilley et al. (2004) modeled averaged ring density with individual tree random effects for sessile oak in France. It was concluded that there was no evident effect on wood density due to changing environment and forest management type when ring width and cambial age were constant. Garber and Maguire (2003), and Leites and Robinson (2004) have applied the mixed modeling method to develop taper equations. A nonlinear mixed model with autoregressive error structures was used to model stem taper of ponderosa pine (Pinus ponderosa Dougl. ex Laws.), lodgepole pine (Pinus contorta Dougl. ex Loud.), and grand fir (Abies grandis Dougl. ex D. Don) by Garber and Maguire (2003). Leites and Robinson (2004) improved the Max and Burkhart’s (1976) taper equation with crown dimensions and individual tree randomeffects for loblolly pine. Mehtätalo (2004) used a mixed model with height and diameter data of longitudinal nature for Norway spruce (Picea abies (L.) Karst.). The Korf growth curve was used as a basic growth function for the relationship. The model could be calibrated for a new stand with substantially less observations than one would require for fitting a 44 completely new model. It was concluded that the growth pattern of a stand was dependent on mean tree size in the stand but not on stand age. This may or may not be true in other species. Zhang and Borders (2004) used a mixedmodel to estimate tree component biomass for managed loblolly pines with an allometric approach. They found that the percent of stem biomass increased with age while the opposite was the case for foliage and branches. It was also found that cultural treatments affected the proportional allocation among various tree compartments. Calama and Montero (2004) used a mixed model approach to model individualtree heightdiameter relationship for stone pine (Pinus pinea L.) in Spain. Lynch et al. (2005) used a randomparameter (mixed model) approach to analyze heightDBH data for cherrybark oak (Quercus pagoda Raf.) from east Texas. They fitted a similar model as reported by Lappi (1991). That is, natural logarithm of difference between total height and breast height was modeled. They fitted a linear model with randomeffects for stands, which could be used for calibration with minimum observations from a new stand. The fitted model is as follows: ki ki k k ki ki H − bh = + D− + + D−0.9 + e 0 1 0.9 0 1 ln( ) β β α α (20) where ki H = total height of tree i in stand k bh = breast height ki D = dbh of tree i in stand k 0 β , 1 β = fixed effect parameters 45 0k α , k 1 α = random effect parameters with 0 mean for stand k ki e = random residual error for tree i in stand k They also present an example of how this model could be calibrated for new stands. Uzoh and Oliver (2006) used a composite approach (as described by Wykoff 1990) for height increment modeling for managed evenaged stands of ponderosa pine (Pinus ponderosa Dougl.). They used permanentplot measurements from the western US. Random effects for locations, plots and trees were used in the model, and an autoregressive covariance structure was used to model the repeated measurements. They used a linear form of mixed model after transforming the periodic annual height increment to a logarithmic scale. They selected the following model: E[ln(PAIH)] = b0 + b1 ln(DBH) + b2(DBH)2 + b3SIM + b4SL[cos(ASP)] + b5ELEVA+ b6SDI + b7BAL + ( ) ( ) ˆ l j l ik jl h + e + e (21) where E[ln(PAIH)] = expectation of the natural logarithm of 5year periodic annual height increment (m) DBH = initial diameter at breast height (cm) SIM = Meyer’s site index (m) SL = average slope for the stand (%) ASP = average aspect for the stand (radians) ELEVA = elevation for the stand (m) SDI = stand density index (trees/ha) 46 BAL = basal area in larger trees (m2/ha) divided by dbh of subject tree l hˆ = fixedeffect of the lth location j(l ) e = random error for plot j within location l (assumed to have mean 0 and variance 2 l σ ) ik ( jl ) e = random error for measurement k on tree i within plot j and location l (assumed to have mean 0 and varianceσ 2 ) bi = parameter estimates They found that the height increment model displayed a unimodal and positively skewed shape for tree growth process, which was as per expectation. Site index (SIM) was found to have more effect on height growth than other variables. Vázquez and Pereira (2005) used a linear mixed model to explain variation in cork weight of cork oak (Quercus suber L.) from Portugal. They decomposed the random effects in regional, plot and treelevel that would allow better estimates for fixed effects. Budhathoki et al. (2005) also used a linear mixed model approach for preliminary analysis of basal area growth of shortleaf pine using a portion of data reported by Lynch et al. (1999). The natural logarithm of annual basal area growth was modeled with a dataset consisting of plot measurements at three points in time. In the analysis, they used a compound symmetry covariance structure and plot randomeffects. Their fixedeffects parameter estimates were similar in magnitude to those reported by Lynch et al. (1999), but the standard errors were reduced as a result of using mixed model. The linear mixed model approach discussed above for Lynch et al. (2005), Uzoh and Oliver (2005), Budhathoki et al. (2005), and others was partly for computational 47 convenience that would be available in SAS PROC MIXED, although the inherent nature of such responses would be nonlinear. With the increasing access to userfriendly software such as SPlus nlme library (Pinheiro and Bates 2000), nonlinear mixedmodeling is gaining popularity. Hall and Bailey (2001) and Fang and Bailey (2001) as indicated above also used nonlinear mixed modeling. A nonlinear mixedeffects model was developed for height growth for Eucalyptus plantations in Brazil by Calegario et al. (2005). They used the nlme library available in SPlus to model the dominant height. They modeled the dominant height as a logistic function of age and with plot randomeffects. The fitted logistic function was: [ ] ij ij i i i ij a HD ε φ φ φ + + − − = 2 3 1 1 exp ( ) / (22) where HDij = dominant height (m) for ith plot on time j aij = age (years) for ith plot on time j εij = random error; εij ~ N (0, σ2) Фi = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ i i i 3 2 1 φ φ φ = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ 3 2 1 β β β + ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ i i i b b b 3 2 1 = β + bi (23) bi ~ N (0, ψ) 48 They used repeated measurements from 115 permanent plots measured three to 10 times between 1992 and 2001. The three parameter logistic function along with plot randomeffect was found to fit the data well. A multilevel nonlinear mixedeffects model was fitted for modeling stand volume growth on loblolly pine, and for modeling four silvicultural treatments by Zhao et al. (2005). They found that vegetation control and fertilization resulted in the largest volume growth. The nonlinear mixed model approach was reported to be useful to handle the unbalanced and incomplete repeated measurements. A similar methodology was used by Jordan et al. (2005) to analyze earlywood and latewood microfibril angle data on loblolly pine. Rose et al. (2006) used a multilevel approach to estimate individual tree survival on loblolly pine using complementary loglog link function. They estimated randomeffects at both plot and tree level, and compared effects of four cultural treatments on tree survival. Overall, if we fit a model for each plot separately, it would often result in overparameterization, whereas ignoring the grouping by plot we would lose much of the variability. A mixed model provides a balance between these two extremes (Pinheiro and Bates 2000, Calegario et al. 2005). This review shows that mixed models have been increasingly used in forest growth and yield modeling. However, no peerreviewed published work on shortleaf pine growth modeling applying mixed models has yet been done. This provides an opportunity to apply this new technique in shortleaf pine, and to determine whether there is an advantage over the modeling techniques that are currently in use. 49 Estimation/Computational Methods Corbeil and Searle (1976) describe the foundation of estimation with restricted or residual maximum likelihood (REML) procedure for mixed modeling. They also developed an algorithm to implement a mixed model. Although mixed modeling literature in statistics existed earlier than Corbeil and Searle (1976) (for example they cite Hartley and Rao (1967)), application in forestry appears to have begun only in late 1980’s and early 1990’s. Generalized estimating equations (GEE) were used by Gregoire and Schabenberger (1996b), and Hall and Bailey (2001). In SPlus nlme library, the default estimation method for nonlinear mixedeffects modeling is maximum likelihood (ML), whereas the default method for linear mixedeffects modeling is REML. The library enables to fit mixed models with correlated and/or heterogeneous errors (Pinheiro and Bates 2000). 50 CHAPTER III DATA AND DATA MANAGEMENT This chapter describes the data used for this dissertation. The details on plot establishment, sampling, tree measurements, data management and summary statistics for important variables are presented. Study Area and Establishment of Plots The Department of Forestry at Oklahoma State University cooperated with the USDA Forest Service Southern Research Station, and the Ouachita and Ozark National Forests to establish the study plots for shortleaf pine. The objective was to study growth and yield of shortleaf pine in evenaged natural stands. The study was called Study 48. The plots were established from fall 1985 to fall 1987 in eastern Oklahoma and western Arkansas. The study area is presented in Figure 3.1. To quantify the species distribution within a stand in terms of proportionate coverage of shortleaf pine and other species, basal area was assessed using a 10factor prism. Only five shortleaf trees that were classified either dominant or codominant as per the definition of Avery and Burkhart (2002) were selected for height and age 51 determination. This information was used to adopt the design criteria of agedensitysite index combinations as given in Table 3.1 for selecting plots (Lynch et al. 1999). Figure 3.1. Shortleaf pine growth study locations in eastern Oklahoma and western Arkansas (Lynch et al. 1991) The original plots were established based on the design criteria, and the following stand properties (Rose 1998): (1) naturally regenerated stand with at least 70% shortleaf pine basal area with trees having DBH 0.6 inches and larger, 52 (2) stand with dominant and codominant trees having maximum age range of 10 years or less, (3) stand having site index variation of less than 10 feet, (4) evenaged stand with no clumping and no more than two age classes per plot, and (5) reasonably free from insect, disease or fire damage and no harvesting in the last five years. Table 3.1. Study design for naturally regenerated evenaged shortleaf plots installed from 1985 to 1987 with thinning and herbicide treatment in eastern Oklahoma and western Arkansas (Lynch et al. 1999) Variable Class range Class midpoint Basal area (ft2/ac) 1645 4675 76105 106135 30 60 90 120 Site index (ft at base age 50) <56 5665 6675 >75 50 60 70 80 Stand age (yr) <31 3150 5170 7190 20 40 60 80 Circular fixedarea plots of size 0.2 acres (52.7foot radius) were established. The plots were surrounded by a 33foot isolation boundary, which was painted white. A buffer boundary of 33foot was marked that surrounded each contiguous group of plots. The buffer boundary was painted blue. The study plots were selected using aerial photographs. A typical net plot with two 5milacre subplots is given in Figure 3.2. The 5milacre subplots were used for understory measurements. Plot centers were posted 53 with an 18inch orange plastic surveyor’s stake or steel reinforcing rod. Each plot number was permanently marked on the center stake. During establishment, each plot was thinned to a predetermined basal area as per the study design. Hardwood control was achieved by using a chemical herbicide. The buffer boundary also received the same treatments for thinning and hardwood control as the study plot. A total of 192 plots were planned with three plots in each combination of design variables (basal area, site index and age), but only 191 were established. Of the established plots, eight were either damaged due to windstorms or not thinned as per the design, so only 183 plots and 7740 trees were available for analysis for the first growth period from Study 48 (Lynch et al. 1999). Figure 3.2. A representation of 0.2 acres measurement plot with two 5milacre plots (USDA Forest Service 1995) 54 Study 58 was established in 1963–1964 by Frank Freese to study thinning effects in naturally occurring shortleaf pine stands. Study 58 was modified in 1988 to conform to the same design criteria used for Study 48 (Lynch et al. 1999). Additional plots from Study 58 were also available for this dissertation. Only the relevant data for the two growth periods from 1985 to 1996 have been used, although more data exist prior to 1985 for Study 58. Although Freese’s study was established with 35 plots, only 25 plots remained after 1987. Study 58 has only 664 trees available for analysis. Therefore, measurements for 208 plots and 8404 trees from the two studies through three time points are available for the purpose of this dissertation. Measurements/Observations Initial tree measurements were taken at the time of plot establishment. The azimuth and distance from plot center were recorded for each tree in a plot. Each tree was also labeled with a tree number and diameter measurement mark. Measurements of diameter at breast height and assignment of tree crown class were made on each tree. Tree diameter was measured at breast height (4.5 feet) to nearest 0.1inch for each tree in the plot. Tree status and crown characteristics were also measured on all the trees. Crown class was recorded as (1) Dominant, (2) Codominant, (3) Intermediate, or (4) Suppressed following the definitions of Avery and Burkhart (2002). Trees with crowns extending above the general level of the crowns and being in a position to receive full light from above and partially from sides were classified as dominants. The trees classified as codominants were at the general crown level, which were in a position to receive full 55 light from above but little from the sides. Trees that were shorter than dominants and codominants receiving little direct light from above and none from the sides were classified as intermediates. Trees classified as suppressed were completely below the general level of the crown receiving no direct light. Total tree height, crown height and crown length were recorded to nearest foot for representative dominant and codominant trees. Ring count data of the representative dominant and codominant trees was recorded to determine tree age. The ring count data were collected by means of increment cores taken at 4.0 feet height on the uphill side of the tree for the sample trees. Five years were added to annual ring count to determine the total age in years. At least four trees were measured for total height and crown height from plots having the biggest trees, whereas a maximum of 39 height measurement trees were selected from plots having the smallest trees. Repeated Measurements Second measurements (first remeasurements) were made between fall 1990 and fall 1992. Third measurements (second remeasurements) were taken between fall 1995 and fall 1997. Remeasurements were made at an interval of either four or five years. For remeasurements, the plots were located with the help of aerial photos, which indicated the location of the plots. If the plot center stake was missing during remeasurements, it was located by taping from previously known tree numbers. If a tree number was not found during remeasurement, it was located with the help of azimuth and distance records of previous measurements. If previously recorded diameter appeared to be incorrect, it 56 was checked with the help of increment core taken just below breast height. For remeasurement, only new heightsample trees were used for increment cores since ring count data for previous heightsample trees were already available, although not all heightsample trees were bored for the ring count. A significant change was made in status codes for third measurements. The details on field instructions including the changes can be found in the following field guides. (1) Field Instructions FSSO410648 Growth and Yield of Thinned Natural Shortleaf Stands in the Ozark and Ouachita National Forests (August 1991) for first and second measurements (2) Field Instructions Shortleaf Growth and Yield Study FSSO410648 (July 1995) for third measurement Each tree was assigned a twodigit status code for third measurement. However, the status code for the first two measurements was only a single digit. Status codes for Study 58 were slightly different. The details of status codes for both the studies are provided in Appendix I. No damage code was recorded for first measurement in Study 48, because all healthy and sound trees were selected. On the other hand, damage code was available from all the three measurements for Study 58, because it was a thinning study previously established in 1963–64. A slightly different status code, i.e. doubledigit code was used for Study 58 (Appendix I) 57 Data Management This section presents the issues relating to quality check, corrections and intermediate calculations carried out for preliminary analysis as well as for preparing data for more formal analysis. There were some plots with the same identification number across studies 48 and 58, e.g. plots 9 and 10; so the duplicate plot numbers were resolved by adding 48000 and 58000 respectively to make the plot identification unique so that random ploteffects could be estimated properly. The records with DBH growth over the period outside the range of 0 and 2 inches were listed. Similarly the trees with total height growth outside the range of 0 and 9 feet were also listed. Original plot sheets for such cases were thoroughly checked. Necessary corrections and adjustments were made where appropriate. Data entry errors were corrected accordingly, e.g. wrongly entered status codes, DBH values etc. Moreover, the cases in which DBH data were correctly entered in computer from plot sheets, but growth values appearing unusually large were compared to the expected maximum growth of an open grown tree (Smith et al. 1992). Almost all of such cases were found reasonable based on the analysis. Thus, such cases were left unaltered as there was no basis to make adjustments otherwise. Missing and suspicious values for variables such as DBH, height and status code were checked in plot sheets, and corresponding corrections were made as appropriate. Diameter at breast height was estimated for those trees which were in the plot but were missed for measurements. The estimate was used to calculate stand basal area only, but 58 such a record would not be included in modeling. Interpolation was used if middle DBH value was missing. Similarly, constant rate of growth was assumed for two periods when either first or third DBH was missing unless the third missing observation was due to mortality. In general, it was likely that first missing measurement could be due to the field worker missing the tree, but the third missing measurement could be due to either mortality or observation error by the recorders. For example, the values of DBH for a tree in Study 58 for different years are as follows: Year 1987 1992 1997 DBH (in) 12.4 (D1) 13.0 (D2) ? (D3) When the tree was alive in 1997, then the DBH was estimated by assuming the same growth rate during the two periods. That is: Estimate of D3 = D2 + growth rate x D2 where growth rate (for period 1) = ⎟ ⎟⎠ ⎞ ⎜ ⎜⎝ ⎛ − 1 2 1 D D D So, estimate of D3 = 13.0 + ⎟⎠ ⎞ ⎜⎝ ⎛ − 12.4 13.0 12.4 x 13.0 = 13.63 (which rounds to 13.6 with 0.1 inch rounding) Site index was calculated based on all three measurements using the method of Graney and Burkhart (1973). Since their site index equation could not be solved analytically, a numerical procedure called the secant method as described in Gerald and Wheatley (1994) was used. Trees with possible height measurement errors were not 59 included in site index calculation. The ratio of quadratic mean diameter to individual tree DBH was calculated. Stand level variables such as basal area per acre and stand age were computed from individual tree measurements. Stand basal area was calculated by summing the individual tree basal areas in the plot, and by adjusting to a per acre value. Stand age was assumed to be the average age of the representative dominant and codominant trees in the plot. Study 48 data for the first two measurements (first period) were available as a SAS ver. 7 data file from the work of Lynch et al. (1999), which was converted to a SAS ver. 9 data file. Moreover, if DBH correction from increment core data during the third measurement was available, the master file was updated by running the programs again. A separate Excel file for the third measurement for Study 48 was available, and it was thoroughly checked for such errors as discussed before. Since the file was only for the third measurement, records for dead trees were not entered in the Excel file. But it was necessary to keep track of what happened to a specific tree in previous measurements. So, an appropriate death code, including appropriate code to distinguish if a tree was a height sample tree or not, was entered for such records. This would allow the users to keep track of individual trees during the fourth measurement. For Study 48, growth calculations for the first period were performed by Hitch (1994) and Lynch et al. (1999). These same computations were done for the second period. The previous SAS programs were revised for the second period, and new programs were also written to handle the added complexity due to additional growth period. For example, site index calculation is now based on all three measurements. Growth calculations were also completed for both the 60 periods for Study 58. Thus, a master file was created with both the studies and two growth periods. Conversion of azimuth and distance data to Cartesian coordinate system was achieved in three steps: (1) Transformation (translation) of azimuth angle by subtracting 900 from azimuth in quadrants II, III and IV, but adding 2700 to azimuth in quadrant I. The resulting angle is similar to that used for polar coordinates but measured in a clockwise direction (2) Subtracting the value of step (1) above from 3600 to obtain the correct angle for the polar coordinates system (3) Calculation of Cartesian (xy) coordinates from polar coordinates by using the following formulas: xcoordinate = r cosine(θ) (r= distance, θ=angle for polar coordinate) ycoordinate = r sine(θ). This was modified to our situation to calculate in SAS using appropriate variable names. SAS uses radian measure but azimuth was in degrees. The Cartesian coordinates will be used for analysis of possible spatial dependence of trees within a plot. Plot establishment, measurements and the type of data used in growth and yield modeling from the first two measurements are given in Hitch (1994), Lynch et al. (1999) and Rose (1998). Data from fourth measurements made between fall 2000 and fall 2001 will not be utilized in this project because of significant icestorm damage. 61 The data for this project is “approximated real growth series” type in the terminology of Moser and Hall (1969). The results from analysis of these data should apply to naturallyregenerated shortleaf pine stands of eastern Oklahoma and western Arkansas. The summary statistics of stand or plot variables as well as tree variables are presented in Tables 3.2 and 3.3. Table 3.2. Summary statistics for 208 plots (Studies 48 and 58 combined) for plot/stand level variables Variable Mean Standard Deviation Minimum Maximum Basal area (ft2/ac) First measurement Mid (first) Second measurement Mid (second) Third measurement 92.9 102.6 112.3 121.3 130.7 29.1 30.2 32.2 34.1 36.8 27.3 22.5 14.9 16.1 17.3 129.0 142.5 156.5 178.4 200.4 Stand age (yr) First measurement Second measurement Third measurement 41.8 46.5 51.6 19.7 19.7 19.7 18.0 23.0 28.0 93.0 98.0 103.0 Site index (ft at age 50 yr) 57.4 9.5 39.9 87.4 Quadratic mean diameter, QMD (in) 8.3 3.5 3.3 19.9 Dominant height (ft) 54 19 27 106 Overall, a negative tree growth value is not impossible in calculations due to measurement/observation error or, more importantly, due to loss of some bark thickness, especially with old trees. Therefore, analyses including negative growth as well as setting negative growth to zero will be discussed. 62 Table 3.3. Summary statistics for 208 plots (Studies 48 and 58 combined) for individual tree variables Variable Mean Standard Deviation Minimum Maximum Diameter at breast height (in) First measurement Second measurement Third measurement 7.4 8.2 9.1 3.9 3.9 4.0 1.1 1.2 1.5 24.4 25.4 26.6 Tree basal area (ft2) First measurement Second measurement Third measurement 0.3825 0.4469 0.5397 0.3835 0.4128 0.4526 0.0066 0.0079 0.0123 3.2472 3.5188 3.8591 Annual average basal area growth (ft2/tree) Overall First period Second period 0.01378 0.01303 0.01459 0.01127 0.01044 0.01205 0.01669 0.01669 0.01396 0.09964 0.07176 0.09965 Ratio of QMD to DBH (R) 1.1397 0.4423 0.3559 7.3621 Total height (ft) First measurement Second measurement Third measurement 57 61 65 22 21 20 10 10 13 112 113 119 Figures 3.3 and 3.4 provide some idea about how DBH distribution changed slightly over the periods. The mid value of DBH for two measurements was computed for use as an independent variable in modeling tree basal area growth. 63 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 2 4 6 8 10 12 Per c ent DBH_MID Figure 3.3. Histogram of midDBH for the first growth period 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 0 2 4 6 8 10 Per c ent DBH_MID Figure 3.4. Histogram of midDBH for the second growth period 64 CHAPTER IV METHODS FOR DATA ANALYSIS AND MODELING This chapter presents some background on statistical methods used in data analysis, and their application to the shortleaf pine growth study data set. Basic statistics (summary statistics and graphical techniques) were used for exploratory analysis. Two major tree attributes were considered for modeling: average annual basal area growth, and total height. For both of these attributes, models were first fitted with a calibration data set, and validation was carried out with a different data set, although the validation data set in this study would not be completely independent of the calibration data set. Ideally, a model would have good properties both statistically and biologically. On statistical grounds, some models might fit the data well, but they might not necessarily have good biological properties. For example, a high degree polynomial model may fit sample data very well, but such a model often does not have much biological significance. Thus an empirical modeling approach can be useful, but interpreting individual coefficients in the resulting model may be difficult. However, overall data fit and good predictability of response for a given value of explanatory variable(s) are important, and an empirical model may be able to provide these characteristics. 65 Since biological growth is considered intrinsically nonlinear (Zeide 1993), a nonlinear statistical model is naturally a good choice for forest modelers. Thus, attempts were made to fit different nonlinear models for shortleaf pine growth data. This approach is consistent with typical forest growth and yield modeling practices (e.g. Wykoff 1990, Vanclay 1994). A variety of nonlinear regression models for shortleaf pine growth variables have been reported by Murphy (1982, 1986), Lynch et al. (1991, 1999), Murphy et al. (1992), and Lynch and Murphy (1995). None of the shortleaf pine growth researchers took the random ploteffects and possible dependence of withinplot errors into account. These factors could influence the resulting parameter estimates and their precision, although tools of mixed modeling that could handle such an analysis had already appeared in forestry literature since late 1980’s (see Review of Literature). Therefore, this project carried out analyses that would make improvements on the two major components of shortleaf pine growth models developed by Lynch et al. (1999): basal area growth and total height models. Randomeffects for plots (or stands assuming a plot typically represents a stand) were incorporated in a modeling framework to obtain better parameter estimates. Possible correlated and heterogeneous withinplot errors were also evaluated. Furthermore, this modeling exercise utilized additional data resulting from the third measurement on these permanently established plots. MixedEffects Models As briefly discussed in Chapter II, when all the levels of a factor are present in the data for analysis, then the factor has a fixedeffect. That is, the whole population of the 66 factor levels is represented. The analyst is basically interested in comparing the effects of the factors on the response variable for those levels included in the study only. A model containing only fixed effects is called a fixedeffects model. On the other hand, a factor may have a large number of levels in some cases, and the direct testing or representation of all the levels are not often economical, practical, or necessary. So only a random sample of the factor levels is tested or studied. Thus inference from such a study can be applied to the population of levels from which the sample was drawn. Such effects are called randomeffects. For example, use of random plot effects in a model would allow valid generalization of the results to a population of plots from which the study plots were sampled. A model in which all the effects are random is called a randomeffects model. Variances associated with randomeffects are known as variance components (Graybill 1976, Snedecor and Cochran 1980). Data analysts are interested in the variance components rather than in the individual random coefficients themselves. Models in which some effects are fixed and the others are random are called mixed models. Each randomeffect is associated with a particular fixedeffect, but all the fixedeffects in the model may or may not have associated randomeffects. In the mixedeffects approach also, the objectives of fixed effects testing are as in the fixedeffects modeling. On the other hand, one would test if the variance components associated with the random effects equal zero rather than testing or interpreting the individual random coefficients themselves, although best linear unbiased predictors (BLUP) of the randomeffects are also estimated or in the commonly used terminology, predicted. In fact, randomeffects help in accounting for sources of variation, which enables accurate estimation and testing of fixedeffects (Tao 2002). Mixedmodels are also defined to be 67 models with errors at more than one level. So these models are also called multilevel models or hierarchical models. In mixed models, effects can be either nested or crossed. Depending on how a response is modeled as a function of explanatory variables, typically mixedeffects models are classified as: (a) Linear mixed models (b) Nonlinear mixed models Linear Mixed Model A basic linear mixedeffects model, as described by authors such as Laird and Ware (1982), Gregoire et al. (1995), and Pinheiro and Bates (2000) is given below. In matrix notation the model can be written as: yi = Xi β + Zibi + εi, i=1,2, ……M, i.e. M = number of groups (24) where, yi is the nidimensional response vector for the ith group (the number of observations could be different in different groups) β is the pdimensional parameter vector for fixedeffects bi is the qdimensional group specific vector of parameters for randomeffects (scalar if only one random effect) Xi is the ni × p design matrix for fixedeffects variables Zi is the ni × q design matrix for randomeffect variables (randomeffect of ith group) that are often chosen as a subset of fixedeffects variables εi is the nidimensional withingroup (or groupspecific) error vector 68 It is assumed that bi ~ N(0, Ψ ), and εi ~ N(0, σ2I), where Ψ is a variancecovariance matrix for random effects and I is an identity matrix with random withingroup error variance σ2. Moreover, it is also assumed that cov(bi, εi) = 0, i.e. randomeffects and withingroup errors are assumed to be independent for different groups, and to be independent of each other in the same group. The columns of Zi are often a subset of the columns of Xi. Nonlinear Mixed Model A basic nonlinear mixedeffects model can be written as follows (Pinheiro and Bates 2000). yij = f (φ ij,ν ij )+ε ij , i =1,…,M, j =1,…,ni, (25) where M = the number of groups (e.g. number of plots if data are grouped by plots) ni = the number of observations on the ith group (e.g. number of trees in a plot at one time point) f = a general, realvalued, differentiable function, which is nonlinear in at least one component of the groupspecific parameter vector φ ij = a groupspecific parameter vector ν ij = a covariate vector ε ij = a normally distributed withingroup error term independent of bi (random effects), i.e. ε ij ~ N(0, σ2) 69 Moreover, φ ij is modeled as φ ij = Aijβ + Bijbi where bi ~ N(0,Ψ ) β = pdimensional vector of fixedeffects parameters bi = qdimensional vector of randomeffects parameters associated with the ith group, and bi ~ N(0, Ψ ). Note thatΨ is a scalar, if there is only one randomeffect. Aij = design matrix for fixedeffects parameters Bij = design matrix for randomeffects parameters It is assumed that randomeffects and withingroup errors are independent. All of the above for the nonlinear mixed model can be summarized in matrixnotation as: ( ) yi = f i φ i,ν i + ε i , i =1,…,M (26) where i φ is modeled as φ i = Aiβ + Bi bi , and yi = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i y y M 1 , ( )f i φ i,ν i = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ( , ) ( , ) ini ini i i f f φ ν φ ν M 1 1 , i φ = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i φ φ M 1 , ν i = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i ν ν M 1 ε i = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i ε ε M 1 , Ai = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i A A M 1 , Bi = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ ini i B B M 1 For this work, only nonlinear mixedeffects models are used, and are therefore the focus of all discussions. 70 Extended Mixed Models The mixed models considered in this section have two aspects, viz. extension to more than one level of randomeffects, and incorporation of correlated and/or heterogeneous errors. The basic models can be extended to more than one level. A typical twolevel nonlinear mixed model can be written as given below. yijk = f (φ ijk,ν ijk ) + ε ijk , i =1,…,M, j =1,…,Mi, k =1, …,nij (27) where ijk = Aijk + Bi jk bi + Bijk bij , φ β bi ~ ( ) N 0,ψ 1 , bij ~ ( ) N 0,ψ 2 Mixed models with any level of randomeffects can be further extended to incorporate correlated and/or heterogeneous errors, although the complexity increases quickly since a regular mixedeffects model in itself is computationally very intensive. It is not always necessary to assume that the errors are normally and independently distributed. Thus in the singlelevel models above, we can assume that εi ~ N(0, σ2Λi), where Λi are positivedefinite matrices. The withingroup errors are again assumed to be independent for different groups indexed by i, and independent of random effects bi . The structure of Λi can accommodate correlated and/or heterogeneous withingroup error structures. That is, the variancecovariance structure of the withingroup errors can be partitioned into two components: a correlation structure and a variance structure. The error distribution would extend to εik ~ N(0, σ2Λik) for twolevel model (Pinheiro and 71 Bates 2000). Details on correlated and/or heterogeneous errors are presented in a separate section called Error Modeling. Estimation and Computing Two major parameter estimation methods are maximum likelihood (ML) and restricted/residual maximum likelihood (REML). The theory of estimation for these methods can be found in Pinheiro and Bates (2000). The basic mixedeffects models described above can be fit by popular software packages such as SAS and SPlus. A linear mixed model or a linearized form of nonlinear model can be fit using either SAS PROC MIXED or SPlus lme command. A nonlinear mixed model can be fit by PROC NLMIXED or SPlus nlme command. Both lme and nlme commands in SPlus and PROC MIXED have options for both ML and REML methods, whereas PROC NLMIXED has only ML option available since the REML method for nonlinear mixed model is a very complicated estimation procedure. Therefore, SPlus nlme and PROC NLMIXED have ML as default estimation method, whereas SPlus lme and PROC MIXED have REML method as default method (Pinheiro and Bates 2000, Tao 2002). The extended linear mixed models can be fit by both PROC MIXED and SPlus lme. However SPlus nlme could accommodate more complicated error structures for correlated and heterogeneous errors, but PROC NLMIXED can fit only the basic nonlinear mixedeffects model assuming independently and normally distributed errors. SPlus gls and gnls commands can be used to fit equivalent basic as well as extended 72 linear and nonlinear models without randomeffects using generalized leastsquares methods. Overall, SPlus has more facilities for handling the extended models, especially for nonlinear mixed models (Pinheiro and Bates 2000, Tao 2002). Since intrinsically nonlinear mixed models are considered in this dissertation, typical syntax and command structure for SPlus and SAS procedures are given below. SPlus nlme Syntax (Pinheiro and Bates 2000, Venables and Ripley 2002): nlme(model, data, fixed, random, groups, start, method) where model is a two sided nonlinear formula separating response on the left and an expression involving parameters and explanatory variables on the right; data specifies the name of data file; fixed and random specify fixed and random components of the model; the groups option declares the grouping structure of the data if the data file is not already grouped; the start option is used to provide starting values for the estimates unless a selfstarting function is used; and method can be used to choose between ML and REML methods for parameter estimation. The output from this command can be stored in an appropriate file that could be specified as: outfile<nlme(.), where outfile is a name given to an output file. I 



A 

B 

C 

D 

E 

F 

I 

J 

K 

L 

O 

P 

R 

S 

T 

U 

V 

W 


