



NORTHERN BOBWHITE ABUNDANCE IN RELATION TO CLIMATE, WEATHER, AND LAND USE IN ARID AND SEMIARID AREAS: A NEURAL NETWORK APPROACH

By
Jeffrey J. Lusk

Bachelor of Science
University of Illinois
Chicago, Illinois
1993

Master of Science
Southern Illinois University
Carbondale, Illinois
1998

Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY
July, 2004

Thesis Approved:
Fred S. Guthery, Thesis Advisor
Stanley F. Fox
Ronald E. Masters
Samuel D. Fuhlendorf
Al Carlozzi, Dean of the Graduate College

PREFACE

Some of the chapters in this dissertation have been published in peer-reviewed journals. Although I shared authorship of these chapters in their published form with colleagues and collaborators, I am responsible for the content (analysis, modeling, and writing). Because each chapter was meant to be a standalone manuscript, some duplication of information is necessary. Therefore, I have elected to leave each chapter in its published form. Footnotes at the beginning of each chapter indicate the manuscript's status and, if applicable, the full citation for published chapters. Authors wishing to cite information in the published chapters should cite the published versions, since these journals own the copyrights. I attempted to limit the amount of repetition in chapters that have not been previously published. As a result, the introductions and discussions in these chapters, particularly Chapter 6, are shorter than their counterparts in published chapters.
I would like to thank my advisor, Dr. Fred S. Guthery, for his guidance and encouragement during my studies at Oklahoma State University (OSU). He has been both a mentor and a colleague, and it has been an honor to have worked with him. He encouraged me to challenge existing knowledge and pervading paradigms, and provided a role model to emulate. I thank Dr. Samuel D. Fuhlendorf for serving on my committee, for his constructive and detailed comments on numerous manuscripts, and for his perspectives on landscape ecology and rangelands. I would also like to thank my other committee members, Drs. Ronald Masters and Stanley Fox, for their assistance and advice.

Several people have made my time in Stillwater more enjoyable. Most notably, I would like to thank Kim Suedkamp Wells, Heather Hansen (née Wilson), Charles Coley, Jill Brison, and Jon Forsman for their friendship, support, and encouragement. C. Coley also provided moral support and editing assistance during the writing of this dissertation. Finally, I would like to thank my parents for their constant love and support, even though they still are not certain what it is that I do.

Financial support for this project was provided by the Bollenbach Endowment and the Game Bird Research Fund through Dr. Fred S. Guthery. I was also supported by a Presidential Fellowship for Water, Energy and the Environment from the OSU Environmental Institute and a Doris and Eugene Miller Distinguished Graduate Fellowship from the OSU Foundation. Further support was provided by the Department of Forestry, Department of Zoology, Oklahoma Department of Wildlife Conservation, Texas Parks and Wildlife Department and the Oklahoma Agricultural Experiment Station.

TABLE OF CONTENTS

PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

Chapter
1 GENERAL INTRODUCTION AND LITERATURE REVIEW
2 NEURAL NETWORK MODELING: AN APPROACH TO DISCRIMINATION AND PREDICTION
    Abstract
    Introduction
    Model Description
        Neural Network Architecture
        The Training Process
        Data Considerations
        Usage Considerations
        Neural Model Interpretation
        Accuracy Assessment
    Examples
        Gambel's Quail and Winter Precipitation
        Nest-site Characteristics of Northern Bobwhites
    Caveats
    Management Considerations
3 A NEURAL NETWORK MODEL FOR PREDICTING NORTHERN BOBWHITE ABUNDANCE IN THE ROLLING RED PLAINS OF OKLAHOMA
    Introduction
    Methods
        Artificial Neural Networks
        Database Construction
        ANN Construction, Training, and Validation
        Regression Analysis
        Model Comparison
        Simulation Analyses
    Results
    Discussion
    Conclusions
4 NORTHERN BOBWHITE (COLINUS VIRGINIANUS) ABUNDANCE IN RELATION TO YEARLY WEATHER AND LONG-TERM CLIMATE PATTERNS
    Abstract
    Introduction
    Methods
        Northern Bobwhites
        Abundance Indices
        Climate and Weather Variables
        Land-use Variables
        Neural Networks
    Results
        Neural Models
        Simulation Analyses
    Discussion
    Conclusions
5 RELATIVE ABUNDANCE OF BOBWHITES IN RELATION TO WEATHER AND LAND USE
    Abstract
    Introduction
    Methods
        Neural Network Architecture
        Database Construction
        Model Interpretation
    Results
    Discussion
    Management Implications
6 EFFECTS OF CLIMATE DEVIATIONS ON NORTHERN BOBWHITE ABUNDANCE IN TEXAS
    Introduction
    Methods
    Results
    Discussion
7 THE EFFECTS OF GLOBAL CLIMATE CHANGE ON NORTHERN BOBWHITE ABUNDANCE
    Abstract
    Introduction
    Methods
    Results and Discussion
8 CONCLUSIONS
9 LITERATURE CITED

LIST OF TABLES

Table
2.1 Definitions of terms used in neural modeling, listed alphabetically.
3.1 Parsimony analysis of the artificial neural network model and the regression model using the adjusted sum-of-squares (Hilborn and Mangel 1997).
3.2 Contribution of each independent variable to the artificial neural network and regression models' predictions of bobwhite abundance in the Rolling Red Plains of Oklahoma.
4.1 Independent variable contributions to neural network predictions of normalized bobwhite counts (1991–1997) in Oklahoma based on weather and climate data. Percent contribution reflects the importance of a particular variable in determining a neural network's predictions relative to other variables.
5.1 State- and ecosystem-level means for independent variables used to develop a predictive model for northern bobwhite abundance in Texas, 1978–1997.
5.2 Relevance (importance) of input variables in a 4-neuron neural model developed to predict the abundance of northern bobwhites in Texas based on data collected during 1978–1997. Relevance is calculated as the sum of the squared weight of the variable of interest divided by the sum of squared weights for all inputs. The higher the relevance score, the more the variable contributes to the model's predictions and, therefore, gives the relative importance of each variable.

LIST OF FIGURES

Figure
1.1 Hypothetical relationship between abundance and temperature showing how the range over which a variable is measured in the field can determine the response type. Even if sampling crosses the depicted zones, the overall correlation might still be negative, positive, or nonexistent.
2.1 A diagrammatic representation of a generic multilayer perceptron, neural network model. This MLP is a 3-2-1 network (3 input nodes, 2 neurons, and 1 output node) consisting of 3 layers: an input layer (A), a neuron layer (B), and an output layer (C). Nodes in 1 layer are connected to nodes in the preceding layer via synaptic weights (D). Each neuron also has an associated bias weight (E).
2.2 Hypothetical error surfaces resulting from particular combinations of synaptic weights. In (a), the error surface is relatively flat, and a MLP with initial synaptic weights randomly assigned any value in this range will eventually find the combination of synaptic weights that gives the global minimum prediction error. In (b), the error surface is hilly. A MLP may not be able to find the combination of connection weights resulting in a global minimum, but instead may become stuck in a local minimum.
2.3 Simulation results from the Swank and Gallizioli (1954) MLP model showing the predicted change in fall age ratio over the observed range of variation in total winter rainfall (cm). Data points represent observed fall age ratios. Inset: a diagrammatic representation of the 1-1-1 MLP used to model the data presented in Swank and Gallizioli (1954). The MLP contained 1 input node in the input layer (total winter rainfall), 1 neuron in the neuron layer, and 1 output node in the output layer (fall age ratio).
2.4 Simulation results from the trained neural network model for differentiating random and nest locations based on vegetation characteristics on the Mesa Vista Ranch in Roberts County, Texas, 2001–2002. Results are presented only for variables with >10% contribution to the model's output: A) canopy height (cm), B) percent shrub cover, and C) bare-ground exposure (%). Dashed horizontal lines represent an arbitrary 0.5 cutoff threshold between suitable and unsuitable.
3.1 Predicted bobwhite counts from the artificial neural network model plotted against the actual values in the (a) training data set and (b) the validation data set, for the Rolling Red Plains of western Oklahoma. The trend line represents the linear model regression of predicted bobwhite count on the actual bobwhite count.
3.2 Predicted bobwhite counts from the full model regression plotted against the actual values in (a) training data set and (b) the validation data set, for the Rolling Red Plains of western Oklahoma. The trend line represents the linear model regression of predicted bobwhite count on actual bobwhite count.
3.3 Neural network simulation analyses (solid line) and regression predictions (dashed line) of the response of bobwhite counts in the Rolling Red Plains of western Oklahoma to mean monthly temperature in (a) June, (b) July, and (c) August. Temperature is reported in degrees Celsius, and the same scale was used for each plot.
3.4 Neural network simulation results (solid line) and regression predictions (dashed line) of the response of bobwhite counts to seasonal precipitation in the Rolling Red Plains of western Oklahoma. Winter months (a) included December, January, and February; spring months (b) included March, April, and May; and summer months (c) included June, July, and August. Precipitation is reported in centimeters, but each plot has its own scale.
3.5 Neural network simulation results (solid line) and regression predictions (dashed line) of the response of bobwhite counts in the Rolling Red Plains of western Oklahoma to (a) the proportion of county area in agricultural production, (b) cattle density on nonagricultural lands, and (c) the previous year's bobwhite count. Cattle density is reported as total number of head per km² of nonagricultural land.
4.1 Results of simulation analyses of the independent variables' effects on normalized bobwhite counts in Oklahoma using the weather neural network. Variables of interest are the observed weather conditions and landscape variables for a particular year: June (a), July (b), and August (c) temperature; winter (d), spring (e), and summer (f) precipitation; and the proportion of county area in cultivation (g), density of cattle on noncultivated land (h), and the previous year's normalized bobwhite count (i).
4.2 Results of simulation analyses of the independent variables' effects on normalized bobwhite counts in Oklahoma using the climate neural network. The variables in this network were the deviations of annual weather conditions from long-term mean conditions and landscape variables: deviation from long-term mean June (a), July (b) and August (c) temperature; deviation from long-term mean winter (d), spring (e), and summer (f) precipitation; and the proportion of county area in cultivation (g), density of cattle on noncultivated land (h), and the previous year's normalized bobwhite count (i).
5.1 Predicted versus observed northern bobwhite counts recorded by Texas Parks and Wildlife Department biologists during annual August surveys (1978–1997) for training data (A) and validation data (B) using a 4-neuron neural network. The trend line indicates the linear relationship between predicted and observed counts.
5.2 Predicted northern bobwhite counts from simulation analyses of the effects of June (A), July (B), and August (C) mean maximum temperature (°C) generated from the trained neural model using a data set in which the independent variable of interest varies between its minimum and maximum, and all other independent variables are held constant at their statewide mean (Table 5.1). Dashed vertical lines indicate the mean value of the independent variable. The same scale was used for each plot's Y-axis to provide information on sensitivity.
5.3 Predicted northern bobwhite counts from simulation analyses of the effects of winter (A), spring (B), summer (C), and fall (D) rainfall (mm) generated from the trained neural model using a data set in which the independent variable of interest varies between its minimum and maximum, and all other variables are held constant at their statewide mean (Table 1). Dashed vertical lines indicate the mean value of the independent variable. The same scale was used for each plot's Y-axis to provide information on sensitivity.
5.4 Predicted northern bobwhite counts from simulation analyses of the effect of the proportion of county area in cultivation (A), head of livestock per hectare of noncultivated land (B), and previous year's bobwhite count (C). Predictions were generated from the trained neural model using a data set in which the independent variable of interest varies between its minimum and maximum, and all other independent variables are held constant at their statewide mean (Table 5.1). Dashed vertical lines indicate the mean value of the independent variable of interest. The same scale was used for each plot's Y-axis to provide information on sensitivity.
6.1 Predicted versus observed bobwhite abundance for counts recorded by Texas Parks and Wildlife Department during annual August surveys (1978–1997) for both training and testing/verification datasets using a 5-neuron neural network.
6.2 Predicted bobwhite abundance as a function of (a) the previous year's bobwhite count, (b) deviations from long-term mean June temperature, and (c) livestock density on noncultivated lands generated by the 5-neuron neural network. The variable of interest was varied incrementally from the maximum observed value to the minimum observed value while the remaining variables were held constant at their means. The scale of the y-axis in each graph is identical to provide information on the sensitivity of the model.
7.1 Predicted changes in northern bobwhite abundance in Texas based on climate change scenarios developed from the Goddard Institute of Space Science general circulation model (GISS GCM). Predictions were based on a 0.5×0.5° latitude/longitude grid and interpolated over the entire state using universal kriging.
7.2 Predicted changes in standard normal deviate of bobwhite counts in Oklahoma based on climate change scenarios developed from the Goddard Institute of Space Science general circulation model. The predictions were based on a 0.5×0.5° latitude/longitude grid and interpolated across the state using universal kriging.
7.3 Change in June temperature for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.4 Change in July temperature for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.5 Change in August temperature for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.6 Change in winter rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.7 Change in spring rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.8 Change in summer rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.9 Change in fall rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.10 Change in June temperature as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.11 Change in July temperature as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.12 Change in August temperature as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.13 Change in winter rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.14 Change in spring rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.15 Change in summer rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.16 Change in fall rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.

CHAPTER 1

GENERAL INTRODUCTION AND LITERATURE REVIEW¹

¹ This chapter was written to place the remaining chapters into a common context. It is not intended for publication.

The northern bobwhite (Colinus virginianus; hereafter, bobwhite) is an important game species over much of its range. Although declines have been noted since at least the 1880s (Errington and Hamerstrom 1936), bobwhite abundance typically follows a boom-or-bust pattern with considerable variation in numbers between and among years (Stoddard 1931, Stanford 1972, Roseberry and Klimstra 1984:130). Possible factors influencing long-term trends in bobwhite abundance include climate change, habitat loss, and land-use changes (Edwards 1972, Klimstra 1982, Brady et al. 1993, Schemnitz 1993, Rotenberry 1998). Further, harvest may be an additive, rather than compensatory, source of mortality in years of low production (Pollock et al. 1989, Johnson and Braun 1999, Guthery et al. 2000). Before harvest and habitat management can be effective at maintaining stable, huntable populations, an understanding of the factors influencing bobwhite abundance that are not amenable to management, such as weather and climate, is required. It is further required that the interactions between climate, weather, and land use be elucidated, because it is against the backdrop of these effects that habitat and harvest management must operate.

Another issue of some importance is the effects of global change on wildlife, especially in the arid and semiarid regions of the United States (Guthery et al. 2000). As such, global change is an issue of concern to both conservation and wildlife management. With the knowledge garnered from investigations of the responses of bobwhite abundance to current climate, weather, and land-use patterns, managers may be better able to plan for the effects of future climate, as predicted by various global-change models.
Such planning will be a necessary part of any long-term management program (Irwin 1998), and could involve reserve-site choice or habitat manipulations designed to ameliorate the effects of climate.

In the United States, bobwhites range over much of the eastern and central parts of the country (Kaufman 1996). According to data from the North American Breeding Bird Survey (NABBS), bobwhite populations in the US show a long-term rate of decline of 2.40% per year (Church et al. 1993, Sauer et al. 1997). This rate of decline increased between 1982 and 1991 to 3.50% per year (Church et al. 1993). In Oklahoma, the long-term rate of decline has not been as severe, averaging only 0.20% per year (Sauer et al. 1997). However, short-term trends indicate a significant decline. The 10-year population trend for the period 1986–1996 indicates a 3.88% per year decline, and the 3-year trend (1993–1996) indicates populations are declining at a rate of 7.26% per year (Sauer et al. 1997). In Texas, the long-term rate of decline is 2.00% per year, with short-term declines of 6.43% per year (10-year trend) and 20.09% per year (3-year trend) (Sauer et al. 1997).

Although the above-cited declines may be cause for concern among wildlife managers, these changes in average abundance through time provide a reference frame from which to determine population status. As mentioned previously, bobwhite populations tend toward boom-or-bust dynamics across their range (Stoddard 1931, Stanford 1972, Roseberry and Klimstra 1984:130). In the US, the mean number of bobwhites counted per NABBS route over the years 1966–1996 was 20.95. In Oklahoma and Texas, the mean was 47.12 and 33.21, respectively (Sauer et al. 1997). Considering shorter intervals, the 10-year mean in Oklahoma is 44.59 bobwhites per NABBS route, and in Texas 26.37 bobwhites per NABBS route. The 3-year means for 1993–1995 are 37.83 and 21.55 bobwhites per NABBS route in Oklahoma and Texas, respectively (Sauer et al. 1997).
Therefore, trends in bobwhite populations may not be as severe as suggested by the percent declines.

The importance of various weather factors in determining avian abundance varies both with the species being considered and with latitude. Temperature is a controlling factor in northern latitudes, especially over the winter period. In southern latitudes, rainfall and moisture tend to be more important than temperature (Newton 1998:288), but summer temperature can also have important effects on the reproductive biology of a species (Leopold 1933, Robinson and Baker 1955, Speake and Haugen 1960, Guthery et al. 2001), thereby influencing abundance measured in the autumn. Among gallinaceous birds, young are often susceptible to both rainfall and temperature (Sumner 1935, Newton 1998:288).

Weather effects may manifest both through direct and indirect means. Direct effects such as hyper- and hypothermia are obvious, but weather's indirect effects may be more difficult to detect. Weather may act indirectly on abundance through both food availability and habitat suitability (Swank and Gallizioli 1954, Sowls 1960, Newton 1998), and may be moderated or accentuated by both the length and intensity of the weather event (Leopold 1931, Elkins 1995). For example, insect prey is essential for successful brood-rearing among quail (Hurst 1972), and the availability of such prey is determined, in part, by rain and temperature (Elkins 1995). Periods of drought and high temperature will reduce the amount of insect prey available and, therefore, reduce production (Newton 1998:289). Further, these impacts on production might increase in magnitude with the length of the drought. In addition, the effects of weather on a species are not constant, but vary with the average physical condition of the local population.
If a drought is of sufficient duration, the population may be food stressed and less able to withstand the vagaries of weather than a population that has not experienced a food shortage but is exposed to the same weather conditions (Newton 1998:289).

Rainfall and temperature both influence quail dynamics (Edwards 1972, Stanford 1972, Campbell et al. 1973, Roseberry and Klimstra 1984, Giuliano and Lutz 1993), but the effects vary with region. Investigations of weather effects also differ in how they define weather variables, such as summer rain, and in the estimates of population parameters used. Consequently, reported results are not directly comparable and often lead to confusion about the exact effects of weather on quail production and population status.

In arid regions, rainfall is the most influential weather component for avian survival and production (Newton 1998), is an important determinant of abundance, and can affect various demographic components of bobwhites. In drier environments in south Texas, the bobwhite's breeding season ends 2 months earlier than in more mesic environments (Guthery et al. 1988). Summer rainfall (April–August) was highly, positively correlated with hunter success for scaled quail (Callipepla squamata) in eastern New Mexico (Campbell 1968). Rainfall may be more critical during certain periods of the life cycles of quail species than during other periods. Heffelfinger et al. (1999) found that midwinter (December–January) rainfall affected calling behavior of Gambel's quail (Callipepla gambelii) more than rainfall during early (October–November) or late (February–March) winter. In arid and semiarid regions of Oklahoma and Texas, spring and summer rainfall might be particularly important (Stanford 1972). However, Campbell et al. (1973) did not find a significant correlation between May–June or April–July rainfall and scaled quail production in New Mexico.
A lack of linear correlation between environmental and response variables may not necessarily indicate a lack of relationship between the variables (Laasko et al. 2001). Summer rainfall (July–August) had the greatest influence on scaled quail production (Campbell et al. 1973), with most of the response due to August rainfall alone (Campbell 1968). Percent juveniles in the fall bobwhite harvest was positively related to the average total rainfall between May and August in Alabama (Speake and Haugen 1960). Bobwhite production in Louisiana responded positively to increasing summer precipitation, with highest production occurring when precipitation exceeded 762 mm (Reid and Goodrum 1960). June rainfall in Texas was only weakly related to bobwhite abundance (Giuliano and Lutz 1993). Recent work by Bridges et al. (2001) in Texas showed that, although 12-month rainfall totals were positively correlated with bobwhite abundance in the South Texas Plains, the 12-month Modified Palmer Drought Severity Index (PMDI; an index of rainfall that accounts for soil type and moisture, temperature, and evaporation) was more strongly correlated with bobwhite abundance. They also reported that monthly PMDIs were positively correlated with bobwhite abundance in the Cross Timbers and Prairies (November–February, rs ≥ 0.57), Edwards Plateau (September–November, rs ≥ 0.59), Rolling Plains (September–February, April, June; rs ≥ 0.56), and South Texas Plains (October–July, rs ≥ 0.56), whereas raw rainfall amount was positively correlated with bobwhite abundance only in the South Texas Plains.

Although snowfall sufficient to kill bobwhites occurs in parts of their range, snowfall is probably not a major concern in arid and semiarid regions. In these regions, however, winter rainfall can still influence quail production. The effects of winter rain, again, vary by species and region.
Percent juveniles in fall populations of scaled quail showed a nonsignificant, negative relationship with winter (October–March) rainfall both in pre- and postharvest samples (Campbell et al. 1973). However, in an earlier study of scaled quail in the same area, winter rainfall (October–March) showed a nonsignificant, positive correlation with hunter success, which is assumed to be an index of abundance (Campbell 1968). Giuliano and Lutz (1993) found that scaled quail abundance in Texas was positively correlated with winter rainfall. Bobwhite harvest in Illinois was positively related to winter rainfall (Edwards 1972), whereas, in Texas, abundance showed a nonsignificant, negative correlation with winter rainfall (Giuliano and Lutz 1993). California quail (Callipepla californica) age ratios were positively correlated with winter (January–March) rainfall in California (Francis 1970).

Temperature may be a less important factor in quail production than rainfall (Edwards 1972), or may only be important below some critical threshold of precipitation (Robinson and Baker 1955, Heffelfinger et al. 1999). However, this might not hold for arid and semiarid regions where operative temperatures may exceed the thermotolerance limits of many species (Forrester et al. 1998, Heffelfinger et al. 1999, Guthery et al. 2001). In such areas, high temperatures reduce the amount of space-time available for use by a species (Guthery 1997, Forrester et al. 1998, Heffelfinger et al. 1999). Klimstra and Roseberry (1975) reported that July–August (summer) temperatures affected the end of the bobwhite nesting season. Therefore, the effects of temperature will be of critical importance to bobwhite production in the more southern areas of its range, if temperatures increase due to global change. Forrester et al.
(1998) found that bobwhites avoided patches in which the operative temperature (a metric that takes account of the ambient air temperature plus the heating effects of sunlight and the cooling effects of airflow) exceeded 39 °C and, as a result, 50% of the available habitat space-time was unusable to bobwhites during all seasons. The age ratio of bobwhite populations in Louisiana in winter responded positively to mean maximum monthly temperature in all months, but responded negatively to the highest maximum monthly temperature (Reid and Goodrum 1960). Therefore, high seasonal temperatures can affect production. For example, the length of the laying season in Illinois was reduced by 12 days for every 1 °C increase in the July–August temperature (Klimstra and Roseberry 1975). In Alabama, the percent juveniles in the fall harvest was negatively correlated with the total deviation from mean monthly temperatures from May through August (Speake and Haugen 1960). Reid and Goodrum (1960) reported that bobwhite production was suppressed in hot years compared with cooler years. Hot, dry conditions reduced the percentage of female bobwhites in laying condition in south Texas (Guthery et al. 1988). Male bobwhites reduced calling behavior by 86.4% in a hot year compared with a cooler year (Guthery et al. 2001). It seems likely that bobwhites adjust their reproductive activities based on ambient weather conditions in a particular year, thereby favoring long-term survival and maximizing lifetime reproductive output. However, other studies in higher-latitude areas did not find a strong effect of temperature on production and recruitment. For example, Edwards (1972) did not find consistent effects of mean monthly temperature on bobwhite harvest in Illinois. Further, Roseberry and Klimstra (1984) found no relationship between bobwhite recruitment and mean average daily temperature or mean maximum daily temperature.
Although temperature reduced the length of the bobwhite breeding season, it did not decrease the proportion of young produced in a given year that entered the breeding population. That is, juvenile survival was not reduced.

The effects of temperature and rainfall can interact in influencing bobwhite abundance. Rainfall masked the effects of temperature on bobwhite production in Kansas (Robinson and Baker 1955). When precipitation was below some threshold amount, temperatures above 23.3 °C reduced bobwhite production, but there was little effect when rainfall exceeded this threshold (Robinson and Baker 1955). Combinations of low rainfall (drought) and high temperatures reduced bobwhite recruitment (Stanford 1972, Hurst et al. 1996). Guthery et al. (2002) report that temperature and rainfall influence age ratios of bobwhites in south Texas in complex, nonlinear ways, and suggest that low temperatures can mitigate the negative effects of drought and that high temperatures can eliminate the positive effects of rainfall.

Habitat provides all life requisites for an individual organism (Hall et al. 1997), and is, therefore, an important factor in understanding a species' abundance and distribution. Human use of the landscape can have considerable effects on its suitability as habitat for wildlife. Whereas the amount of land area converted for human use influences population dynamics, the spatial pattern of this fragmentation is also of concern (Hanski 1999). Further, different land uses will affect wildlife populations to different extents. That is, not all land-use practices are incompatible with wildlife. Human land-use practices fall into 2 broad categories: 1) urban development resulting in land being converted to residential, commercial, or industrial use, and 2) agricultural development resulting in land being converted to the production of food for humans or domesticated animals.
Although cropland is a dominant agricultural land use in the northern and eastern portions of the bobwhite's range, in the west, grazing may be more pervasive. Around 70% of western land area is grazed (Fleischner 1994). In Texas, approximately 53,140,000 ha, or 76.8% of the land area, is in agriculture, with 65.5% of that area rangeland and 28.7% cropland (USDA NASS, Census of Agriculture 1997). In Oklahoma, approximately 13,443,000 ha, or 74.2% of the land area, is agricultural land, of which 46.5% is rangeland and 44.7% is cropland (USDA NASS, Census of Agriculture 1997). Therefore, grazing and cultivation are important land uses that affect the amount of usable habitat space-time (Guthery 1997) available for bobwhites. As the predominant land uses in these states, livestock grazing and cultivation undoubtedly influence the abundance, distribution, and population dynamics of a variety of wildlife species (Barnes et al. 1991).

The conversion of habitat from native vegetation to row crops often converts what was once a heterogeneous landscape into a monoculture. Early agricultural practices, typified by many small, family-owned farms, resulted in a pattern of land use referred to as patchwork agriculture, which was believed to enhance wildlife abundance through the creation of edge between cultivated fields and windbreaks and fencerows (Leopold 1933). Modern agriculture, however, relies on clean-farming practices, which favor large fields with few fencerows or windbreaks. Cultivated crops may serve as a food source for some wildlife species. Roseberry and Klimstra (1984) report that unharvested grain served as the only food source for bobwhite coveys during a prolonged snow cover in southern Illinois. The benefit to bobwhites from these unharvested grains depends on the juxtaposition of standing crops to suitable bobwhite winter habitat. In southern Illinois, much of the agricultural landscape is still in a patchwork arrangement (J.
Lusk, personal observation) and, therefore, such juxtapositions occur frequently. However, the value of food plots and cultivated cropland for bobwhites in other areas where such juxtapositions are rare is probably nil, mostly because bobwhite populations cannot survive in such landscapes.

Livestock grazing does not usually result in the total transformation of the vegetation community, but, depending on the intensity and periodicity, can alter the structural complexity and species composition of the habitat and thereby affect its suitability (Fleischner 1994). Whether these habitat changes will increase or decrease suitability depends on the magnitude of the changes (Severson and Urness 1994). Further, changes that favor a particular species may disfavor another species (Barnes et al. 1991, Severson and Urness 1994). Structural changes include changes in vegetation stratification leading to a reduction in structural complexity (Fleischner 1994). Grazing can also reduce the amount of litter and increase the amount of bare ground, which in some cases can alter plant phenology (Kaufman et al. 1983). Changes in litter and ground cover can increase soil compaction and thereby reduce water infiltration (Orr 1960, Orodho et al. 1990), which can have nontrivial effects on plant communities, especially in arid and semiarid regions (Fleischner 1994). Grazing was the primary influence on grassland species composition in the Edwards Plateau ecoregion in Texas (Fuhlendorf and Smeins 1997, Fuhlendorf et al. in press). However, interannual precipitation was correlated with plant basal area (Fuhlendorf et al. 2001). Precipitation and grazing also interacted in determining species composition, where moderately grazed and ungrazed areas were more resilient to the effects of severe drought than heavily grazed areas (Fuhlendorf and Smeins 1997). These grazing effects on the vegetation community will indirectly affect bobwhite abundance.
Bobwhites have adapted to a variety of habitats from the eastern coast of the United States west to the Rocky Mountains. Within these longitudes, bobwhites have adapted to conditions from temperate latitudes in Wisconsin to subtropical, semiarid, and arid latitudes throughout the southern US and south to Costa Rica. Within the array of habitats the bobwhite occupies, there are many configurations of habitat types that are equally optimal (Guthery 1999). Many authors have qualitatively described bobwhite habitat in various regions. For example, Edminster (1954) reported bobwhite habitat included grassland, cropland, brushy cover, and woodland habitat types. In south Texas, optimal habitat configuration typically consisted of 53% woody canopy coverage, 38% herbaceous canopy coverage, and 44% bare ground (Kopp et al. 1998). In southern Illinois, bobwhites were associated with patchy landscapes with moderate levels of grassland and row crops, and high levels of woody edge (Roseberry and Sudkamp 1998).

Although there is a great deal of ecological slack in the optimal composition of bobwhite habitat (Guthery 1999), the structural changes brought about by grazing could have the greatest impact on bobwhite abundance. Grazing may increase the amount of bare ground in an area (Fleischner 1994) and decrease amounts of certain grass species (Severson and Urness 1994). These changes have been associated with increases in bobwhite use (Schulz and Guthery 1988). Peak bobwhite abundance occurred in pastures using a rapid-rotation grazing system compared to abundances under continuous grazing (Hamerquist and Crawford 1981, Schulz and Guthery 1988). Given that the optimal seral stage for bobwhites varies with the overall productivity of the habitat (Spears et al. 1993), the effects of grazing on bobwhite abundance may also vary among areas and habitat types.
The research reported herein was intended to address several issues of importance to bobwhite management in the arid and semiarid regions of their range, and attempted to address some of the ambiguity apparent in previous investigations of bobwhite–weather relationships. I employed an artificial neural network technique to model bobwhite abundance in relation to climate, weather, and land use. I then used these models to predict the changes in bobwhite abundance that could be expected under the equilibrium climate projected for 2× the current atmospheric CO2 concentration (IPCC 1998).

The research reported herein is important for several reasons. First, little research into the population dynamics of grassland birds has been undertaken to date, despite the fact that declines among these species have been of greater magnitude and of a more persistent trend than for the more-studied, neotropical-migrant forest species (Herkert and Knopf 1998, Rotenberry 1998). Conservation efforts for many grassland species-of-concern are hampered by a lack of data on aspects of their ecology (Herkert and Knopf 1998). Further, because indirect methods are commonly used to obtain demographic data, estimates of demographic parameters based on these data might be biased or imprecise (Pollock et al. 1989, Shupe et al. 1990, Clobert and Lebreton 1991, Roseberry and Klimstra 1992). The nature of the relationship between bobwhite production and climate, weather, and land use is unclear at this time. This lack of clarity results from a multitude of studies with largely contradictory results. These contradictions might result from differences in variable definition and selection, or from the use of linear analysis techniques. Linear analyses, such as correlation and regression, are not conducive to determining functional relationships among variables when the functional relationship is nonlinear.
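A small numerical sketch can illustrate this limitation of linear analyses. The Python fragment below is illustrative only: the Gaussian-shaped response curve, its optimum of 30 °C, its width, its peak value, and the three sampled temperature ranges are hypothetical values chosen for the example, not estimates from bobwhite data. It computes Pearson's correlation between temperature and a symmetric, unimodal abundance response (the situation depicted in Fig. 1.1) over ranges lying below, across, and above the optimum.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def abundance(temp, optimum=30.0, width=5.0, peak=50.0):
    """Hypothetical symmetric, unimodal response of abundance to temperature."""
    return peak * math.exp(-((temp - optimum) ** 2) / (2 * width ** 2))


def correlation_over_range(low, high, steps=41):
    """Correlation between temperature and abundance for evenly spaced samples."""
    temps = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    return pearson_r(temps, [abundance(t) for t in temps])


# Same response function, three different (assumed) sampling ranges.
r_below = correlation_over_range(20.0, 30.0)   # sampled below the optimum
r_across = correlation_over_range(25.0, 35.0)  # sampled symmetrically across it
r_above = correlation_over_range(30.0, 40.0)   # sampled above the optimum

print(f"below optimum:  r = {r_below:+.2f}")
print(f"across optimum: r = {r_across:+.2f}")
print(f"above optimum:  r = {r_above:+.2f}")
```

Under these assumptions the correlation is strongly positive below the optimum, essentially zero across it, and strongly negative above it, even though abundance is determined entirely by temperature in all three cases.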
For example, correlation coefficients may indicate a positive or negative response to variation in another variable, but the lack of a strong correlation may not be indicative of a lack of relationship between the variables (Laasko et al. 2001). Furthermore, nonlinear biological responses to environmental variation can sometimes result in either spurious positive or negative correlations depending on the functional response of the biological system and the pattern of environmental variation (Laasko et al. 2001). For instance, if bobwhite abundance varies in a symmetric, unimodal fashion with temperature, then, depending on the observed range of temperatures with respect to the abundance response function, there may be positive, negative, or no relation apparent from the correlation coefficients, even when temperature is a strong forcing variable for bobwhite abundance (Fig. 1.1). Therefore, a nonlinear analysis approach is necessary to clarify these relationships and to confirm or reject results obtained using traditional linear approaches. Second, the neural models resulting from my analyses were used to predict bobwhite abundance in the fall, prior to the hunting season. As such, the Oklahoma Department of Wildlife Conservation and the Texas Parks and Wildlife Department can use them to forecast fall harvests in advance of their fall roadside counts, thereby giving them more time to act on this information. This information may also be used by managers and conservation biologists to develop proactive management plans in the light of global climate change. Because the bobwhite is an important game species, its management and conservation are of immediate concern to state wildlife managers. 
Declining bobwhite populations could lead to decreased revenue from the sale of hunting licenses and decreased funding from contributions to the Federal Aid in Wildlife Restoration program, and, therefore, these state agencies must begin planning to minimize the impact climate change might have on bobwhite populations within their jurisdictions.

Third, research is only a part of the management process. To be useful for management, research must be conveyed to managers in a manner in which they can apply it to the decision-making process (Hejl and Granillo 1998, Young and Varland 1998). My research will provide managers with both a method for forecasting fall bobwhite harvests and for understanding bobwhite responses to weather conditions. The former provision will assist in setting bag limits, season lengths, and in redirecting hunters from low-abundance areas. In addition, the results can be used to develop long-term management plans.

Fig. 1.1. Hypothetical relationship between abundance and temperature showing how the range over which a variable is measured in the field can determine the response type. Even if sampling crosses the depicted zones, the overall correlation might still be negative, positive, or nonexistent. [Figure: abundance (vertical axis) plotted against temperature (horizontal axis), divided into a zone of positive correlation, a zone of zero correlation, and a zone of negative correlation.]

Finally, the results of this research can be used to better understand the impacts of climate change on species abundance and distribution in the central United States. Evidence for the effects of climate change on species ecology continues to mount. Changes in plant phenology will have concomitant effects among vertebrate species that rely on them for food or shelter. Many species have evolved life-history characteristics synchronized with seasonal changes in resource availability, but that are only weakly coupled to actual changes in the resource (Myers and Lester 1992, Root 1993).
That is, species might synchronize their life history with resource availability via proximate cues (e.g., photoperiod). Changes in climate might alter or negate the relationship between the cue and the underlying resource (e.g., plant seed abundance), resulting in a decoupling of life history from resource base, and reduction in production and abundance. Community structure will also likely be affected by climate change, because each species in the community will respond to changes differently. However, such changes in community structure will result in changes in community dynamics, which will also affect the individual species. Although the models presented herein cannot address all of the complexities of the impacts of climate change on bobwhite populations, they can show how abundance and distribution will change in response to climate change alone. From this base, management actions can be focused on areas in which bobwhite abundance is predicted to be greatest or least. Also, further research can begin to investigate the interactions between climate, land use, and community reorganization.

CHAPTER 2

NEURAL NETWORK MODELING: AN APPROACH TO DISCRIMINATION AND PREDICTION [1]

Abstract

Neural network modeling offers wildlife biologists a powerful technique for finding patterns in large, multivariate datasets. Because neural network modeling is appearing more frequently in the ecological literature, we provide a descriptive overview of this approach to data analysis in wildlife research, and discuss its merits and drawbacks. Neural networks offer a powerful alternative to traditional prediction and discrimination models, especially where little or no a priori information about the relationships among variables exists.
Neural networks are nonparametric, can model linear and nonlinear relationships, are unaffected by multicollinearity, and can be applied to prediction and discrimination problems; the same model can simultaneously predict multiple dependent variables or discrimination classes. However, because of the structure of neural networks, biological interpretation of model output is not straightforward and requires additional simulations. Further, neural models can become overfit and lose the ability to generalize to new data. Focusing on 1 type of neural network, the backpropagation, multilayer perceptron, we provide a prediction and a discrimination example of the technique using published data.

[1] Manuscript prepared for submission to Wildlife Society Bulletin. Second author: Dr. Fred S. Guthery.

Introduction

An artificial neural network (ANN) is one of a suite of machine learning techniques currently being applied in ecology (Fielding 1999b). Other machine learning techniques include genetic algorithms (Mitchell 1998, Jeffers 1999) and cellular automata (Dunkerley 1999). Although other types of ANNs exist (Boddy and Morris 1999), the type we describe is a feed-forward, backpropagation multilayer perceptron (Smith 1996; hereafter MLP). We chose the MLP because it is the simplest and most widely used technique in the ecological literature. This type of neural network was originally developed as a model of cognition and learning in the human brain (Rumelhart et al. 1986, Smith 1996, Boddy and Morris 1999, Stevens-Wood 1999). As such, the associated terminology borrows heavily from neurobiology (Table 2.1).

The use of neural network models in ecology is increasing and current applications include statistical modeling. The technique is nonparametric and, therefore, makes no distributional assumptions about the data. Applications thus far have dealt with comparing the performance of MLPs with that of traditional statistical methods.
These comparisons have typically shown that MLP models outperform more traditional analyses such as linear regression based on accuracy of predictions (Recknagel et al. 1997, Maier et al. 1998). For example, Olson and Cochran (1998) applied an MLP to model aboveground biomass in the tallgrass prairie. Compared to a regression model, their MLP model more accurately predicted standing biomass and predicted changes in biomass with greater accuracy (Olson and Cochran 1998). An MLP predicted the species diversity of arthropod assemblages in wet soil habitats more accurately than a multiple linear regression analysis (Lek-Ang et al. 1999). Özesmi and Özesmi (1999) compared the performance of an MLP with that of logistic regression in the classification of locations in a GIS database. These locations represented either nest or non-nest sites for red-winged blackbirds (Agelaius phoeniceus) and marsh wrens (Cistothorus palustris). They reported that in all but 1 case the MLP outperformed logistic regression (Özesmi and Özesmi 1999). Manel et al. (1999) compared MLPs with logistic regression and multiple discriminant analysis for predicting bird-species occurrences, and found that the MLP correctly classified more cases than the other 2 methods. However, they concluded that, based on Receiver Operating Characteristic plots (Fielding 1999a), the logistic model was the better model, but that it was sensitive to the prevalence of positive cases (occupied sites) in the data (Manel et al. 1999). Using an adjusted sum-of-squares technique, which penalizes models for their complexity (Hilborn and Mangel 1997), we found that a multiple linear regression model outperformed a neural model in predicting bobwhite (Colinus virginianus) abundance based on weather and land-use characteristics (Lusk et al. 2002). However, the neural model provided a better understanding of how bobwhite populations respond to climate.

Table 2.1. Definitions of terms used in neural modeling, listed alphabetically.

Term: Definition
Backpropagation: An algorithm that sends errors detected in the output sequentially back through the model to adjust synaptic and bias weights (parameters)
Bias weight: Weights attached to each neuron in the neuron and output layers; analogous to an intercept in a regression equation
Hidden layer(s): One or more layers of neurons in a multilayer perceptron; also called a neuron layer and the layer of processing elements
Input layer: Layer containing the input nodes (independent variables) in a multilayer perceptron
Input node: Data used as predictors; synonymous with independent variables in traditional statistical models
Learning: The iterative change in synaptic weights resulting in a reduction of the mean square prediction error; the process of finding relationships among variables and producing an appropriate response for a given set of input data; also called training
Learning rate: A value determining the magnitude of changes made to the synaptic weights during the training process
Learning rule: A rule governing how a synaptic weight can be adjusted to minimize the mean square prediction error; examples include steepest descent and conjugate gradient
Momentum: A value determining the number of past iterations to consider when adjusting synaptic weights; reduces instabilities and oscillations in the prediction error
Multilayer perceptron: A type of neural network model which uses a backpropagation technique to simulate cognition and learning in the brain; used in statistical modeling to find nonlinear and linear patterns in large, multivariate datasets without assumptions inherent in parametric techniques
Neural network: A machine learning technique used to simulate the function of the brain
Neuron: A component of the neuron layer of a multilayer perceptron; transforms the weighted sum of the input variables using a transfer function such as the sigmoid transfer function
Neuron layer: One or more layers of neurons in a multilayer perceptron; also called the hidden layer and the layer of processing elements
Output layer: Layer containing the output node(s) in a multilayer perceptron
Output node: Data being predicted by a multilayer perceptron; synonymous with the dependent variable in traditional statistical models
Overfitting: A problem in modeling in general and neural modeling in particular in which a model too closely approximates the data used for model development, and which, therefore, generalizes poorly to new data
Processing elements: One or more layers of neurons in a multilayer perceptron; also called the hidden layer or neuron layer
Relevance: An index of the contribution of each input variable to the predictions; a measure of the importance of an input node based on the synaptic weights
Logistic transfer function: A transformation applied to the weighted sum of input variables in order to approximate the underlying function or relationships among input and output variables
Stimuli: Another way of referring to the input data in a neural network model which maintains the neurological analogy
Synaptic weights: Weights applied to the input variables and neurons in order to produce accurate predictions of the output variable and which are adjusted during the learning process; contain information about the relationships among input and output data; analogous to regression coefficients
Training: See learning.
Training data: Data used during the training process to determine patterns among input and output variables and to adjust synaptic weights to minimize the mean square prediction error; a portion of the total dataset from which the MLP learns
Validation data: Data used during or after the training process to evaluate the MLP's performance to prevent overfitting and determine how well the MLP predicts from novel data; data not used to adjust synaptic weights during training

In addition to the above comparisons with traditional statistical techniques, other researchers have applied MLP models to a variety of research questions. Multilayer perceptron models successfully predicted call counts and age ratios for Gambel's quail (Callipepla gambelii) from precipitation and temperature data (Heffelfinger et al. 1999); occurrences of 3 small-bodied fish in freshwater streams in >80% of the cases (Mastrorillo et al. 1997); and abundances of trout (Salmo trutta) based on habitat characteristics (Baran et al. 1996, Lek et al. 1996a). An MLP model allowed wildlife managers in southern France to predict the impact of wild boar (Sus scrofa) damage to agricultural crops, allowing more efficient use of limited funds (Spitz and Lek 1999). In our research, we have applied MLP models to predict northern bobwhite abundance in western Oklahoma (Lusk et al. 2002) and to determine the relative importance of long-term climate and short-term weather patterns in determining their abundance (Lusk et al. 2001). Multilayer perceptrons can provide accurate predictions for management planning and decision making (Lein 1997), and a deeper insight into the ecological and biological processes at work (Colasanti 1991, Edwards and Morse 1995, Lek et al. 1996b).
The main advantage of the MLP is that it can find patterns in large, multivariate datasets without the assumptions inherent in regression and other techniques. This is true because an MLP represents a function as a sum of terms, and any continuous function, under mild constraints, can be represented as a sum of terms. Wildlife researchers may be familiar with other sum-of-terms models, such as the kernel estimator used in home-range estimation (Worton 1989) and the Fourier series used in line transect analyses (Buckland et al. 1993).

Our objective is to introduce MLP modeling to wildlife managers and scientists. We 1) briefly explain the theory behind neural modeling, 2) describe the structure and terminology of the neural modeling method, with specific regard to the MLP, 3) provide examples of the application of neural models to the problems of prediction and discrimination, and 4) discuss the strengths and weaknesses of the approach.

Model Description

Neural Model Architecture

The MLP is arranged in a series of ≥3 layers (Fig. 2.1). The first layer is called the input layer, which contains 1 input node for each independent variable. Input nodes are analogous to the independent variables in multiple regression. The input nodes can be considered stimuli in the neurological sense. The second layer is referred to as the hidden layer, the neuron layer, or the layer of processing elements. The neuron layer contains ≥1 set of neurons, the number of which determines the complexity of patterns that can be detected (Smith 1996:25). The neuron layer processes the data to predict the dependent variable(s) in the third layer, called the output layer. The output node(s), or dependent variable(s), represent the desired response. Elements in each layer may be connected to every element in the preceding layer via synaptic weights.
The synaptic weights store the information learned (see below) by the network during the training process, and are analogous to regression coefficients (Heffelfinger et al. 1999), but their interpretation is not as straightforward. Typically, each node in 1 layer is connected to every node in the preceding layer (Fig. 2.1), and, as such, the neural network is termed fully connected (Smith 1996, Boddy and Morris 1999).

Fig. 2.1. A diagrammatic representation of a generic multilayer perceptron neural network model. This MLP is a 3-2-1 network (3 input nodes, 2 neurons, and 1 output node) consisting of 3 layers: an input layer (A), a neuron layer (B), and an output layer (C). Nodes in 1 layer are connected to nodes in the preceding layer via synaptic weights (D). Each neuron also has an associated bias weight (E).

The Training Process

The development of an MLP model can be thought of as a process in which a network attempts to learn an appropriate response (e.g., a population abundance or a classification of used or unused) to a given set of stimuli. Training (or learning) is simply the rote method (see below) of adjusting parameters (biases and synaptic weights) such that prediction or discrimination becomes more accurate as parameters are iteratively adjusted. Biologists are familiar with least-squares regression using linear models, which attempt to maximize prediction accuracy by minimizing the sum of squared errors. The MLP operates under the same error-minimization goal. However, because of nonlinearity and other model complexities, there is no analytical solution for minimization; the model must minimize error by using a learning rule that changes synaptic weights iteratively, so that the mean squared error may be reduced each iteration. During this process, which is called training (or learning), the synaptic weights begin to represent the relationships among input and output variables. In this way, the model is said to learn.
Initially, an MLP has little or no ability to predict or discriminate because synaptic weights are set at small, random values (Smith 1996:22). Each neuron processes the incoming stimuli by first multiplying each input by the appropriate synaptic weight (Hagan et al. 1996:2-7–2-8). These products are then summed together and a bias weight is added (Hagan et al. 1996, Smith 1996). The bias weight is analogous to the intercept in regression analysis. This result, u, is then transformed using a transfer function. The most widely used transfer function is the logistic transfer function,

$$g(u) = \frac{1}{1 + e^{-u}}.$$

The use of a logistic transfer function allows nonlinear relationships between the independent and dependent variables to be detected and learned. The processed stimuli, g(u), are then sent to an output node. At the output node, another transformation is applied to the processed stimuli, the result of which is a scaled prediction of the dependent variable(s) (Smith 1996). This second transformation can be the same as that applied at the neurons, but more often a linear transformation is applied (Hagan et al. 1996). The model predictions can be considered a response to the incoming stimuli.

Next, the predictions generated by the model are compared with the actual values of the dependent variable(s). The prediction error is calculated and backpropagated through the network to adjust the synaptic weights. Backpropagation means that the biases and synaptic weights are first adjusted for the synapses between the neurons and the output nodes, and then adjusted for the synapses between the neurons and the input nodes; i.e., information on error is sent backwards through the model. The error is apportioned among the various synaptic weights using the chain rule of calculus (Haykin 1999:162).

The adjustment of synaptic weights is governed by 3 factors. The first is the learning rule, which determines how the MLP will adjust the synaptic weights.
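The forward pass and the backpropagation of error described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the software used in this chapter: it assumes a single neuron layer with logistic transfer functions, a linear output node, and a squared-error measure, and all function and variable names (logistic, forward, backpropagate, W_hidden, and so on) are hypothetical.

```python
import numpy as np

def logistic(u):
    """Logistic transfer function g(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """Return (neuron outputs, prediction) for one input vector x."""
    u = W_hidden @ x + b_hidden   # weighted sum of the stimuli plus bias weight
    g = logistic(u)               # processed stimuli, g(u)
    y_hat = w_out @ g + b_out     # linear transformation at the output node
    return g, y_hat

def backpropagate(x, y, W_hidden, b_hidden, w_out, b_out):
    """Gradients of the error 0.5 * (y_hat - y)**2 obtained with the chain rule."""
    g, y_hat = forward(x, W_hidden, b_hidden, w_out, b_out)
    err = y_hat - y
    grad_w_out = err * g                  # synapses between neurons and output node
    grad_b_out = err
    delta = err * w_out * g * (1.0 - g)   # error sent back to each neuron
    grad_W_hidden = np.outer(delta, x)    # synapses between input nodes and neurons
    grad_b_hidden = delta
    return grad_W_hidden, grad_b_hidden, grad_w_out, grad_b_out

# A 3-2-1 network with small random starting weights (values are arbitrary).
rng = np.random.default_rng(0)
W_hidden = rng.normal(0.0, 0.1, size=(2, 3))
b_hidden = rng.normal(0.0, 0.1, size=2)
w_out = rng.normal(0.0, 0.1, size=2)
b_out = 0.0

x = np.array([0.2, -0.5, 1.0])   # one hypothetical set of stimuli
y = 0.7                          # its observed response

_, prediction_before = forward(x, W_hidden, b_hidden, w_out, b_out)
grads = backpropagate(x, y, W_hidden, b_hidden, w_out, b_out)

# One weight adjustment in the downhill direction (assumed step size of 0.1).
step = 0.1
W_hidden -= step * grads[0]
b_hidden -= step * grads[1]
w_out -= step * grads[2]
b_out -= step * grads[3]
_, prediction_after = forward(x, W_hidden, b_hidden, w_out, b_out)
```

After the single adjustment, prediction_after lies closer to y than prediction_before; repeating the adjustment over all cases for many iterations is what constitutes training.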
There are several types of learning rules, the most popular of which are the steepest-descent and the conjugate-gradient learning rules. The steepest-descent rule alters the synaptic weights after each pass through the entire dataset so that the error decreases the fastest (Smith 1996:78). A variation on the steepest-descent rule involves adjusting synaptic weights after each data point is processed, rather than after all data points have been processed. The conjugate-gradient rule involves the second-order derivative (i.e., the derivative of a derivative) of the error, which measures the rate at which the slope of the error surface is changing, or, in other words, the rate at which the change in error is decelerating (Smith 1996:184). The other techniques all involve the first-order derivative of the error, which gives the slope of the error surface (see below) for a given set of synaptic weights. The conjugate-gradient technique, therefore, allows more accurate and sensitive adjustment of the synaptic weights, but is more computationally intensive.

Related to the learning rules is the learning rate. The learning rate determines the absolute magnitude of the changes in the synaptic weights based on the direction and magnitude of the prediction error (Smith 1996:77). So whereas the learning rules determine how the synaptic weights are changed, the learning rate determines how much the synaptic weights are changed given a specific learning rule. The selection of an appropriate learning rate is important in neural model construction. If the learning rate is too small, then it will take longer for the network to learn the patterns in the data (i.e., converge to a minimal error), because only small adjustments are made to the synaptic weights.
If the learning rate is too large, then the error will tend to oscillate and the network will be unstable (i.e., the predictive accuracy of the model will change from good to poor repeatedly), because the large changes to the synaptic weights will often increase the error rather than reduce it (Hagan et al. 1996:9-5, Smith 1996:81–82). We recommend using a steepest-descent learning rule with an adaptive learning rate that will allow the learning rate to be adjusted as needed during the training process (Hagan et al. 1996:12-12–12-14, Smith 1996:88–90). For example, if the error begins to oscillate during training, the algorithm will reduce the learning rate until the oscillations are dampened and the error decreases.

The final factor governing synaptic weight changes is called momentum, which determines the degree of influence past changes in the synaptic weights have over current changes (Smith 1996:85–88). Momentum is a kind of filter that reduces the amount of oscillation in the prediction error (Hagan et al. 1996:12-10). The momentum can have a value between 0 and 1. The larger the momentum, the stronger the effect of past weight changes in determining current weight changes. Therefore, the weight change after the most recent iteration will tend to continue in the direction of previous changes, even if the error begins to increase in an opposite direction. This allows weight changes to track the average error rate (Hagan et al. 1996:12-10). Because oscillations in the error rate reduce the efficiency of the training process, a high momentum, usually 0.9, is most often used (Smith 1996:86).

Data Considerations

General Considerations. Although the specific formatting of a dataset will depend on the specific neural network application being used, there are some common data requirements. First, all data in the neural model must be numeric (i.e., consist of numbers rather than letters).
Categorical and other nonnumeric data, therefore, must be coded (using dummy coding, for example) for use in a neural network. Multilayer perceptron models can predict multiple dependent variables simultaneously (Smith 1996:165). For example, Özesmi and Özesmi (1999) used an MLP with 3 output nodes to simultaneously predict the probability that a given location was suitable as a red-winged blackbird nest site, suitable as a marsh wren nest site, and not suitable as a nest site based on habitat variables.

Dependent variables can be continuous values (e.g., abundance indices) or class factors (e.g., present vs. absent; poor, fair, or good) to be predicted by the model. However, the manner in which the data are coded differs slightly from typical coding schemes. For example, presence and absence data are commonly coded as either 0 (absent) or 1 (present). This coding scheme is appropriate if these data are to be used as independent variables in an MLP model. However, if the purpose is to discriminate presence from absence based on some habitat features, the data should be recoded as some value <1 and >0, such as 0.1 (absent) and 0.9 (present). This coding scheme is necessary because the logistic transfer function approaches but does not reach 0 or 1 (Smith 1996:166), and therefore, an MLP can never predict presence or absence with complete accuracy if 1 or 0 is used for coding the dependent variable(s). A benefit of the MLP approach to discrimination is that, unlike logistic regression, MLPs can discriminate >2 classes simultaneously. For example, an MLP can discriminate poor, fair, good, and excellent habitats based on sets of habitat features.

Sample size is also an important consideration for the application of neural network models. The larger the sample size, the more information there is in the data about the relationship between the independent and dependent variable(s) for the network to learn. Therefore, it is desirable to have as large a database as possible.
This is especially true if the relationships are complex or if the data are noisy (Smith 1996:115, Boddy and Morris 1999:57). For neural networks, the sample size required for a given level of accuracy is a function only of the noise in the data (Smith 1996:135).

Because neural network models become increasingly complex as the number of neurons and predictors increases (see below), the variables used to predict the dependent variable should be selected with care based on extensive literature review and current knowledge about the factors affecting the system. Further, although multicollinearity is not a problem for neural models (they simply learn the redundancies in the predictors), including several correlated variables will unnecessarily increase model complexity.

Training and Validation Data. The development of a neural network model requires 2 datasets, 1 set for training the network and 1 set for validation. Training data are used during the learning phase to develop the network's synaptic and bias weights. The validation data are not used in model development (i.e., the prediction errors associated with validation data are not used to adjust synaptic weights), but are used to gauge the network's ability to respond appropriately to novel data. Although model validation is an important part of any modeling exercise, including statistical modeling, few authors attempt to validate their models. Ideally, the data used in model validation should be independent of those used in model development (Conroy 1993, Conroy et al. 1995, Haefner 1996:157). However, in practice, data are a precious commodity and obtaining an independent dataset may be logistically or fiscally impossible. Furthermore, the intended purpose for the model must be considered when selecting a model validation approach (Rykiel 1996).
Because independent data are often lacking, data obtained during a research project must be partitioned into training and validation sets (Fielding 1999a:219). The first decision to be made in the partitioning of the dataset is what percentages of the total dataset should be allocated to training and validation. With more training data, a neural network has more information about the relationships among variables on which to base its predictions; therefore, as many data as possible should be allocated to the training dataset (Fielding 1999a:219). We generally use 80% of our data for training and 20% for validation.

After choosing the number of data points to apportion to each dataset, cases must be selected. Data may be randomly assigned to the validation dataset. However, because there are no assumptions of normality for data used for neural network training, a random sample may result in unrepresentative training and validation datasets, which has been linked to the poor generalization ability of MLPs in some applications, especially discrimination (Ripley 1994). We, therefore, recommend that the selection of training and validation cases be performed using a systematic approach. For example, Lusk et al. (2002) ordered their data based on the dependent variable and systematically selected every fifth case for the validation dataset. This ensured that the training and validation data were representative of the whole dataset, and, by assumption, of the range of possible datasets.

Usage Considerations

The Error Surface. Consider a simple neural network model consisting of 2 input nodes, 1 neuron, and a single output node. The prediction error for such a model can be represented graphically as a 3-dimensional surface, where the error rate is presented as a function of the synaptic weights of each input node (Fig. 2.2). This surface represents the theoretical range of possible prediction errors for a given range of synaptic weights.
Such surfaces can either be relatively flat (Fig. 2.2a) or can contain many hills and valleys (Fig. 2.2b). Because the initial synaptic weights are assigned randomly, the point on the error surface at which the network starts learning varies. If the error surface has a relatively flat slope, the network will continue learning until the lowest point on the error surface (the global minimum) is reached. If, however, the error surface is irregular, the network will continue learning until it reaches a minimum error rate (i.e., changing synaptic weights in any direction will lead to an increase in error), but there is no guarantee that this minimum is the global minimum (Fig. 2.2b). The network may be stuck in a local minimum if other synaptic weight combinations can provide a lower prediction error.

Fig. 2.2. Hypothetical error surfaces resulting from particular combinations of synaptic weights. In (a), the error surface is relatively flat, and an MLP with initial synaptic weights randomly assigned any value in this range will eventually find the combination of synaptic weights that gives the global minimum prediction error. In (b), the error surface is hilly. An MLP may not be able to find the combination of synaptic weights resulting in a global minimum, but instead may become stuck in a local minimum.
[Figure 2.2 labels: Global Minimum, Local Minimum.]

However, this problem can be ameliorated by selecting the appropriate number of neurons in the neuron layer (Smith 1996:62). As the number of neurons in the network increases, the error surface smooths out and becomes flatter. Selecting the appropriate number of neurons can be accomplished by training several neural models on the same data, with the same learning rate and momentum, but with varying numbers of neurons.
The network with the appropriate number of neurons will be the network with the smallest prediction error for both the training and the validation datasets and for which the addition of more neurons does not greatly increase the network's performance.

Complexity and Parsimony. Any modeling attempt must balance the costs of added complexity in terms of loss of generalization ability and the benefit of added complexity in terms of reduced variance. This is often called the bias-variance dilemma (Geman et al. 1992). The solution is based on the principle of Occam's razor (principle of parsimony), which suggests that the appropriate model is the one that is just complex enough to adequately represent the relationships in the data but no more complex (Burnham and Anderson 1998:23). However, there is no inherent reason that a simple model should be better than a more complex model, especially if the system is known to be complex (Maurer 1999), and the choice of a model will depend on the objectives of the researcher (e.g., prediction or understanding processes). That is, if a model is used solely to predict in the realm of management, then the most accurate model may be optimal, whether or not it represents the best compromise between bias and variance.

With regard to neural networks, we need to ask whether the increase in complexity that accompanies neural networks provides sufficient increases in understanding or predictive power to warrant their use instead of a simple, linear model. As some authors have noted, directly comparing the predictive accuracy of both types of models is biased because the number of parameters in each model is not considered (Lek-Ang et al. 1999). Although Haykin (1999:219–222) offered several methods to limit the complexity of neural networks during training, we employ a simpler, post hoc method for ranking models.
This technique adjusts the sum of squared errors based on the number of parameters in the model (Hilborn and Mangel 1997:114–117):

$$SS_a = \frac{SS_j}{n - 2m},$$

where $SS_a$ is the adjusted sum-of-squares, $SS_j$ is the sum-of-squares for model j, n is the sample size, and m is the number of parameters in the model. The best model is the one with the smallest adjusted sum-of-squares. For a multiple linear regression, the number of parameters equals the number of regression coefficients in the model plus the intercept. Given a regression equation with 5 independent variables and 1 dependent variable, there are 6 parameters in the model. For fully connected MLPs, the number of parameters equals the number of synaptic weights and biases according to

$$m = N(I + 1) + O(N + 1),$$

where N = the number of neurons, I = the number of input nodes, and O = the number of output nodes. For example, a fully connected MLP with 5 input nodes, 3 neurons, and 1 output node would have m = 22 parameters. It is apparent that neural networks quickly grow in parameterization with the addition of predictors and neurons.

Neural Model Interpretation

Once a neural network has been trained, it can be used to generate predictions, including discrimination scores, based on new data. In addition to generating predictions, neural models can be used to increase understanding about the patterns and relationships in the data, and to generate hypotheses for further testing. There are several methods for obtaining such information from neural models. First, you can calculate the relevance (importance) of each input variable (Özesmi and Özesmi 1999):

$$R_i = \frac{\sum_{j=1}^{J} w_{ij}^2}{\sum_{k=1}^{n} \sum_{j=1}^{J} w_{kj}^2},$$

where, for an MLP with n input nodes and J neurons, $R_i$ is the relevance of the ith input variable and $w_{ij}$ is the synaptic weight connecting the ith input variable to the jth neuron.
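The parameter count, the adjusted sum-of-squares, and the relevance index above reduce to a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the sums of squares and the sample size in the comparison are invented for the example, and the weight matrix is assumed to be stored with 1 row per neuron and 1 column per input node.

```python
import numpy as np

def mlp_parameter_count(n_inputs, n_neurons, n_outputs):
    """m = N(I + 1) + O(N + 1) for a fully connected MLP."""
    return n_neurons * (n_inputs + 1) + n_outputs * (n_neurons + 1)

def adjusted_sum_of_squares(ss, n, m):
    """SS_a = SS_j / (n - 2m); requires n > 2m."""
    if n <= 2 * m:
        raise ValueError("sample size must exceed twice the number of parameters")
    return ss / (n - 2 * m)

def relevance(W_hidden):
    """Relevance of each input: its share of the squared input-to-neuron weights.

    W_hidden is assumed to have shape (neurons, inputs).
    """
    squared = np.sum(np.asarray(W_hidden, dtype=float) ** 2, axis=0)
    return squared / squared.sum()

# The example from the text: 5 input nodes, 3 neurons, 1 output node.
m_mlp = mlp_parameter_count(5, 3, 1)      # 22 parameters
m_regression = 5 + 1                      # 5 coefficients plus the intercept

# Hypothetical sums of squares for 2 models fit to n = 100 cases.
ss_a_regression = adjusted_sum_of_squares(50.0, n=100, m=m_regression)
ss_a_mlp = adjusted_sum_of_squares(40.0, n=100, m=m_mlp)

# Hypothetical input-to-neuron weights for a network with 3 inputs and 2 neurons.
example_relevance = relevance([[1.0, 2.0, 0.0],
                               [1.0, 0.0, 2.0]])
```

With these invented numbers, the MLP has the smaller raw sum-of-squares but the larger adjusted sum-of-squares, so the regression would rank higher. The penalty grows quickly because each added neuron contributes I + 1 + O parameters.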
Therefore, the relevance is the sum of squared synaptic weights for the ith input node divided by the sum of squared synaptic weights of all input nodes, and is a measure of the relative contribution of each input variable to the determination of network predictions. Variables with larger relevance values have stronger relationships with the dependent variables than those with smaller relevance values, i.e., they contain more information about the variation in the dependent variable than less relevant variables. This is true because input variables with larger synaptic weights exert more control over the network's response to a given stimulus.

The second method for obtaining biologically significant information from a neural network model is using neural interpretation diagrams (NID) (Özesmi and Özesmi 1999). These diagrams appear similar to Fig. 2.1, but the lines representing the synaptic weights are of varying widths and colors. The width of the synapses is determined by the relative values of the synaptic weights and the color of the lines by the sign (+ or −) of each synaptic weight. Therefore, the NID indicates which variables are exerting more influence over network predictions, as well as whether they are having a positive or negative influence. However, as the number of input nodes and neurons increases, the interpretation of the diagrams becomes less straightforward.

Simulation with a trained MLP model offers another alternative for interpreting the output of a neural network (Lek et al. 1996a). This method offers a view of how each input variable influences the value of the dependent variable. Some neural modeling software packages contain modules for automatically running a simulation analysis (e.g., Neural Connections, SPSS, Inc.). For other neural packages, a little more work is involved.
First, a series of datasets must be constructed in which the independent variable of interest is allowed to vary between its minimum and maximum value, or over ±1 SD of the mean, while all other independent variables are held constant at their mean, or some other biologically meaningful value. These datasets are then presented to the trained model and a set of predictions is produced. By plotting these predicted values against the range of values for the input variable of interest, we obtain a picture of how the dependent variable responds to variation in the independent variable being considered, all else being equal. If the interactive effects of 2 variables are of interest, a dataset in which values for these variables are allowed to vary together while the remaining variables are held constant can be constructed and presented to the trained network. Predictions can then be plotted in 3 dimensions, producing a response surface.

Accuracy Assessment

Because there are no significance tests associated with MLPs, there are no P values by which to judge a model's performance and extract biologically significant information. Depending on whether you are using the neural network to predict or to discriminate, there are several options for assessing the performance of the network. The most commonly used method for predictive models is to calculate the squared correlation (r²) between predicted and observed values.

Simulation analyses offer a way of visualizing the effect of a single variable on the dependent variable. However, simulations actually represent the effect of the variable of interest when all other variables are at their mean. It is theoretically unlikely that such average conditions will be experienced in nature, rendering the usefulness of simulations in making management decisions uncertain. The data used to train the model can be used to determine how well the simulations represent reality, however.
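The simulation procedure described above (varying 1 input over its observed range while holding the others at their means) can be sketched as follows. This is a minimal illustration that assumes the trained network is available as a function mapping an array of cases to predictions; simulate_response, toy_predict, and the generated data are hypothetical stand-ins, not the models or data analyzed in this chapter.

```python
import numpy as np

def simulate_response(predict, X, var_index, n_points=50):
    """Predicted response as one input varies and all others stay at their means.

    predict   : function taking an (n_cases, n_inputs) array, returning predictions
    X         : observed input data, shape (n_cases, n_inputs)
    var_index : column of the input variable of interest
    """
    X = np.asarray(X, dtype=float)
    grid = np.linspace(X[:, var_index].min(), X[:, var_index].max(), n_points)
    scenarios = np.tile(X.mean(axis=0), (n_points, 1))  # all inputs at their means
    scenarios[:, var_index] = grid                      # vary the input of interest
    return grid, predict(scenarios)

def toy_predict(X):
    """Hypothetical stand-in for the prediction function of a trained MLP."""
    u = 0.3 * X[:, 0] - 0.05 * X[:, 1] - 4.0
    return 1.0 / (1.0 + np.exp(-u))

# Invented observations of 2 input variables for 30 cases.
rng = np.random.default_rng(1)
X_obs = np.column_stack([rng.uniform(0.0, 40.0, 30),
                         rng.uniform(0.0, 100.0, 30)])

grid, response = simulate_response(toy_predict, X_obs, var_index=0)
```

Plotting response against grid gives the simulated response curve for the first input. Replacing the means with another biologically meaningful value, or restricting the grid to ±1 SD of the mean, requires changing only the 2 lines that build the scenarios.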
We can filter the observed data for cases in which all observations of independent variables are within ±1 SE of the mean. These cases can then be plotted with the simulation data to give a measure of the accuracy of the simulation predictions. For small datasets with a large number of independent variables, it might be necessary to increase the range of SE used so that there are sufficient cases available to plot.

There are several methods of determining the accuracy of discrimination models, many of which are summarized by Fielding and Bell (1997), and all of which are applicable to neural network output (Fielding and Bell 1997, Fielding 1999b). The simplest method for assessing the accuracy of a classification model is to calculate the percent correctly classified. However, if misclassification errors are more important to the application, then receiver operating characteristic (ROC) plots are a better alternative, because they use all available information about the performance of the neural model (Fielding 1999b), and do not rely on a specific cutoff threshold (e.g., 0.5; Fielding and Bell 1997). The area under the ROC curve (AUC) is a measure of the performance of the network and varies between 0.5 and 1. As values approach 1, the model's performance increases. That is, if you drew a random case from both classes (i.e., 0, 1), the AUC would give the probability that the discrimination score for the case from class 1 would be greater than the score for the case from class 0 and, therefore, allow you to accurately discriminate the pair independent of a threshold cutoff. Both ROC plots and the AUC can be produced with standard, desktop statistical software (e.g., SIGNAL module in SYSTAT; SPSS Inc. 1999).

Examples

Here we provide 2 simple examples of the application of MLP modeling. The first example uses data on the relationship between Gambel's quail production and December–April precipitation (Swank and Gallizioli 1954).
The second example shows how the same modeling technique can be used for discrimination, using data on habitat use by masked bobwhites (C. v. ridgwayi) (Guthery et al. 2001). These examples are intended to illustrate the application of the MLP technique to the analysis of ecological data as well as to show the benefits of their application.

Gambel's Quail and Winter Precipitation

We used data from Swank and Gallizioli (1954) on a study conducted between 1941 and 1953 in Arizona. These data consisted of total winter (December–April) precipitation (cm) and the age ratio (juveniles/adult) in the subsequent fall harvest. Therefore, we had 1 input (total winter rainfall) and 1 output (fall age ratios) node in the network. Because we had only 1 predictor variable (rainfall), we trained a network that consisted of a single neuron (Fig. 2.3 inset). Therefore, the network consisted of 4 parameters (1 synaptic weight between the input node and the neuron, 2 bias weights for the neuron and output node, and 1 synaptic weight between the neuron and the output node). The network was trained for 400 iterations with an adaptive learning rate and a momentum of 0.6. Because of the small sample (n = 13), we did not partition the data into training and validation sets; doing so would have reduced the performance of the network (Fielding 1999a:219).

The network accounted for 81% of the variation in the age ratios. Although the original analysis by Swank and Gallizioli (1954) did not include an estimation of trend, the authors concluded that precipitation during winter was the factor limiting abundance during their study. Our simulation analysis (Fig. 2.3) indicated that there was a relationship between fall age ratios and the previous winter's total precipitation. However, this relationship appears to be a curvilinear, logistic-like relationship (Fig. 2.3).
Production (as represented by fall age ratios) was low over a wide range of total winter rainfall, but increased sharply when winter rainfall exceeded 12 cm. However, there appears to be an upper threshold of approximately 20 cm, after which there is no further increase in production with increasing precipitation. This pattern makes sense, since there is likely an upper limit to the production in any year based on time and physiological constraints (Guthery and Kuvlesky 1998). Although the relationship could have been modeled using a variety of logistic growth functions, the strength of the MLP technique is that we did not have to specify the form of the function a priori. Had the relationship been merely asymptotic rather than logistic, the MLP would have performed equally well.

Fig. 2.3. Simulation results from the Swank and Gallizioli (1954) MLP model showing the predicted change in fall age ratio over the observed range of variation in total winter rainfall (cm). Data points represent observed fall age ratios. Inset: a diagrammatic representation of the 1-1-1 MLP used to model the data presented in Swank and Gallizioli (1954). The MLP contained 1 input node in the input layer (total winter rainfall), 1 neuron in the neuron layer, and 1 output node in the output layer (fall age ratio).
[Figure 2.3 axes: x, total winter (Dec–April) rainfall (cm); y, predicted age ratio.]

Nest-site Characteristics of Northern Bobwhites

The same technique used above for prediction can, with minor modifications, be used in a discrimination analysis. We used data collected on the Mesa Vista Ranch in Roberts County, Texas, USA, during 2001 and 2002. Data were collected at northern bobwhite nest sites and random locations and included vegetation canopy height (cm), percent cover by dominant tallgrass, percent cover by shrubs, bare ground exposure (%), and mean screening cover over 3 cover classes.
The MLPs developed for this analysis contained 5 inputs, 2 neurons, and 1 output, resulting in 15 parameters in the model. The output node represented nest sites and random locations and was coded 0.9 for nest sites and 0.1 for random locations. The network was trained with an adaptive learning rate for 500 iterations using a momentum of 0.8. The data were partitioned into training (88 cases) and validation (22 cases) sets before analysis. We measured accuracy using the area under the curve of the receiver operating characteristic (ROC) plot (Fielding and Bell 1997, Fielding 1999b). This method provides a threshold-independent measure of accuracy. However, for our graphical presentation of the results, we used an arbitrary threshold of 0.5 for discriminating nest sites from random locations. We report results here only for the 3 most important variables in the model (relevance > 10%).

The MLP accounted for 40.1% of the variation in the training data and 43.6% of the variation in the validation data. The area under the ROC curve was 0.842 for the training data and 0.768 for the validation data. That is, for the training data, there was an 84.2% probability of correctly classifying a randomly selected pair of nest and random points based solely on the relative difference in their classification scores. The simulation analyses showed the change in suitability of a given location for use as a nest site as vegetation canopy height, percent cover by shrubs, and bare ground exposure (relevance = 32.9%, 31.2%, and 26.9%, respectively) each varied while all other variables were held at the mean (Fig. 2.4). One of the important pieces of information revealed by the simulations is the transition point between suitable and unsuitable conditions. At the Mesa Vista Ranch, locations with canopies >40 cm were suitable for nesting (Fig. 2.4a). Locations with shrub cover >20% were also suitable as nest sites (Fig. 2.4b).
However, bare ground cover in excess of 30% rendered a particular location unsuitable for nesting (Fig. 2.4c).

Fig. 2.4. Simulation results from the trained neural network model for differentiating random and nest locations based on vegetation characteristics on the Mesa Vista Ranch in Roberts County, Texas, 2001-2002. Results are presented only for variables with >10% contribution to the model's output: A) canopy height (cm), B) percent shrub cover, and C) bare-ground exposure (%). Dashed horizontal lines represent an arbitrary 0.5 cutoff threshold between suitable and unsuitable. Each panel plots the neural classification score (0 to 1) against the named variable.

Caveats

Although we have attempted to discuss limitations and peculiarities of the MLP technique in the text, there are a few more considerations when using MLPs for predictive or discriminant analysis. First, although MLP models can be used for statistical modeling, they lack a statistical framework for ascribing confidence limits to their predictions. An approximation can be achieved via bootstrapping (M. T. Hagan, Oklahoma State University, Department of Electrical and Computer Engineering, personal communication), although this can be computationally intensive depending on the complexity of the neural model. Further, a trained neural network does not have an associated P-value, although some of the associated measures of accuracy (e.g., r²) can have P-values associated with them. However, as many authors have pointed out, the rampant use of P-values in the scientific literature is often uninformative (Cohen 1994, Anderson et al. 2000).

The ability of an MLP to find patterns in noisy data is both a strength and a weakness of the technique. Because of the power with which they can find patterns, MLPs are sensitive to outliers in the training data.
An MLP will learn the appropriate responses necessary to predict an outlier. However, this may weaken the model's ability to generalize when presented with new data. The MLP's response will be distorted by the outlier, resulting in inaccurate predictions. This is similar to the effect that outliers can have on the slope or intercept of a regression line. Therefore, screening outliers from training and validation data will increase the accuracy of the model's predictions when presented with new data.

A related problem is that of overfitting (also called overtraining; Smith 1996:113-114). Overfitting occurs when model predictions match the observed data too closely, resulting in a reduction in the model's ability to generalize. Although other techniques, such as multiple regression, are also susceptible to overfitting, it is not as great a concern because these techniques are generally restricted to linear relationships (Smith 1996:114). The MLP technique is especially susceptible to overfitting because an MLP can approximate any function (Hagan et al. 1996), and can, therefore, map a dataset exactly.

There are 3 techniques to prevent overfitting. The easiest method is to gauge the MLP's accuracy in predicting the validation dataset. Because the validation data have not been used in model training, the MLP's ability to accurately predict validation data can indicate when the model has overfit the training data (an overfit MLP would show excellent performance on training data, but weak performance on validation data). Limiting the number of training iterations can also reduce the danger of overfitting, but there are no quantitative guidelines for this approach. Third, MLPs lose power as the number of neurons, and hence the number of parameters, is reduced, so eliminating neurons in the presence of overfitting may result in an MLP that generalizes better.

Finally, ANN models are phenomenological models and provide no information on the underlying mechanisms.
However, traditional regression and discrimination models usually suffer the same limitation. Researchers must develop hypotheses for experimentation and testing to confirm relationships discovered in any model. Further, although trained MLP models can produce accurate predictions, the model parameters (i.e., synaptic and bias weights) are not as readily interpretable as coefficients from a multiple regression equation. This has been referred to as a lack of transparency and, as such, MLPs are considered black-box models (Boddy and Morris 1999). We have described 3 methods for obtaining further biologically significant information from neural networks that can ameliorate this limitation. Furthermore, this lack of transparency is not as much an issue in management, where making accurate decisions and predictions may be paramount.

Management Considerations

We have described an alternative to traditional statistical techniques for data analysis. Multilayer perceptrons are nonparametric, can approximate linear and nonlinear functions, are not constrained by multicollinearity, and can be used for both prediction and discrimination. In addition, MLPs can predict and discriminate simultaneously. Although the ANN technique is an extremely powerful tool, its lack of transparency and parsimony has discouraged some researchers from applying it to their data. We believe that this hesitancy is misplaced and hope that we have demonstrated not only the mechanics of the method, but also its usefulness. Neural network modeling not only offers a method for elucidating complex relationships from multivariate datasets, but also can serve as a basis for making more accurate and efficient management and conservation decisions.
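The pairwise interpretation of the ROC area used in the nest-site example above can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the scores are hypothetical and are not data from the Mesa Vista Ranch. It assumes the area is computed as the share of (nest, random) pairs in which the nest site receives the higher classification score, with ties counted as one half.

```python
def roc_auc(nest_scores, random_scores):
    """Area under the ROC curve computed as a pairwise ranking probability.

    Counts the (nest, random) pairs in which the nest site has the higher
    classification score; tied pairs contribute one half.
    """
    wins = 0.0
    for n in nest_scores:
        for r in random_scores:
            if n > r:
                wins += 1.0
            elif n == r:
                wins += 0.5
    return wins / (len(nest_scores) * len(random_scores))


# Hypothetical classification scores, not values from the study.
nest = [0.9, 0.8, 0.7, 0.4]
rand = [0.6, 0.3, 0.2, 0.1]
auc = roc_auc(nest, rand)  # 15 of the 16 pairs are ranked correctly
```

No cutoff appears anywhere in the calculation, which is why the measure is independent of the 0.5 threshold used for the figures.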
CHAPTER 3

A NEURAL NETWORK MODEL FOR PREDICTING NORTHERN BOBWHITE ABUNDANCE IN THE ROLLING RED PLAINS OF OKLAHOMA1

1 Lusk, J. J., F. S. Guthery, and S. J. DeMaso. 2002. A neural network model for predicting northern bobwhite abundance in the Rolling Red Plains of Oklahoma. Pages 345-355 in J. M. Scott, P. J. Heglund, M. L. Morrison, J. B. Haufler, M. G. Raphael, W. A. Wall, and F. B. Samson, editors. Predicting species occurrences: issues of accuracy and scale. Island Press, Covelo, California, USA.

Introduction

More accurate predictions of species abundance are necessary for management and conservation to be effectively implemented (Leopold 1933, Peters 1992, Schneider et al. 1992). Such predictions are increasingly important as human impacts on the environment increase. Artificial neural network (ANN) models are extremely powerful and allow the investigation of linear and nonlinear responses. As such, ANN models offer ecologists a powerful new tool for understanding the ecologies of declining species, which can lead to more effective management (Colasanti 1991, Edwards and Morse 1995, Lek et al. 1996, Lek and Guégan 1999).

Current applications of ANN models include statistical modeling (Smith 1996). In this capacity, ANN models have considerable advantages over traditional statistical models, such as regression. Artificial neural networks are extremely powerful due to their capacity to learn from the data used during training. Another advantage of ANN models over traditional models is that ANNs are inherently nonlinear (Haykin 1999:2). Because most ecological phenomena are nonlinear (Maurer 1999:110), this property of ANN models makes them more useful than standard statistical models that are often limited to linear relationships (Lek et al. 1996b). Even minor nonlinearities in the response of one variable to another can reduce the predictive power of traditional statistical techniques (Paruelo and Tomasel 1997).
Neural networks also do not require a priori knowledge of the nature of the relationship between predictor and response variables, a requirement that makes other available nonlinear methods cumbersome (Smith 1996:19-20). ANNs find the form of the response in the data presented to them and, as such, are not constrained to simple curves, as are curvilinear regression techniques (Pedhazur 1982:406, Smith 1996:20). Finally, ANN models are nonparametric (Smith 1996:20). Use of nonnormal data for neural model development will not bias the results (Baran et al. 1996).

Much is known about bobwhite ecology, so the species offers an effective means of evaluating the ANN technique and its applicability to management and conservation. Furthermore, an understanding of bobwhite-climate relationships is an important component of management and conservation of bobwhites. Bobwhite abundance has declined over much of the species' range during the past several decades (Koerth and Guthery 1988, Brennan 1991, Church et al. 1993, Sauer et al. 1997). Bobwhite declines may be accelerated by climate change in some regions of the range (Guthery et al. 2000). Although we cannot manage the weather, we can factor in its effects when making management plans. By working in cooperation with state management agencies, the results of our research can be directly and immediately applied in the field, completing the research-management cycle (Hejl and Granillo 1998, Kochert and Collopy 1998, Young and Varland 1998).

We developed an artificial neural network model to investigate the influence of weather patterns on the abundance of northern bobwhites (Colinus virginianus; bobwhites hereafter) in a semiarid region of western Oklahoma, United States. An understanding of the effects of weather on species abundances is warranted in the light of global climate change (Root 1993, Schneider 1993). We also sought to evaluate the ANN modeling technique.
Specifically, we 1) compared ANN model output with that of a traditional multiple regression model, 2) determined which model was better using a sum-of-squares criterion (Hilborn and Mangel 1997), and 3) conducted simulation modeling using the ANN and regression models.

Methods

We modeled bobwhite abundance in the Rolling Red Plains ecoregion of Oklahoma. This ecoregion is in western Oklahoma, excluding the panhandle (Peoples 1991), and occupies 5.7 million ha. Mean annual precipitation is 58 cm (Oklahoma Climatological Survey, unpublished data). Biologists from the Oklahoma Department of Wildlife Conservation counted bobwhites in each county in Oklahoma. Survey routes were established in typical quail habitat (Peoples 1991). Each 32-km route was surveyed twice annually beginning in 1991: once in August and once in October. Surveys were conducted either at sunrise or 1 hr before sunset. Total number of bobwhites observed per 32-km route was used as an index of bobwhite abundance. Although roadside counts such as these are prone to biases, these surveys are positively related to the fall harvest in Oklahoma (r > 0.70, S. DeMaso, unpublished data).

Artificial Neural Networks

Artificial neural networks are mathematical algorithms developed to imitate the function of brain cells for the study of human cognition (Hagan et al. 1996:18, Smith 1996:1, Haykin 1999:69). However, early techniques were handicapped by their inability to handle nonlinear relationships (Hagan et al. 1996:14, Smith 1996:8). In the 1980s, neural network modeling experienced a renaissance of sorts with the development of a backpropagation algorithm (see below) that is capable of handling nonlinear relationships (Smith 1996:20). Because of their foundations in cognitive science, many of the terms used to describe aspects of ANNs are derived from neurobiology.
What follows is a short explanation of the terminology of neural network modeling and a brief description of how a typical neural model works.

A neural network typically consists of 3 layers: the input nodes, the neurons (also called hidden nodes or processing elements), and the output nodes. However, ANNs with more than one neuron layer are possible. Typically, each node in each layer is connected to each node in the previous layer by synapses (connection weights), and, as such, the network is termed fully connected (Smith 1996:21). The synapses store the information learned by the model (Haykin 1999:2), and are analogous to regression coefficients (Heffelfinger et al. 1999). Each input node represents an independent variable. Values of input nodes are scaled so that they range between zero and one (Smith 1996:67). Each neuron processes the input nodes by computing a logistic function from the sum of the inputs:

$$g(u) = \frac{1}{1 + e^{-u}},$$

where u is the weighted sum of the inputs (w_j x_j) plus a bias weight (w_b):

$$u = w_b + \sum_{j=1}^{J} w_j x_j$$

(Smith 1996:40). The logistic function above is the most widely used, but is not the only function available (Smith 1996:35). The values calculated by the neurons, g(u), are transferred to the output nodes. The output nodes perform a similar calculation and their output is detransformed to obtain a prediction of the dependent variable (Smith 1996:22).

In backpropagation ANNs, the error between the predicted output and the actual output is calculated and propagated back through the model, where it is used to adjust the values of the synaptic weights according to one of a variety of learning rules (Hagan et al. 1996:1140; Smith 1996:67). The adjustment of the synapses is termed learning (Smith 1996:59). This process continues iteratively, with synapses adjusted after each forward pass, and is termed training. With each iteration, the ANN learns more about the relationship between inputs and outputs and, therefore, the prediction error decreases.
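The forward pass and the weight adjustment described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure implemented in the software used for this chapter: the learning rule is plain gradient descent on half the squared error for a single case, and the inputs, target, starting weights, learning rate, and function names are all hypothetical.

```python
import math


def logistic(u):
    """Logistic transfer function g(u) = 1 / (1 + e^-u)."""
    return 1.0 / (1.0 + math.exp(-u))


def forward(x, hidden, output):
    """Return (prediction, neuron values) for one case.

    hidden is a list of (weights, bias) pairs, one per neuron;
    output is a single (weights, bias) pair for the output node.
    """
    h = [logistic(b + sum(w * xi for w, xi in zip(ws, x))) for ws, b in hidden]
    out_ws, out_b = output
    y = logistic(out_b + sum(w * hk for w, hk in zip(out_ws, h)))
    return y, h


def train_step(x, target, hidden, output, rate):
    """One backpropagation update (assumed rule: plain gradient descent)."""
    y, h = forward(x, hidden, output)
    out_ws, out_b = output
    # Error signal at the output node: prediction error times logistic slope.
    delta_out = (y - target) * y * (1.0 - y)
    new_hidden = []
    for (ws, b), hk, w_out in zip(hidden, h, out_ws):
        # Error propagated back to each neuron through its output synapse.
        delta_h = delta_out * w_out * hk * (1.0 - hk)
        new_ws = [w - rate * delta_h * xi for w, xi in zip(ws, x)]
        new_hidden.append((new_ws, b - rate * delta_h))
    new_out_ws = [w - rate * delta_out * hk for w, hk in zip(out_ws, h)]
    return new_hidden, (new_out_ws, out_b - rate * delta_out)


# Hypothetical scaled inputs, target, and starting weights.
x, target = [0.2, 0.7], 0.9
hidden = [([0.1, -0.2], 0.0), ([0.3, 0.4], 0.1)]
output = ([0.5, -0.5], 0.0)
first_error = (forward(x, hidden, output)[0] - target) ** 2
for _ in range(200):
    hidden, output = train_step(x, target, hidden, output, rate=0.5)
last_error = (forward(x, hidden, output)[0] - target) ** 2
```

Repeating the update shrinks the squared error for this single case, which is the sense in which the prediction error decreases with each iteration.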
Training is stopped before the model maps the relationship between inputs and outputs exactly. A network that does map the training data exactly is said to be overtrained, and the model's predictive abilities are diminished when it is presented with novel data (Hagan et al. 1996:1122, Smith 1996:113). The use of ANNs in the ecological sciences requires predictability, and there is a tradeoff between model generality and accuracy of prediction.

Because ANN models begin training with randomly selected connection weights, the minimum error achieved by a network may not be the global minimum, but only a local minimum (Smith 1996:62). Therefore, there may exist an error minimum lower than the one achieved by the network. However, Smith (1996:62) reported that the probability of such local minima existing decreases as more neurons are added to the model. Determining the optimum number of neurons should, therefore, maximize the chances of finding the global minimum in the error surface.

Database Construction

Roadside quail counts were initiated in Oklahoma in 1991, and therefore, our database comprised the 1991-1996 bobwhite surveys. We averaged each year's August and October counts for our models. The database also included weather and land-use data as independent variables. Weather data were obtained on CD-ROM from EarthInfo, Inc. (Boulder, Colorado). We extracted mean monthly temperature data for June, July, and August. Seasonal precipitation data were calculated from total monthly precipitation. We divided the year as follows: winter = December, January, and February; spring = March, April, and May; and summer = June, July, and August. Therefore, seasonal precipitation equaled total monthly precipitation averaged for each 3-mo period. We grouped climate data into these periods because they represent ecologically important phases of the bobwhite's life cycle (breeding, recruitment, and winter survival).
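The seasonal grouping described above amounts to a 3-month mean of monthly precipitation totals. The following is a minimal sketch with hypothetical monthly values (cm); the function and variable names are illustrative. Which calendar year supplies the December total for a given winter is not specified in the text, so the caller is assumed to pass the appropriate value.

```python
# Season definitions taken from the text; month keys are illustrative labels.
SEASONS = {
    "winter": ("Dec", "Jan", "Feb"),
    "spring": ("Mar", "Apr", "May"),
    "summer": ("Jun", "Jul", "Aug"),
}


def seasonal_precipitation(monthly_totals):
    """Average the 3 monthly precipitation totals within each season."""
    return {
        season: sum(monthly_totals[m] for m in months) / len(months)
        for season, months in SEASONS.items()
    }


# Hypothetical monthly totals (cm), not values from the weather database.
monthly = {
    "Dec": 3.0, "Jan": 2.0, "Feb": 4.0,
    "Mar": 5.0, "Apr": 6.0, "May": 10.0,
    "Jun": 9.0, "Jul": 6.0, "Aug": 6.0,
}
seasonal = seasonal_precipitation(monthly)
```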
We did not include any time lag for the effects of rainfall on quail abundance because other networks we developed indicated this lag effect was not important to model predictions (J. Lusk, unpublished data). We used weather stations closest to each survey route for obtaining weather data.

As measures of land-use and human impacts, we used cattle density on nonagricultural lands (total head/km²) and the proportion of county area in agricultural crop and hay production (hereafter, agricultural production). We selected these variables because they are likely to have the greatest effect on bobwhite abundance (Murray 1958, Roseberry and Sudkamp 1998). Bobwhite abundance in Florida varied directly with cultivated acreage and inversely with acreage grazed (Murray 1958). These land-use variables were determined at the county level and were extracted from the Oklahoma Department of Agriculture's annual crop statistics for each survey year in the database.

The final variable included in the data set was the number of bobwhites counted during the previous year's survey. The number of bobwhites present in 1 yr is dependent on the number of bobwhites present the previous year. Furthermore, survival and reproduction may be density dependent (Roseberry and Klimstra 1984).

ANN Construction, Training, and Validation

Network Architecture. We used a three-layered, backpropagation neural network. The network consisted of a layer of input nodes representing the independent variables, a layer of neurons, and an output node representing the dependent variable. Our model was fully connected (Smith 1996:21). We used a commercial neural-modeling software package (QNet for Windows, v97.02, Vesta Services, Winnetka, Illinois) for ANN development. Including too many neurons in the neuron layer may result in reduced prediction ability and including too few will limit the complexity the network can accurately learn (Smith 1996:120-123).
Therefore, we determined the optimal number of neurons experimentally by training models in which the same data set and model parameters were used, but the number of neurons was varied. We developed models that contained 2 through 9 neurons. We limited the maximum number of neurons to the number of input variables in the model. We selected the model with the best performance, gauged as the correlation between the predicted counts obtained from the model and the actual counts in the validation data set.

Training Parameters. We used an adaptive learning rule during model training (Smith 1996). In addition, 3 parameters were adjusted to optimize model performance. These parameters were the number of iterations, the learning rate, and the momentum. The values we selected for the learning rate and momentum were within the range of those found to be most effective in a wide variety of neural network applications (Smith 1996:77-90). The number of iterations controls how long the model has to learn the pattern and relationships among the variables in the model. The larger the number of iterations, the more attempts the network has to minimize prediction errors. We trained our model for 10,000 iterations. We believed that 10,000 iterations would allow the network to find the error minimum and allow us to stop training if the network began to overfit the data. The learning rate controls the magnitude of the corrections of the synaptic weights per iteration based on the direction and magnitude of the change in the prediction error during past iterations (Smith 1996:77). Selection of too small a learning rate will increase the number of iterations necessary to reach an error minimum. However, selection of too large a learning rate may make the network unstable, resulting in oscillations in the prediction error (Hagan et al. 1996:95). We used a learning rate of 0.05. The final network parameter was momentum.
Momentum determines how many past iterations are used in determining synaptic-weight adjustments in the current iteration (Smith 1996:85-88). Momentum keeps the error corrections moving in the same direction along the error surface (Smith 1996:85). If a large momentum value is used, it will take longer for weight corrections to respond to changes in the prediction error. In other words, synaptic-weight adjustments are based on the long-term trend in prediction error, and momentum determines the number of iterations used in determining the long-term trend. We used a momentum of 0.90. This momentum is appropriate for most types of models (Smith 1996:86).

Validation. To assess the predictive ability, accuracy, and reliability of our ANN model, we presented the trained model with data not used in network training. We created a validation data set by extracting 20% of the data from the original data set. Data were rank ordered by the number of quail counted, and every 5th record was assigned to the validation data set. There were 98 records in the original database, resulting in 20 records in the validation data set. The systematic removal of the validation data allowed us to gauge the performance of the network over the entire range of the original bobwhite count data. Because the validation data were derived from the original data set and were, therefore, obtained under the same conditions as those used for network training, the network can be considered only validated for this particular ecoregion in Oklahoma (Conroy 1993, Conroy et al. 1995).

In addition to our validation data set, we tested our model with data collected in the same ecoregion but not part of the training or validation data sets. Because this model will eventually be used by managers to predict bobwhite abundance, this test will determine the utility of the model. We presented the trained model with the 1997 data and recorded the accuracy of the predictions.
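The systematic validation split described above can be sketched as follows. This is an illustration only. The counts are hypothetical, the function name is invented, and the rank at which the every-5th sequence starts is an assumption, because the text does not state it. With 98 records, starting at any of the first three ranks yields 20 validation records, whereas starting at the 4th or 5th rank yields 19.

```python
def systematic_split(records, key, step=5, start=0):
    """Rank-order records and assign every step-th one to validation.

    start is the zero-based rank of the first validation record
    (an assumed detail; the text does not specify it).
    """
    ranked = sorted(records, key=key)
    validation_ranks = set(range(start, len(ranked), step))
    training = [r for i, r in enumerate(ranked) if i not in validation_ranks]
    validation = [r for i, r in enumerate(ranked) if i in validation_ranks]
    return training, validation


# Hypothetical bobwhite counts for 98 route-years, not the survey data.
counts = [(7 * i) % 101 for i in range(98)]
training, validation = systematic_split(counts, key=lambda c: c)
```

Because the records are ranked before every 5th one is removed, the validation set spans the full range of counts rather than clustering where counts happen to be common.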
Regression Analysis

We performed a multiple regression analysis to compare ANN performance with that of this traditional statistical model. We used the same data set used for training and validating the ANN model for the regression analysis. The full-model, multiple linear regression included all the independent variables and the dependent variable used in the ANN model. We used the statistical software package Statistix (Analytical Software 1996). We used the Student's t-test for determining which variables were contributing (P < 0.05) to the model predictions (Analytical Software 1996). The correlation between each model's predicted and actual bobwhite count was used as an indicator of the relative performance of each model.

Model Comparison

We used the percent contribution of each variable to the ANN model's predictions to identify important variables (Özesmi and Özesmi 1999). The percent contribution is calculated by dividing the sum-of-squared synaptic weights for the variable of interest by the total sum-of-squared synaptic weights for all variables. For the regression model, we determined each variable's contribution to the total, unadjusted R² using a forward stepwise regression (Wilkinson 1998). We calculated the increase in R² after each variable was entered into the model to apportion the amount of variance accounted for to each variable. We then divided each individual R² by the total unadjusted R² for the model. This gave the percentage contribution of each variable in the regression model to the model's response. This percentage is, therefore, analogous to the percent contribution of the ANN model. Although these percentage contributions are not directly comparable, they allowed us to determine what variables were driving each model.
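The percent-contribution calculation for the ANN model can be sketched as below. This is a hedged illustration rather than the exact computation performed by the software: it assumes that only the input-to-neuron synaptic weights enter the sums, and the variable names and weight values are hypothetical.

```python
def percent_contribution(input_weights):
    """Percent contribution of each input variable.

    input_weights maps a variable name to the list of synaptic weights
    connecting that input node to each neuron (assumed layout).
    """
    sums = {name: sum(w * w for w in ws) for name, ws in input_weights.items()}
    total = sum(sums.values())
    return {name: 100.0 * s / total for name, s in sums.items()}


# Hypothetical weights for a network with 3 inputs and 2 neurons.
weights = {
    "summer_temperature": [3.0, 4.0],   # sum of squares = 25
    "cattle_density": [0.0, 5.0],       # sum of squares = 25
    "previous_year_count": [5.0, 5.0],  # sum of squares = 50
}
contribution = percent_contribution(weights)
```

The contributions sum to 100% by construction, so each value can be read as that variable's share of the total squared weight, in the same way that each variable's increment in R² is read as a share of the total R² for the regression model.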
To determine if the differences in performance were due to the increased power of the ANN modeling technique, or to the increased parameterization of the ANN model, we used a sum-of-squares criterion for model comparison (Hilborn and Mangel 1997:114-117). This technique adjusts the sum of squared
Title  Northern Bobwhite Abundance in Relation to Climate, Weather, and Land Use in Arid and Semiarid Areas: a Neural Network Approach (Texas) 
Date  20040701 
Author  Lusk, Jeffrey J 
Keywords  Northern bobwhite, Climate, Weather, Land use, Arid, Semiarid, Neural network, Texas, Colinus virginianus 
Department  Zoology 
Full Text Type  Open Access 
Abstract  The purpose of this study was to garner a better understanding of northern bobwhite population dynamics in relation to weather and climate, as well as land use. I developed models to predict bobwhite abundance in Oklahoma and Texas using a neural network modeling approach. This method has several advantages over traditional statistical models. For example, neural networks are nonparametric and nonlinear. In addition, no a priori specification of the functional form of the relationships among predictors (independent) and response (dependent) variables is necessary. I developed models for weather variables (actual observed temperature and precipitation within a given year) and climate variables (deviations from long-term mean temperature and precipitation), and included variables representing the proportion of county area under cultivation and the density of livestock on noncultivated lands (land-use variables) in both climate and weather models. I used the adjusted sum-of-squares criterion to determine the relative ability of each model to describe the relationships in the data to determine whether bobwhites responded more to climate or weather factors. The models were then employed to predict possible changes in bobwhite abundance given predicted changes in climate resulting from 2 × atmospheric CO2 concentrations. Neural models accounted for between 23 and 78% of the variation in the training datasets. Traditional linear regression had a lower adjusted sum-of-squares, but neural models better represented the nonlinear relationships among predictors and the response variable. Summer temperatures were consistently important contributors to model output. In Oklahoma, land-use variables were less important to model predictions than they were in Texas. Bobwhites in Texas and Oklahoma were more sensitive to deviations from long-term mean weather than they were to actual yearly weather patterns, based on the adjusted sum-of-squares.
This result suggests that bobwhite populations might adapt to local conditions. In Oklahoma, cooler than average summer temperatures resulted in higher bobwhite abundances. Conversely, higher than normal summer temperatures were likely to result in lower bobwhite abundances. Likewise, positive deviations in summer temperature in Texas resulted in reduced abundance whereas negative deviations resulted in higher abundances. Given equilibrium climate conditions predicted as a result of 2 × atmospheric CO2 concentrations, bobwhite abundance in Texas and Oklahoma will likely decline.
Note  Dissertation 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
I would like to thank my advisor, Dr. Fred S. Guthery, for his guidance and encouragement during my studies at Oklahoma State University (OSU). He has been both a mentor and a colleague, and it has been an honor to have worked with him. He encouraged me to challenge existing knowledge and pervading paradigms, and provided a role model to emulate. I thank Dr. Samuel D. Fuhlendorf for serving on my committee, for his constructive and detailed comments on numerous manuscripts, and for his perspectives on landscape ecology and rangelands. I would also like to thank my other committee members, Drs. Ronald Masters and Stanley Fox, for their assistance and advice.

Several people have made my time in Stillwater more enjoyable. Most notably, I would like to thank Kim Suedkamp Wells, Heather Hansen (née Wilson), Charles Coley, Jill Brison, and Jon Forsman for their friendship, support, and encouragement. C. Coley also provided moral support and editing assistance during the writing of this dissertation. Finally, I would like to thank my parents for their constant love and support, even though they still are not certain what it is that I do.

Financial support for this project was provided by the Bollenbach Endowment and the Game Bird Research Fund through Dr. Fred S. Guthery. I was also supported by a Presidential Fellowship for Water, Energy and the Environment from the OSU Environmental Institute and a Doris and Eugene Miller Distinguished Graduate Fellowship from the OSU Foundation. Further support was provided by the Department of Forestry, Department of Zoology, Oklahoma Department of Wildlife Conservation, Texas Parks and Wildlife Department, and the Oklahoma Agricultural Experiment Station.

TABLE OF CONTENTS

PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

Chapter

1 GENERAL INTRODUCTION AND LITERATURE REVIEW

2 NEURAL NETWORK MODELING: AN APPROACH TO DISCRIMINATION AND PREDICTION
    Abstract
    Introduction
    Model Description
        Neural Network Architecture
        The Training Process
        Data Considerations
        Usage Considerations
        Neural Model Interpretation
        Accuracy Assessment
    Examples
        Gambel's Quail and Winter Precipitation
        Nest-site Characteristics of Northern Bobwhites
    Caveats
    Management Considerations

3 A NEURAL NETWORK MODEL FOR PREDICTING NORTHERN BOBWHITE ABUNDANCE IN THE ROLLING RED PLAINS OF OKLAHOMA
    Introduction
    Methods
        Artificial Neural Networks
        Database Construction
        ANN Construction, Training, and Validation
        Regression Analysis
        Model Comparison
        Simulation Analyses
    Results
    Discussion
    Conclusions

4 NORTHERN BOBWHITE (COLINUS VIRGINIANUS) ABUNDANCE IN RELATION TO YEARLY WEATHER AND LONG-TERM CLIMATE PATTERNS
    Abstract
    Introduction
    Methods
        Northern Bobwhites
        Abundance Indices
        Climate and Weather Variables
        Land-use Variables
        Neural Networks
    Results
        Neural Models
        Simulation Analyses
    Discussion
    Conclusions

5 RELATIVE ABUNDANCE OF BOBWHITES IN RELATION TO WEATHER AND LAND USE
    Abstract
97 Introduction............................................................................................................... 98 Methods...................................................................................................................... 101 Neural Network Architecture................................................................... 101 Database Construction................................................................................ 102 Model Interpretation..................................................................................... 103 Results......................................................................................................................... 106 Discussion.................................................................................................................. 117 Management Implications.................................................................................. 121 6 EFFECTS OF CLIMATE DEVIATIONS ON NORTHERN BOBWHITE ABUNDANCE IN TEXAS....................................................................................... 123 Introduction............................................................................................................... 123 Methods...................................................................................................................... 124 Results......................................................................................................................... 126 Discussion.................................................................................................................. 131 7 THE EFFECTS OF GLOBAL CLIMATE CHANGE ON NORTHERN BOBWHITE ABUNDANCE............................................................................................................ 135 Abstract...................................................................................................................... 
135 Introduction............................................................................................................... 136 Methods...................................................................................................................... 139 Results and Discussion........................................................................................ 141 8 CONCLUSIONS............................................................................................................................. 181 9 LITERATURE CITED..................................................................................................................... 187 vii LIST OF TABLES Table Page 2.1 Definitions of terms used in neural modeling, listed alphabetically.......................... 17 3.1 Parsimony analysis of the artificial neural network model and the regression model using the adjusted sumofsquares (Hilborn and Mangel 1997)................ 57 3.2 Contribution of each independent variable to the artificial neural network and regression models predictions of bobwhite abundance in the Rolling Red Plains of Oklahoma.......................................................................................................................... 58 4.1 Independent variable contributions to neural network predictions of normalized bobwhite counts (19911997) in Oklahoma based on weather and climate data. Percent contribution reflects the importance of a particular variable in determining a neural network s predictions relative to other variables.................................................................................................................................. 85 5.1 State and ecosystemlevel means for independent variables used to develop a predictive model for northern bobwhite abundance in Texas, 1978 1997.. 
105 5.2 Relevance (importance) of input variables in a 4neuron neural model developed to predict the abundance of northern bobwhites in Texas based on data collected during 1978 1997. Relevance is calculated as the sum of the squared weight of the variable of interest divided by the sum of squared weights for all inputs. The higher the relevance score, the more the variable contributes to the model s predictions and, therefore, gives the relative importance of each variable....................................................................................................... 107 viii LIST OF FIGURES Figure Page 1.1 Hypothetical relationship between abundance and temperature showing how the range over which a variable is measured in the field can determine the response type. Even if sampling crosses the depicted zones, the overall correlation might still be negative, positive, or nonexistent......................................... 12 2.1 A diagrammatic representation of a generic multilayer perceptron, neural network model. This MLP is a 321 network (3 input nodes, 2 neurons, and 1 output node) consisting of 3 layers: an input layer (A), a neuron layer (B), and an output layer (C). Nodes in 1 layer are connected to nodes in the preceding layer via synaptic weights (D). Each neuron also has an associated bias weight (E)........................................................................................................... 23 2.2 Hypothetical error surfaces resulting from particular combinations of synaptic weights. In (a), the error surface is relatively flat, and a MLP with initial synaptic weights randomly assigned any value in this range will eventually find the combination of synaptic weights that gives the global minimum prediction error. In (b), the error surface is hilly. A MLP may not be able to find the combination of connection weights resulting in a global minimum, but instead may become stuck in a local minimum.................................. 
31 2.3 Simulation results from the Swank and Gallizioli (1954) MLP model showing the predicted change in fall age ratio over the observed range of variation in total winter rainfall (cm). Data points represent observed fall age ratios. Inset: a diagrammatic representation of the 111 MLP used to model the data presented in Swank and Gallizioli (1954). The MLP contained 1 input node in the input layer (total winter rainfall), 1 neuron in the neuron layer, and 1 output node in the output layer (fall age ratio)..................................................... 39 2.4 Simulation results from the trained neural network model for differentiating random and nest locations based on vegetation characteristics on the Mesa Vista Ranch in Roberts County, Texas, 2001 2002. Results are presented only for variables with >10% contribution to the model s output: A) canopy height (cm), B) percent shrub cover, and C) bareground exposure (%). Dashed horizontal lines represent an arbitrary 0.5 cutoff threshold between suitable and unsuitable................................................................................................................. 42 3.1 Predicted bobwhite counts from the artificial neural network model plotted against the actual values in the (a) training data set and (b) the validation data set, for the Rolling Red Plains of western Oklahoma. The trend line represents the linear model regression of predicted bobwhite count on the actual bobwhite count................................................................................................................... 59 ix Figure Page 3.2 Predicted bobwhite counts from the full model regression plotted against the actual values in (a) training data set and (b) the validation data set, for the Rolling Red Plains of western Oklahoma. 
The trend line represents the linear model regression of predicted bobwhite count on actual bobwhite count.
3.3 Neural network simulation analyses (solid line) and regression predictions (dashed line) of the response of bobwhite counts in the Rolling Red Plains of western Oklahoma to mean monthly temperature in (a) June, (b) July, and (c) August. Temperature is reported in degrees Celsius, and the same scale was used for each plot.
3.4 Neural network simulation results (solid line) and regression predictions (dashed line) of the response of bobwhite counts to seasonal precipitation in the Rolling Red Plains of western Oklahoma. Winter months (a) included December, January, and February; spring months (b) included March, April, and May; and summer months (c) included June, July, and August. Precipitation is reported in centimeters, but each plot has its own scale.
3.5 Neural network simulation results (solid line) and regression predictions (dashed line) of the response of bobwhite counts in the Rolling Red Plains of western Oklahoma to (a) the proportion of county area in agricultural production, (b) cattle density on nonagricultural lands, and (c) the previous year's bobwhite count. Cattle density is reported as total number of head per km² of nonagricultural land.
4.1 Results of simulation analyses of the independent variables' effects on normalized bobwhite counts in Oklahoma using the weather neural network. Variables of interest are the observed weather conditions and landscape variables for a particular year: June (a), July (b), and August (c) temperature; winter (d), spring (e), and summer (f) precipitation; and the proportion of county area in cultivation (g), density of cattle on noncultivated land (h), and the previous year's normalized bobwhite count (i).
4.2 Results of simulation analyses of the independent variables' effects on normalized bobwhite counts in Oklahoma using the climate neural network. The variables in this network were the deviations of annual weather conditions from long-term mean conditions and landscape variables: deviation from long-term mean June (a), July (b), and August (c) temperature; deviation from long-term mean winter (d), spring (e), and summer (f) precipitation; and the proportion of county area in cultivation (g), density of cattle on noncultivated land (h), and the previous year's normalized bobwhite count (i).
5.1 Predicted versus observed northern bobwhite counts recorded by Texas Parks and Wildlife Department biologists during annual August surveys (1978–1997) for training data (A) and validation data (B) using a 4-neuron neural network. The trend line indicates the linear relationship between predicted and observed counts.
5.2 Predicted northern bobwhite counts from simulation analyses of the effects of June (A), July (B), and August (C) mean maximum temperature (°C) generated from the trained neural model using a data set in which the independent variable of interest varies between its minimum and maximum, and all other independent variables are held constant at their statewide mean (Table 5.1).
Dashed vertical lines indicate the mean value of the independent variable. The same scale was used for each plot's Y-axis to provide information on sensitivity.
5.3 Predicted northern bobwhite counts from simulation analyses of the effects of winter (A), spring (B), summer (C), and fall (D) rainfall (mm) generated from the trained neural model using a data set in which the independent variable of interest varies between its minimum and maximum, and all other variables are held constant at their statewide mean (Table 5.1). Dashed vertical lines indicate the mean value of the independent variable. The same scale was used for each plot's Y-axis to provide information on sensitivity.
5.4 Predicted northern bobwhite counts from simulation analyses of the effect of the proportion of county area in cultivation (A), head of livestock per hectare of noncultivated land (B), and previous year's bobwhite count (C). Predictions were generated from the trained neural model using a data set in which the independent variable of interest varies between its minimum and maximum, and all other independent variables are held constant at their statewide mean (Table 5.1). Dashed vertical lines indicate the mean value of the independent variable of interest. The same scale was used for each plot's Y-axis to provide information on sensitivity.
6.1 Predicted versus observed bobwhite abundance for counts recorded by Texas Parks and Wildlife Department during annual August surveys (1978–1997) for both training and testing/verification datasets using a 5-neuron neural network.
6.2 Predicted bobwhite abundance as a function of (a) the previous year's bobwhite count, (b) deviations from long-term mean June temperature, and (c) livestock density on noncultivated lands generated by the 5-neuron neural network. The variable of interest was varied incrementally from the maximum observed value to the minimum observed value while the remaining variables were held constant at their means. The scale of the y-axis in each graph is identical to provide information on the sensitivity of the model.
7.1 Predicted changes in northern bobwhite abundance in Texas based on climate change scenarios developed from the Goddard Institute for Space Science general circulation model (GISS GCM). Predictions were based on a 0.5×0.5° latitude/longitude grid and interpolated over the entire state using universal kriging.
7.2 Predicted changes in standard normal deviate of bobwhite counts in Oklahoma based on climate change scenarios developed from the Goddard Institute for Space Science general circulation model. The predictions were based on a 0.5×0.5° latitude/longitude grid and interpolated across the state using universal kriging.
7.3 Change in June temperature for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.4 Change in July temperature for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.5 Change in August temperature for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.6 Change in winter rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.7 Change in spring rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.8 Change in summer rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.9 Change in fall rainfall for Oklahoma as predicted by the Goddard Institute for Space Science general circulation model. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.10 Change in June temperature as predicted by the Goddard Institute for Space Science general circulation model for Texas.
The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.11 Change in July temperature as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.12 Change in August temperature as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the difference between model predictions at 2×CO2 and 1×CO2 concentrations.
7.13 Change in winter rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.14 Change in spring rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.15 Change in summer rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.
7.16 Change in fall rainfall as predicted by the Goddard Institute for Space Science general circulation model for Texas. The values represent the ratio of model predictions at 2×CO2 and 1×CO2 concentrations.

CHAPTER 1

GENERAL INTRODUCTION AND LITERATURE REVIEW¹

¹ This chapter was written to place the remaining chapters into a common context. It is not intended for publication.

The northern bobwhite (Colinus virginianus; hereafter, bobwhite) is an important game species over much of its range. Although declines have been noted since at least the 1880s (Errington and Hamerstrom 1936), bobwhite abundance typically follows a boom-or-bust pattern with considerable variation in numbers between and among years (Stoddard 1931, Stanford 1972, Roseberry and Klimstra 1984:130). Possible factors influencing long-term trends in bobwhite abundance include climate change, habitat loss, and land-use changes (Edwards 1972, Klimstra 1982, Brady et al. 1993, Schemnitz 1993, Rotenberry 1998). Further, harvest may be an additive, rather than compensatory, source of mortality in years of low production (Pollock et al. 1989, Johnson and Braun 1999, Guthery et al. 2000).

Before harvest and habitat management can be effective at maintaining stable, huntable populations, an understanding of the factors influencing bobwhite abundance that are not amenable to management, such as weather and climate, is required. It is further required that the interactions between climate, weather, and land use be elucidated, because it is against the backdrop of these effects that habitat and harvest management must operate.

Another issue of some importance is the effect of global change on wildlife, especially in the arid and semiarid regions of the United States (Guthery et al. 2000). As such, global change is an issue of concern to both conservation and wildlife management. With the knowledge garnered from investigations of the responses of bobwhite abundance to current climate, weather, and land-use patterns, managers may be better able to plan for the effects of future climate, as predicted by various global-change models.
Such planning will be a necessary part of any long-term management program (Irwin 1998), and could involve reserve-site choice or habitat manipulations designed to ameliorate the effects of climate.

In the United States, bobwhites range over much of the eastern and central parts of the country (Kaufman 1996). According to data from the North American Breeding Bird Survey (NABBS), bobwhite populations in the US show a long-term rate of decline of 2.40% per year (Church et al. 1993, Sauer et al. 1997). This rate of decline increased between 1982 and 1991 to 3.50% per year (Church et al. 1993). In Oklahoma, the long-term rate of decline has not been as severe, averaging only 0.20% per year (Sauer et al. 1997). However, short-term trends indicate a significant decline. The 10-year population trend for the period 1986–1996 indicates a 3.88% per year decline, and the 3-year trend (1993–1996) indicates populations are declining at a rate of 7.26% per year (Sauer et al. 1997). In Texas, the long-term rate of decline is 2.00% per year, with short-term declines of 6.43% per year (10-year trend) and 20.09% per year (3-year trend) (Sauer et al. 1997).

Although the above-cited declines may be cause for concern among wildlife managers, these changes in average abundance through time provide a reference frame from which to determine population status. As mentioned previously, bobwhite populations tend toward boom-or-bust dynamics across their range (Stoddard 1931, Stanford 1972, Roseberry and Klimstra 1984:130). In the US, the mean number of bobwhites counted per NABBS route over the years 1966–1996 was 20.95. In Oklahoma and Texas, the mean was 47.12 and 33.21, respectively (Sauer et al. 1997). Considering shorter intervals, the 10-year mean in Oklahoma is 44.59 bobwhites per NABBS route, and in Texas 26.37 bobwhites per NABBS route. The 3-year means for 1993–1995 are 37.83 and 21.55 bobwhites per NABBS route in Oklahoma and Texas, respectively (Sauer et al. 1997).
Therefore, trends in bobwhite populations may not be as severe as suggested by the percent declines.

The importance of various weather factors in determining avian abundance varies both with the species being considered and with latitude. Temperature is a controlling factor in northern latitudes, especially over the winter period. In southern latitudes, rainfall and moisture tend to be more important than temperature (Newton 1998:288), but summer temperature can also have important effects on the reproductive biology of a species (Leopold 1933, Robinson and Baker 1955, Speake and Haugen 1960, Guthery et al. 2001), thereby influencing abundance measured in the autumn. Among gallinaceous birds, young are often susceptible to both rainfall and temperature (Sumner 1935, Newton 1998:288).

Weather effects may manifest through both direct and indirect means. Direct effects such as hyper- and hypothermia are obvious, but weather's indirect effects may be more difficult to detect. Weather may act indirectly on abundance through both food availability and habitat suitability (Swank and Gallizioli 1954, Sowls 1960, Newton 1998), and may be moderated or accentuated by both the length and intensity of the weather event (Leopold 1931, Elkins 1995). For example, insect prey is essential for successful brood-rearing among quail (Hurst 1972), and the availability of such prey is determined, in part, by rain and temperature (Elkins 1995). Periods of drought and high temperature will reduce the amount of insect prey available and, therefore, reduce production (Newton 1998:289). Further, these impacts on production might increase in magnitude with the length of the drought. In addition, the effects of weather on a species are not constant, but vary with the average physical condition of the local population.
If a drought is of sufficient duration, the population may be food-stressed and less able to withstand the vagaries of weather than a population that has not experienced a food shortage but is exposed to the same weather conditions (Newton 1998:289).

Rainfall and temperature both influence quail dynamics (Edwards 1972, Stanford 1972, Campbell et al. 1973, Roseberry and Klimstra 1984, Giuliano and Lutz 1993), but the effects vary with region. Investigations of weather effects also differ in how they define weather variables, such as summer rain, and in the estimates of population parameters used. Consequently, reported results are not directly comparable and often lead to confusion about the exact effects of weather on quail production and population status.

In arid regions, rainfall is the most influential weather component for avian survival and production (Newton 1998), is an important determinant of abundance, and can affect various demographic components of bobwhites. In drier environments in south Texas, the bobwhite's breeding season ends 2 months earlier than in more mesic environments (Guthery et al. 1988). Summer rainfall (April–August) was highly, positively correlated with hunter success for scaled quail (Callipepla squamata) in eastern New Mexico (Campbell 1968). Rainfall may be more critical during certain periods of the life cycles of quail species than during other periods. Heffelfinger et al. (1999) found that midwinter (December–January) rainfall affected calling behavior of Gambel's quail (Callipepla gambelii) more than rainfall during early (October–November) or late (February–March) winter. In arid and semiarid regions of Oklahoma and Texas, spring and summer rainfall might be particularly important (Stanford 1972). However, Campbell et al. (1973) did not find a significant correlation between May–June or April–July rainfall and scaled quail production in New Mexico.
A lack of linear correlation between environmental and response variables does not necessarily indicate a lack of relationship between the variables (Laasko et al. 2001). Summer rainfall (July–August) had the greatest influence on scaled quail production (Campbell et al. 1973), with most of the response due to August rainfall alone (Campbell 1968). Percent juveniles in the fall bobwhite harvest was positively related to the average total rainfall between May and August in Alabama (Speake and Haugen 1960). Bobwhite production in Louisiana responded positively to increasing summer precipitation, with highest production occurring when precipitation exceeded 762 mm (Reid and Goodrum 1960). June rainfall in Texas was only weakly related to bobwhite abundance (Giuliano and Lutz 1993). Recent work by Bridges et al. (2001) in Texas showed that, although 12-month rainfall totals were positively correlated with bobwhite abundance in the South Texas Plains, the 12-month Modified Palmer Drought Severity Index (PMDI; an index of rainfall that accounts for soil type and moisture, temperature, and evaporation) was more strongly correlated with bobwhite abundance. They also reported that monthly PMDIs were positively correlated with bobwhite abundance in the Cross Timbers and Prairies (November–February; rs ≥ 0.57), Edwards Plateau (September–November; rs ≥ 0.59), Rolling Plains (September–February, April, June; rs ≥ 0.56), and South Texas Plains (October–July; rs ≥ 0.56), whereas raw rainfall amount was positively correlated with bobwhite abundance only in the South Texas Plains.

Although snowfall sufficient to kill bobwhites occurs in parts of their range, snowfall is probably not a major concern in arid and semiarid regions. In these regions, however, winter rainfall can still influence quail production. The effects of winter rain, again, vary by species and region.
Percent juveniles in fall populations of scaled quail showed a nonsignificant, negative relationship with winter (October–March) rainfall in both pre- and postharvest samples (Campbell et al. 1973). However, in an earlier study of scaled quail in the same area, winter rainfall (October–March) showed a nonsignificant, positive correlation with hunter success, which is assumed to be an index of abundance (Campbell 1968). Giuliano and Lutz (1993) found that scaled quail abundance in Texas was positively correlated with winter rainfall. Bobwhite harvest in Illinois was positively related to winter rainfall (Edwards 1972), whereas, in Texas, abundance showed a nonsignificant, negative correlation with winter rainfall (Giuliano and Lutz 1993). California quail (Callipepla californica) age ratios were positively correlated with winter (January–March) rainfall in California (Francis 1970).

Temperature may be a less important factor in quail production than rainfall (Edwards 1972), or may only be important below some critical threshold of precipitation (Robinson and Baker 1955, Heffelfinger et al. 1999). However, this might not hold for arid and semiarid regions where operative temperatures may exceed the thermotolerance limits of many species (Forrester et al. 1998, Heffelfinger et al. 1999, Guthery et al. 2001). In such areas, high temperatures reduce the amount of space-time available for use by a species (Guthery 1997, Forrester et al. 1998, Heffelfinger et al. 1999). Klimstra and Roseberry (1975) reported that July–August (summer) temperatures affected the end of the bobwhite nesting season. Therefore, the effects of temperature will be of critical importance to bobwhite production in the more southern areas of its range, if temperatures increase due to global change. Forrester et al.
(1998) found that bobwhites avoided patches in which the operative temperature (a metric that takes account of the ambient air temperature plus the heating effects of sunlight and the cooling effects of airflow) exceeded 39 °C and, as a result, 50% of the available habitat space-time was unusable to bobwhites during all seasons. The age ratio of bobwhite populations in Louisiana in winter responded positively to mean maximum monthly temperature in all months, but responded negatively to the highest maximum monthly temperature (Reid and Goodrum 1960). Therefore, high seasonal temperatures can affect production. For example, the length of the laying season in Illinois was reduced by 12 days for every 1 °C increase in the July–August temperature (Klimstra and Roseberry 1975). In Alabama, the percent juveniles in the fall harvest was negatively correlated with the total deviation from mean monthly temperatures from May through August (Speake and Haugen 1960). Reid and Goodrum (1960) reported that bobwhite production was suppressed in hot years compared with cooler years. Hot, dry conditions reduced the percentage of female bobwhites in laying condition in south Texas (Guthery et al. 1988). Male bobwhites reduced calling behavior by 86.4% in a hot year compared with a cooler year (Guthery et al. 2001). It seems likely that bobwhites adjust their reproductive activities based on ambient weather conditions in a particular year, thereby favoring long-term survival and maximizing lifetime reproductive output.

However, other studies in higher-latitude areas found no strong effect of temperature on production and recruitment. For example, Edwards (1972) did not find consistent effects of mean monthly temperature on bobwhite harvest in Illinois. Further, Roseberry and Klimstra (1984) found no relationship between bobwhite recruitment and mean average daily temperature or mean maximum daily temperature.
Although temperature reduced the length of the bobwhite breeding season, it did not decrease the proportion of young produced in a given year that entered the breeding population. That is, juvenile survival was not reduced.

The effects of temperature and rainfall can interact in influencing bobwhite abundance. Rainfall masked the effects of temperature on bobwhite production in Kansas (Robinson and Baker 1955). When precipitation was below some threshold amount, temperatures above 23.3 °C reduced bobwhite production, but there was little effect when rainfall exceeded this threshold (Robinson and Baker 1955). Combinations of low rainfall (drought) and high temperatures reduced bobwhite recruitment (Stanford 1972, Hurst et al. 1996). Guthery et al. (2002) report that temperature and rainfall influence age ratios of bobwhites in south Texas in complex, nonlinear ways, and suggest that low temperatures can mitigate the negative effects of drought and that high temperatures can eliminate the positive effects of rainfall.

Habitat provides all life requisites for an individual organism (Hall et al. 1997), and is, therefore, an important factor in understanding a species' abundance and distribution. Human use of the landscape can have considerable effects on its suitability as habitat for wildlife. Whereas the amount of land area converted for human use influences population dynamics, the spatial pattern of this fragmentation is also of concern (Hanski 1999). Further, different land uses will affect wildlife populations to different extents. That is, not all land-use practices are incompatible with wildlife. Human land-use practices fall into 2 broad categories: 1) urban development resulting in land being converted to residential, commercial, or industrial use, and 2) agricultural development resulting in land being converted to the production of food for humans or domesticated animals.
Although cropland is a dominant agricultural land use in the northern and eastern portions of the bobwhite's range, in the west, grazing may be more pervasive. Around 70% of western land area is grazed (Fleischner 1994). In Texas, approximately 53,140,000 ha, or 76.8% of the land area, is in agriculture, with 65.5% of that area rangeland and 28.7% cropland (USDA NASS, Census of Agriculture 1997). In Oklahoma, approximately 13,443,000 ha, or 74.2% of the land area, is agricultural land, of which 46.5% is rangeland and 44.7% is cropland (USDA NASS, Census of Agriculture 1997). Therefore, grazing and cultivation are important land uses that affect the amount of usable habitat space-time (Guthery 1997) available for bobwhites. As the predominant land uses in these states, livestock grazing and cultivation undoubtedly influence the abundance, distribution, and population dynamics of a variety of wildlife species (Barnes et al. 1991).

The conversion of habitat from native vegetation to row crops often converts what was once a heterogeneous landscape into a monoculture. Early agricultural practices, typified by many small, family-owned farms, resulted in a pattern of land use referred to as patchwork agriculture, which was believed to enhance wildlife abundance through the creation of edge between cultivated fields and windbreaks and fencerows (Leopold 1933). Modern agriculture, however, relies on clean-farming practices, which favor large fields with few fencerows or windbreaks. Cultivated crops may serve as a food source for some wildlife species. Roseberry and Klimstra (1984) report that unharvested grain served as the only food source for bobwhite coveys during a prolonged snow cover in southern Illinois. The benefit to bobwhites from these unharvested grains depends on the juxtaposition of standing crops to suitable bobwhite winter habitat. In southern Illinois, much of the agricultural landscape is still in a patchwork arrangement (J.
Lusk, personal observation) and, therefore, such juxtapositions occur frequently. However, the value of food plots and cultivated cropland for bobwhites in other areas where such juxtapositions are rare is probably nil, mostly because bobwhite populations cannot survive in such landscapes.

Livestock grazing does not usually result in the total transformation of the vegetation community, but, depending on the intensity and periodicity, can alter the structural complexity and species composition of the habitat and thereby affect its suitability (Fleischner 1994). Whether these habitat changes will increase or decrease suitability depends on the magnitude of the changes (Severson and Urness 1994). Further, changes that favor a particular species may disfavor another species (Barnes et al. 1991, Severson and Urness 1994). Structural changes include changes in vegetation stratification leading to a reduction in structural complexity (Fleischner 1994). Grazing can also reduce the amount of litter and increase the amount of bare ground, which in some cases can alter plant phenology (Kaufman et al. 1983). Changes in litter and ground cover can increase soil compaction and thereby reduce water infiltration (Orr 1960, Orodho et al. 1990), which can have nontrivial effects on plant communities, especially in arid and semiarid regions (Fleischner 1994). Grazing was the primary influence on grassland species composition in the Edwards Plateau ecoregion in Texas (Fuhlendorf and Smeins 1997, Fuhlendorf et al. in press). However, interannual precipitation was correlated with plant basal area (Fuhlendorf et al. 2001). Precipitation and grazing also interacted in determining species composition, where moderately grazed and ungrazed areas were more resilient to the effects of severe drought than heavily grazed areas (Fuhlendorf and Smeins 1997). These grazing effects on the vegetation community will indirectly affect bobwhite abundance.
Bobwhites have adapted to a variety of habitats from the eastern coast of the United States west to the Rocky Mountains. Within these longitudes, bobwhites have adapted to conditions from temperate latitudes in Wisconsin to subtropical, semiarid, and arid latitudes throughout the southern US and south to Costa Rica. Within the array of habitats the bobwhite occupies, there are many configurations of habitat types that are equally optimal (Guthery 1999). Many authors have qualitatively described bobwhite habitat in various regions. For example, Edminster (1954) reported bobwhite habitat included grassland, cropland, brushy cover, and woodland habitat types. In south Texas, optimal habitat configuration typically consisted of 53% woody canopy coverage, 38% herbaceous canopy coverage, and 44% bare ground (Kopp et al. 1998). In southern Illinois, bobwhites were associated with patchy landscapes with moderate levels of grassland and row crops, and high levels of woody edge (Roseberry and Sudkamp 1998).

Although there is a great deal of ecological slack in the optimal composition of bobwhite habitat (Guthery 1999), the structural changes brought about by grazing could have the greatest impact on bobwhite abundance. Grazing may increase the amount of bare ground in an area (Fleischner 1994) and decrease amounts of certain grass species (Severson and Urness 1994). These changes have been associated with increases in bobwhite use (Schulz and Guthery 1988). Peak bobwhite abundance occurred in pastures under a rapid-rotation grazing system rather than under continuous grazing (Hamerquist and Crawford 1981, Schulz and Guthery 1988). Given that the optimal seral stage for bobwhites varies with the overall productivity of the habitat (Spears et al. 1993), the effects of grazing on bobwhite abundance may also vary among areas and habitat types.
The research reported herein was intended to address several issues of importance to bobwhite management in the arid and semiarid regions of their range, and attempted to resolve some of the ambiguity apparent in previous investigations of bobwhite–weather relationships. I employed an artificial neural network technique to model bobwhite abundance in relation to climate, weather, and land use. I then used these models to predict the changes in bobwhite abundance that could be expected under the equilibrium climate projected for twice the current atmospheric CO2 concentration (IPCC 1998).

The research reported herein is important for several reasons. First, little research into the population dynamics of grassland birds has been undertaken to date, despite the fact that declines among these species have been of greater magnitude and of a more persistent trend than for the more-studied, neotropical-migrant forest species (Herkert and Knopf 1998, Rotenberry 1998). Conservation efforts for many grassland species of concern are hampered by a lack of data on aspects of their ecology (Herkert and Knopf 1998). Further, because indirect methods are commonly used to obtain demographic data, estimates of demographic parameters based on these data might be biased or imprecise (Pollock et al. 1989, Shupe et al. 1990, Clobert and Lebreton 1991, Roseberry and Klimstra 1992). The nature of the relationship between bobwhite production and climate, weather, and land use is unclear at this time. This lack of clarity results from a multitude of studies with largely contradictory results. These contradictions might result from differences in variable definition and selection, or from the use of linear analysis techniques. Linear analyses, such as correlation and regression, are not conducive to determining functional relationships among variables when the functional relationship is nonlinear.
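The limitation of linear analyses noted above can be illustrated with a minimal sketch. It assumes a purely hypothetical Gaussian-shaped abundance response with an arbitrary optimum of 30 °C and width of 5 °C; the function names and all numeric values are illustrative assumptions, not estimates from any dataset. The same response function is sampled over 3 different temperature ranges and the Pearson correlation is computed for each.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def abundance(temp, optimum=30.0, width=5.0):
    """Hypothetical symmetric, unimodal response (illustrative values only)."""
    return 100.0 * math.exp(-((temp - optimum) ** 2) / (2.0 * width ** 2))

def correlation_over_range(low, high, steps=21):
    """Correlation between temperature and abundance over one sampled range."""
    temps = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    return pearson_r(temps, [abundance(t) for t in temps])

r_below = correlation_over_range(20.0, 30.0)   # range entirely below the optimum
r_across = correlation_over_range(25.0, 35.0)  # range centered on the optimum
r_above = correlation_over_range(30.0, 40.0)   # range entirely above the optimum
print(round(r_below, 2), round(r_across, 2), round(r_above, 2))
```

Under these assumptions, the sign and strength of the correlation depend only on which part of the response curve was sampled, even though abundance is fully determined by temperature in every case.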
For example, correlation coefficients may indicate a positive or negative response to variation in another variable, but the lack of a strong correlation may not be indicative of a lack of relationship between the variables (Laasko et al. 2001). Furthermore, nonlinear biological responses to environmental variation can sometimes result in either spurious positive or negative correlations depending on the functional response of the biological system and the pattern of environmental variation (Laasko et al. 2001). For instance, if bobwhite abundance varies in a symmetric, unimodal fashion with temperature, then, depending on the observed range of temperatures with respect to the abundance response function, there may be positive, negative, or no relation apparent from the correlation coefficients, even when temperature is a strong forcing variable for bobwhite abundance (Fig. 1.1). Therefore, a nonlinear analysis approach is necessary to clarify these relationships and to confirm or reject results obtained using traditional linear approaches. Second, the neural models resulting from my analyses were used to predict bobwhite abundance in the fall, prior to the hunting season. As such, the Oklahoma Department of Wildlife Conservation and the Texas Parks and Wildlife Department can use them to forecast fall harvests in advance of their fall roadside counts, thereby giving them more time to act on this information. This information may also be used by managers and conservation biologists to develop proactive management plans in the light of global climate change. Because the bobwhite is an important game species, its management and conservation are of immediate concern to state wildlife managers. 
Declining bobwhite populations could lead to decreased revenue from the sale of hunting licenses and decreased funding from contributions to the Federal Aid in Wildlife Restoration program, and, therefore, these state agencies must begin planning to minimize the impact climate change might have on bobwhite populations within their jurisdictions.

Third, research is only a part of the management process. To be useful for management, research must be conveyed to managers in a manner in which they can apply it to the decision-making process (Hejl and Granillo 1998, Young and Varland 1998). My research will provide managers with a method both for forecasting fall bobwhite harvests and for understanding bobwhite responses to weather conditions. The former provision will assist in setting bag limits and season lengths, and in redirecting hunters from low-abundance areas. In addition, the results can be used to develop long-term management plans.

Fig. 1.1. Hypothetical relationship between abundance and temperature showing how the range over which a variable is measured in the field can determine the response type. Even if sampling crosses the depicted zones, the overall correlation might still be negative, positive, or nonexistent. [Figure not reproduced: abundance (vertical axis) plotted against temperature (horizontal axis), with regions labeled Zone of Positive Correlation, Zone of Zero Correlation, and Zone of Negative Correlation.]

Finally, the results of this research can be used to better understand the impacts of climate change on species abundance and distribution in the central United States. Evidence for the effects of climate change on species ecology continues to mount. Changes in plant phenology will have concomitant effects among vertebrate species that rely on them for food or shelter. Many species have evolved life-history characteristics synchronized with seasonal changes in resource availability, but that are only weakly coupled to actual changes in the resource (Myers and Lester 1992, Root 1993).
That is, species might synchronize their life history with resource availability via proximate cues (e.g., photoperiod). Changes in climate might alter or negate the relationship between the cue and the underlying resource (e.g., plant seed abundance), resulting in a decoupling of life history from resource base, and reduction in production and abundance. Community structure will also likely be affected by climate change, because each species in the community will respond to changes differently. However, such changes in community structure will result in changes in community dynamics, which will also affect the individual species. Although the models presented herein cannot address all of the complexities of the impacts of climate change on bobwhite populations, they can show how abundance and distribution will change in response to climate change alone. From this base, management actions can be focused on areas in which bobwhite abundance is predicted to be greatest or least. Also, further research can begin to investigate the interactions between climate, land use, and community reorganization.

CHAPTER 2

NEURAL NETWORK MODELING: AN APPROACH TO DISCRIMINATION AND PREDICTION¹

Abstract

Neural network modeling offers wildlife biologists a powerful technique for finding patterns in large, multivariate datasets. Because neural network modeling is appearing more frequently in the ecological literature, we provide a descriptive overview of this approach to data analysis in wildlife research, and discuss its merits and drawbacks. Neural networks offer a powerful alternative to traditional prediction and discrimination models, especially where little or no a priori information about the relationships among variables exists.
Neural networks are nonparametric, can model linear and nonlinear relationships, are unaffected by multicollinearity, and can be applied to prediction and discrimination problems; the same model can simultaneously predict multiple dependent variables or discrimination classes. However, because of the structure of neural networks, biological interpretation of model output is not straightforward and requires additional simulations. Further, neural models can become overfit and lose the ability to generalize to new data. Focusing on 1 type of neural network, the backpropagation, multilayer perceptron, we provide a prediction and a discrimination example of the technique using published data.

¹ Manuscript prepared for submission to Wildlife Society Bulletin. Second author: Dr. Fred S. Guthery.

Introduction

An artificial neural network (ANN) is one of a suite of machine learning techniques currently being applied in ecology (Fielding 1999b). Other machine learning techniques include genetic algorithms (Mitchell 1998, Jeffers 1999) and cellular automata (Dunkerley 1999). Although other types of ANNs exist (Boddy and Morris 1999), the type we describe is a feedforward, backpropagation multilayer perceptron (Smith 1996; hereafter MLP). We chose the MLP because it is the simplest and most widely used technique in the ecological literature. This type of neural network was originally developed as a model of cognition and learning in the human brain (Rumelhart et al. 1986, Smith 1996, Boddy and Morris 1999, Stevens-Wood 1999). As such, the associated terminology borrows heavily from neurobiology (Table 2.1).

The use of neural network models in ecology is increasing and current applications include statistical modeling. The technique is nonparametric and, therefore, makes no distributional assumptions about the data. Applications thus far have dealt with comparing the performance of MLPs with that of traditional statistical methods.
These comparisons have typically shown that MLP models outperform more traditional analyses such as linear regression based on accuracy of predictions (Recknagel et al. 1997, Maier et al. 1998). For example, Olson and Cochran (1998) applied an MLP to model aboveground biomass in the tallgrass prairie. Compared to a regression model, their MLP model more accurately predicted standing biomass and predicted changes in biomass with greater accuracy (Olson and Cochran 1998). An MLP predicted the species diversity of arthropod assemblages in wet soil habitats more accurately than a multiple linear regression analysis (Lek-Ang et al. 1999). Özesmi and Özesmi (1999) compared the performance of an MLP with that of logistic regression in the classification of locations in a GIS database. These locations represented either nest or non-nest sites for red-winged blackbirds (Agelaius phoeniceus) and marsh wrens (Cistothorus palustris). They reported that in all but 1 case the MLP outperformed logistic regression (Özesmi and Özesmi 1999).

Table 2.1. Definitions of terms used in neural modeling, listed alphabetically.

Term | Definition
Backpropagation | An algorithm that sends errors detected in the output sequentially back through the model to adjust synaptic and bias weights (parameters)
Bias weight | Weights attached to each neuron in the neuron and output layers; analogous to an intercept in a regression equation
Hidden layer(s) | One or more layers of neurons in a multilayer perceptron; also called a neuron layer and the layer of processing elements
Input layer | Layer containing the input nodes (independent variables) in a multilayer perceptron
Input node | Data used as predictors; synonymous with independent variables in traditional statistical models
Learning | The iterative change in synaptic weights resulting in a reduction of the mean square prediction error; the process of finding relationships among variables and producing an appropriate response for a given set of input data; also called training
Learning rate | A value determining the magnitude of changes made to the synaptic weights during the training process
Learning rule | A rule governing how a synaptic weight can be adjusted to minimize the mean square prediction error; examples include steepest descent and conjugate gradient
Momentum | A value determining the number of past iterations to consider when adjusting synaptic weights; reduces instabilities and oscillations in the prediction error
Multilayer perceptron | A type of neural network model which uses a backpropagation technique to simulate cognition and learning in the brain; used in statistical modeling to find nonlinear and linear patterns in large, multivariate datasets without assumptions inherent in parametric techniques
Neural network | A machine learning technique used to simulate the function of the brain
Neuron | A component of the neuron layer of a multilayer perceptron; transforms the weighted sum of the input variables using a transfer function such as the sigmoid transfer function
Neuron layer | One or more layers of neurons in a multilayer perceptron; also called the hidden layer and the layer of processing elements
Output layer | Layer containing the output node(s) in a multilayer perceptron
Output node | Data being predicted by a multilayer perceptron; synonymous with the dependent variable in traditional statistical models
Overfitting | A problem in modeling in general and neural modeling in particular in which a model too closely approximates the data used for model development, and which, therefore, generalizes poorly to new data
Processing elements | One or more layers of neurons in a multilayer perceptron; also called the hidden layer or neuron layer
Relevance | An index of the contribution of each input variable to the predictions; a measure of the importance of an input node based on the synaptic weights
Logistic transfer function | A transformation applied to the weighted sum of input variables in order to approximate the underlying function or relationships among input and output variables
Stimuli | Another way of referring to the input data in a neural network model which maintains the neurological analogy
Synaptic weights | Weights applied to the input variables and neurons in order to produce accurate predictions of the output variable and which are adjusted during the learning process; contain information about the relationships among input and output data; analogous to regression coefficients
Training | See learning.
Training data | Data used during the training process to determine patterns among input and output variables and to adjust synaptic weights to minimize the mean square prediction error; a portion of the total dataset from which the MLP learns
Validation data | Data used during or after the training process to evaluate the MLP's performance to prevent overfitting and determine how well the MLP predicts from novel data; data not used to adjust synaptic weights during training

Manel et al. (1999) compared MLPs with logistic regression and multiple discriminant analysis for predicting bird species occurrences, and found that the MLP correctly classified more cases than the other 2 methods.
However, they concluded that, based on Receiver Operating Characteristic plots (Fielding 1999a), the logistic model was the better model, but that it was sensitive to the prevalence of positive cases (occupied sites) in the data (Manel et al. 1999). Using an adjusted sum-of-squares technique, which penalizes models for their complexity (Hilborn and Mangel 1997), we found that a multiple linear regression model outperformed a neural model in predicting bobwhite (Colinus virginianus) abundance based on weather and land-use characteristics (Lusk et al. 2002). However, the neural model provided a better understanding of how bobwhite populations respond to climate.

In addition to the above comparisons with traditional statistical techniques, other researchers have applied MLP models to a variety of research questions. Multilayer perceptron models successfully predicted call counts and age ratios for Gambel's quail (Callipepla gambelii) from precipitation and temperature data (Heffelfinger et al. 1999); occurrences of 3 small-bodied fish in freshwater streams in >80% of the cases (Mastrorillo et al. 1997); and abundances of trout (Salmo trutta) based on habitat characteristics (Baran et al. 1996, Lek et al. 1996a). An MLP model allowed wildlife managers in southern France to predict the impact of wild boar (Sus scrofa) damage to agricultural crops, allowing more efficient use of limited funds (Spitz and Lek 1999). In our research, we have applied MLP models to predict northern bobwhite abundance in western Oklahoma (Lusk et al. 2002) and to determine the relative importance of long-term climate and short-term weather patterns in determining their abundance (Lusk et al. 2001). Multilayer perceptrons can provide accurate predictions for management planning and decision making (Lein 1997), and a deeper insight into the ecological and biological processes at work (Colasanti 1991, Edwards and Morse 1995, Lek et al. 1996b).
The main advantage of the MLP is that it can find patterns in large, multivariate datasets without the assumptions inherent in regression and other techniques. This is true because an MLP represents a function as a sum of terms, and any continuous function, under mild constraints, can be represented as a sum of terms. Wildlife researchers may be familiar with other sum-of-terms models, such as the kernel estimator used in home-range estimation (Worton 1989) and the Fourier series used in line transect analyses (Buckland et al. 1993).

Our objective is to introduce MLP modeling to wildlife managers and scientists. We 1) briefly explain the theory behind neural modeling, 2) describe the structure and terminology of the neural modeling method, with specific regard to the MLP, 3) provide examples of the application of neural models to the problems of prediction and discrimination, and 4) discuss the strengths and weaknesses of the approach.

Model Description

Neural Model Architecture

The MLP may be arranged in a series (≥ 3) of layers (Fig. 2.1). The first layer is called the input layer, which contains 1 input node for each independent variable. Input nodes are analogous to the independent variables in multiple regression. The input nodes can be considered stimuli in the neurological sense. The second layer is referred to as the hidden layer, the neuron layer, or the layer of processing elements. The neuron layer contains ≥ 1 set of neurons, the number of which determines the complexity of patterns that can be detected (Smith 1996:25). The neuron layer processes the data to predict the dependent variable(s) in the third layer, called the output layer. The output node(s), or dependent variable(s), represent the desired response. Elements in each layer may be connected to every element in the preceding layer via synaptic weights.
The synaptic weights store the information learned (see below) by the network during the training process, and are analogous to regression coefficients (Heffelfinger et al. 1999), but their interpretation is not as straightforward. Typically, each node in 1 layer is connected to every node in the preceding layer (Fig. 2.1), and, as such, the neural network is termed fully connected (Smith 1996, Boddy and Morris 1999).

Fig. 2.1. A diagrammatic representation of a generic multilayer perceptron, neural network model. This MLP is a 3-2-1 network (3 input nodes, 2 neurons, and 1 output node) consisting of 3 layers: an input layer (A), a neuron layer (B), and an output layer (C). Nodes in 1 layer are connected to nodes in the preceding layer via synaptic weights (D). Each neuron also has an associated bias weight (E). [Figure not reproduced.]

The Training Process

The development of an MLP model can be thought of as a process in which a network attempts to learn an appropriate response (e.g., a population abundance or a classification of used or unused) to a given set of stimuli. Training (or learning) is simply the rote method (see below) of adjusting parameters (biases and synaptic weights) such that prediction or discrimination becomes more accurate as parameters are iteratively adjusted. Biologists are familiar with least-squares regression using linear models, which attempt to maximize prediction accuracy by minimizing the sum of squared errors. The MLP operates under the same error minimization goal. However, because of nonlinearity and other model complexities, there is no analytical solution for minimization; the model must minimize error by using a learning rule that changes synaptic weights iteratively, so that the mean squared error may be reduced each iteration. During this process, which is called training (or learning), the synaptic weights begin to represent the relationships among input and output variables. In this way, the model is said to learn.
Initially, an MLP has little or no ability to predict or discriminate because synaptic weights are set at small, random values (Smith 1996:22). Each neuron processes the incoming stimuli by first multiplying each input by the appropriate synaptic weight (Hagan et al. 1996:2-7–2-8). These products are then summed together and a bias weight is added (Hagan et al. 1996, Smith 1996). The bias weight is analogous to the intercept in regression analysis. This result, u, is then transformed using a transfer function. The most widely used transfer function is the logistic transfer function

$$g(u) = \frac{1}{1 + e^{-u}}.$$

The use of a logistic transfer function allows nonlinear relationships between the independent and dependent variables to be detected and learned. The processed stimuli, g(u), are then sent to an output node. At the output node, another transformation is applied to the processed stimuli, the result of which is a scaled prediction of the dependent variable(s) (Smith 1996). This second transformation can be the same as that applied at the neurons, but more often a linear transformation is applied (Hagan et al. 1996). The model predictions can be considered a response to the incoming stimuli.

Next, the predictions generated by the model are compared with the actual values of the dependent variable(s). The prediction error is calculated and backpropagated through the network to adjust the synaptic weights. Backpropagation means that the biases and synaptic weights are first adjusted for the synapses between the neurons and the output nodes, and then adjusted for the synapses between the neurons and the input nodes; i.e., information on error is sent backwards through the model. The error is apportioned among the various synaptic weights using the chain rule of calculus (Haykin 1999:162).

The adjustment of synaptic weights is governed by 3 factors. The first is the learning rule, which determines how the MLP will adjust the synaptic weights.
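The forward pass and backpropagation step described above can be sketched for a 3-2-1 network. This is a minimal illustration under stated assumptions, not the software used in our analyses: the function names, the single made-up training case, the learning rate of 0.1, the squared-error criterion for one case, and the choice to update weights after each case are all hypothetical choices made for brevity. Hidden neurons use the logistic transfer function and the output node is linear.

```python
import math
import random

def logistic(u):
    """Logistic transfer function g(u) = 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, params):
    """Forward pass: weighted sums plus biases, logistic neurons, linear output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [logistic(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    prediction = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, prediction

def backprop_step(x, target, params, learning_rate=0.1):
    """One weight update for a single case; returns new weights and prior error."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, prediction = forward(x, params)
    error = prediction - target  # derivative of 0.5 * error**2 w.r.t. prediction
    # Gradients for the synapses between the neurons and the output node.
    grad_w_out = [error * h for h in hidden]
    # Chain rule: pass the error back through each output weight and the
    # slope of the logistic function, h * (1 - h), at each neuron.
    delta_hidden = [error * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - learning_rate * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - learning_rate * d for b, d in zip(b_hidden, delta_hidden)]
    new_w_out = [w - learning_rate * g for w, g in zip(w_out, grad_w_out)]
    new_b_out = b_out - learning_rate * error
    return (new_w_hidden, new_b_hidden, new_w_out, new_b_out), 0.5 * error ** 2

random.seed(1)
params = (
    [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)],  # input-to-neuron weights
    [0.0, 0.0],                                                         # neuron bias weights
    [random.uniform(-0.1, 0.1) for _ in range(2)],                      # neuron-to-output weights
    0.0,                                                                # output bias weight
)
x, target = [0.2, 0.7, 0.5], 0.9  # made-up, already-scaled input and response
losses = []
for _ in range(200):
    params, loss = backprop_step(x, target, params)
    losses.append(loss)
print(losses[0], losses[-1])
```

With a single training case the error simply shrinks toward zero; a real application would cycle through many cases and monitor error on validation data to guard against overfitting.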
There are several types of learning rules, the most popular of which are the steepest-descent and conjugate gradient learning rules. The steepest-descent rule alters the synaptic weights after each pass through the entire dataset so that the error decreases the fastest (Smith 1996:78). A variation on the steepest-descent rule involves adjusting the synaptic weights after each data point is processed, rather than after all data points have been processed. The conjugate gradient rule involves the second-order derivative (i.e., the derivative of a derivative) of the error, which measures the rate at which the slope is changing, or, in other words, the rate at which the change in error is decelerating (Smith 1996:184). The other techniques all involve the first-order derivative of the error, which gives the slope of the error surface (see below) for a given set of synaptic weights. The conjugate gradient technique, therefore, allows more accurate and sensitive adjustment of the synaptic weights, but is more computationally intensive.

Related to the learning rule is the learning rate. The learning rate determines the absolute magnitude of the changes in the synaptic weights based on the direction and magnitude of the prediction error (Smith 1996:77). So whereas the learning rule determines how the synaptic weights are changed, the learning rate determines how much the synaptic weights are changed given a specific learning rule. The selection of an appropriate learning rate is important in neural model construction. If the learning rate is too small, then it will take longer for the network to learn the patterns in the data (i.e., converge to a minimal error), because only small adjustments are made to the synaptic weights.
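The influence of the learning rate can be seen on a deliberately simple error function. The sketch below is an illustration only: it assumes a single weight, a hypothetical quadratic error with its minimum at a weight of 3, and three arbitrary learning rates. It records the error before each steepest-descent update so that slow convergence, steady convergence, and instability can be compared.

```python
def train_on_quadratic(learning_rate, iterations=20, start=0.0, optimum=3.0):
    """Steepest descent on the hypothetical error E(w) = (w - optimum)^2."""
    weight = start
    errors = []
    for _ in range(iterations):
        errors.append((weight - optimum) ** 2)
        gradient = 2.0 * (weight - optimum)  # first-order derivative of the error
        weight -= learning_rate * gradient   # step size is set by the learning rate
    return errors


slow = train_on_quadratic(0.01)     # very small steps: error declines slowly
steady = train_on_quadratic(0.4)    # moderate steps: error declines quickly
unstable = train_on_quadratic(1.1)  # steps overshoot the minimum: error grows
```

With the largest rate each update jumps past the minimum to a point farther away than before, so the weight swings from one side of the minimum to the other while the error increases.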
If the learning rate is too large, then the error will tend to oscillate and the network will be unstable (i.e., the predictive accuracy of the model will change from good to poor repeatedly), because large changes to the synaptic weights will often increase the error rather than reduce it (Hagan et al. 1996:9-5, Smith 1996:81-82). We recommend using a steepest-descent learning rule with an adaptive learning rate, which allows the learning rate to be adjusted as needed during the training process (Hagan et al. 1996:12-12 to 12-14, Smith 1996:88-90). For example, if the error begins to oscillate during training, the algorithm will reduce the learning rate until the oscillations are dampened and the error decreases.

The final factor governing synaptic weight changes is called momentum, which determines the degree of influence past changes in the synaptic weights have over current changes (Smith 1996:85-88). Momentum is a kind of filter that reduces the amount of oscillation in the prediction error (Hagan et al. 1996:12-10). The momentum can have a value between 0 and 1. The larger the momentum, the stronger the effect of past error changes in determining current weight changes. Therefore, the change in the error rate after the most recent iteration will tend to continue in the direction of previous changes, even if the error begins to increase in an opposite direction. This allows weight changes to track the average error rate (Hagan et al. 1996:12-10). Because oscillations in the error rate reduce the efficiency of the training process, a high momentum, usually 0.9, is most often used (Smith 1996:86).

Data Considerations

General Considerations. Although the specific formatting of a dataset will depend on the neural network application being used, there are some common data requirements. First, all data in the neural model must be numeric (i.e., consist of numbers rather than letters).
Categorical and other nonnumeric data, therefore, must be coded (using dummy coding, for example) for use in a neural network. Multilayer perceptron models can predict multiple dependent variables simultaneously (Smith 1996:165). For example, Özesmi and Özesmi (1999) used a MLP with 3 output nodes to simultaneously predict, based on habitat variables, the probability that a given location was suitable as a red-winged blackbird nest site, suitable as a marsh wren nest site, or not suitable as a nest site.

Dependent variables can be continuous values (e.g., abundance indices) or class factors (e.g., present vs. absent; poor, fair, or good) to be predicted by the model. However, the manner in which the data are coded differs slightly from typical coding schemes. For example, presence and absence data are commonly coded as either 0 (absent) or 1 (present). This coding scheme is appropriate if these data are to be used as independent variables in a MLP model. However, if the purpose is to discriminate presence from absence based on some habitat features, the data should be recoded as values between 0 and 1, such as 0.1 (absent) and 0.9 (present). This coding scheme is necessary because the logistic transfer function approaches but does not reach 0 or 1 (Smith 1996:166); therefore, a MLP can never predict presence or absence with complete accuracy if 1 or 0 is used for coding the dependent variable(s). A benefit of the MLP approach to discrimination is that, unlike logistic regression, MLPs can discriminate >2 classes simultaneously. For example, a MLP can discriminate poor, fair, good, and excellent habitats based on sets of habitat features.

Sample size is also an important consideration for the application of neural network models. The larger the sample size, the more information there is in the data about the relationship between the independent and dependent variable(s) for the network to learn. Therefore, it is desirable to have as large a database as possible.
A large database is especially desirable if the relationships are complex or if the data are noisy (Smith 1996:115, Boddy and Morris 1999:57). For neural networks, the sample size required for a given level of accuracy is a function only of the noise in the data (Smith 1996:135).

Because neural network models become increasingly complex as the number of neurons and predictors increases (see below), the variables used to predict the dependent variable should be selected with care, based on extensive literature review and current knowledge about the factors affecting the system. Further, although multicollinearity is not a problem for neural models (they simply learn the redundancies in the predictors), including several correlated variables will unnecessarily increase model complexity.

Training and Validation Data. The development of a neural network model requires 2 datasets, 1 for training the network and 1 for validation. Training data are used during the learning phase to develop the network's synaptic and bias weights. The validation data are not used in model development (i.e., the prediction errors associated with validation data are not used to adjust synaptic weights), but are used to gauge the network's ability to respond appropriately to novel data. Although model validation is an important part of any modeling exercise, including statistical modeling, few authors attempt to validate their models. Ideally, the data used in model validation should be independent of those used in model development (Conroy 1993, Conroy et al. 1995, Haefner 1996:157). However, in practice, data are a precious commodity and obtaining an independent dataset may be logistically or fiscally impossible. Furthermore, the intended purpose of the model must be considered when selecting a model validation approach (Rykiel 1996).
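When a single dataset must serve for both training and validation, one systematic way to divide it, the approach this section attributes to Lusk et al. (2002), is to order the cases by the dependent variable and send every fifth case to the validation set. The sketch below is a hedged illustration of that idea only; the function name, its arguments, and the representation of a case as a tuple are assumptions made here and do not describe the original software.

```python
def systematic_split(cases, response_index, every=5):
    """Order cases by the dependent variable; every `every`-th case goes to validation."""
    ordered = sorted(cases, key=lambda case: case[response_index])
    training, validation = [], []
    for position, case in enumerate(ordered, start=1):
        if position % every == 0:
            validation.append(case)  # cases 5, 10, 15, ... in the ordered list
        else:
            training.append(case)
    return training, validation


# Hypothetical dataset: 110 cases of (case identifier, dependent variable).
example_cases = [(i, float((i * 37) % 110)) for i in range(110)]
training_set, validation_set = systematic_split(example_cases, response_index=1)
```

With every fifth case withheld, roughly 80% of the cases are used for training and 20% for validation, and both sets span the full range of the dependent variable.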
Because independent data are often lacking, data obtained during a research project must be partitioned into training and validation sets (Fielding 1999a:219). The first decision to be made in partitioning the dataset is what percentages of the total dataset should be allocated to training and validation. With more training data, a neural network has more information about the relationships among variables on which to base its predictions; therefore, as many data as possible should be allocated to the training dataset (Fielding 1999a:219). We generally use 80% of our data for training and 20% for validation.

After choosing the number of data points to apportion to each dataset, cases must be selected. Data may be randomly assigned to the validation dataset. However, because there are no assumptions of normality for data used for neural network training, a random sample may result in unrepresentative training and validation datasets, which has been linked to the poor generalization ability of MLPs in some applications, especially discrimination (Ripley 1994). We, therefore, recommend that the selection of training and validation cases be performed using a systematic approach. For example, Lusk et al. (2002) ordered their data based on the dependent variable and systematically selected every fifth case for the validation dataset. This ensured that the training and validation data were representative of the whole dataset and, by assumption, of the range of possible datasets.

Usage Considerations

The Error Surface. Consider a simple neural network model consisting of 2 input nodes, 1 neuron, and a single output node. The prediction error for such a model can be represented graphically as a 3-dimensional surface, where the error rate is presented as a function of the synaptic weights of each input node (Fig. 2.2). This surface represents the theoretical range of possible prediction errors for a given range of synaptic weights.
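An error surface of this kind can be tabulated directly for a network small enough to enumerate. The sketch below is illustrative only and rests on several assumptions that are not in the original text: the neuron bias and the output-node weights are held fixed so that only the 2 input weights vary, the data are hypothetical and were generated from known weights (1 and -2), and the weights are evaluated on a coarse grid from -3 to 3.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def predict(w1, w2, x1, x2):
    """2-1-1 network with the bias fixed at 0 and a pass-through output node (assumption)."""
    return logistic(w1 * x1 + w2 * x2)


def error_surface(cases, weight_values):
    """Mean squared error for every combination of the 2 input weights."""
    surface = {}
    for w1 in weight_values:
        for w2 in weight_values:
            squared = [(y - predict(w1, w2, x1, x2)) ** 2 for x1, x2, y in cases]
            surface[(w1, w2)] = sum(squared) / len(squared)
    return surface


# Hypothetical cases generated from known weights so the global minimum is known.
inputs = [(0.5, 1.0), (1.5, -0.5), (-1.0, 2.0), (2.0, 0.3)]
cases = [(x1, x2, predict(1.0, -2.0, x1, x2)) for x1, x2 in inputs]
weight_values = [float(v) for v in range(-3, 4)]
surface = error_surface(cases, weight_values)
best_weights = min(surface, key=surface.get)  # grid point with the lowest error
```

Plotting the values in `surface` against the 2 weights would give a 3-dimensional surface of the sort discussed in this section; real networks have far too many weights for the surface to be enumerated or drawn.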
Error surfaces can be relatively flat (Fig. 2.2a) or can contain many hills and valleys (Fig. 2.2b). Because the initial synaptic weights are assigned randomly, the point on the error surface at which the network starts learning varies. If the error surface is relatively flat, the network will continue learning until the lowest point on the error surface (the global minimum) is reached. If, however, the error surface is irregular, the network will continue learning until it reaches a minimum error rate (i.e., changing synaptic weights in any direction will lead to an increase in error), but there is no guarantee that this minimum is the global minimum (Fig. 2.2b). The network may be stuck in a local minimum if other synaptic weight combinations can provide a lower prediction error. However, this problem can be ameliorated by selecting the appropriate number of neurons in the neuron layer (Smith 1996:62). As the number of neurons in the network increases, the error surface smooths out and becomes flatter. Selecting the appropriate number of neurons can be accomplished by training several neural models on the same data, with the same learning rate and momentum, but with varying numbers of neurons.

Fig. 2.2. Hypothetical error surfaces resulting from particular combinations of synaptic weights. In (a), the error surface is relatively flat, and a MLP with initial synaptic weights randomly assigned any value in this range will eventually find the combination of synaptic weights that gives the global minimum prediction error. In (b), the error surface is hilly. A MLP may not be able to find the combination of synaptic weights resulting in a global minimum, but instead may become stuck in a local minimum.
The network with the appropriate number of neurons will be the one with the smallest prediction error for both the training and the validation datasets and for which the addition of more neurons does not greatly increase the network's performance.

Complexity and Parsimony. Any modeling attempt must balance the cost of added complexity in terms of loss of generalization ability against the benefit of added complexity in terms of reduced bias. This is often called the bias-variance dilemma (Geman et al. 1992). The solution is based on the principle of Occam's razor (the principle of parsimony), which suggests that the appropriate model is the one that is just complex enough to adequately represent the relationships in the data but no more complex (Burnham and Anderson 1998:23). However, there is no inherent reason that a simple model should be better than a more complex model, especially if the system is known to be complex (Maurer 1999), and the choice of a model will depend on the objectives of the researcher (e.g., prediction or understanding processes). That is, if a model is used solely to predict in the realm of management, then the most accurate model may be optimal, whether or not it represents the best compromise between bias and variance.

With regard to neural networks, we need to ask whether the increase in complexity that accompanies neural networks provides sufficient increases in understanding or predictive power to warrant their use instead of a simple, linear model. As some authors have noted, directly comparing the predictive accuracy of both types of models is biased because the number of parameters in each model is not considered (Lek-Ang et al. 1999). Although Haykin (1999:219-222) offered several methods to limit the complexity of neural networks during training, we employ a simpler, post hoc method for ranking models.
This technique adjusts the sum of squared errors based on the number of parameters in the model (Hilborn and Mangel 1997:114-117):

$$ SS_a = \frac{SS_j}{n - 2m}, $$

where SS_a is the adjusted sum of squares, SS_j is the sum of squares for model j, n is the sample size, and m is the number of parameters in the model. The best model is the one with the smallest adjusted sum of squares. For a multiple linear regression, the number of parameters equals the number of regression coefficients in the model plus the intercept. Given a regression equation with 5 independent variables and 1 dependent variable, there are 6 parameters in the model. For fully connected MLPs, the number of parameters equals the number of synaptic weights and biases according to

$$ m = N(I + 1) + O(N + 1), $$

where N is the number of neurons, I is the number of input nodes, and O is the number of output nodes. For example, a fully connected MLP with 5 input nodes, 3 neurons, and 1 output node would have m = 3(5 + 1) + 1(3 + 1) = 22 parameters. It is apparent that neural networks quickly grow in parameterization with the addition of predictors and neurons.

Neural Model Interpretation

Once a neural network has been trained, it can be used to generate predictions, including discrimination scores, based on new data. In addition to generating predictions, neural models can be used to increase understanding about the patterns and relationships in the data, and to generate hypotheses for further testing. There are several methods for obtaining such information from neural models. First, one can calculate the relevance (importance) of each input variable (Özesmi and Özesmi 1999):

$$ R_i = \frac{\sum_{j} w_{ij}^{2}}{\sum_{i=1}^{n} \sum_{j} w_{ij}^{2}}, $$

where, for a MLP with n input nodes, the index j runs over the neurons, R_i is the relevance of the ith input variable, and w_ij is the synaptic weight between the ith input node and the jth neuron.
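The parameter count, the adjusted sum of squares, and the relevance calculation defined above are simple enough to sketch in a few lines. The function names and the layout of the weight table (one row per input node, one column per neuron) are assumptions made for this illustration and are not taken from the original analysis.

```python
def parameter_count(n_inputs, n_neurons, n_outputs):
    """m = N(I + 1) + O(N + 1) for a fully connected MLP."""
    return n_neurons * (n_inputs + 1) + n_outputs * (n_neurons + 1)


def adjusted_sum_of_squares(sum_of_squares, n_cases, n_parameters):
    """SS_a = SS_j / (n - 2m); only meaningful when n exceeds 2m."""
    denominator = n_cases - 2 * n_parameters
    if denominator <= 0:
        raise ValueError("sample size must exceed twice the number of parameters")
    return sum_of_squares / denominator


def relevance(input_to_neuron_weights):
    """Share of the total squared input-to-neuron weight attached to each input.

    Rows are input nodes and columns are neurons (an assumed layout).
    """
    squared_by_input = [sum(w * w for w in row) for row in input_to_neuron_weights]
    total = sum(squared_by_input)
    return [value / total for value in squared_by_input]


m = parameter_count(n_inputs=5, n_neurons=3, n_outputs=1)  # the example in the text
example_relevance = relevance([[1.0, 1.0], [2.0, 0.0]])     # hypothetical weights
```

The same counting function reproduces the parameter totals reported for the 2 example networks later in this chapter (4 parameters for the 1-1-1 network and 15 for the network with 5 inputs, 2 neurons, and 1 output).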
The relevance is thus the sum of squared synaptic weights for the ith input node divided by the sum of squared synaptic weights of all input nodes, and is a measure of the relative contribution of each input variable to the determination of network predictions. Variables with larger relevance values have stronger relationships with the dependent variables than those with smaller relevance values; i.e., they contain more information about the variation in the dependent variable than less relevant variables. This is true because input variables with larger synaptic weights exert more control over the network's response to a given stimulus.

The second method for obtaining biologically significant information from a neural network model is to use neural interpretation diagrams (NID) (Özesmi and Özesmi 1999). These diagrams appear similar to Fig. 2.1, but the lines representing the synaptic weights are of varying widths and colors. The width of each line is determined by the relative value of the synaptic weight and the color of the line by the sign (+ or -) of the synaptic weight. Therefore, the NID indicates which variables are exerting more influence over network predictions, as well as whether they are having a positive or negative influence. However, as the number of input nodes and neurons increases, the interpretation of the diagrams becomes less straightforward.

Simulation with a trained MLP model offers another alternative for interpreting the output of a neural network (Lek et al. 1996a). This method offers a view of how each input variable influences the value of the dependent variable. Some neural modeling software packages contain modules for automatically running a simulation analysis (e.g., Neural Connections, SPSS, Inc.). For other neural packages, a little more work is involved.
First, a series of datasets must be constructed in which the independent variable of interest is allowed to vary between its minimum and maximum values, or over ±1 SD of the mean, while all other independent variables are held constant at their means or at some other biologically meaningful value. These datasets are then presented to the trained model and a set of predictions is produced. By plotting these predicted values against the range of values for the input variable of interest, we obtain a picture of how the dependent variable responds to variation in the independent variable being considered, all else being equal. If the interactive effects of 2 variables are of interest, a dataset in which values for these variables are allowed to vary together while the remaining variables are held constant can be constructed and presented to the trained network. Predictions can then be plotted in 3 dimensions, producing a response surface.

Accuracy Assessment

Because there are no significance tests associated with MLPs, there are no P values by which to judge a model's performance and extract biologically significant information. Depending on whether the neural network is used to predict or to discriminate, there are several options for assessing its performance. The most commonly used method for predictive models is to calculate the squared correlation (r²) between predicted and observed values.

Simulation analyses offer a way of visualizing the effect of a single variable on the dependent variable. However, simulations actually represent the effect of the variable of interest when all other variables are at their means. It is unlikely that such average conditions will be experienced in nature, which makes the usefulness of simulations in management decisions uncertain. The data used to train the model can, however, be used to determine how well the simulations represent reality.
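The construction of a simulation dataset, as described under Neural Model Interpretation, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular software package: the function name is hypothetical, observations are assumed to be rows of independent variables, the focal variable is varied in equal steps between its observed minimum and maximum, and all other variables are held at their means.

```python
def build_simulation_cases(observations, focal_index, steps=25):
    """Vary one independent variable over its observed range; hold the rest at their means."""
    if steps < 2:
        raise ValueError("at least 2 steps are needed to span a range")
    n_variables = len(observations[0])
    means = [sum(row[i] for row in observations) / len(observations)
             for i in range(n_variables)]
    focal_values = [row[focal_index] for row in observations]
    low, high = min(focal_values), max(focal_values)
    cases = []
    for step in range(steps):
        case = list(means)  # every variable starts at its mean
        case[focal_index] = low + (high - low) * step / (steps - 1)
        cases.append(case)
    return cases


# Hypothetical observations of 2 independent variables.
observed = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
simulation_cases = build_simulation_cases(observed, focal_index=0, steps=5)
```

Each row of `simulation_cases` would then be presented to the trained network, and the resulting predictions plotted against the focal variable.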
To check simulations against the training data, we can filter the observed data for cases in which all observations of the independent variables are within ±1 SE of the mean. These cases can then be plotted with the simulation data to give a measure of the accuracy of the simulation predictions. For small datasets with a large number of independent variables, it might be necessary to increase the range of SE used so that there are sufficient cases available to plot.

There are several methods of determining the accuracy of discrimination models, many of which are summarized by Fielding and Bell (1997) and all of which are applicable to neural network output (Fielding and Bell 1997, Fielding 1999b). The simplest method for assessing the accuracy of a classification model is to calculate the percent correctly classified. However, if misclassification errors are more important to the application, then receiver operating characteristic (ROC) plots are a better alternative, because they use all available information about the performance of the neural model (Fielding 1999b) and do not rely on a specific cutoff threshold (e.g., 0.5; Fielding and Bell 1997). The area under the ROC curve (AUC) is a measure of the performance of the network and varies between 0.5 and 1. As values approach 1, the model's performance increases. That is, if you drew a random case from each class (i.e., 0, 1), the AUC would give the probability that the discrimination score for the case from class 1 would be greater than the score for the case from class 0 and, therefore, that you could accurately discriminate the pair independent of a threshold cutoff. Both ROC plots and the AUC can be produced with standard desktop statistical software (e.g., the SIGNAL module in SYSTAT; SPSS Inc. 1999).

Examples

Here we provide 2 simple examples of the application of MLP modeling. The first example uses data on the relationship between Gambel's quail production and December-April precipitation (Swank and Gallizioli 1954).
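Before the examples are described in detail, the pairwise interpretation of the AUC given under Accuracy Assessment can be made concrete. The sketch below computes the AUC directly from that definition, as the proportion of class-1 and class-0 pairs in which the class-1 case receives the higher discrimination score. The function name and the scores are hypothetical, and counting tied scores as one-half is a common convention adopted here as an assumption; it is not a description of how SYSTAT or any other package performs the calculation.

```python
def auc_from_scores(scores_class1, scores_class0):
    """Probability that a random class-1 case outscores a random class-0 case."""
    favorable = 0.0
    for score1 in scores_class1:
        for score0 in scores_class0:
            if score1 > score0:
                favorable += 1.0
            elif score1 == score0:
                favorable += 0.5  # ties counted as one-half (assumed convention)
    return favorable / (len(scores_class1) * len(scores_class0))


# Hypothetical discrimination scores for used (class 1) and random (class 0) locations.
example_auc = auc_from_scores([0.9, 0.6, 0.4], [0.5, 0.3, 0.2])
```

In this hypothetical case 8 of the 9 possible pairs are ordered correctly, so no single cutoff threshold is needed to summarize how well the scores separate the 2 classes.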
The second example shows how the same modeling technique can be used for discrimination, using data on habitat use by masked bobwhites (C. v. ridgwayi) (Guthery et al. 2001). These examples are intended to illustrate the application of the MLP technique to the analysis of ecological data as well as to show the benefits of its application.

Gambel's Quail and Winter Precipitation

We used data from Swank and Gallizioli (1954) on a study conducted between 1941 and 1953 in Arizona. These data consisted of total winter (December-April) precipitation (cm) and the age ratio (juveniles/adult) in the subsequent fall harvest. Therefore, we had 1 input node (total winter rainfall) and 1 output node (fall age ratio) in the network. Because we had only 1 predictor variable (rainfall), we trained a network that consisted of a single neuron (Fig. 2.3 inset). Therefore, the network consisted of 4 parameters (1 synaptic weight between the input node and the neuron, 2 bias weights for the neuron and output node, and 1 synaptic weight between the neuron and the output node). The network was trained for 400 iterations with an adaptive learning rate and a momentum of 0.6. Because of the small sample (n = 13), we did not partition the data into training and validation sets; doing so would have reduced the performance of the network (Fielding 1999a:219).

The network accounted for 81% of the variation in the age ratios. Although the original analysis by Swank and Gallizioli (1954) did not include an estimation of trend, the authors concluded that precipitation during winter was the factor limiting abundance during their study. Our simulation analysis (Fig. 2.3) indicated that there was a relationship between fall age ratios and the previous winter's total precipitation. However, this relationship appears to be curvilinear and logistic-like (Fig. 2.3).
Production (as represented by fall age ratios) was low over a wide range of total winter rainfall, but increased sharply when winter rainfall exceeded 12 cm. However, there appears to be an upper threshold of approximately 20 cm, after which there is no further increase in production with increasing precipitation. This pattern makes sense, since there is likely an upper limit to production in any year based on time and physiological constraints (Guthery and Kuvlesky 1998). Although the relationship could have been modeled using a variety of logistic growth functions, the strength of the MLP technique is that we did not have to specify the form of the function a priori. Had the relationship been merely asymptotic rather than logistic, the MLP would have performed equally well.

Fig. 2.3. Simulation results from the Swank and Gallizioli (1954) MLP model showing the predicted change in fall age ratio (y-axis) over the observed range of variation in total winter (December-April) rainfall (cm; x-axis). Data points represent observed fall age ratios. Inset: a diagrammatic representation of the 1-1-1 MLP used to model the data presented in Swank and Gallizioli (1954). The MLP contained 1 input node in the input layer (total winter rainfall), 1 neuron in the neuron layer, and 1 output node in the output layer (fall age ratio).

Nest-site Characteristics of Northern Bobwhites

The same technique used above for prediction can, with minor modifications, be used in a discrimination analysis. We used data collected on the Mesa Vista Ranch in Roberts County, Texas, USA, during 2001 and 2002. Data were collected at northern bobwhite nest sites and random locations and included vegetation canopy height (cm), percent cover by dominant tallgrass, percent cover by shrubs, bare ground exposure (%), and mean screening cover over 3 cover classes.
The MLPs developed for this analysis contained 5 inputs, 2 neurons, and 1 output, resulting in 15 parameters in the model. The output node represented nest sites and random locations and was coded 0.9 for nest sites and 0.1 for random locations. The network was trained with an adaptive learning rate for 500 iterations using a momentum of 0.8. The data were partitioned into training (88 cases) and validation (22 cases) sets before analysis. We measured accuracy using the area under the curve of the receiver operating characteristic (ROC) plot (Fielding and Bell 1997, Fielding 1999b), which provides a threshold-independent measure of accuracy. However, for our graphical presentation of the results, we used an arbitrary threshold of 0.5 for discriminating nest sites from random locations. We report results here only for the 3 most important variables in the model (relevance > 10%).

The MLP accounted for 40.1% of the variation in the training data and 43.6% of the variation in the validation data. The area under the ROC curve was 0.842 for the training data and 0.768 for the validation data. That is, for the training data there was an 84.2% probability of correctly classifying a randomly selected pair of nest and random points based solely on the relative difference in their classification scores. The simulation analyses showed the change in suitability of a given location for use as a nest site as vegetation canopy height, percent cover by shrubs, and bare ground exposure (relevance = 32.9%, 31.2%, and 26.9%, respectively) each varied while all other variables were held at their means (Fig. 2.4). One of the important pieces of information revealed by the simulations is the transition point between suitable and unsuitable conditions. At the Mesa Vista Ranch, locations with canopies >40 cm were suitable for nesting (Fig. 2.4a). Locations with shrub cover >20% were also suitable as nest sites (Fig. 2.4b).
However, bare ground cover in excess of 30% rendered a particular location unsuitable for nesting (Fig. 2.4c).

Fig. 2.4. Simulation results from the trained neural network model for differentiating random and nest locations based on vegetation characteristics on the Mesa Vista Ranch in Roberts County, Texas, 2001-2002. Results are presented only for variables with >10% contribution to the model's output: A) canopy height (cm), B) percent shrub cover, and C) bare ground exposure (%). The y-axis of each panel is the neural classification score. Dashed horizontal lines represent an arbitrary 0.5 cutoff threshold between suitable and unsuitable.

Caveats

Although we have attempted to discuss limitations and peculiarities of the MLP technique in the text, there are a few more considerations when using MLPs for predictive or discriminant analysis. First, although MLP models can be used for statistical modeling, they lack a statistical background for ascribing confidence limits to their predictions. An approximation can be achieved via bootstrapping (M. T. Hagan, Oklahoma State University, Department of Electrical and Computer Engineering, personal communication), although this can be computationally intensive depending on the complexity of the neural model. Further, a trained neural network does not have an associated P value, although some of the associated measures of accuracy (e.g., r²) can have P values associated with them. However, as many authors have pointed out, the rampant use of P values in the scientific literature is often uninformative (Cohen 1994, Anderson et al. 2000).

The ability of a MLP to find patterns in noisy data is both a strength and a weakness of the technique. Because of the power with which they can find patterns, MLPs are sensitive to outliers in the training data.
A MLP will learn the appropriate responses necessary to predict an outlier. However, this may weaken the model's ability to generalize when presented with new data. The MLP's response will be distorted by the outlier, resulting in inaccurate predictions. This is similar to the effect that outliers can have on the slope or intercept of a regression line. Therefore, screening outliers from training and validation data will increase the accuracy of the model's predictions when presented with new data.

A related problem is that of overfitting (also called overtraining; Smith 1996:113-114). Overfitting occurs when model predictions match the observed data too closely, resulting in a reduction in the model's ability to generalize. Although other techniques, such as multiple regression, are also susceptible to overfitting, it is not as great a concern because these techniques are generally restricted to linear relationships (Smith 1996:114). The MLP technique is especially susceptible to overfitting because a MLP can approximate any function (Hagan et al. 1996) and can, therefore, map a dataset exactly.

There are 3 techniques to prevent overfitting. The easiest method is to gauge the MLP's accuracy in predicting the validation dataset. Because the validation data have not been used in model training, the MLP's ability to accurately predict validation data can indicate when the model has overfit the training data (an overfit MLP would show excellent performance on training data, but weak performance on validation data). Limiting the number of training iterations can also reduce the danger of overfitting, but there are no quantitative guidelines for this approach. Finally, MLPs lose power as the number of neurons, and hence the number of parameters, is reduced, so eliminating neurons in the presence of overfitting may result in a MLP that generalizes better.

A last caveat is that ANN models are phenomenological models and provide no information on the underlying mechanisms.
However, traditional regression and discrimination models usually suffer the same limitation. Researchers must develop hypotheses for experimentation and testing to confirm relationships discovered in any model. Further, although trained MLP models can produce accurate predictions, the model parameters (i.e., synaptic and bias weights) are not as readily interpretable as coefficients from a multiple regression equation. This has been referred to as a lack of transparency and, as such, MLPs are considered black-box models (Boddy and Morris 1999). We have described 3 methods for obtaining further biologically significant information from neural networks that can ameliorate this limitation. Furthermore, this lack of transparency is not as much of an issue in management, where making accurate decisions and predictions may be paramount.

Management Considerations

We have described an alternative to traditional statistical techniques for data analysis. Multilayer perceptrons are nonparametric, can approximate linear and nonlinear functions, are not constrained by multicollinearity, and can be used for both prediction and discrimination. In addition, MLPs can predict and discriminate simultaneously. Although the ANN technique is an extremely powerful tool, its lack of transparency and parsimony has discouraged some researchers from applying it to their data. We believe that this hesitancy is misplaced and hope that we have demonstrated not only the mechanics of the method, but also its usefulness. Neural network modeling not only offers a method for elucidating complex relationships from multivariate datasets, but also can serve as a basis for making more accurate and efficient management and conservation decisions.
CHAPTER 3

A NEURAL NETWORK MODEL FOR PREDICTING NORTHERN BOBWHITE ABUNDANCE IN THE ROLLING RED PLAINS OF OKLAHOMA¹

¹ Lusk, J. J., F. S. Guthery, and S. J. DeMaso. 2002. A neural network model for predicting northern bobwhite abundance in the Rolling Red Plains of Oklahoma. Pages 345-355 in J. M. Scott, P. J. Heglund, M. L. Morrison, J. B. Haufler, M. G. Raphael, W. A. Wall, and F. B. Samson, editors. Predicting species occurrences: issues of accuracy and scale. Island Press, Covelo, California, USA.

Introduction

More accurate predictions of species abundance are necessary for management and conservation to be effectively implemented (Leopold 1933, Peters 1992, Schneider et al. 1992). Such predictions are increasingly important as human impacts on the environment increase. Artificial neural network (ANN) models are extremely powerful and allow the investigation of linear and nonlinear responses. As such, ANN models offer ecologists a powerful new tool for understanding the ecologies of declining species, which can lead to more effective management (Colasanti 1991, Edwards and Morse 1995, Lek et al. 1996, Lek and Guégan 1999).

Current applications of ANN models include statistical modeling (Smith 1996). In this capacity, ANN models have considerable advantages over traditional statistical models, such as regression. Artificial neural networks are extremely powerful due to their capacity to learn from the data used during training. Another advantage of ANN models over traditional models is that ANNs are inherently nonlinear (Haykin 1999:2). Because most ecological phenomena are nonlinear (Maurer 1999:110), this property of ANN models makes them more useful than standard statistical models that are often limited to linear relationships (Lek et al. 1996b). Even minor nonlinearities in the response of one variable to another can reduce the predictive power of traditional statistical techniques (Paruelo and Tomasel 1997).
Neural networks also do not require any a priori knowledge of the nature of the relationship between predictor and response variables, a requirement that makes other available nonlinear methods cumbersome (Smith 1996:19-20). ANNs find the form of the response in the data presented to them and, as such, are not constrained to simple curves, as are curvilinear regression techniques (Pedhazur 1982:406, Smith 1996:20). Finally, ANN models are nonparametric (Smith 1996:20). Use of nonnormal data for neural model development will not bias the results (Baran et al. 1996).

Much is known about bobwhite ecology, so the species offers an effective means of evaluating the ANN technique and its applicability to management and conservation. Furthermore, an understanding of bobwhite-climate relationships is an important component of management and conservation of bobwhites. Bobwhite abundance has declined over much of the species' range during the past several decades (Koerth and Guthery 1988, Brennan 1991, Church et al. 1993, Sauer et al. 1997). Bobwhite declines may be accelerated by climate change in some regions of the range (Guthery et al. 2000). Although we cannot manage the weather, we can factor in its effects when making management plans. By working in cooperation with state management agencies, the results of our research can be directly and immediately applied in the field, completing the research-management cycle (Hejl and Granillo 1998, Kochert and Collopy 1998, Young and Varland 1998).

We developed an artificial neural network model to investigate the influence of weather patterns on the abundance of northern bobwhites (Colinus virginianus; bobwhites hereafter) in a semiarid region of western Oklahoma, United States. An understanding of the effects of weather on species abundances is warranted in the light of global climate change (Root 1993, Schneider 1993). We also sought to evaluate the ANN modeling technique.
Specifically, we 1) compared ANN model output with that of a traditional multiple regression model, 2) determined which model was better using a sum-of-squares criterion (Hilborn and Mangel 1997), and 3) conducted simulation modeling using the ANN and regression models.

Methods

We modeled bobwhite abundance in the Rolling Red Plains ecoregion of Oklahoma. This ecoregion is in western Oklahoma, excluding the panhandle (Peoples 1991), and occupies 5.7 million ha. Mean annual precipitation is 58 cm (Oklahoma Climatological Survey, unpublished data). Biologists from the Oklahoma Department of Wildlife Conservation counted bobwhites in each county in Oklahoma. Survey routes were established in typical quail habitat (Peoples 1991). Each 32-km route was surveyed twice annually beginning in 1991: once in August and once in October. Surveys were conducted either at sunrise or 1 hr before sunset. Total number of bobwhites observed per 32-km route was used as an index of bobwhite abundance. Although roadside counts such as these are prone to biases, these surveys are positively related to the fall harvest in Oklahoma (r > 0.70, S. DeMaso, unpublished data).

Artificial Neural Networks

Artificial neural networks are mathematical algorithms developed to imitate the function of brain cells for the study of human cognition (Hagan et al. 1996:18, Smith 1996:1, Haykin 1999:69). However, early techniques were handicapped by their inability to handle nonlinear relationships (Hagan et al. 1996:14, Smith 1996:8). In the 1980s, neural network modeling experienced a renaissance of sorts with the development of a backpropagation algorithm (see below) that is capable of handling nonlinear relationships (Smith 1996:20). Because of their foundations in cognitive science, many of the terms used to describe aspects of ANNs are derived from neurobiology.
What follows is a short explanation of the terminology of neural network modeling and a brief description of how a typical neural model works. A neural network typically consists of 3 layers: the input nodes, the neurons (also called hidden nodes or processing elements), and the output nodes. However, ANNs with more than one neuron layer are possible. Typically, each node in each layer is connected to each node in the previous layer by synapses (connection weights), and, as such, is termed fully connected (Smith 1996:21). The synapses store the information learned by the model (Haykin 1999:2), and are analogous to regression coefficients (Heffelfinger et al. 1999).

Each input node represents an independent variable. Values of input nodes are scaled so that they range between zero and one (Smith 1996:67). Each neuron processes the input nodes by computing a logistic function from the sum of the inputs:

$$g(u) = \frac{1}{1 + e^{-u}},$$

where $u$ is the weighted sum of the inputs ($w_j x_j$) plus a bias weight ($w_b$):

$$u = w_b + \sum_{j=1}^{J} w_j x_j$$

(Smith 1996:40). The logistic function above is the most widely used, but is not the only function available (Smith 1996:35). The values calculated by the neurons, $g(u)$, are transferred to the output nodes. The output nodes perform a similar calculation and their output is detransformed to obtain a prediction of the dependent variable (Smith 1996:22).

In backpropagation ANNs, the error between the predicted output and the actual output is calculated and propagated back through the model, where it is used to adjust the values of the synaptic weights according to one of a variety of learning rules (Hagan et al. 1996:11-40; Smith 1996:67). The adjustment of the synapses is termed learning (Smith 1996:59). This process continues iteratively, with synapses adjusted after each forward pass, and is termed training. With each iteration, the ANN learns more about the relationship between inputs and outputs and, therefore, the prediction error decreases.
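The forward pass and a single weight adjustment described above can be illustrated with a minimal Python sketch. It is not the software used in this study; it assumes one neuron layer, one output node, a squared-error measure, and plain gradient descent as the learning rule. The function names, starting weights, and the rate of 0.5 are all hypothetical and chosen only for illustration.

```python
import math


def logistic(u):
    """Logistic activation: g(u) = 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def weighted_sum(weights, bias, inputs):
    """u = w_b + sum over j of w_j * x_j for a single node."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: inputs -> neuron layer -> single output node."""
    hidden = [logistic(weighted_sum(w, b, inputs))
              for w, b in zip(hidden_w, hidden_b)]
    output = logistic(weighted_sum(out_w, out_b, hidden))
    return hidden, output


def train_step(inputs, target, hidden_w, hidden_b, out_w, out_b, rate):
    """One backpropagation update for the error 0.5 * (output - target)**2."""
    hidden, output = forward(inputs, hidden_w, hidden_b, out_w, out_b)
    # Error signal at the output node (derivative of the error w.r.t. its u).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signal propagated back to each neuron through its output synapse.
    delta_hid = [delta_out * w * h * (1.0 - h) for w, h in zip(out_w, hidden)]
    # Move every weight a small step against its error gradient.
    new_out_w = [w - rate * delta_out * h for w, h in zip(out_w, hidden)]
    new_out_b = out_b - rate * delta_out
    new_hidden_w = [[w - rate * d * x for w, x in zip(row, inputs)]
                    for row, d in zip(hidden_w, delta_hid)]
    new_hidden_b = [b - rate * d for b, d in zip(hidden_b, delta_hid)]
    return new_hidden_w, new_hidden_b, new_out_w, new_out_b


# Hypothetical example: 2 scaled inputs, 2 neurons, 1 scaled target value.
inputs, target = [0.2, 0.7], 0.8
hidden_w, hidden_b = [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]
out_w, out_b = [0.5, -0.5], 0.0

error_before = abs(forward(inputs, hidden_w, hidden_b, out_w, out_b)[1] - target)
for _ in range(200):
    hidden_w, hidden_b, out_w, out_b = train_step(
        inputs, target, hidden_w, hidden_b, out_w, out_b, rate=0.5)
error_after = abs(forward(inputs, hidden_w, hidden_b, out_w, out_b)[1] - target)
print(error_before, error_after)
```

Repeating the update shrinks the prediction error for this single record, which is the behavior the text calls training. With many records, the same update would be applied across the whole training data set.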
Training is stopped before the model maps the relationship between inputs and outputs exactly. If training continues to that point, the network is said to be overtrained and the model's predictive abilities are diminished when presented with novel data (Hagan et al. 1996:11-22, Smith 1996:113). The use of ANNs in the ecological sciences requires predictability, and there is a tradeoff between model generality and accuracy of prediction.

Because ANN models begin training with randomly selected connection weights, the minimum error achieved by a network may not be the global minimum, but only a local minimum (Smith 1996:62). Therefore, there may exist an error minimum lower than the one achieved by the network. However, Smith (1996:62) reported that the probability of such local minima existing decreases as more neurons are added to the model. Determining the optimum number of neurons should, therefore, maximize the chances of finding the global minimum in the error surface.

Database Construction

Roadside quail counts were initiated in Oklahoma in 1991, and therefore, our database comprised the 1991-1996 bobwhite surveys. We averaged each year's August and October count for our models. The database also included weather and land-use data as independent variables. Weather data were obtained on CD-ROM from EarthInfo, Inc. (Boulder, Colorado). We extracted mean monthly temperature data for June, July, and August. Seasonal precipitation data were calculated from total monthly precipitation. We divided the year as follows: winter = December, January, and February; spring = March, April, and May; and summer = June, July, and August. Therefore, seasonal precipitation equaled total monthly precipitation averaged for each 3-mo period. We grouped climate data into these periods because they represent ecologically important phases of the bobwhite's life cycle (breeding, recruitment, and winter survival).
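The seasonal precipitation calculation described above can be sketched in Python as follows. This is an illustration only: the function name and the monthly totals are hypothetical, and because the text does not state which calendar year supplies the December value for winter, the sketch leaves that choice to whoever assembles the input.

```python
# Months (1 = January ... 12 = December) assigned to each season in the text.
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
}


def seasonal_precipitation(monthly_totals):
    """Average total monthly precipitation over each 3-month season.

    monthly_totals maps a month number to total precipitation (cm).
    """
    return {
        season: sum(monthly_totals[m] for m in months) / len(months)
        for season, months in SEASONS.items()
    }


# Hypothetical monthly totals (cm) for one weather station.
example_totals = {12: 3.0, 1: 2.0, 2: 4.0,
                  3: 6.0, 4: 7.5, 5: 12.0,
                  6: 9.0, 7: 6.0, 8: 6.0}
seasonal = seasonal_precipitation(example_totals)
print(seasonal)
```

For these made-up totals the seasonal values are 3.0 cm (winter), 8.5 cm (spring), and 7.0 cm (summer).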
We did not include any time lag for the effects of rainfall on quail abundance because other networks we developed indicated this lag effect was not important to model predictions (J. Lusk, unpublished data). We used weather stations closest to each survey route for obtaining weather data.

As measures of land use and human impacts, we used cattle density on nonagricultural lands (total head/km²) and the proportion of county area in agricultural crop and hay production (hereafter, agricultural production). We selected these variables because they are likely to have the greatest effect on bobwhite abundance (Murray 1958, Roseberry and Sudkamp 1998). Bobwhite abundance in Florida varied directly with cultivated acreage and inversely with acreage grazed (Murray 1958). These land-use variables were determined at the county level and were extracted from the Oklahoma Department of Agriculture's annual crop statistics for each survey year in the database.

The final variable included in the data set was the number of bobwhites counted during the previous year's survey. The number of bobwhites present in 1 yr is dependent on the number of bobwhites present the previous year. Furthermore, survival and reproduction may be density dependent (Roseberry and Klimstra 1984).

ANN Construction, Training, and Validation

Network Architecture. We used a three-layered, backpropagation neural network. The network consisted of a layer of input nodes representing the independent variables, a layer of neurons, and an output node representing the dependent variable. Our model was fully connected (Smith 1996:21). We used a commercial neural-modeling software package (QNet for Windows, v97.02, Vesta Services, Winnetka, Illinois) for ANN development. Including too many neurons in the neuron layer may result in reduced prediction ability and including too few will limit the complexity the network can accurately learn (Smith 1996:120-123).
Therefore, we determined the optimal number of neurons experimentally by training models in which the same data set and model parameters were used, but the number of neurons was varied. We developed models that contained 2 through 9 neurons. We limited the maximum number of neurons to the number of input variables in the model. We selected the model with the best performance, gauged as the correlation between the predicted counts obtained from the model and the actual counts in the validation data set.

Training Parameters. We used an adaptive learning rule during model training (Smith 1996). In addition, 3 parameters were adjusted to optimize model performance. These parameters were the number of iterations, the learning rate, and the momentum. The values we selected for the learning rate and momentum were within the range of those found to be most effective in a wide variety of neural network applications (Smith 1996:77-90). The number of iterations controls how long the model has to learn the pattern and relationships among the variables in the model. The larger the number of iterations, the more attempts the network has to minimize prediction errors. We trained our model for 10,000 iterations. We believed that 10,000 iterations would allow the network to find the error minimum and allow us to stop training if the network began to overfit the data. The learning rate controls the magnitude of the corrections of the synaptic weights per iteration based on the direction and magnitude of the change in the prediction error during past iterations (Smith 1996:77). Selection of too small a learning rate will increase the number of iterations necessary to reach an error minimum. However, selection of too large a learning rate may make the network unstable, resulting in oscillations in the prediction error (Hagan et al. 1996:95). We used a learning rate of 0.05. The final network parameter was momentum.
Momentum determines how many past iterations are used in determining synaptic-weight adjustments in the current iteration (Smith 1996:85-88). Momentum keeps the error corrections moving in the same direction along the error surface (Smith 1996:85). If a large momentum value is used, it will take longer for weight corrections to respond to changes in the prediction error. In other words, synaptic-weight adjustments are based on the long-term trend in prediction error, and momentum determines the number of iterations used in determining the long-term trend. We used a momentum of 0.90. This momentum is appropriate for most types of models (Smith 1996:86).

Validation. To assess the predictive ability, accuracy, and reliability of our ANN model, we presented the trained model with data not used in network training. We created a validation data set by extracting 20% of the data from the original data set. Data were rank ordered by the number of quail counted, and every 5th record was assigned to the validation data set. There were 98 records in the original database, resulting in 20 records in the validation data set. The systematic removal of the validation data allowed us to gauge the performance of the network over the entire range of the original bobwhite count data. Because the validation data were derived from the original data set and were, therefore, obtained under the same conditions as those used for network training, the network can be considered validated only for this particular ecoregion in Oklahoma (Conroy 1993, Conroy et al. 1995).

In addition to our validation data set, we tested our model with data collected in the same ecoregion but not part of the training or validation data sets. Because this model will eventually be used by managers to predict bobwhite abundance, this test will determine the utility of the model. We presented the trained model with the 1997 data and recorded the accuracy of the predictions.
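The systematic selection of validation records described above can be sketched in Python as follows. The text states only that records were rank ordered and every 5th record was set aside; the position at which the sequence starts is an assumption here. Starting with the first ranked record is used because it yields 20 of 98 records, which matches the reported count. The function name and the example counts are hypothetical.

```python
def systematic_split(records, key, step=5, start=0):
    """Rank records by `key` and send every `step`-th one to validation.

    `start` is the zero-based rank of the first validation record
    (an assumption; the source does not state where selection began).
    """
    ranked = sorted(records, key=key)
    chosen = set(range(start, len(ranked), step))
    validation = [r for i, r in enumerate(ranked) if i in chosen]
    training = [r for i, r in enumerate(ranked) if i not in chosen]
    return training, validation


# Hypothetical data: 98 distinct made-up counts standing in for survey records.
counts = [(i * 37) % 101 for i in range(98)]
training, validation = systematic_split(counts, key=lambda c: c)
print(len(training), len(validation))
```

Because selection runs through the ranked list at a fixed interval, the validation records span the full range of counts, from the lowest ranked record upward, rather than clustering at one end.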
Regression Analysis

We performed a multiple regression analysis to compare ANN performance with that of this traditional statistical model. We used the same data set used for training and validating the ANN model for the regression analysis. The full-model, multiple linear regression included all the independent variables and the dependent variable used in the ANN model. We used the statistical software package Statistix (Analytical Software 1996). We used the Student's t-test for determining which variables were contributing (P < 0.05) to the model predictions (Analytical Software 1996). The correlation between each model's predicted and actual bobwhite count was used as an indicator of the relative performance of each model.

Model Comparison

We used the percent contribution of each variable to the ANN model's predictions to identify important variables (Özesmi and Özesmi 1999). The percent contribution is calculated by dividing the sum of squared synaptic weights for the variable of interest by the total sum of squared synaptic weights for all variables. For the regression model, we determined each variable's contribution to the total, unadjusted R² using a forward stepwise regression (Wilkinson 1998). We calculated the increase in R² after each variable was entered into the model to apportion the amount of variance accounted for to each variable. We then divided each individual R² by the total unadjusted R² for the model. This gave the percentage contribution of each variable in the regression model to the model's response. This percentage is, therefore, analogous to the percent contribution of the ANN model. Although these percentage contributions are not directly comparable, they allowed us to determine what variables were driving each model.
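The two contribution measures described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure as implemented in the software used for this study: the "synaptic weights for the variable of interest" are taken to be the weights connecting that input node to the neurons, and the R² values are supplied as the cumulative R² after each variable enters the stepwise model. Function names and all numbers are hypothetical.

```python
def percent_contribution(input_weights):
    """Percent contribution of each input variable to an ANN.

    input_weights[k][j] is the synaptic weight from input j to neuron k
    (assumed layout). Each variable's share is its sum of squared weights
    divided by the total sum of squared weights over all variables.
    """
    n_inputs = len(input_weights[0])
    sums = [sum(row[j] ** 2 for row in input_weights) for j in range(n_inputs)]
    total = sum(sums)
    return [100.0 * s / total for s in sums]


def r2_shares(cumulative_r2):
    """Percent of total R-squared gained as each variable enters the model."""
    previous = [0.0] + list(cumulative_r2[:-1])
    increments = [now - before for now, before in zip(cumulative_r2, previous)]
    total = cumulative_r2[-1]
    return [100.0 * inc / total for inc in increments]


# Hypothetical weights: 2 neurons (rows) by 2 input variables (columns).
ann_shares = percent_contribution([[1.0, 2.0], [1.0, -2.0]])
# Hypothetical cumulative R-squared after entering 3 variables in turn.
reg_shares = r2_shares([0.30, 0.45, 0.50])
print(ann_shares, reg_shares)
```

In this made-up example the second input carries 80% of the squared weight, and the first variable entered accounts for 60% of the regression model's total R². Both sets of shares sum to 100, which is what allows the two models to be examined side by side even though the measures are not directly comparable.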
To determine if the differences in performance were due to the increased power of the ANN modeling technique, or to the increased parameterization of the ANN model, we used a sum-of-squares criterion for model comparison (Hilborn and Mangel 1997:114-117). This technique adjusts the sum of squared





