AB INITIO MOLECULAR DYNAMICS (AIMD): A NEW APPROACH FOR DEVELOPMENT OF ACCURATE POTENTIALS

By Milind M. Malshe, Bachelor of Engineering, University of Mumbai, Mumbai, India, 1998

Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE, December, 2004

Thesis Approved: Dr. R. Komanduri (Thesis Advisor); Dr. S. Roy; Dr. H. Lu; Dr. M. Hagan; Dr. L. M. Raff; Dr. A. Gordon Emslie (Dean of the Graduate College)

SUMMARY

In this study, a new approach is presented for the development of accurate potential-energy hypersurfaces based on ab initio calculations, which can then be used in molecular dynamics and Monte Carlo simulations of chemical and mechanical properties at the atomistic level. The method integrates ab initio electronic structure calculations with the interpolation capability of multilayer neural networks. A sampling technique based on novelty detection is also developed to ensure that the neural network fit of the potential energy spans the entire configuration space visited during the simulation. The procedure can be initiated using either an empirical potential or a direct dynamics simulation. It is applied here to develop the potential-energy hypersurface for 5-atom clusters within a silicon workpiece. Ab initio calculations were performed with the Gaussian 98 electronic structure program. Results for 5-atom silicon clusters representing the bulk and the surface structure are presented.

ACKNOWLEDGEMENTS

I would like to express my gratitude to my parents and brother for their support, encouragement, love, and confidence in me.
I would like to thank my advisor, Dr. R. Komanduri, for his inspiration, technical guidance, and financial support. I wish to express my sincere appreciation to Dr. L. M. Raff for introducing me to the concepts of molecular dynamics and Monte Carlo simulations and for providing guidance and encouragement throughout the study. I would like to thank Dr. M. Hagan for introducing me to the field of neural networks and for his invaluable guidance and encouragement. I would like to thank Dr. P. M. Agrawal for his guidance in understanding various aspects of simulation techniques. This project is the result of a coordinated effort linking simulation methods and neural networks. I would also like to thank Dr. S. Roy and Dr. H. Lu for their suggestions in this study. I am grateful to Dr. R. Komanduri and my committee members for helping me to explore and develop new methods for improving simulation techniques. This project was funded by grant DMI-0200327 from the National Science Foundation. I thank Dr. W. DeVries, Dr. G. Hazelrigg, Dr. J. Chen, and Dr. D. Durham of the Division of Design, Manufacturing, and Industrial Innovation; Dr. B. M. Kramer, Engineering Centers Division; and Dr. J. Larsen-Basse, Tribology and Surface Engineering program, for their interest in and support of this work. This project was also sponsored by a DEPSCoR grant on the Multiscale Modeling and Simulation of Material Processing (F496200310281). I thank Dr. Craig S. Hartley of the AFOSR for his interest in and support of this work.

TABLE OF CONTENTS

1 Introduction
2 Literature Review
  2.1 Literature review of design of interatomic potentials
  2.2 Literature review of quantum mechanical calculations of silicon clusters
3 Molecular Dynamics Simulation
  3.1 Brief history of molecular dynamics simulation
  3.2 Applicability of MD simulations
  3.3 Formulation of the differential equations of motion
  3.4 Time integration algorithms
    3.4.1 Verlet algorithm
    3.4.2 Leapfrog Verlet algorithm
    3.4.3 Velocity Verlet algorithm
    3.4.4 Beeman algorithm
  3.5 Methods to improve computational speed
    3.5.1 Neighbor list and bookkeeping techniques
    3.5.2 Cell structures and link lists
  3.6 Periodic boundary conditions (PBC)
4 Interatomic Potentials
  4.1 Design of interatomic potentials
  4.2 Classification of potential energy functions
    4.2.1 Pair potentials
    4.2.2 Many-body potentials
5 Ab Initio Calculations
  5.1 Born-Oppenheimer approximation
  5.2 Basis sets
    5.2.1 Slater-type orbitals (STO)
    5.2.2 Gaussian-type orbitals (GTO)
    5.2.3 Types of basis sets
  5.3 Hartree-Fock method
  5.4 Electron correlation methods
    5.4.1 Requirements of electron correlation theories
    5.4.2 Configuration interaction (CI)
    5.4.3 Limited configuration interaction
    5.4.4 Møller-Plesset perturbation theory (MP)
  5.5 Density functional theory (DFT)
6 Neural Networks
  6.1 Biological neurons
  6.2 History of multilayer neural networks
  6.3 Neural networks versus conventional computers
  6.4 Model of a neuron
  6.5 Multilayer network
  6.6 Transfer functions
  6.7 Neural network training
    6.7.1 Supervised training and parameter optimization
  6.8 LMS (Least Mean Squared Error) algorithm
  6.9 Backpropagation algorithm
  6.10 Numerical optimization techniques
    6.10.1 Conjugate gradient methods
    6.10.2 Newton and quasi-Newton methods
    6.10.3 Levenberg-Marquardt algorithm
  6.11 Bias/variance dilemma
    6.11.1 Neural network growing and neural network pruning
  6.12 Regularization and early stopping
  6.13 Early stopping
  6.14 Normalizing the data
  6.15 Neural network derivatives
    6.15.1 First derivative with respect to weights
    6.15.2 Derivatives of the transfer function
    6.15.3 First derivative with respect to inputs
  6.16 Novelty detection
    6.16.1 Minimum distance computation
7 Ab Initio Molecular Dynamics (AIMD)
  7.1 Approach
  7.2 Choice of the coordinate system
    7.2.1 Internal coordinates
  7.3 Input vector to the neural network
  7.4 Sorting and ordering the input vector for neural network training
    7.4.1 Procedure for sorting and ordering the input vector for AIMD
  7.5 Calculation of forces during the MD simulation
  7.6 Sampling technique
  7.7 Novelty detection
8 Results and Discussion
  8.1 Ab initio calculations
  8.2 Neural network training
  8.3 Iterating for the neural network convergence
  8.4 Neural network convergence using different empirical potentials
  8.5 A method to obtain a conservative force field
  8.6 Neural network training for surface and bulk configurations
  8.7 Neural network training for ab initio energies in carbon clusters of Buckyball (C60) and carbon nanotube
9 Conclusions and Future Work
10 References
11 Appendix

LIST OF FIGURES

Fig 2.1 The geometries for the neutral and ionic clusters of Si2 to Si6
Fig 2.2 Scaled cohesive energy vs cluster size for neutral silicon clusters calculated at the MP4/6-31G* level
Fig 2.3 Incremental binding energy vs cluster size for neutral silicon clusters calculated at the HF/6-31G* and MP4/6-31G* levels
Fig 3.1 Cell index method
Fig 3.2 Representation of periodic boundary conditions (PBC) and the simulation cell
Fig 5.1 Description of split valence basis set
Fig 5.2 Variation of energy with respect to radius
Fig 6.1 Schematic of biological neuron
Fig 6.2 Synaptic gap
Fig 6.3 Model of a neuron
Fig 6.4 Single input neuron
Fig 6.5 Multilayer network
Fig 6.6 Sigmoid transfer functions
Fig 6.7 Evaluation of error during training
Fig 6.8 Plot of neural network output for f(x) = x^4
Fig 7.1 Internal coordinates: a) bond distance, b) bond angle, and c) dihedral angle
Fig 7.2 Sign convention for dihedral angle
Fig 7.3 Configuration of 5 silicon atoms
Fig 7.4 Contour plot of the function f(x,y) = x^2 + y^2
Fig 7.5 Effect of ordering of atoms in a configuration
Fig 8.1 Silicon cluster
Fig 8.2 Comparison of neural network output with ab initio energy for the training set
Fig 8.3 Comparison of neural network output with ab initio energy for the validation set
Fig 8.4 Comparison of neural network output with ab initio energy for the testing set
Fig 8.5 Distribution of minimum distances for the initial set of Si5 configurations
Fig 8.6 Distribution of minimum distances of the configurations in the first iteration from the initial set of configurations
Fig 8.7 Comparison of the distribution of minimum distances of the configurations in the first iteration from the initial configurations with the distribution of configurations from the Tersoff potential
Fig 8.8 Distribution of recomputed minimum distances of the concatenated set of configurations in the initial and first iterations
Fig 8.9 Parzen's distribution of minimum distances for the initial set of configurations from the Tersoff potential
Fig 8.10 Parzen's distribution of minimum distances for the configurations stored while using the neural network trained for ab initio energy during MD simulation
Fig 8.11 Distribution of minimum distances of the configurations during the second iteration from the configurations during the first iteration
Fig 8.12 Parzen's distribution of the minimum distances for the concatenated set of configurations in the initial and first iterations
Fig 8.13 Distribution of minimum distances of all configurations stored using the modified Tersoff potential
Fig 8.14 Comparison of Tersoff and modified Tersoff potential energies (eV)
Fig 8.15 Comparison of neural network output energy (eV) for …
Fig 8.17 Variation of the total energy
Fig 8.18 Variation of the potential energy
Fig 8.19 Phonon frequencies for silicon
Fig 8.20 Comparison of neural network output with ab initio energy for the training set
Fig 8.21 Comparison of neural network output with ab initio energy for the validation set
Fig 8.22 Comparison of neural network output with ab initio energy for the testing set
Fig 8.23 Comparison of neural network output with ab initio energy for the training set

CHAPTER 1

1 Introduction

Computer simulation techniques play an important role in the study of various chemical, biological, and mechanical processes at the atomistic level. By comparing the results of the simulations with experimental observations, the theoretical model used in the simulation can be corrected and validated. Once validated, the simulations can be used to study the system under conditions for which experimental results are not easily available. By analyzing the results from the simulations, experiments can then be performed under specific conditions to achieve desired results. Thus, computer simulations complement both theoretical and experimental approaches.
In conjunction with developments in computer technology, the use of simulations has greatly extended the range of problems that can be studied. In chemistry, simulations are performed to study reaction dynamics. In engineering, simulations are used to study the behavior of a material under varying conditions in processes such as machining, indentation, and melting. Such simulations can provide useful insight into important mechanisms, such as phase transformation and dislocation dynamics, which are otherwise not easily understood using other techniques. Simulation techniques can be broadly classified into two groups: stochastic and deterministic. Stochastic simulations are based on statistical mechanics, which relates macroscopic properties such as pressure and temperature to the distribution and motion of atoms and molecules, whereas the deterministic approach is based on classical mechanics. Examples of the deterministic and stochastic approaches are molecular dynamics (MD) and Monte Carlo (MC) simulations, respectively. Molecular dynamics simulations generate information at the microscopic level, such as atomic positions and velocities, by integrating the equations of motion and following the time evolution of a set of interacting atoms. Statistical mechanics deals with macroscopic systems from a molecular point of view: macroscopic phenomena are studied in terms of the properties of the individual molecules making up the system. The most important part of an MD/MC simulation is the development of the potential energy surface, which describes the interactions of the atoms within the system. It must provide a realistic description of the interatomic interactions if the simulation is to match experimental observations. The usual approach for developing a potential is to choose a functional form motivated by physical intuition and then to fit its parameters to ab initio data and/or physical properties, yielding an empirical potential.
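As noted above, an MD simulation advances atomic positions and velocities by numerically integrating the equations of motion. The sketch below is a minimal velocity Verlet integrator applied to a one-dimensional harmonic oscillator; it is purely illustrative (the function names and toy parameters are ours, not the thesis code):

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance (x, v) through `steps` time steps of size dt.

    force(x) returns the force at configuration x; works for scalars
    or numpy arrays of coordinates.
    """
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt**2   # position update
        a_new = force(x) / mass            # force at the new position
        v = v + 0.5 * (a + a_new) * dt     # velocity update
        a = a_new
    return x, v

# Toy check: harmonic oscillator F = -k x with k = m = 1 (period 2*pi).
dt = 0.001
x, v = velocity_verlet(x=1.0, v=0.0, force=lambda x: -x,
                       mass=1.0, dt=dt, steps=round(2 * np.pi / dt))
# After one full period the trajectory returns close to (x, v) = (1, 0).
```

The two half-step velocity updates make the scheme time-reversible and give good long-time energy conservation, which is why Verlet-family integrators dominate MD practice.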
Although such empirical potentials provide a simple and physically interpretable description of the interatomic interactions, their applicability is limited to the type of data to which they were fitted; once fitted, there is no easy way to improve a potential without refitting the data. A solution to this problem is to model the system with ab initio electronic structure calculations, solving the Schrödinger equation to compute the potential energy and forces from first principles. Such a technique provides a very accurate description of the interatomic interactions, but it is computationally very expensive and hence limited to small systems involving only a few atoms.

In this study, a new approach is presented for the development of accurate potential-energy hypersurfaces based on ab initio calculations that can be utilized to conduct molecular dynamics and Monte Carlo simulations to study chemical, mechanical, and electrical properties at the atomistic level. The method integrates ab initio electronic structure calculations with the interpolation capability of multilayer neural networks. A sampling technique based on novelty detection is also developed to ensure that the neural network fit of the potential energy spans the entire configuration space visited during the simulation.

Chapter 2 gives a literature review of various empirical potentials and the different design approaches employed for modeling different situations. Chapter 3 explains the molecular dynamics technique, various integration algorithms, and the implementation of periodic boundary conditions. Chapter 4 gives the classification and design of empirical potentials and details some of the important empirical potentials developed for covalent materials, particularly silicon. Chapter 5 provides a detailed description of ab initio techniques.
Different types of basis sets are described, and techniques such as the Hartree-Fock method, electron correlation methods (configuration interaction and perturbation theory), and density functional theory are discussed. Chapter 6 discusses multilayer neural networks, training algorithms, and techniques for achieving an accurate fit. A sampling technique called novelty detection is also discussed. Chapter 7 presents the new technique for ab initio molecular dynamics and discusses the various steps involved. Chapter 8 presents the results for the silicon system. Chapter 9 summarizes the conclusions of this study.

CHAPTER 2

2 Literature Review

The most important part of the MD/MC simulation is the development of the potential energy surface, which should model the system under consideration sufficiently closely to the actual one. In recent years, many empirical potentials for silicon, such as the Stillinger-Weber, Tersoff, Bolding-Anderson, and Brenner potentials, have been developed and applied to a number of different systems.

2.1 Literature review of design of interatomic potentials

Existing models for interatomic potentials differ in their degree of sophistication, functional form, fitting strategy, and range of applicability, with each one designed for a specific problem. Not only are surfaces and small clusters difficult to model (Balamane et al., 1992; Kaxiras, 1996); even bulk material, including the crystalline and amorphous phases, solid defects, and the liquid phase, has not been given a transferable description by any single potential. The usual approach for developing an empirical potential is to choose a parameterized functional form motivated by physical intuition and then to find a set of parameters by fitting to ab initio data, experimental observations, or both for various atomic structures.
A covalent material presents a difficult challenge because complex quantum mechanical effects, such as chemical bond formation, hybridization, metallization, charge transfer, and bond bending, must be described by an effective interaction between atoms. In spite of a wide range of functional forms and fitting strategies, most of these models have been successful in the regime for which they were parameterized but have shown a lack of transferability. Balamane and Halicioglu (1992) performed a comparative study of six classical many-body potentials, namely the Pearson, Takai, Halicioglu, and Tiller potential; the Biswas and Hamann potential; the Stillinger and Weber potential; the Dodson potential; the Tersoff potential (T2); and the modified Tersoff potential (T3). They concluded that each has strengths and limitations, that none is clearly superior to the others, and that none is fully transferable.

The Stillinger-Weber potential (1985) was parameterized to experimental properties of solid and liquid silicon, such as the lattice energy and the melting point. The model has the form of a third-order cluster potential (Carlson, 1990) in which the potential energy is expressed as a linear combination of two- and three-body terms:

E = \sum_{i<j} v_2(i,j) + \sum_{i<j<k} v_3(i,j,k),  (2.1)

where v_2(i,j) represents two-body interactions and v_3(i,j,k) represents three-body interactions. The two-body interactions are expressed as

v_2(r_{ij}) = \varepsilon f_2(r_{ij}/\sigma), \qquad f_2(r) = A (B r^{-p} - r^{-q}) \exp[(r - a)^{-1}],  (2.2)

where \varepsilon has energy units and \sigma has length units. The three-body interaction is expressed as a separable product of radial functions g(r) and an angular function h(\theta):

v_3(i,j,k) = g(r_{ij})\, g(r_{ik}) \left[\cos\theta_{ijk} + \tfrac{1}{3}\right]^2,  (2.3)

where \theta_{ijk} is the bond angle formed by the bonds ij and ik, and g(r) is a decaying function. The angular function has a minimum at \theta = 109.5°, i.e., the tetrahedral angle, representing the angular preference for sp3 bonding.
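The Stillinger-Weber energy expression can be transcribed almost directly into code. The sketch below is illustrative only: the constants A, B, p, q, a, eps, and sigma are placeholders (not the published Stillinger-Weber parameterization), and g(r) is taken to be a simple decaying exponential for simplicity:

```python
import numpy as np
from itertools import combinations

# Illustrative constants only -- NOT the published SW parameterization.
A, B, p, q, a = 7.05, 0.602, 4.0, 0.0, 1.8
eps, sigma = 2.17, 2.095   # energy (eV) and length (angstrom) scales

def f2(r):
    """Two-body radial term, in reduced units r -> r/sigma; zero beyond a."""
    return A * (B * r**-p - r**-q) * np.exp(1.0 / (r - a)) if r < a else 0.0

def g(r):
    """Decaying radial factor for the three-body term (same cutoff as f2)."""
    return np.exp(1.0 / (r - a)) if r < a else 0.0

def sw_energy(pos):
    """Total energy (sum of two- and three-body terms) for an (N, 3) array."""
    n = len(pos)
    dist = lambda i, j: np.linalg.norm(pos[i] - pos[j]) / sigma
    # two-body sum over pairs i < j
    E = sum(eps * f2(dist(i, j)) for i, j in combinations(range(n), 2))
    # three-body sum over triples i < j < k; theta_ijk is the angle
    # between the bonds i-j and i-k
    for i, j, k in combinations(range(n), 3):
        u, w = pos[j] - pos[i], pos[k] - pos[i]
        cos_t = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
        E += eps * g(dist(i, j)) * g(dist(i, k)) * (cos_t + 1.0 / 3.0)**2
    return E

# A bound dimer at separation sigma has negative two-body energy.
E_dimer = sw_energy(np.array([[0.0, 0.0, 0.0], [sigma, 0.0, 0.0]]))
```

Note how the angular factor vanishes at cos(theta) = -1/3 (theta = 109.5°), so tetrahedral triples contribute no three-body penalty.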
This potential gives a fairly realistic description of crystalline silicon, point defects, and the liquid and amorphous phases, but it provides an inadequate description of undercoordinated or overcoordinated structures. Mistriotis et al. (1989) proposed an improvement over the Stillinger-Weber potential that included four-body interactions as well. It gave good agreement with the melting point of the crystal and with the geometries and energies of the ground and low-lying metastable states of silicon clusters.

Tersoff (1988; 1989) proposed an approach that couples two-body and multi-body correlations into the model. The potential is based on the bond order, which takes the local environment into account: the bond strength depends on the geometry, so the more neighbors an atom has, the weaker the bond to each neighbor. The energy is a sum of repulsive and attractive interactions that depend on the local bonding environment:

E = \frac{1}{2} \sum_{i} \sum_{j \neq i} f_C(r_{ij}) \left[ f_R(r_{ij}) + b_{ij} f_A(r_{ij}) \right], \qquad \zeta_{ij} = \sum_{k \neq i,j} v_3(i,j,k).  (2.4)

The f_R and f_A terms are repulsive and attractive pair potentials, respectively, and f_C is a cutoff function. The main feature of this form is the bond-order term b_{ij}: the strength of each bond depends on the local environment and is lowered when the number of neighbors is relatively high. The term \zeta_{ij} defines the effective coordination number of an atom, taking into account the relative distance of two neighbors (r_{ij} - r_{ik}) and the bond angle \theta_{ijk}.

The Bolding-Anderson potential (1990) is a generalized form of the Tersoff potential in which the interaction between a pair of atoms depends on the environment. They suggested that clusters of four or more atoms have local minima on the potential energy surface corresponding to multiple configurations that involve atoms with high coordination numbers and strained bond angles.
The increase in potential energy due to strained bond angles is offset by the large number of weak bonds formed. Their potential incorporates an angle dependence that differs between clusters and crystals, introduced through interference functions.

Bazant and Kaxiras (1997) developed the environment-dependent interatomic potential (EDIP) for bulk silicon. The interaction model includes two-body and three-body terms that depend on the local atomic environment through an effective coordination number:

V_2(r, Z) = R(r) + p(Z) A(r),

where R(r) represents the short-range repulsion of the atoms, A(r) represents the attractive force of bond formation, and p(Z) is the bond order, which determines the strength of the attraction as a function of the atomic environment, measured by the coordination Z. The interpretation of the bond order is that as an atom becomes coordinated beyond its ideal valence, its bonds become more metallic, characterized by delocalized electrons (Carlsson, 1985; 1990; Abell, 1985; Pettifor, 1990). They provided a test of empirical theories of covalent bonding in solids using a procedure to invert ab initio cohesive energy curves, and showed that this inversion verified trends in chemical bonding across various bulk bonding arrangements consistent with theoretical predictions (Bazant and Kaxiras, 1996). Their results indicated that a coordination-dependent pair interaction can provide a fair description of high-symmetry crystal structures without requiring additional many-body interactions, and that angular forces are needed only to stabilize the structure under symmetry-breaking distortions, primarily at small coordinations.
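The Tersoff b_ij term and the EDIP p(Z) factor encode the same idea: bonds weaken as an atom's coordination rises. As a purely numerical illustration, the sketch below uses one common parameterized bond-order form, b = (1 + (beta*zeta)^n)^(-1/(2n)), with made-up beta and n values (not any published parameterization):

```python
# Illustrative bond-order curve; beta and n are made-up values,
# not a published Tersoff or EDIP parameterization.
beta, n = 0.05, 1.5

def bond_order(zeta):
    """b = (1 + (beta*zeta)^n)^(-1/(2n)): the effective bond strength
    decreases monotonically as the effective coordination zeta grows."""
    return (1.0 + (beta * zeta)**n) ** (-1.0 / (2.0 * n))

# More neighbors -> weaker individual bonds.
b_isolated = bond_order(0.0)   # exactly 1: a lone pair bonds at full strength
b_low = bond_order(2.0)
b_high = bond_order(12.0)
```

This is the mechanism by which a single functional form can interpolate between strong covalent bonding at low coordination and weaker, more metallic bonding at high coordination.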
In spite of these various approaches, no potential has demonstrated a transferable description of the material in all its forms, leading to the conclusion that it may be too ambitious to attempt a simultaneous fit of all of the important atomic structures (bulk, crystalline, surface, amorphous, liquid phase, and clusters), since qualitatively different aspects of bonding are at work in different types of structures.

Rahman and Raff (2001) presented an analytical fit to ab initio data for the dissociation dynamics of vinyl bromide. Their approach was to develop analytic functions for each of the energetically open reaction channels using functional forms suggested by physical and chemical considerations. The parameters of these functions are expressed as appropriate functions of the system's geometry. The channel potentials are connected smoothly with parameterized switching functions, which play a central role in fitting the potential barriers, the reaction coordinate curvatures, and the coordinate coupling terms obtained from the ab initio calculations.

Ischtwan and Collins (1994) developed a moving interpolation technique in which the potential energy in the neighborhood of any point is approximated by a Taylor series using inverse bond-length coordinates as the expansion variables. The data from ab initio calculations are used to obtain a set of initial Taylor series expansions. The procedure expresses the potential at any configuration as a weighted average of the Taylor series about all points in the data set. Subsequently, the results are iteratively improved by computing trajectories on the Taylor-series-fitted surface and storing the internal coordinates. These new points are added to the data set according to a weight factor determined by the relative density of data-set points in the region of the new point.
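In one dimension, the Ischtwan-Collins idea reduces to a weighted average of local Taylor expansions. The sketch below is our own toy implementation (the inverse-power weight exponent is an assumption, not the published weight function); it reproduces a quadratic exactly because each second-order Taylor expansion of a quadratic is exact:

```python
import numpy as np

def shepard_potential(x, data, p=2):
    """Weighted average of local second-order Taylor expansions.

    data: list of (x_i, V_i, dV_i, d2V_i) tuples, e.g. from ab initio
    points.  Weights fall off as |x - x_i|^(-2p) and are normalized,
    so nearby expansions dominate.
    """
    w, taylor = [], []
    for xi, v, dv, d2v in data:
        d = x - xi
        if abs(d) < 1e-12:
            return v                                 # on a data point
        w.append(abs(d) ** (-2 * p))                 # unnormalized weight
        taylor.append(v + dv * d + 0.5 * d2v * d**2)  # local expansion
    w = np.array(w) / sum(w)
    return float(w @ np.array(taylor))

# Toy data sampled from V(x) = x^2 (so dV = 2x, d2V = 2); the
# interpolant reproduces the quadratic exactly at any query point.
pts = [(-1.0, 1.0, -2.0, 2.0), (0.5, 0.25, 1.0, 2.0), (2.0, 4.0, 4.0, 2.0)]
V_interp = shepard_potential(1.0, pts)
```

In the real method the expansions are taken in inverse bond-length coordinates of a many-atom system, but the weighting logic is the same.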
The criterion used to assess convergence of the iteratively improved potential to the final surface was computed using a dynamical variable, such as the reaction probability. Jordan et al. (1995) used the moving interpolation trajectory sampling method to investigate reaction dynamics. Maisuradze et al. (2003) introduced an interpolating moving least squares (IMLS) method for interpolation between the computed ab initio points. When the method is unrestricted, the least squares coefficients are obtained from the solution of a large matrix equation that must be solved repeatedly during a trajectory study. Another approach bypasses the physical motivation behind the functional form of the potential in favor of elaborate fitting, as described by Ercolessi and Adams (1994). Their potential involves combinations of cubic splines with effectively hundreds of adjustable parameters, and their approach, the force-matching method, involves fitting the potential to ab initio atomic forces for many atomic configurations, including surfaces, clusters, liquids, and crystals. The potential form is defined by single-variable functions of the atomic coordinates, such as the glue potential, which is defined as:

V = (1/2) Σi,j φ(rij) + Σi U(ni), ( 2.5 )

where φ(rij) is a pair potential, ni = Σj ρ(rij) is an effective atomic density at atom i built from the pair function ρ(rij), and U(n) is the glue function. They attempted to match the forces from first-principles calculations for a large set of configurations with those predicted by a classical potential, by minimizing the objective function

Z(α) = ZF(α) + ZC(α), ( 2.6 )

where ZF contains the ab initio data, ZC consists of physical quantities from experimental results, and α denotes the set of potential parameters. If the emphasis is given to ZF, then the minimizing potential appears as the best approximation of the first-principles system, with ZC acting as a guide towards the correct region in the space of parameters.
On the other hand, if the emphasis is on ZC, then the method looks like a conventional fit, but the ZF term relieves the burden of guessing the functional form. Another approach utilizes genetic algorithms and genetic programming. The idea is to let the computer program determine the functional form and come up with the best possible solution with minimal human input. The concept is based on genetics and evolutionary theory, in which different genetic representations of a set of variables are combined to produce a better solution. The technique is stochastic in nature rather than a conventional gradient-based approach, and hence the dimensionality of the problem does not pose a serious limitation. It works by randomly generating and exchanging functional elements (variables and arithmetic operators, such as addition and multiplication) and selecting the fittest individuals (which represent a set of parameters), assessed by comparison with target values; a population of potentials evolves until a superior form emerges. Makarov and Metiu (1998) proposed a procedure by which genetic programming could be used to find the best functional form and the best set of parameters to fit the energy and derivatives from ab initio calculations. They suggested a directed search, which guides the computer to use a specified functional form without limiting the flexibility of the search too much. The directed search was used to fill in portions of a human-engineered functional template, without which the computer could not find a simple interpretable form even for a pair potential. As an alternative to these methods, Blank et al. (1993; 1995) explored an interpolation method based on feed-forward neural networks. Functional approximation using neural networks does not require any assumption regarding the functional form of the potential.
Multilayer feed-forward neural networks can in principle fit any real-valued continuous function of any dimension to arbitrary accuracy, provided they have a sufficient number of neurons (Cybenko, 1989). Blank et al. used a neural network to fit data sets produced by low-dimensional models of a CO molecule chemisorbed on a Ni(111) surface. The models were constrained versions of a many-dimensional empirical potential that had been used in simulations of surface dynamics, where it had been shown to give a reliable qualitative picture of the molecule-surface interaction. Hobday et al. (1999) developed a neural network model for the more complex C-N system, for which the potential energy function was not parameterized. They used the neural network to fit the many-body term in the Tersoff functional form rather than fitting the entire potential. In contrast to the conventional network architecture, where a single input vector is presented to the network, a set of N vectors is presented, where N is the number of neighbors of the two atoms i and j, not including i and j. The value of N depends on the i-j bond. For example, if the i-j bond is part of a monocyclic chain, then N = 2; if the i-j bond is tetrahedrally coordinated, then N = 6. To present the local environment to the network, the bond order term is expressed as a function of ijk contributions, where k is a first neighbor of i or j. The network input consists of N vectors, where N is the number of different ijk contributions. Each input vector consisted of pairwise information, such as direction cosines, bond lengths, and cutoff functions; first-neighbor information, such as bond lengths and torsional angles; and second-neighbor information, such as the number and types of atoms used to describe the ijk atom triplet. The size of each input vector was constant, but the number of input vectors N is a variable that could change during the simulation.
Car and Parrinello (1985) developed a unified approach for molecular dynamics and density functional theory. In this method, the ions are still moved classically, but under the action of forces obtained by solving the electronic structure problem, thus eliminating the need for empirical potentials at the expense of much larger computational requirements. This technique is known as the Car-Parrinello molecular dynamics method. The electron density is written in terms of occupied single-particle orthonormal orbitals ψi as:

n(r) = Σi |ψi(r)|². ( 2.7 )

A point on the Born-Oppenheimer potential energy surface is given by the minimum of the energy functional with respect to the orbitals ψi. In the conventional formulation, minimization of the energy functional with respect to the orbitals ψi, subject to the orthonormality constraint, leads to the self-consistent Kohn-Sham equations, whose solution involves repeated matrix diagonalization. Car and Parrinello (1985) adopted a different approach by regarding the minimization of the Kohn-Sham (1965) functional as a complex optimization problem, which could be solved by applying the optimization method called "simulated annealing" introduced by Kirkpatrick et al. (1983). In this approach, an objective function is minimized relative to its parameters with a Boltzmann-type probability distribution via a Monte Carlo procedure. In the Car-Parrinello method the objective function is the total energy functional and the variational parameters are the coefficients of the expansion of the Kohn-Sham orbitals. They found that a simulated annealing strategy based on MD, rather than on the Metropolis Monte Carlo of Kirkpatrick et al. (1983), could be applied efficiently to minimize the Kohn-Sham functional. This technique, called "dynamical simulated annealing," was found to be useful for studying finite temperature properties.
2.2 Literature review of quantum mechanical calculations of silicon clusters

Raghavachari (1985; 1986; 1988) investigated the geometries and energies of silicon clusters using ab initio calculations. He considered several geometrical arrangements and electronic states. All geometries considered were optimized with several basis sets (minimal basis set STO-3G, 6-31G) at the Hartree-Fock level of theory, and in the case of open shell species (triplets, quintets, etc.) unrestricted Hartree-Fock was used. Single point calculations were then performed at these geometries using complete fourth order perturbation theory (MP4) with the polarized 6-31G* basis set. For clusters of five silicon atoms, the geometries considered were the trigonal bipyramid, square pyramid, planar pentagon, linear structure, and the tetrahedral structure (which is the equilibrium structure for the bulk). He found the trigonal bipyramid with a singlet state to be the most stable of all the structures considered. In the square pyramid, the apex atom forms four equivalent bonds, all on one side of a plane; this is not a stable structure, and the molecule prefers to rearrange to the more stable trigonal bipyramid in spite of the smaller number of bonds present. The tetrahedral structure of five silicon atoms was found to have high energy: though the central atom in such a cluster has four bonds oriented in tetrahedral directions, the remaining four silicon atoms have only one bond each and are too distant from each other to interact favorably.

Fig 2.1 The geometries for the neutral and ionic clusters of Si2 to Si6 (Raghavachari, 1985). (Numbers in parentheses indicate bond lengths and bond angles for the ions.)

Fig 2.1 shows the ground state equilibrium structures for the neutral and ionic structures of Si2 to Si6.
Consideration of the structures Si3 to Si7 revealed that each cluster Sin can be built from a smaller cluster Sin-1 by addition of a silicon atom at an appropriate edge-capped or face-capped bonding site. Edge-capped structures were found to be favored for the smaller clusters, while the face-capped clusters became comparable in energy for the intermediate clusters. For example, the rhombus structure of Si4 can be considered as an edge-capped triangular form, which is stable compared to the face-capped tetrahedron. Many atoms in the larger clusters Si7 to Si10 were found to have a coordination number of 6, greater than the value of 4 in the bulk. This indicated that, unlike a close-packed metallic system, there may be saturation of coordination at 6, with further bonding causing overcrowding and destabilization. He concluded that the principal differences between these clusters and the bulk were due to the large fraction of surface atoms in the clusters. These surface atoms provide a large driving force (analogous to surface tension) to form more compact structures, provided the resulting strain energy is not too high. In this sense there is a correspondence between the local coordination found in clusters and the high pressure form of silicon (the β-tin structure), which has hexacoordinate silicon in an octahedral environment.

Fig 2.2 Scaled cohesive energy vs. cluster size for neutral silicon clusters calculated at the MP4/6-31G* level (Raghavachari, 1985).

Fig 2.2 shows the scaled cohesive energy as a function of the size of the cluster. The cohesive energy generally increases with cluster size, with some indication of special stability at cluster sizes 4, 7, and 10. Si7 and Si10 approach the bulk cohesive energy more closely. Smaller clusters have low cohesive energy because of the difference in the number of bonds between the clusters and the bulk, not because of the nature of the bonding.
The local relative stabilities of clusters of different sizes can be compared using the incremental binding energy, defined as the energy required for the reaction Sin → Sin-1 + Si. Fig 2.3 shows the incremental binding energy as a function of the size of the cluster at the HF/6-31G* and MP4/6-31G* levels.

Fig 2.3 Incremental binding energy vs. cluster size for neutral silicon clusters calculated at the HF/6-31G* and MP4/6-31G* levels (Raghavachari and Rohlfing, 1988).

The peaks at cluster sizes of 4, 6, 7, and 10 indicate that these structures are locally stable, consistent with the prominent presence of 6, 7, and 10 in the mass spectral distributions of these clusters.

CHAPTER 3

Molecular Dynamics Simulation

Molecular dynamics is a computer simulation technique in which the time evolution of a set of interacting atoms is followed by integrating their equations of motion. While the continuum mechanics approach provides a good understanding of processes in the macro and micro regimes, it cannot be used to simulate processes at the atomistic regime. Unlike finite element analysis or other continuum approaches, where the nodes and distances are selected arbitrarily, molecular dynamics is based on fundamental units of a specific material, such as the lattice spacing and interatomic distances. Thus, processes can be studied at the fundamental level. However, since a large number of atoms constitute any common material, one has to consider the interactions of several thousands of atoms even for a nanometric study. Such simulations require a large amount of computing resources but in return can provide insight into processes happening at the smallest, namely atomistic, level.

3.1 Brief history of molecular dynamics simulation

The molecular dynamics method was first introduced by Alder and Wainwright in the late 1950's (Alder and Wainwright, 1957, 1959) to study the interactions of hard spheres.
The purpose of that work was to investigate the phase diagram of a hard sphere system, and in particular the solid and liquid regions. In a hard sphere system, particles interact via instantaneous collisions and travel as free particles between collisions. Gibson et al. (1960) studied the dynamics of radiation damage; it was the first example of a molecular dynamics calculation with a continuous potential based on a finite time integration method. Rahman (1964) carried out perhaps the first simulation using a realistic potential, studying the properties of liquid argon with a Lennard-Jones potential. Rahman and Stillinger (1974) carried out a simulation of liquid water. Verlet (1967) calculated the phase diagram of argon using the Lennard-Jones potential and computed correlation functions to test theories of the liquid state. The bookkeeping device that became known as the Verlet neighbor list was also introduced in this work, along with the Verlet time integration algorithm. Phase transitions in the same system were investigated by Hansen and Verlet (1969). The number of simulation techniques has expanded greatly since then; there now exist many specialized techniques for particular problems, including mixed quantum mechanical-classical simulations.

3.2 Applicability of MD simulations

Molecular dynamics follows the laws of classical mechanics, whereas at the atomistic level the system actually follows the laws of quantum mechanics. A classical description of the system can be used when quantum effects, such as tunneling and electronic excitations, play no essential role in the dynamics. In addition, the kinetic energy of a particle must be large enough to ensure that the de Broglie wavelength is much smaller than the lattice constant of the solid. A simple test of the validity of the classical approximation is based on the de Broglie thermal wavelength, defined as:

Λ = [ h² / (2π M kB T) ]^(1/2), (3.1)

where M is the atomic mass and T is the temperature.
The classical approximation is justified if Λ << a, where a is the mean nearest-neighbor distance. The classical approximation is poor for lighter elements, such as hydrogen and helium. Moreover, quantum effects become important in any system when T is sufficiently low. The drop in the specific heat of crystals below the Debye temperature and the anomalous behavior of the thermal expansion coefficient are well-known examples of measurable quantum effects in solids. Hence, molecular dynamics results should be interpreted with caution in these regimes. An important issue in MD simulations is the temporal scale that can be explored. The maximum time step with which Newton's equations can be integrated numerically is inherently restricted to small values (about 1 fs) in order to reproduce atomic vibrations accurately. As a result, molecular dynamics simulations require long computational times, because the most interesting motions occurring during the simulated processes are very slow compared with the fast oscillations of bond lengths and bond angles that limit the integration time step. This limits the time scales that can be handled by MD in a reasonable amount of actual clock time. MD simulations are therefore usually limited to molecular processes happening on relatively short time scales, the so-called "rare events": infrequent processes that, once initiated, are completed quickly. An example is the dissociation of a chemical bond, a rapid process that is completed on the picosecond time scale, whereas the application of an external load lasts longer by orders of magnitude, making it difficult to observe such events at experimental speeds. In order to overcome the time scale limitation of MD, two approaches are usually followed. In the first approach, coarse-graining, the dynamic variables are divided into slow and fast degrees of freedom, and averaging is performed over the fast degrees of freedom.
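As a quick numerical check, Eq. (3.1) can be evaluated directly. The following sketch is not part of the thesis codes; the silicon mass and nearest-neighbor distance used are illustrative inputs. It confirms that Λ is far smaller than the interatomic spacing for silicon at room temperature, so the classical treatment is justified:

```python
import math

# CODATA values for the physical constants (SI units)
H = 6.62607015e-34     # Planck constant, J s
KB = 1.380649e-23      # Boltzmann constant, J / K
AMU = 1.66053907e-27   # atomic mass unit, kg

def thermal_wavelength(mass_amu, temperature_k):
    """de Broglie thermal wavelength of Eq. (3.1): [h^2 / (2 pi M kB T)]^(1/2)."""
    m = mass_amu * AMU
    return H / math.sqrt(2.0 * math.pi * m * KB * temperature_k)

# Silicon (M ~ 28.09 amu) at 300 K; a ~ 2.35 Angstrom is the Si-Si
# nearest-neighbor distance (both values are illustrative inputs)
lam = thermal_wavelength(28.09, 300.0)
a = 2.35e-10
print(lam / a)  # ratio << 1, so the classical approximation holds for Si at 300 K
```

For silicon at 300 K the ratio Λ/a comes out below 0.1, whereas repeating the evaluation for hydrogen or at cryogenic temperatures pushes it toward unity, in line with the caveats above.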
In the second approach, fast (and rare) trajectories between desired states are computed explicitly, and their probabilities (relative to nonreactive trajectories) are calculated.

3.3 Formulation of the differential equations of motion

The forces between the atoms are computed explicitly, and the motion of the molecule is followed over a period of time using a suitable numerical integration method. Classical mechanics describes how physical objects move and how their positions change with time. Consider an isolated system consisting of N bodies with coordinates (xi, yi, zi), where i = 1, 2, 3, …, N (Goldstein, 1965). By an isolated system it is meant that all other bodies are sufficiently remote to have a negligible influence on the system. Each of the N bodies is assumed to be small enough to be treated as a point particle. The position of the ith body with respect to a given inertial frame is denoted ri(t); its velocity and acceleration are vi(t) = dri/dt and ai(t) = d²ri/dt². Each atom has a mass mi. From Newton's second law, Fi = mi ai, where Fi is the total force acting on atom i. This force is composed of a sum of forces due to each of the other interacting atoms in the system. If the force on the ith body due to the jth body is denoted Fij, then Fi = Σ(j=1 to N) Fij, where Fii is zero, since there is no force on the ith atom due to itself. Thus,

Fi = mi d²ri/dt² = d(mi vi)/dt = dpi/dt, (3.2)

where pi is the momentum of the ith body. The force on atom i is the negative gradient of the potential energy function with respect to the coordinates of atom i:

Fi = −∇i V(r1, r2, …, rN), (3.3)

where V is the potential energy function, ri is the position vector of atom i, and xi, yi, and zi are the Cartesian coordinates of atom i:

ri = xi î + yi ĵ + zi k̂, and ∇i = î ∂/∂xi + ĵ ∂/∂yi + k̂ ∂/∂zi. (3.4)

Now, the force in the X direction on atom i is given by:

Fx = dpx/dt = d(m vx)/dt = −∂V/∂x.
(3.5)

The velocity of atom i in the X direction is:

vx = dx/dt = px/m. (3.6)

Similar equations can be derived in the Y and Z directions. It should be noted that to update the position of an atom, one has to solve 6 coupled first order differential equations. If the number of atoms in the system is N, then the number of differential equations to be solved is 6N. If the potential function is a simple pairwise potential, such as the Lennard-Jones or Morse potential, then the total number of terms in such a potential is N(N−1)/2. Thus, as an example, a system of 2000 atoms requires the integration of 12,000 coupled first order differential equations of motion, and about 2 million pairwise terms need to be calculated each time a derivative is evaluated. Such evaluations must be done at every integration step for every trajectory calculation. The computational time therefore increases rapidly as the number of atoms in the system increases.

3.4 Time integration algorithms

The MD trajectories are defined by both position and velocity vectors, and they describe the time evolution of the system. Accordingly, the positions and velocities are propagated with a finite time interval by solving a set of coupled first order differential equations. Time integration algorithms are based on finite difference methods, where time is discretized on a finite grid. They are based on a Taylor series expansion truncated after some finite number of terms. The truncation error varies as (Δt)ⁿ, where Δt is the time step and n is the order of the method (which depends on the number of time derivatives considered in the expansion). Hence, for higher order methods, the truncation error drops off rapidly even with a small reduction in the time step. The idea of the numerical integration of Newton's equations of motion is to find an expression that defines the positions at time t + Δt in terms of the already known positions and their time derivatives, such as velocities and accelerations, at time t.
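To make this bookkeeping concrete, the following sketch evaluates Fi = −∇iV for a simple pairwise Lennard-Jones potential. This is an illustrative stand-in (the thesis itself targets many-body silicon potentials); it shows the N(N−1)/2 pair terms that must be summed at every derivative evaluation:

```python
import itertools
import numpy as np

def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Energy and forces F_i = -grad_i V for a pairwise Lennard-Jones potential.

    Illustrative stand-in for the pairwise case discussed in the text: every
    force evaluation loops over all N(N-1)/2 pairs.
    """
    n = len(positions)
    forces = np.zeros_like(positions)
    energy = 0.0
    for i, j in itertools.combinations(range(n), 2):   # N(N-1)/2 pair terms
        rij = positions[i] - positions[j]
        r = np.linalg.norm(rij)
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 ** 2 - sr6)
        # magnitude of -dV/dr along the unit vector rij/r, applied with
        # opposite signs to atoms i and j (Newton's third law, so the
        # total linear momentum is conserved)
        fmag = 24.0 * epsilon * (2.0 * sr6 ** 2 - sr6) / r
        forces[i] += fmag * rij / r
        forces[j] -= fmag * rij / r
    return energy, forces
```

As a sanity check, a dimer at the separation r = 2^(1/6) σ sits at the potential minimum: the energy is −ε and the net force on each atom vanishes.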
Thus:

r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt² + …
v(t + Δt) = v(t) + a(t)Δt + (1/2)b(t)Δt² + …
a(t + Δt) = a(t) + b(t)Δt + … (3.7)

3.4.1 Verlet algorithm

This is perhaps the most widely used method of integrating the equations of motion, initially developed by Verlet (1967) and attributed to Störmer (Gear, 1971). The Verlet algorithm uses the acceleration at time t and the positions at times t and t − Δt to calculate the new positions at time t + Δt. The algorithm is time reversible and, given a conservative force field, it guarantees the conservation of linear momentum.

r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt² + … (3.8 A)
r(t − Δt) = r(t) − v(t)Δt + (1/2)a(t)Δt² − … (3.8 B)

Adding Eqns. (3.8 A) and (3.8 B), and neglecting higher order terms,

r(t + Δt) = 2r(t) − r(t − Δt) + a(t)Δt²

and the velocity is computed as

v(t) = [r(t + Δt) − r(t − Δt)] / (2Δt).

The basic Verlet algorithm does not use velocities explicitly, which may introduce numerical imprecision, since the local error in updating the positions is proportional to Δt⁴, whereas the error in updating the velocities is proportional to Δt².

3.4.2 Leap frog Verlet algorithm

This is a modification of the Verlet algorithm in which the velocities are computed at half time steps as well. The algorithm is given as:

v(t + Δt/2) = v(t − Δt/2) + a(t)Δt
r(t + Δt) = r(t) + v(t + Δt/2)Δt (3.9)

The advantage of this method is that the velocities are explicitly calculated, but the disadvantage is that they are not calculated at the same time as the positions. The velocities at time t can be approximated as:

v(t) = (1/2)[v(t − Δt/2) + v(t + Δt/2)] (3.10)

3.4.3 Velocity Verlet algorithm

The velocity Verlet algorithm provides a better description of the velocities, without the need to compute the velocities at the half time step.
r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt²
v(t + Δt) = v(t) + (1/2)[a(t) + a(t + Δt)]Δt (3.11)

3.4.4 Beeman algorithm

The Beeman algorithm (Beeman, 1976) provides a more accurate expression for the velocities:

r(t + Δt) = r(t) + v(t)Δt + (2/3)a(t)Δt² − (1/6)a(t − Δt)Δt²
v(t + Δt) = v(t) + (1/3)a(t + Δt)Δt + (5/6)a(t)Δt − (1/6)a(t − Δt)Δt (3.12)

It should be noted that these algorithms assume that the force remains constant while updating the positions from t to t + Δt. To circumvent this assumption, higher order terms involving higher time derivatives must be included in the algorithm. It may be noted that the above methods are self-starting, i.e., it is only necessary to know the positions and velocities at time t = t0. An estimate of the accuracy during the integration can be obtained by monitoring the total energy and the angular momentum of the system, which should be conserved. Step size reduction and back integration are other means of checking the accuracy of the integration results. Predictor-corrector methods provide an automatic error estimate at each integration step, allowing the program to employ a variable step size to achieve a specified accuracy. However, these methods are not self-starting; it is therefore necessary to use velocity Verlet or a similar single step method to start the integration.

3.5 Methods to improve computational speed

3.5.1 Neighbor list and bookkeeping techniques

In MD/MC simulations, the distances from atom i to all other atoms in the system are computed. For the calculation of the potential and forces, only the atoms that are separated by distances smaller than the potential cutoff are considered. The time to calculate all pair bond distances is proportional to N², where N is the number of atoms in the system. Verlet (1967) suggested a technique for improving the speed by maintaining a list of the neighbors of a particular atom, which is updated at intervals.
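The velocity Verlet update of Eq. (3.11) can be written in a few lines. The following is a minimal sketch, not the code used in this work; a user-supplied acceleration function stands in for the force evaluation:

```python
import numpy as np

def velocity_verlet(r, v, accel, dt, nsteps):
    """Propagate positions and velocities with Eq. (3.11).

    `accel(r)` returns the acceleration array for configuration `r`;
    the method is self-starting, needing only r and v at t = t0.
    """
    a = accel(r)
    for _ in range(nsteps):
        r = r + v * dt + 0.5 * a * dt ** 2   # position update
        a_new = accel(r)                     # acceleration at the new positions
        v = v + 0.5 * (a + a_new) * dt       # velocity update
        a = a_new
    return r, v

# Harmonic oscillator (a = -r): the total energy 0.5*(v^2 + r^2) should stay
# close to its initial value of 0.5, the conservation check mentioned above
r, v = velocity_verlet(np.array([1.0]), np.array([0.0]), lambda x: -x, 0.01, 1000)
```

Monitoring the oscillator's total energy over the run illustrates the accuracy check described above: with Δt = 0.01 the energy drift stays far below the truncation-error scale of a single step.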
Between the updates of the neighbor list, only the bond distances of atoms appearing in the list are computed. This procedure reduces the computational time to some extent.

3.5.2 Cell structures and link lists

As the size of the system increases, the conventional neighbor list becomes too large to store easily, and testing every pair in the system is inefficient. An alternative method of keeping track of neighbors for large systems is the cell index method (Quentrec and Brot, 1975; Hockney and Eastwood, 1981). The cubic simulation box is divided into a regular lattice of M x M x M cells; a two-dimensional representation is shown in Fig 3.1. The cells are chosen such that the side of the cell, l = L/M, is greater than the cutoff distance of the potential.

Fig 3.1 Cell index method (a 5 x 5 two-dimensional array of cells, numbered 1 to 25).

The first part of the method involves sorting all atoms into their appropriate cells. In this two-dimensional example, the neighbors of any atom in cell 13 can be found in cells 13, 7, 8, 9, 12, 14, 17, 18, and 19, so searching for neighbors is fast and efficient. For a two-dimensional system there are approximately Nc = N/M² atoms in each cell, and for a three-dimensional system there are Nc = N/M³ atoms in each cell. Using the cell structure method, only 9NNc pairs have to be computed for a two-dimensional system, and 27NNc pairs in a three-dimensional system, as opposed to ½N(N−1) in the conventional way. The cell structure can be set up and implemented as a linked list: the linked list array stores, for each atom, the index of the next atom in the same cell, and so on down the chain.

3.6 Periodic boundary conditions (PBC)

No matter how large the simulated system is, its number of atoms is negligible compared with the number of atoms that make up a macroscopic specimen (of the order of 10²³).
As a result, in the simulation model the ratio of the number of surface atoms to the total number of atoms is much larger than in reality, causing surface effects to be far more important than they should be. A solution to this problem is the use of periodic boundary conditions. A periodic boundary is a mathematical formulation that makes a simulation containing only a few hundred atoms behave as though it were infinite in size. In doing so it also removes surface effects, which would otherwise arise because atoms near the boundary have fewer neighbors than the atoms inside. When using PBCs, the atoms are considered to be enclosed in a box, and this box is replicated by translation in all three Cartesian directions to form an infinite lattice. In effect, every atom in the simulation box has an exact duplicate, with the same velocity, in each of the surrounding cells.

Fig 3.2 Representation of the periodic boundary condition (PBC) and the simulation cell.

Fig 3.2 shows the representation of the periodic boundary along with the main simulation cell; rcut is the cutoff used in the potential. If an atom is located at a position r in the box, then this particle represents an infinite set of particles located at:

r + la + mb + nc, (3.13)

where l, m, and n are integers, and a, b, and c are the vectors corresponding to the edges of the box. All these image particles move together, but only one of them is represented in the computer program. Each particle should be thought of as interacting not only with the other particles in the box but also with their images in the surrounding boxes. Whenever an atom leaves the simulation cell, it is replaced by another with the same velocity entering through the opposite cell face, so the number of atoms in the simulation cell is conserved and no atom feels any surface effects.
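In code, the re-entry of atoms through the opposite face reduces to a modulo operation on the coordinates. A minimal sketch, assuming a cubic box with one corner at the origin (an illustrative convention; boxes centered on the origin are equally common):

```python
import numpy as np

def wrap_positions(positions, box_length):
    """Apply periodic boundaries: an atom that leaves the simulation cell
    re-enters through the opposite face (its velocity is unchanged).
    Assumes a cubic cell of side `box_length` with one corner at the origin."""
    return positions % box_length

# An atom that drifted to x = 10.3 in a box of side 10 re-enters near x = 0,
# and one at x = -0.2 re-enters near the far face
wrapped = wrap_positions(np.array([10.3, -0.2, 5.0]), 10.0)  # ~ [0.3, 9.8, 5.0]
```

Python's modulo always returns a value in [0, L) even for negative coordinates, which is exactly the wrapping behavior described above; in Fortran-style codes the same effect needs an explicit sign check.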
An artifact of this technique is that if the simulation cell is too small and the cutoff radius is greater than half the length of the simulation cell, then an atom i can interact with atom j as well as with its image j1 in a neighboring cell. As a result, the interaction between atoms i and j is effectively counted twice, as i-j and i-j1. The minimum image criterion states that if the box size is greater than 2rc, where rc is the cutoff radius of the potential, then among all pairs formed by an atom i in the box and the set of all periodic images of an atom j, at most one will interact. To see this, suppose an atom i interacts with two images j1 and j2 of atom j. These two images must be separated by a distance of at least 2rc. In order to interact with both j1 and j2, atom i would have to be within a distance rc of each of them, which is impossible since they are separated by more than 2rc. Under the minimum image criterion, only the closest image of atom j, if it lies within the cutoff radius rc, interacts with atom i. Hence, it is very important that the box size of the simulation cell be at least twice the cutoff radius specified for the potential. Long-range interactions, such as Coulombic interactions whose range exceeds the simulation cell, are treated using the Ewald summation, which was developed for studying ionic crystals (Ewald, 1921; Madelung, 1918). Using the minimum image criterion, the periodic boundaries are modeled by modifying the distance calculation. Let the difference between the x coordinates of atoms i and j be denoted dx; if |dx| is greater than half the box size in the x direction (L/2), then atom j does not interact with atom i. Instead, its image j1 in a surrounding cell is considered to interact with atom i:

If dx > L/2, then dx = dx − L; if dx < −L/2, then dx = dx + L.
(3.14)

Alternatively,

dx = dx − L · anint(dx/L), (3.15)

where L is the box size in that particular Cartesian direction and anint is a function that rounds the ratio dx/L to the nearest integer. A similar procedure is applied in the Y and Z directions, and the distance is computed as:

(dx² + dy² + dz²)^(1/2). (3.16)

CHAPTER 4

Interatomic Potentials

The interatomic potential used in a simulation to model the lattice structure plays an important role in determining the accuracy of the simulation results. The accuracy of the trajectories obtained from an MD simulation is affected by the choice of the potential energy function. The total energy of the system is the sum of the kinetic energy (KE) and the potential energy (PE). The KE is simple to compute, but the PE computation is complex, since it depends on the positions of all interacting atoms. The complexity of the potential also determines the computational time required for the simulation. The force acting on an atom is proportional to the first derivative of the potential energy function, and the largest part of a molecular dynamics simulation is the evaluation of the forces required to calculate the new positions of the atoms. The potential energy can be obtained using two approaches. One approach is to perform ab initio calculations by solving the Schrödinger equation for the entire system at each integration step. Although theoretically this is the best approach (without any arbitrary assumptions or ad hoc parameterization), it is not feasible to perform such calculations for the entire system at every time step. For larger systems, comparatively simple and computationally efficient empirical potential functions are used, which take into account factors such as bond stretching, bending, and torsions, covalent bonding, and nonbonded (van der Waals and Coulomb) interactions.
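The anint form of Eq. (3.15) translates directly into code, with NumPy's rint playing the role of anint. A minimal sketch for a cubic box:

```python
import numpy as np

def minimum_image_distance(ri, rj, box_length):
    """Distance between atoms i and j under the minimum image criterion,
    Eq. (3.15): d = d - L * anint(d / L) in each Cartesian direction.
    Assumes a cubic box of side L with L > 2 * r_cut."""
    d = ri - rj
    d = d - box_length * np.rint(d / box_length)  # np.rint plays the role of anint
    return float(np.sqrt(np.sum(d ** 2)))

# Atoms near opposite faces of a 10-unit box are in fact close neighbors
print(minimum_image_distance(np.array([0.5, 0.0, 0.0]),
                             np.array([9.5, 0.0, 0.0]), 10.0))  # -> 1.0
```

Note that this folds each coordinate difference into [−L/2, L/2], so the closest periodic image is always the one whose distance is returned, as the criterion requires.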
4.1 Design of interatomic potentials

The traditional approach is to eliminate the electronic degrees of freedom entirely and move the nuclei using an appropriate analytical potential function. The functional form of this potential is chosen so as to mimic the behavior of the true potential in realistic ways for specific materials under specific conditions. Constructing the potential involves two steps:

1. Selecting an analytical form for the potential. This could be a sum of pairwise terms, where the energy of each pair depends on the relative distance, or a many-body form appropriate for a specific type of bonding involving bond distances, angles, and coordination.

2. Once the form is decided, the next step is to determine a specific parameterization of the functions. A set of parameters is found which fits data from ab initio calculations, experimental properties such as density, cohesive energy, and phonon frequencies, or a data set containing some combination of theoretical and experimental observations.

The most important factor is the range of applicability of the potential energy function. Since an ad hoc functional form is used, it has no theoretical foundation, and once the empirical potential is developed there is no straightforward means to improve it. Normally, the parameters are adjusted to equilibrium data, which means that the potential can be used to model the bulk properties of the crystal lattice close to equilibrium conditions. It is therefore unlikely that the energies and forces for amorphous structures will be accurately represented by the potential. For example, a potential function designed for the bulk may not be appropriate for studying bond dissociation or a diatomic molecule of the same element, since the environment would be very different from the one for which the potential was designed. However, it may be possible to model both the bulk and the surface, where the environment differs due to the reduced coordination of the atoms at the surface.
These problems are intractable, and one should be critical when using analytical potentials to simulate high-temperature and high-pressure conditions.

4.2 Classification of potential energy functions

The empirical potentials are further classified into two-body, three-body, and many-body potential forms. Most of the empirical potential forms comprise an attractive and a repulsive term. The term "empirical" may be misleading, as these potentials are not always strictly empirical (Komanduri and Raff, 1999). They can provide a more realistic model than potentials derived from purely theoretical considerations (Torrens, 1972), since their parameterization can take some physical properties into account as well. Empirical potential forms are based on mathematical expressions for the pairwise interaction between two atoms or ions, which may or may not be justified from theory. These forms can be expressed in terms of attractive and repulsive terms. By convention, repulsive forces are considered positive and attractive forces negative. As the distance between two atoms decreases, both the attractive and repulsive forces increase, but the repulsive forces increase more rapidly than the attractive forces. The curvature of the potential energy function is mainly determined by the repulsive force component, which in turn also governs the elastic behavior of the solid. When the attractive and repulsive forces exactly balance each other, this corresponds to the equilibrium position, where the potential energy is at its minimum; this minimum is the bond energy. The cohesive properties of solids, such as melting and vaporization behavior, are determined by the magnitude of the maximum binding energy, which is governed by the attractive force component. The higher the bonding energy, the higher the melting temperature and elastic modulus, and the lower the coefficient of thermal expansion. Several empirical potential functions have been developed that are suitable for different materials.
Some examples of simple pair potentials are the Lennard-Jones potential (also known as the 6-12 potential) (1925), the Morse potential (1929), and the Born-Mayer potential (1933). For complex systems, such as those involving covalent bonding, common potential functions are the Stillinger-Weber (1985), Tersoff (1988, 1989), Bolding-Anderson (1990), and Brenner (1990) potentials, and the embedded atom and modified embedded atom (MEAM) potentials (Baskes, 1989).

Cutoff radius

In general, each atom can interact with all other atoms in the simulation. One method of increasing the computational speed is to limit the range of the potential. Normally the potential functions are truncated at a certain value of the bond distance called the "cutoff radius". By neglecting the interactions beyond the cutoff radius, the computational time is significantly reduced with very little loss in accuracy. Truncation of the potential energy function also results in truncation of the forces. The cutoff distance is generally taken as the distance where the potential energy is ~3-5% of the equilibrium potential energy. Consequently, a finite cutoff results in small fluctuations in the total energy instead of strictly constant energy.

4.2.1 Pair potentials

Pair potentials are applicable for atomistic studies of many metals. They can be classified into two basic categories (Vitek, 1996): one in which the potential determines the total energy of the system, and a second type that determines the change in energy when a configuration varies under constant-density conditions. The Lennard-Jones potential (Lennard-Jones, 1925) is given as:

V_LJ(r) = 4ε[(σ/r)^12 - (σ/r)^6], (4.1)

where ε and σ are parameters that determine the well depth and the position of the potential minimum, respectively. One of the most commonly used forms of the pair potential is the Morse potential (Morse, 1929). It is used to model the interaction between atoms of two different materials.
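A truncated version of Eqn. (4.1) can be sketched in a few lines; this is an illustrative implementation in reduced units (ε = σ = 1, with the commonly used 2.5σ cutoff), not code from any particular simulation package:

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Truncated Lennard-Jones 6-12 potential of Eqn. (4.1).

    epsilon sets the well depth and sigma the length scale; beyond
    the cutoff radius r_cut the potential is simply set to zero,
    which is the truncation scheme described in the text.
    """
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The minimum lies at r = 2^(1/6) * sigma with depth -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
print(lennard_jones(r_min))   # -1.0
print(lennard_jones(1.0))     # 0.0 (the zero crossing at r = sigma)
```

The simple truncation illustrated here is what produces the small total-energy fluctuations mentioned above, since the potential (and hence the force) jumps discontinuously to zero at r_cut.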
It is given as:

V(r) = D(1 - e^(-α(r - re)))²,  or
V(r) = D(e^(-2α(r - re)) - 2e^(-α(r - re))),  r ≤ rc, (4.2)

where r is the bond distance between the two atoms and rc is the cutoff radius beyond which the potential is truncated to zero. The potential values computed using the two forms given in Eqn. (4.2) differ by the constant D; as a result, the forces computed using the two forms are the same, and either form can be used in a simulation. The potential parameters D, α, and re are adjusted to the measured sublimation enthalpy, Debye temperature, and nearest-neighbor spacing of the material. Girifalco and Weizer (1959) calculated Morse potential parameters using experimental values for the energy of vaporization, lattice constants, and compressibility. The Morse potential can model FCC metals well, but not BCC metals or covalent materials. A general feature of physically reasonable pair potentials is that they favor the formation of close-packed structures, so they are unsuitable for describing covalent systems, which assume more open structures.

4.2.2 Many-body potentials

Pair potentials, such as the Lennard-Jones potential, are suitable for modeling the interactions of inert gases, such as helium, argon, and xenon. A covalent material represents a difficult challenge because of complex quantum mechanical effects, such as chemical bond formation, hybridization, metallization, charge transfer, and bond bending, which must be described by an effective interaction between atoms. For instance, under pressure silicon undergoes a series of structural phase transitions from the tetrahedral structure to a denser phase called β-tin, then to simple cubic, and then to FCC. This indicates that the energy differences between these structures are not too large, which suggests that the cohesive energy is nearly independent of the coordination. A pairwise potential model would favor the more closely packed structure.
Consequently, no reasonable pair potential can stabilize the diamond tetrahedral structure without angular terms. Various methods have been introduced to circumvent this problem. Several authors have introduced potentials with a strong angular dependence (Stillinger and Weber, 1985; Biswas and Hamann, 1985; Baskes, 1987; Baskes et al., 1989). Pettifor (1989) developed a similar approach from tight binding. Models based on the bond charge (Brenner and Garrison, 1985) and on a function of the local coordination (Tersoff, 1986) have also been developed. Most of these models have been successful in the regime for which they were parameterised but have shown a lack of transferability. A survey of six such potentials (Balamane and Halicioglu, 1992) concluded that each has strengths and limitations, none appearing clearly superior to the others, and none being fully transferable. More recently, the environment-dependent interatomic potential (EDIP) (Justo et al., 1997, 1998; Bazant et al., 1997) has been developed as an improvement over previous models. Another group of potentials (Kane, 1985; Keating, 1966) is constructed to accurately describe small distortions from the ground state in complex systems, such as the diamond structure of semiconductors. The Keating model uses a Taylor series expansion of the energy about its minimum. It gives accurate descriptions of small displacements but becomes progressively less accurate for large displacements. Such potentials are useful for describing phonons and elastic deformations. In general, any many-body potential energy function describing interactions among N identical particles can be resolved into one-body, two-body, three-body, etc. contributions as follows:

E = Σ_i V1(i) + Σ_{i<j} V2(i, j) + Σ_{i<j<k} V3(i, j, k) + … + V_N(1, …, N). (4.3)

The single-particle potential V1 describes external forces.
The first term that describes interactions between atoms is the second term, V2, which has the form of a pair potential. For this description to be useful in theoretical modeling, it is necessary that the component functions Vn converge quickly to zero with increasing n. The first step towards a many-body form is to include three-body terms that favor bond angles corresponding to those of the diamond structure. Stillinger and Weber (1985) proposed such a potential, whose three-body part has the form:

V3 = Σ g(r_ij) g(r_ik) [cos(θ_ijk) + 1/3]², (4.4)

where θ_ijk is the bond angle formed by the bonds ij and ik, and g(r) is a decaying function with a cutoff between the first and second neighbors. The bond angles in the diamond tetrahedral structure are ~109.5°, and the cosine of this angle is -1/3. This makes the term cos(θ_ijk) + 1/3 go to zero for θ ≈ 109.5°, which makes the tetrahedral structure more stable than compact structures. This potential gives a fairly realistic description of crystalline silicon. However, its built-in tetrahedral bias creates problems in other situations: for example, it cannot predict the right energies of the non-tetrahedral structures found under pressure, the coordination of the liquid is too low, and the surface structures are not correct. Consequently, this potential is not transferable to situations other than those for which it was designed. Work by Carlsson (1990) showed that to build realistic potentials, analytic forms in which the bond strengths depend on the local environment should be considered. The work of Biswas and Hamann (1987) suggests that a three-body potential is not adequate for accurately describing the cohesive energy of silicon over a wide range of bonding geometries and coordinations. However, a general form for a four-body or five-body potential becomes intractable, as it would contain too many parameters.
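The angular factor in Eqn. (4.4) can be probed directly; the short sketch below (illustrative, not taken from the Stillinger-Weber paper) shows that the penalty vanishes exactly at the tetrahedral angle and grows for distorted angles:

```python
import math

def sw_angular_factor(theta_deg):
    """Angular penalty (cos(theta) + 1/3)**2 from the three-body term
    of Eqn. (4.4); it vanishes at the tetrahedral angle ~109.47 deg."""
    return (math.cos(math.radians(theta_deg)) + 1.0 / 3.0) ** 2

tetrahedral = math.degrees(math.acos(-1.0 / 3.0))   # 109.47... degrees
print(tetrahedral, sw_angular_factor(tetrahedral))  # penalty is 0 here
print(sw_angular_factor(90.0))                      # distorted angle: 1/9
```

This is precisely the built-in tetrahedral bias discussed above: any configuration whose bond angles deviate from 109.47° pays an energy penalty, stabilizing the diamond structure but hurting transferability.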
Instead of a general N-body form, Tersoff (1988; 1989) proposed an approach that couples two-body and multi-body correlations into the model. The potential is based on the bond order, which takes the local environment into account. The bond strength depends on the geometry: the more neighbors an atom has, the weaker the bond to each neighbor. If the energy per bond decreased rapidly with increasing coordination, then diatomic molecules would be the most stable arrangements of atoms. On the other hand, if the bond order depended only weakly on coordination, close-packed structures would be favored, since they maximize the number of bonds. Silicon undergoes structural transitions with varying coordination under pressure with only small differences in cohesive energy (Yin and Cohen, 1980, 1982, 1984; Chang and Cohen, 1984). This suggests that the decrease in bond strength with increasing coordination number almost cancels the increase in the number of bonds over a range of coordinations (Tersoff, 1988). The Tersoff potential (Tersoff, 1989) is given as:

E = Σ_i E_i = (1/2) Σ_{i≠j} V_ij,
V_ij = f_C(r_ij) [f_R(r_ij) + b_ij f_A(r_ij)], (4.5a)

where E is the total potential energy of the system. f_R and f_A are repulsive and attractive pair potentials, respectively, and f_C is a cutoff function:

f_R(r_ij) = A_ij e^(-λ_ij r_ij),
f_A(r_ij) = -B_ij e^(-μ_ij r_ij),
f_C(r_ij) = 1,  r_ij < R_ij,
f_C(r_ij) = 1/2 + (1/2) cos[π(r_ij - R_ij)/(S_ij - R_ij)],  R_ij < r_ij < S_ij,
f_C(r_ij) = 0,  r_ij > S_ij, (4.5b)

where r_ij is the bond distance between atoms i and j, R and S define the cutoff region, and A_ij, B_ij, λ_ij, and μ_ij are potential parameters. The main feature of this form is the term b_ij. The idea is that the strength of each bond depends upon the local environment and is lowered when the number of neighbors is relatively high. This dependence is expressed by the parameter b_ij, which can diminish the attractive force relative to the repulsive force.
b_ij = χ_ij (1 + β_i^{n_i} ζ_ij^{n_i})^(-1/2n_i),
ζ_ij = Σ_{k≠i,j} f_C(r_ik) g(θ_ijk),
g(θ_ijk) = 1 + c_i²/d_i² - c_i²/[d_i² + (h_i - cos θ_ijk)²], (4.5c)

The term ζ_ij defines the effective coordination number of atom i, taking into account the relative distance between two neighbors (r_ij - r_ik) and the bond angle θ_ijk. The function g(θ) has a minimum for cos(θ) = h; the parameter d determines how sharp the dependence on the angle is, and c expresses the strength of the angular effect. The parameters R and S are not optimized but are chosen so as to include first neighbors only for several selected high-symmetry structures, such as graphite, diamond, simple cubic, and face-centered cubic. The parameter χ_ij strengthens or weakens heteropolar bonds in multicomponent systems; χ_ii = 1, and χ_ij = χ_ji. The potential was first calibrated for silicon (Tersoff, 1988a) and then for carbon (Tersoff, 1988b). The parameters were chosen to fit theoretical and experimental data, such as cohesive energies, lattice constants, and bulk moduli obtained from realistic and hypothetical configurations. Table 4.1 lists the parameters for carbon, silicon, and germanium.

Table 4.1 Parameters of the Tersoff potential (Tersoff, 1989)

Parameter   C                 Si                Ge
A (eV)      1.3936 x 10^3     1.8308 x 10^3     1.769 x 10^3
B (eV)      3.467 x 10^2      4.7118 x 10^2     4.1923 x 10^2
λ (Å⁻¹)     3.4879            2.4799            2.4451
μ (Å⁻¹)     2.2119            1.7322            1.7047
β           1.5724 x 10^-7    1.1 x 10^-6       9.0166 x 10^-7
n           7.2751 x 10^-1    7.8734 x 10^-1    7.5627 x 10^-1
c           3.8049 x 10^4     1.0039 x 10^5     1.0643 x 10^5
d           4.384             1.6217 x 10^1     1.5652 x 10^1
h           -5.7058 x 10^-1   -5.9825 x 10^-1   -4.3884 x 10^-1
R (Å)       1.8               2.7               2.8
S (Å)       2.1               3.0               3.1

Brenner (1990) developed an empirical many-body potential for hydrocarbons which can model intramolecular chemical bonding in a variety of small hydrocarbon molecules as well as the graphite and diamond lattices. The potential function was based on the Tersoff covalent bonding formulation, with additional terms to correct an inherent overbinding of radicals.
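The cutoff function f_C of Eqn. (4.5b) is simple enough to sketch directly; the following illustrative code (not from Tersoff's papers or any MD package) uses the silicon values R = 2.7 Å and S = 3.0 Å from Table 4.1:

```python
import math

def tersoff_cutoff(r, R, S):
    """Cutoff function f_C of Eqn. (4.5b): 1 inside R, a half-cosine
    ramp between R and S, and 0 beyond S."""
    if r < R:
        return 1.0
    if r < S:
        return 0.5 + 0.5 * math.cos(math.pi * (r - R) / (S - R))
    return 0.0

# Silicon values from Table 4.1: R = 2.7 A, S = 3.0 A.
for r in (2.0, 2.85, 3.5):
    print(r, tersoff_cutoff(r, 2.7, 3.0))
```

Unlike the abrupt truncation discussed under "Cutoff radius" above, this half-cosine ramp takes the potential smoothly to zero between R and S, which keeps the forces continuous.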
Nonlocal effects were incorporated via an analytic function that defines conjugation based on the coordination of the carbon atoms. The potential was fitted to the binding energy of the C2 diatomic molecule and to the binding energies and lattice constants of the graphite, diamond, simple cubic, and face-centered cubic structures. Dyson and Smith (1996) extended the Brenner potential to the C-Si-H system to model chemical vapor deposition (CVD) diamond growth on a silicon substrate. Parameters for the Si-Si and Si-C interactions were fitted to the diatomic bond energies and to bulk properties such as the bulk modulus, the experimental cohesive energy, the experimental lattice constant (instead of the bond length), and phonon mode frequencies.

CHAPTER 5

Ab Initio Calculations

'Ab initio' is a term used to describe a solution of the nonrelativistic, time-independent Schrödinger equation:

HΨ = EΨ,

where H is the Hamiltonian operator for the system, which is a function of the kinetic and potential energies of the particles of the system, Ψ is the wavefunction, and E is the energy. The Hamiltonian for an N-electron atom is

H = -(ħ²/2m_n)∇_n² - (ħ²/2m_e) Σ_{i=1}^{N} ∇_i² - Σ_{i=1}^{N} Ze²/r_i + Σ_{i=1}^{N} Σ_{j>i}^{N} e²/r_ij. (5.1)

The first term on the right-hand side of Eqn. (5.1) represents the kinetic energy of the nucleus. It contains the coordinates of the nucleus and derivatives with respect to those coordinates, but it does not contain the coordinates of any of the N electrons or derivatives with respect to them. Therefore, this is called the zero-electron term. The second and third terms in Eqn. (5.1) each contain the coordinates of one electron and hence are called one-electron terms, whereas the last term, the electron-electron repulsion, depends on the coordinates of two electrons and hence is called a two-electron term.

5.1 Born-Oppenheimer approximation

It is computationally impossible to solve Eqn. (5.1) exactly for anything other than the hydrogen atom.
For this reason, approximations are introduced to simplify the calculations for a many-body system. One such approximation is the Born-Oppenheimer approximation. Note that the nuclear mass appearing in the denominator of the nuclear kinetic energy term is much larger than the mass of the electron. This large mass means that the nuclei move very slowly relative to the electrons; the nuclei can then be considered to move in the potential generated by the electrons. This approximation allows the electronic Hamiltonian to be considered for a fixed set of nuclear coordinates.

5.2 Basis sets

The next approximation involves expressing the molecular orbitals as linear combinations of a predefined set of one-electron functions known as basis functions. A basis set is the mathematical description of the orbitals within a system, which in turn combine to approximate the total electronic wavefunction. These basis functions are usually centered on the atomic nuclei and so bear some resemblance to atomic orbitals. An individual molecular orbital is defined as:

φ_i = Σ_{μ=1}^{N} c_μi χ_μ. (5.2)

The coefficients c_μi are known as the molecular orbital expansion coefficients. The basis functions χ_1 … χ_N are also chosen to be normalized.

5.2.1 Slater-type orbitals (STO)

STOs are constructed from a radial part describing the radial extent of the orbital and an angular part describing its shape:

χ = C r^(n-1) e^(-ζr) Y_lm. (5.3)

The radial part, r^(n-1) e^(-ζr), depends on the distance r from the origin of the basis function (usually the location of the nucleus), the orbital exponent ζ, and the principal quantum number n. The angular part, Y_lm, known as a spherical harmonic, depends on the angular quantum number l and the magnetic quantum number m. The normalization constant C is chosen such that the integral over the square of the basis function yields unity.
5.2.2 Gaussian-type orbitals (GTO)

Gaussian-type orbitals are also constructed from a radial and an angular part, but the radial part has a different dependence on r: the GTO depends on r² in the exponent, so that the product of two Gaussian primitives is another Gaussian. This makes the integrals much easier to evaluate, at some cost in accuracy; to compensate, several Gaussians are combined to obtain more accurate results. Gaussian and other ab initio electronic structure programs use Gaussian-type atomic functions as basis functions. Gaussian functions have the general form:

g = c x^a y^b z^c e^(-αr²), (5.4)

where α is a constant determining the size (radial extent) of the function. In a Gaussian function, e^(-αr²) is multiplied by powers of x, y, and z, and by a normalization constant c, so that:

∫ g² dv = 1, (5.5)

where the integral runs over all space. The sum of the exponents, L = a + b + c, is used to define the angular momentum of the basis function: s type (L = 0), p type (L = 1), d type (L = 2), f type (L = 3), g type (L = 4), and so on. Representative Gaussian functions for the s, p_y, and d_xy orbitals, respectively, are:

g_s(α, r) = (2α/π)^(3/4) e^(-αr²),
g_y(α, r) = (128α⁵/π³)^(1/4) y e^(-αr²),
g_xy(α, r) = (2048α⁷/π³)^(1/4) xy e^(-αr²). (5.6)

Linear combinations of primitive Gaussians are used to form the actual basis functions, called contracted Gaussians, which have the form:

χ_μ = Σ_p d_μp g_p, (5.7)

where the g_p are primitive Gaussians and the d_μp are fixed constants within a given basis set. This results in the following expansion for the molecular orbitals:

φ_i = Σ_μ c_μi χ_μ = Σ_μ c_μi Σ_p d_μp g_p. (5.8)

5.2.3 Types of basis sets

Larger basis sets approximate the orbitals more accurately by imposing fewer restrictions on the locations of the electrons in space. In the true quantum mechanical picture, electrons have a finite probability of existing anywhere in space; this limit corresponds to the infinite basis set expansion. Standard basis sets for electronic structure calculations use linear combinations of Gaussian functions to form the orbitals.
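The key computational advantage claimed above, that the product of two Gaussian primitives is itself a Gaussian (the Gaussian product theorem), can be checked numerically in one dimension; this sketch is illustrative and uses arbitrary exponents and centers:

```python
import math

def gaussian(x, alpha, center):
    """Unnormalized 1-D Gaussian primitive exp(-alpha * (x - center)**2)."""
    return math.exp(-alpha * (x - center) ** 2)

# Gaussian product theorem: the product of primitives centered at A and B
# is a single Gaussian centered at P = (a*A + b*B)/(a + b), scaled by a
# constant prefactor K = exp(-a*b/(a+b) * (A - B)**2).
a, A = 0.5, 0.0
b, B = 1.2, 1.0
p = a + b
P = (a * A + b * B) / p
K = math.exp(-a * b / p * (A - B) ** 2)

for x in (-1.0, 0.3, 2.0):
    lhs = gaussian(x, a, A) * gaussian(x, b, B)
    rhs = K * gaussian(x, p, P)
    print(abs(lhs - rhs))   # agreement to machine precision
```

It is this collapse of two-center products into a single center that makes the two-electron integrals over contracted Gaussians tractable, whereas products of Slater-type orbitals have no such closed form.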
Basis sets may be classified by the number and type of basis functions that they contain. Basis sets assign a group of basis functions to each atom within a molecule to approximate its orbitals. These basis functions are composed of linear combinations of Gaussian functions referred to as contracted functions; the component Gaussian functions are referred to as primitives. A basis function consisting of a single Gaussian function is termed uncontracted.

Minimal basis set

Single Gaussian functions as described by Eqn. (5.4) are not well suited to describing the spatial extent and nodal characteristics of atomic orbitals. Hence, basis functions are described as sums (contractions) of several Gaussians, as in Eqn. (5.7). A minimal basis set contains the minimum number of basis functions needed for each atom and uses fixed-size atomic-type orbitals. The STO-3G basis set (Hehre et al., 1969; Collins et al., 1976) is a minimal basis set (although it is not the smallest possible basis set). It uses three Gaussian primitives per basis function, which accounts for the "3G" in its name. STO stands for Slater-type orbitals: the STO-3G basis set approximates Slater-type orbitals with Gaussian functions.

Double zeta, triple zeta, and quadruple zeta basis sets

The minimal basis set approximates all orbitals to be of the same shape. Double zeta basis sets, such as the Dunning-Huzinaga basis set denoted D95 (Dunning and Huzinaga, 1976), form all molecular orbitals from linear combinations of two sizes of functions for each atomic orbital. Dunning's basis set was derived from an existing large atomic basis set of nine uncontracted Gaussian primitives of the s type and five uncontracted Gaussian primitives of the p type. Six of the nine s-type functions were grouped into a single contraction, while the other three s-type functions were left alone.
Similarly, four of the five p-type functions were contracted into a single function, while one function was left uncontracted. This yields a basis set of four s-type and two p-type basis functions. In contrast to the split valence basis sets discussed below, the D95 basis set is a full double zeta basis set in that it allocates two basis functions for each atomic orbital of the core as well as of the valence region occupied in the electronic ground state. The double zeta basis set is important because it allows each orbital to be treated separately during Hartree-Fock calculations, which gives a more accurate representation of each orbital. Each atomic orbital is expressed as a sum of two Slater-type orbitals. The two functions are the same except for the value of zeta (ζ), which accounts for how diffuse (large) the orbital is. The two STOs are added in some proportion, as in Eqn. (5.9):

φ_2s(r) = φ_2s^STO(r, ζ1) + d φ_2s^STO(r, ζ2), (5.9)

where the constant d determines how much each STO contributes to the final orbital. Thus, the size of the atomic orbital can range anywhere between the values for either of the two STOs. The triple and quadruple zeta basis sets use three and four Slater functions, respectively.

Split valence basis set

The description of the valence electrons can be improved over that in the minimal STO-3G basis set if more than one basis function is used per valence electron. Basis sets of this type are called split valence basis sets. They have two sizes of basis function for each valence orbital. It often takes too much effort to calculate a double zeta function for every orbital. Since the inner-shell electrons are not as vital to the calculation, they are described with a single Slater orbital, and only the valence orbitals are treated using a double zeta.
Examples of split valence basis sets are 3-21G (Binkley et al., 1980; Gordon et al., 1982; Pietro et al., 1982; Dobbs and Hehre, 1986; 1987) and 6-31G (Ditchfield et al., 1971; Hehre et al., 1972; Hariharan and Pople, 1973; 1974; Gordon, 1980; Binning and Curtiss, 1990). The notation 3-21G means that three Gaussians are summed to describe each inner-shell orbital, and each valence orbital is described by two basis functions: a first STO composed of two Gaussians and a second STO composed of one Gaussian.

Fig 5.1 Description of the split valence basis set notation: the first digit is the number of Gaussian functions summed to describe the inner-shell orbitals, and the following digits are the numbers of Gaussian functions that comprise the first and second STOs of the double zeta valence description.

The main difference between 3-21G and 6-31G is that a much larger number of primitives is used in the latter for the core as well as the innermost valence shell. The use of a contraction of six Gaussian primitives for each core orbital improves the description of the core region significantly. The valence region is described by two basis functions per atomic orbital: the inner function is composed of a contraction of three Gaussians, and the outer function consists of a single Gaussian primitive. 6-31G*, also known as 6-31G(d), and 6-31G**, also known as 6-31G(d,p), are defined as part of the complete basis set methods (Petersson and Al-Laham, 1991; Petersson et al., 1988). Similarly, triple split valence basis sets, such as 6-311G, also known as MC-311G (McLean and Chandler, 1980; Krishnan et al., 1980; Wachters, 1970; Hay, 1977; Raghavachari and Trucks, 1989; Binning and Curtiss, 1990; Curtiss et al., 1995; McGrath and Radom, 1991), use three sizes of contracted functions for each orbital type.

Polarized basis set

A better approximation is to account for the fact that orbitals sometimes share qualities of 's' and 'p' orbitals, or 'p' and 'd', etc., and do not necessarily have the characteristics of only one or the other.
As atoms are brought close together, their charge distributions cause a polarization effect (the positive charge is drawn to one side while the negative charge is drawn to the other) which distorts the shapes of the atomic orbitals. Split valence basis sets allow orbitals to change size, but not to change shape. Polarized basis sets remove this limitation by adding orbitals with angular momentum beyond what is required for the ground state to the description of each atom. The first step consists of the addition of a set of d-type functions to the basis sets of those atoms which have occupied s and p shells in their electronic ground states. For hydrogen, this corresponds to the addition of a set of p-type functions. Two different notations exist to specify the addition of polarization functions. The first notation adds one asterisk to the basis set name to specify the addition of polarization functions to non-hydrogen atoms, while two asterisks symbolize the addition of polarization functions to all atoms (including hydrogen). The 6-31G** basis set (Frisch et al., 1984) is thus constructed from the split valence 6-31G basis set through the addition of one set of d functions to all non-hydrogen atoms and one set of p functions to all hydrogen atoms. In the second notation, the polarization functions are specified explicitly through their angular quantum number; the 6-31G** basis set would then be termed 6-31G(d,p). This latter notation is much more flexible, as multiple sets of polarization functions can be specified much more easily. The notation 6-31G(d) means that the polarization functions for the 6-31G basis appear in the basis set listing as a single set of uncontracted d-type Gaussians. There are six Cartesian d-type Gaussians, (x², y², z², xy, yz, zx) e^(-αr²), which are equivalent to five pure d-type functions, (xy, yz, zx, x²-y², 3z²-r²) e^(-αr²), plus one additional s-type function.
Diffuse functions

In chemistry, one is mainly concerned with the valence electrons, which interact with other molecules. However, many of the basis sets concentrate on the main energy located in the inner-shell electrons, which is the main area under the wavefunction curve. In Fig 5.2, this area is that to the left of the dotted line. Normally the tail (the area to the right of the dotted line) is not really a factor in calculations.

Fig 5.2 Variation of energy with respect to radius

However, when an atom is in an anion or in an excited state, the loosely bound electrons, which are responsible for the energy in the tail of the wavefunction, become much more important. The theoretical description of negatively charged species is particularly challenging for ab initio molecular orbital theory. This is due to the fact that the excess negative charge spreads outward to a much larger degree than is typically the case for uncharged or positively charged molecules. To compensate, diffuse functions are used. Diffuse functions are large-size versions of s- and p-type functions (as opposed to standard valence-size functions). They use small orbital exponents. They allow orbitals to occupy a larger region of space and are important for systems where electrons are relatively far from the nucleus, such as molecules with lone pairs, anions and other systems with significant negative charge, systems in excited states, and systems with low ionization potentials. Diffuse basis functions are typically added as an additional set of uncontracted Gaussian functions of the same angular momentum as the valence electrons. To reflect the addition of diffuse basis functions on all non-hydrogen atoms, a + sign is added to the standard basis set notation. If diffuse s-type functions are also added to the basis sets of hydrogen atoms, a second + sign is appended. The 6-31+G(d) basis set (Clark et al., 1983) is the 6-31G(d) basis set with diffuse functions added to heavy atoms.
The basis set 6-31++G(d) adds diffuse functions to hydrogen as well; diffuse functions on hydrogen seldom make a significant difference in accuracy.

High angular momentum basis sets

Such basis sets add multiple polarization functions per atom to the triple zeta basis set. For example, the 6-31G(2d) basis set adds two d functions per heavy atom instead of just one. Such basis sets are useful for describing the interactions between electrons in electron correlation methods; they are not generally needed for Hartree-Fock calculations, to be discussed in the next section.

Basis sets for post-third-row atoms

Basis sets for atoms beyond the third row of the periodic table are handled somewhat differently. For very large nuclei, the electrons near the nucleus are treated in an approximate way using effective core potentials (ECP). This treatment includes relativistic effects. The LANL2DZ basis set is an example of such a basis set.

5.3 Hartree-Fock method

Molecular orbital theory decomposes the wavefunction into a combination of molecular orbitals φ1, φ2, …, φ_i, chosen to be a normalized, orthogonal set:

∫∫∫ φ_i* φ_i dx dy dz = 1, (5.10)
∫∫∫ φ_i* φ_j dx dy dz = 0, i ≠ j. (5.11)

The simplest way of expressing the wavefunction of a many-electron system is to take the product of the spin orbitals of the individual electrons. For the case of two electrons,

Ψ(x1, x2) = χ1(x1) χ2(x2). (5.12)

This is known as a Hartree product. However, such a function is not antisymmetric, since swapping the coordinates of the two electrons does not result in a sign change, i.e.,

Ψ(x1, x2) ≠ -Ψ(x2, x1). (5.13)

Therefore the Hartree product does not satisfy the Pauli principle. This problem can be overcome by taking a linear combination of both Hartree products:

Ψ(x1, x2) = (1/√2) {χ1(x1) χ2(x2) - χ1(x2) χ2(x1)}, (5.14)

where the coefficient 1/√2 is the normalization factor. This wavefunction is antisymmetric.
Moreover, it also goes to zero if two electrons occupy the same spin orbital or share the same coordinates. The expression can be generalized by writing it as a determinant, called the Slater determinant:

$\Psi(x_1, x_2, \ldots, x_N) = \frac{1}{\sqrt{N!}} \begin{vmatrix} \phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_N(x_1) \\ \phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_N(x_2) \\ \vdots & \vdots & & \vdots \\ \phi_1(x_N) & \phi_2(x_N) & \cdots & \phi_N(x_N) \end{vmatrix}$ (5.15)

A single Slater determinant is used as an approximation to the electronic wave function in Hartree-Fock theory. In more accurate theories, such as configuration interaction and multiconfiguration self-consistent field (MCSCF) calculations (Hegarty and Robb, 1979; Eade and Robb, 1981; Schlegel and Robb, 1982; Bernardi et al., 1984; Frisch et al., 1992), a linear combination of Slater determinants is used. Each row represents all possible assignments of electron i to the orbital-spin combinations. Swapping two electrons corresponds to interchanging two rows of the determinant, which changes its sign.

The original Hartree method expresses the total wave function of the system as a product of one-electron orbitals. In the Hartree-Fock method (Hehre et al., 1986), the wave function is an antisymmetrized determinantal product of one-electron orbitals (the Slater determinant). Schrödinger's equation is transformed into a set of Hartree-Fock equations, and the problem becomes one of solving for the set of molecular orbital expansion coefficients $c_{\mu i}$ in Eqn. (5.8). Hartree-Fock theory takes advantage of the variational principle, which states that for any normalized antisymmetric function $\Phi$ of the electronic coordinates, the expectation value of the energy will always be greater than the energy of the exact ground-state wave function $\Psi$:

$E(\Phi) > E(\Psi), \quad \Phi \neq \Psi$. (5.16)

In other words, the energy of the exact wave function serves as a lower bound to the energies calculated from any other normalized antisymmetric function.
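The two defining properties of the Slater determinant in Eqn. (5.15), the sign change under electron exchange and the vanishing for coincident electrons, can be verified directly with a numerical determinant. The two one-dimensional "orbitals" below are hypothetical Gaussian-damped functions chosen only for illustration.

```python
import numpy as np
from math import factorial, sqrt

def slater_wavefunction(orbitals, coords):
    # Value of an N-electron Slater determinant: (1/sqrt(N!)) det[phi_j(x_i)],
    # with row i holding all orbitals evaluated at electron coordinate x_i.
    n = len(orbitals)
    M = np.array([[phi(x) for phi in orbitals] for x in coords])
    return np.linalg.det(M) / sqrt(factorial(n))

# Two hypothetical 1D orbitals, for illustration only.
phi1 = lambda x: np.exp(-x**2)
phi2 = lambda x: x * np.exp(-x**2)

psi_12 = slater_wavefunction([phi1, phi2], [0.5, 1.5])
psi_21 = slater_wavefunction([phi1, phi2], [1.5, 0.5])
print(np.isclose(psi_12, -psi_21))                      # exchange flips the sign
print(slater_wavefunction([phi1, phi2], [0.7, 0.7]))    # ~0 for coincident electrons
```

Swapping the two coordinates interchanges two rows of the matrix, so the determinant changes sign; identical coordinates give identical rows and a vanishing determinant, which is the Pauli principle in miniature.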
Thus the problem becomes one of finding the set of coefficients that minimizes the energy of the resultant wave function. The variables in the Hartree-Fock equations depend on themselves, so they must be solved in an iterative manner; convergence may be improved by changing the form of the initial guess. Since the equations are solved self-consistently, Hartree-Fock is an example of a self-consistent field (SCF) method. A Hartree-Fock calculation involves the following steps:

1. Begin with a set of approximate orbitals for all the electrons in the system.
2. Select one electron and calculate the potential in which it moves by freezing the distribution of all the other electrons and treating their averaged distribution as a centrosymmetric source of potential.
3. Solve the Schrödinger equation for this potential, which gives a new orbital for the selected electron.
4. Repeat the procedure for all the other electrons in the system, using the electrons in the frozen orbitals as the source of the potential.
5. At the end of one cycle, there are new orbitals differing from the original set.
6. Repeat the process until there is little or no change in the orbitals.

The unrestricted Hartree-Fock method is capable of treating the unpaired electrons of open-shell systems. In this case the $\alpha$ and $\beta$ electrons occupy different orbitals, resulting in two sets of molecular orbital expansion coefficients:

$\psi_i^{\alpha} = \sum_{\mu} c_{\mu i}^{\alpha} \phi_{\mu}, \qquad \psi_i^{\beta} = \sum_{\mu} c_{\mu i}^{\beta} \phi_{\mu}.$

The two sets of coefficients result in two Fock matrices and hence two sets of orbitals. These separate orbitals give proper dissociation to separate atoms and correct delocalized orbitals for resonant systems. However, the eigenfunctions are not pure spin states but contain some spin contamination from higher states; for example, doublets are contaminated to some degree by functions corresponding to quartets and higher states.

All this might give the impression that molecular orbitals are real. Except for the hydrogen atom, that is not the case.
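The iterate-until-unchanged structure of steps 1 through 6 can be sketched with a deliberately tiny model. The "Fock" matrix below is not a real electronic-structure operator; its core matrix and coupling strength are made-up numbers chosen only so that the loop converges and the SCF bookkeeping is visible.

```python
import numpy as np

# Toy self-consistent-field loop: 2 orbitals, 2 electrons, closed shell.
# H_core and g are invented numbers for illustration only.
H_core = np.array([[-2.0, -0.5],
                   [-0.5, -1.0]])
g = 0.3

def fock(P):
    # Effective one-electron operator: core part plus an averaged
    # "repulsion" that depends on the current density matrix P.
    return H_core + g * np.trace(P) * np.eye(2) + 0.1 * g * P

P = np.zeros((2, 2))                       # step 1: initial guess
for iteration in range(100):
    F = fock(P)                            # step 2: freeze the other electrons
    eps, C = np.linalg.eigh(F)             # step 3: solve for new orbitals
    occ = C[:, :1]                         # doubly occupy the lowest orbital
    P_new = 2.0 * occ @ occ.T              # steps 4-5: rebuild the density
    if np.linalg.norm(P_new - P) < 1e-10:  # step 6: little or no change?
        break
    P = P_new

print(iteration, eps[0])
```

Because the Fock matrix depends on the density it produces, the equations "depend on themselves" exactly as described above, and only the fixed point of the loop is a self-consistent solution.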
Molecular orbitals are the product of Hartree-Fock theory, which is an approximation to the Schrödinger equation. The approximation is that each electron feels only the average Coulomb repulsion of all the other electrons. This makes Hartree-Fock theory much simpler than the real problem, which is an N-body problem. Unfortunately, in many cases this approximation is rather serious and can give bad answers. It can be corrected by explicitly accounting for the electron correlation through many-body perturbation theory (MBPT), configuration interaction (CI), or density functional theory (DFT).

5.4 Electron correlation methods

Hartree-Fock theory provides an inadequate treatment of the correlation between the motions of electrons within a molecular system, especially that arising between electrons of opposite spin. When Hartree-Fock theory fulfills the requirement that $|\Psi|^2$ be invariant with respect to the exchange of any two electrons by antisymmetrizing the wave function, it automatically includes the major correlation effects arising from pairs of electrons with the same spin. This correlation is termed exchange correlation. The motion of electrons of opposite spin remains uncorrelated under Hartree-Fock theory. Any method which goes beyond SCF in attempting to treat this phenomenon is known as an electron correlation method, or post-SCF method.

5.4.1 Requirements of electron correlation theories

Any electron correlation theory should provide a unique total energy for each electronic state at a given geometry and should also provide continuous potential energy surfaces as the geometry changes. The resulting energy should be variational, i.e. it should be an upper bound to the exact energy. The most important criterion for an accurate electron correlation theory is the property of size consistency. This term refers to the linear scaling of energy with the number of electrons in the system.
A size-consistent method leads to additive energies for infinitely separated systems; size consistency is necessary to reach quantitative accuracy. Another important criterion is correctness for two-electron systems. For any molecular system composed of reasonably well-defined electron-pair bonds, such methods provide excellent starting points for the inclusion of additional corrections such as those from three-electron correlations. Such three-electron correlations, although computationally expensive, are crucial to reach quantitative accuracy (Raghavachari and Anderson, 1996). In general, correlation techniques are computationally more expensive than HF methods, and many of them involve an iterative solution of a set of coupled equations.

5.4.2 Configuration interaction (CI)

Among the many schemes introduced to overcome the deficiencies of the Hartree-Fock method, the simplest and most general technique for addressing the correlation problem is configuration interaction (Shavitt, 1977; Schaefer, 1977; Boys, 1950). Configuration interaction is a straightforward application of the linear variational technique to the calculation of electronic wavefunctions. A linear combination of configurations (Slater determinants) is used to provide a better variational solution for the exact many-electron wavefunction.

The most important type of correlation effect contributing to chemical bonding is usually termed left-right correlation. For H2, this refers to the tendency that when one electron is near the first hydrogen, the other electron tends to be near the second hydrogen. This is absent in the HF method, where the spatial positions of the two electrons occupying the lowest bonding molecular orbital are uncorrelated. The problem gets worse as the two atoms move apart and dissociate. Qualitatively, this can be corrected by including a second configuration in which both electrons occupy the antibonding orbital.
While this second configuration is unfavorable energetically, a mixture of the HF configuration with it provides a better description of the system. This is referred to as "configuration interaction." The second configuration has only a small weight at the equilibrium distance in H2, but its weight increases with the bond distance until the two configurations have equal weights at dissociation. Such left-right correlation is included in valence-bond-type wave functions.

The exact wave function cannot be expressed as a single determinant. The CI method constructs other determinants by replacing one or more occupied orbitals within the Hartree-Fock determinant with virtual orbitals. In a single substitution, a virtual orbital a replaces an occupied orbital i within the determinant, which is equivalent to exciting an electron to a higher-energy orbital. Similarly, a double substitution replaces two occupied orbitals by virtual orbitals, a triple substitution exchanges three orbitals, and so on. The CI wave function mixes the Hartree-Fock wave function with singly, doubly, triply, quadruply, ... excited configurations, and the coefficients that determine the amount of mixing are determined variationally. The full CI method forms the wave function as a linear combination of the Hartree-Fock determinant and all possible substituted determinants:

$\Psi = b_0 \Psi_0 + \sum_{s>0} b_s \Psi_s$, (5.17)

where $\Psi_0$ is the Hartree-Fock determinant and s runs over all possible substitutions. The b's are the coefficients to be determined by minimizing the energy of the resultant wave function. If all possible excited configurations are included, the method gives the exact solution within the space spanned by the given basis set. Full CI is the most complete nonrelativistic treatment of the molecular system possible, subject to the limitation imposed by the chosen basis set.

5.4.3 Limited configuration interaction

The full CI method is size consistent and variational.
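The size of the full CI expansion can be counted directly: the number of determinants is the number of ways to place the alpha electrons in the available spin orbitals times the number of ways to place the beta electrons. The family of systems below (n electrons in 2n spatial orbitals) is an arbitrary illustration, but it makes the combinatorial explosion concrete.

```python
from math import comb

def n_fullci_determinants(n_alpha, n_beta, n_orbitals):
    # Choose which spatial orbitals hold the alpha electrons, independently
    # of which hold the beta electrons.
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

for n in [2, 4, 8, 12]:
    # n electrons (half alpha, half beta) in 2n spatial orbitals.
    print(n, n_fullci_determinants(n // 2, n // 2, 2 * n))
```

Even for these modest counts the expansion length grows from tens to billions of determinants, which is why full CI is tractable only for very small systems.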
However, it is computationally very expensive and impractical for large systems, since the number of configurations in the full CI expansion grows exponentially with the size of the system. Practical CI methods augment the Hartree-Fock determinant by adding only a limited set of substitutions. The CIS method adds single excitations to the Hartree-Fock determinant, while CISD adds singles and doubles. The CISD method (Pople et al., 1977; Krishnan et al., 1980; Raghavachari and Pople, 1981) is an iterative technique in which the computational cost of each iteration scales as the sixth power of the size of the system.

A disadvantage of the limited CI methods is that they are not size consistent: the energy does not scale linearly with the size of the system, and the CISD energy is not additive for infinitely separated systems. For example, the CISD energy of two infinitely separated He atoms differs from twice the energy of a single He atom. The quadratic configuration interaction (QCI) technique (Pople et al., 1987) introduces size consistency into CISD theory. The CISD method consists of a set of linear equations in the configuration expansion coefficients (for single and double excitations) which are solved iteratively. In the QCISD method (Gauss and Cremer, 1988; Trucks and Frisch, 1998; Salter et al., 1989), these equations are modified by the introduction of additional terms, quadratic in the expansion coefficients, which make the method size consistent.

5.4.4 Møller-Plesset perturbation theory (MP)

MP theory (Møller and Plesset, 1934) treats the electron correlation as a perturbation on the Hartree-Fock problem. It adds higher excitations to Hartree-Fock theory as a noniterative correction. The wave function and energy are expanded in a power series of the perturbation. The Hamiltonian is expressed in two parts:

$H = H_0 + H'$, (5.18)

where $H_0$ is called the zeroth-order Hamiltonian and $H'$ is the perturbation.
$H_0$ is chosen such that the Schrödinger equation $H_0 \Psi_0 = E_0 \Psi_0$ can be solved exactly for the zeroth-order eigenfunctions $\Psi_0$ and the corresponding zeroth-order energy eigenvalues $E_0$. It is assumed that the perturbation is sufficiently small that its presence does not appreciably alter the eigenfunctions of the system. The Hartree-Fock energy is correct through first order, so the perturbation corrections start contributing at second order. In an MP calculation, a HF calculation is performed first, and then the second-, third-, fourth-, and fifth-order perturbation corrections are computed. The final ground-state energy is given by

$E_{\text{ground state}} = E_{\text{HF}} + E^{(2)} + E^{(3)} + E^{(4)} + E^{(5)} + \ldots$, (5.19)

where $E_{\text{HF}}$ is the Hartree-Fock energy and the $E^{(i)}$ are the successive Møller-Plesset perturbation corrections.

In perturbation theory, the different correlation contributions emerge through their interaction with the starting HF wave function $\Psi_0$. Since the Hamiltonian contains only one- and two-electron terms, only single and double excitations can contribute via direct coupling to $\Psi_0$ in the lowest orders. However, the self-consistent optimization of the HF wave function prevents direct mixing between single excitations and $\Psi_0$. Thus, the second- and third-order energies have contributions only from double excitations. In higher orders, there is indirect coupling via double excitations, and thus the fourth- and fifth-order energies have contributions from single, double, triple, and quadruple excitations.

A common notation, MPn (Raghavachari and Pople, 1978; Raghavachari et al., 1980; Frisch et al., 1980), is used: MP2 (Head-Gordon et al., 1988; Frisch et al., 1990; Head-Gordon, 1994; Trucks et al., 1998; Saebo and Almlof, 1989), MP3 (Pople et al., 1977; 1976), and MP4 (Raghavachari and Pople, 1978) denote the total energies correct to second, third, and fourth order, respectively. Perturbation theory truncated at any order is size consistent.
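The structure of such a perturbation expansion can be shown on a toy problem. The block below is generic Rayleigh-Schrödinger perturbation theory on a hypothetical 2-state Hamiltonian, not actual Møller-Plesset theory (MP theory uses a specific partition of the electronic Hamiltonian), but the logic of Eqn. (5.19), an exact zeroth-order energy plus order-by-order corrections, is the same. All the matrix entries are invented, small numbers.

```python
import numpy as np

# H = H0 + V with H0 diagonal (exactly solvable) and V a small perturbation.
H0 = np.diag([-1.0, 1.0])
V = np.array([[0.05, 0.10],
              [0.10, -0.05]])

E0 = H0[0, 0]                                  # zeroth-order ground-state energy
E1 = V[0, 0]                                   # first-order correction <0|V|0>
E2 = V[0, 1]**2 / (H0[0, 0] - H0[1, 1])        # second-order correction
E_pt2 = E0 + E1 + E2

E_exact = np.linalg.eigvalsh(H0 + V)[0]        # exact lowest eigenvalue
print(E_pt2, E_exact)
```

Already at second order the perturbative energy sits very close to the exact eigenvalue, which is the behaviour one hopes for when the perturbation is genuinely small.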
The MP2, MP3, MP4, and MP5 methods scale as the fifth, sixth, seventh, and eighth power of the size of the system, respectively. If triples are excluded from the MP4 method (Raghavachari and Pople, 1978; Raghavachari et al., 1980; Frisch et al., 1980; Bartlett and Shavitt, 1977; Bartlett and Purvis, 1978; Bartlett et al., 1983; Wilson and Saunders, 1979), the resulting MP4(SDQ) technique (Raghavachari and Pople, 1978) can be evaluated with sixth-order computational dependence. Fifth-order theory (MP5) (Kucharski and Bartlett, 1986; Kucharski et al., 1989; Bartlett et al., 1990; Raghavachari et al., 1990) has been implemented, though it is feasible only for small systems. Analytical derivatives of the potential energy can be computed for the MP2 (Frisch et al., 1990; Pople et al., 1979; Handy and Schaefer, 1984), MP3, and MP4(SDQ) methods (Trucks et al., 1988). Analytical frequencies can be computed for the MP2 method (Head-Gordon, 1994; Trucks et al., 1998). MP2 is the most widely applicable method for treating electron correlation effects in large systems. It can be implemented without requiring the storage of two-electron integrals and many other intermediate quantities, which makes it possible to study large systems such as C60.

5.5 Density functional theory (DFT)

Shortly after the formulation of quantum mechanics in the mid-1920s, Thomas (1926) and Fermi (1928) introduced the idea of expressing the total energy of a system as a functional of the total electron density. Because of the crude treatment of the kinetic energy term, i.e. the absence of molecular orbitals, the accuracy of these early attempts was far from satisfactory. It was not until the 1960s, motivated by the search for practical electronic structure calculations, that an exact theoretical framework called density functional theory (DFT) was formulated by Hohenberg and Kohn (1964) and Kohn and Sham (1965), providing the foundation for accurate calculations.
In contrast to the Hartree-Fock picture, which deals with individual electrons interacting with the nuclei and all other electrons in the system, density functional theory decomposes the total energy into three contributions: a kinetic energy, a Coulomb energy due to classical electrostatic interactions among all charged particles in the system, and a term called the exchange-correlation energy that captures all many-body interactions. In density functional theory (Hohenberg and Kohn, 1964; Kohn and Sham, 1965), the correlation energy, along with the exchange energy, is treated as a functional of the three-dimensional electron density. DFT partitions the electronic energy as

$E = E^T + E^V + E^J + E^{XC}$, (5.20)

where $E^T$ is the kinetic energy term, $E^V$ includes terms describing the potential energy of the nuclear-electron attraction and of the repulsion between pairs of nuclei, $E^J$ is the electron-electron repulsion term, and $E^{XC}$ is the exchange-correlation term. All terms except the nuclear-nuclear repulsion in Eqn. (5.20) are functionals of the electron density.

The term $E^{XC}$ accounts for the exchange energy arising from the antisymmetry of the quantum mechanical wavefunction, and for the dynamic correlation in the motions of the individual electrons. Hohenberg and Kohn (1964) demonstrated that $E^{XC}$ is determined entirely by the electron density, i.e. it is a functional of the electron density. $E^{XC}$ is usually approximated as an integral involving only the spin densities and possibly their gradients:

$E^{XC}(\rho) = \int f\big(\rho_\alpha(\vec r), \rho_\beta(\vec r), \nabla\rho_\alpha(\vec r), \nabla\rho_\beta(\vec r)\big)\, d^3\vec r$, (5.21)

where $\rho_\alpha$ and $\rho_\beta$ refer to the $\alpha$ and $\beta$ spin densities, and the total electron density is $\rho = \rho_\alpha + \rho_\beta$. $E^{XC}$ is separated into exchange and correlation parts corresponding to same-spin and mixed-spin interactions:

$E^{XC}(\rho) = E^X(\rho) + E^C(\rho)$, (5.22)

where $E^X(\rho)$ and $E^C(\rho)$ are again functionals of the electron density, termed the exchange functional and the correlation functional.
Local functionals depend only on the electron density $\rho$, while gradient-corrected functionals depend on both $\rho$ and its gradient $\nabla\rho$. In the local density approximation (LDA), the exchange-correlation energy is taken from the known results for the many-electron interactions in an electron system of constant density (the homogeneous electron gas). The LDA assumes that at each point in a molecule or solid there exists a well-defined electron density, and that an electron at such a point experiences the same many-body response from the surrounding electrons as if the density of those surrounding electrons had the same value throughout all space as at the point of the reference electron. The exchange-correlation energy of the whole molecule or solid is then the integral over the contributions from each volume element, with each contribution depending on the local electron density. The local exchange functional is defined as

$E^{X}_{\text{LDA}} = -\frac{3}{2}\left(\frac{3}{4\pi}\right)^{1/3} \int \rho^{4/3}\, d^3\vec r$. (5.23)

In DFT, the total electron density is decomposed into one-electron densities, which are constructed from one-electron wave functions. These one-electron wave functions are similar to those of Hartree-Fock theory, and for molecular systems DFT leads to a molecular orbital (MO) picture in analogy to the Hartree-Fock approach. DFT has been successfully extended to open-shell systems and magnetic solids (von Barth and Hedin, 1972; Gunnarsson et al., 1972). In these cases, the local exchange-correlation energy depends not only on the local electron density but also on the local spin density (the difference between the density of spin-up electrons and that of spin-down electrons). The resulting generalization of the LDA is called the local spin density approximation (LSDA). Within the local density approximation, binding energies are typically overestimated.

Any real system is spatially inhomogeneous, i.e. it has a spatially varying density.
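The idea that a functional turns a density into a single number can be illustrated by evaluating the local exchange expression $E^{X} = -\tfrac{3}{2}\big(\tfrac{3}{4\pi}\big)^{1/3} \int \rho^{4/3}\, d^3\vec r$ (the convention used in Eqn. (5.23)) on a hydrogen-like 1s density, $\rho(r) = e^{-2r}/\pi$ in atomic units. This is only a sketch of the mechanics of a density functional; the number it produces is not the exact exchange energy of hydrogen.

```python
import numpy as np

coeff = -1.5 * (3.0 / (4.0 * np.pi))**(1.0 / 3.0)

r = np.linspace(1e-6, 20.0, 200_001)      # radial grid, atomic units
rho = np.exp(-2.0 * r) / np.pi            # hydrogen-like 1s density
integrand = 4.0 * np.pi * r**2 * rho**(4.0 / 3.0)   # spherical volume element

# Trapezoidal rule over the radial grid turns the density into one number.
E_x = coeff * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
print(E_x)
```

The result is a negative exchange energy of a few tenths of a hartree, obtained purely from the density with no orbitals in sight, which is exactly the sense in which $E^X$ is a functional of $\rho$.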
The accuracy can be improved if the rate of this variation is included in the functional. The gradient expansion approximation (GEA) attempts to calculate the gradient corrections to the LDA systematically. In generalized gradient approximations (GGA), generalized functions are used instead of power-series gradient expansions. Inclusion of nonlocal gradient corrections improves the calculated binding energies. Becke (1988) formulated a gradient-corrected exchange functional based on the LDA exchange functional:

$E^{X}_{\text{Becke88}} = E^{X}_{\text{LDA}} - \gamma \int \frac{\rho^{4/3}\, x^2}{1 + 6\gamma\, x \sinh^{-1}(x)}\, d^3\vec r$, (5.24)

where $x = \rho^{-4/3}\,|\nabla\rho|$ and $\gamma$ is a parameter chosen to fit the known exchange energies of the inert-gas atoms. The Becke functional is thus defined as a correction to the local LDA exchange functional. Self-consistent Kohn-Sham DFT calculations are performed in an iterative manner analogous to the SCF computation.

Becke also formulated hybrid functionals, which include Hartree-Fock and DFT exchange along with DFT correlation:

$E^{XC}_{\text{hybrid}} = c_{\text{HF}}\, E^{X}_{\text{HF}} + c_{\text{DFT}}\, E^{XC}_{\text{DFT}}$. (5.25)

For example, the Becke-style three-parameter functional denoted B3LYP (Becke, 1993) is defined as

$E^{XC}_{\text{B3LYP}} = E^{X}_{\text{LDA}} + c_0\big(E^{X}_{\text{HF}} - E^{X}_{\text{LDA}}\big) + c_X\, \Delta E^{X}_{\text{Becke}} + E^{C}_{\text{VWN3}} + c_C\big(E^{C}_{\text{LYP}} - E^{C}_{\text{VWN3}}\big)$. (5.26)

The parameter $c_0$ allows any admixture of Hartree-Fock and LDA local exchange to be used, and the local correlation functional VWN3 is corrected by the LYP correlation correction. Table 5.1 lists various model chemistries definable via ab initio methods and standard basis sets; each cell in the chart defines a model chemistry.

Table 5.1 Effect of various model chemistries and basis sets on the accuracy of the solution to Schrödinger's equation (Foresman and Frisch).

The columns correspond to different theoretical methods and the rows to different basis sets.
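The B3LYP mixing of Eqn. (5.26) is simple arithmetic once the component energies are known. In the sketch below, $c_0 = 0.20$, $c_X = 0.72$, and $c_C = 0.81$ are Becke's published three fitted parameters, but the component energies themselves are placeholder numbers, not the output of any real calculation; only the mixing step is being illustrated.

```python
# Becke's three fitted parameters for the B3-style hybrid (Becke, 1993).
c0, cX, cC = 0.20, 0.72, 0.81

# Hypothetical component energies in hartree, for illustration only.
E_X_LDA, E_X_HF, dE_X_B88 = -7.10, -7.50, -0.30
E_C_VWN3, E_C_LYP = -0.40, -0.45

# Assemble the exchange-correlation energy per Eqn. (5.26).
E_XC = (E_X_LDA
        + c0 * (E_X_HF - E_X_LDA)     # admix exact (HF) exchange
        + cX * dE_X_B88               # Becke gradient correction to exchange
        + E_C_VWN3
        + cC * (E_C_LYP - E_C_VWN3))  # LYP correction to local correlation
print(E_XC)
```

In a real calculation each component would itself be a density-dependent integral evaluated self-consistently; the hybrid functional simply prescribes how much of each ingredient enters the total.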
The level of correlation increases from left to right across any row, with the Hartree-Fock method at the extreme left (including no correlation) and the full configuration interaction method at the right (which fully accounts for electron correlation). The rows of the chart correspond to increasingly larger basis sets. The bottom row of the chart represents a completely flexible basis set, and the cell in the lower right corner of the chart represents the exact solution of the Schrödinger equation.

CHAPTER 6

Neural Networks

As the term neural network (NN) implies, this approach is aimed at modeling real networks of neurons in the brain. It was inspired by a number of features of the brain that would be desirable in artificial systems, such as its robustness and fault tolerance, its flexibility, its ability to deal with fuzzy and noisy information, and its highly parallel structure.

6.1 Biological neurons

The brain is composed of about 10^11 highly connected elements called neurons, each of which is connected to about 10^4 other neurons. Fig 6.1 shows the schematic of a biological neuron.

Fig 6.1 Schematic of a biological neuron

A neuron's dendritic tree is connected to thousands of neighboring neurons. Each neuron receives electrochemical inputs from other neurons at the dendrites. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation: spatial summation combines several weak signals into a single large one, while temporal summation converts a rapid series of weak pulses from one source into one large signal. The aggregate input is then passed to the soma (cell body). The soma and the enclosed nucleus do not play a significant role in the processing of incoming and outgoing data; their primary function is to perform the continuous maintenance required to keep the neuron functional.
If the sum of these electrical inputs is sufficiently powerful to activate the neuron, it transmits an electrochemical signal along the axon and passes this signal to the other neurons whose dendrites are attached at any of the axon terminals. These attached neurons may then fire. The part of the soma that does concern itself with the signal is the axon hillock: if the aggregate input is greater than the axon hillock's threshold value, the neuron fires and an output signal is transmitted down the axon. The strength of the output is constant, regardless of whether the input was just above the threshold or a hundred times as great. The output strength is also unaffected by the many divisions in the axon; it reaches each terminal button with the same intensity it had at the axon hillock. This uniformity is critical in an analogue device such as the brain, where small errors can be critical and where error correction is more difficult than in a digital system. It is important to note that a neuron fires only if the total signal received at the cell body exceeds a certain level; the neuron either fires or it does not, and there are no different grades of firing.

Each terminal button is connected to other neurons across a small gap called a synapse, as shown in Fig 6.2. The physical and neurochemical characteristics of each synapse determine the strength and polarity of the new input signal. This is where the brain is most flexible.

Fig 6.2 Synaptic gap

Changing the constitution of the various neurotransmitter chemicals can increase or decrease the amount of stimulation that the firing axon imparts on the neighboring dendrite. Altering the neurotransmitters can also change whether the stimulation is excitatory or inhibitory. Many drugs, such as alcohol, have dramatic effects on the production or destruction of these critical chemicals.
The infamous nerve gas sarin can kill because it neutralizes acetylcholinesterase, the chemical normally responsible for the destruction of the neurotransmitter acetylcholine. Once a neuron fires, it then keeps on triggering all the neurons in its vicinity, and one no longer has control over one's muscles.

So, from a very large number of extremely simple processing units (each performing a weighted sum of its inputs and then firing a binary signal if the total input exceeds a certain level), the brain manages to perform extremely complex tasks. This is the model on which artificial neural networks (NN) are based. Thus far, artificial neural networks have not come close to modeling the complexity of the brain. It is worth noting that even though biological neurons are slow compared to electrical circuits, the brain is able to perform many tasks much faster than any conventional computer because of the massively parallel structure of biological neural networks.

While in actual neurons the dendrite receives electrical signals from the axons of other neurons, in an artificial neuron (perceptron) these electrical signals are represented as numerical values. At the synapses between dendrites and axons, electrical signals are modulated by various amounts; this is modeled in an artificial neuron by multiplying each input value by a value called the weight. An actual neuron fires an output signal only when the total strength of the input signals exceeds a certain threshold. We model this phenomenon in an ANN by calculating the weighted sum of the inputs to represent the total strength of the input signals, and applying a step function to the sum to determine its output. Fig 6.3 shows a simple model of a neuron.
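The weighted-sum-plus-step model just described fits in a few lines. The weights and threshold below are arbitrary illustrative values; the point is only the all-or-nothing firing behavior.

```python
import numpy as np

def neuron(inputs, weights, threshold):
    # Weighted sum of the inputs (the aggregate signal at the "soma"),
    # then a hard step: the neuron fires or it does not, with no grades.
    total = np.dot(inputs, weights)
    return 1 if total > threshold else 0

weights = np.array([0.5, -0.4, 0.9])   # hypothetical synaptic weights
threshold = 0.6                        # hypothetical firing threshold

print(neuron(np.array([1.0, 0.0, 1.0]), weights, threshold))  # 1.4 > 0.6: fires
print(neuron(np.array([1.0, 1.0, 0.0]), weights, threshold))  # 0.1 < 0.6: silent
```

Note that the output is the same 1 whether the weighted sum barely clears the threshold or exceeds it a hundredfold, mirroring the constant-strength output of the biological neuron described above.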
Fig 6.3 Model of a neuron

6.2 History of multilayer neural networks

The first step toward artificial neural networks came in 1943, when Warren McCulloch and Walter Pitts (1943) wrote a paper, "A logical calculus of the ideas immanent in nervous activity," describing how neurons might work; it united the studies of neurophysiology and mathematical logic. They modeled a simple neural network with electrical circuits and showed the network in principle, co
Title  Ab Initio Molecular Dynamics (AIMD): A New Approach for Development of Accurate Potentials 
Date  2004-12-01 
Author  Malshe, Milind Madhukar 
Department  Mechanical Engineering 
Full Text Type  Open Access 
Note  Thesis 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
Transcript  AB INITIO MOLECULAR DYNAMICS (AIMD) A NEW APPROACH FOR DEVELOPMENT OF ACCURATE POTENTIALS By Milind M Malshe Bachelor of Engineering University of Mumbai Mumbai, India 1998 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER of SCIENCE December, 2004 ii AB INITIO MOLECULAR DYNAMICS (AIMD) A NEW APPROACH FOR DEVELOPMENT OF ACCURATE POTENTIALS Thesis Approved: ________________________________________________ Thesis Advisor ____________________________________ ____________________________________ ____________________________________ ____________________________________ ____________________________________ Dean of the Graduate College Dr. R. Komanduri Dr. S. Roy Dr. H. Lu Dr. M. Hagan Dr. L. M. Raff Dr. A. Gordon Emslie iii 1 2 3 SUMMARY In this study a new approach is presented for the development of accurate potentialenergy hypersurfaces based on ab initio calculations that can be utilized to conduct molecular dynamics and Monte Carlo simulations to study chemical and mechanical properties at the atomistic level. The method integrates ab initio electronic structure calculations with the interpolation capability of multilayer neural networks. A sampling technique based on novelty detection is also developed to ensure that the neural network fitting for the potential energy spans the entire configuration space involved during the simulation. The procedure can be initiated using an empirical potential or direct dynamics simulation. The procedure is applied for developing the potential energy hypersurface for 5atom clusters within a silicon workpiece. Ab initio calculations were performed using Gaussian 98 electronic structure program. Results for 5atom silicon clusters representing the bulk and the surface structure are presented. 
ACKNOWLEDGEMENTS

I would like to express my gratitude to my parents and brother for their support, encouragement, love, and confidence in me. I would like to thank my advisor, Dr. R. Komanduri, for his inspiration, technical guidance, and financial support. I wish to express my sincere appreciation to Dr. L. M. Raff for introducing me to the concepts of molecular dynamics and Monte Carlo simulations and for providing guidance and encouragement throughout the study. I would like to thank Dr. M. Hagan for introducing me to the field of neural networks and for his invaluable guidance and encouragement. I would like to thank Dr. P. M. Agrawal for his guidance in understanding various aspects of simulation techniques. This project is the result of a coordinated effort linking simulation methods and neural networks. I would also like to thank Dr. S. Roy and Dr. H. Lu for their suggestions in this study. I am grateful to Dr. R. Komanduri and my committee members for helping me to explore and develop new methods for improving simulation techniques.

This project is funded by grant DMI-0200327 from the National Science Foundation. I thank Dr. W. DeVries, Dr. G. Hazelrigg, Dr. J. Chen, and Dr. D. Durham of the Division of Design, Manufacturing, and Industrial Innovation; Dr. B. M. Kramer, Engineering Centers Division; and Dr. J. Larsen-Basse, Tribology and Surface Engineering program, for their interest in and support of this work. This project was also sponsored by a DEPSCoR grant on the Multiscale Modeling and Simulation of Material Processing (F49620-03-1-0281). I thank Dr. Craig S. Hartley of the AFOSR for his interest in and support of this work.

TABLE OF CONTENTS

1 Introduction ..... 1
2 Literature Review ..... 4
2.1 Literature review of design of interatomic potentials ..... 4
2.2 Literature review of quantum mechanical calculations of silicon clusters ..... 13
3 Molecular Dynamics Simulation ..... 17
3.1 Brief history of molecular dynamics simulation ..... 17
3.2 Applicability of MD simulations ..... 18
3.3 Formulation of the differential equations of motion ..... 20
3.4 Time integration algorithms ..... 22
3.4.1 Verlet algorithm ..... 22
3.4.2 Leapfrog Verlet algorithm ..... 23
3.4.3 Velocity Verlet algorithm ..... 24
3.4.4 Beeman algorithm ..... 24
3.5 Methods to improve computational speed ..... 25
3.5.1 Neighbor list and bookkeeping techniques ..... 25
3.5.2 Cell structures and link lists ..... 25
3.6 Periodic boundary conditions (PBC) ..... 26
4 Interatomic Potentials ..... 30
4.1 Design of interatomic potentials ..... 31
4.2 Classification of potential energy functions ..... 32
4.2.1 Pair potentials ..... 34
4.2.2 Many-body potentials ..... 35
5 Ab Initio Calculations ..... 41
5.1 Born-Oppenheimer approximation ..... 42
5.2 Basis sets ..... 42
5.2.1 Slater-type orbitals (STO) ..... 43
5.2.2 Gaussian-type orbitals (GTO) ..... 43
5.2.3 Types of basis sets ..... 44
5.3 Hartree-Fock method ..... 51
5.4 Electron correlation methods ..... 55
5.4.1 Requirements of electron correlation theories ..... 55
5.4.2 Configuration interaction (CI) ..... 56
5.4.3 Limited configuration interaction ..... 58
5.4.4 Møller-Plesset perturbation theory (MP) ..... 58
5.5 Density functional theory (DFT) ..... 60
6 Neural Networks ..... 66
6.1 Biological neurons ..... 66
6.2 History of multilayer neural networks ..... 70
6.3 Neural networks versus conventional computers ..... 71
6.4 Model of a neuron ..... 72
6.5 Multilayer network ..... 73
6.6 Transfer functions ..... 74
6.7 Neural network training ..... 75
6.7.1 Supervised training and parameter optimization ..... 76
6.8 LMS (Least Mean Squared Error) algorithm ..... 77
6.9 Backpropagation algorithm ..... 79
6.10 Numerical optimization techniques ..... 82
6.10.1 Conjugate gradient methods ..... 82
6.10.2 Newton and quasi-Newton methods ..... 83
6.10.3 Levenberg-Marquardt algorithm ..... 84
6.11 Bias/variance dilemma ..... 84
6.11.1 Neural network growing and neural network pruning ..... 85
6.12 Regularization and early stopping ..... 86
6.13 Early stopping ..... 87
6.14 Normalizing the data ..... 88
6.15 Neural network derivatives ..... 88
6.15.1 First derivative with respect to weights ..... 89
6.15.2 Derivatives of the transfer function ..... 90
6.15.3 First derivative with respect to inputs ..... 91
6.16 Novelty detection ..... 92
6.16.1 Minimum distance computation ..... 94
7 Ab Initio Molecular Dynamics (AIMD) ..... 97
7.1 Approach ..... 97
7.2 Choice of the coordinate system ..... 98
7.2.1 Internal coordinates ..... 99
7.3 Input vector to the neural network ..... 101
7.4 Sorting and ordering the input vector for neural network training ..... 103
7.4.1 Procedure for sorting and ordering the input vector for AIMD ..... 105
7.5 Calculation of forces during the MD simulation ..... 107
7.6 Sampling technique ..... 110
7.7 Novelty detection ..... 112
8 Results and Discussion ..... 115
8.1 Ab initio calculations ..... 116
8.2 Neural network training ..... 118
8.3 Iterating for the neural network convergence ..... 122
8.4 Neural network convergence using different empirical potentials ..... 129
8.5 A method to obtain a conservative force field ..... 132
8.6 Neural network training for surface and bulk configurations ..... 137
8.7 Neural network training for ab initio energies in carbon clusters of Buckyball (C60) and carbon nanotube ..... 140
9 Conclusions and Future Work ..... 142
10 References ..... 146
11 Appendix ..... 165

LIST OF FIGURES

Fig 2.1 The geometries for the neutral and ionic clusters of Si2 to Si6 ..... 14
Fig 2.2 Scaled cohesive energy vs cluster size for neutral silicon clusters calculated at the MP4/6-31G* level ..... 15
Fig 2.3 Incremental binding energy vs cluster size for neutral silicon clusters calculated at the HF/6-31G* and MP4/6-31G* levels ..... 16
Fig 3.1 Cell index method ..... 26
Fig 3.2 Representation of periodic boundary condition (PBC) and the simulation cell ..... 27
Fig 5.1 Description of split valence basis set ..... 47
Fig 5.2 Variation of energy with respect to radius ..... 49
Fig 6.1 Schematic of biological neuron ..... 66
Fig 6.2 Synaptic gap ..... 68
Fig 6.3 Model of a neuron ..... 69
Fig 6.4 Single input neuron ..... 73
Fig 6.5 Multilayer network ..... 73
Fig 6.6 Sigmoid transfer functions ..... 74
Fig 6.7 Evaluation of error during training ..... 88
Fig 6.8 Plot of neural network output for f(x) = x^4 ..... 92
Fig 7.1 Internal coordinates: a) bond distance, b) bond angle, and c) dihedral angle ..... 100
Fig 7.2 Sign convention for dihedral angle ..... 101
Fig 7.3 Configuration of 5 silicon atoms ..... 101
Fig 7.4 Contour plot of a function f(x,y) = x^2 + y^2 ..... 104
Fig 7.5 Effect of ordering of atoms in a configuration ..... 105
Fig 8.1 Silicon cluster ..... 115
Fig 8.2 Comparison of neural network output with ab initio energy for the training set ..... 120
Fig 8.3 Comparison of neural network output with ab initio energy for the validation set ..... 121
Fig 8.4 Comparison of neural network output with ab initio energy for the testing set ..... 121
Fig 8.5 Distribution of minimum distances N( m) for the initial set of Si5 configurations ..... 122
Fig 8.6 Distribution of minimum distances N( m) of the configurations in the first iteration from the initial set of configurations ..... 123
Fig 8.7 Comparison of the distribution of minimum distances N( m) of the configurations in the first iteration from the initial configurations with the distribution of configurations from the Tersoff potential ..... 124
Fig 8.8 Distribution of recomputed minimum distances of the concatenated set of configurations in the initial and first iterations ..... 125
Fig 8.9 Parzen's distribution of minimum distances for the initial set of configurations from the Tersoff potential ..... 126
Fig 8.10 Parzen's distribution of minimum distances for the configurations stored while using the neural network trained for ab initio energy during MD simulation ..... 127
Fig 8.11 Distribution of minimum distances of the configurations during the second iteration from configurations during the first iteration ..... 128
Fig 8.12 Parzen's distribution of the minimum distances for the concatenated set of configurations in the initial and first iterations ..... 129
Fig 8.13 Distribution of minimum distances of all configurations stored using the modified Tersoff potential ..... 130
Fig 8.14 Comparison of Tersoff and modified Tersoff potential energies (eV) ..... 131
Fig 8.15 Comparison of neural network output energy (eV) for ... ..... 132
Fig 8.17 Variation of the total energy ..... 135
Fig 8.18 Variation of the potential energy ..... 135
Fig 8.19 Phonon frequencies for silicon ..... 136
Fig 8.20 Comparison of neural network output with ab initio energy for the training set ..... 138
Fig 8.21 Comparison of neural network output with ab initio energy for the validation set ..... 139
Fig 8.22 Comparison of neural network output with ab initio energy for the testing set ..... 139
Fig 8.23 Comparison of neural network output with ab initio energy for the training set ..... 141

CHAPTER 1

1 Introduction

Computer simulation techniques play an important role in the study of various chemical, biological, and mechanical processes at the atomistic level. By comparing the results of simulations with experimental observations, the theoretical model used in the simulation can be corrected and validated. Once validated, the simulations can be used to study the system under conditions for which experimental results are not easily available. By analyzing the results of the simulations, experiments can then be performed under specific conditions to achieve desired results. Thus computer simulations complement both theoretical and experimental approaches.
In conjunction with developments in computer technology, the use of simulations has greatly extended the range of problems that can be studied. In chemistry, simulations are performed to study reaction dynamics. In engineering, simulations are used to study the behavior of a material under varying conditions in processes such as machining, indentation, and melting. Such simulations can provide useful insight into important mechanisms, such as phase transformation and dislocation dynamics, that are not easily understood using other techniques.

Simulation techniques can be broadly classified into two groups: stochastic and deterministic. Stochastic simulations are based on statistical mechanics, which relates macroscopic properties such as pressure and temperature to the distribution and motion of atoms and molecules, whereas the deterministic approach is based on classical mechanics. Examples of the deterministic and stochastic approaches are molecular dynamics and Monte Carlo simulations, respectively. Molecular dynamics simulations generate information at the microscopic level, such as atomic positions and velocities, by integrating the equations of motion and following the time evolution of a set of interacting atoms. Statistical mechanics deals with macroscopic systems from a molecular point of view, where macroscopic phenomena are studied in terms of the properties of the individual molecules making up the system.

The most important part of an MD/MC simulation is the development of the potential energy surface, which describes the interactions of the atoms within the system. It should provide a realistic description of the interatomic interactions so as to match experimental observations. The usual approach for developing a potential is to choose a functional form motivated by physical intuition and then adjust its parameters to ab initio data and/or physical properties, yielding an empirical potential.
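The parameter-adjustment step just described can be illustrated with a simple case: a Lennard-Jones pair potential V(r) = A/r^12 - B/r^6 is linear in its coefficients A and B, so fitting them to reference energies reduces to linear least squares. A minimal sketch (the "reference" energies below are synthetic stand-ins, not ab initio data from this thesis):

```python
# Fit the coefficients of V(r) = A/r^12 - B/r^6 to reference energies
# by solving the 2x2 normal equations of a linear least-squares problem.

def lj(r, A, B):
    return A / r**12 - B / r**6

A_true, B_true = 1.0e4, 1.0e2            # coefficients to be recovered
rs = [3.0 + 0.1 * i for i in range(20)]  # sample separations
es = [lj(r, A_true, B_true) for r in rs] # synthetic "reference" energies

# Linear model e = A*x + B*y with basis functions x = r^-12, y = -r^-6.
xs = [r**-12 for r in rs]
ys = [-r**-6 for r in rs]
s11 = sum(x * x for x in xs)
s12 = sum(x * y for x, y in zip(xs, ys))
s22 = sum(y * y for y in ys)
b1 = sum(e * x for e, x in zip(es, xs))
b2 = sum(e * y for e, y in zip(es, ys))
det = s11 * s22 - s12 * s12
A_fit = (b1 * s22 - s12 * b2) / det
B_fit = (s11 * b2 - s12 * b1) / det
print(A_fit, B_fit)  # recovers A_true and B_true from the energies alone
```

Real empirical potentials are nonlinear in most of their parameters, so in practice the same idea is carried out with iterative nonlinear least squares rather than one matrix solve.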
Although such empirical potentials provide a simple and physically interpretable description of the interatomic interactions, their applicability is limited to the type of data to which they were fitted. Once fitted, there is no easy way to improve a potential without refitting the data. A solution to this problem is to model the system using ab initio electronic structure calculations, solving Schrödinger's equation to compute the potential energy and forces from first principles. Such techniques can provide a very accurate description of the interatomic interactions, but they are computationally very expensive and hence limited to small systems involving only a few atoms.

In this study, a new approach is presented for the development of accurate potential-energy hypersurfaces based on ab initio calculations that can be utilized to conduct molecular dynamics and Monte Carlo simulations to study chemical, mechanical, and electrical properties at the atomistic level. The method integrates ab initio electronic structure calculations with the interpolation capability of multilayer neural networks. A sampling technique based on novelty detection is also developed to ensure that the neural network fit of the potential energy spans the entire configuration space involved in the simulation.

Chapter 2 gives a literature review of various empirical potentials and the different design approaches employed for modeling different situations. Chapter 3 explains the molecular dynamics technique, various integration algorithms, and the implementation of periodic boundary conditions. Chapter 4 gives the classification and design of empirical potentials and details some of the important empirical potentials developed for covalent materials, particularly silicon. Chapter 5 provides a detailed description of ab initio techniques.
Different types of basis sets and techniques, such as the Hartree-Fock method, electron correlation methods (configuration interaction and perturbation theory), and density functional theory, are discussed. Chapter 6 discusses multilayer neural networks, training algorithms, and techniques for achieving an accurate fit; a sampling technique called novelty detection is also discussed. Chapter 7 presents the new technique for ab initio molecular dynamics and discusses the various steps involved. Chapter 8 presents the results for the silicon system. Chapter 9 summarizes the conclusions of this study.

CHAPTER 2

2 Literature Review

The most important part of an MD/MC simulation is the development of the potential energy surface, which should model the system under consideration sufficiently closely to the actual one. In recent years many empirical potentials for silicon, such as the Stillinger-Weber, Tersoff, Bolding-Anderson, and Brenner potentials, have been developed and applied to a number of different systems.

2.1 Literature review of design of interatomic potentials

Existing models for interatomic potentials differ in degree of sophistication, functional form, fitting strategy, and range of applicability, each being designed for a specific problem. Not only are surfaces and small clusters difficult to model (Balamane et al., 1992; Kaxiras, 1996); even bulk material, including the crystalline and amorphous phases, solid defects, and the liquid phase, has not been given a transferable description by a single potential. The usual approach for developing an empirical potential is to arrive at a parameterized functional form motivated by physical intuition and then to find a set of parameters by fitting to ab initio data, experimental observations, or both, for various atomic structures.
A covalent material presents a difficult challenge because complex quantum mechanical effects, such as chemical bond formation, hybridization, metallization, charge transfer, and bond bending, must be described by an effective interaction between atoms. In spite of a wide range of functional forms and fitting strategies, most of these models have been successful in the regime for which they were parameterized but have shown a lack of transferability.

Balamane and Halicioglu (1992) performed a comparative study of six classical many-body potentials, namely the Pearson, Takai, Halicioglu, and Tiller potential; the Biswas and Hamann potential; the Stillinger and Weber potential; the Dodson potential; the Tersoff potential (T2); and the modified Tersoff potential (T3). They concluded that each has strengths and limitations, none appearing clearly superior to the others, and none being fully transferable.

The Stillinger-Weber potential (1985) was parameterized to experimental properties of solid and liquid silicon, such as the lattice energy and melting point. The model has the form of a third-order cluster potential (Carlson, 1990) in which the potential energy is expressed as a linear combination of two- and three-body terms:

E = Σ_{i<j} v2(i,j) + Σ_{i<j<k} v3(i,j,k),   (2.1)

where v2(i,j) represents two-body interactions and v3(i,j,k) represents three-body interactions. The two-body interactions are expressed as

v2(i,j) = ε f2(rij/σ),   f2(r) = A (B r^-p - r^-q) exp[(r - a)^-1] for r < a, and f2(r) = 0 otherwise,   (2.2)

where ε has energy units and σ has length units. The three-body interaction is expressed as a separable product of radial functions g(r) and an angular function h(θ):

v3(i,j,k) = g(rij) g(rik) [cos(θijk) + 1/3]^2,   (2.3)

where θijk is the bond angle formed by the bonds ij and ik, and g(r) is a decaying function. The angular function has a minimum at θ = 109.5°, the tetrahedral angle, to represent the angular preference for sp3 bonding.
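The two- and three-body terms of Eqs. (2.2) and (2.3) are straightforward to code. The sketch below uses the commonly quoted Stillinger-Weber silicon constants (an assumption here, not values taken from this thesis), and a decaying g(r) of the exponential form used in the original model:

```python
import math

# Sketch of the Stillinger-Weber form of Eqs. (2.1)-(2.3).
# Parameter values are the commonly quoted SW silicon constants
# (assumed here; not quoted from this thesis).
A, B, p, q = 7.049556277, 0.6022245584, 4.0, 0.0
a, lam, gam = 1.80, 21.0, 1.20      # cutoff and three-body constants
eps, sig = 2.1683, 2.0951           # energy (eV) and length (Angstrom) units

def f2(r):
    """Reduced two-body term, Eq. (2.2); zero beyond the cutoff a."""
    if r >= a:
        return 0.0
    return A * (B * r**-p - r**-q) * math.exp(1.0 / (r - a))

def v2(rij):
    return eps * f2(rij / sig)

def g(r):
    """Decaying radial factor of the three-body term (cut off at a)."""
    if r >= a:
        return 0.0
    return math.sqrt(lam) * math.exp(gam / (r - a))

def v3(rij, rik, cos_theta):
    """Three-body term, Eq. (2.3)."""
    return eps * g(rij / sig) * g(rik / sig) * (cos_theta + 1.0 / 3.0)**2

# The angular factor vanishes at the tetrahedral angle (cos = -1/3),
# so ideal sp3 geometries incur no three-body penalty.
print(v3(2.35, 2.35, -1.0 / 3.0))
```

Note how the exp[(r - a)^-1] factor drives both terms smoothly to zero at the cutoff, so all derivatives vanish there and the forces stay continuous.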
This potential gives a fairly realistic description of crystalline silicon, point defects, and the liquid and amorphous phases, but it provides an inadequate description of undercoordinated or overcoordinated structures. Mistriotis et al. (1989) proposed an improvement over the Stillinger-Weber potential that included four-body interactions as well. It gave good agreement with the melting point of the crystal and with the geometries and energies of the ground and low metastable states of silicon clusters.

Tersoff (1988; 1989) proposed an approach coupling two-body and multibody correlations into the model. The potential is based on the bond order, which takes the local environment into account. The bond strength depends on the geometry: the more neighbors an atom has, the weaker the bond to each neighbor. The energy is a sum of repulsive and attractive interactions that depend on the local bonding environment:

E = (1/2) Σ_i Σ_{j≠i} fC(rij) [fR(rij) + bij fA(rij)],   and   ζij = Σ_{k≠i,j} v3(i,j,k).   (2.4)

The fR and fA terms are repulsive and attractive pair potentials, respectively, and fC is a cutoff function. The main feature of this form is the term bij: the strength of each bond depends upon the local environment and is lowered when the number of neighbors is relatively high. The term ζij defines the effective coordination number of atom i, taking into account the relative distance of two neighbors (rij - rik) and the bond angle θijk.

The Bolding-Anderson potential (1990) is a generalized form of the Tersoff potential, in which the interaction between a pair of atoms depends on the environment. They suggested that clusters of four or more atoms have local minima on the potential energy surface corresponding to multiple configurations, which involve atoms with high coordination numbers and strained bond angles.
The increase in potential energy due to strained bond angles is offset by the large number of weak bonds formed. Their potential incorporated an angle dependence that differs between clusters and crystals, introduced through interference functions.

Bazant and Kaxiras (1997) developed the environment-dependent interatomic potential (EDIP) for bulk silicon. The interaction model included two-body and three-body terms that depend on the local atomic environment through an effective coordination number:

V2(r, Z) = R(r) + p(Z) A(r),

where R(r) represents the short-range repulsion of atoms, A(r) represents the attractive force of bond formation, and p(Z) is the bond order, which determines the strength of the attraction as a function of the atomic environment, measured by the coordination Z. The explanation of the bond order is that, as an atom becomes overcoordinated beyond its valence, the bonds become more metallic, characterized by delocalized electrons (Carlsson, 1985; 1990; Abell, 1985; Pettifor, 1990). They provided a test of empirical theories of covalent bonding in solids using a procedure to invert ab initio cohesive energy curves, and showed that this inversion verified trends in chemical bonding across various bulk bonding arrangements, consistent with theoretical predictions (Bazant and Kaxiras, 1996). Their results indicated that a coordination-dependent pair interaction can provide a fair description of high-symmetry crystal structures without requiring additional many-body interactions; angular forces are needed only to stabilize the structure under symmetry-breaking distortions, primarily at small coordinations.
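The role of the bond order p(Z) in V2(r, Z) = R(r) + p(Z) A(r) can be made concrete with a toy sketch. The functional forms and constants below are purely illustrative, not EDIP's published ones; only the structure of the equation follows the text:

```python
import math

# Toy coordination-dependent pair interaction in the spirit of
# V2(r, Z) = R(r) + p(Z) * A(r).  Forms and constants are illustrative.

def R(r):
    return 50.0 * math.exp(-2.0 * r)      # short-range repulsion

def A(r):
    return -10.0 * math.exp(-r)           # attractive bond-formation term

def p(Z, Z0=4.0):
    """Bond order: full strength at the ideal coordination Z0, weakening
    (toward metallic, delocalized bonding) as Z grows past Z0."""
    return math.exp(-0.3 * max(0.0, Z - Z0) ** 2)

def V2(r, Z):
    return R(r) + p(Z) * A(r)

# Overcoordination weakens the attraction, so the pair energy rises.
print(V2(2.0, 4.0), V2(2.0, 8.0))
```

The qualitative behavior matches the text: at Z = Z0 the attraction acts at full strength, and an overcoordinated environment scales it down, making each individual bond weaker.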
In spite of these various approaches, no potential has demonstrated a transferable description of the material in all its forms, leading to the conclusion that it may be too ambitious to attempt a simultaneous fit of all of the important atomic structures (bulk crystalline, surface, amorphous, liquid phase, and clusters), since qualitatively different aspects of bonding are at work in the different types of structures.

Rahman and Raff (2001) presented an analytical fit to ab initio data for the dissociation dynamics of vinyl bromide. Their approach was to develop analytic functions for each of the energetically open reaction channels using functional forms suggested by physical and chemical considerations. The parameters of these functions are expressed as appropriate functions of the system's geometry. These channel potentials are connected smoothly with parameterized switching functions that play a central role in fitting the potential barriers, the reaction coordinate curvatures, and the coordinate coupling terms obtained from the ab initio calculations.

Ischtwan and Collins (1994) developed a moving interpolation technique in which the potential energy in the neighborhood of any point is approximated by a Taylor series using inverse bond-length coordinates as the expansion variables. The data from ab initio calculations are used to obtain a set of initial Taylor series expansions. The procedure expresses the potential at any configuration as a weighted average of the Taylor series about all points in the data set. Subsequently, the results are iteratively improved by computing trajectories on the Taylor-series-fitted surface and storing the internal coordinates. These new points are added to the data set according to a weight factor determined by the relative density of data-set points in the region of the new point.
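The weighted-average-of-Taylor-expansions idea can be sketched in one dimension. The data points, the quadratic test function, and the inverse-distance weight are illustrative choices, not the scheme's published details:

```python
# 1-D sketch of moving (Shepard-style) interpolation: the potential at x
# is a weighted average of second-order Taylor expansions about the data
# points.  Data and the weight exponent are illustrative.

data = [  # (x0, V, dV/dx, d2V/dx2) at each "ab initio" point
    (0.0, 0.0, 0.0, 2.0),
    (1.0, 1.0, 2.0, 2.0),
    (2.0, 4.0, 4.0, 2.0),
]  # samples of V(x) = x**2

def taylor(x, pt):
    x0, v, g, h = pt
    dx = x - x0
    return v + g * dx + 0.5 * h * dx * dx

def shepard(x, power=4):
    num = den = 0.0
    for pt in data:
        d = abs(x - pt[0])
        if d < 1e-12:            # exactly on a data point: use it directly
            return taylor(x, pt)
        w = d**-power            # inverse-distance weight
        num += w * taylor(x, pt)
        den += w
    return num / den

print(shepard(0.5))  # exact here, since each expansion of x**2 is exact
```

Because the weights decay rapidly with distance, each local expansion dominates near its own data point, which is what lets the iterative scheme refine the surface simply by adding new expansion points where trajectories visit.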
The criterion used to assess convergence of the iteratively improved potential to the final surface was computed using a dynamical variable, such as the reaction probability. Jordan et al. (1995) used the moving interpolation trajectory sampling method to investigate reaction dynamics. Maisuradze et al. (2003) introduced an interpolating moving least squares (IMLS) method for interpolation between the computed ab initio points. When the method is unrestricted, the least squares coefficients are obtained from the solution of a large matrix equation that must be solved repeatedly during a trajectory study.

Another approach bypasses the physical motivation behind the functional form of the potential in favor of elaborate fitting, as described by Ercolessi and Adams (1994). The potential involves combinations of cubic splines with effectively hundreds of adjustable parameters, and the approach is the force-matching method, which fits the potential to ab initio atomic forces for many atomic configurations, including surfaces, clusters, liquids, and crystals. The potential form is defined by single-variable functions of the atomic coordinates, such as the glue potential, defined as

V = (1/2) Σ_{i,j} φ(rij) + Σ_i U(Σ_j ρ(rij)),   (2.5)

where φ(rij) is a pair potential, ρ(rij) is a function of atomic density, and U(n) is the glue function. They attempted to match the forces from first-principles calculations for a large
On the other hand, if the emphasis is on ZC, then the method looks like conventional fit, but the ZF term relieves the burden of guessing the functional form. Another approach utilizes genetic algorithms and genetic programming. The idea is to let the computer program determine the functional form and come up with the best possible solution with minimal human input. The concept is based on genetics and evolution theory, in which different genetic representations of a set of variables are combined to produce a better solution. The technique is stochastic in nature rather than conventional gradient based approaches and hence the dimensionality of the problem does not pose a serious limitation. It works by randomly generating and exchanging functional elements (variables and arithmetic operators, such as addition and multiplication), and selecting the fittest individuals (which represent a set of parameters) assessed by comparing with target values, a population of potential evolves until a superior form emerges. Makarov and Metiu (1998) proposed a procedure by which genetic programming could be used to find the best functional form and the best set of parameters to fit the energy and derivatives from ab initio calculations. They suggested directed search which guides the computer to use a specified functional form without 11 limiting too much on the flexibility of the search. Directed search was used to fill in portions of human engineered functional template, without which the computer could not find a simple interpretable form even for a pair potential. As an alternative to these methods Blank et al. (1993; 1995) explored an interpolation method based on feed forward neural networks. Functional approximation using neural networks does not require any assumption regarding the functional form of the potential. 
Multilayer feed-forward neural networks can in principle fit any real-valued continuous function of any dimension to arbitrary accuracy, provided the network has a sufficient number of neurons (Cybenko, 1989). Blank et al. used a neural network to fit data sets produced by low-dimensional models of a CO molecule chemisorbed on a Ni(111) surface. The models were constrained versions of a many-dimensional empirical potential that had been used in simulations of surface dynamics, where it had been shown to give a reliable qualitative picture of the molecule-surface interaction. Hobday et al. (1999) developed a neural network model for the more complex C-N system, for which the potential energy function was not parameterized. They used the neural network to fit the many-body term in the Tersoff functional form rather than fitting the entire potential. In contrast to the conventional network architecture, in which a single input vector is presented to the network, a set of N vectors is presented, where N is the number of neighbors of the two atoms i and j, not including i and j. The value of N depends on the ij bond. For example, if the ij bond is part of a monocyclic chain, then N = 2. If the ij bond is tetrahedrally coordinated, then N = 6. To present the local environment to the network, the bond order term is expressed as a function of ijk contributions, where k is a first neighbor of i or j. The network input consists of N vectors, where N is the number of different ijk contributions. Each vector of the input data consisted of pairwise information, such as direction cosines, bond lengths, and cutoff functions; first-neighbor information, such as bond lengths and torsional angles; and second-neighbor information, such as the number and types of atoms used to describe the ijk atom triplet. The size of each input vector was constant, but the number of input vectors N is a variable that could change during the simulation.
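As a minimal illustration of the fitting capability discussed above (not the networks of Blank et al. or Hobday et al.), the sketch below trains a one-hidden-layer tanh network on a Morse-like one-dimensional curve in reduced units by full-batch gradient descent. All names, the network size, and the learning rate are illustrative assumptions.

```python
import math
import random

def train_mlp(xs, ys, n_hidden=8, lr=0.05, epochs=3000, seed=1):
    """Fit out(x) = sum_j v[j]*tanh(w[j]*x + b[j]) + c by gradient descent."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    c = 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        gw = [0.0] * n_hidden; gb = [0.0] * n_hidden
        gv = [0.0] * n_hidden; gc = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w[j] * x + b[j]) for j in range(n_hidden)]
            out = sum(v[j] * h[j] for j in range(n_hidden)) + c
            err = out - y
            loss += err * err / n
            g = 2.0 * err / n          # d(loss)/d(out) for this sample
            gc += g
            for j in range(n_hidden):  # backpropagate through the tanh layer
                gv[j] += g * h[j]
                gpre = g * v[j] * (1.0 - h[j] * h[j])
                gw[j] += gpre * x
                gb[j] += gpre
        losses.append(loss)
        for j in range(n_hidden):
            w[j] -= lr * gw[j]; b[j] -= lr * gb[j]; v[j] -= lr * gv[j]
        c -= lr * gc
    return losses

# Morse-like target curve on [0.5, 3.0] in reduced units
xs = [0.5 + 2.5 * i / 29 for i in range(30)]
ys = [(1.0 - math.exp(-(x - 1.0))) ** 2 for x in xs]
losses = train_mlp(xs, ys)
print(losses[0], losses[-1])   # the fitting error decreases
```

No assumption about the functional form of the target enters the fit; only the network size and training schedule are chosen by hand, which is the point made in the text above.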
Car and Parrinello (1985) developed a unified approach for molecular dynamics and density functional theory. In this method, the ions are still moved classically, but under the action of forces obtained by solving the electronic structure problem, thus eliminating the need for empirical potentials at the expense of much larger computational requirements. This technique is known as the Car-Parrinello molecular dynamics method. The electron density is written in terms of occupied single-particle orthonormal orbitals as:

n(r) = Σ_i |ψ_i(r)|².   ( 2.7 )

A point on the Born-Oppenheimer potential energy surface is given by the minimum of the energy functional with respect to the orbitals ψ_i. In the conventional formulation, minimization of the energy functional with respect to the orbitals ψ_i, subject to the orthonormality constraint, leads to the self-consistent Kohn-Sham equations. Their solution involves repeated matrix diagonalization. Car and Parrinello (1985) adopted a different approach by regarding the minimization of the Kohn-Sham (1965) functional as a complex optimization problem, which could be solved by applying an optimization method called "simulated annealing," introduced by Kirkpatrick et al. (1983). In this approach, an objective function is minimized relative to the parameters with a Boltzmann-type probability distribution via a Monte Carlo procedure. In the Car-Parrinello method, the objective function is the total energy functional and the variational parameters are the coefficients of the expansion of the Kohn-Sham orbitals. They found that a simulated annealing strategy based on MD, rather than on the Metropolis Monte Carlo of Kirkpatrick et al. (1983), could be applied efficiently to minimize the Kohn-Sham functional. This technique, called "dynamical simulated annealing," was found to be useful for studying finite-temperature properties.
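To illustrate the Metropolis simulated annealing of Kirkpatrick et al. (1983) referred to above (not the Car-Parrinello dynamical variant), the sketch below minimizes a one-dimensional double-well objective; the objective function, the cooling schedule, and all names are illustrative assumptions.

```python
import math
import random

def anneal(f, x0, t0=2.0, t_min=1e-3, cooling=0.999, step=0.2, seed=0):
    """Metropolis simulated annealing: accept uphill moves with Boltzmann
    probability exp(-dE/T) while the temperature T is slowly lowered."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    while t > t_min:
        x_new = x + rng.uniform(-step, step)
        f_new = f(x_new)
        # downhill moves always accepted; uphill with probability exp(-dE/T)
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / t):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f

# Double well: local minimum near x = +0.97, global minimum near x = -1.02.
# Starting in the shallower well, high-T uphill moves let the walker escape.
f = lambda x: (x * x - 1.0) ** 2 + 0.2 * x
best_x, best_f = anneal(f, x0=1.0)
print(best_x, best_f)
```

The Boltzmann-weighted acceptance of uphill moves is what distinguishes this from a plain descent: at high temperature the walker can cross barriers between minima, and as T falls the search localizes, which is the property Car and Parrinello exploited for the Kohn-Sham functional.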
2.2 Literature review of quantum mechanical calculations of silicon clusters

Raghavachari (1985; 1986; 1988) investigated the geometries and energies of silicon clusters using ab initio calculations. He considered several geometrical arrangements and electronic states. All geometries considered were optimized with several basis sets (the minimal basis set STO-3G, and 6-31G) at the Hartree-Fock level of theory; in the case of open-shell species (triplets, quintets, etc.), unrestricted Hartree-Fock was used. Single-point calculations were then performed at these geometries using complete fourth-order perturbation theory (MP4) with the polarized 6-31G* basis set. For clusters of five silicon atoms, the geometries considered were the trigonal bipyramid, square pyramid, planar pentagon, linear structure, and the tetrahedral structure (which is the equilibrium structure for the bulk). He found the trigonal bipyramid with a singlet state to be the most stable of all the structures considered. In the square pyramid, the apex atom forms four equivalent bonds, all on one side of a plane; this is not a stable structure, and the molecule prefers to rearrange to the more stable trigonal bipyramid in spite of the lower number of bonds present. The tetrahedral structure of five silicon atoms was found to have a high energy. Though the central atom in such a cluster has four bonds oriented in tetrahedral directions, the remaining four silicon atoms have only one bond each and are too distant from each other to interact favorably.

Fig 2.1 The geometries for the neutral and ionic clusters of Si2 to Si6 (Raghavachari, 1985). (Numbers in parentheses indicate bond lengths and bond angles for the ions.)

Fig 2.1 shows the ground-state equilibrium structures for the neutral and ionic clusters of Si2 to Si6.
Consideration of the structures Si3 to Si7 revealed that each cluster Si_n can be built from a smaller cluster Si_{n-1} by addition of a silicon atom at an appropriate edge-capped or face-capped bonding site. Edge-capped structures were found to be favored for the smaller clusters, while face-capped clusters became comparable in energy for the intermediate clusters. For example, the rhombus structure of Si4 can be considered an edge-capped triangular form, which is stable compared to the face-capped tetrahedron. Many atoms in the larger clusters Si7 to Si10 were found to have a coordination number of 6, greater than the value of 4 in the bulk. This indicated that, unlike in a close-packed metallic system, there may be a saturation of coordination at 6, with further bonding causing overcrowding and destabilization. He concluded that the principal differences between these clusters and the bulk were due to the large fraction of surface atoms in the clusters. These surface atoms provide a large driving force (analogous to surface tension) to form more compact structures, provided the resulting strain energy is not too high. In this sense there is a correspondence between the local coordination found in clusters and the high-pressure form of silicon (the β-tin structure), which has hexacoordinate silicon in an octahedral environment.

Fig 2.2 Scaled cohesive energy vs. cluster size for neutral silicon clusters calculated at the MP4/6-31G* level (Raghavachari, 1985).

Fig 2.2 shows the scaled cohesive energy as a function of cluster size. The cohesive energy generally increases with cluster size, with some indication of special stability at cluster sizes 4, 7, and 10. Si7 and Si10 approach the bulk cohesive energy more closely. Smaller clusters have low cohesive energy because of the difference in the number of bonds between the clusters and the bulk, and not because of the nature of the bonding.
The local relative stabilities of clusters of different sizes can be compared via the incremental binding energy (defined as the energy required for the reaction Si_n → Si_{n-1} + Si). Fig 2.3 shows the incremental binding energy as a function of cluster size at the HF/6-31G* and MP4/6-31G* levels.

Fig 2.3 Incremental binding energy vs. cluster size for neutral silicon clusters calculated at the HF/6-31G* and MP4/6-31G* levels (Raghavachari and Rohlfing, 1988).

The peaks at cluster sizes of 4, 6, 7, and 10 indicate that these structures are locally stable, consistent with the prominent presence of 6, 7, and 10 in the mass spectral distributions of these clusters.

CHAPTER 3

3 Molecular Dynamics Simulation

Molecular dynamics is a computer simulation technique in which the time evolution of a set of interacting atoms is followed by integrating their equations of motion. While the continuum mechanics approach provides a good understanding of processes at the macro and micro regimes, it cannot be used to simulate processes at the atomistic regime. Unlike finite element analysis or other continuum approaches, where the nodes and distances are selected arbitrarily, molecular dynamics is based on more fundamental units for a specific material, such as the lattice spacing and interatomic distances. Thus processes can be studied at the fundamental level. However, since a large number of atoms constitute any common material, one has to consider the interactions of several thousands of atoms even for a nanometric study. Such simulations require a large amount of computing resources, but in return can provide insight into processes happening at the smallest scale, namely the atomistic level.

3.1 Brief history of molecular dynamics simulation

The molecular dynamics method was first introduced by Alder and Wainwright in the late 1950's (Alder and Wainwright, 1957, 1959) to study the interactions of hard spheres.
The purpose of that work was to investigate the phase diagram of a hard-sphere system, in particular the solid and liquid regions. In a hard-sphere system, particles interact via instantaneous collisions and travel as free particles between collisions. Gibson et al. (1960) studied the dynamics of radiation damage; this was the first example of a molecular dynamics calculation with a continuous potential based on a finite time-integration method. Rahman (1964) carried out perhaps the first simulation using a realistic potential, studying the properties of liquid argon with the Lennard-Jones potential. Rahman and Stillinger (1974) carried out a simulation of liquid water. Verlet (1967) calculated the phase diagram of argon using the Lennard-Jones potential and computed correlation functions to test theories of the liquid state. The bookkeeping device that became known as the Verlet neighbor list was also introduced, and the Verlet time-integration algorithm was used. Phase transitions in the same system were investigated by Hansen and Verlet (1969). The number of simulation techniques has expanded since then; there now exist many specialized techniques for particular problems, including mixed quantum mechanical-classical simulations.

3.2 Applicability of MD simulations

Molecular dynamics follows the laws of classical mechanics, whereas at the atomistic level the system actually follows the laws of quantum mechanics. A classical description of the system can be used when quantum effects, such as tunneling and electronic excitations, play no essential role in the dynamics. In addition, the kinetic energy of a particle must be large enough to ensure that the de Broglie wavelength is much smaller than the lattice constant of the solid. A simple test of the validity of the classical approximation is based on the de Broglie thermal wavelength, defined as:

Λ = h / √(2π M k_B T),   (3.1)

where M is the atomic mass, T is the temperature, h is Planck's constant, and k_B is Boltzmann's constant.
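A minimal sketch of this validity test follows (not from the thesis): it evaluates Eq. (3.1) and compares Λ against the mean nearest-neighbor distance a, taking "much smaller" to mean Λ < 0.1 a, which is an illustrative threshold of my own.

```python
import math

def thermal_de_broglie_wavelength(mass_kg, temperature_K):
    """Lambda = h / sqrt(2*pi*M*kB*T), Eq. (3.1), in SI units."""
    h = 6.62607015e-34   # Planck constant, J*s
    kB = 1.380649e-23    # Boltzmann constant, J/K
    return h / math.sqrt(2.0 * math.pi * mass_kg * kB * temperature_K)

def classical_ok(mass_kg, temperature_K, neighbor_distance_m, ratio=0.1):
    """Classical MD is justified when Lambda << a (here: Lambda < ratio*a)."""
    return thermal_de_broglie_wavelength(mass_kg, temperature_K) < ratio * neighbor_distance_m

# Silicon at 300 K: M ~ 28.0855 u, nearest-neighbor distance ~ 2.35 Angstrom
u = 1.66053906660e-27          # atomic mass unit, kg
lam = thermal_de_broglie_wavelength(28.0855 * u, 300.0)
print(lam)                     # ~ 1.9e-11 m, well below 2.35e-10 m
print(classical_ok(28.0855 * u, 300.0, 2.35e-10))
```

For silicon at room temperature the ratio Λ/a is below 0.1, consistent with treating the nuclei classically, while for hydrogen or helium the same test fails at comparable temperatures.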
The classical approximation is justified if Λ << a, where a is the mean nearest-neighbor distance. The classical approximation is poor for lighter elements, such as hydrogen and helium. Moreover, quantum effects become important in any system when T is sufficiently low. The drop in the specific heat of crystals below the Debye temperature and the anomalous behavior of the thermal expansion coefficient are well-known examples of measurable quantum effects in solids. Hence, molecular dynamics results should be interpreted with caution in these regimes. An important issue in MD simulations is the temporal scale that can be explored. The maximum time step with which Newton's equations can be integrated numerically is inherently restricted to small values (about 1 fs) in order to reproduce atomic vibrations accurately. As a result, molecular dynamics simulations require long computational times, because the most interesting motions occurring during the simulated processes are very slow compared with the fast oscillations of bond lengths and bond angles that limit the integration time step. This limits the time scales that can be handled by MD in a reasonable amount of actual clock time. MD simulations are therefore usually limited to molecular processes happening in relatively short times, the so-called "rare events": infrequent processes that are completed quickly, such as the dissociation of a chemical bond, a rapid process completed on the picosecond time scale. However, the application of an external load is longer by orders of magnitude compared with the dissociation time, making it difficult to observe such events at experimental speeds. In order to overcome the time-scale limitation of MD, two approaches are usually followed. In the first approach, coarse-graining, the dynamic variables are divided into slow and fast degrees of freedom and averaging is performed over the fast degrees of freedom.
In the second approach, fast (and rare) trajectories between desired states are computed explicitly, and their probabilities (relative to nonreactive trajectories) are calculated.

3.3 Formulation of the differential equations of motion

The forces between the atoms are computed explicitly and the motion of the molecule is followed over a period of time using a suitable numerical integration method. Classical mechanics describes how physical objects move and how their positions change with time. Consider an isolated system consisting of N bodies with coordinates (x_i, y_i, z_i), where i = 1, 2, 3, ..., N (Goldstein, 1965). By an isolated system it is meant that all other bodies are sufficiently remote to have a negligible influence on the system. Each of the N bodies is assumed to be small enough to be treated as a point particle. The position of the ith body with respect to a given inertial frame is denoted r_i(t). Its velocity and acceleration are given by v_i(t) = dr_i/dt and a_i(t) = d²r_i/dt². Each atom has a mass m_i. From Newton's second law, F_i = m_i a_i, where F_i is the total force acting on the atom. This force is composed of a sum of forces due to each of the other interacting atoms in the system. If the force on the ith body due to the jth body is denoted F_ij, then

F_i = Σ_{j=1, j≠i}^{N} F_ij,

since there is no force on the ith atom due to itself. Thus,

F_i = m_i d²r_i/dt² = d(m_i v_i)/dt = dp_i/dt,   (3.2)

where p_i is the momentum of the ith body. The force on atom i is the negative gradient of the potential energy function with respect to the coordinates of atom i:

F_i = -∇_i V(r_1, r_2, ..., r_N),   (3.3)

where V is the potential energy function, r_i is the position vector of atom i, and x_i, y_i, and z_i are the Cartesian coordinates of atom i:

r_i = x_i î + y_i ĵ + z_i k̂,  and  ∇_i = î ∂/∂x_i + ĵ ∂/∂y_i + k̂ ∂/∂z_i.   (3.4)

The force in the X direction on atom i is then given by:

F_x = dp_x/dt = d(m v_x)/dt = -∂V/∂x.   (3.5)
The velocity of atom i in the X direction is:

v_x = dx/dt = p_x/m.   (3.6)

Similar equations can be derived in the Y and Z directions. It should be noted that to update the position of one atom, one has to solve 6 coupled first-order differential equations. If the number of atoms in the system is N, then the number of differential equations to be solved is 6N. If the potential function is a simple pairwise potential, such as the Lennard-Jones or Morse potential, then the total number of terms in such a potential is N(N-1)/2. Thus, as an example, a system of 2000 atoms requires integration of 12,000 coupled first-order differential equations of motion, and about 2 million pairwise terms need to be calculated each time a derivative is evaluated. Such evaluations must be done at every integration step of every trajectory calculation. The computational time therefore increases rapidly as the number of atoms in the system increases.

3.4 Time integration algorithms

The MD trajectories are defined by both position and velocity vectors, and they describe the time evolution of the system. Accordingly, the positions and velocities are propagated with a finite time interval by solving a set of coupled first-order differential equations. Time integration algorithms are based on finite difference methods, where time is discretized on a finite grid. These are based on Taylor series expansions truncated after some finite number of terms. The truncation error varies as (Δt)^n, where Δt is the time step and n is the order of the method (which depends on the number of time derivatives considered in the expansion). Hence, for higher-order methods, the truncation error drops off rapidly even with a small reduction in the time step. The idea of the numerical integration of Newton's equations of motion is to find an expression that defines the positions at time t + Δt in terms of the already known positions and their time derivatives, such as velocities and accelerations, at time t.
Thus:

r(t + Δt) = r(t) + v(t) Δt + (1/2) a(t) Δt² + ...
v(t + Δt) = v(t) + a(t) Δt + (1/2) b(t) Δt² + ...
a(t + Δt) = a(t) + b(t) Δt + ...   (3.7)

3.4.1 Verlet algorithm

This is perhaps the most widely used method of integrating the equations of motion, initially developed by Verlet (1967) and attributed to Störmer (Gear, 1971). The Verlet algorithm uses the acceleration at time t and the positions at times t and t - Δt to calculate the new positions at time t + Δt. The algorithm is time reversible and, given a conservative force field, it guarantees the conservation of linear momentum.

r(t + Δt) = r(t) + v(t) Δt + (1/2) a(t) Δt² + ...   (3.8 A)
r(t - Δt) = r(t) - v(t) Δt + (1/2) a(t) Δt² - ...   (3.8 B)

Adding Eqns. (3.8 A) and (3.8 B), and neglecting higher-order terms,

r(t + Δt) = 2 r(t) - r(t - Δt) + a(t) Δt²,

and the velocity is computed as

v(t) = [r(t + Δt) - r(t - Δt)] / (2 Δt).

The basic Verlet algorithm does not use the velocities explicitly, which may introduce numerical imprecision, since the local error in updating the positions is proportional to Δt⁴, whereas the error in updating the velocities is proportional to Δt².

3.4.2 Leap frog Verlet algorithm

This is a modification of the Verlet algorithm in which velocities are computed at half time steps as well. The algorithm is given as:

v(t + Δt/2) = v(t - Δt/2) + a(t) Δt
r(t + Δt) = r(t) + v(t + Δt/2) Δt   (3.9)

The advantage of this method is that the velocities are explicitly calculated, but the disadvantage is that they are not calculated at the same time as the positions. The velocities at time t can be approximated as:

v(t) = (1/2) [v(t - Δt/2) + v(t + Δt/2)].   (3.10)

3.4.3 Velocity Verlet algorithm

The velocity Verlet algorithm provides a better description of the velocities, without the need to compute the velocities at half time steps.
r(t + Δt) = r(t) + v(t) Δt + (1/2) a(t) Δt²
v(t + Δt) = v(t) + (1/2) [a(t) + a(t + Δt)] Δt   (3.11)

3.4.4 Beeman algorithm

The Beeman algorithm (Beeman, 1976) provides a more accurate expression for the velocities:

r(t + Δt) = r(t) + v(t) Δt + (2/3) a(t) Δt² - (1/6) a(t - Δt) Δt²
v(t + Δt) = v(t) + (1/3) a(t + Δt) Δt + (5/6) a(t) Δt - (1/6) a(t - Δt) Δt   (3.12)

It should be noted that these algorithms assume that the force remains constant while updating the positions from t to t + Δt. To circumvent this assumption, higher-order terms involving higher time derivatives should be included in the algorithm. It may be noted that the above-mentioned methods are self-starting, i.e., it is only necessary to know the positions and velocities at time t = t0. An estimate of the accuracy during the integration can be obtained by monitoring the total energy and the angular momentum of the system, which should be conserved. Step-size reduction and back integration are other means to check the accuracy of the integration results. Predictor-corrector methods provide an automatic error estimate at each integration step, allowing the program to employ a variable step size to achieve a specified accuracy. However, these methods are not self-starting; it is therefore necessary to use velocity Verlet or a similar single-step method to start the integration.

3.5 Methods to improve computational speed

3.5.1 Neighbor list and bookkeeping techniques

In MD/MC simulations, the distances from atom i to all other atoms in the system are computed. For the calculation of the potential and forces, only the atoms that are separated by distances smaller than the potential cutoff are considered. The time to calculate all pair bond distances is proportional to N², where N is the number of atoms in the system. Verlet (1967) suggested a technique for improving the speed by maintaining a list of the neighbors of a particular atom, which is updated at intervals.
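The integrators of Section 3.4 can be sketched for a one-dimensional harmonic oscillator in reduced units (unit mass and force constant); the function names and the test problem are illustrative, not from the thesis. Monitoring the total energy, as suggested above, serves as the accuracy check.

```python
import math

def verlet(r0, v0, accel, dt, n_steps):
    """Basic Verlet, Eq. (3.8): r(t+dt) = 2 r(t) - r(t-dt) + a(t) dt^2.
    The first step is bootstrapped from a Taylor expansion."""
    r_prev, r = r0, r0 + v0 * dt + 0.5 * accel(r0) * dt * dt
    traj = [r0, r]
    for _ in range(n_steps - 1):
        r_prev, r = r, 2.0 * r - r_prev + accel(r) * dt * dt
        traj.append(r)
    return traj

def velocity_verlet(r, v, accel, dt, n_steps):
    """Velocity Verlet, Eq. (3.11): positions and velocities at the same time."""
    a = accel(r)
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt * dt
        a_new = accel(r)                  # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return r, v

# Harmonic oscillator, a(r) = -r, exact solution r(t) = cos(t)
dt, n = 0.01, 1000
traj = verlet(1.0, 0.0, lambda r: -r, dt, n)
print(abs(traj[-1] - math.cos(n * dt)))   # small discretization error

# Velocity Verlet conserves the total energy E = v^2/2 + r^2/2 closely
r, v = velocity_verlet(1.0, 0.0, lambda r: -r, dt, 10 * n)
drift = abs((0.5 * v * v + 0.5 * r * r) - 0.5)
print(drift)
```

The energy drift after 10,000 steps stays small, illustrating why these symplectic updates are preferred over naive Taylor-series stepping for long MD runs.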
Between updates of the neighbor list, only the bond distances of atoms appearing in the list are computed. This procedure reduces the computational time to some extent.

3.5.2 Cell structures and link lists

As the size of the system increases, the conventional neighbor list becomes too large to store easily, and testing every pair in the system is inefficient. An alternative method of keeping track of neighbors for large systems is the cell index method (Quentrec and Brot, 1975; Hockney and Eastwood, 1981). The cubic simulation box is divided into a regular lattice of M x M x M cells. A 2-dimensional representation is shown in Fig 3.1. The cells are chosen such that the side of the cell, l = L/M, is greater than the cutoff distance of the potential.

Fig 3.1 Cell index method (a 5 x 5 grid of cells numbered 1 to 25)

The first part of the method involves sorting all atoms into their appropriate cells. For this 2-dimensional example, the neighbors of any atom in cell 13 can be found in cells 13, 7, 8, 9, 12, 14, 17, 18, and 19. Searching for neighbors is then fast and efficient. For a 2-dimensional system there are approximately Nc = N/M² atoms in each cell, and for a 3-dimensional system there would be Nc = N/M³ atoms in each cell. Using the cell structure method, only 9NNc pairs have to be computed for a 2-dimensional system, and 27NNc pairs for a 3-dimensional system, as opposed to ½N(N-1) in the conventional way. The cell structure can be set up and implemented as a linked list: the linked-list array stores, for each atom, the number of the next atom in the same cell, and so on until the cell is exhausted.

3.6 Periodic boundary conditions (PBC)

No matter how large the simulated system is, its number of atoms is negligible compared with the number of atoms that make up a macroscopic specimen (of the order of 10²³).
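The cell index method of Section 3.5.2 can be sketched in 2 dimensions as follows; a dictionary of cell lists stands in for the linked-list array, and all names are illustrative assumptions.

```python
def build_cells(positions, box, m):
    """Sort 2-D atoms into an m x m grid of cells of side box/m."""
    cells = {}
    for idx, (x, y) in enumerate(positions):
        key = (int(x / (box / m)) % m, int(y / (box / m)) % m)
        cells.setdefault(key, []).append(idx)
    return cells

def neighbor_candidates(i, positions, cells, box, m):
    """Candidate neighbors of atom i: atoms in its own cell and the
    8 adjacent cells, with periodic wrap-around of the cell indices."""
    cx = int(positions[i][0] / (box / m)) % m
    cy = int(positions[i][1] / (box / m)) % m
    out = []
    for dcx in (-1, 0, 1):
        for dcy in (-1, 0, 1):
            cell = cells.get(((cx + dcx) % m, (cy + dcy) % m), [])
            out.extend(j for j in cell if j != i)
    return out

# 4 atoms in a 10 x 10 box split into 5 x 5 cells of side 2
pos = [(0.5, 0.5), (0.6, 0.7), (4.5, 4.5), (9.5, 9.5)]
cells = build_cells(pos, box=10.0, m=5)
# Atom 0 sees atom 1 (same cell) and atom 3 (diagonal cell via wrap-around),
# but not atom 2, which sits in a non-adjacent cell:
print(sorted(neighbor_candidates(0, pos, cells, 10.0, 5)))  # [1, 3]
```

Only the candidates returned here need a distance test against the cutoff, which is the source of the 9NNc (or 27NNc in 3-D) pair count quoted above.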
As a result, in the simulation model the ratio of the number of surface atoms to the total number of atoms would be much larger than in reality, causing surface effects to be much more important than they should be. A solution to this problem is the use of periodic boundary conditions. A periodic boundary is a mathematical formulation that makes a simulation containing only a few hundred atoms behave as though it were infinite in size. In doing so it also removes surface effects, since atoms near the boundary would otherwise have fewer neighbors than atoms in the interior. When using PBCs, the atoms are considered to be enclosed in a box, and this box is replicated by translation in all three Cartesian directions to form an infinite lattice. In effect, every atom in the simulation box has an exact duplicate in each of the surrounding cells, with the same velocity.

Fig 3.2 Representation of the periodic boundary condition (PBC) and the simulation cell

Fig 3.2 shows the representation of the periodic boundary along with the main simulation cell; rcut is the cutoff used in the potential. If an atom is located at a position r in the box, then this particle represents an infinite set of particles located at:

r + la + mb + nc,   (3.13)

where l, m, and n are integers, and a, b, and c are the vectors corresponding to the edges of the box. All these image particles move together, but only one of them is represented in the computer program. Each particle should be thought of as interacting not only with the other particles in the box but also with their images in the surrounding boxes. Whenever an atom leaves the simulation cell, it is replaced by another with the same velocity entering from the opposite cell face, so the number of atoms in the simulation cell is conserved. Therefore no atom feels any surface effects.
An artifact of this technique is that if the simulation cell is too small and the cutoff radius is greater than half the length of the simulation cell, then an atom i can interact with an atom j as well as with its image j1 in a neighboring cell. As a result, the interaction between atoms i and j is effectively counted twice, as ij and ij1. The minimum image criterion states that if the box size is greater than 2rc, where rc is the cutoff radius of the potential, then among all pairs formed by an atom i in the box and the set of all periodic images of an atom j, at most one will interact. To demonstrate this, suppose an atom i interacts with two images j1 and j2 of an atom j. These two images must be separated by a distance of at least 2rc. In order to interact with both j1 and j2, i would have to be within a distance rc of each of them, which is impossible since they are separated by more than 2rc. Using the minimum image criterion, only the closest image of an atom j, if within the cutoff radius rc, will interact with an atom i. Hence, it is very important that the box size of the simulation cell be at least twice the cutoff radius specified for the potential. Long-range interactions, such as Coulombic interactions with a range larger than the simulation cell, are treated using the Ewald summation, which was developed for studying ionic crystals (Ewald, 1921; Madelung, 1918). Using the minimum image criterion, the periodic boundaries are modeled by modifying the distance calculation. If the difference between the x coordinates of atoms i and j is denoted dx, and dx is greater than half the box size in the x direction (L/2), then atom j does not interact with atom i; instead, its image j1 in a surrounding cell is considered to be bonded with atom i:

If dx > L/2, then dx = dx - L,
If dx < -L/2, then dx = dx + L.
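A minimal sketch of the minimum image distance calculation follows (not from the thesis); it uses the round-to-nearest-integer (anint) formulation, with Python's built-in round() standing in for anint.

```python
import math

def minimum_image_distance(ri, rj, box):
    """Distance between atoms i and j under the minimum image criterion:
    each Cartesian component is shifted by -L * anint(d/L)."""
    d2 = 0.0
    for a, b, L in zip(ri, rj, box):
        d = a - b
        d -= L * round(d / L)   # round() plays the role of anint here
        d2 += d * d
    return math.sqrt(d2)

# Atoms near opposite faces of a 10 x 10 x 10 box are actually close neighbors:
d = minimum_image_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0),
                           (10.0, 10.0, 10.0))
print(d)   # 1.0, not 9.0
```

Note that Python's round() uses round-half-to-even at exact .5 boundaries; for an MD distance calculation this corner case is immaterial, since atoms at exactly half the box length are equidistant from both images.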
(3.14)

Alternatively,

dx = dx - L · anint(dx/L),   (3.15)

where L is the box size in that particular Cartesian direction and anint is a function that rounds the ratio dx/L to the nearest integer value. A similar procedure is applied in the Y and Z directions, and the distance is computed as:

r = √(dx² + dy² + dz²).   (3.16)

CHAPTER 4

4 Interatomic Potentials

The interatomic potential used in a simulation to model the lattice structure plays an important role in determining the accuracy of the simulation results. The accuracy of the trajectories obtained from an MD simulation is affected by the choice of the potential energy function. The total energy of the system is the sum of the kinetic energy (KE) and the potential energy (PE). The KE is simple to compute, but the PE computation is complex, since it depends on the positions of all interacting atoms. The complexity of the potential also determines the computational time required for the simulation. The force acting on an atom is proportional to the first derivative of the potential energy function, and the largest part of a molecular dynamics simulation is the evaluation of the forces, which are required to calculate the new positions of the atoms. The potential energy can be obtained using two approaches. One approach is to perform ab initio calculations by solving the Schrödinger equation for the entire system at each integration step. Although theoretically this is the best approach (without any arbitrary assumptions and ad hoc parameterization), it is not feasible to perform such calculations for the entire system at every time step. For larger systems, comparatively simple and computationally efficient empirical potential functions are used, which take into account factors such as bond stretching, bending, and torsion, covalent bonding, and nonbonded (van der Waals and Coulomb) interactions.
4.1 Design of interatomic potentials

The traditional approach is to eliminate the electronic degrees of freedom entirely and move the nuclei using an appropriate analytical potential function. The functional form of this potential is chosen so as to mimic the behavior of the true potential in realistic ways for specific materials under specific conditions. Constructing the potential involves two steps:

1. Selecting an analytical form for the potential. This could be a sum of pairwise terms, where the energy of a pair depends on the relative distance, or a many-body form appropriate for a specific type of bonding, involving bond distances, angles, and coordination.

2. Once the form is decided, the next step is to determine a specific parameterization of the functions. A set of parameters is found which fits data from ab initio calculations or experimental properties, such as density, cohesive energy, and phonon frequencies, or a data set containing some combination of both theoretical and experimental observations.

The most important factor is the range of applicability of the potential energy function. Since an ad hoc functional form is used, it has no theoretical foundation, and once the empirical potential is developed there is no straightforward means to improve it. Normally, the parameters are adjusted to equilibrium data, which means that the potential can be used to model the bulk properties of the crystal lattice close to equilibrium conditions. It is therefore unlikely that the energies and forces for amorphous structures will be accurately represented by the potential. For example, a potential function designed for the bulk may not be appropriate for studying bond dissociation or a diatomic molecule of the same element, since the environment would be very different from the one for which the potential was designed. However, it may be possible to model both the bulk and the surface, where the environment differs due to the reduced coordination of the atoms at the surface.
These problems are difficult to avoid, and one should be cautious when using analytical potentials to simulate high-temperature and high-pressure conditions.

4.2 Classification of potential energy functions

Empirical potentials are further classified into two-body, three-body, and many-body forms. Most empirical potential forms comprise an attractive and a repulsive term. The term empirical may be misleading, as such potentials may not be strictly empirical as the term suggests (Komanduri and Raff, 1999). These potentials can provide a more realistic model than potentials derived from purely theoretical considerations (Torrens, 1972), since their parameterization can take some physical properties into account as well. Empirical potential forms are based on mathematical expressions for the pairwise interaction between two atoms or ions, which may or may not be justified from theory. These forms can be expressed in terms of attractive and repulsive terms. By convention, repulsive forces are considered positive and attractive forces negative. As the distance between two atoms decreases, both the attractive and repulsive forces increase, but the repulsive forces increase more rapidly. The curvature of the potential energy function is mainly determined by the repulsive force component, which in turn also governs the elastic behavior of the solid. When the attractive and repulsive forces exactly balance each other, this corresponds to the equilibrium position, with the potential energy at its minimum, which is also the bond energy. The cohesive properties of solids, such as the melting and vaporization behavior, are determined by the magnitude of the maximum binding energy, which is governed by the attractive force component. The higher the bonding energy, the higher the melting temperature and elastic modulus, and the lower the coefficient of thermal expansion. Several empirical potential functions have been developed, suitable for different materials.
Some examples of simple pair potentials are the Lennard-Jones potential (also known as the 6-12 potential) (1925), the Morse potential (1929), and the Born-Mayer potential (1933). For more complex systems, such as those involving covalent bonding, common potential functions are the Stillinger-Weber potential (1985), the Tersoff potential (1988, 1989), the Bolding-Anderson potential (1990), the Brenner potential (1990), and the embedded atom and modified embedded atom (MEAM) potentials (Baskes, 1989).

Cutoff radius

In general, each atom can interact with all other atoms in the simulation. One method of increasing the computational speed is to limit the range of the potential. Normally the potential functions are truncated at a certain bond distance called the "cutoff radius". By neglecting interactions beyond the cutoff radius, the computational time is significantly reduced with very little loss in accuracy. Truncation of the potential energy function also truncates the forces. The cutoff distance is generally taken as the distance at which the potential energy is ~3-5% of the equilibrium potential energy. Consequently, a finite cutoff results in small fluctuations in the total energy instead of strictly constant energy.

4.2.1 Pair potentials

Pair potentials are applicable to atomistic studies of many metals. They can be classified into two basic categories (Vitek, 1996): those in which the potential determines the total energy of the system, and those that determine the change in energy when a configuration varies under constant-density conditions. The Lennard-Jones potential (Lennard-Jones, 1925) is given as:

V_LJ(r) = 4ε[(σ/r)^12 − (σ/r)^6], (4.1)

where ε and σ are parameters that determine the well depth and the position of the potential minimum, respectively. One of the most commonly used forms of the pair potential is the Morse potential (Morse, 1929). It is used to model the interaction between atoms of two different materials.
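As a minimal numerical sketch of the Lennard-Jones form of Eqn. (4.1) with a truncation at the cutoff radius, the following uses reduced (dimensionless) parameters chosen purely for illustration, not values fitted to any material:

```python
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Lennard-Jones 6-12 pair potential, truncated to zero at r_cut.

    epsilon sets the well depth and sigma the length scale; the defaults
    are reduced units for illustration only.
    """
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6          # (sigma/r)^6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The minimum lies at r = 2^(1/6) * sigma, where V = -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
```

Evaluating the function at r = σ gives zero, and beyond the cutoff the interaction is dropped entirely, which is the source of the small total-energy fluctuations mentioned above.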
The Morse potential is given as:

V(r) = D(1 − e^(−α(r − r_e)))^2, or V(r) = D(e^(−2α(r − r_e)) − 2e^(−α(r − r_e))), r ≤ r_c, (4.2)

where r is the bond distance between the two atoms and r_c is the cutoff radius beyond which the potential is truncated to zero. The potential values computed using the two forms in Eqn. (4.2) differ by the constant D; as a result, the forces computed from the two forms are identical, and either form may be used in a simulation. The potential parameters D, α, and r_e are adjusted to the measured sublimation enthalpy, the Debye temperature, and the nearest-neighbor spacing of the material. Girifalco and Weizer (1959) calculated Morse potential parameters using experimental values of the energy of vaporization, the lattice constant, and the compressibility. The Morse potential can model FCC metals well, but not BCC metals or covalent materials. A general feature of physically reasonable pair potentials is that they favor the formation of close-packed structures, so they are unsuitable for describing covalent systems, which assume more open structures.

4.2.2 Many-body potentials

Pair potentials such as the Lennard-Jones potential are suitable for modeling the interactions of inert gases such as helium, argon, and xenon. A covalent material presents a difficult challenge because of complex quantum mechanical effects, such as chemical bond formation, hybridization, metallization, charge transfer, and bond bending, which must be described by an effective interaction between atoms. For instance, under pressure silicon undergoes a series of structural phase transitions from the tetrahedral structure to a denser phase called β-tin, then to simple cubic, and then to FCC. This indicates that the energy differences between these structures are not large, which suggests that the cohesive energy is nearly independent of coordination. A pairwise potential model would always favor the more closely packed structure.
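The claim that the two forms in Eqn. (4.2) differ only by the constant D, and hence give identical forces, can be checked numerically; the sketch below uses illustrative parameters (D = 1, α = 1, r_e = 1), not fitted values:

```python
import math

def morse_well(r, D=1.0, alpha=1.0, r_e=1.0):
    """First form of Eqn. (4.2): V = D * (1 - exp(-alpha*(r - r_e)))^2."""
    x = math.exp(-alpha * (r - r_e))
    return D * (1.0 - x) ** 2

def morse_shifted(r, D=1.0, alpha=1.0, r_e=1.0):
    """Second form of Eqn. (4.2): V = D * (exp(-2*alpha*(r - r_e))
    - 2*exp(-alpha*(r - r_e))).  Equals morse_well(r) minus the constant D,
    so the derivative (the force) is the same for both forms."""
    x = math.exp(-alpha * (r - r_e))
    return D * (x * x - 2.0 * x)
```

Expanding the square in the first form gives D(x² − 2x + 1), so the difference between the two forms is exactly D at every r; the first form has its minimum at zero, the second at −D.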
Consequently, no reasonable pair potential can stabilize the diamond tetrahedral structure; angular terms must be considered. Various methods have been introduced to circumvent this problem. Several authors have introduced potentials with a strong angular dependence (Stillinger and Weber, 1985; Biswas and Hamann, 1985; Baskes, 1987; Baskes et al., 1989). Pettifor (1989) developed a similar approach from tight binding. Models based on the bond charge (Brenner and Garrison, 1985) and on a function of the local coordination (Tersoff, 1986) have also been developed. Most of these models have been successful in the regime for which they were parameterized but have shown a lack of transferability. A survey of six such potentials (Balamane and Halicioglu, 1992) concluded that each has strengths and limitations, that none appears clearly superior to the others, and that none is fully transferable. More recently, the environment-dependent interatomic potential (EDIP) (Justo et al., 1997, 1998; Bazant et al., 1997) has been developed as an improvement over previous models. Another group of potentials (Kane, 1985; Keating, 1966) is constructed to accurately describe small distortions from the ground state in complex systems, such as the diamond structure of semiconductors. The Keating model uses a Taylor series expansion of the energy about its minimum. It gives accurate descriptions of small displacements but becomes progressively less accurate for large displacements. Such potentials are useful for describing phonons and elastic deformations. In general, any many-body potential energy function describing the interactions among N identical particles can be resolved into one-body, two-body, three-body, etc. contributions as follows:

E = Σ_i V_1(i) + Σ_{i<j} V_2(i, j) + Σ_{i<j<k} V_3(i, j, k) + ... + V_N(1, ..., N). (4.3)

The single-particle potential V_1 describes external forces.
The first term that describes the interactions between atoms is the second term, which has the form of a pair potential. For this description to be useful in theoretical modeling, the component functions V_n must converge quickly to zero with increasing n. The first step toward a many-body form is to include three-body terms that favor the bond angles of the diamond structure. Stillinger and Weber (1985) proposed such a potential, whose three-body part has the form:

V_3 = Σ_{triplets i,j,k} g(r_ij) g(r_ik) (cos θ_ijk + 1/3)^2, (4.4)

where θ_ijk is the bond angle formed by the bonds ij and ik, and g(r) is a decaying function with a cutoff between the first and second neighbors. The bond angles in the diamond tetrahedral structure are ~109.5°, and the cosine of this angle is −1/3. The term (cos θ_ijk + 1/3)^2 therefore goes to zero for θ ≈ 109.5°, which makes the tetrahedral structure more stable than compact structures. This potential gives a fairly realistic description of crystalline silicon. However, its built-in tetrahedral bias creates problems in other situations: it cannot predict the correct energies of the non-tetrahedral structures found under pressure, the coordination of the liquid is too low, and the surface structures are not correct. Consequently, the potential is not transferable to situations other than those it was designed for. Work by Carlsson (1990) showed that to build a realistic potential, one should consider analytic forms that take the local environment into account, with bond strengths that depend on it. The work of Biswas and Hamann (1987) suggests that a three-body potential is not adequate to describe the cohesive energy of silicon accurately over a wide range of bonding geometries and coordinations. However, a general four-body or five-body form becomes intractable, as it would contain too many parameters.
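The angular preference built into the Stillinger-Weber three-body term of Eqn. (4.4) can be seen by evaluating the factor (cos θ + 1/3)² directly; a small sketch:

```python
import math

def sw_angular(theta_rad):
    """Angular factor (cos(theta) + 1/3)^2 from the Stillinger-Weber
    three-body term: zero at the tetrahedral angle (~109.47 deg),
    positive for any other bond angle."""
    return (math.cos(theta_rad) + 1.0 / 3.0) ** 2

# The tetrahedral angle is arccos(-1/3), about 109.47 degrees.
theta_tet = math.acos(-1.0 / 3.0)
```

At 60°, a typical angle in a close-packed arrangement, the factor is ≈ 0.69, so compact local geometries are penalized while tetrahedral ones cost nothing.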
Instead of a general N-body form, Tersoff (1988, 1989) proposed an approach that couples two-body and many-body correlations in a single model. The potential is based on the bond order, which takes the local environment into account. The bond strength depends on the geometry: the more neighbors an atom has, the weaker the bond to each neighbor. If the energy per bond decreased rapidly with increasing coordination, diatomic molecules would be the most stable arrangement of atoms. On the other hand, if the bond order depended only weakly on coordination, close-packed structures would be favored, to maximize the number of bonds. Silicon undergoes structural transitions with varying coordination under pressure with only small differences in cohesive energy (Yin and Cohen, 1980, 1982, 1984; Chang and Cohen, 1984). This suggests that the decrease in bond strength with increasing coordination number nearly cancels the increase in the number of bonds over a range of coordinations (Tersoff, 1988). The Tersoff potential (Tersoff, 1989) is given as:

E = Σ_i E_i = (1/2) Σ_{i≠j} V_ij, V_ij = f_C(r_ij) [f_R(r_ij) + b_ij f_A(r_ij)], (4.5a)

where E is the total potential energy of the system, f_R and f_A are repulsive and attractive pair potentials, respectively, and f_C is a cutoff function:

f_R(r_ij) = A_ij e^(−λ_ij r_ij), f_A(r_ij) = −B_ij e^(−μ_ij r_ij),

f_C(r_ij) = 1 for r_ij < R_ij; (1/2) + (1/2) cos[π (r_ij − R_ij)/(S_ij − R_ij)] for R_ij < r_ij < S_ij; 0 for r_ij > S_ij, (4.5b)

where r_ij is the bond distance between atoms i and j, S is the cutoff radius, and λ_ij, μ_ij, and R are potential parameters. The main feature of this form is the term b_ij. The idea is that the strength of each bond depends on the local environment and is lowered when the number of neighbors is relatively high. This dependence is expressed by the parameter b_ij, which can diminish the attractive force relative to the repulsive force.
b_ij = χ_ij (1 + β_i^n ζ_ij^n)^(−1/2n), ζ_ij = Σ_{k≠i,j} f_C(r_ik) g(θ_ijk), g(θ) = 1 + c²/d² − c²/[d² + (h − cos θ)²], (4.5c)

The term ζ_ij defines the effective coordination number of atom i, taking into account the relative distance between two neighbors (r_ij − r_ik) and the bond angle θ_ijk. The function g(θ) has a minimum at cos θ = h; the parameter d determines how sharp the angular dependence is, and c expresses the strength of the angular effect. The parameters R and S are not optimized but are chosen so as to include only the first neighbors for several selected high-symmetry structures, such as graphite, diamond, simple cubic, and face-centered cubic. The parameter χ_ij strengthens or weakens heteropolar bonds in multicomponent systems, with χ_ii = 1 and χ_ij = χ_ji. The potential was first calibrated for silicon (Tersoff, 1988a) and then for carbon (Tersoff, 1988b). The parameters were chosen to fit theoretical and experimental data, such as cohesive energies, lattice constants, and bulk moduli obtained from realistic and hypothetical configurations. Table 4.1 lists the parameters for carbon, silicon, and germanium.

Table 4.1 Parameters of the Tersoff potential (Tersoff, 1989)

Parameter | C | Si | Ge
A (eV) | 1.3936 x 10^3 | 1.8308 x 10^3 | 1.769 x 10^3
B (eV) | 3.467 x 10^2 | 4.7118 x 10^2 | 4.1923 x 10^2
λ (Å⁻¹) | 3.4879 | 2.4799 | 2.4451
μ (Å⁻¹) | 2.2119 | 1.7322 | 1.7047
β | 1.5724 x 10^−7 | 1.1 x 10^−6 | 9.0166 x 10^−7
n | 7.2751 x 10^−1 | 7.8734 x 10^−1 | 7.5627 x 10^−1
c | 3.8049 x 10^4 | 1.0039 x 10^5 | 1.0643 x 10^5
d | 4.384 | 1.6217 x 10^1 | 1.5652 x 10^1
h | −5.7058 x 10^−1 | −5.9825 x 10^−1 | −4.3884 x 10^−1
R (Å) | 1.8 | 2.7 | 2.8
S (Å) | 2.1 | 3.0 | 3.1

Brenner (1990) developed an empirical many-body potential for hydrocarbons that can model intramolecular chemical bonding in a variety of small hydrocarbon molecules as well as the graphite and diamond lattices. The potential function was based on Tersoff's covalent bonding formulation, with additional terms to correct an inherent overbinding of radicals.
Nonlocal effects were incorporated via an analytic function that defines conjugation based on the coordination of the carbon atoms. The potential was fitted to the binding energy of the C2 diatomic molecule and to the binding energies and lattice constants of the graphite, diamond, simple cubic, and face-centered cubic structures. Dyson and Smith (1996) extended the Brenner potential to the C-Si-H system to model chemical vapor deposition (CVD) diamond growth on a silicon substrate. Parameters for the Si-Si and Si-C interactions were fitted to the diatomic bond energies and to bulk properties such as the bulk modulus, the experimental cohesive energy, the experimental lattice constant (instead of the bond length), and phonon mode frequencies.

CHAPTER 5

Ab Initio Calculations

'Ab initio' is a term used to describe a solution of the nonrelativistic, time-independent Schrödinger equation:

H Ψ = E Ψ,

where H is the Hamiltonian operator for the system, which is a function of the kinetic and potential energies of the particles of the system, Ψ is the wavefunction, and E is the energy. The Hamiltonian for an N-electron atom is

H = −(ħ²/2m_n) ∇_n² − (ħ²/2m_e) Σ_{i=1}^{N} ∇_i² − Σ_{i=1}^{N} Z e²/r_in + Σ_{i=1}^{N} Σ_{j>i}^{N} e²/r_ij. (5.1)

The first term on the right-hand side of Eqn. (5.1) represents the kinetic energy of the nucleus. It contains the coordinates of the nucleus and derivatives with respect to those coordinates, but it does not contain the coordinates of any of the N electrons or derivatives with respect to them; it is therefore called a zero-electron term. The second and third terms in Eqn. (5.1) each contain the coordinates of one electron and hence are called one-electron terms, whereas the last term, the electron-electron repulsion, depends on the coordinates of two electrons and hence is called a two-electron term.

5.1 Born-Oppenheimer approximation

It is computationally impossible to solve Eqn. (5.1) exactly for anything other than the hydrogen atom.
For this reason, approximations are introduced to simplify the calculations for many-body systems. One such approximation is the Born-Oppenheimer approximation. Note that the nuclear mass appearing in the denominator of the nuclear kinetic energy term is much larger than the mass of the electron. This large mass means that the nuclei move very slowly relative to the electrons, and the nuclei can be considered to move in the potential generated by the electrons. This approximation allows the electronic Hamiltonian to be solved for a fixed set of nuclear coordinates.

5.2 Basis sets

The next approximation involves expressing the molecular orbitals as linear combinations of a predefined set of one-electron functions known as basis functions. A basis set is the mathematical description of the orbitals within a system, which in turn combine to approximate the total electronic wavefunction. These basis functions are usually centered on the atomic nuclei and so bear some resemblance to atomic orbitals. An individual molecular orbital is defined as:

φ_i = Σ_{μ=1}^{N} c_μi χ_μ. (5.2)

The coefficients c_μi are known as the molecular orbital expansion coefficients. The basis functions χ_1 ... χ_N are also chosen to be normalized.

5.2.1 Slater-type orbitals (STO)

STOs are constructed from a radial part, describing the radial extent of the orbital, and an angular part, describing its shape:

χ = C r^(n−1) e^(−ζr) Y_lm. (5.3)

The radial part, r^(n−1) e^(−ζr), depends on the distance r from the origin of the basis function (usually the location of the nucleus), the orbital exponent ζ, and the principal quantum number n. The angular part, Y_lm, known as a spherical harmonic, depends on the angular quantum number l and the magnetic quantum number m. The normalization constant C is chosen such that the integral over the square of the basis function yields unity.
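The normalization of an STO can be verified numerically. The sketch below uses the 1s Slater function (n = 1), for which the normalization constant is C = (ζ³/π)^(1/2), together with a simple trapezoidal quadrature of the radial probability density:

```python
import math

def sto_1s(r, zeta=1.0):
    """Normalized 1s Slater-type orbital: (zeta^3/pi)^(1/2) * exp(-zeta*r).
    The angular part Y_00 is constant and is absorbed into the 4*pi factor
    of the radial integration below."""
    return math.sqrt(zeta ** 3 / math.pi) * math.exp(-zeta * r)

def radial_norm(f, r_max=20.0, steps=20000):
    """Integrate 4*pi*r^2*f(r)^2 dr from 0 to r_max by the trapezoidal
    rule; the result should be ~1 for a normalized orbital."""
    h = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * 4.0 * math.pi * r * r * f(r) ** 2
    return total * h
```

For ζ = 1 the integral 4ζ³ ∫ r² e^(−2ζr) dr evaluates to exactly 1, and the quadrature reproduces this to within the discretization error.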
5.2.2 Gaussian-type orbitals (GTO)

Gaussian-type orbitals are also constructed from a radial and an angular part, but the radial part has a different dependence on r: the GTO depends on r² in the exponent, so that the product of two Gaussian primitives is another Gaussian. This makes the integrals much easier to evaluate, at some cost in accuracy. To compensate, several Gaussians are combined to obtain more accurate results. Gaussian and other ab initio electronic structure programs use Gaussian-type atomic functions as basis functions. Gaussian functions have the general form:

g = c x^a y^b z^c e^(−αr²), (5.4)

where α is a constant determining the size (radial extent) of the function. In a Gaussian function, e^(−αr²) is multiplied by powers of x, y, and z, and by a normalization constant c, so that:

∫_all space g² dV = 1. (5.5)

The sum of the exponents, L = a + b + c, is used to define the angular momentum of the basis function: s-type (L = 0), p-type (L = 1), d-type (L = 2), f-type (L = 3), g-type (L = 4), and so on. The following are representative Gaussian functions of s, p_y, and d_xy type, respectively:

g_s(α, r) = (2α/π)^(3/4) e^(−αr²),
g_y(α, r) = (128α⁵/π³)^(1/4) y e^(−αr²),
g_xy(α, r) = (2048α⁷/π³)^(1/4) xy e^(−αr²). (5.6)

Linear combinations of primitive Gaussians are used to form the actual basis functions, called contracted Gaussians, which have the form:

χ_μ = Σ_p d_μp g_p, (5.7)

where the g_p are primitive Gaussians and the d_μp are fixed constants within a given basis set. This results in the following expansion for the molecular orbitals:

φ_i = Σ_μ c_μi χ_μ = Σ_μ c_μi Σ_p d_μp g_p. (5.8)

5.2.3 Types of basis sets

Larger basis sets approximate the orbitals more accurately by imposing fewer restrictions on the locations of the electrons in space. In the true quantum mechanical picture, electrons have a finite probability of existing anywhere in space; this limit corresponds to an infinite basis set expansion. Standard basis sets for electronic structure calculations use linear combinations of Gaussian functions to form the orbitals.
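The key property motivating GTOs, that the product of two Gaussian primitives is itself a Gaussian, is the Gaussian product theorem. It can be illustrated in one dimension; the exponents and centers below are arbitrary illustrative values:

```python
import math

def gaussian_1d(x, alpha, center):
    """Unnormalized 1-D Gaussian primitive exp(-alpha*(x - center)^2)."""
    return math.exp(-alpha * (x - center) ** 2)

def product_gaussian_params(a1, c1, a2, c2):
    """Gaussian product theorem: the product of two Gaussians is another
    Gaussian with exponent p = a1 + a2, centered at the exponent-weighted
    mean of the two centers, times a constant prefactor K."""
    p = a1 + a2
    c = (a1 * c1 + a2 * c2) / p
    K = math.exp(-a1 * a2 / p * (c1 - c2) ** 2)
    return p, c, K
```

Because the product collapses to a single Gaussian, the two-electron integrals over four basis-function centers reduce to integrals over fewer centers, which is why Gaussians, rather than the more physical Slater functions, dominate in practice.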
Basis sets may be classified by the number and type of basis functions they contain. Basis sets assign a group of basis functions to each atom within a molecule to approximate its orbitals. These basis functions are composed of linear combinations of Gaussian functions referred to as contracted functions; the component Gaussian functions are referred to as primitives. A basis function consisting of a single Gaussian function is termed uncontracted.

Minimal basis set

Single Gaussian functions of the form of Eqn. (5.4) are not well suited to describing the spatial extent and nodal characteristics of atomic orbitals. Hence, basis functions are described as sums (contractions) of several Gaussians, as in Eqn. (5.7). A minimal basis set contains the minimum number of basis functions needed for each atom and uses fixed-size atomic-type orbitals. The STO-3G basis set (Hehre et al., 1969; Collins et al., 1976) is a minimal basis set (although not the smallest possible one). It uses three Gaussian primitives per basis function, which accounts for the "3G" in its name; "STO" stands for Slater-type orbitals, which the STO-3G basis set approximates with Gaussian functions.

Double zeta, triple zeta, and quadruple zeta basis sets

A minimal basis set approximates all orbitals to be of the same shape. Double zeta basis sets, such as the Dunning-Huzinaga basis set denoted D95 (Dunning and Huzinaga, 1976), form all molecular orbitals from linear combinations of two sizes of functions for each atomic orbital. Dunning's basis set was derived from an already existing large atomic basis set of nine uncontracted Gaussian primitives of s-type and five uncontracted Gaussian primitives of p-type. Six of the nine s-type functions were grouped into a single contraction, while the other three s-type functions were left alone.
Similarly, four of the five p-type functions were contracted into a single function, while one function was left uncontracted. This yields a basis set of four s-type and two p-type basis functions. In contrast to the split valence basis sets discussed below, the D95 basis set is a full double zeta basis set, in that it allocates two basis functions to each atomic orbital of the core as well as of the valence region occupied in the electronic ground state. The double zeta basis set is important because it allows each orbital to be treated separately during Hartree-Fock calculations, giving a more accurate representation of each orbital. Each atomic orbital is expressed as a sum of two Slater-type orbitals. The two functions are the same except for the value of zeta (ζ), which accounts for how diffuse (large) the orbital is. The two STOs are added in some proportion, as in Eqn. (5.9):

φ_2s(r) = φ_2s^STO(ζ_1, r) + d φ_2s^STO(ζ_2, r), (5.9)

where the two terms on the right are Slater orbitals 1 and 2, and the constant d determines how much each STO contributes to the final orbital. The size of the atomic orbital can thus range anywhere between the sizes of the two STOs. Triple and quadruple zeta basis sets use three and four Slater functions per orbital, respectively.

Split valence basis sets

The description of the valence electrons can be improved over that of the minimal STO-3G basis set if more than one basis function is used per valence orbital. Basis sets of this type are called split valence basis sets. They have two sizes of basis function for each valence orbital. Often it takes too much effort to calculate a double zeta function for every orbital. Since the inner-shell electrons are not as vital to the calculation, they are described with a single Slater orbital, and only the valence orbitals are treated with a double zeta description.
Examples of split valence basis sets are 3-21G (Binkley et al., 1980; Gordon et al., 1982; Pietro et al., 1982; Dobbs and Hehre, 1986; 1987) and 6-31G (Ditchfield et al., 1971; Hehre et al., 1972; Hariharan and Pople, 1973; 1974; Gordon, 1980; Binning and Curtiss, 1990). The notation 3-21G means that three Gaussians are summed for each inner-shell orbital, while each valence orbital is described by two basis functions: two Gaussians for the first STO of the valence orbital and one Gaussian for the second STO.

Fig 5.1 Description of the split valence basis set notation (in 3-21G, the first digit is the number of Gaussian functions summed to describe the inner-shell orbital, the second is the number of Gaussian functions that comprise the first STO of the double zeta, and the third is the number of Gaussian functions summed in the second STO)

The main difference between 3-21G and 6-31G is that a much larger number of primitives is used in the latter for the core as well as the innermost valence shell. The use of a contraction of six Gaussian primitives for each core orbital improves the description of the core region significantly. The valence region is described by two basis functions per atomic orbital: the inner shell is composed of a contraction of three Gaussians, and the outer shell consists of a single Gaussian primitive. The 6-31G* basis set, also known as 6-31G(d), and the 6-31G** basis set, also known as 6-31G(d,p), are defined as part of the complete basis set methods (Petersson and Al-Laham, 1991; Petersson et al., 1988). Similarly, triple split valence basis sets such as 6-311G, also known as MC-311G (McLean and Chandler, 1980; Krishnan et al., 1980; Wachters, 1970; Hay, 1977; Raghavachari and Trucks, 1989; Binning and Curtiss, 1990; Curtiss et al., 1995; McGrath and Radom, 1991), use three sizes of contracted functions for each orbital type.

Polarized basis sets

A better approximation accounts for the fact that orbitals sometimes share qualities of s and p orbitals, or of p and d orbitals, etc., and do not necessarily have the characteristics of only one type.
As atoms are brought close together, their charge distributions cause a polarization effect (the positive charge is drawn to one side while the negative charge is drawn to the other) that distorts the shapes of the atomic orbitals. Split valence basis sets allow orbitals to change size, but not shape. Polarized basis sets remove this limitation by adding orbitals with angular momentum beyond what is required for the ground state of each atom. The first step is the addition of a set of d-type functions to the basis sets of those atoms that have occupied s- and p-shells in their electronic ground states; for hydrogen, this corresponds to the addition of a set of p-type functions. Two different notations exist to specify the addition of polarization functions. The first adds one asterisk to the basis set name to indicate polarization functions on non-hydrogen atoms, while two asterisks symbolize the addition of polarization functions to all atoms (including hydrogen). The 6-31G** basis set (Frisch et al., 1984) is thus constructed from the split valence 6-31G basis set by adding one set of d-functions to all non-hydrogen atoms and one set of p-functions to all hydrogen atoms. In the second notation, the polarization functions are specified explicitly through their angular quantum numbers; the 6-31G** basis set is then termed 6-31G(d,p). This latter notation is more flexible, as multiple sets of polarization functions can be specified much more easily. The notation 6-31G(d) means that the polarization functions for the 6-31G basis appear in the basis set listing as a single set of uncontracted d-type Gaussians. There are six Cartesian d-type Gaussians, (x², y², z², xy, yz, zx) e^(−αr²), which are equivalent to five pure d-type functions, (xy, yz, zx, x²−y², 3z²−r²) e^(−αr²), plus one additional s-type function.
Diffuse functions

In chemistry one is mainly concerned with the valence electrons, which interact with other molecules. However, many basis sets concentrate on the main energy located in the inner-shell electrons. This is the main area under the wavefunction curve; in Fig 5.2, it is the area to the left of the dotted line. Normally the tail (the area to the right of the dotted line) is not really a factor in calculations.

Fig 5.2 Variation of energy with respect to radius

However, when an atom is in an anion or in an excited state, the loosely bound electrons responsible for the energy in the tail of the wavefunction become much more important. The theoretical description of negatively charged species is particularly challenging for ab initio molecular orbital theory, because the excess negative charge spreads outward to a much larger degree than is typical for uncharged or positively charged molecules. To compensate, diffuse functions are used. Diffuse functions are large-size versions of s- and p-type functions (as opposed to standard valence-size functions); they use small orbital exponents. They allow orbitals to occupy a larger region of space and are important for systems where electrons are relatively far from the nucleus: molecules with lone pairs, anions and other systems with significant negative charge, systems in excited states, and systems with low ionization potentials. Diffuse basis functions are typically added as an additional set of uncontracted Gaussian functions of the same angular momentum as the valence electrons. To reflect the addition of diffuse basis functions on all non-hydrogen atoms, a + sign is added to the standard basis set notation; if diffuse s-type functions are also added to the basis sets of hydrogen atoms, a second + sign is appended. The 6-31+G(d) basis set (Clark et al., 1983) is the 6-31G(d) basis set with diffuse functions added to the heavy atoms.
The 6-31++G(d) basis set adds diffuse functions to hydrogen as well. Diffuse functions on hydrogen seldom make a significant difference in accuracy.

High angular momentum basis sets

Such basis sets add multiple polarization functions per atom to a triple zeta basis set. For example, the 6-31G(2d) basis set adds two d functions per heavy atom instead of just one. Such basis sets are useful for describing the interactions between electrons in electron correlation methods; they are not generally needed for Hartree-Fock calculations, discussed in the next section.

Basis sets for post-third-row atoms

Basis sets for atoms beyond the third row of the periodic table are handled somewhat differently. For very large nuclei, the electrons near the nucleus are treated approximately using effective core potentials (ECP). This treatment includes relativistic effects. The LANL2DZ basis set is an example of such a basis set.

5.3 Hartree-Fock method

Molecular orbital theory decomposes the wavefunction into a combination of molecular orbitals φ_1, φ_2, ..., chosen to form a normalized, orthogonal set:

∫∫∫ φ_i* φ_i dx dy dz = 1, (5.10)

∫∫∫ φ_i* φ_j dx dy dz = 0, i ≠ j. (5.11)

The simplest way of expressing the wavefunction of a many-electron system is as the product of the spin orbitals of the individual electrons. For the case of two electrons,

Ψ(x_1, x_2) = φ_1(x_1) φ_2(x_2). (5.12)

This is known as a Hartree product. However, such a function is not antisymmetric, since swapping the coordinates of the two electrons does not change the sign, i.e.

Ψ(x_1, x_2) ≠ −Ψ(x_2, x_1). (5.13)

The Hartree product therefore does not satisfy the Pauli principle. This problem can be overcome by taking a linear combination of the two Hartree products:

Ψ(x_1, x_2) = (1/√2) {φ_1(x_1) φ_2(x_2) − φ_1(x_2) φ_2(x_1)}, (5.14)

where the coefficient 1/√2 is the normalization factor. This wavefunction is antisymmetric.
Moreover, it also goes to zero if any two spin orbitals or any two electron coordinates are the same. The expression can be generalized by writing it as a determinant, called a Slater determinant:

Ψ(x_1, x_2, ..., x_N) = (1/√(N!)) det | φ_1(x_1) φ_2(x_1) ... φ_N(x_1) ; φ_1(x_2) φ_2(x_2) ... φ_N(x_2) ; ... ; φ_1(x_N) φ_2(x_N) ... φ_N(x_N) |. (5.15)

A single Slater determinant is used as the approximation to the electronic wavefunction in Hartree-Fock theory. In more accurate theories, such as configuration interaction and multiconfiguration self-consistent field (MCSCF) calculations (Hegarty and Robb, 1979; Eade and Robb, 1981; Schlegel and Robb, 1982; Bernardi et al., 1984; Frisch et al., 1992), a linear combination of Slater determinants is used. Each row is formed by all possible assignments of electron i to the orbital-spin combinations. Swapping two electrons corresponds to interchanging two rows of the determinant, which changes its sign. The original Hartree method expresses the total wavefunction of the system as a product of one-electron orbitals. In the Hartree-Fock method (Hehre et al., 1986), the wavefunction is an antisymmetrized determinantal product of one-electron orbitals (the Slater determinant), and Schrödinger's equation is transformed into a set of Hartree-Fock equations. The problem then becomes one of solving for the set of molecular orbital expansion coefficients c_μi in Eqn. (5.8). Hartree-Fock theory takes advantage of the variational principle, which states that for any normalized antisymmetric function Φ of the electronic coordinates, the expectation value of the energy corresponding to Φ will always be greater than the energy of the exact ground state wavefunction Ψ:

E(Φ) > E(Ψ), Φ ≠ Ψ. (5.16)

In other words, the energy of the exact wavefunction serves as a lower bound to the energies calculated from any other normalized antisymmetric function.
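The antisymmetry of the Slater determinant of Eqn. (5.15) can be demonstrated numerically with toy one-electron orbitals (the orbital choices below are arbitrary illustrations, not solutions of any Hartree-Fock problem):

```python
import itertools
import math

def slater_det(orbitals, coords):
    """Value of the Slater determinant det[phi_j(x_i)] / sqrt(N!) for a
    list of one-electron orbitals evaluated at electron coordinates,
    expanded over permutations (fine for the 2-3 electron toy case)."""
    n = len(orbitals)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # Permutation parity by counting inversions.
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    sign = -sign
        term = float(sign)
        for i in range(n):
            term *= orbitals[perm[i]](coords[i])
        total += term
    return total / math.sqrt(math.factorial(n))

# Toy 1-D "orbitals" chosen only to be distinct functions.
orbs = [lambda x: math.exp(-x), lambda x: x * math.exp(-x)]
```

Exchanging the two electron coordinates flips the sign of the determinant, and placing two electrons at the same coordinate (two identical rows) makes it vanish, which is the Pauli principle in this formulation.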
Thus the problem becomes one of finding the set of coefficients that minimizes the energy of the resultant wavefunction. The variables in the Hartree-Fock equations depend on themselves, so the equations must be solved iteratively; convergence may be improved by changing the form of the initial guess. Since the equations are solved self-consistently, Hartree-Fock is an example of a self-consistent field (SCF) method. A Hartree-Fock calculation involves the following steps:

1. Begin with a set of approximate orbitals for all the electrons in the system.
2. Select one electron, and calculate the potential in which it moves by freezing the distribution of all the other electrons and treating their averaged distribution as a centrosymmetric source of potential.
3. Solve the Schrödinger equation for this potential, which gives a new orbital for the selected electron.
4. Repeat the procedure for all the other electrons in the system, using the electrons in the frozen orbitals as the source of the potential.
5. At the end of one cycle, there is a new set of orbitals replacing the original set.
6. Repeat the process until there is little or no change in the orbitals.

The unrestricted Hartree-Fock method is capable of treating the unpaired electrons of open-shell systems. In this case the α and β electrons occupy different orbitals, resulting in two sets of molecular orbital expansion coefficients:

φ_i^α = Σ_μ c_μi^α χ_μ, φ_i^β = Σ_μ c_μi^β χ_μ.

The two sets of coefficients result in two sets of Fock matrices, producing two sets of orbitals. These separate orbitals give proper dissociation to separate atoms and correct delocalized orbitals for resonant systems. However, the eigenfunctions are not pure spin states; they contain some spin contamination from higher states. For example, doublets are contaminated to some degree by functions corresponding to quartets and higher states. One might get the impression that molecular orbitals are real; except for the hydrogen atom, that is not the case.
Molecular orbitals are the product of Hartree-Fock theory, which is an approximation to the Schrödinger equation. The approximation is that each electron feels only the average Coulomb repulsion of all the other electrons. This makes Hartree-Fock theory much simpler than the real problem, which is an N-body problem. Unfortunately, in many cases this approximation is rather serious and can give bad answers. It can be corrected by explicitly accounting for the electron correlation through many-body perturbation theory (MBPT), configuration interaction (CI), or density functional theory (DFT).

5.4 Electron correlation methods

Hartree-Fock theory provides an inadequate treatment of the correlation between the motions of electrons within a molecular system, especially that arising between electrons of opposite spin. When Hartree-Fock theory fulfills the requirement that $|\Psi|^2$ be invariant with respect to the exchange of any two electrons by antisymmetrizing the wave function, it automatically includes the major correlation effects arising from pairs of electrons with the same spin. This correlation is termed exchange correlation. The motion of electrons of opposite spin remains uncorrelated under Hartree-Fock theory. Any method which goes beyond SCF in attempting to treat this phenomenon is known as an electron correlation method or post-SCF method.

5.4.1 Requirements of electron correlation theories

Any electron correlation theory should provide a unique total energy for each electronic state at a given geometry and should also provide continuous potential energy surfaces as the geometry changes. The resulting energy should be variational, i.e., it should be an upper bound to the exact energy. The most important criterion for an accurate electron correlation theory is the property of size consistency. This term refers to the linear scaling of energy with the number of electrons in the system.
A size-consistent method leads to additive energies for infinitely separated systems; size consistency is necessary to reach quantitative accuracy. Another important criterion is correctness for two-electron systems. For any molecular system composed of reasonably well-defined electron-pair bonds, such methods provide excellent starting points for the inclusion of additional corrections such as those from three-electron correlations. Such three-electron correlations, although computationally expensive, are crucial to reach quantitative accuracy (Raghavachari and Anderson, 1996). In general, correlation techniques are computationally more expensive than HF methods, and many of them involve an iterative solution of a set of coupled equations.

5.4.2 Configuration interaction (CI)

Among the many schemes introduced to overcome the deficiencies of the Hartree-Fock method, the simplest and most general technique to address the correlation problem is configuration interaction (Shavitt, 1977; Schaefer, 1977; Boys, 1950). Configuration interaction is a straightforward application of the linear variational technique to the calculation of electronic wavefunctions. A linear combination of configurations (Slater determinants) is used to provide a better variational solution to the exact many-electron wavefunction. The most important type of correlation effect contributing to chemical bonding is usually termed left-right correlation. For H2, this refers to the tendency that when one electron is near the first hydrogen, the other electron tends to be near the second hydrogen. This is absent in the HF method, where the spatial positions of the two electrons occupying the lowest bonding molecular orbital are uncorrelated. The problem gets worse as the two atoms move apart and dissociate. Qualitatively, this can be corrected by including a second configuration in which both electrons occupy the antibonding orbital.
While this second configuration is unfavorable energetically, a mixture of the HF configuration with it provides a better description of the system. This is referred to as "configuration interaction". The second configuration has only a small weight at the equilibrium distance in H2, but its weight increases with the bond distance until the two configurations have equal weights at dissociation. Such left-right correlation is included in valence-bond-type wave functions.

The exact wave function cannot be expressed as a single determinant. The CI method constructs other determinants by replacing one or more occupied orbitals within the Hartree-Fock determinant with virtual orbitals. In a single substitution, a virtual orbital a replaces an occupied orbital i within the determinant; this is equivalent to exciting an electron to a higher-energy orbital. Similarly, a double substitution replaces two occupied orbitals with virtual orbitals, a triple substitution exchanges three orbitals, and so on. The CI wave function mixes the Hartree-Fock wave function with singly, doubly, triply, quadruply, ... excited configurations, and the coefficients that determine the amount of mixing are determined variationally. The full CI method forms the wave function as a linear combination of the Hartree-Fock determinant and all possible substituted determinants:

$$\Psi = b_0 \Psi_0 + \sum_{s>0} b_s \Psi_s, \qquad (5.17)$$

where $\Psi_0$ is the Hartree-Fock determinant and s runs over all possible substitutions. The $b$'s are the set of coefficients to be solved for by minimizing the energy of the resultant wave function. If all possible excited configurations are included, the method gives the exact solution within the space spanned by the given basis set. Full CI is the most complete nonrelativistic treatment of the molecular system possible, within the limitation imposed by the chosen basis set.

5.4.3 Limited configuration interaction

The full CI method is size consistent and variational.
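Determining the b coefficients variationally amounts to diagonalizing the Hamiltonian matrix in the basis of configurations. A minimal two-configuration sketch in the spirit of the H2 discussion above, with invented matrix elements (one Hartree-Fock configuration coupled to one doubly excited configuration):

```python
import numpy as np

# Two-configuration CI matrix: diagonal entries are the configuration
# energies, the off-diagonal entry couples the two configurations.
E_hf, E_double, coupling = -1.10, -0.20, 0.15    # invented values
H_ci = np.array([[E_hf,     coupling],
                 [coupling, E_double]])

energies, coeffs = np.linalg.eigh(H_ci)
E_ci = energies[0]        # lowest eigenvalue: the CI ground-state energy
b = coeffs[:, 0]          # variationally optimal coefficients b_0, b_1

print(f"E(HF config) = {E_hf:.4f},  E(CI) = {E_ci:.4f}")
print(f"configuration weights: HF {b[0]**2:.3f}, double {b[1]**2:.3f}")
```

Whenever the coupling is nonzero, the lowest eigenvalue lies below the energy of the HF configuration alone; this is the variational lowering that CI provides, and near equilibrium the HF configuration carries almost all of the weight.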
However, it is computationally very expensive and impractical for large systems, since the number of configurations in the full CI expansion grows exponentially with the size of the system. Practical CI methods augment the Hartree-Fock determinant by adding only a limited set of substitutions. The CIS method adds single excitations to the Hartree-Fock determinant, while CISD adds singles and doubles. The CISD method (Pople et al., 1977; Krishnan et al., 1980; Raghavachari and Pople, 1981) is an iterative technique in which the computational cost of each iteration scales as the sixth power of the size of the system. A disadvantage of the limited CI methods is that they are not size consistent, i.e., the energy does not scale linearly with the size of the system, and the CISD energy is not additive for infinitely separated systems. For example, the CISD energy of two infinitely separated He atoms differs from twice the energy of a single He atom. The quadratic configuration interaction (QCI) technique (Pople et al., 1987) introduces size consistency into CISD theory. The CISD method consists of a set of linear equations in the configuration expansion coefficients (for single and double excitations) which are solved iteratively. In the QCISD method (Gauss and Cremer, 1988; Trucks and Frisch, 1998; Salter et al., 1989), these equations are modified by the introduction of additional terms, quadratic in the expansion coefficients, which make the method size consistent.

5.4.4 Møller-Plesset perturbation theory (MP)

MP theory (Møller and Plesset, 1934) treats the electron correlation as a perturbation on the Hartree-Fock problem. It adds higher excitations to Hartree-Fock theory as a noniterative correction. The wave function and energy are expanded in a power series of the perturbation. The Hamiltonian is expressed in two parts:

$$H = H_0 + H', \qquad (5.18)$$

where $H_0$ is the zeroth-order Hamiltonian and $H'$ is the perturbation.
$H_0$ is chosen such that the Schrödinger equation $H_0 \Psi_0 = E_0 \Psi_0$ can be solved exactly for the zeroth-order eigenfunctions $\Psi_0$ and the corresponding zeroth-order energy eigenvalues $E_0$. It is assumed that the perturbation is sufficiently small that its presence does not appreciably alter the eigenfunctions of the system. The Hartree-Fock energy is correct through first order, so the perturbation corrections to the energy start contributing at second order. In an MP calculation, an HF calculation is performed first, and then the second-, third-, fourth-, and fifth-order perturbation corrections are computed. The final ground-state energy is given by

$$E_{\text{ground state}} = E_{HF} + E^{(2)} + E^{(3)} + E^{(4)} + E^{(5)} + \cdots, \qquad (5.19)$$

where $E_{HF}$ is the Hartree-Fock energy and the $E^{(i)}$ are the successive Møller-Plesset perturbation corrections. In perturbation theory, the different correlation contributions emerge through their interaction with the starting HF wave function $\Psi_0$. Since the Hamiltonian contains only one- and two-electron terms, only single and double excitations can contribute via direct coupling to $\Psi_0$ in the lowest orders. However, the self-consistent optimization of the HF wave function prevents direct mixing between single excitations and $\Psi_0$. Thus, the second- and third-order energies have contributions only from double excitations. In higher orders there is indirect coupling via double excitations, and thus the fourth- and fifth-order energies have contributions from single, double, triple, and quadruple excitations.

A common notation, MPn (Raghavachari and Pople, 1978; Raghavachari et al., 1980; Frisch et al., 1980), is used: MP2 (Head-Gordon et al., 1988; Frisch et al., 1990; Head-Gordon, 1994; Trucks et al., 1998; Saebo and Almlof, 1989), MP3 (Pople et al., 1977; 1976), and MP4 (Raghavachari and Pople, 1978) denote the total energies correct through second, third, and fourth order, respectively. Perturbation theory truncated at any order is size consistent.
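The structure of Eq. (5.19) can be illustrated with a two-level toy model, for which the second-order correction has the familiar closed form $E^{(2)} = |H'_{01}|^2/(E_0 - E_1)$ and the exact answer is available by direct diagonalization. All numbers below are invented for illustration:

```python
import numpy as np

H0 = np.diag([-1.0, 0.5])               # zeroth-order Hamiltonian (exactly solvable)
V = np.array([[0.02, 0.10],
              [0.10, -0.03]])           # small perturbation H'

E0, E1 = H0[0, 0], H0[1, 1]
E_first = E0 + V[0, 0]                          # energy through first order
E_second = E_first + V[0, 1] ** 2 / (E0 - E1)   # add the second-order correction

E_exact = np.linalg.eigvalsh(H0 + V)[0]         # exact ground-state energy

print(f"through 1st order: {E_first:.6f}")
print(f"through 2nd order: {E_second:.6f}")
print(f"exact:             {E_exact:.6f}")
```

Each added order pulls the perturbative estimate closer to the exact eigenvalue, provided the perturbation is small compared with the zeroth-order level spacing.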
The MP2, MP3, MP4, and MP5 methods scale as the fifth, sixth, seventh, and eighth power of the size of the system, respectively. If triples are excluded from the MP4 method (Raghavachari and Pople, 1978; Raghavachari et al., 1980; Frisch et al., 1980; Bartlett and Shavitt, 1977; Bartlett and Purvis, 1978; Bartlett et al., 1983; Wilson and Saunders, 1979), the resulting MP4(SDQ) technique (Raghavachari and Pople, 1978) can be evaluated with sixth-power computational dependence. Fifth-order theory (MP5) (Kucharski and Bartlett, 1986; Kucharski et al., 1989; Bartlett et al., 1990; Raghavachari et al., 1990) has been implemented, though it is feasible only for small systems. Analytical derivatives of the potential energy can be computed for the MP2 (Frisch et al., 1990; Pople et al., 1979; Handy and Schaefer, 1984), MP3, and MP4(SDQ) methods (Trucks et al., 1988), and analytical frequencies can be computed for the MP2 method (Head-Gordon, 1994; Trucks et al., 1998). MP2 is the most widely applicable method for treating electron correlation effects in large systems. It can be implemented without requiring the storage of two-electron integrals and many other intermediate quantities, which makes it possible to study large systems such as C60.

5.5 Density functional theory (DFT)

Shortly after the formulation of quantum mechanics in the mid-1920s, Thomas (1926) and Fermi (1928), motivated by the search for practical electronic structure calculations, introduced the idea of expressing the total energy of a system as a functional of the total electron density. Because of the crude treatment of the kinetic energy term, i.e., the absence of molecular orbitals, the accuracy of these early attempts was far from satisfactory. It was not until the 1960s that an exact theoretical framework, density functional theory (DFT), was formulated by Hohenberg and Kohn (1964) and Kohn and Sham (1965), providing the foundation for accurate calculations.
In contrast to the Hartree-Fock picture, which deals with a description of individual electrons interacting with the nuclei and all other electrons in the system, density functional theory decomposes the total energy into three contributions: a kinetic energy, a Coulomb energy due to classical electrostatic interactions among all charged particles in the system, and a term called the exchange-correlation energy that captures all many-body interactions. In density functional theory (Hohenberg and Kohn, 1964; Kohn and Sham, 1965), the correlation energy along with the exchange energy is treated as a functional of the three-dimensional electron density. DFT partitions the electronic energy as

$$E = E^T + E^V + E^J + E^{XC}, \qquad (5.20)$$

where $E^T$ is the kinetic energy term, $E^V$ includes terms describing the potential energy of the nuclear-electron attraction and of the repulsion between pairs of nuclei, $E^J$ is the electron-electron repulsion term, and $E^{XC}$ is the exchange-correlation term. All terms except the nuclear-nuclear repulsion in Eqn. (5.20) are functionals of the electron density. The term $E^{XC}$ accounts for the exchange energy arising from the antisymmetry of the quantum mechanical wavefunction and for dynamic correlation in the motions of the individual electrons. Hohenberg and Kohn (1964) demonstrated that $E^{XC}$ is determined entirely by the electron density, i.e., it is a functional of the electron density. The term $E^{XC}$ is usually approximated as an integral involving only the spin densities and possibly their gradients:

$$E^{XC}(\rho) = \int f\big(\rho_\alpha(\mathbf{r}), \rho_\beta(\mathbf{r}), \nabla\rho_\alpha(\mathbf{r}), \nabla\rho_\beta(\mathbf{r})\big)\, d^3\mathbf{r}, \qquad (5.21)$$

where $\rho_\alpha$ and $\rho_\beta$ refer to the $\alpha$ and $\beta$ spin densities, and the total electron density is $\rho = \rho_\alpha + \rho_\beta$. $E^{XC}$ is separated into exchange and correlation parts corresponding to same-spin and mixed-spin interactions:

$$E^{XC}(\rho) = E^X(\rho) + E^C(\rho), \qquad (5.22)$$

where the terms $E^X(\rho)$ and $E^C(\rho)$, again functionals of the electron density, are termed the exchange functional and the correlation functional.
Local functionals depend on the electron density $\rho$ alone, while gradient-corrected functionals depend on both the electron density $\rho$ and its gradient $\nabla\rho$. In the local density approximation (LDA), the exchange-correlation energy is taken from the known results for the many-electron interactions in an electron system of constant density (the homogeneous electron gas). The LDA assumes that at each point in a molecule or solid there exists a well-defined electron density, and that an electron at such a point experiences the same many-body response from the surrounding electrons as if the density of those surrounding electrons had the same value throughout all space as at the point of the reference electron. The exchange-correlation energy of the total molecule or solid is then the integral over the contributions from each volume element, the contributions differing from one volume element to another according to the local electron density. The local exchange functional is defined as

$$E_X^{LDA} = -\frac{3}{4}\left(\frac{3}{\pi}\right)^{1/3} \int \rho^{4/3}\, d^3\mathbf{r}. \qquad (5.23)$$

In DFT, the total electron density is decomposed into one-electron densities, which are constructed from one-electron wave functions. These one-electron wave functions are similar to those of Hartree-Fock theory; for molecular systems, DFT leads to a molecular orbital (MO) picture in analogy to the Hartree-Fock approach. DFT has been successfully extended to open-shell systems and magnetic solids (von Barth and Hedin, 1972; Gunnarsson et al., 1972). In these cases, the local exchange-correlation energy depends not only on the local electron density but also on the local spin density (the difference between the electron density of spin-up electrons and that of spin-down electrons). The resulting generalization of the LDA is called the local spin density approximation (LSDA). Within the local density approximation, binding energies are typically overestimated. Any real system is spatially inhomogeneous, i.e., it has a spatially varying density.
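Before turning to gradient corrections, Eq. (5.23) can be checked numerically for any given density. The sketch below evaluates the LDA exchange energy of a hypothetical spherically symmetric Gaussian density on a radial grid; the density is purely illustrative, not taken from any actual calculation:

```python
import numpy as np

C_x = -(3.0 / 4.0) * (3.0 / np.pi) ** (1.0 / 3.0)   # LDA exchange prefactor

def integrate(y, x):
    """Trapezoidal rule, kept explicit so the sketch has no extra dependencies."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def lda_exchange(rho, r):
    """Eq. (5.23) for a spherical density: d^3r reduces to 4*pi*r^2 dr."""
    return C_x * integrate(4.0 * np.pi * r**2 * rho ** (4.0 / 3.0), r)

# Hypothetical density: a normalized Gaussian holding N electrons.
N, alpha = 2.0, 1.0
r = np.linspace(0.0, 10.0, 4001)
rho = N * (alpha / np.pi) ** 1.5 * np.exp(-alpha * r**2)

# Sanity check: the density should integrate to N electrons.
n_electrons = integrate(4.0 * np.pi * r**2 * rho, r)
print(f"electrons = {n_electrons:.4f}, E_x(LDA) = {lda_exchange(rho, r):.4f}")
```

The exchange energy comes out negative, as it must, since Eq. (5.23) is a strictly negative functional of any nonzero density.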
The accuracy can be improved if the rate of this variation is included in the functional. The gradient expansion approximation (GEA) tries to calculate the gradient corrections to the LDA systematically. In generalized gradient approximations (GGA), generalized functions of the density and its gradient are used instead of power-series gradient expansions. The inclusion of nonlocal gradient corrections improves the calculated binding energies. Becke (1988) formulated a gradient-corrected exchange functional based on the LDA exchange functional,

$$E_X^{Becke} = E_X^{LDA} - \gamma \int \frac{\rho^{4/3}\, x^2}{1 + 6\gamma x \sinh^{-1} x}\, d^3\mathbf{r}, \qquad (5.24)$$

where $x = |\nabla\rho| / \rho^{4/3}$ and $\gamma$ is a parameter chosen to fit the known exchange energies of the inert-gas atoms. The Becke functional is thus defined as a correction to the local LDA exchange functional. Self-consistent Kohn-Sham DFT calculations are performed in an iterative manner analogous to the SCF computation. Becke also formulated hybrid functionals, which include Hartree-Fock and DFT exchange along with DFT correlation:

$$E_{XC}^{hybrid} = c_{HF} E_X^{HF} + c_{DFT} E_{XC}^{DFT}. \qquad (5.25)$$

For example, the Becke-style three-parameter functional denoted B3LYP (Becke, 1993) is defined as

$$E_{XC}^{B3LYP} = E_X^{LDA} + c_0\left(E_X^{HF} - E_X^{LDA}\right) + c_X\left(E_X^{Becke} - E_X^{LDA}\right) + E_C^{VWN3} + c_C\left(E_C^{LYP} - E_C^{VWN3}\right). \qquad (5.26)$$

The parameter $c_0$ allows any combination of Hartree-Fock and LDA local exchange to be used, and the local correlation functional VWN3 is corrected by the LYP correlation correction. Table 5.1 lists various model chemistries definable via ab initio methods and standard basis sets. Each cell in the chart defines a model chemistry.

Table 5.1 Effect of various model chemistries and basis sets on the accuracy of the solution to Schrödinger's equation (Foresman and Frisch).

The columns correspond to different theoretical methods and the rows to different basis sets.
The level of correlation increases from left to right across any row, with the Hartree-Fock method at the extreme left (including no correlation) and the full configuration interaction method at the right (which fully accounts for electron correlation). The rows of the chart correspond to increasingly larger basis sets. The bottom row of the chart represents a completely flexible basis set, and the cell in the lower right corner of the chart represents the exact solution of the Schrödinger equation.

CHAPTER 6

Neural Networks

As the term neural network (NN) implies, this approach is aimed at modeling real networks of neurons in the brain. It was inspired by a number of features of the brain that would be desirable in artificial systems, such as its robustness and fault tolerance, its flexibility, its ability to deal with fuzzy and noisy information, and its highly parallel structure.

6.1 Biological neurons

The brain is composed of about 10^11 highly connected elements called neurons, each of which is connected to about 10^4 other neurons. Fig 6.1 shows a schematic of a biological neuron.

Fig 6.1 Schematic of a biological neuron

A neuron's dendritic tree is connected to thousands of neighboring neurons. Each neuron receives electrochemical inputs from other neurons at the dendrites. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation. Spatial summation combines several weak signals into a single large one, while temporal summation converts a rapid series of weak pulses from one source into one large signal. The aggregate input is then passed to the soma (cell body). The soma and the enclosed nucleus do not play a significant role in the processing of incoming and outgoing data; their primary function is to perform the continuous maintenance required to keep the neuron functional.
If the sum of these electrical inputs is sufficiently powerful to activate the neuron, it transmits an electrochemical signal along the axon and passes this signal to the other neurons whose dendrites are attached at any of the axon terminals; these attached neurons may then fire. The part of the soma that does concern itself with the signal is the axon hillock. If the aggregate input is greater than the axon hillock's threshold value, the neuron fires and an output signal is transmitted down the axon. The strength of the output is constant, regardless of whether the input was just above the threshold or a hundred times as great. The output strength is also unaffected by the many divisions in the axon; it reaches each terminal button with the same intensity it had at the axon hillock. This uniformity is critical in an analogue device such as the brain, where small errors can be serious and where error correction is more difficult than in a digital system. It is important to note that a neuron fires only if the total signal received at the cell body exceeds a certain level. The neuron either fires or it does not; there are no different grades of firing.

Each terminal button is connected to other neurons across a small gap called a synapse, as shown in Fig 6.2. The physical and neurochemical characteristics of each synapse determine the strength and polarity of the new input signal. This is where the brain is most flexible.

Fig 6.2 Synaptic gap

Changing the constitution of various neurotransmitter chemicals can increase or decrease the amount of stimulation that the firing axon imparts on the neighboring dendrite. Altering the neurotransmitters can also change whether the stimulation is excitatory or inhibitory. Many drugs, such as alcohol, have dramatic effects on the production or destruction of these critical chemicals.
The infamous nerve gas sarin can kill because it neutralizes a chemical (acetylcholinesterase) that is normally responsible for the destruction of a neurotransmitter (acetylcholine). This means that once a neuron fires, it keeps on triggering all the neurons in the vicinity, and one no longer has control over one's muscles.

So, from a very large number of extremely simple processing units (each performing a weighted sum of its inputs and then firing a binary signal if the total input exceeds a certain level), the brain manages to perform extremely complex tasks. This is the model on which artificial neural networks (NN) are based. Thus far, artificial neural networks have not come close to modeling the complexity of the brain. It is worth noting that even though biological neurons are slow compared to electrical circuits, the brain is able to perform many tasks much faster than any conventional computer because of the massively parallel structure of biological neural networks.

While in actual neurons the dendrite receives electrical signals from the axons of other neurons, in an artificial neuron (perceptron) these electrical signals are represented as numerical values. At the synapses between the dendrites and axons, electrical signals are modulated in various amounts; this is modeled in an artificial neuron by multiplying each input value by a value called the weight. An actual neuron fires an output signal only when the total strength of the input signals exceeds a certain threshold. This phenomenon is modeled in an ANN by calculating the weighted sum of the inputs to represent the total strength of the input signals, and applying a step function to the sum to determine the output. Fig 6.3 shows a simple model of a neuron.
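The weighted-sum-and-threshold model just described can be written in a few lines. In this sketch the weights and threshold are hand-picked, hypothetical values chosen so that the artificial neuron computes a logical AND of its two inputs:

```python
def perceptron(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of the inputs exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Hand-picked weights and threshold implementing a logical AND.
weights, threshold = [1.0, 1.0], 1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron([x1, x2], weights, threshold))
```

Only when both inputs are active does the weighted sum (2.0) exceed the threshold (1.5), so the neuron fires for the input (1, 1) alone; learning algorithms differ from this sketch only in adjusting the weights automatically rather than by hand.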
Fig 6.3 Model of a neuron

6.2 History of multilayer neural networks

The first step toward artificial neural networks came in 1943, when Warren McCulloch and Walter Pitts (1943) wrote a paper, "A logical calculus of the ideas immanent in nervous activity," describing how neurons might work, which united the studies of neurophysiology and mathematical logic. They modeled a simple neural network with electrical circuits and showed the network in principle, co