

small (250x250 max)
medium (500x500 max)
Large
Extra Large
large ( > 500x500)
Full Resolution


Molecular dynamics (MD) Simulations of Chemical Vapor Deposition (CVD) of Carbon Dimer on a Diamond (100) Surface and Application of Neural Networks (NN) for Event Probability Predictions By ABDUL NIZAM ABDUL SAMADH Bachelor of Science University of Madras Chennai  600005 Tamilnadu, India 2001 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE July, 2005 ii Molecular Dynamics (MD) Simulations of Chemical Vapor Deposition (CVD) of Carbon Dimer on a Diamond (100) Surface and Application of Neural Networks (NN) for Event Probability Predictions Thesis Approved: Dr. R. Komanduri Thesis Advisor Dr. L. M. Raff Dr. M.T. Hagan Dr. H. B. Lu Dr. A. Gordon Emslie Dean of the Graduate College iii SUMMARY Carbon dimers are found to be an important growth species in the growth of nanocrystalline diamond (NCD) through CVD process. Events, such as chemisorption, reflection, and desorption occur during the deposition of carbon dimers on to the substrate on which the diamond films are to be grown. The probabilities of each of these events have a significant effect on diamond growth. Molecular Dynamics (MD) simulations are widely used to predict the probabilities of such events. Though, MD simulations give agreeable results with experimental values, the calculation of the effect of different input parameters on various events involve time consuming numerical methods and hence the process is cumbersome. In this study, initially MD simulations of carbon dimer deposition on diamond (100) surface were performed using a many body empirical potential and the probabilities of the aforesaid events were calculated by varying the input conditions. This information was used to implement Neural Networks (NN) to predict the probabilities of the events. The neural network was also used to predict the underlying relationship between various input parameters and event probabilities. The computational time for the prediction of the events using molecular dynamics is generally several days while implementation of neural networks reduces it to mere minutes. The functional relationship between various input parameters and event probabilities predicted by NN is found to agree well with the MD simulation results. iv ACKNOWLEDGMENTS Though myriad events happen in one’s life span, only a few remain precious and evergreen. Writing a thesis is one such event. I take immense pleasure in expressing my sincere thanks to all those who made this work a successful one. First, I would like to convey my hearty thanks to my parents and sisters for all their love, support, encouragement and prayers that made me to reach new pinnacles in every walk of my life. I would like to express my sincere thanks to the National Science Foundation (NSF) for lending their support to this project. I wish to express my deepest gratitude to my advisor, Dr. Ranga Komanduri, for believing in my abilities and offering me a chance to work on this project. His passion for research and exploring new areas will forever be remembered. Unfortunately, I have no amount of words to convey my appreciation to him for his guidance, patience and support throughout my graduate career. I am beholden to Dr. Lionel M. Raff, Dr. Paras M. Agrawal and Dr. Martin T. Hagan, for all their efforts and precious suggestions that made me to overcome many barriers during this project. Their ingenuity and ability to explain even the most intricate things in a simple manner have always fascinated me. v I am also grateful to Dr. Hong Bing Lu for being kind enough to serve on my thesis committee. I would like to thank Dr.Satish Bukkapatnam for his suggestions and ideas during the MD group meetings. To my friend, Bala subramanian, thank you for all the pains and efforts you have taken in making me achieve my goal. Your ability in problem solving and passion for learning new things have always mesmerized me. I would also like to thank my friends Ramya, Siva and Freeman for their encouragement, time and help. Thanks to Rutu, Milind, Krishnaveni and other members of our MD group, for their useful suggestions and discussions. Finally I would like to thank all my colleagues and friends who made my graduate career program a pleasant and memorable one. vi TABLE OF CONTENTS Chapter Page 1. INTRODUCTION 1 1.1. Chemical Vapor Deposition 1 1.2. Molecular Dynamics Simulations 3 1.3. Neural Networks 4 2. ESSENTIALS OF MOLECULAR DYNAMICS SIMULATIONS 8 2.1. Introduction 8 2.2. Procedure for conducting MD Simulation 8 2.3. Interatomic Potentials 9 2.4. Time Integration Algorithm 16 2.5. Advantages and Limitations of MD Simulations 18 3. ESSENTIAL OF NEURAL NETWORKS 21 3.1. Introduction 21 3.2. Evolution of neural networks: Milestones and Developments 22 3.3. Neural network components and parameters 23 3.3.1. Neuron 23 3.3.2. Weights and Biases 24 3.3.3. Transfer functions 25 3.4. Neural Network Classification 27 vii 3.4.1. Single layer neural network 27 3.4.2. Multilayer network 27 3.5. Neural network learning and testing 28 3.6. Feed forward neural networks 29 3.6.1. Architecture of the feed forward neural network 29 3.6.2. Working of the feed forward network 30 3.7. Learning Algorithms 31 3.7.1. LeastMeanSquared (LMS) rule 31 3.7.2. Back propagation algorithm 33 3.7.3. LevenbergMarquardt algorithm 37 4. LITERATURE REVIEW 39 5. PROBLEM STATEMENT 54 6. MOLECULAR DYNAMICS SIMULATION OF CARBON DIMER (C2) DEPOSITION ON DIAMOND (100) SURFACE 56 6.1. Computational Model 56 6.2. Parameters of Interest 59 6.2.1. Incident polar angle 59 6.2.2. Rotation angle 61 6.2.3. Impact parameter 61 6.2.4. Translational energy of the carbon dimer 62 6.2.5. Rotational energy of the carbon dimer 63 6.3. Predominant events in CVD dimer deposition 64 6.3.1. Chemisorption 65 6.3.2. Scattering 67 viii 6.3.3. Desorption 68 7. NEURAL NETWORKS (NN) FOR EVENT PROBABILITY PREDICTION 70 7.1. Architecture and working of the neural network 70 7.2. Implementation of the neural network 71 8. PREDICTION OF EVENT PROBABILITIES NEURAL NETWORK Vs MD SIMULATIONS 73 8.1. Data points generation for neural networks 73 8.2. Training and testing of the event probability Neural network 74 8.3. Effect of input parameters on event probabilities: Neural Network Vs MD predictions 78 8.3.1. Effect of Incidence angle 78 8.3.2. Effect of Rotation angle 80 8.3.3. Effect of Impact Parameter 82 8.3.4. Effect of Translational energy of the dimer 83 8.3.5. Effect of Rotational energy of the dimer 85 8.4. Statistical uncertainty : Neural network Vs MD 86 8.5. More results from neural networks 88 9. CONCLUSIONS AND FUTURE INVESTIGATIONS 94 9.1. Conclusions 94 9.2. Future Work 95 REFERENCES ix LIST OF TABLES Table Page 2.1. Parameters for carboncarbon pair terms 12 2.2. Parameters for the angular contribution to the carbon bond order 13 2.3. Parameters needed for carboncarbon cubic spline 15 2.4. LJ potential parameter values 16 x LIST OF FIGURES Figure Page 3.1. Single input neuron 24 3.2. Tan sigmoid transfer function 26 3.3. Purelin transfer function 26 3.4. Single layer neural network 27 3.5. Multilayer neural network 28 3.6 . Feed forward neural network 29 4.1. Six types of chemisorption configurations on diamond (001)  (2 x 1) surface. 45 6.1. Simulation model and carbon dimer (C2) 58 6.2. Top three layers of atoms of diamond (100) substrate and radical site 58 6.3. Sketch showing input variables for the trajectory calculations 60 6.4. Distribution of incidence angle 60 6.5. Distribution of rotation angle 61 6.6. Distribution of impact parameter 62 6.7. Distribution of translational energy of the dimer 63 6.8. Distribution of rotational energy of the dimer 64 6.9. Variation of systems potential energy, Z coordinate of COM of the xi dimer, distance between carbon atoms of the dimer (chemisorption event) 66 6.10. Variation of systems potential energy and Z coordinate of COM of the dimer (scattering event) 67 6.11. Variation of systems potential energy and Z coordinate of COM of the dimer (desorption event) 69 8.1. Neural network training and testing plots for the probability of chemisorption for one neural network 75 8.2. Neural network training and testing plots for the probability of scattering for one neural network 76 8.3. Neural network training and testing plots for the probability of desorption for one neural network 77 8.4. Effect of incidence angle on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 79 8.5. Effect of rotation angle on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 81 8.6. Effect of impact parameter on chemisorption, scattering, and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty xii in the MD results. 82 8.7. Effect of translational energy on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 84 8.8. Effect of rotational energy on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 85 8.9. Comparison of statistical uncertainty in neural network and MD 87 8.10. Effect of impact parameter on event probabilities for various translational energies of the dimer Neural network predictions 89 8.11. Effect of translational energy on event probabilities for various Incidence angle of the dimer Neural network predictions 90 8.12. Effect of rotation angle on event probabilities for various translational energies of the dimer Neural network predictions 91 8.13. Effect of incidence angle on event probabilities for various Impact parameters  Neural network predictions 92 8.14. Event of rotational energy on event probabilities for various Impact parameters  Neural network predictions 93 1 CHAPTER 1 INTRODUCTION 1.1. Chemical Vapor Deposition Diamond is the hardest material known. Its unique mechanical, chemical, and electrical properties make it not only one of the most scientifically and technologically valuable material but also one of the most fascinating material known to researchers. It was in the mid1950s that diamond was first successfully synthesized on a commercial scale using high pressure, high temperature (HPHT) techniques [1]. Efforts were made about the same time to grow diamond directly from gases, but since the growth rates were extremely low, the vaporphase deposition of diamond was not assigned much importance. In the early 1980’s, when it was shown that growth rates in the range of a few μm/hour can be obtained using vapor phase deposition [25], this technique became an area of general attraction and exploration among researchers. Chemical Vapor Deposition (CVD) is a vapor phase deposition technique in which the gaseous reactants undergo chemical reaction in an activated environment, such as plasma, leading to the formation of stable solid product, such as diamond powders or thin diamond films on the surface of the heated 2 substrate. In a CVD diamond film growth, some of the widely used gaseous species include CH3, C2H2, C2H4, and C2. The CVD process has a wide range of advantages over other thinfilm deposition processes. A few such advantages are the ability to produce highly dense and uniform films with good reproducibility and adhesion, reasonable deposition rates and processing cost, ability to control surface morphology and orientation of the films obtained by controlling the CVD parameters, ability to adjust the deposition rates easily and flexibility of using a wide range of chemical precursors, such as halides, hydrides which enable the deposition of wide variety of films apart from the diamond films [6]. The diamond produced by the CVD process is comparable in purity and properties to HPHT or natural diamond that makes it a potential candidate for numerous applications. The CVD produced diamond films are used as coatings on cutting tool inserts to enhance the tool life, protective windows or optical coatings with high transmittance in the visible and infrared region, as shadow mask supports in xray lithography of electronic components [7], etc. The process of diamond film growth by CVD process involves a number of complex reaction mechanisms taking place between the surface atoms of the substrate and the gaseous species. There are a few elementary reactions, such as chemisorption, insertion, scattering, and desorption that serve as the building blocks for the complex reactions leading to thindiamond film growth. Hence, the investigations of these preliminary reaction events, their probabilities, various parameters 3 affecting the probabilities of their occurrence have gained significant importance [811]. Molecular dynamics simulations is one of the most powerful and widely used tools for investigating such reaction mechanisms and reaction events occurring during diamond film growth in a CVD process. 1.2. Molecular Dynamics Simulations Molecular dynamics (MD) simulation, as the term indicates, deals with simulating the behavior of a system at the atomistic level under given processing conditions. In MD simulations, the system is represented as an ensemble of atoms. In such simulations, we make use of a potential energy function which gives the potential energy experienced by each atom due to its position relative to that of its neighbors. From this potential energy, we can determine the force experienced by each atom as time progresses. Molecular dynamics simulation is a deterministic approach, where, once the current positions of the atoms as well as the forces acting on the atoms due to their neighbors are known then the positions of the atoms after a very small time increment, (usually in the orders of femtosecond) can be easily evolved by integrating the Newtonian equations of motion using suitable time integration algorithms. In spite of its high computational cost, today MD simulations serve as a powerful tool in the study of nanometric cutting [7071], different types of fracture mechanism [72], filmgrowth mechanisms [7374], friction and surface property studies [75], and biomaterial engineering [76]. Carbon dimers are found to be an important reaction species in the growth of nanocrystalline diamond films using CVD process [2730]. Therefore the 4 investigation of various reaction channels and mechanism that occur when carbon dimer is used as growth species is always of immense interest among researchers [8]. In this study, MD simulations have been used to study the probabilities of various elementary gas phase reactions, such as chemisorption, desorption, insertion and scattering occurring during diamondfilm growth in a CVD process when carbon dimmers (C2) are used as the growth species. The results from the MD simulations are used to train a neural network (NN) and then that neural network is used to predict the probabilities of various events. 1.3. Neural Networks A neural network (NN) is an artificial network of neurons that mimics or emulates the real network of neurons present in the human brain. Though the neural networks are not as sophisticated as the networks present in the human brain, they have the capability to predict many complex underlying functions between variables of any particular process or event. A neural network basically consists of a number of artificial neurons or nodes, typically arranged in layers, interconnected through a set of links. Each link multiplies its input by a suitable parameter called the weight before supplying it to the next neuron. Each neuron sums over its input and passes the output, which is a weighted sum of the input and the bias, to a suitable transfer function. The output from this transfer function is the final output. This output can be made the inputs to the next layer of neurons. The network used in this study is a multilayered feedforward network, which has two layers, of which the first layer 5 is called a hidden layer because its input and output are not available to the outside world. The hidden layer is associated with a tan sigmoid transfer function that gives the network the ability to learn the linear and nonlinear relations that exist between the input and outputs. It also makes them an ideal choice for generalization and event probability predictions. We have two stages in the implementation of a neural network. The first stage is the learning stage and the second stage is the testing or the production stage. During the learning stage, the neural network is provided with a set of input data as well as the corresponding output data and the network is made to learn by examples. The difference between the neural network output and the actual target output is used to determine the error which is used in strengthening the network so that its subsequent predictions are better. Once the error from neural network prediction has been sufficiently reduced, the network is assumed to have completed its learning phase. The network is then subjected to the next stage, namely, the testing stage. In this, the neural network is provided with a set of input data. Corresponding output data are not given to the network. The network is allowed to make its choice or prediction based on its previous experience in encountering such data during the training. If the network is able to predict the corresponding output data correctly, then it indicates that the network has been properly trained and has attained the ability. In this study we have used neural networks to predict the probabilities of various events occurring in a CVD process when a carbon dimer (C2) is used as the species for diamond film growth. The input parameters, namely, incidence 6 angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans), and rotational energy (ERot) form the input to the neural network. The probabilities of events, such as chemisorption, scattering, and desorption form the output of the neural network. The network is first trained by supplying the input parameters and the corresponding event probabilities. After the neural network has been trained well, the network is made to predict the event probabilities for a given set of input parameters. The conventional approach adopted so far by chemists is to use MD simulations for predicting the event probabilitities for a given set of input conditions [3233]. But, the drawback of MD simulations is that it involves computationally intensive and time consuming numerical integration algorithms. So, the exploration of the entire range of input parameters becomes a very difficult task, and also the time taken increases with the number of atoms comprising the system. It will be shown in this investigation that using neural networks the computational time for determining the effect of various input parameters on event probabilities can been reduced from hours to mere minutes. Also, this procedure is independent of the number of atoms comprising the system, thereby giving us an opportunity to explore a wide range of input data values. In this investigation, we initially ran MD trajectories for different sets of input conditions and determined the event probabilities which were then used for training the neural network. Chapter 2 will cover the empirical potentials used, the time integration algorithms, and the advantages and limitations of MD 7 simulations. Chapter 3 deals with the basic components of neural networks, their classifications, and stages in the implementation of neural network, different training algorithms used. Chapter 4 reviews the literature on various kinds of species employed in a CVD diamond film growth, MD simulations performed on various surfaces, such as diamond and silicon surfaces, time involved in such studies, and how neural networks have been effectively used in a CVD thin film growth processes. Chapter 5 discusses the drawbacks of using MD simulation for event probability predictions and the solution to overcome the present situation. Chapter 6 deals with the distributions for the five input parameters, and the event probabilities considered in this investigation. Chapter 7 presents the architecture of the network employed in this investigation and the implementation of the neural network. Chapter 8 deals with the neural network predictions of various event probabilities for different sets of input conditions and the comparison of neural network prediction with MD simulation results. Also this chapter discusses the statistical error involved in MD simulations as against the error in neural network predictions. Chapter 9 presents the conclusions based on the results of this investigation and proposes future investigations that can be carried over using the current neural network technique that has been implemented in this investigation. 8 CHAPTER 2 ESSENTIALS OF MOLECULAR DYNAMICS (MD) SIMULATIONS 2.1. Introduction In molecular dynamics (MD) simulations, the entire workpiece is represented as an ensemble of atoms. Given the initial positions and forces acting on the atoms, the subsequent positions and forces on the atoms can be evolved over time by integrating Newton’s equations of motion using a suitable time integration algorithm [12]. This chapter mainly focuses on the basic procedures for carrying out the molecular dynamics simulation, the advantages and disadvantages of MD, the interaction potential, and the time integration algorithm used in this study. 2.2. General Procedure for conducting MD simulations In molecular dynamics simulations, all the atoms in the system are considered as point masses. The initial position of the atoms are selected based on the structure of the system under consideration. The detailed procedure for carrying out MD simulation is as follows: 1. Based on the initial position of each atom with respect to its neighboring atoms, the potential energy (V ) experienced by the system at an initial time, say t0, is determined using an empirical potential. 9 2. The forces ( F ) acting on an atom are then determined by taking the derivative of the potential (V ) with respect to the position ( r ) of the atom, dr dV F = −ÑV = − . (2.1) 3. Newton’s second law can be mathematically expressed as follows F = ma , (2.2) where ‘m ’ is the mass of the atom and ‘ a ’ is its acceleration. The acceleration of the atom can be determined from Eqns. (2.1) and (2.2). 4. Once the acceleration of the atom at time ‘t0’ is known, the new velocity ( ) new v and the new position ( ) new r of the atom, after an infinitesimal time period of d t, can be calculated using the following equations new initial v = adt + v , (2.3) d new initial new r = r + v t , (2.4) where initial v represents the initial velocity and initial r the initial position of the atom. The infinitesimal time period used in MD simulations is usually on the order of a few femto seconds. 5. Finally, the atoms are displaced to their corresponding new positions calculated above and again the steps stating from 1 through 5 are repeated to monitor the evolution of the system with time under the given operating conditions. 2.3. Interatomic Potentials Interaction potentials form the main ingredient of MD simulations. A potential is a function of relative positions of atoms with respect to each other, 10 representing the potential energy of the system for a given configuration of the atoms comprising the system. The interatomic potential functions are both rotationally and translationally invariant. These functions are usually derived empirically and hence, are known as empirical potentials. A number of potentials exist today, some to mention are the Morse potential, StillingerWeber potential, Tersoff potential, Lennard Jones potential, and Brenner Potential [1314]. The potential used in this study for short range interaction is a manybody potential given by Brenner et al. [13]. The potential, V can be written as a sum over atomic sites i, = i i V E 2 1 , (2.5) where each contribution of i E is given by i E = ¹ − ( ) [ ( ) ( )] j i R ij ij A ij V r B V r . (2.6) In Eqn. (2.6), the summation is over the nearest neighbors j of atom i, excluding atom i, ij B is the many body coupling term between the bond from atom i to atom j and the local environment of atom i, V (r ) R and V (r ) A represents the pairadditive repulsive and attractive interactions. Eqn. (2.5) given by AbellTersoff [14] can realistically describe carboncarbon single, double, and triple bond lengths and energies in hydrocarbons and in solid graphite and diamond. However, the problem with this expression is that the assumption of nearneighbor interactions combined with the sum over atomic sites results in nonphysical behavior in the case of intermediate bonding situations. Nonphysical behavior arises again when conjugated and nonconjugated bonds are examined. 11 The aforesaid problems have been overcome by rewriting Eqns. (2.5) and (2.6) as follows > = − ( ) [ ( ) ( )] j i ij A ij ij R i V V r b V r . (2.7) Eqn. (2.7) represents the potential given by Brenner et al. [13] and is a modified form of AbellTersoff [14] potential function. Here, V represents the interaction potential, ( ) ij R V r and ( ) ij A V r are pairadditive interactions that represent all interatomic repulsions (corecore) and attraction, ij r is the distance between pairs of nearestneighbor atoms i and j, and ij b is the bond order between atoms i and j and is conveniently represented as follows s p s p p ij ij ji ij b = b + b + b [ − − ] 2 1 . (2.8) Values of s −p ij b and s −p ji b depend on the local coordination and bond angles for atom i and j. The term p ij b can be expressed as DH ij RC ij ij b = p + b p . (2.9) The terms indicating the interatomic repulsions and attractions are given by R c r V r f r Q r Ae −a ( ) = ( )(1 + / ) , (2.10) r n n V A r f c r B e n − b = = 1 , 3 ( ) ( ) . (2.11) 12 Table 2.1. Parameters for carboncarbon pair terms used in Eqns. (2.10) and (2.22). B1 = 12388.79197798 eV 1 = 4.7204523127 Å−1 Q = 0.3134602960833 Å B2 = 17.56740646509 eV 2 = 1.4332132499 Å−1 A = 10953.54416217 eV B3 = 30.71493208065 eV 3 = 1.3826912506 Å−1 A = 10953.544162170 eV Dmin = 1.7 Dmax = 2.0 The parameters used for the carboncarbon pair terms in Eqns. (2.10), (2.11), and (2.22) are given in Table 2.1. The first term in Eqn. (2.8) is given by ¹ − = + + − ( , ) [1 ( ) (cos( )) ( , )] 1 / 2 k i j H i C ik ijk ij i c ij ik b f r G e P N N ijk s p l q . (2.12) The subscripts i and j refer to the atom identity, the function P represents a bicubic spline. The function f (r) C ensures that the only nearest neighbors are included in the interactions. It limits the range of covalent interactions. C i N and H i N represent the number of carbon and hydrogen atom neighbors of atom i and are represented as: ¹ = carbon atoms ( , ) ( ) k i j ik c ik C i N f r , (2.13) ¹ = Hydrogen atoms ( , ) ( ) l i j il c il H i N f r . (2.14) The values of l and function P are taken to be zero for solid state carbon. An expression for s −p ji b can be obtained by interchanging the subscripts in Eqn. (2.12). The function (cos( )) ijk G q in Eqn. (2.12) controls the contribution each 13 nearest neighbor makes to the empirical bond order according to the cosine of the angle of the bonds between atoms i and k and atoms i and j. The parameters for the angular contribution to the carbon bond order are given in Table 2.2. Table 2.2. Parameters for the angular contribution to the carbon bond order q (rad) G(cos(q)) dG/d(cos(q)) d2G/d(cos(q))2 g(q) 0 8   1 p/3 2.0014   0.416335 p/2 0.37545   0.271856 0.6082p 0.09733 0.4 1.98  2p/3 0.05280 0.17 0.37  p 0.001 0.104 0.00  The term RC ij p in Eqn. (2.9) represents the influence of radical energetics and p bond conjugation on the bond energies. This term takes care of correctly describing the radical structures in diamond and accounts for nonlocal conjugation effects in graphite and benzene. This term was absent in the first generation form of the Brenner Potential [15]. The term RC ij p is taken as a tricubic spline F ( , , conj ) ij t j t ij i RC ij p = F N N N , (2.15) that depends on the total number of neighbors of bonded atoms i and j, as well as a function conj ij N that depends on local conjugation. The term t i N represents the coordination of atom i given by 14 H i C i t Ni = N + N . (2.16) The function conj ij N is represented as ¹ ¹ = + + carbon carbon ( , ) 2 ( , ) 1 [ ( ) ( )] 2 [ ( ) ( )] l i j jl jl c jl k i j ik ik c ik conj ij N f r F X f r F X , (2.17) where ( ) = 1, < 2 ik ik F x x ( ) = [1+ cos(2 ( − 2))] / 2, 2 < < 3 ik ik ik F x p x x (2.18) ik ik F ( x ) = 0, 3 < x and ( ) ik c ik t ik k x = N − f r . (2.19) The value of conj ij N becomes 1 if all the neighbors bonded to a pair of carbon atoms i and j have four or more neighbors and the bond between these atoms is considered to be part of a conjugated system. conj ij N becomes greater than 1, if the coordination number of the neighboring atoms decrease, indicating a conjugated bonding configuration. The term DH ij b in Eqn. 2.9 is given by ¹ ¹ = − Q ( , ) ( , ) ( , , )[ (1 cos 2 ( )) ( ) ( )] k i j l i j jl c ik jl c ijkl ik conj ij t j t ij i DH ij b T N N N f r f r , (2.20) where ijkl jik ijl Q = e e . (2.21) 15 The function ( , , conj ) ij t j t ij i T N N N is a tricubic spline, and the functions jik e and ijl e are unit vectors in the direction of the cross products ji R x ik R and ij R x jl R , respectively, where the R ’s are vectors connecting the subscripted atoms. Table 2.3. gives the values needed for the carboncarbon cubic spline T in Eqn. 2.20. All function values and derivatives not given in the table are equal to zero. Table 2.3. Parameters needed for carboncarbon cubic spline T in Eqn. (2.20). i j k T( i, j, k) Fitting data/Structure 2 2 1 0.070280085 Ethene 2 2 9 0.00809675 Solidstate structure The entire parameterfitting method discussed above was made much easier by assuming only nearestneighbor interactions. However, the best way to define this for a continuous function is problematic. The value of f (r) c ij is defined by a switching function of the form min ( ) 1 , r r D ij c f ij = < [ min max min ] min max 1 cos(( ) /( )) , 2 1 ( ) r r Dij Dij Dij Dij r Dij c fij = + − − < < max ( ) 0 , r r D ij c f ij = > (2.22) where max min ij ij D − D defines the distance over which the function goes from one to zero. 16 The Van Der Waal (VDW) interaction between the atoms comprising the system is taken care of by the LJ (126) potential. The LJ (126) potential has the following form − = Î 6 12 ( ) 4 r r V r s s . The well depth Î and the equilibrium separation r are the only adjustable parameters in the LJ (126) potential. In this investigation, the values of the well depth and equilibrium separation for the VDW interaction between two atoms are given in Table 2.4 Table 2.4. LJ potential parameter values Atoms Type Well depth (meV) Equilibrium separation (Å) Carbon  Carbon 4.412 2.28 Carbon  Hydrogen 1.806 2.54 Hydrogen  Hydrogen 0.740 2.81 2.4. Time Integration Algorithm In MD simulations we integrate the equations of motion over a given time period using numerical integration techniques. These time integration algorithms are based on finite difference methods. Here, the total time is discretized into a finite number of equal time intervals or time steps given byDt , which is 0.5 fs in the present investigation. If the positions and their time derivatives at time t are known, the integration algorithm gives the same quantities at a later time, t+Dt . The integration algorithms can show the evolution of the system with time by 17 repeating the procedure mentioned above. Currently, there exist a number of integration algorithms, such as the Verlet algorithm [16], Beeman algorithm [17]. The algorithm used in the present investigation for integrating the equations of motion is the “Gear PredictorCorrector” algorithm [18]. The algorithm consist of two parts, namely, the predictor part and the corrector part. Predictor Part: If we know the position r, velocity v, acceleration a, and some other time derivatives up to a certain degree q at a given time t, the Taylor expansion can be used to predict the values of these quantities at time t+Dt . The newly predicted values of the position, velocity, acceleration and the rate of change of acceleration after a time interval Dt is given by: rp(t+dt) = r(t) + dt v(t) + (1/2) (dt)2 a(t) + (1/6)(dt)3b(t)+… , (2.23) vp(t+dt) = v(t) + dt a(t) + (1/2) (dt)2 b(t) + … , (2.24) ap(t+dt) = a(t) + dt b(t) + … , (2.25) bp(t+dt) = b(t) + … , (2.26) where rp, vp, ap, and bp represents the position, velocity, acceleration, and rate of change of acceleration after a time interval of Dt from the initial time interval t. Force Calculation: The force on the atom is calculated by taking the derivative of the potential with respect to the position of the atom and is given by p p p i dr dV r F V ( ) = −Ñ = − . (2.27) 18 From the force, we can calculate the acceleration using Eqn. (2.2). The resulting value of acceleration will be different from that predicted by the above Taylor’s expression. The difference between these two values constitutes an “error signal” given by Da (t+dt)=ac(t+dt)  ap(t+dt) . (2.28) Corrector Part: The “error signal” along with certain other coefficients is used to correct the values of the position, velocity, and acceleration predicted by the predictor method. The corrected values of the position, velocity, acceleration, and the rate of change of acceleration at time t+Dt are given by rc(t+dt) = rp(t+dt) + c0 Da (t+dt) , (2.29) vc(t+dt) = vp(t+dt) + c1 Da (t+dt) , (2.30) ac(t+dt) = ap(t+dt) + c2 Da (t+dt) , (2.31) bc(t+dt) = bp(t+dt) + c3 Da (t+dt) , (2.32) where c0, c1, c2, and c3 are the coefficients of proportionality given by [16]: c0 = 1/6, c1 = 5/6, c2 = 1, and c3 = 1/3. 2.5. Advantages and Limitations of MD simulations Molecular dynamics simulation is one of the powerful tool widely used in the study of very complex reaction mechanisms. Some of the strengths of MD simulation that has made it to take a leadingedge method over other techniques are the following [12]: 1. Molecular dynamics simulations offer a great opportunity to explore the behavior of systems at atomistic and molecular levels. 19 2. MD simulations can be of immense use in simulating experiments that are very costly and difficult to carry in real world. For example, when MD is applied to experiments, such as nanometric machining, the effect of various parameters such as tool geometry and cutting speed can be easily studied at an insignificant fraction of the cost. 3. When simulations are running, the human involvement required is almost not there unlike the real world experiments where utmost care has to be taken when the experiments are in progress. 4. A major advantage of MD simulation is its repeatability. Any particular simulation can be exactly repeated any number of times with the same degree of accuracy. 5. MD simulation is a very deterministic technique, providing complete information, such as potential energy, velocity and force experienced by each and every atom comprising the system at any point of time which can be easily and accurately evolved. Though MD simulation has numerous advantages as stated above, it also has its own limitations, a few of these limitations are given in the following: 1. In MD simulations, results are purely dependent on the forces acting between atoms based on their positions. These forces are obtained by taking the derivative of the empirical potential function. Therefore, the extent to which molecular dynamics simulation can imitate real experiments depends on the ability of the potential function to reproduce the real behavior of the system. 20 2. MD simulations involve integration of Newton’s equations to obtain new velocities and positions of the atoms. The integration involves the use of numerical algorithms that require a very small integration timestep to give accurate results. Therefore, MD simulations are not preferred to simulate processes or reactions that take very long time periods. 3. The computational time and costs involved in MD simulations are significant because Newton’s equations of motion are to be integrated for every atom comprising the system and for each time step. The time and costs increase rapidly with increase in the size of the system considered. 4. When the temperature of the system considered is very low, quantum effects become significant. In such cases MD simulation results have to be interpreted with utmost caution due to the possibility of errors in the potential used. 21 CHAPTER 3 ESSENTIALS OF NEURAL NETWORKS (NN) 3.1. Introduction A neural network (NN) can be considered as a computational system made up of a number of simple and highly interconnected processing elements called nodes, which processes information by its active state of response to external inputs [69]. The structure and working of the human brain serves as the basic inspiration for the invention and development of neural networks[69]. Neural network attempts to emulate the adaptability, intelligent decision making and information processing ability of the brain. The greatest strength of neural networks is adaptive learning. It has the capability to learn, generalize, and reproduce from experience and examples. First, the neural network is trained using a number of examples and then the network is tested to see whether it can interpret new data based on previous experience. Neural networks offer a wide range of advantages, such as adaptive learning, selforganization, fault tolerance, and easy implementation that allow them to take a lead over other approaches. The neural network architecture, terminologies, training algorithms, and methodologies followed in this investigation were adopted from the book Neural Network Design Hagan et al. [69]. The book presents the most useful and 22 practical NN architectures, learning rules and training algorithms in a clear and consistent manner. Various topics of practical importance in the application of neural networks and neural network operations have been well explained in this book [69]. 3.2. Evolution of neural networks: milestones and development Neural networks is a field of recent origin. However, this field has a long history tracing back to periods before the invention of computers and has survived at least one major setback and several decades of oblivion. McCulloch and Pitts [59] used formal logic to create neural network models with simple neurons that were considered as binary devices with fixed thresholds. Their network was used mainly for simple logic functions such as “OR” and “AND”. Farley and Clark [60] created the first computer simulations of neuronal models and used normalization procedures to ensure better operation of their simulation models. Rosenblatt [61], a psychologist, designed and developed a threelayered system known as Perceptron network that exhibited adaptive behavior. Though Rosenbaltt’s design was considered a milestone in the field of neural network, it had some limitations such as inefficiency in solving pattern recognition problems and inability to handle large inputs. Widrow and Hoff [62] developed the ADALINE (ADAptive LINear Element) and MADALINE (Many ADALINEs) networks that employed a learning procedure called LeastMeanSquared (LMS) learning rule. The network operates by attempting to minimize the difference between the observed and desired output. Amari [63] published a mathematical model that served as the basis for error 23 correction methods employed in adaptive pattern classifications. Webros [64] first proposed the backpropagation algorithm that gave rise to backpropagation networks which are basically perceptrons with multiple layers with enhanced robustness and learning rules. Fukushima [65] developed competitive networks called Cognitron and Neocognitron for interpreting the handwritten characters. Klopf [66] developed the “drivereinforcement learning” for artificial neurons. This is similar to neuronal learning called “heterostasis” that occurs in biological neurons. Rumelhart and McClelland popularized the backpropagation algorithm in their book Parallel Distributed Processing [67]. Hopfield and Tank [68] developed the well known autoassociative network, the Hopfield Network, which attracted much attention due to its stability and ease of its fabrication using VSLI technology. Hagan et al. [69] introduced GaussianNewton approximation to Bayesian Regularization (GNBR) algorithm that reduced the cost of implementing the changes in the training algorithm and also produced optimal results with minimum computational time. Today, neural network concepts have been implemented on chips and are emerging as a prime solution to various complex problems representing the dominance of neural network in today’s scientific world. 3.3. Neural network components and parameters 3.3.1. Neuron Biological neurons are the building blocks in the human brain. Likewise the artificial neuron forms the fundamental data processing unit of the neural network. Figure 3.1 shows a simple neuron model. 24 Fig. 3.1: Single input neuron [69] Every neuron is associated with particular inputs, weights, biases, transfer functions, and outputs. The input ‘p’ is first multiplied (weighted) by a suitable weight ‘w ‘, and is passed on to the summer, the summer adds the weighted input ‘wp’ with suitable bias ‘b’ and passes the net input ‘n’ to the transfer function ‘ f ‘, which operates on the net input ‘n’ and produces an output ‘a’. This output can be made to become the input to the next layer of neurons. 3.3.2. Weights and Biases Every input supplied to a neuron is weighted before it is passed on to the summer in every neuron. Weights are a set of numbers associated with each interconnection between neurons in different layers. The weights indicate the strength of the interconnection between a neuron in one layer and another neuron in the next layer of the network. The initial values of the weights are set to zeroes or any small random number and are modified suitably during the network training to get the desired output from the network. Once the network has been completely trained, the final values of the weight matrix are stored and recurrently called during the testing session. 25 The bias can be considered as a threshold value added to the weighted inputs before they are passed on to the transfer function of the neuron. As can be seen from Figure 3.1, the bias has the effect of shifting the center of the transfer function f, while the weight changes the slope. 3.3.3. Transfer functions Every artificial neuron in a neural network is characterized by its transfer function. Two neurons which are fed with the same inputs can produce different outputs depending on the transfer functions to which they are associated. A neuron can take many input signals, multiply by the weights, add the bias, and pass the resulting scalar on to the associated transfer function. The transfer function decides how the neuron will scale its response to the input data, and generates the neuron’s activation. Some of the transfer functions that are of frequent use in the neural network are the following: • Hardlimit transfer function • Linear transfer function • Sigmoid transfer function The neural network used in this study uses a tangent sigmoid (tansig) transfer function in the hidden layer, and a purelinear transfer function in the output layer. The use of these transfer functions allow the network to understand the linear and non linear relationships that exist between it’s input vectors and output vectors. The two transfer functions employed in this investigation are described in the following. 26 a. Hyperbolic tangent sigmoid transfer function (tansig) The hyperbolic tangent sigmoid function, also known as the tan sigmoid transfer function takes input values between − ¥ and + ¥ , produces an output signal between –1 and +1. Fig 3.2: Tan sigmoid transfer function [69] The output is calculated using the equation 1 (1 exp( 2* )) 2 − + − = n a . (3.1) b. Purelinear transfer function (purelin) The purelinear transfer function produces an output, linearly increasing with the input supplied to it. The purelinear transfer function takes the following form a = n . (3.2) Fig 3.3: Purelin transfer function [69] 27 3.4. Neural network classification Neural networks are classified into two major categories, namely, single– layered and multilayered networks based on the number of layers in the network. Multilayer neural networks have sub classifications, such as multilayer feedforward networks and multilayer cooperative networks. The singlelayer networks can also be subdivided into singlelayer laterallyconnected networks and singlelayer topologically ordered networks. 3.4.1. Singlelayer network The single layer neural networks have only one layer of neurons. A singlelayer neural network can have one or more neurons in their single layer and can produce one or more outputs. Figure 3.4 shows a singlelayer neural network, having R inputs and S number of neurons Fig 3.4: Singlelayer neural network [69] 3.4.2. Multilayer network Multilayer neural networks have more than one layer of neurons. In the multilayered feedforward network, which has been used in this study, all neural responses flow in a forward direction through different layers of the network. 28 Figure 3.5 below shows a multilayered feed forward network having a single Fig 3.5: Multilayer neural network [69] hidden layer with a sigmoid transfer function that gives the network the ability to learn linear and nonlinear relations that exist between inputs and outputs and an output layer with purelinear transfer function allowing the neural network to produce values inside the range 1 to +1. The multilayered networks are an ideal choices for nonlinear regression and pattern recognition. 3.5. Neural network learning and testing Basically there are two important stages in the implementation of neural networks for any application, namely, learning and testing. The feed forward network used in this study employs a supervised learning procedure. In supervised learning, the network is provided with a set of input vectors, ‘A’ as well as with corresponding desired output vectors, ‘B’. During the learning process, the network compares its output vector, ‘C’, with the desired output vectors ‘B’ to produce the error percentage. The values of the weight matrix are adjusted so as to decrease the error. A network is said to be trained if its output responses are matching well with the desired outputs with minimum percent error. 29 After the completion of learning, the network is supplied with a testing data set which was never seen by the network during training and the output of the network is compared with the actual output to predict the network performance. 3.6. Feed forward neural networks In the feed forward networks the flow of data always occurs in the forward direction from the input to the neurons in the hidden layers and thereon to the output layers. No information is back propagated during the operation of the network. Generally, the multilayered feed forward networks are associated with one or more hidden layers having sigmoidal functions that allow the network to learn both nonlinear and linear relationships between input and output vectors. 3.6.1. Architecture of the feed forward neural network The multilayered feed forward network shown in Figure 3.6 has a total of three layers of neurons of which the first and second layers are known as the Fig 3.6: Feed forward neural network [69] hidden layers and the third layer which gives the final output of the entire neural network is called the output layer. We used the number of the layer as a superscript for the weights, neurons, biases, net inputs and outputs from each layer. As shown in Figure 3.6 there are s1 neurons in the first layer, s2 neurons in 30 the second layer, and s3 neurons in the third layer. There are R inputs to the network and the weight matrix for the first, second, and third layer are represented as w1, w2, and w3 . 3.6.2. Working of the feed forward neural network The ultimate task of the neural network is accomplished in two stages, namely, the training mode and production or testing mode. During the training mode, a set of examples known as the input vectors and their corresponding desired outputs are given to the neural network. The network is trained to learn the relationship that exists between the inputs and the outputs by using a learning algorithm known as the back propagation algorithm. During the training mode, the network, especially the hidden layer neurons, learn to respond to features and gradually the network develops the ability to generalize. After the network has been trained successfully, the next step is to test the neural network by giving it a set of input vectors that were not included in the sets used for training the network. If the network has been trained properly, it should be able to predict the outputs correctly for the input test data set that were never used during the training mode. In the training mode, the inputs are first passed to the neurons in the first layer. The final output from the neural network is compared with known desired output and error between the actual output of the network and the desired output is determined. The errors are used for adjusting the connection weights associated with the different layers of the network so that the error is progressively decreased in the subsequent prediction of the network. The 31 training process is repeated until the error has been reduced to a minimum value indicating that the network has learned to the maximum extent possible. After the training session, the network is put into the testing mode. During testing, the neural network is supplied with a set of test data which was not used during the training session, the neural network would not be given the corresponding outputs associated with the test data inputs. The neural networks response or the output is monitored and compared with the known desired outputs. If the output predicted by the neural network agrees closely with the known desired output with minimum error, it indicates that the network has been trained properly and has attained the power to generalize. During the testing process, the connection weights associated with the various layers of the neural network remain unmodified. 3.7. Learning Algorithms 3.7.1. LeastMeanSquared (LMS) rule The leastmeansquared (LMS) algorithm is a kind of supervised training algorithm where the neural network is provided with a set of inputs and their corresponding outputs during training. The algorithm works in such a manner as to reduce the mean square error between the actual network output and target output by adjusting the connection weights and biases of the neurons in different layers of the network. The least mean square error is calculated as follows F(x) = E[(t(k) − a(k))2 ], (3.3) Where the target output t(k) is the desired output, a(k) is the network output and E represents least mean squared error of the output that has to be reduced. The 32 least mean squared (LMS) algorithm is also known as WindrowHoff algorithm or the delta rule. The mean square error F(x) can be approximated by F(x) Ù = 2 ( ) e k = (t(k) − a(k))2 , (3.4) where k represents the iteration number. The estimate of the gradient is given by ( ) 2 ( ) Ñ F x @ Ñ e k . (3.5) The partial derivative of e(k) and 2 ( ) e k with respect to the weights for the th k iteration is given by ( ) ( ) ( ) ( ) 1 1, 1, 1, t k w p k b p k w w e k j R i i i j j − = − + ¶ ¶ = ¶ ¶ = , (3.6) j j w e k e k w e k 1, 1, 2 ( ) 2 ( ) ( ) ¶ ¶ = ¶ ¶ , (3.7) and the partial derivatives of 2 ( ) e k with respect to the biases at th k iteration is given by b e k e k b e k ¶ ¶ = ¶ ¶ ( ) 2 ( ) 2 ( ) , (3.8) Similarly, the derivative of e(k) with respect to the bias is given by 1 ( ) = − ¶ ¶ b e k . (3.9) Substituting Eqns. (3.6) and (3.9) in Eqns. (3.7) and (3.8), respectively yields the gradient of the squared error for the th k iteration: Ñ =Ñ = − 1 ( ) ( ) 2 ( ) 2 ( ) p k F x e k e k j . (3.10) 33 The steepest descent algorithm, with constant learning rate is given by k k x xk x x F x + = 1 = − Ñ  a ( ) . (3.11) Substitution of Eqn. (3.10) for Ñ F(x) in the Eqn. (3.11) yields 2 ( ) ( ) x 1 x e k z k k k = + a + , (3.12) or ( 1) ( ) 2 ( ) ( ) 1w k + =1w k + ae k p k (3.13) and b(k +1) = b(k) + 2ae(k) . (3.14) Eqns. (3.13) and (3.14) represent the least mean square algorithm for a single output network. For network with multiple neurons in the outer layer, the LMS algorithm can be written in matrix form as: W (k+1) = W (k) +2 e (k) pT (k) and (3.15) b (k+1) = b (k) +2 e (k) . (3.16) 3.7.2. Backpropagation algorithm The backpropagation algorithm is a generalization of the least mean square algorithm. It employs a generalized delta rule. Most of the multilayered feed forward networks, including the one used in this investigation employs backpropagation algorithm. At the end of forward propagation step, the error between the actual network output and targeted output is calculated and based on this error the weights associated with each neuron in the output layer is changed. In back propagation, the network weights are moved along the negative of the gradient of the performance function. The algorithm is known as back propagation algorithm because the change of weights starts from the output layer and then proceeds backwards until all weights associated with the first layer 34 has been changed in such a way that the error decreases in subsequent predictions of the network. The mean squared error (MSE) in a multilayered network is given by F(x) E[e e] E[(t a) (t a)] e (k)e(k) T T T = = − − @ . (3.17) The error in multilayered networks is an implicit function of connection weights of the hidden layers. Therefore, chain rule has to be used for calculating the gradient for the steepest descent algorithm. The approximate MSE is given by m i j m i j m i j w F w k w k , , , ( 1) ( ) ¶ ¶ + = −a , (3.18) m i m i m i b F b k b k ¶ ¶ ( +1) = ( ) −a . Using the chain rule, we get m i m i m i m i m i j m i m i m i j b n n F b F w n n F w F ¶ ¶ ¶ ¶ = ¶ ¶ ¶ ¶ ¶ ¶ = ¶ ¶ * * , , (3.19) The net input to any layer ‘m’ is a direct function of the weights and bias in that layer. So it is relatively easy to compute the second terms in the above equations. The net input m i n to the layer m is given by and 1 , 1 , 1 1 , 1 = ¶ ¶ = ¶ ¶ = + − = − − m i m m i m j i j m i S j m i m j m i j m i b n a w n n w a b m (3.20) 35 The change in the function F with respect to the change in the th i element of the net input at layer m is given by m i m i n F s ¶ ¶ = . (3.21) Using Eqn. (3.21) for sensitivity in Eqn. (3.19), yields m m i i m j m m i i j s b F s a w F = ¶ ¶ = ¶ ¶ −1 , (3.22) Now the steepest descent algorithm can be conveniently expressed in the following form the using matrix notation m m m m m m m T b k b k s W k W k s a a a + = − + = − − ( 1) ( ) ( 1) ( ) ( 1 ) , (3.23) ¶ ¶ ¶ ¶ ¶ ¶ = ¶ ¶ = m s m m m m m n F n F n F n F s . . . 2 1 , (3.24) where m W , m b and m s are the weight matrix, bias vector, and the sensitivity vector, respectively. As the term back propagation indicates, in this algorithm the sensitivity at layer m is computed using the sensitivity at its succeeding layer m+1. The Jacobian matrix is given by 36 m m n n ¶ ¶ +1 = ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ + + + + + + + + + + + + m S m S m m S m m S m S m m m m m m S m m m m m m m m m m m n n n n n n n n n n n n n n n n n n 1 2 1 1 1 1 2 2 1 2 1 1 2 1 1 2 1 1 1 1 1 1 1 1 ... . . . . . . . . . ... ... . (3.25) Now, ( ) ( ) 1 , 1 , 1 , 1 1 1 1 , m j m m i j j m j m m m i j j m m j i j m j m i m l S l m i l m j m i w f n n f n w n a w n w a b n n m • + + + + = + + = ¶ ¶ = ¶ ¶ = ¶ ¶ + = ¶ ¶ , (3.26) using Eqn. (3.26), the Jacobian matrix can be rewritten as = = ¶ ¶ + + 0 0 ( ) . . . . . . . . . 0 ( ) ... 0 ( ) 0 ... 0 ( ) ( ) 2 1 1 1 m S m m m m m m m m m m f n f n f n F n W F n n n & & & & & (3.27) Now using the chain rule again, the recurrence relation can be expressed in matrix form as ( ) ( +1) +1 = m m m m T m s F n W s & , (3.28) 1 3 1 s s s s M M ® ® ® − . 37 From the above expression, we notice that the sensitivity is propagated backward from the last layer to the first layer in the network. The sensitivities at the output layer is expressed in matrix form as s 2F (n )(t a) M M M = − & − . (3.29) 3.7.3 LevenbergMarquardt algorithm The basic back propagation algorithm is often the simplest and slowest minimization method. The LevenbergMarquardt algorithm provides faster convergence and is successfully used to speed up the convergence of back propagation. It should be noted that LevenbergMarquardt algorithm uses the backpropagation procedure in which derivatives are processed from the last layer of the network to the first. Hence, the LevenbergMarquardt algorithm could be called a backpropagation algorithm. The secondorder Taylor series is represented as follows ... 2 1 F(w + Dw) = F(w) + g × Dw + Dw × H ×Dw + T T , where Dw is the adjustment to the weight, g is the gradient vector and H is the Hessian matrix. They are defined as: 2 2 ( ) ( ) w F w H w F w g ¶ ¶ = ¶ ¶ = , Dw = −H × g −1 In quasi Newton method, Hessian matrix is estimated by some positive definite matrix, which ensures the convergence. The Hessian matrix is approximated as: 38 H J J @ 2 T , where J is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases. LevenbergMarquardt algorithm uses an approximation to the Hessian matrix and is given by [ T ] T w J J I J −1 D = + μ , where J is the Jacobian matrix and μ is a scalar. 39 CHAPTER 4 LITERATURE REVIEW Diamond has a unique combination of physical, chemical, electrical, and optical properties which make it a potential candidate for numerous industrial applications. For example, its very high hardness and wear resistance make it ideal for cutting tools and grinding wheels; its insulating properties, radiation hardness and high thermal conductivity make it an ideal member for applications in circuit packaging, high power, electrooptic, semiconductor devices and optical devices [20]. In the 80’s diamond films were grown at low pressures under metastable conditions using chemical vapor deposition (CVD) techniques, such as hot filament CVD, microwave plasma assisted CVD and DC arc plasma jet and flame plasma deposition [2123]. Due to the limited growth rate (<0.1 μm/hr), CVD process cannot effectively compete with the HPHT process on the basis of growth rate or overall cost. However, the low pressure diamond synthesis has led to a new era in diamond technology. (Extensive review presented by DeVries [24]). There are, however, some applications where LPCVD diamond synthesis is preferred over HPHT synthesis. For example, the lowpressure CVD process has a great potential for optical, infrared, and Xray applications as well as for a number of manufacturing and tribological applications. For example, in the manufacturing area, diamond coatings on cutting tools by the CVD technique can 40 be used to improve wear resistance, thereby improving the tool life. The diamond crystals obtained through the CVD technique find applications as cutting tools in nanometric machining and grinding operations and as heat sinks in electronic applications. Recently, nanocrystalline CVD diamond films have been synthesized with superior properties, such as, higher toughness, lower light scattering and higher Young’s modulus [25]. Nanocrystalline diamond (NCD) films are composed of diamond grains of the order of 50 nm, and display under certain conditions smooth morphology. They have been considered for applications in microelectro mechanical systems (MEMS) and its nano variant, namely, nanoelectromechanical systems (NEMS), where the mechanical, electrical, and corrosion properties of these fully dense films extend the range of applications of these novel devices [26]. Gruen et al. [2731] have conducted extensive research on nanocrystalline diamond film. One of the key factors in the CVD diamond film growth is the nature of the hydrocarbon species used. Gruen et al. [27] reported successful growth of diamond films using fullerene precursors in an argon microwave plasma without the addition of hydrogen or oxygen. The average grain size of the films obtained is reported to be 0.05 μm. They postulated that collisional fragmentation of C60 to give C2 could be responsible for the high growth rate of the veryfinegrained diamond films. Zhou et al. [28] investigated the transition from microcrystalline to nanocrystalline films grown from Ar/H2/CH4 microwave plasma; the transition becomes pronounced at an Ar/H2 volume ratio of 4, and the microcrystalline 41 diamond films are totally transformed to nanocrystalline at an Ar/H2 volume ratio of 9. They suggested that the transition in the microstructure to be due to change in the growth mechanism from CH3 in high hydrogen content to C2 as the growth species in low hydrogen content plasmas. Goyette et al. [29] experimentally determined the density of gas phase C2 in ArH2CH4 and ArH2C60 plasmas and reported C2 to be an important species in these growth environments. Gruen et al. [30] used optical spectroscopy to examine C60/Ar plasma and noticed the spectrum to be dominated by swan bands of C2.They proposed that collisionally induced dissociation of C60 in argon plasmas could be the mechanism for C2 production and that C2 is the principal growth species in their diamond film growth experiments. The above studies indicate that carbon dimer (C2) is an important growth species for nanocrystalline diamond growth and the elementary reactions of carbon dimer on a diamond substrate surface is of interest. A variety of hydrocarbon species are used in the microwave plasma CVD process for diamond growth. Some of the widely employed hydrocarbon species are C2H2, C2H, CH3, C2, C, and C2H4. The mechanism by which diamond film growth occurs differs from one hydrocarbon species to another and it also plays a vital role in the properties of the films obtained. Hence, the investigation of the mechanisms by which diamond film growth takes place and various reactions that occur between the gaseous hydrocarbon molecules and the substrate on which the films are grown is always of considerable interest. 42 Any complex reaction between the gaseous precursors and the atoms of the substrate involve basic elementary events, such as chemisorption, insertion, scattering, and desorption. The probability of occurrence of each of these events is greatly affected by factors, such as the incident translational energy, rotational energy of the molecules, the angle of incidence, rotation angle, temperature of the substrate, and impact parameter. Hence, the investigation of the effect of these parameters on various event probabilities in CVD and other film growth processes has attracted considerable attention among research community. Ulloa et al. [31] investigated the adsorption of hydrocarbons, such as CH3, CH2, and C2H4 on the flat terraces and near step edges of diamond (100) surfaces using MD simulations. They found that adsorption of CH3 on the (100) face and subsequent abstraction of one of its hydrogen will promote  scission essential for continued growth. Alfonso and Ulloa [32] studied methyl radical deposition on diamond (100) surfaces using MD simulations. The time step employed in their simulation was between 0.25 – 0.5 fs and most of their trajectories were monitored until elapse time of 2.5 ps. They reported that the adsorption probability of the CH3 radical on diamond substrate increases with the kinetic energy of the methyl radical and decreases with the incidence angle but the rise in adsorption probability becomes less pronounced as kinetic energy of the incident CH3 goes up. Hydrogen knock out events were reported to occur when the methyl radical is incident with a normal energy above 1 eV and is found to be more pronounced for a normal energy of 10 eV. 43 Hu and Sinnott [33] examined the deposition of an ethylene molecularcluster beam at various incident angles and incident energies on a diamond (111) surface that was terminated at the top and bottom with hydrogen atoms using a reactive potential coupled to LennardJones (LJ) potential. The substrate had 24 layers of carbon atoms and it contained 1370013900 atoms with an impact plane area of 69 x 40 Å2. All simulations were carried out for 3 ps and the timestep used was 0.2 fs. They predicted that with an increase in the incidence angle, the amount of adhesion of a thin film decreases. They also reported that crystallographic orientation and the incidence angle have less effect on the film structure and formation. Wang et al. [34] investigated the deposition of CH3 and CH2 radicals on diamond (001) surfaces at room temperature to determine the energy threshold (Eth) for chemisorption and reported that for CH3, the value of Eth on diamond (001)(2x1) H surface is higher than that on diamond (001)(2x1) surface and lower than that of C2H2 on the diamond (001)(2x1) surface. Perry and Raff [35, 36] computed rate coefficients, event probabilities, and desorption probabilities for many elementary chemisorption reactions on a diamond ledge and diamond terrace structures at 1250K. The diamond (111) terrace substrate they used had a total of 145 lattice atoms, a trajectory was carried out between 0.5  1.5 ps and each trajectory took an average of ~ 30 min of CPU time on a Digital (DEC ALPHA 3000/Model 400) workstation. The diamond ledge surface had a total of 147 atoms, the individual trajectory time varied over the range 0.10.87 ps, and the CPU time was the same as for the 44 terrace structure. They investigated molecules and radicals, such as C2H2, C2H, CH3, CH2, C, C2, C3H and found that chemisorption rate is lower for nonradical species, such as C2H2 and C2H4 than for radicals (1012  1013 cm3/mols). They reported that CH3 is the least reactive and atomic carbon has the largest chemisorption rate of all the species investigated. Izumi et al. [37] investigated the reaction probability of silane molecules on silicon (001) surface. They considered twenty different substrate conditions to overcome the scattering effect due to vibration of the substrate. They carried out a total of 20000 trials to obtain probability on the order of 103. They reported that the reaction probability depends significantly more on the internal energy of the silane than on the substrate temperature. The reaction probability increased linearly with the translational energy; quadratically with the vibrational energy of the silane molecule; and depends less on its rotational energy. Zhu et al. [38] studied the interaction between low energy C2H2 and diamond (001)2x1 surface, 200 trajectories were considered and each trajectory lasts for around 3 ps. Six types of chemisorption configurations (S1 through S6) were noticed as shown in Figure 4.1.The S1 structure is found to be the most stable because of its high binding energy and S6 structure to be the least stable. 45 Fig. 4.1: Six types of chemisorption configurations on diamond (001) – (2 x 1) surface [38]. The S2 and S3 configurations are found to be most frequently occurring configurations during diamond film growth and thereby playing an important role in diamond synthesis. Hansen and Hudson [39] studied interaction of oxygen molecules with clean and oxygen covered Ge (100) surfaces using molecular beam scattering techniques. They reported that sticking coefficient increased from 0.018 ± 0.002 to 0.079 as the incident beam energy was increased from 2.1 to 7.9 kcal/mol, but it decreased from 0.0176 at normal incidence to a minimum value when the incident angle of the beam is increased to = 700 . Belsky et al. [40] used MD simulations to study the sticking probability of Cu and Ta atoms on FCC Cu (111) and BCC Ta (110) crystal faces, respectively, for different values of incident energies ranging from 0 to 150 eV and angle of incidence from 00 to 900. The time step for the trajectories was on the order of 0.1 – 1 fs and the total duration for every trajectory was on the order of 100 – 1000 46 fs. The sticking probability of both Cu and Ta atoms was found to be inversely proportional to the incident beam energies. The sticking probability was found to decrease first with the incidence angle, but at larger angles, the probability began to increase. This behavior was ascribed to increased interaction time of the incoming atom with the substrate and a low normal velocity component. Vattuone et al. [41] investigated chemisorption of O2 on Ag (001) surfaces at 100 K by reflectivity method. The sticking probability was found to increase monotonously until the translational energy of the O2 molecule reached 0.7 eV and decreased thereafter because at high energies the collided molecules still retain sufficient energy to avoid being trapped on the surface and are able to escape into the gas phase by scattering inelastically off the repulsive part of the chemisorption potential. Huang et al. [42] studied the interaction between lowenergy CH3 and diamond (001)(2x1) at room temperature using MD simulations. An energy threshold (Eth) of 8 eV below which no chemisorption of CH3 would occur was noticed. Hydrogen dissociation from CH3 was observed for incident energies higher than 15 eV. They also found that below 10 eV incident energy, the chemisorption probability of C2H2 on a clean diamond (001)(2x1) surface was lower than that of CH3 on a hydrogen covered surface at the same impact energy. Neyts et al. [43] investigated the sticking efficiencies and hydrogen abstraction efficiencies for hydrocarbon species, such as C2, C2H, C3H2, and C3 on a diamond like carbon (DLC) layer at two different values of their initial kinetic 47 energies, namely, 0.1 eV and 1.0 eV using MD simulations. The DLC layer had 830 atoms, the time step used was 0.5 fs, and the trajectory duration was 1.25 ps for 1 eV kinetic energy impacts and 2.5 ps for 0.1eV kinetic energy impacts. All species had sticking efficiencies between 0.1 and 0.4. They found that species with no hydrogen atom had a smaller decrease in sticking efficiency with decreasing energy than species that contain hydrogen. They also noted that C3H2 had the highest hydrogen abstraction efficiency, C2 had the lowest abstraction efficiency, and C3 had zero hydrogen abstraction efficiency. Palithorpe [44] conducted MD simulation studies for the deposition of lowenergy carbon atoms onto a lowtemperature diamond (111) surface using StillingerWeber potential. The time step for the integration was 0.13 fs, the energy of the incident carbon atom was in the range of 1100 eV and the substrate temperature was maintained at 100 K. He reported that with intermediate energies (2060 eV), the incident atom penetrates beneath the exposed (111) surface and significantly increases the lateral compressive stress in the diamond film, thereby promoting amorphous diamond formation. From the above literature survey, we infer that MD simulation is a powerful tool for studying many complex reactions that takes place when different species are incident on the diamond substrate. But a main disadvantage of MD simulation is that it consumes huge amounts of computational time (and consequently higher cost), and this time increases rapidly as the number of atoms in the system considered increases. Even to study the effect of one parameter, say the effect of impact parameter on the chemisorption probability of 48 a carbon dimer on (100) diamond surface, keeping other inputs at a fixed value, consumes large amounts of computational time. Therefore, the exploration of the entire set of values become highly challenging and computationally costly when MD simulation has to be used. Any technique that permits the exploration of the wide range of input conditions will be of great benefit this area. Neural network is one such approach that offers such an advantage. This can be inferred from the following literature. Natale et al. [45] employed a modular neural network approach for enhancing the quality of films obtained during atmospheric chemical vapor deposition of doped silicon dioxide films. A neural network was used to establish a relationship between the equipment’s operating conditions and the characteristics of the resulting films. This in turn aids in finding the optimal set up conditions for obtaining high quality of films. Machine operating conditions are determined by factors, such as gas flow, chamber pressure, injector temperature, nitrogen flow. These factors are used as inputs to the network and film quality deciding factors, such as boron and phosphorus weight percent in the plasma and film thickness were used as outputs to the network for training and testing the neural network. The prediction by the network was good with an average error of about 1 % and a maximum error below 10 %. Erbil et al. [46] developed a semiempirical model using hybrid neural networks to determine the deposition rates of TiO2 films in a metalorganic CVD process. Temperature, total flow rate, reactor chamber pressure, source pressure, and precursor flow rate were used as inputs to the network and the TiO2 deposition rate was used as output during 49 the learning and testing stages of the network. The neural network used was able to identify three critical parameters, unknown in analytically derived deposition rate expression, leading to more general physical expression and methodology for predicting deposition rate over a wide range of operating conditions. Bhatikar and Mahajan [47] used a feedforward neural network to predict the performance of a CVD barrel reactor widely used in silicon epitaxy. Their approach involved spatial variation of the deposition rate of silicon on a facet of the reactor. They hypothesized that this spatial variation encodes a pattern that reflects the state of the reactor. A feedforward neural network with eight neurons in the hidden layers was used to predict and decode the pattern thereby predicting the state of the reactor so that it can be optimized to increase the production efficiency. Three different patterns or process faults were diagnosed and the network was able to predict and discriminate these process faults with 100% accuracy. Han and May [48] applied neural networks to predict the complex correlation between the deposition conditions and output parameters reflecting film quality in plasma enhanced CVD (PECVD) process. Deposition parameters, such as, substrate temperature, pressure, RF power, silane flow, and nitrous oxide flow were used as inputs and the corresponding deposition rate, permittivity, film stress, uniformity, silanol and water concentration in the films deposited were used as outputs for training the network. This trained network model was used “in reverse” to predict the necessary operating conditions to achieve the desired film quality. They were able to synthesize recipes to produce 50 novel film properties, such as uniformity, low permittivity, stresses, and impurity concentration using the optimized neural network models. Geisler et al. [49] modeled chemical vapor deposition of silicon nitride (Si3N4) films using a fivelayered feed forward network. The neural network was trained using both supervised and unsupervised learning techniques. The input vector had six variables, namely, substrate temperature, chamber pressure, RF power, NH3 flow, SiH4, and N2 flow and the output vector had three variables, namely, the film’s refractive index, the effective lifetime, and the positive charge density. Competitive learning algorithms were then used to determine the inputoutput relationship functions and these functions are then optimally adjusted to enhance the silicon nitride film properties. Lorenz et al. [50] used a multilayered feedforward network and developed an abinitio potential energy surface (PES). They showed the accuracy of the neural network developed PES using the hydrogen dissociation on the (2 x 2) potassium covered Pd (100) surface. The sticking probability of H2 on this potassium (2 x 2) covered Pd (100) surface is calculated using MD simulations on the neural network PES. The results were compared with the analytically developed potential energy surfaces and found to be in good agreement. Hobday et al. [51] showed that a feedforward network can be used to develop a potential energy surface to study the complex CH problem. The network used had an input vector set with five elements and six hidden nodes with a total of 43 weights and biases. The results were compared with the Brenner potential formulation for CH clusters which indicated a good agreement 51 with both structure and energetics. Numerical experiments showed that the PES developed using neural network though slower than Brenner potential by 6080 % it is still inexpensive compared to the ab initio calculations and can be efficiently used for still more complex systems like CN where bonds are more complex as compared to CH systems considered. Raff et al. [52] interpolated ab initio potentialenergy surfaces using a feed forward neural network and novelty sampling approach. They used various configurations of fiveatom silicon cluster and calculated the force and potential associated with each configuration at the MP4(SDQ) level of accuracy using 6 31G** basis set. They employed a novel sampling procedure and sampled the important regions of configuration space in iterative fashion using MD trajectories. A large number of new cluster configurations and corresponding potential and forces associated with those configurations were obtained using the novelty sampling technique. These cluster configurations, and the potential and forces associated with them were used to fit a neural network and obtain the potential energy surface (PES). The interpolated potential energy surface (PES) can be used efficiently for conducting MD and Monte Carlo studies of large systems involving complex reactions, nanometric cutting and nanotribology. The novelty sampling technique involves tight integration of MD calculations with NN and enables easy identification of new configurations in MD and also act as a good convergence test independent of MD computations. Early stopping and regularization techniques were used to give quick and precise results. The neural 52 network used was found to give good interpolation accuracy and easy usage of the obtained force fields directly for dynamic studies. Sumpter and Noid [53] employed a neural network with 426 input nodes, one hidden layer with 7 nodes, and an output layer of 18 nodes for obtaining a potential energy surface for macromolecules, such as a polyethylene molecule. An accurate anharmonic potential energy surface was formulated. The parameters in this PES were suitably changed and the corresponding vibrational spectra of the macromolecule is monitored. The neural network is then trained for 51 different vibrational spectra values of the macromolecules as inputs and the corresponding potential energy parameters outputs for 20000 cycles. Then the network was trained to determine the relation between the vibrational spectra and the corresponding parameters of the PES with a maximum error of less than 4%. This network was later used for obtaining parameters for a multidimensional PES. Noid et al. [54] used neural network to investigate the energy flow in molecular systems, such as H2O2. The neural network was made to learn the correlation between phasespace points along a classical trajectory and mode energies for stretch, bend, and torsion vibrations. The input vector to the network comprises of 12 cartesian atom positions (x, y, z), 12 cartesian momenta (px, py, pz), and four atomic masses. The output from the network comprised of six kinetic internal mode energies. The network employed had 28 input nodes, two hidden layers with 38 nodes in the first hidden layer and 12 nodes in the second hidden layer and an output layer of six nodes, giving a total of 84 nodes and 53 1648 connection values including bias values. The trained network was able to produce reasonably accurate results with an average error between 1% and 12%. Also the network has been employed for studying the energy flow in other tetratomic molecules, such as H2X2, X=C, and Se. From the above review of literature on neural networks, we find the application of neural networks to molecular dynamics can reduce the burden on MD simulations to a considerable extent. It can also provide an opportunity to explore the effect of different parameters on the probabilities of various events in a CVD process with less computational time and cost. 54 CHAPTER 5 PROBLEM STATEMENT From a review of the literature one can perceive that investigation of elementary reactions, such as chemisorption, scattering, insertion, and desorption that occur during thinfilm growth of microcrystalline diamond by chemical vapor deposition (CVD) process or any other growth process is of immense importance. Though MD simulations have been used successfully to study these reactions, a major limitation inherent with it is that it involves integration of numerous equations that consume an enormous amount of computational time and cost. For example, to study the influence of one of the input parameters, say the effect of impact parameter (b) on the chemisorption probability for a fixed set of other parameters, namely, translational energy (ETrans), rotational energy (ERot), incidence angle ( ), and azimuthal angle ( ) of the carbon dimer, we ran 50 trajectories for every value of impact parameter (b), ranging from b = 0 to b = 3.5 Å. By dividing this range into ten equal intervals, a total of 550 trajectories were run. We used a system of 324 atoms and time for running a single trajectory was ~1 minute. So, it took us a total of 550 minutes (~9.2 hours) to study the influence of a single parameter on one of the reaction probabilities for a single set of other input parameters. 55 Therefore, the investigation of the entire domain consisting of the dependence of various reaction probabilities on different variables is a very time consuming and challenging job. Any technique that cut shorts the enormous simulation time and offers an opportunity to explore the entire domain of input variables will be of immense interest to researchers and deserves to be explored. To achieve this objective, we intend to use neural networks to determine the probabilities of various reaction events, namely, chemisorption, scattering, and desorption that occur during the deposition of carbon dimer on a diamond (100) surface as a function of various input parameters. The implementation of the neural network for predicting the probabilities of various events occurring during carbon dimer (C2) deposition on a diamond (100) surface is achieved through following stages: • We first employed MD simulations to compute the probabilities of chemisorption, scattering, and desorption as a function of input parameters, such as rotational energy (ERot), translational energy (ETrans), angle of incidence ( ), impact parameter (b), and rotation angle ( ). • Training the neural network by feeding the values of outputs for some known values of input parameters over a wide range. • Testing the NN by supplying it with few input parameters and asking it to predict the output, and comparing the NN prediction with MD results. 56 CHAPTER 6 MOLECULAR DYNAMICS (MD) SIMULATIONS OF CARBON DIMER (C2) DEPOSITION ON DIAMOND (100) SURFACE Diamond film growth in a chemical vapor deposition (CVD) process involves complex reaction mechanisms taking place between the atoms of the substrate and the gaseous radicals used in the process. However complex the reaction mechanism might be, it all starts with preliminary elementary reactions, such as chemisorption of the radical species, scattering and desorption. This chapter focuses on these elementary reactions that occur during the deposition of carbon dimer (C2) on to the (100) diamond surface in a CVD process. 6.1. Computational Model In this study, a (100) diamond surface has been used. The surface was modeled using a slab of five layers of carbon atoms with the (100) face exposed. Except for one carbon atom that serves as the radical site, every carbon atom on the top layer is capped with one hydrogen atom. The atoms on all five faces of 57 the substrate, except the top face, are made to be nonmoving atoms, while the remaining atoms were allowed to move. The system used in this study has a total of 324 atoms of which two are the atoms of the carbon dimmer and remaining 322 atoms are of the diamond substrate and hydrogen atoms. The dimensions of the substrate are 17.7 x 3.54 x 17.7 Å. The two atoms of the carbon dimer are placed above the radical site such that the center of mass of the dimer is at a vertical distance of ~10 Å from the top surface of the substrate to make sure that the longrange interactions between the carbon dimer and the substrate atoms is near zero. An empirical manybody Brenner potential (Brenner et al. [13]), which realistically describes the bonding in hydrocarbon systems, is used to account for the shortrange interactions. A Lennard – Jones 612 potential is used to model the longrange interactions. The substrate temperature was maintained at Ts=600 K using a thermostat that employs the velocity scaling method of Berendsen [55]. A constant time step of 0.5 fs is used for numerical integration of the equations of motion and the Gear predictor corrector [18] method was used for numerical integration. Before the dimer deposition process, the substrate is relaxed in a 600 K thermal bath for 30 ps allowing it to approach the thermal equilibrium state. The simulation model used is shown along with the carbon dimer (C2) in Figure 6.1. The top three layers of the diamond substrate and the radical site are shown in Figure 6.2. 58 Fig 6.1: Simulation model and carbon dimer (C2) Fig 6.2: Top three layers of atoms of diamond (100) substrate and radical site 59 6.2. Parameters of Interest The mechanism of diamondfilm growth has been investigated by several research groups using theoretical and/or experimental methods. A number of elementary reactions have been suggested as playing a vital role in diamondfilm formation. The occurrence of these reactions depend not only on the surface structure of the substrate on which the hydrocarbon atoms are deposited but also on a number of parameters [31, 33, 35, 36, 38], such as • Incident azimuthal angle ( ) • Rotation angle ( ) • Impact parameter (b) • Translational energy of the Carbon dimer (ETrans) • Rotational energy of the Carbon dimer (ERot). In the following section we will be dealing with the distribution of each of these input parameters as well as with the events considered in this investigation. 6.2.1. Incident polar angle ( ): Incident angle of the dimer ( ) is the angle between the velocity vector of the center of mass of the dimer and the normal from the aimed point on the substrate. The polar angles were selected from the distribution function P ( ) d = C sin d over the range 0 max. This can be conveniently accomplished using a cumulative distribution function that leads to Eqn. (6.1) [56]. = cos1{1 – 1 (1cos max)} , (6.1) where 1 is a random number selected over the range [0, 1] and max is determined as follows. In the case of an infinite lattice model, max = /2. In the 60 present calculations, the size of the lattice model used requires that the value of max be limited to 23o. With this choice for max, normalization of the distribution function gives C=12.579. The incidence angle of the hydrocarbon atom is shown along with other input parameters in Figure 6.3 Fig 6.3: Sketch showing input variables for the trajectory calculations [35] The distribution for the incidence angle of the dimer follows the smooth linearly increasing curve shown in Figure 6.4. The histogram is an example of the distribution obtained using Eqn. (6.1) 0 5 10 15 20 Incidence angle (q°) 0 0.02 0.04 0.06 0.08 P(q) Statistical result Theoretical result Fig 6.4: Distribution of incidence angle 61 6.2.2. Rotation angle ( ): The probability distribution for the rotation angle is of the function form P ( ) d = C d and is uniform over the interval 0 2 ; the initial value of has been selected from in Eqn. (6.2) = 2 2 , (6.2) where 2 is a random number selected over the range [0, 1]. The normalization of the distribution function gives C= 0.1591.The theoretical distribution of the rotation angle is a constant straight line shown in Figure 6.5 as the line. The statistical result obtained from Eqn. (6.2) [56] is shown as the histogram. 50 100 150 200 250 300 350 Rotation angle (fº) 0.02 0.025 0.03 0.035 P(f) Statistical result Theoretical result Fig 6.5: Distribution of rotation angle 6.2.3. Impact parameter (b): The impact parameter represents the distance between the radical site and the aiming point on the surface of the substrate. The impact parameters are 62 selected from the distribution P(b)db = N2pbdb over the range 0 b bmax, where the upper limit, bmax, is chosen such that for impact parameters b > bmax, the chemisorption probability is zero. Using a cumulative distribution function, this selection can be made by obtaining b for each trajectory from the equation b = x 3 bmax , (6.3) where 3 is a random number selected from a uniform distribution on the interval [0, 1]. The maximum impact parameter (bmax) is found to be 3.5 Å. With this choice for bmax the normalization of the distribution function gives N=0.1633. The impact parameter distribution obtained using Eqn. (6.3) is compared with the theoretical result obtained from the probability distribution function [36]. 0 0.5 1 1.5 2 2.5 3 3.5 Impact parameter (Å) 0 0.01 0.02 0.03 0.04 0.05 0.06 P(b) Statistical result Theoretical result Fig 6.6: Distribution of impact parameter 6.2.4. Translational energy of the carbon dimer (ETrans) The initial translational velocity of the carbon dimer was selected from a Boltzmann distribution at the same temperature as the lattice which is Ts = 600 K. The functional form of the Boltzmann distribution is given by [57] 63 0 0.05 0.1 0.15 0.2 0.25 Translational energy (eV) 0 0.05 0.1 0.15 0.2 P( ETrans ) Statistical result Theoretical result Fig 6.7: Distribution of translational energy of the dimmer P ( E Trans) = A E KT e − / , (6.4) The distribution of translational energy of the dimer is shown in Figure 6.7. 6.2.5. Rotational energy of the carbon dimer (ERot) The rotational energy of the carbon dimer was calculated assuming a rigidrotor type rotational energy quantization [58]: J E = J (J +1)h2 / 2I , where h = h / 2p . (6.5) Here, I represents the equilibrium moment of inertia and J represents a continuous quantum number [58] given by the Eqn (6.6) 1/ 2({1 8 ln(1 ) / } 1) 2 1/ 2 J = − IkT −x h − , (6.6) where T is the temperature and x is a random number selected from a uniform distribution in the interval [0, 1]. The spread for the rotational energy of the dimer is shown in Figure 6.8. The theoretical result for the rotational energy is given by the distribution function [58] in Eqn (6.7). P (J ) dJ Cg exp(E / kT) J J = , (6.7) 64 0 0.005 0.01 Rotational energy (eV) 0 0.01 0.02 0.03 0.04 Statistical result Theoretical result P(E Rot ) Fig 6.8: Distribution of rotational energy of the dimer In Eqn (6.7) C represents the normalization constant and J g represents the degeneracy. The vibrational energy of the dimer corresponds to the zero point energy ( ZPE ) of the dimer. ZPE = h / 2 = Ev u0 , (6.8) (1/ 2) 2 v R E = μV , (6.9) where h is the Planck’s constant, u0 is frequency, μ is the reduced mass of the dimer, and R V is the relative vibrational velocity of the dimer. Eqn (6.9) assumes the initial vibrational phase of the dimer corresponds to the equilibrium positon. 6.3. Predominant events in CVD dimer deposition Many complex chemical reactions on a surface begin with simple elementary steps. These steps include adsorption on the surface, diffusion of the adsorbed atoms or molecules between binding sites, bondbreaking, insertion of 65 atoms or molecules and desorption of product molecules. In our present studies we have focused on the following events and their probabilities. • Chemisorption • Scattering/Reflection • Desorption 6.3.1. Chemisorption The initial conditions for the substrate and the carbon dimer are selected as discussed above. The position and the force on each atom in the system is determined by solving the Newtons equations of motion. The potential energy of the system is monitored after every integration step. A sudden drop in the potential energy of the system was noticed as the dimer approaches the substrate indicating the bond formation between the carbon dimer and the radical site. The trajectory calculations are carried out for an additional time of 1 ps after the dimer has reached the surface. Chemisorption of the carbon dimer is said to have occurred if the adsorbed atom undergoes ten or more inner turning points with respect to motion in the surface normal direction and the distance between the radical site and one of the carbon atoms of the dimer is within a cutoff radius of 2 Å of the radical site. The chemisorption probability is determined by running 50 trajectories keeping the input parameters, such as the incidence angle, rotational angle, translational energy, rotational energy and impact parameter constant and averaging over other factors such as the thermal vibrations of the lattice, vibrational phase angles of the lattice, rotational plane of the dimer and 66 initial orientation of the dimer. Figure 6.9 gives the variation of potential energy of the system V, the Z coordinate of the center of mass of the dimer, and the 1680 1675 1670 1665 V 0 500 1000 1500 2000 Time (fs) 1 1.2 1.4 1.6 1.8 R(CC) 0 2 4 6 8 10 Z Fig 6.9: Variation of system’s potential energy, Z coordinate of COM of the dimer distance between carbon atoms of the dimer (chemisorption event) 67 distance between two carbon atoms of the dimer, R, as a function of time. It can be seen from Figure 6.9, there is a sudden drop in the potential energy of the system by ~ 6 eV at the instance of bond formation. 6.3.2. Scattering Scattering of the dimer is said to have occurred if the dimer executes only 1675 1670 1665 V 0 500 1000 1500 2000 2500 3000 3500 Time (fs) 0 5 10 15 20 Z Fig 6.10: Variation of system’s potential energy and Z coordinate of COM of the dimer (scattering event) 68 one inner turning point on the surface of the (100) lattice and then bounces back. The system energy is monitored over every integration time step and there is found to be no drop in the potential energy of the system as there is no bond formation between the incoming dimer atoms and the atoms of the diamond surface. The scattering probability is determined using the same procedure as used for chemisorption. Figure 6.10 gives the variation of potential energy of the system,V , and the Z coordinates of the center of mass of the dimer as a function of time. It can be seen from Figure 6.10, there is no change in the potential energy of the system as a result of scattering of the dimer. 6.3.3. Desorption In the desorption event, the carbon dimer comes to the surface of the lattice, gets adsorbed without appreciable change in the potential energy and then desorbs back after a few oscillations. The probability of desorption is determined by running 50 trajectories keeping the incidence angle, rotation angle, translational velocity, rotational velocity and impact parameter constant and averaging over the thermal vibrations and vibrational phase angle of the lattice. Desorption is said to have occurred if the carbon dimer (C2) executes less than 10 inner turning points on the diamond surface and bounces back. There is no change in the potential energy of the system. 69 1672 1670 1668 1666 1664 V 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Time (fs) 0 5 10 15 Z Fig 6.11: Variation of system’s potential energy and Z coordinate of COM of the dimer (desorption event) Figure 6.11 shows the potential energy of the system and Z coordinate of the dimer’s center of mass with time for the desorption event. As can be seen from the plot, the dimer stays on the substrate for greater period of time as compared to that for a scattering event. 70 CHAPTER 7 NEURAL NETWORKS (NN) FOR EVENT PROBABILITY PREDICTION The probabilities of various events, such as, chemisorption, scattering and desorption determined by MD simulation has been used to implement a neural network and subsequently use that neural network to predict the probabilities of these events as well as to find the effect of various input parameters on these probabilies. In studies described in Chapters 8, three separate networks have been implemented, one each, for predicting the probabilities of the three events. This chapter discusses the architecture of the network used, the structure of the input and output data set to the network, the total number of data sets used, and the procedure for implementation of the network, namely, training and testing of the network. 7.1. Architecture and working of the neural network The neural network used in this investigation for predicting the probabilities of various events that occur during carbon dimmer (C2) deposition on a diamond (100) surface is a multilayered feed forward network. The network has two layers, the first layer is a hidden layer that has 50 hidden neurons and a tansigmoid transfer function associated with every hidden neuron in the layer. The second layer is the output layer that has one neuron and a purelinear transfer function 71 associated with it. The output from this second layer is the probability of a particular event and this forms the final output of the neural network. The input vector to the neural network has five components, namely, the incidence azimuthal angle of the dimer ( ), the rotation angle ( ), impact parameter (b), translational energy of the dimer (ETrans), and rotational energy of the dimer (ERot). The output vector of the neural network has a single component which forms the probability of a particular event predicted by the neural network. 7.2. Implementation of the neural network The implementation of the neural network for predicting the probabilities is carried over in two stages, namely, the training stage and the testing stage. The network is trained using LevenbergMarquardt algorithm that employs a procedure known as early stopping [69]. In this procedure, the training of the neural network was met within approximately 20 iterations or epochs. The total input data containing 2000 data sets to the network is divided into two subsets, of which 85% of the data (1700 data sets) becomes the training set, which was used for training the network, and the remaining 15 % (300 data sets) of the data becomes the validation set. Fifty different neural networks were trained by random selection of the 85% of the training data and the average of outputs of these 50 networks is computed to arrive at the predicted probabilities. It may be noted that during initial stage of each training, the error on the training and validation sets decrease. But, when the network starts overfitting, the error on the training set continues to decrease while that on the validation set starts increasing. When the error on the validation set begins to increase for a 72 specified number of iterations, it indicates the network is attempting to overfit and so the training is stopped. Such an early stopping procedure has been successfully used to prevent the network from overfitting [69]. After the network has been trained successfully, it is tested by supplying it with a set of input data to predict the output. If the network has been trained properly, it will be able to predict an output that matches closely with the desired output. 73 CHAPTER 8 PREDICTION OF EVENT PROBABILITIES – NEURAL NETWORK VS MD SIMULATION In this chapter we report results of MD simulations for predicting the probabilities of various events, such as, chemisorption, scattering, and desorption that occur during the deposition of carbon dimer (C2) species on a diamond (100) surface in the CVD process. We shall use these results to train the neural network and determine the underlying relationships between the five input parameters of the dimer and each of the three event probabilities. 8.1. Data points generation for neural networks The five input parameters used in the synthesis of diamond by CVD process, namely, the incidence angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans), and rotational energy (ERot) of the dimer forms the input vector for the neural network and the corresponding event probabilities forms the output vector of the neural network. A total of 2000 data points are used for training and testing the neural network. Every point for the neural network is generated by running 50 MD trajectories. All the five input parameters were kept constant during these 50 trajectories. The probability of occurrence of each of the three events were estimated at the end of these 50 trajectories by 74 taking the ratio of the number of times a particular event has occurred to the total number of trajectories computed. 8.2. Training and testing of the neural network As mentioned in Chapter 7, the implementation of the neural network involves training and testing. First, the five input parameters and the output probabilities were normalized to make the range to lie between 1 and +1. Normalizing is done using the formula 1 ( ) 2 ( ) max min min − − − = p p p p pn , (8.1) where p is the variable to be scaled, pmin and max pmax are the minimum and maximum values of each variable in the input or output vectors for the entire database consisting of all the points. pn is the normalized value corresponding to p. 85 % of the normalized data have been used for training and the remaining 15% is used for the validation of the network. 50 neural networks were generated by a random selection of 85% of the training data, and the average of the outputs of these 50 networks is computed to obtain the final predicted probabilities. The training of the neural network was accomplished within approximately 20 iterations or epochs. The initial weight matrices for each training were randomly chosen. This is done to enable the network to get trained for any randomness. Each neural network was trained using supervised learning mentioned in Chapter 3. Early stopping was used to prevent the network from overfitting [69]. After each neural network has been trained, the network was tested with a test data set to see whether the network is able to predict the outputs correctly. 75 Figures 8.1 through 8.3 show the training and testing plots for the three events, namely, chemisorption, scattering, and desorption for one of the neural networks. The scatter present in the training and testing plots is because of the uncertainty occurring due to averaging over just 50 trajectories for calculating the probabilities of the events. 0 0.2 0.4 0.6 0.8 1 MD 0 0.2 0.4 0.6 0.8 1 Neural Network 0 0.2 0.4 0.6 0.8 1 Neural Network Training Testing Fig 8.1: Neural network training and testing plots for the probability of chemisorption for one neural network 76 The scatter in the plots can be greatly minimized by averaging over a larger number of MD trajectories, say 500 trajectories per data point instead of 50 trajectories per data point. For example, if we average over 500 MD trajectories, and assume one chemisorption event occurred then the statistical uncertainty involved here is calculated using Eqn (8.2) to be 0.001999. Now, let us take the 0 0.2 0.4 0.6 0.8 1 MD 0 0.2 0.4 0.6 0.8 1 Neural Network 0 0.2 0.4 0.6 0.8 1 Neural Network Training Testing Fig 8.2: Neural network training and testing plots for the probability of scattering for one neural network 77 present case in which we average over 50 trajectories, and one chemisorption event occurred, the statistical uncertainty in this case is 0.0197. We see that the statistical uncertainty reduces by ten times if the number of trajectories is increased. As mentioned in Chapter 3, three individual neural networks have been used for predicting the probabilities of the three events, one network for each event probability. 0 0.2 0.4 0.6 0.8 1 MD 0 0.2 0.4 0.6 0.8 1 Neural Network 0 0.2 0.4 0.6 0.8 1 Neural Network Training Testing Fig 8.3: Neural network training and testing plots for the probability of desorption for one neural network 78 The rms error in the training was found to be 0.0422 for the chemisorption probability network, 0.0561 for the scattering probability network and 0.0550 for the desorption probability network. The rms error during testing was found to be 0.0512 for the chemisorption probability network, 0.0695 for the scattering probability, network and 0.0630 for the desorption probability network. 8.3. Effect of input parameters on event probabilities: Neural network Versus MD predictions The effect of the five input parameters, on the probabilities of chemisorption, scattering, and desorption has been studied using MD simulations. Subsequently, neural networks were used to predict the relationship existing between the input parameters and the event probabilities. In this section we will investigate the predictions made by the neural networks and compare their predictions with MD simulation results. 8.3.1. Effect of incidence angle ( ) The effect of incidence angle ( ) on the three probabilities was studied using MD simulations by running trajectories in which the other four input parameters are maintained constant. For every value of the incidence angle of the dimer 50 trajectories were run in order to average over the thermal vibrations and vibrational phase angles of the lattice. The probabilities of the three events were determined as described in Section 8.1. The input parameters for which MD trajectories were run are as follows: = 110°, b = 1 Å, ETrans = 0.124 eV, ERot = 0.052 eV and the incidence angle is varied from = 0° to the maximum incidence angle max = 23° in steps of 2°. The neural networks that were trained (as described in Section 8.2) 79 are used to simulate the event probabilities for the same input data and the results are plotted in Figure 8.4 along with the MD results. It can be seen from Figure 8.4, 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 5 10 15 20 Incidence angle (qº) 0 0.2 0.4 0.6 0.8 PD S C Fig 8.4: Effect of incidence angle on chemisorption, scattering and desorption probabilities  MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. the results predicted by MD calculations and neural network agree well with each other.The chemisorption probability, scattering probability, and desorption probability are represented as PC, PS, and PD, respectively, in the graphs. The 80 statistical error present in MD is found to be one sigma limit of uncertainty. It is calculated using the formula [36] [( ) / ]1/ 2 * ( ) N N NN P R R Ñ = − s . (8.2) Here, N represents the total number of trajectories, R N represents the number of times a particular event has occurred, and s (P) represents the event probability. For example, say that out of the 50 trajectories ran, 40 events are chemisorption, then the one sigma limit of uncertainty in MD using Eqn. (8.2) is 0.0565. 8.3.2. Effect of rotation angle ( ) The effect of rotation angle on the event probabilities is studied using MD simulations using the same procedure. In this case, all parameters except the rotation angle ( ) are kept constant for all trajectories. The rotation angle is varied from 10° to 360° in steps of 20° for every 50 trajectories. The input parameters for which MD trajectories were run are given as: = 11°, b = 1 Å, ETrans = 0.06 eV, ERot = 0.052 eV. The same input data sets that were used for running the MD trajectories are used as the test data set for the neural networks and the average neural network output is plotted along with the MD results in Figure 8.5. The error bars in the figure corresponds to the statistical error in MD and is computed using Eqn (8.2). We notice that the MD results and the neural network results agree well with each other. 81 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 50 100 150 200 250 300 350 Rotation angle (f°) 0 0.2 0.4 0.6 0.8 P D S C FIG 8.5: Effect of rotation angle on chemisorption, scattering and desorptionprobability MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 82 8.3.3. Effect of impact parameter (b) The effect of impact parameter on the three event probabilities are determined using MD simulations using the same procedure as described in Section 8.3.1. 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 0.5 1 1.5 2 2.5 3 3.5 Impact parameter (Å) 0 0.2 0.4 0.6 0.8 P D S C FIG 8.6: Effect of impact parameter on chemisorption, scattering and desorption probabilities MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 83 The impact parameter is varied from 0.25 Å to maximum impact parameter bmax = 3.5 Å in steps of 0.25 Å for every 50 MD trajectories. The values of other input parameters are as follows: = 17°, = 310°, ETrans = 0.06 eV and ERot = 0.052 eV. The neural networks are now used for predicting the event probabilities by using the same test data as used for running the MD calculations. The results are plotted along with MD results and uncertainty associated with MD in Figure 8.6. It can be seen that the output of the neural network is in accordance with that of MD. 8.3.4. Effect of translational energy of the dimer (ETrans) The effect of translational energy of the dimer on each of the three event probabilities were determined using MD using the same procedure as described above for the other parameters. The values of input parameters are as follows: = 11°, = 110°, b = 1 Å and ERot = 0.052 eV. The same input data set is used for the neural networks. The average output of the networks and MD results along with statistical error associated with MD are shown in Figure 8.7. Here again, we note a good agreement between MD and NN. 84 0 0.2 0.4 0.6 0.8 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Translational energy (eV) 0 0.2 0.4 0.6 0.8 PD S C Fig 8.7: Effect of translational energy on chemisorption, scattering and desorptio desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 85 8.3.5. Effect of rotational energy of the dimer (ERot) The effect of rotational energy of the dimer on three probabilities is determined using MD simulations and also using neural networks using the same 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Rotational energy (eV) 0 0.2 0.4 0.6 0.8 PD S C Fig 8.8: Effect of rotational energy on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 86 procedure as described in section 8.3.1.The input data set for MD calculations are as follows: = 17°, = 110°, b = 1 Å, ETrans = 0.06 eV. The neural networks are tested to predict the output for the same data set and the results of MD and neural network are shown in Figure 8.8. We note that the neural network and MD results agree well with each other. 8.4. Statistical uncertainty: Neural network Vs MD The results given by molecular dynamics simulation have a statistical uncertainty that can be calculated using Eqn. (8.2). Figures 8.4 through 8.8 show the MD results with error bars to indicate the one sigma limit of statistical uncertainty in the MD calculations. The neural network plots are obtained by averaging over 50 sets of neural network matrices. Therefore, the neural network predictions also have statistical errors associated with them, but, the error in neural network prediction is very small compared to MD (See Figure 8.9). The figure shows the neural network predictions and MD predictions along with the error bars to show the one sigma limit of statistical uncertainty associated with each case. The error bars in dotted lines represent the statistical noise associated with neural network, and the error bars in solid lines represent the statistical error associated with MD. We infer from Figure 8.9 that the functional relationship between various input parameters and different event probabilities predicted by the neural network are continuous and have less statistical uncertainty associated with them than do the MD results. 87 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 5 10 15 20 Incidence angle (qº) 0 0.2 0.4 0.6 0.8 PD S C Figure 8.9: Comparison of statistical uncertainty in neural network and MD 88 8.5. More results from neural networks In Section 8.3, we have seen that neural network is able to predict the underlying relationship existing between the incidence angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans) and rotational energy (ERot) of the dimer, and the probabilities for chemisorption, scattering and desorption. After training the neural network, it is easy to compute the probabilities of different events for arbitrary sets of input parameters. In Figures 8.10 through 8.14, we present additional results given by the trained neural network. The time taken by the neural network for predicting the relationship between each of the input parameters and the three event probabilities is approximately 3 minutes, whereas MD simulation takes 550 minutes (~9.2 hours) to study the influence of a single parameter on three event probabilities. So, it is easy to note that the computation of the Figures 8.10 through 8.14 by MD simulations would require hundreds of CPU hours in contrast to a few minutes by the trained neural networks. 89 0 0.2 0.4 0.6 0.8 1 P Translational Energy = 0.15eV Translational Energy = 0.17 eV Translational Energy = 0.27eV 0 0.2 0.4 0.6 0.8 P 0 0.5 1 1.5 2 2.5 3 3.5 Impact Parameter (Å) 0 0.2 0.4 0.6 0.8 P Incidence Angle = 17º Azimuthal Angle = 110º Rotational Energy = 0.05 eV D S C Fig 8.10: Effect of impact parameter on event probabilities for various translational energies of the dimer NN predictions 90 0 0.2 0.4 0.6 0.8 P Incidence Angle = 1º Incidence Angle = 11º Incidence Angle = 15º Incidence Angle = 21º 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Translational energy (eV) 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Azimuthal Angle = 110 Rotational Energy = 0.052 eV D S C Fig 8.11: Effect of translational energy on event probabilities for various incidence angles of the dimer NN predictions 91 0 0.2 0.4 0.6 0.8 P Translational Energy = 0.001 eV Translational Energy = 0.03 eV Translational Energy = 0.12 eV Translational Energy = 0.27 eV 0 0.2 0.4 0.6 0.8 P 0 50 100 150 200 250 300 350 Rotation angle (fº) 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Incidence Angle = 17º Rotational Energy = 0.052 eV D S C Fig 8.12: Effect of rotation angle on event probabilities for various translational energies of the dimer – NN predictions 92 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Impact Parameter = 1.5Å Impact Parameter = 2Å Impact Parameter = 2.5Å 0 0.2 0.4 0.6 0.8 P 0 5 10 15 20 Incidence angle (qº) 0 0.2 0.4 0.6 0.8 P Translational Energy = 0.06 eV Azimuthal Angle = 110 Rotational Energy = 0.052 eV D S C Fig 8.13: Effect of incidence angle on event probabilities for various impact p parameters – NN predictions 93 0 0.2 0.4 0.6 0.8 P Impact Parameter = 0.5Å Impact Parameter = 1Å Impact Parameter = 1.5Å Impact Parameter = 2Å 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Rotational energy (eV) 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Incidence Angle = 17º Rotational Energy = 0.052 eV D S C Fig 8.14: Effect of rotational energy on event probabilities for various impact parameters – NN predictions 94 CHAPTER 9 CONCLUSIONS AND FUTURE INVESTIGATION MD simulations were conducted to generate the initial data required for training the neural network. The neural networks have been successfully applied to study the effects of five input parameters (incidence angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans) and rotational energy (ERot) of the dimer), on the probabilities of three events (chemisorption, scattering, and desorption events), that occur during the deposition of carbon dimer (C2) onto the diamond (100) surface in a CVD process for thin film growth. The conclusions of this study and future investigations are presented in the following. 9.1. Conclusions 1. Neural networks (NN) can be used effectively to predict the underlying relationships between the five input parameters of the dimer, and the probabilities of three events outlined above. 2. The chemisorption probability is found to decrease with increase in impact parameter (b). The scattering probability and desorption probability are found to increase with the impact parameter (b). 3. The chemisorption probability is found to increase with increase in the translational energy (ETrans) of the C2 dimer but the scattering and desorption 95 probabilities are found to decrease with the increase in the translational energy (ETrans) of the dimer. 4. The chemisorption probability, scattering probability, and desorption probability are found to be independent of the rotation angle ( ). 5. The chemisorption probability is found to decrease with increase in the incidence angle ( ) of the dimer. The scattering probability and desorption probability are found to increase with the incidence angle ( ) of the dimer. 6. The chemisorption probability is found to decrease with increase in the rotational energy (ERot) of the C2 dimer but the scattering and desorption probability are found to increase with increase in the rotational energy (ERot) of the dimer. 9.2. Future Work 1. The approach presented in this investigation can be extended to investigate different types of reaction channels and mechanisms that occur during diamond film growth. 2. The neural network concept applied here can be extended to investigate the event probabilities of other types of growth species, such as CH3, C2H2, and C2H4. 3. With a slight modification to the neural network used in this study, the effects of the type of substrate used, lattice plane(s), temperature and pressure effects on event probabilities, reaction channels, and growth rates can be investigated to determine the optimum temperature and pressure conditions, and appropriate crystal planes for achieving high growth rates and better quality of the films deposited. 96 4. By training the network with more data sets for a particular event, say for example, insertion or hydrogen abstraction, the network can be strengthened to predict the probabilities of such rarely occurring events with high accuracy and less time. 5. The neural network approach used in this study can also be successfully applied to investigate the reaction channels leading to the growth of other thin films, such as polycrystalline silicon and gallium arsenide. 97 REFERENCES 1. Bundy, F.P, Hall H.T., Strong, H.M. and R.H. Wentorf, Jun. “Man – made diamonds,” Nature 176 (1955). 17. 2. Mitura, S., “Nucleation of diamond powder particles in an RF methane plasma,” J. Cryst. Growth. 80 (1987) 417424.N 3. Howard, W., Huang, D., Yuan, Frenkiach, M., Spear, K.E., Koba, R. and A.W. Phelps, “Syntehsis of diamond powder in acetylene oxygen plasma,” J. Appl. Phys. 68 (1990) 12471251. 4. Frenkiach, M., Howard, W., Huang, D., Yuan, J., Spear, K.E., and R. Koba, “Induced nucleation of diamond powder,” Appl. Phys. Lett. 59 (1991) 546548. 5. Buerki, P. R. and Samuel Leutwyler, “Homogeneous nucleation of diamond powder by CO2 –laserdriven gasphase reactions,” J. Appl. Phys. 69 (1991) 37393744. 6. Choy, K.L., “Chemical vapor deposition of coatings,” Progress in Materials Science 48 (2003) 57170 7. Celli, F.G., “Diamond chemical vapor deposition,” Annu. Rev. Phys. Chem. 42 (1991) 643684. 98 8. Yang, S.W., Xie, X., Wu, P. and K.P. Loh, “Chemisorption of C2 biradical and acetylene on reconstructed diamond (111)(2 x 1),” J.Phys. Chem. B. 107 (2003) 985993. 9. Hukka, T.I. and T.A. Pakkanen, “Chemisorption of hydrogen on the diamond (100)2 X 1 surface: An ab Initio study,” J. Phys. Chem. 98 (1994) 12420 12430. 10. Rettner, C.T., DeLouise, L. A. and D.J. Auerbach, “Effect of incidence kinetic energy and surface coverage on the dissociative chemisorption of oxygen on W (110),” J. Chem. Phys. 85 (1986) 11311149. 11. Dyson, A.J. and P.V. Smith, “A molecular dynamics study of the chemisorption of C2H2 and CH3 on the Si(001)(2 X 1)surface,” Surface Science 375 (1997) 4554. 12. Komanduri, R. and L.M. Raff, “A review on the molecular dynamics simulation of machining at the atomic scale,” Proc. Instn. Mech. Engrs. Part B 215 (2001) 16391672. 13. Brenner, D. W., Shenderova, O. A., Harrison, J. A., Stuart, S. J., Ni, B., Susan B Sinott, “A secondgeneration reactive empirical bond order (REBO) potential energy expression for hydrocarbons “, J. Phys.: Condens. Matter 14 (2002) 783802. 14. Tersoff, J., “Modeling solidstate chemistry: Interatomic potentials for multicomponent systems,” Phys. Rev. B. 39 (1989) 55665568. 99 15. Brenner, D. W., “Empirical potential for hydrocarbons for use in simulating the chemical vapor deposition of diamond films,” Phys. Rev. B. 42 (1990) 94589471. 16. Verlet, L., “Computer “Experiments” on classical fluids. II. equilibrium correlation functions,” Phy. Rev. 165 (1968) 201214. 17. Beeman, D., “Some multistep methods for use in molecular dynamics calculations,” J. Comp. Phys. 20 (1976) 130139. 18. Allen, M.P. and D.J., Tildesley, “Computer simulation of liquids,” (Oxford University Press, 1989). 19. Zhou, D., Gruen, D.M., Qin, L.C., McCauley, T.G. and A.R. Krauss, “Control of diamond film microstructure by Ar additions to CH4/H2 microwave plasmas,” J. Appl. Phys. 84 (1998) 19811989. 20. Spitsyn, B. V., Bouilov, L. L. and B.V. Derajaguin, “Vapor growth of diamond on diamond and other surfaces,” Journal of Crystal Growth 52 (1981) 219 226. 21. Reinke, P., Kania, P., P.Oelhafen, “Investigation of the nucleation mechanism in biasenhanced diamond deposition pn silicon and molybdenum,” Thin Solid Films 270 (1995) 124129. 22. Kurihara, K., Sasaki, K., Kawarada, M. and Nagaaki Koshino, “High rate synthesis of diamond by dc plasma jet chemical vapor deposition,” Appl. Phys. Lett. 52 (1988) 437438. 23. Cappelli, C.A. and P.H. Paul, “An investigation of diamond film deposition in a premixed oxyacetylene flame,” J. Appl. Phys. 67 (1990) 25962602. 100 24. DeVries, R.C., “Synthesis of diamond under metastable conditions,” Ann. Rev. Mater. Sci. 17 (1987) 161187. 25. Philip, J., Hess, P., Feygelson, T., Butler, J.E., Chattopadhyay, S., chen, K.H. and L.C. Chen, “Elastic, mechanical, and thermal properties of nanocrystalline diamond films,” J. Appl. Phys. 93 (2003) 2164  2171. 26. Sekaric, L., Parpia, J.M., Craighead, H.G., Feygelson, T., Houston, B.H. and J.E. Butler, “Nanomechanical resonant structures in nanocrystalline diamond,” Appl. P
Click tabs to swap between content that is broken into logical sections.
Rating  
Title  Molecular Dynamics (MD) Simulations of Chemical Vapor Deposition (CVD) of Carbon Dimer on a Diamond (100) Surface and Application of Neural Networks (NN) for Event Probability Predictions 
Date  20050701 
Author  Abdul Samadh, Abdul Nizam 
Keywords  MD Simulation, Neural Network, CVD, Carbon Dimer 
Department  Mechanical Engineering 
Document Type  
Full Text Type  Open Access 
Abstract  Carbon dimers are found to be an important growth species in the growth of nanocrystalline diamond (NCD) through CVD process. Events, such as chemisorption, reflection, and desorption occur during the deposition of carbon dimers on to the substrate on which the diamond films are to be grown. The probabilities of each of these events have a significant effect on diamond growth. Molecular Dynamics (MD) simulations are widely used to predict the probabilities of such events. Though, MD simulations give agreeable results with experimental values, the calculation of the effect of different input parameters on various events involve time consuming numerical methods and hence the process is cumbersome. In this study, initially MD simulations of carbon dimer deposition on diamond (100) surface were performed using a many body empirical potential and the probabilities of the aforesaid events were calculated by varying the input conditions. This information was used to implement Neural Networks (NN) to predict the probabilities of the events. The neural network was also used to predict the underlying relationship between various input parameters and event probabilities. Neural Network could be effectively used to predict the relationship between the input parameters and event probabilities in minutes as compared to days taken by MD simulations. The chemisorption probability is found to decrease with increase in the incidence angle (θ), impact parameter (b), and rotational energy (ERot) of the carbon dimer, but the scattering and desorption probabilities are found to increase these three parameters. The chemisorption probability is found to increase with an increase in the translational energy (ETrans) of the dimer, but the scattering and desorption probabilities are found to decrease with the translation energy (ETrans). The three event probabilities are found to be independent of the rotation angle (Φ) of the carbon dimer. 
Note  Thesis 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
Transcript  Molecular dynamics (MD) Simulations of Chemical Vapor Deposition (CVD) of Carbon Dimer on a Diamond (100) Surface and Application of Neural Networks (NN) for Event Probability Predictions By ABDUL NIZAM ABDUL SAMADH Bachelor of Science University of Madras Chennai  600005 Tamilnadu, India 2001 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE July, 2005 ii Molecular Dynamics (MD) Simulations of Chemical Vapor Deposition (CVD) of Carbon Dimer on a Diamond (100) Surface and Application of Neural Networks (NN) for Event Probability Predictions Thesis Approved: Dr. R. Komanduri Thesis Advisor Dr. L. M. Raff Dr. M.T. Hagan Dr. H. B. Lu Dr. A. Gordon Emslie Dean of the Graduate College iii SUMMARY Carbon dimers are found to be an important growth species in the growth of nanocrystalline diamond (NCD) through CVD process. Events, such as chemisorption, reflection, and desorption occur during the deposition of carbon dimers on to the substrate on which the diamond films are to be grown. The probabilities of each of these events have a significant effect on diamond growth. Molecular Dynamics (MD) simulations are widely used to predict the probabilities of such events. Though, MD simulations give agreeable results with experimental values, the calculation of the effect of different input parameters on various events involve time consuming numerical methods and hence the process is cumbersome. In this study, initially MD simulations of carbon dimer deposition on diamond (100) surface were performed using a many body empirical potential and the probabilities of the aforesaid events were calculated by varying the input conditions. This information was used to implement Neural Networks (NN) to predict the probabilities of the events. The neural network was also used to predict the underlying relationship between various input parameters and event probabilities. The computational time for the prediction of the events using molecular dynamics is generally several days while implementation of neural networks reduces it to mere minutes. The functional relationship between various input parameters and event probabilities predicted by NN is found to agree well with the MD simulation results. iv ACKNOWLEDGMENTS Though myriad events happen in one’s life span, only a few remain precious and evergreen. Writing a thesis is one such event. I take immense pleasure in expressing my sincere thanks to all those who made this work a successful one. First, I would like to convey my hearty thanks to my parents and sisters for all their love, support, encouragement and prayers that made me to reach new pinnacles in every walk of my life. I would like to express my sincere thanks to the National Science Foundation (NSF) for lending their support to this project. I wish to express my deepest gratitude to my advisor, Dr. Ranga Komanduri, for believing in my abilities and offering me a chance to work on this project. His passion for research and exploring new areas will forever be remembered. Unfortunately, I have no amount of words to convey my appreciation to him for his guidance, patience and support throughout my graduate career. I am beholden to Dr. Lionel M. Raff, Dr. Paras M. Agrawal and Dr. Martin T. Hagan, for all their efforts and precious suggestions that made me to overcome many barriers during this project. Their ingenuity and ability to explain even the most intricate things in a simple manner have always fascinated me. v I am also grateful to Dr. Hong Bing Lu for being kind enough to serve on my thesis committee. I would like to thank Dr.Satish Bukkapatnam for his suggestions and ideas during the MD group meetings. To my friend, Bala subramanian, thank you for all the pains and efforts you have taken in making me achieve my goal. Your ability in problem solving and passion for learning new things have always mesmerized me. I would also like to thank my friends Ramya, Siva and Freeman for their encouragement, time and help. Thanks to Rutu, Milind, Krishnaveni and other members of our MD group, for their useful suggestions and discussions. Finally I would like to thank all my colleagues and friends who made my graduate career program a pleasant and memorable one. vi TABLE OF CONTENTS Chapter Page 1. INTRODUCTION 1 1.1. Chemical Vapor Deposition 1 1.2. Molecular Dynamics Simulations 3 1.3. Neural Networks 4 2. ESSENTIALS OF MOLECULAR DYNAMICS SIMULATIONS 8 2.1. Introduction 8 2.2. Procedure for conducting MD Simulation 8 2.3. Interatomic Potentials 9 2.4. Time Integration Algorithm 16 2.5. Advantages and Limitations of MD Simulations 18 3. ESSENTIAL OF NEURAL NETWORKS 21 3.1. Introduction 21 3.2. Evolution of neural networks: Milestones and Developments 22 3.3. Neural network components and parameters 23 3.3.1. Neuron 23 3.3.2. Weights and Biases 24 3.3.3. Transfer functions 25 3.4. Neural Network Classification 27 vii 3.4.1. Single layer neural network 27 3.4.2. Multilayer network 27 3.5. Neural network learning and testing 28 3.6. Feed forward neural networks 29 3.6.1. Architecture of the feed forward neural network 29 3.6.2. Working of the feed forward network 30 3.7. Learning Algorithms 31 3.7.1. LeastMeanSquared (LMS) rule 31 3.7.2. Back propagation algorithm 33 3.7.3. LevenbergMarquardt algorithm 37 4. LITERATURE REVIEW 39 5. PROBLEM STATEMENT 54 6. MOLECULAR DYNAMICS SIMULATION OF CARBON DIMER (C2) DEPOSITION ON DIAMOND (100) SURFACE 56 6.1. Computational Model 56 6.2. Parameters of Interest 59 6.2.1. Incident polar angle 59 6.2.2. Rotation angle 61 6.2.3. Impact parameter 61 6.2.4. Translational energy of the carbon dimer 62 6.2.5. Rotational energy of the carbon dimer 63 6.3. Predominant events in CVD dimer deposition 64 6.3.1. Chemisorption 65 6.3.2. Scattering 67 viii 6.3.3. Desorption 68 7. NEURAL NETWORKS (NN) FOR EVENT PROBABILITY PREDICTION 70 7.1. Architecture and working of the neural network 70 7.2. Implementation of the neural network 71 8. PREDICTION OF EVENT PROBABILITIES NEURAL NETWORK Vs MD SIMULATIONS 73 8.1. Data points generation for neural networks 73 8.2. Training and testing of the event probability Neural network 74 8.3. Effect of input parameters on event probabilities: Neural Network Vs MD predictions 78 8.3.1. Effect of Incidence angle 78 8.3.2. Effect of Rotation angle 80 8.3.3. Effect of Impact Parameter 82 8.3.4. Effect of Translational energy of the dimer 83 8.3.5. Effect of Rotational energy of the dimer 85 8.4. Statistical uncertainty : Neural network Vs MD 86 8.5. More results from neural networks 88 9. CONCLUSIONS AND FUTURE INVESTIGATIONS 94 9.1. Conclusions 94 9.2. Future Work 95 REFERENCES ix LIST OF TABLES Table Page 2.1. Parameters for carboncarbon pair terms 12 2.2. Parameters for the angular contribution to the carbon bond order 13 2.3. Parameters needed for carboncarbon cubic spline 15 2.4. LJ potential parameter values 16 x LIST OF FIGURES Figure Page 3.1. Single input neuron 24 3.2. Tan sigmoid transfer function 26 3.3. Purelin transfer function 26 3.4. Single layer neural network 27 3.5. Multilayer neural network 28 3.6 . Feed forward neural network 29 4.1. Six types of chemisorption configurations on diamond (001)  (2 x 1) surface. 45 6.1. Simulation model and carbon dimer (C2) 58 6.2. Top three layers of atoms of diamond (100) substrate and radical site 58 6.3. Sketch showing input variables for the trajectory calculations 60 6.4. Distribution of incidence angle 60 6.5. Distribution of rotation angle 61 6.6. Distribution of impact parameter 62 6.7. Distribution of translational energy of the dimer 63 6.8. Distribution of rotational energy of the dimer 64 6.9. Variation of systems potential energy, Z coordinate of COM of the xi dimer, distance between carbon atoms of the dimer (chemisorption event) 66 6.10. Variation of systems potential energy and Z coordinate of COM of the dimer (scattering event) 67 6.11. Variation of systems potential energy and Z coordinate of COM of the dimer (desorption event) 69 8.1. Neural network training and testing plots for the probability of chemisorption for one neural network 75 8.2. Neural network training and testing plots for the probability of scattering for one neural network 76 8.3. Neural network training and testing plots for the probability of desorption for one neural network 77 8.4. Effect of incidence angle on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 79 8.5. Effect of rotation angle on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 81 8.6. Effect of impact parameter on chemisorption, scattering, and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty xii in the MD results. 82 8.7. Effect of translational energy on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 84 8.8. Effect of rotational energy on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 85 8.9. Comparison of statistical uncertainty in neural network and MD 87 8.10. Effect of impact parameter on event probabilities for various translational energies of the dimer Neural network predictions 89 8.11. Effect of translational energy on event probabilities for various Incidence angle of the dimer Neural network predictions 90 8.12. Effect of rotation angle on event probabilities for various translational energies of the dimer Neural network predictions 91 8.13. Effect of incidence angle on event probabilities for various Impact parameters  Neural network predictions 92 8.14. Event of rotational energy on event probabilities for various Impact parameters  Neural network predictions 93 1 CHAPTER 1 INTRODUCTION 1.1. Chemical Vapor Deposition Diamond is the hardest material known. Its unique mechanical, chemical, and electrical properties make it not only one of the most scientifically and technologically valuable material but also one of the most fascinating material known to researchers. It was in the mid1950s that diamond was first successfully synthesized on a commercial scale using high pressure, high temperature (HPHT) techniques [1]. Efforts were made about the same time to grow diamond directly from gases, but since the growth rates were extremely low, the vaporphase deposition of diamond was not assigned much importance. In the early 1980’s, when it was shown that growth rates in the range of a few μm/hour can be obtained using vapor phase deposition [25], this technique became an area of general attraction and exploration among researchers. Chemical Vapor Deposition (CVD) is a vapor phase deposition technique in which the gaseous reactants undergo chemical reaction in an activated environment, such as plasma, leading to the formation of stable solid product, such as diamond powders or thin diamond films on the surface of the heated 2 substrate. In a CVD diamond film growth, some of the widely used gaseous species include CH3, C2H2, C2H4, and C2. The CVD process has a wide range of advantages over other thinfilm deposition processes. A few such advantages are the ability to produce highly dense and uniform films with good reproducibility and adhesion, reasonable deposition rates and processing cost, ability to control surface morphology and orientation of the films obtained by controlling the CVD parameters, ability to adjust the deposition rates easily and flexibility of using a wide range of chemical precursors, such as halides, hydrides which enable the deposition of wide variety of films apart from the diamond films [6]. The diamond produced by the CVD process is comparable in purity and properties to HPHT or natural diamond that makes it a potential candidate for numerous applications. The CVD produced diamond films are used as coatings on cutting tool inserts to enhance the tool life, protective windows or optical coatings with high transmittance in the visible and infrared region, as shadow mask supports in xray lithography of electronic components [7], etc. The process of diamond film growth by CVD process involves a number of complex reaction mechanisms taking place between the surface atoms of the substrate and the gaseous species. There are a few elementary reactions, such as chemisorption, insertion, scattering, and desorption that serve as the building blocks for the complex reactions leading to thindiamond film growth. Hence, the investigations of these preliminary reaction events, their probabilities, various parameters 3 affecting the probabilities of their occurrence have gained significant importance [811]. Molecular dynamics simulations is one of the most powerful and widely used tools for investigating such reaction mechanisms and reaction events occurring during diamond film growth in a CVD process. 1.2. Molecular Dynamics Simulations Molecular dynamics (MD) simulation, as the term indicates, deals with simulating the behavior of a system at the atomistic level under given processing conditions. In MD simulations, the system is represented as an ensemble of atoms. In such simulations, we make use of a potential energy function which gives the potential energy experienced by each atom due to its position relative to that of its neighbors. From this potential energy, we can determine the force experienced by each atom as time progresses. Molecular dynamics simulation is a deterministic approach, where, once the current positions of the atoms as well as the forces acting on the atoms due to their neighbors are known then the positions of the atoms after a very small time increment, (usually in the orders of femtosecond) can be easily evolved by integrating the Newtonian equations of motion using suitable time integration algorithms. In spite of its high computational cost, today MD simulations serve as a powerful tool in the study of nanometric cutting [7071], different types of fracture mechanism [72], filmgrowth mechanisms [7374], friction and surface property studies [75], and biomaterial engineering [76]. Carbon dimers are found to be an important reaction species in the growth of nanocrystalline diamond films using CVD process [2730]. Therefore the 4 investigation of various reaction channels and mechanism that occur when carbon dimer is used as growth species is always of immense interest among researchers [8]. In this study, MD simulations have been used to study the probabilities of various elementary gas phase reactions, such as chemisorption, desorption, insertion and scattering occurring during diamondfilm growth in a CVD process when carbon dimmers (C2) are used as the growth species. The results from the MD simulations are used to train a neural network (NN) and then that neural network is used to predict the probabilities of various events. 1.3. Neural Networks A neural network (NN) is an artificial network of neurons that mimics or emulates the real network of neurons present in the human brain. Though the neural networks are not as sophisticated as the networks present in the human brain, they have the capability to predict many complex underlying functions between variables of any particular process or event. A neural network basically consists of a number of artificial neurons or nodes, typically arranged in layers, interconnected through a set of links. Each link multiplies its input by a suitable parameter called the weight before supplying it to the next neuron. Each neuron sums over its input and passes the output, which is a weighted sum of the input and the bias, to a suitable transfer function. The output from this transfer function is the final output. This output can be made the inputs to the next layer of neurons. The network used in this study is a multilayered feedforward network, which has two layers, of which the first layer 5 is called a hidden layer because its input and output are not available to the outside world. The hidden layer is associated with a tan sigmoid transfer function that gives the network the ability to learn the linear and nonlinear relations that exist between the input and outputs. It also makes them an ideal choice for generalization and event probability predictions. We have two stages in the implementation of a neural network. The first stage is the learning stage and the second stage is the testing or the production stage. During the learning stage, the neural network is provided with a set of input data as well as the corresponding output data and the network is made to learn by examples. The difference between the neural network output and the actual target output is used to determine the error which is used in strengthening the network so that its subsequent predictions are better. Once the error from neural network prediction has been sufficiently reduced, the network is assumed to have completed its learning phase. The network is then subjected to the next stage, namely, the testing stage. In this, the neural network is provided with a set of input data. Corresponding output data are not given to the network. The network is allowed to make its choice or prediction based on its previous experience in encountering such data during the training. If the network is able to predict the corresponding output data correctly, then it indicates that the network has been properly trained and has attained the ability. In this study we have used neural networks to predict the probabilities of various events occurring in a CVD process when a carbon dimer (C2) is used as the species for diamond film growth. The input parameters, namely, incidence 6 angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans), and rotational energy (ERot) form the input to the neural network. The probabilities of events, such as chemisorption, scattering, and desorption form the output of the neural network. The network is first trained by supplying the input parameters and the corresponding event probabilities. After the neural network has been trained well, the network is made to predict the event probabilities for a given set of input parameters. The conventional approach adopted so far by chemists is to use MD simulations for predicting the event probabilitities for a given set of input conditions [3233]. But, the drawback of MD simulations is that it involves computationally intensive and time consuming numerical integration algorithms. So, the exploration of the entire range of input parameters becomes a very difficult task, and also the time taken increases with the number of atoms comprising the system. It will be shown in this investigation that using neural networks the computational time for determining the effect of various input parameters on event probabilities can been reduced from hours to mere minutes. Also, this procedure is independent of the number of atoms comprising the system, thereby giving us an opportunity to explore a wide range of input data values. In this investigation, we initially ran MD trajectories for different sets of input conditions and determined the event probabilities which were then used for training the neural network. Chapter 2 will cover the empirical potentials used, the time integration algorithms, and the advantages and limitations of MD 7 simulations. Chapter 3 deals with the basic components of neural networks, their classifications, and stages in the implementation of neural network, different training algorithms used. Chapter 4 reviews the literature on various kinds of species employed in a CVD diamond film growth, MD simulations performed on various surfaces, such as diamond and silicon surfaces, time involved in such studies, and how neural networks have been effectively used in a CVD thin film growth processes. Chapter 5 discusses the drawbacks of using MD simulation for event probability predictions and the solution to overcome the present situation. Chapter 6 deals with the distributions for the five input parameters, and the event probabilities considered in this investigation. Chapter 7 presents the architecture of the network employed in this investigation and the implementation of the neural network. Chapter 8 deals with the neural network predictions of various event probabilities for different sets of input conditions and the comparison of neural network prediction with MD simulation results. Also this chapter discusses the statistical error involved in MD simulations as against the error in neural network predictions. Chapter 9 presents the conclusions based on the results of this investigation and proposes future investigations that can be carried over using the current neural network technique that has been implemented in this investigation. 8 CHAPTER 2 ESSENTIALS OF MOLECULAR DYNAMICS (MD) SIMULATIONS 2.1. Introduction In molecular dynamics (MD) simulations, the entire workpiece is represented as an ensemble of atoms. Given the initial positions and forces acting on the atoms, the subsequent positions and forces on the atoms can be evolved over time by integrating Newton’s equations of motion using a suitable time integration algorithm [12]. This chapter mainly focuses on the basic procedures for carrying out the molecular dynamics simulation, the advantages and disadvantages of MD, the interaction potential, and the time integration algorithm used in this study. 2.2. General Procedure for conducting MD simulations In molecular dynamics simulations, all the atoms in the system are considered as point masses. The initial position of the atoms are selected based on the structure of the system under consideration. The detailed procedure for carrying out MD simulation is as follows: 1. Based on the initial position of each atom with respect to its neighboring atoms, the potential energy (V ) experienced by the system at an initial time, say t0, is determined using an empirical potential. 9 2. The forces ( F ) acting on an atom are then determined by taking the derivative of the potential (V ) with respect to the position ( r ) of the atom, dr dV F = −ÑV = − . (2.1) 3. Newton’s second law can be mathematically expressed as follows F = ma , (2.2) where ‘m ’ is the mass of the atom and ‘ a ’ is its acceleration. The acceleration of the atom can be determined from Eqns. (2.1) and (2.2). 4. Once the acceleration of the atom at time ‘t0’ is known, the new velocity ( ) new v and the new position ( ) new r of the atom, after an infinitesimal time period of d t, can be calculated using the following equations new initial v = adt + v , (2.3) d new initial new r = r + v t , (2.4) where initial v represents the initial velocity and initial r the initial position of the atom. The infinitesimal time period used in MD simulations is usually on the order of a few femto seconds. 5. Finally, the atoms are displaced to their corresponding new positions calculated above and again the steps stating from 1 through 5 are repeated to monitor the evolution of the system with time under the given operating conditions. 2.3. Interatomic Potentials Interaction potentials form the main ingredient of MD simulations. A potential is a function of relative positions of atoms with respect to each other, 10 representing the potential energy of the system for a given configuration of the atoms comprising the system. The interatomic potential functions are both rotationally and translationally invariant. These functions are usually derived empirically and hence, are known as empirical potentials. A number of potentials exist today, some to mention are the Morse potential, StillingerWeber potential, Tersoff potential, Lennard Jones potential, and Brenner Potential [1314]. The potential used in this study for short range interaction is a manybody potential given by Brenner et al. [13]. The potential, V can be written as a sum over atomic sites i, = i i V E 2 1 , (2.5) where each contribution of i E is given by i E = ¹ − ( ) [ ( ) ( )] j i R ij ij A ij V r B V r . (2.6) In Eqn. (2.6), the summation is over the nearest neighbors j of atom i, excluding atom i, ij B is the many body coupling term between the bond from atom i to atom j and the local environment of atom i, V (r ) R and V (r ) A represents the pairadditive repulsive and attractive interactions. Eqn. (2.5) given by AbellTersoff [14] can realistically describe carboncarbon single, double, and triple bond lengths and energies in hydrocarbons and in solid graphite and diamond. However, the problem with this expression is that the assumption of nearneighbor interactions combined with the sum over atomic sites results in nonphysical behavior in the case of intermediate bonding situations. Nonphysical behavior arises again when conjugated and nonconjugated bonds are examined. 11 The aforesaid problems have been overcome by rewriting Eqns. (2.5) and (2.6) as follows > = − ( ) [ ( ) ( )] j i ij A ij ij R i V V r b V r . (2.7) Eqn. (2.7) represents the potential given by Brenner et al. [13] and is a modified form of AbellTersoff [14] potential function. Here, V represents the interaction potential, ( ) ij R V r and ( ) ij A V r are pairadditive interactions that represent all interatomic repulsions (corecore) and attraction, ij r is the distance between pairs of nearestneighbor atoms i and j, and ij b is the bond order between atoms i and j and is conveniently represented as follows s p s p p ij ij ji ij b = b + b + b [ − − ] 2 1 . (2.8) Values of s −p ij b and s −p ji b depend on the local coordination and bond angles for atom i and j. The term p ij b can be expressed as DH ij RC ij ij b = p + b p . (2.9) The terms indicating the interatomic repulsions and attractions are given by R c r V r f r Q r Ae −a ( ) = ( )(1 + / ) , (2.10) r n n V A r f c r B e n − b = = 1 , 3 ( ) ( ) . (2.11) 12 Table 2.1. Parameters for carboncarbon pair terms used in Eqns. (2.10) and (2.22). B1 = 12388.79197798 eV 1 = 4.7204523127 Å−1 Q = 0.3134602960833 Å B2 = 17.56740646509 eV 2 = 1.4332132499 Å−1 A = 10953.54416217 eV B3 = 30.71493208065 eV 3 = 1.3826912506 Å−1 A = 10953.544162170 eV Dmin = 1.7 Dmax = 2.0 The parameters used for the carboncarbon pair terms in Eqns. (2.10), (2.11), and (2.22) are given in Table 2.1. The first term in Eqn. (2.8) is given by ¹ − = + + − ( , ) [1 ( ) (cos( )) ( , )] 1 / 2 k i j H i C ik ijk ij i c ij ik b f r G e P N N ijk s p l q . (2.12) The subscripts i and j refer to the atom identity, the function P represents a bicubic spline. The function f (r) C ensures that the only nearest neighbors are included in the interactions. It limits the range of covalent interactions. C i N and H i N represent the number of carbon and hydrogen atom neighbors of atom i and are represented as: ¹ = carbon atoms ( , ) ( ) k i j ik c ik C i N f r , (2.13) ¹ = Hydrogen atoms ( , ) ( ) l i j il c il H i N f r . (2.14) The values of l and function P are taken to be zero for solid state carbon. An expression for s −p ji b can be obtained by interchanging the subscripts in Eqn. (2.12). The function (cos( )) ijk G q in Eqn. (2.12) controls the contribution each 13 nearest neighbor makes to the empirical bond order according to the cosine of the angle of the bonds between atoms i and k and atoms i and j. The parameters for the angular contribution to the carbon bond order are given in Table 2.2. Table 2.2. Parameters for the angular contribution to the carbon bond order q (rad) G(cos(q)) dG/d(cos(q)) d2G/d(cos(q))2 g(q) 0 8   1 p/3 2.0014   0.416335 p/2 0.37545   0.271856 0.6082p 0.09733 0.4 1.98  2p/3 0.05280 0.17 0.37  p 0.001 0.104 0.00  The term RC ij p in Eqn. (2.9) represents the influence of radical energetics and p bond conjugation on the bond energies. This term takes care of correctly describing the radical structures in diamond and accounts for nonlocal conjugation effects in graphite and benzene. This term was absent in the first generation form of the Brenner Potential [15]. The term RC ij p is taken as a tricubic spline F ( , , conj ) ij t j t ij i RC ij p = F N N N , (2.15) that depends on the total number of neighbors of bonded atoms i and j, as well as a function conj ij N that depends on local conjugation. The term t i N represents the coordination of atom i given by 14 H i C i t Ni = N + N . (2.16) The function conj ij N is represented as ¹ ¹ = + + carbon carbon ( , ) 2 ( , ) 1 [ ( ) ( )] 2 [ ( ) ( )] l i j jl jl c jl k i j ik ik c ik conj ij N f r F X f r F X , (2.17) where ( ) = 1, < 2 ik ik F x x ( ) = [1+ cos(2 ( − 2))] / 2, 2 < < 3 ik ik ik F x p x x (2.18) ik ik F ( x ) = 0, 3 < x and ( ) ik c ik t ik k x = N − f r . (2.19) The value of conj ij N becomes 1 if all the neighbors bonded to a pair of carbon atoms i and j have four or more neighbors and the bond between these atoms is considered to be part of a conjugated system. conj ij N becomes greater than 1, if the coordination number of the neighboring atoms decrease, indicating a conjugated bonding configuration. The term DH ij b in Eqn. 2.9 is given by ¹ ¹ = − Q ( , ) ( , ) ( , , )[ (1 cos 2 ( )) ( ) ( )] k i j l i j jl c ik jl c ijkl ik conj ij t j t ij i DH ij b T N N N f r f r , (2.20) where ijkl jik ijl Q = e e . (2.21) 15 The function ( , , conj ) ij t j t ij i T N N N is a tricubic spline, and the functions jik e and ijl e are unit vectors in the direction of the cross products ji R x ik R and ij R x jl R , respectively, where the R ’s are vectors connecting the subscripted atoms. Table 2.3. gives the values needed for the carboncarbon cubic spline T in Eqn. 2.20. All function values and derivatives not given in the table are equal to zero. Table 2.3. Parameters needed for carboncarbon cubic spline T in Eqn. (2.20). i j k T( i, j, k) Fitting data/Structure 2 2 1 0.070280085 Ethene 2 2 9 0.00809675 Solidstate structure The entire parameterfitting method discussed above was made much easier by assuming only nearestneighbor interactions. However, the best way to define this for a continuous function is problematic. The value of f (r) c ij is defined by a switching function of the form min ( ) 1 , r r D ij c f ij = < [ min max min ] min max 1 cos(( ) /( )) , 2 1 ( ) r r Dij Dij Dij Dij r Dij c fij = + − − < < max ( ) 0 , r r D ij c f ij = > (2.22) where max min ij ij D − D defines the distance over which the function goes from one to zero. 16 The Van Der Waal (VDW) interaction between the atoms comprising the system is taken care of by the LJ (126) potential. The LJ (126) potential has the following form − = Î 6 12 ( ) 4 r r V r s s . The well depth Î and the equilibrium separation r are the only adjustable parameters in the LJ (126) potential. In this investigation, the values of the well depth and equilibrium separation for the VDW interaction between two atoms are given in Table 2.4 Table 2.4. LJ potential parameter values Atoms Type Well depth (meV) Equilibrium separation (Å) Carbon  Carbon 4.412 2.28 Carbon  Hydrogen 1.806 2.54 Hydrogen  Hydrogen 0.740 2.81 2.4. Time Integration Algorithm In MD simulations we integrate the equations of motion over a given time period using numerical integration techniques. These time integration algorithms are based on finite difference methods. Here, the total time is discretized into a finite number of equal time intervals or time steps given byDt , which is 0.5 fs in the present investigation. If the positions and their time derivatives at time t are known, the integration algorithm gives the same quantities at a later time, t+Dt . The integration algorithms can show the evolution of the system with time by 17 repeating the procedure mentioned above. Currently, there exist a number of integration algorithms, such as the Verlet algorithm [16], Beeman algorithm [17]. The algorithm used in the present investigation for integrating the equations of motion is the “Gear PredictorCorrector” algorithm [18]. The algorithm consist of two parts, namely, the predictor part and the corrector part. Predictor Part: If we know the position r, velocity v, acceleration a, and some other time derivatives up to a certain degree q at a given time t, the Taylor expansion can be used to predict the values of these quantities at time t+Dt . The newly predicted values of the position, velocity, acceleration and the rate of change of acceleration after a time interval Dt is given by: rp(t+dt) = r(t) + dt v(t) + (1/2) (dt)2 a(t) + (1/6)(dt)3b(t)+… , (2.23) vp(t+dt) = v(t) + dt a(t) + (1/2) (dt)2 b(t) + … , (2.24) ap(t+dt) = a(t) + dt b(t) + … , (2.25) bp(t+dt) = b(t) + … , (2.26) where rp, vp, ap, and bp represents the position, velocity, acceleration, and rate of change of acceleration after a time interval of Dt from the initial time interval t. Force Calculation: The force on the atom is calculated by taking the derivative of the potential with respect to the position of the atom and is given by p p p i dr dV r F V ( ) = −Ñ = − . (2.27) 18 From the force, we can calculate the acceleration using Eqn. (2.2). The resulting value of acceleration will be different from that predicted by the above Taylor’s expression. The difference between these two values constitutes an “error signal” given by Da (t+dt)=ac(t+dt)  ap(t+dt) . (2.28) Corrector Part: The “error signal” along with certain other coefficients is used to correct the values of the position, velocity, and acceleration predicted by the predictor method. The corrected values of the position, velocity, acceleration, and the rate of change of acceleration at time t+Dt are given by rc(t+dt) = rp(t+dt) + c0 Da (t+dt) , (2.29) vc(t+dt) = vp(t+dt) + c1 Da (t+dt) , (2.30) ac(t+dt) = ap(t+dt) + c2 Da (t+dt) , (2.31) bc(t+dt) = bp(t+dt) + c3 Da (t+dt) , (2.32) where c0, c1, c2, and c3 are the coefficients of proportionality given by [16]: c0 = 1/6, c1 = 5/6, c2 = 1, and c3 = 1/3. 2.5. Advantages and Limitations of MD simulations Molecular dynamics simulation is one of the powerful tool widely used in the study of very complex reaction mechanisms. Some of the strengths of MD simulation that has made it to take a leadingedge method over other techniques are the following [12]: 1. Molecular dynamics simulations offer a great opportunity to explore the behavior of systems at atomistic and molecular levels. 19 2. MD simulations can be of immense use in simulating experiments that are very costly and difficult to carry in real world. For example, when MD is applied to experiments, such as nanometric machining, the effect of various parameters such as tool geometry and cutting speed can be easily studied at an insignificant fraction of the cost. 3. When simulations are running, the human involvement required is almost not there unlike the real world experiments where utmost care has to be taken when the experiments are in progress. 4. A major advantage of MD simulation is its repeatability. Any particular simulation can be exactly repeated any number of times with the same degree of accuracy. 5. MD simulation is a very deterministic technique, providing complete information, such as potential energy, velocity and force experienced by each and every atom comprising the system at any point of time which can be easily and accurately evolved. Though MD simulation has numerous advantages as stated above, it also has its own limitations, a few of these limitations are given in the following: 1. In MD simulations, results are purely dependent on the forces acting between atoms based on their positions. These forces are obtained by taking the derivative of the empirical potential function. Therefore, the extent to which molecular dynamics simulation can imitate real experiments depends on the ability of the potential function to reproduce the real behavior of the system. 20 2. MD simulations involve integration of Newton’s equations to obtain new velocities and positions of the atoms. The integration involves the use of numerical algorithms that require a very small integration timestep to give accurate results. Therefore, MD simulations are not preferred to simulate processes or reactions that take very long time periods. 3. The computational time and costs involved in MD simulations are significant because Newton’s equations of motion are to be integrated for every atom comprising the system and for each time step. The time and costs increase rapidly with increase in the size of the system considered. 4. When the temperature of the system considered is very low, quantum effects become significant. In such cases MD simulation results have to be interpreted with utmost caution due to the possibility of errors in the potential used. 21 CHAPTER 3 ESSENTIALS OF NEURAL NETWORKS (NN) 3.1. Introduction A neural network (NN) can be considered as a computational system made up of a number of simple and highly interconnected processing elements called nodes, which processes information by its active state of response to external inputs [69]. The structure and working of the human brain serves as the basic inspiration for the invention and development of neural networks[69]. Neural network attempts to emulate the adaptability, intelligent decision making and information processing ability of the brain. The greatest strength of neural networks is adaptive learning. It has the capability to learn, generalize, and reproduce from experience and examples. First, the neural network is trained using a number of examples and then the network is tested to see whether it can interpret new data based on previous experience. Neural networks offer a wide range of advantages, such as adaptive learning, selforganization, fault tolerance, and easy implementation that allow them to take a lead over other approaches. The neural network architecture, terminologies, training algorithms, and methodologies followed in this investigation were adopted from the book Neural Network Design Hagan et al. [69]. The book presents the most useful and 22 practical NN architectures, learning rules and training algorithms in a clear and consistent manner. Various topics of practical importance in the application of neural networks and neural network operations have been well explained in this book [69]. 3.2. Evolution of neural networks: milestones and development Neural networks is a field of recent origin. However, this field has a long history tracing back to periods before the invention of computers and has survived at least one major setback and several decades of oblivion. McCulloch and Pitts [59] used formal logic to create neural network models with simple neurons that were considered as binary devices with fixed thresholds. Their network was used mainly for simple logic functions such as “OR” and “AND”. Farley and Clark [60] created the first computer simulations of neuronal models and used normalization procedures to ensure better operation of their simulation models. Rosenblatt [61], a psychologist, designed and developed a threelayered system known as Perceptron network that exhibited adaptive behavior. Though Rosenbaltt’s design was considered a milestone in the field of neural network, it had some limitations such as inefficiency in solving pattern recognition problems and inability to handle large inputs. Widrow and Hoff [62] developed the ADALINE (ADAptive LINear Element) and MADALINE (Many ADALINEs) networks that employed a learning procedure called LeastMeanSquared (LMS) learning rule. The network operates by attempting to minimize the difference between the observed and desired output. Amari [63] published a mathematical model that served as the basis for error 23 correction methods employed in adaptive pattern classifications. Webros [64] first proposed the backpropagation algorithm that gave rise to backpropagation networks which are basically perceptrons with multiple layers with enhanced robustness and learning rules. Fukushima [65] developed competitive networks called Cognitron and Neocognitron for interpreting the handwritten characters. Klopf [66] developed the “drivereinforcement learning” for artificial neurons. This is similar to neuronal learning called “heterostasis” that occurs in biological neurons. Rumelhart and McClelland popularized the backpropagation algorithm in their book Parallel Distributed Processing [67]. Hopfield and Tank [68] developed the well known autoassociative network, the Hopfield Network, which attracted much attention due to its stability and ease of its fabrication using VSLI technology. Hagan et al. [69] introduced GaussianNewton approximation to Bayesian Regularization (GNBR) algorithm that reduced the cost of implementing the changes in the training algorithm and also produced optimal results with minimum computational time. Today, neural network concepts have been implemented on chips and are emerging as a prime solution to various complex problems representing the dominance of neural network in today’s scientific world. 3.3. Neural network components and parameters 3.3.1. Neuron Biological neurons are the building blocks in the human brain. Likewise the artificial neuron forms the fundamental data processing unit of the neural network. Figure 3.1 shows a simple neuron model. 24 Fig. 3.1: Single input neuron [69] Every neuron is associated with particular inputs, weights, biases, transfer functions, and outputs. The input ‘p’ is first multiplied (weighted) by a suitable weight ‘w ‘, and is passed on to the summer, the summer adds the weighted input ‘wp’ with suitable bias ‘b’ and passes the net input ‘n’ to the transfer function ‘ f ‘, which operates on the net input ‘n’ and produces an output ‘a’. This output can be made to become the input to the next layer of neurons. 3.3.2. Weights and Biases Every input supplied to a neuron is weighted before it is passed on to the summer in every neuron. Weights are a set of numbers associated with each interconnection between neurons in different layers. The weights indicate the strength of the interconnection between a neuron in one layer and another neuron in the next layer of the network. The initial values of the weights are set to zeroes or any small random number and are modified suitably during the network training to get the desired output from the network. Once the network has been completely trained, the final values of the weight matrix are stored and recurrently called during the testing session. 25 The bias can be considered as a threshold value added to the weighted inputs before they are passed on to the transfer function of the neuron. As can be seen from Figure 3.1, the bias has the effect of shifting the center of the transfer function f, while the weight changes the slope. 3.3.3. Transfer functions Every artificial neuron in a neural network is characterized by its transfer function. Two neurons which are fed with the same inputs can produce different outputs depending on the transfer functions to which they are associated. A neuron can take many input signals, multiply by the weights, add the bias, and pass the resulting scalar on to the associated transfer function. The transfer function decides how the neuron will scale its response to the input data, and generates the neuron’s activation. Some of the transfer functions that are of frequent use in the neural network are the following: • Hardlimit transfer function • Linear transfer function • Sigmoid transfer function The neural network used in this study uses a tangent sigmoid (tansig) transfer function in the hidden layer, and a purelinear transfer function in the output layer. The use of these transfer functions allow the network to understand the linear and non linear relationships that exist between it’s input vectors and output vectors. The two transfer functions employed in this investigation are described in the following. 26 a. Hyperbolic tangent sigmoid transfer function (tansig) The hyperbolic tangent sigmoid function, also known as the tan sigmoid transfer function takes input values between − ¥ and + ¥ , produces an output signal between –1 and +1. Fig 3.2: Tan sigmoid transfer function [69] The output is calculated using the equation 1 (1 exp( 2* )) 2 − + − = n a . (3.1) b. Purelinear transfer function (purelin) The purelinear transfer function produces an output, linearly increasing with the input supplied to it. The purelinear transfer function takes the following form a = n . (3.2) Fig 3.3: Purelin transfer function [69] 27 3.4. Neural network classification Neural networks are classified into two major categories, namely, single– layered and multilayered networks based on the number of layers in the network. Multilayer neural networks have sub classifications, such as multilayer feedforward networks and multilayer cooperative networks. The singlelayer networks can also be subdivided into singlelayer laterallyconnected networks and singlelayer topologically ordered networks. 3.4.1. Singlelayer network The single layer neural networks have only one layer of neurons. A singlelayer neural network can have one or more neurons in their single layer and can produce one or more outputs. Figure 3.4 shows a singlelayer neural network, having R inputs and S number of neurons Fig 3.4: Singlelayer neural network [69] 3.4.2. Multilayer network Multilayer neural networks have more than one layer of neurons. In the multilayered feedforward network, which has been used in this study, all neural responses flow in a forward direction through different layers of the network. 28 Figure 3.5 below shows a multilayered feed forward network having a single Fig 3.5: Multilayer neural network [69] hidden layer with a sigmoid transfer function that gives the network the ability to learn linear and nonlinear relations that exist between inputs and outputs and an output layer with purelinear transfer function allowing the neural network to produce values inside the range 1 to +1. The multilayered networks are an ideal choices for nonlinear regression and pattern recognition. 3.5. Neural network learning and testing Basically there are two important stages in the implementation of neural networks for any application, namely, learning and testing. The feed forward network used in this study employs a supervised learning procedure. In supervised learning, the network is provided with a set of input vectors, ‘A’ as well as with corresponding desired output vectors, ‘B’. During the learning process, the network compares its output vector, ‘C’, with the desired output vectors ‘B’ to produce the error percentage. The values of the weight matrix are adjusted so as to decrease the error. A network is said to be trained if its output responses are matching well with the desired outputs with minimum percent error. 29 After the completion of learning, the network is supplied with a testing data set which was never seen by the network during training and the output of the network is compared with the actual output to predict the network performance. 3.6. Feed forward neural networks In the feed forward networks the flow of data always occurs in the forward direction from the input to the neurons in the hidden layers and thereon to the output layers. No information is back propagated during the operation of the network. Generally, the multilayered feed forward networks are associated with one or more hidden layers having sigmoidal functions that allow the network to learn both nonlinear and linear relationships between input and output vectors. 3.6.1. Architecture of the feed forward neural network The multilayered feed forward network shown in Figure 3.6 has a total of three layers of neurons of which the first and second layers are known as the Fig 3.6: Feed forward neural network [69] hidden layers and the third layer which gives the final output of the entire neural network is called the output layer. We used the number of the layer as a superscript for the weights, neurons, biases, net inputs and outputs from each layer. As shown in Figure 3.6 there are s1 neurons in the first layer, s2 neurons in 30 the second layer, and s3 neurons in the third layer. There are R inputs to the network and the weight matrix for the first, second, and third layer are represented as w1, w2, and w3 . 3.6.2. Working of the feed forward neural network The ultimate task of the neural network is accomplished in two stages, namely, the training mode and production or testing mode. During the training mode, a set of examples known as the input vectors and their corresponding desired outputs are given to the neural network. The network is trained to learn the relationship that exists between the inputs and the outputs by using a learning algorithm known as the back propagation algorithm. During the training mode, the network, especially the hidden layer neurons, learn to respond to features and gradually the network develops the ability to generalize. After the network has been trained successfully, the next step is to test the neural network by giving it a set of input vectors that were not included in the sets used for training the network. If the network has been trained properly, it should be able to predict the outputs correctly for the input test data set that were never used during the training mode. In the training mode, the inputs are first passed to the neurons in the first layer. The final output from the neural network is compared with known desired output and error between the actual output of the network and the desired output is determined. The errors are used for adjusting the connection weights associated with the different layers of the network so that the error is progressively decreased in the subsequent prediction of the network. The 31 training process is repeated until the error has been reduced to a minimum value indicating that the network has learned to the maximum extent possible. After the training session, the network is put into the testing mode. During testing, the neural network is supplied with a set of test data which was not used during the training session, the neural network would not be given the corresponding outputs associated with the test data inputs. The neural networks response or the output is monitored and compared with the known desired outputs. If the output predicted by the neural network agrees closely with the known desired output with minimum error, it indicates that the network has been trained properly and has attained the power to generalize. During the testing process, the connection weights associated with the various layers of the neural network remain unmodified. 3.7. Learning Algorithms 3.7.1. LeastMeanSquared (LMS) rule The leastmeansquared (LMS) algorithm is a kind of supervised training algorithm where the neural network is provided with a set of inputs and their corresponding outputs during training. The algorithm works in such a manner as to reduce the mean square error between the actual network output and target output by adjusting the connection weights and biases of the neurons in different layers of the network. The least mean square error is calculated as follows F(x) = E[(t(k) − a(k))2 ], (3.3) Where the target output t(k) is the desired output, a(k) is the network output and E represents least mean squared error of the output that has to be reduced. The 32 least mean squared (LMS) algorithm is also known as WindrowHoff algorithm or the delta rule. The mean square error F(x) can be approximated by F(x) Ù = 2 ( ) e k = (t(k) − a(k))2 , (3.4) where k represents the iteration number. The estimate of the gradient is given by ( ) 2 ( ) Ñ F x @ Ñ e k . (3.5) The partial derivative of e(k) and 2 ( ) e k with respect to the weights for the th k iteration is given by ( ) ( ) ( ) ( ) 1 1, 1, 1, t k w p k b p k w w e k j R i i i j j − = − + ¶ ¶ = ¶ ¶ = , (3.6) j j w e k e k w e k 1, 1, 2 ( ) 2 ( ) ( ) ¶ ¶ = ¶ ¶ , (3.7) and the partial derivatives of 2 ( ) e k with respect to the biases at th k iteration is given by b e k e k b e k ¶ ¶ = ¶ ¶ ( ) 2 ( ) 2 ( ) , (3.8) Similarly, the derivative of e(k) with respect to the bias is given by 1 ( ) = − ¶ ¶ b e k . (3.9) Substituting Eqns. (3.6) and (3.9) in Eqns. (3.7) and (3.8), respectively yields the gradient of the squared error for the th k iteration: Ñ =Ñ = − 1 ( ) ( ) 2 ( ) 2 ( ) p k F x e k e k j . (3.10) 33 The steepest descent algorithm, with constant learning rate is given by k k x xk x x F x + = 1 = − Ñ  a ( ) . (3.11) Substitution of Eqn. (3.10) for Ñ F(x) in the Eqn. (3.11) yields 2 ( ) ( ) x 1 x e k z k k k = + a + , (3.12) or ( 1) ( ) 2 ( ) ( ) 1w k + =1w k + ae k p k (3.13) and b(k +1) = b(k) + 2ae(k) . (3.14) Eqns. (3.13) and (3.14) represent the least mean square algorithm for a single output network. For network with multiple neurons in the outer layer, the LMS algorithm can be written in matrix form as: W (k+1) = W (k) +2 e (k) pT (k) and (3.15) b (k+1) = b (k) +2 e (k) . (3.16) 3.7.2. Backpropagation algorithm The backpropagation algorithm is a generalization of the least mean square algorithm. It employs a generalized delta rule. Most of the multilayered feed forward networks, including the one used in this investigation employs backpropagation algorithm. At the end of forward propagation step, the error between the actual network output and targeted output is calculated and based on this error the weights associated with each neuron in the output layer is changed. In back propagation, the network weights are moved along the negative of the gradient of the performance function. The algorithm is known as back propagation algorithm because the change of weights starts from the output layer and then proceeds backwards until all weights associated with the first layer 34 has been changed in such a way that the error decreases in subsequent predictions of the network. The mean squared error (MSE) in a multilayered network is given by F(x) E[e e] E[(t a) (t a)] e (k)e(k) T T T = = − − @ . (3.17) The error in multilayered networks is an implicit function of connection weights of the hidden layers. Therefore, chain rule has to be used for calculating the gradient for the steepest descent algorithm. The approximate MSE is given by m i j m i j m i j w F w k w k , , , ( 1) ( ) ¶ ¶ + = −a , (3.18) m i m i m i b F b k b k ¶ ¶ ( +1) = ( ) −a . Using the chain rule, we get m i m i m i m i m i j m i m i m i j b n n F b F w n n F w F ¶ ¶ ¶ ¶ = ¶ ¶ ¶ ¶ ¶ ¶ = ¶ ¶ * * , , (3.19) The net input to any layer ‘m’ is a direct function of the weights and bias in that layer. So it is relatively easy to compute the second terms in the above equations. The net input m i n to the layer m is given by and 1 , 1 , 1 1 , 1 = ¶ ¶ = ¶ ¶ = + − = − − m i m m i m j i j m i S j m i m j m i j m i b n a w n n w a b m (3.20) 35 The change in the function F with respect to the change in the th i element of the net input at layer m is given by m i m i n F s ¶ ¶ = . (3.21) Using Eqn. (3.21) for sensitivity in Eqn. (3.19), yields m m i i m j m m i i j s b F s a w F = ¶ ¶ = ¶ ¶ −1 , (3.22) Now the steepest descent algorithm can be conveniently expressed in the following form the using matrix notation m m m m m m m T b k b k s W k W k s a a a + = − + = − − ( 1) ( ) ( 1) ( ) ( 1 ) , (3.23) ¶ ¶ ¶ ¶ ¶ ¶ = ¶ ¶ = m s m m m m m n F n F n F n F s . . . 2 1 , (3.24) where m W , m b and m s are the weight matrix, bias vector, and the sensitivity vector, respectively. As the term back propagation indicates, in this algorithm the sensitivity at layer m is computed using the sensitivity at its succeeding layer m+1. The Jacobian matrix is given by 36 m m n n ¶ ¶ +1 = ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ ¶ + + + + + + + + + + + + m S m S m m S m m S m S m m m m m m S m m m m m m m m m m m n n n n n n n n n n n n n n n n n n 1 2 1 1 1 1 2 2 1 2 1 1 2 1 1 2 1 1 1 1 1 1 1 1 ... . . . . . . . . . ... ... . (3.25) Now, ( ) ( ) 1 , 1 , 1 , 1 1 1 1 , m j m m i j j m j m m m i j j m m j i j m j m i m l S l m i l m j m i w f n n f n w n a w n w a b n n m • + + + + = + + = ¶ ¶ = ¶ ¶ = ¶ ¶ + = ¶ ¶ , (3.26) using Eqn. (3.26), the Jacobian matrix can be rewritten as = = ¶ ¶ + + 0 0 ( ) . . . . . . . . . 0 ( ) ... 0 ( ) 0 ... 0 ( ) ( ) 2 1 1 1 m S m m m m m m m m m m f n f n f n F n W F n n n & & & & & (3.27) Now using the chain rule again, the recurrence relation can be expressed in matrix form as ( ) ( +1) +1 = m m m m T m s F n W s & , (3.28) 1 3 1 s s s s M M ® ® ® − . 37 From the above expression, we notice that the sensitivity is propagated backward from the last layer to the first layer in the network. The sensitivities at the output layer is expressed in matrix form as s 2F (n )(t a) M M M = − & − . (3.29) 3.7.3 LevenbergMarquardt algorithm The basic back propagation algorithm is often the simplest and slowest minimization method. The LevenbergMarquardt algorithm provides faster convergence and is successfully used to speed up the convergence of back propagation. It should be noted that LevenbergMarquardt algorithm uses the backpropagation procedure in which derivatives are processed from the last layer of the network to the first. Hence, the LevenbergMarquardt algorithm could be called a backpropagation algorithm. The secondorder Taylor series is represented as follows ... 2 1 F(w + Dw) = F(w) + g × Dw + Dw × H ×Dw + T T , where Dw is the adjustment to the weight, g is the gradient vector and H is the Hessian matrix. They are defined as: 2 2 ( ) ( ) w F w H w F w g ¶ ¶ = ¶ ¶ = , Dw = −H × g −1 In quasi Newton method, Hessian matrix is estimated by some positive definite matrix, which ensures the convergence. The Hessian matrix is approximated as: 38 H J J @ 2 T , where J is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases. LevenbergMarquardt algorithm uses an approximation to the Hessian matrix and is given by [ T ] T w J J I J −1 D = + μ , where J is the Jacobian matrix and μ is a scalar. 39 CHAPTER 4 LITERATURE REVIEW Diamond has a unique combination of physical, chemical, electrical, and optical properties which make it a potential candidate for numerous industrial applications. For example, its very high hardness and wear resistance make it ideal for cutting tools and grinding wheels; its insulating properties, radiation hardness and high thermal conductivity make it an ideal member for applications in circuit packaging, high power, electrooptic, semiconductor devices and optical devices [20]. In the 80’s diamond films were grown at low pressures under metastable conditions using chemical vapor deposition (CVD) techniques, such as hot filament CVD, microwave plasma assisted CVD and DC arc plasma jet and flame plasma deposition [2123]. Due to the limited growth rate (<0.1 μm/hr), CVD process cannot effectively compete with the HPHT process on the basis of growth rate or overall cost. However, the low pressure diamond synthesis has led to a new era in diamond technology. (Extensive review presented by DeVries [24]). There are, however, some applications where LPCVD diamond synthesis is preferred over HPHT synthesis. For example, the lowpressure CVD process has a great potential for optical, infrared, and Xray applications as well as for a number of manufacturing and tribological applications. For example, in the manufacturing area, diamond coatings on cutting tools by the CVD technique can 40 be used to improve wear resistance, thereby improving the tool life. The diamond crystals obtained through the CVD technique find applications as cutting tools in nanometric machining and grinding operations and as heat sinks in electronic applications. Recently, nanocrystalline CVD diamond films have been synthesized with superior properties, such as, higher toughness, lower light scattering and higher Young’s modulus [25]. Nanocrystalline diamond (NCD) films are composed of diamond grains of the order of 50 nm, and display under certain conditions smooth morphology. They have been considered for applications in microelectro mechanical systems (MEMS) and its nano variant, namely, nanoelectromechanical systems (NEMS), where the mechanical, electrical, and corrosion properties of these fully dense films extend the range of applications of these novel devices [26]. Gruen et al. [2731] have conducted extensive research on nanocrystalline diamond film. One of the key factors in the CVD diamond film growth is the nature of the hydrocarbon species used. Gruen et al. [27] reported successful growth of diamond films using fullerene precursors in an argon microwave plasma without the addition of hydrogen or oxygen. The average grain size of the films obtained is reported to be 0.05 μm. They postulated that collisional fragmentation of C60 to give C2 could be responsible for the high growth rate of the veryfinegrained diamond films. Zhou et al. [28] investigated the transition from microcrystalline to nanocrystalline films grown from Ar/H2/CH4 microwave plasma; the transition becomes pronounced at an Ar/H2 volume ratio of 4, and the microcrystalline 41 diamond films are totally transformed to nanocrystalline at an Ar/H2 volume ratio of 9. They suggested that the transition in the microstructure to be due to change in the growth mechanism from CH3 in high hydrogen content to C2 as the growth species in low hydrogen content plasmas. Goyette et al. [29] experimentally determined the density of gas phase C2 in ArH2CH4 and ArH2C60 plasmas and reported C2 to be an important species in these growth environments. Gruen et al. [30] used optical spectroscopy to examine C60/Ar plasma and noticed the spectrum to be dominated by swan bands of C2.They proposed that collisionally induced dissociation of C60 in argon plasmas could be the mechanism for C2 production and that C2 is the principal growth species in their diamond film growth experiments. The above studies indicate that carbon dimer (C2) is an important growth species for nanocrystalline diamond growth and the elementary reactions of carbon dimer on a diamond substrate surface is of interest. A variety of hydrocarbon species are used in the microwave plasma CVD process for diamond growth. Some of the widely employed hydrocarbon species are C2H2, C2H, CH3, C2, C, and C2H4. The mechanism by which diamond film growth occurs differs from one hydrocarbon species to another and it also plays a vital role in the properties of the films obtained. Hence, the investigation of the mechanisms by which diamond film growth takes place and various reactions that occur between the gaseous hydrocarbon molecules and the substrate on which the films are grown is always of considerable interest. 42 Any complex reaction between the gaseous precursors and the atoms of the substrate involve basic elementary events, such as chemisorption, insertion, scattering, and desorption. The probability of occurrence of each of these events is greatly affected by factors, such as the incident translational energy, rotational energy of the molecules, the angle of incidence, rotation angle, temperature of the substrate, and impact parameter. Hence, the investigation of the effect of these parameters on various event probabilities in CVD and other film growth processes has attracted considerable attention among research community. Ulloa et al. [31] investigated the adsorption of hydrocarbons, such as CH3, CH2, and C2H4 on the flat terraces and near step edges of diamond (100) surfaces using MD simulations. They found that adsorption of CH3 on the (100) face and subsequent abstraction of one of its hydrogen will promote  scission essential for continued growth. Alfonso and Ulloa [32] studied methyl radical deposition on diamond (100) surfaces using MD simulations. The time step employed in their simulation was between 0.25 – 0.5 fs and most of their trajectories were monitored until elapse time of 2.5 ps. They reported that the adsorption probability of the CH3 radical on diamond substrate increases with the kinetic energy of the methyl radical and decreases with the incidence angle but the rise in adsorption probability becomes less pronounced as kinetic energy of the incident CH3 goes up. Hydrogen knock out events were reported to occur when the methyl radical is incident with a normal energy above 1 eV and is found to be more pronounced for a normal energy of 10 eV. 43 Hu and Sinnott [33] examined the deposition of an ethylene molecularcluster beam at various incident angles and incident energies on a diamond (111) surface that was terminated at the top and bottom with hydrogen atoms using a reactive potential coupled to LennardJones (LJ) potential. The substrate had 24 layers of carbon atoms and it contained 1370013900 atoms with an impact plane area of 69 x 40 Å2. All simulations were carried out for 3 ps and the timestep used was 0.2 fs. They predicted that with an increase in the incidence angle, the amount of adhesion of a thin film decreases. They also reported that crystallographic orientation and the incidence angle have less effect on the film structure and formation. Wang et al. [34] investigated the deposition of CH3 and CH2 radicals on diamond (001) surfaces at room temperature to determine the energy threshold (Eth) for chemisorption and reported that for CH3, the value of Eth on diamond (001)(2x1) H surface is higher than that on diamond (001)(2x1) surface and lower than that of C2H2 on the diamond (001)(2x1) surface. Perry and Raff [35, 36] computed rate coefficients, event probabilities, and desorption probabilities for many elementary chemisorption reactions on a diamond ledge and diamond terrace structures at 1250K. The diamond (111) terrace substrate they used had a total of 145 lattice atoms, a trajectory was carried out between 0.5  1.5 ps and each trajectory took an average of ~ 30 min of CPU time on a Digital (DEC ALPHA 3000/Model 400) workstation. The diamond ledge surface had a total of 147 atoms, the individual trajectory time varied over the range 0.10.87 ps, and the CPU time was the same as for the 44 terrace structure. They investigated molecules and radicals, such as C2H2, C2H, CH3, CH2, C, C2, C3H and found that chemisorption rate is lower for nonradical species, such as C2H2 and C2H4 than for radicals (1012  1013 cm3/mols). They reported that CH3 is the least reactive and atomic carbon has the largest chemisorption rate of all the species investigated. Izumi et al. [37] investigated the reaction probability of silane molecules on silicon (001) surface. They considered twenty different substrate conditions to overcome the scattering effect due to vibration of the substrate. They carried out a total of 20000 trials to obtain probability on the order of 103. They reported that the reaction probability depends significantly more on the internal energy of the silane than on the substrate temperature. The reaction probability increased linearly with the translational energy; quadratically with the vibrational energy of the silane molecule; and depends less on its rotational energy. Zhu et al. [38] studied the interaction between low energy C2H2 and diamond (001)2x1 surface, 200 trajectories were considered and each trajectory lasts for around 3 ps. Six types of chemisorption configurations (S1 through S6) were noticed as shown in Figure 4.1.The S1 structure is found to be the most stable because of its high binding energy and S6 structure to be the least stable. 45 Fig. 4.1: Six types of chemisorption configurations on diamond (001) – (2 x 1) surface [38]. The S2 and S3 configurations are found to be most frequently occurring configurations during diamond film growth and thereby playing an important role in diamond synthesis. Hansen and Hudson [39] studied interaction of oxygen molecules with clean and oxygen covered Ge (100) surfaces using molecular beam scattering techniques. They reported that sticking coefficient increased from 0.018 ± 0.002 to 0.079 as the incident beam energy was increased from 2.1 to 7.9 kcal/mol, but it decreased from 0.0176 at normal incidence to a minimum value when the incident angle of the beam is increased to = 700 . Belsky et al. [40] used MD simulations to study the sticking probability of Cu and Ta atoms on FCC Cu (111) and BCC Ta (110) crystal faces, respectively, for different values of incident energies ranging from 0 to 150 eV and angle of incidence from 00 to 900. The time step for the trajectories was on the order of 0.1 – 1 fs and the total duration for every trajectory was on the order of 100 – 1000 46 fs. The sticking probability of both Cu and Ta atoms was found to be inversely proportional to the incident beam energies. The sticking probability was found to decrease first with the incidence angle, but at larger angles, the probability began to increase. This behavior was ascribed to increased interaction time of the incoming atom with the substrate and a low normal velocity component. Vattuone et al. [41] investigated chemisorption of O2 on Ag (001) surfaces at 100 K by reflectivity method. The sticking probability was found to increase monotonously until the translational energy of the O2 molecule reached 0.7 eV and decreased thereafter because at high energies the collided molecules still retain sufficient energy to avoid being trapped on the surface and are able to escape into the gas phase by scattering inelastically off the repulsive part of the chemisorption potential. Huang et al. [42] studied the interaction between lowenergy CH3 and diamond (001)(2x1) at room temperature using MD simulations. An energy threshold (Eth) of 8 eV below which no chemisorption of CH3 would occur was noticed. Hydrogen dissociation from CH3 was observed for incident energies higher than 15 eV. They also found that below 10 eV incident energy, the chemisorption probability of C2H2 on a clean diamond (001)(2x1) surface was lower than that of CH3 on a hydrogen covered surface at the same impact energy. Neyts et al. [43] investigated the sticking efficiencies and hydrogen abstraction efficiencies for hydrocarbon species, such as C2, C2H, C3H2, and C3 on a diamond like carbon (DLC) layer at two different values of their initial kinetic 47 energies, namely, 0.1 eV and 1.0 eV using MD simulations. The DLC layer had 830 atoms, the time step used was 0.5 fs, and the trajectory duration was 1.25 ps for 1 eV kinetic energy impacts and 2.5 ps for 0.1eV kinetic energy impacts. All species had sticking efficiencies between 0.1 and 0.4. They found that species with no hydrogen atom had a smaller decrease in sticking efficiency with decreasing energy than species that contain hydrogen. They also noted that C3H2 had the highest hydrogen abstraction efficiency, C2 had the lowest abstraction efficiency, and C3 had zero hydrogen abstraction efficiency. Palithorpe [44] conducted MD simulation studies for the deposition of lowenergy carbon atoms onto a lowtemperature diamond (111) surface using StillingerWeber potential. The time step for the integration was 0.13 fs, the energy of the incident carbon atom was in the range of 1100 eV and the substrate temperature was maintained at 100 K. He reported that with intermediate energies (2060 eV), the incident atom penetrates beneath the exposed (111) surface and significantly increases the lateral compressive stress in the diamond film, thereby promoting amorphous diamond formation. From the above literature survey, we infer that MD simulation is a powerful tool for studying many complex reactions that takes place when different species are incident on the diamond substrate. But a main disadvantage of MD simulation is that it consumes huge amounts of computational time (and consequently higher cost), and this time increases rapidly as the number of atoms in the system considered increases. Even to study the effect of one parameter, say the effect of impact parameter on the chemisorption probability of 48 a carbon dimer on (100) diamond surface, keeping other inputs at a fixed value, consumes large amounts of computational time. Therefore, the exploration of the entire set of values become highly challenging and computationally costly when MD simulation has to be used. Any technique that permits the exploration of the wide range of input conditions will be of great benefit this area. Neural network is one such approach that offers such an advantage. This can be inferred from the following literature. Natale et al. [45] employed a modular neural network approach for enhancing the quality of films obtained during atmospheric chemical vapor deposition of doped silicon dioxide films. A neural network was used to establish a relationship between the equipment’s operating conditions and the characteristics of the resulting films. This in turn aids in finding the optimal set up conditions for obtaining high quality of films. Machine operating conditions are determined by factors, such as gas flow, chamber pressure, injector temperature, nitrogen flow. These factors are used as inputs to the network and film quality deciding factors, such as boron and phosphorus weight percent in the plasma and film thickness were used as outputs to the network for training and testing the neural network. The prediction by the network was good with an average error of about 1 % and a maximum error below 10 %. Erbil et al. [46] developed a semiempirical model using hybrid neural networks to determine the deposition rates of TiO2 films in a metalorganic CVD process. Temperature, total flow rate, reactor chamber pressure, source pressure, and precursor flow rate were used as inputs to the network and the TiO2 deposition rate was used as output during 49 the learning and testing stages of the network. The neural network used was able to identify three critical parameters, unknown in analytically derived deposition rate expression, leading to more general physical expression and methodology for predicting deposition rate over a wide range of operating conditions. Bhatikar and Mahajan [47] used a feedforward neural network to predict the performance of a CVD barrel reactor widely used in silicon epitaxy. Their approach involved spatial variation of the deposition rate of silicon on a facet of the reactor. They hypothesized that this spatial variation encodes a pattern that reflects the state of the reactor. A feedforward neural network with eight neurons in the hidden layers was used to predict and decode the pattern thereby predicting the state of the reactor so that it can be optimized to increase the production efficiency. Three different patterns or process faults were diagnosed and the network was able to predict and discriminate these process faults with 100% accuracy. Han and May [48] applied neural networks to predict the complex correlation between the deposition conditions and output parameters reflecting film quality in plasma enhanced CVD (PECVD) process. Deposition parameters, such as, substrate temperature, pressure, RF power, silane flow, and nitrous oxide flow were used as inputs and the corresponding deposition rate, permittivity, film stress, uniformity, silanol and water concentration in the films deposited were used as outputs for training the network. This trained network model was used “in reverse” to predict the necessary operating conditions to achieve the desired film quality. They were able to synthesize recipes to produce 50 novel film properties, such as uniformity, low permittivity, stresses, and impurity concentration using the optimized neural network models. Geisler et al. [49] modeled chemical vapor deposition of silicon nitride (Si3N4) films using a fivelayered feed forward network. The neural network was trained using both supervised and unsupervised learning techniques. The input vector had six variables, namely, substrate temperature, chamber pressure, RF power, NH3 flow, SiH4, and N2 flow and the output vector had three variables, namely, the film’s refractive index, the effective lifetime, and the positive charge density. Competitive learning algorithms were then used to determine the inputoutput relationship functions and these functions are then optimally adjusted to enhance the silicon nitride film properties. Lorenz et al. [50] used a multilayered feedforward network and developed an abinitio potential energy surface (PES). They showed the accuracy of the neural network developed PES using the hydrogen dissociation on the (2 x 2) potassium covered Pd (100) surface. The sticking probability of H2 on this potassium (2 x 2) covered Pd (100) surface is calculated using MD simulations on the neural network PES. The results were compared with the analytically developed potential energy surfaces and found to be in good agreement. Hobday et al. [51] showed that a feedforward network can be used to develop a potential energy surface to study the complex CH problem. The network used had an input vector set with five elements and six hidden nodes with a total of 43 weights and biases. The results were compared with the Brenner potential formulation for CH clusters which indicated a good agreement 51 with both structure and energetics. Numerical experiments showed that the PES developed using neural network though slower than Brenner potential by 6080 % it is still inexpensive compared to the ab initio calculations and can be efficiently used for still more complex systems like CN where bonds are more complex as compared to CH systems considered. Raff et al. [52] interpolated ab initio potentialenergy surfaces using a feed forward neural network and novelty sampling approach. They used various configurations of fiveatom silicon cluster and calculated the force and potential associated with each configuration at the MP4(SDQ) level of accuracy using 6 31G** basis set. They employed a novel sampling procedure and sampled the important regions of configuration space in iterative fashion using MD trajectories. A large number of new cluster configurations and corresponding potential and forces associated with those configurations were obtained using the novelty sampling technique. These cluster configurations, and the potential and forces associated with them were used to fit a neural network and obtain the potential energy surface (PES). The interpolated potential energy surface (PES) can be used efficiently for conducting MD and Monte Carlo studies of large systems involving complex reactions, nanometric cutting and nanotribology. The novelty sampling technique involves tight integration of MD calculations with NN and enables easy identification of new configurations in MD and also act as a good convergence test independent of MD computations. Early stopping and regularization techniques were used to give quick and precise results. The neural 52 network used was found to give good interpolation accuracy and easy usage of the obtained force fields directly for dynamic studies. Sumpter and Noid [53] employed a neural network with 426 input nodes, one hidden layer with 7 nodes, and an output layer of 18 nodes for obtaining a potential energy surface for macromolecules, such as a polyethylene molecule. An accurate anharmonic potential energy surface was formulated. The parameters in this PES were suitably changed and the corresponding vibrational spectra of the macromolecule is monitored. The neural network is then trained for 51 different vibrational spectra values of the macromolecules as inputs and the corresponding potential energy parameters outputs for 20000 cycles. Then the network was trained to determine the relation between the vibrational spectra and the corresponding parameters of the PES with a maximum error of less than 4%. This network was later used for obtaining parameters for a multidimensional PES. Noid et al. [54] used neural network to investigate the energy flow in molecular systems, such as H2O2. The neural network was made to learn the correlation between phasespace points along a classical trajectory and mode energies for stretch, bend, and torsion vibrations. The input vector to the network comprises of 12 cartesian atom positions (x, y, z), 12 cartesian momenta (px, py, pz), and four atomic masses. The output from the network comprised of six kinetic internal mode energies. The network employed had 28 input nodes, two hidden layers with 38 nodes in the first hidden layer and 12 nodes in the second hidden layer and an output layer of six nodes, giving a total of 84 nodes and 53 1648 connection values including bias values. The trained network was able to produce reasonably accurate results with an average error between 1% and 12%. Also the network has been employed for studying the energy flow in other tetratomic molecules, such as H2X2, X=C, and Se. From the above review of literature on neural networks, we find the application of neural networks to molecular dynamics can reduce the burden on MD simulations to a considerable extent. It can also provide an opportunity to explore the effect of different parameters on the probabilities of various events in a CVD process with less computational time and cost. 54 CHAPTER 5 PROBLEM STATEMENT From a review of the literature one can perceive that investigation of elementary reactions, such as chemisorption, scattering, insertion, and desorption that occur during thinfilm growth of microcrystalline diamond by chemical vapor deposition (CVD) process or any other growth process is of immense importance. Though MD simulations have been used successfully to study these reactions, a major limitation inherent with it is that it involves integration of numerous equations that consume an enormous amount of computational time and cost. For example, to study the influence of one of the input parameters, say the effect of impact parameter (b) on the chemisorption probability for a fixed set of other parameters, namely, translational energy (ETrans), rotational energy (ERot), incidence angle ( ), and azimuthal angle ( ) of the carbon dimer, we ran 50 trajectories for every value of impact parameter (b), ranging from b = 0 to b = 3.5 Å. By dividing this range into ten equal intervals, a total of 550 trajectories were run. We used a system of 324 atoms and time for running a single trajectory was ~1 minute. So, it took us a total of 550 minutes (~9.2 hours) to study the influence of a single parameter on one of the reaction probabilities for a single set of other input parameters. 55 Therefore, the investigation of the entire domain consisting of the dependence of various reaction probabilities on different variables is a very time consuming and challenging job. Any technique that cut shorts the enormous simulation time and offers an opportunity to explore the entire domain of input variables will be of immense interest to researchers and deserves to be explored. To achieve this objective, we intend to use neural networks to determine the probabilities of various reaction events, namely, chemisorption, scattering, and desorption that occur during the deposition of carbon dimer on a diamond (100) surface as a function of various input parameters. The implementation of the neural network for predicting the probabilities of various events occurring during carbon dimer (C2) deposition on a diamond (100) surface is achieved through following stages: • We first employed MD simulations to compute the probabilities of chemisorption, scattering, and desorption as a function of input parameters, such as rotational energy (ERot), translational energy (ETrans), angle of incidence ( ), impact parameter (b), and rotation angle ( ). • Training the neural network by feeding the values of outputs for some known values of input parameters over a wide range. • Testing the NN by supplying it with few input parameters and asking it to predict the output, and comparing the NN prediction with MD results. 56 CHAPTER 6 MOLECULAR DYNAMICS (MD) SIMULATIONS OF CARBON DIMER (C2) DEPOSITION ON DIAMOND (100) SURFACE Diamond film growth in a chemical vapor deposition (CVD) process involves complex reaction mechanisms taking place between the atoms of the substrate and the gaseous radicals used in the process. However complex the reaction mechanism might be, it all starts with preliminary elementary reactions, such as chemisorption of the radical species, scattering and desorption. This chapter focuses on these elementary reactions that occur during the deposition of carbon dimer (C2) on to the (100) diamond surface in a CVD process. 6.1. Computational Model In this study, a (100) diamond surface has been used. The surface was modeled using a slab of five layers of carbon atoms with the (100) face exposed. Except for one carbon atom that serves as the radical site, every carbon atom on the top layer is capped with one hydrogen atom. The atoms on all five faces of 57 the substrate, except the top face, are made to be nonmoving atoms, while the remaining atoms were allowed to move. The system used in this study has a total of 324 atoms of which two are the atoms of the carbon dimmer and remaining 322 atoms are of the diamond substrate and hydrogen atoms. The dimensions of the substrate are 17.7 x 3.54 x 17.7 Å. The two atoms of the carbon dimer are placed above the radical site such that the center of mass of the dimer is at a vertical distance of ~10 Å from the top surface of the substrate to make sure that the longrange interactions between the carbon dimer and the substrate atoms is near zero. An empirical manybody Brenner potential (Brenner et al. [13]), which realistically describes the bonding in hydrocarbon systems, is used to account for the shortrange interactions. A Lennard – Jones 612 potential is used to model the longrange interactions. The substrate temperature was maintained at Ts=600 K using a thermostat that employs the velocity scaling method of Berendsen [55]. A constant time step of 0.5 fs is used for numerical integration of the equations of motion and the Gear predictor corrector [18] method was used for numerical integration. Before the dimer deposition process, the substrate is relaxed in a 600 K thermal bath for 30 ps allowing it to approach the thermal equilibrium state. The simulation model used is shown along with the carbon dimer (C2) in Figure 6.1. The top three layers of the diamond substrate and the radical site are shown in Figure 6.2. 58 Fig 6.1: Simulation model and carbon dimer (C2) Fig 6.2: Top three layers of atoms of diamond (100) substrate and radical site 59 6.2. Parameters of Interest The mechanism of diamondfilm growth has been investigated by several research groups using theoretical and/or experimental methods. A number of elementary reactions have been suggested as playing a vital role in diamondfilm formation. The occurrence of these reactions depend not only on the surface structure of the substrate on which the hydrocarbon atoms are deposited but also on a number of parameters [31, 33, 35, 36, 38], such as • Incident azimuthal angle ( ) • Rotation angle ( ) • Impact parameter (b) • Translational energy of the Carbon dimer (ETrans) • Rotational energy of the Carbon dimer (ERot). In the following section we will be dealing with the distribution of each of these input parameters as well as with the events considered in this investigation. 6.2.1. Incident polar angle ( ): Incident angle of the dimer ( ) is the angle between the velocity vector of the center of mass of the dimer and the normal from the aimed point on the substrate. The polar angles were selected from the distribution function P ( ) d = C sin d over the range 0 max. This can be conveniently accomplished using a cumulative distribution function that leads to Eqn. (6.1) [56]. = cos1{1 – 1 (1cos max)} , (6.1) where 1 is a random number selected over the range [0, 1] and max is determined as follows. In the case of an infinite lattice model, max = /2. In the 60 present calculations, the size of the lattice model used requires that the value of max be limited to 23o. With this choice for max, normalization of the distribution function gives C=12.579. The incidence angle of the hydrocarbon atom is shown along with other input parameters in Figure 6.3 Fig 6.3: Sketch showing input variables for the trajectory calculations [35] The distribution for the incidence angle of the dimer follows the smooth linearly increasing curve shown in Figure 6.4. The histogram is an example of the distribution obtained using Eqn. (6.1) 0 5 10 15 20 Incidence angle (q°) 0 0.02 0.04 0.06 0.08 P(q) Statistical result Theoretical result Fig 6.4: Distribution of incidence angle 61 6.2.2. Rotation angle ( ): The probability distribution for the rotation angle is of the function form P ( ) d = C d and is uniform over the interval 0 2 ; the initial value of has been selected from in Eqn. (6.2) = 2 2 , (6.2) where 2 is a random number selected over the range [0, 1]. The normalization of the distribution function gives C= 0.1591.The theoretical distribution of the rotation angle is a constant straight line shown in Figure 6.5 as the line. The statistical result obtained from Eqn. (6.2) [56] is shown as the histogram. 50 100 150 200 250 300 350 Rotation angle (fº) 0.02 0.025 0.03 0.035 P(f) Statistical result Theoretical result Fig 6.5: Distribution of rotation angle 6.2.3. Impact parameter (b): The impact parameter represents the distance between the radical site and the aiming point on the surface of the substrate. The impact parameters are 62 selected from the distribution P(b)db = N2pbdb over the range 0 b bmax, where the upper limit, bmax, is chosen such that for impact parameters b > bmax, the chemisorption probability is zero. Using a cumulative distribution function, this selection can be made by obtaining b for each trajectory from the equation b = x 3 bmax , (6.3) where 3 is a random number selected from a uniform distribution on the interval [0, 1]. The maximum impact parameter (bmax) is found to be 3.5 Å. With this choice for bmax the normalization of the distribution function gives N=0.1633. The impact parameter distribution obtained using Eqn. (6.3) is compared with the theoretical result obtained from the probability distribution function [36]. 0 0.5 1 1.5 2 2.5 3 3.5 Impact parameter (Å) 0 0.01 0.02 0.03 0.04 0.05 0.06 P(b) Statistical result Theoretical result Fig 6.6: Distribution of impact parameter 6.2.4. Translational energy of the carbon dimer (ETrans) The initial translational velocity of the carbon dimer was selected from a Boltzmann distribution at the same temperature as the lattice which is Ts = 600 K. The functional form of the Boltzmann distribution is given by [57] 63 0 0.05 0.1 0.15 0.2 0.25 Translational energy (eV) 0 0.05 0.1 0.15 0.2 P( ETrans ) Statistical result Theoretical result Fig 6.7: Distribution of translational energy of the dimmer P ( E Trans) = A E KT e − / , (6.4) The distribution of translational energy of the dimer is shown in Figure 6.7. 6.2.5. Rotational energy of the carbon dimer (ERot) The rotational energy of the carbon dimer was calculated assuming a rigidrotor type rotational energy quantization [58]: J E = J (J +1)h2 / 2I , where h = h / 2p . (6.5) Here, I represents the equilibrium moment of inertia and J represents a continuous quantum number [58] given by the Eqn (6.6) 1/ 2({1 8 ln(1 ) / } 1) 2 1/ 2 J = − IkT −x h − , (6.6) where T is the temperature and x is a random number selected from a uniform distribution in the interval [0, 1]. The spread for the rotational energy of the dimer is shown in Figure 6.8. The theoretical result for the rotational energy is given by the distribution function [58] in Eqn (6.7). P (J ) dJ Cg exp(E / kT) J J = , (6.7) 64 0 0.005 0.01 Rotational energy (eV) 0 0.01 0.02 0.03 0.04 Statistical result Theoretical result P(E Rot ) Fig 6.8: Distribution of rotational energy of the dimer In Eqn (6.7) C represents the normalization constant and J g represents the degeneracy. The vibrational energy of the dimer corresponds to the zero point energy ( ZPE ) of the dimer. ZPE = h / 2 = Ev u0 , (6.8) (1/ 2) 2 v R E = μV , (6.9) where h is the Planck’s constant, u0 is frequency, μ is the reduced mass of the dimer, and R V is the relative vibrational velocity of the dimer. Eqn (6.9) assumes the initial vibrational phase of the dimer corresponds to the equilibrium positon. 6.3. Predominant events in CVD dimer deposition Many complex chemical reactions on a surface begin with simple elementary steps. These steps include adsorption on the surface, diffusion of the adsorbed atoms or molecules between binding sites, bondbreaking, insertion of 65 atoms or molecules and desorption of product molecules. In our present studies we have focused on the following events and their probabilities. • Chemisorption • Scattering/Reflection • Desorption 6.3.1. Chemisorption The initial conditions for the substrate and the carbon dimer are selected as discussed above. The position and the force on each atom in the system is determined by solving the Newtons equations of motion. The potential energy of the system is monitored after every integration step. A sudden drop in the potential energy of the system was noticed as the dimer approaches the substrate indicating the bond formation between the carbon dimer and the radical site. The trajectory calculations are carried out for an additional time of 1 ps after the dimer has reached the surface. Chemisorption of the carbon dimer is said to have occurred if the adsorbed atom undergoes ten or more inner turning points with respect to motion in the surface normal direction and the distance between the radical site and one of the carbon atoms of the dimer is within a cutoff radius of 2 Å of the radical site. The chemisorption probability is determined by running 50 trajectories keeping the input parameters, such as the incidence angle, rotational angle, translational energy, rotational energy and impact parameter constant and averaging over other factors such as the thermal vibrations of the lattice, vibrational phase angles of the lattice, rotational plane of the dimer and 66 initial orientation of the dimer. Figure 6.9 gives the variation of potential energy of the system V, the Z coordinate of the center of mass of the dimer, and the 1680 1675 1670 1665 V 0 500 1000 1500 2000 Time (fs) 1 1.2 1.4 1.6 1.8 R(CC) 0 2 4 6 8 10 Z Fig 6.9: Variation of system’s potential energy, Z coordinate of COM of the dimer distance between carbon atoms of the dimer (chemisorption event) 67 distance between two carbon atoms of the dimer, R, as a function of time. It can be seen from Figure 6.9, there is a sudden drop in the potential energy of the system by ~ 6 eV at the instance of bond formation. 6.3.2. Scattering Scattering of the dimer is said to have occurred if the dimer executes only 1675 1670 1665 V 0 500 1000 1500 2000 2500 3000 3500 Time (fs) 0 5 10 15 20 Z Fig 6.10: Variation of system’s potential energy and Z coordinate of COM of the dimer (scattering event) 68 one inner turning point on the surface of the (100) lattice and then bounces back. The system energy is monitored over every integration time step and there is found to be no drop in the potential energy of the system as there is no bond formation between the incoming dimer atoms and the atoms of the diamond surface. The scattering probability is determined using the same procedure as used for chemisorption. Figure 6.10 gives the variation of potential energy of the system,V , and the Z coordinates of the center of mass of the dimer as a function of time. It can be seen from Figure 6.10, there is no change in the potential energy of the system as a result of scattering of the dimer. 6.3.3. Desorption In the desorption event, the carbon dimer comes to the surface of the lattice, gets adsorbed without appreciable change in the potential energy and then desorbs back after a few oscillations. The probability of desorption is determined by running 50 trajectories keeping the incidence angle, rotation angle, translational velocity, rotational velocity and impact parameter constant and averaging over the thermal vibrations and vibrational phase angle of the lattice. Desorption is said to have occurred if the carbon dimer (C2) executes less than 10 inner turning points on the diamond surface and bounces back. There is no change in the potential energy of the system. 69 1672 1670 1668 1666 1664 V 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Time (fs) 0 5 10 15 Z Fig 6.11: Variation of system’s potential energy and Z coordinate of COM of the dimer (desorption event) Figure 6.11 shows the potential energy of the system and Z coordinate of the dimer’s center of mass with time for the desorption event. As can be seen from the plot, the dimer stays on the substrate for greater period of time as compared to that for a scattering event. 70 CHAPTER 7 NEURAL NETWORKS (NN) FOR EVENT PROBABILITY PREDICTION The probabilities of various events, such as, chemisorption, scattering and desorption determined by MD simulation has been used to implement a neural network and subsequently use that neural network to predict the probabilities of these events as well as to find the effect of various input parameters on these probabilies. In studies described in Chapters 8, three separate networks have been implemented, one each, for predicting the probabilities of the three events. This chapter discusses the architecture of the network used, the structure of the input and output data set to the network, the total number of data sets used, and the procedure for implementation of the network, namely, training and testing of the network. 7.1. Architecture and working of the neural network The neural network used in this investigation for predicting the probabilities of various events that occur during carbon dimmer (C2) deposition on a diamond (100) surface is a multilayered feed forward network. The network has two layers, the first layer is a hidden layer that has 50 hidden neurons and a tansigmoid transfer function associated with every hidden neuron in the layer. The second layer is the output layer that has one neuron and a purelinear transfer function 71 associated with it. The output from this second layer is the probability of a particular event and this forms the final output of the neural network. The input vector to the neural network has five components, namely, the incidence azimuthal angle of the dimer ( ), the rotation angle ( ), impact parameter (b), translational energy of the dimer (ETrans), and rotational energy of the dimer (ERot). The output vector of the neural network has a single component which forms the probability of a particular event predicted by the neural network. 7.2. Implementation of the neural network The implementation of the neural network for predicting the probabilities is carried over in two stages, namely, the training stage and the testing stage. The network is trained using LevenbergMarquardt algorithm that employs a procedure known as early stopping [69]. In this procedure, the training of the neural network was met within approximately 20 iterations or epochs. The total input data containing 2000 data sets to the network is divided into two subsets, of which 85% of the data (1700 data sets) becomes the training set, which was used for training the network, and the remaining 15 % (300 data sets) of the data becomes the validation set. Fifty different neural networks were trained by random selection of the 85% of the training data and the average of outputs of these 50 networks is computed to arrive at the predicted probabilities. It may be noted that during initial stage of each training, the error on the training and validation sets decrease. But, when the network starts overfitting, the error on the training set continues to decrease while that on the validation set starts increasing. When the error on the validation set begins to increase for a 72 specified number of iterations, it indicates the network is attempting to overfit and so the training is stopped. Such an early stopping procedure has been successfully used to prevent the network from overfitting [69]. After the network has been trained successfully, it is tested by supplying it with a set of input data to predict the output. If the network has been trained properly, it will be able to predict an output that matches closely with the desired output. 73 CHAPTER 8 PREDICTION OF EVENT PROBABILITIES – NEURAL NETWORK VS MD SIMULATION In this chapter we report results of MD simulations for predicting the probabilities of various events, such as, chemisorption, scattering, and desorption that occur during the deposition of carbon dimer (C2) species on a diamond (100) surface in the CVD process. We shall use these results to train the neural network and determine the underlying relationships between the five input parameters of the dimer and each of the three event probabilities. 8.1. Data points generation for neural networks The five input parameters used in the synthesis of diamond by CVD process, namely, the incidence angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans), and rotational energy (ERot) of the dimer forms the input vector for the neural network and the corresponding event probabilities forms the output vector of the neural network. A total of 2000 data points are used for training and testing the neural network. Every point for the neural network is generated by running 50 MD trajectories. All the five input parameters were kept constant during these 50 trajectories. The probability of occurrence of each of the three events were estimated at the end of these 50 trajectories by 74 taking the ratio of the number of times a particular event has occurred to the total number of trajectories computed. 8.2. Training and testing of the neural network As mentioned in Chapter 7, the implementation of the neural network involves training and testing. First, the five input parameters and the output probabilities were normalized to make the range to lie between 1 and +1. Normalizing is done using the formula 1 ( ) 2 ( ) max min min − − − = p p p p pn , (8.1) where p is the variable to be scaled, pmin and max pmax are the minimum and maximum values of each variable in the input or output vectors for the entire database consisting of all the points. pn is the normalized value corresponding to p. 85 % of the normalized data have been used for training and the remaining 15% is used for the validation of the network. 50 neural networks were generated by a random selection of 85% of the training data, and the average of the outputs of these 50 networks is computed to obtain the final predicted probabilities. The training of the neural network was accomplished within approximately 20 iterations or epochs. The initial weight matrices for each training were randomly chosen. This is done to enable the network to get trained for any randomness. Each neural network was trained using supervised learning mentioned in Chapter 3. Early stopping was used to prevent the network from overfitting [69]. After each neural network has been trained, the network was tested with a test data set to see whether the network is able to predict the outputs correctly. 75 Figures 8.1 through 8.3 show the training and testing plots for the three events, namely, chemisorption, scattering, and desorption for one of the neural networks. The scatter present in the training and testing plots is because of the uncertainty occurring due to averaging over just 50 trajectories for calculating the probabilities of the events. 0 0.2 0.4 0.6 0.8 1 MD 0 0.2 0.4 0.6 0.8 1 Neural Network 0 0.2 0.4 0.6 0.8 1 Neural Network Training Testing Fig 8.1: Neural network training and testing plots for the probability of chemisorption for one neural network 76 The scatter in the plots can be greatly minimized by averaging over a larger number of MD trajectories, say 500 trajectories per data point instead of 50 trajectories per data point. For example, if we average over 500 MD trajectories, and assume one chemisorption event occurred then the statistical uncertainty involved here is calculated using Eqn (8.2) to be 0.001999. Now, let us take the 0 0.2 0.4 0.6 0.8 1 MD 0 0.2 0.4 0.6 0.8 1 Neural Network 0 0.2 0.4 0.6 0.8 1 Neural Network Training Testing Fig 8.2: Neural network training and testing plots for the probability of scattering for one neural network 77 present case in which we average over 50 trajectories, and one chemisorption event occurred, the statistical uncertainty in this case is 0.0197. We see that the statistical uncertainty reduces by ten times if the number of trajectories is increased. As mentioned in Chapter 3, three individual neural networks have been used for predicting the probabilities of the three events, one network for each event probability. 0 0.2 0.4 0.6 0.8 1 MD 0 0.2 0.4 0.6 0.8 1 Neural Network 0 0.2 0.4 0.6 0.8 1 Neural Network Training Testing Fig 8.3: Neural network training and testing plots for the probability of desorption for one neural network 78 The rms error in the training was found to be 0.0422 for the chemisorption probability network, 0.0561 for the scattering probability network and 0.0550 for the desorption probability network. The rms error during testing was found to be 0.0512 for the chemisorption probability network, 0.0695 for the scattering probability, network and 0.0630 for the desorption probability network. 8.3. Effect of input parameters on event probabilities: Neural network Versus MD predictions The effect of the five input parameters, on the probabilities of chemisorption, scattering, and desorption has been studied using MD simulations. Subsequently, neural networks were used to predict the relationship existing between the input parameters and the event probabilities. In this section we will investigate the predictions made by the neural networks and compare their predictions with MD simulation results. 8.3.1. Effect of incidence angle ( ) The effect of incidence angle ( ) on the three probabilities was studied using MD simulations by running trajectories in which the other four input parameters are maintained constant. For every value of the incidence angle of the dimer 50 trajectories were run in order to average over the thermal vibrations and vibrational phase angles of the lattice. The probabilities of the three events were determined as described in Section 8.1. The input parameters for which MD trajectories were run are as follows: = 110°, b = 1 Å, ETrans = 0.124 eV, ERot = 0.052 eV and the incidence angle is varied from = 0° to the maximum incidence angle max = 23° in steps of 2°. The neural networks that were trained (as described in Section 8.2) 79 are used to simulate the event probabilities for the same input data and the results are plotted in Figure 8.4 along with the MD results. It can be seen from Figure 8.4, 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 5 10 15 20 Incidence angle (qº) 0 0.2 0.4 0.6 0.8 PD S C Fig 8.4: Effect of incidence angle on chemisorption, scattering and desorption probabilities  MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. the results predicted by MD calculations and neural network agree well with each other.The chemisorption probability, scattering probability, and desorption probability are represented as PC, PS, and PD, respectively, in the graphs. The 80 statistical error present in MD is found to be one sigma limit of uncertainty. It is calculated using the formula [36] [( ) / ]1/ 2 * ( ) N N NN P R R Ñ = − s . (8.2) Here, N represents the total number of trajectories, R N represents the number of times a particular event has occurred, and s (P) represents the event probability. For example, say that out of the 50 trajectories ran, 40 events are chemisorption, then the one sigma limit of uncertainty in MD using Eqn. (8.2) is 0.0565. 8.3.2. Effect of rotation angle ( ) The effect of rotation angle on the event probabilities is studied using MD simulations using the same procedure. In this case, all parameters except the rotation angle ( ) are kept constant for all trajectories. The rotation angle is varied from 10° to 360° in steps of 20° for every 50 trajectories. The input parameters for which MD trajectories were run are given as: = 11°, b = 1 Å, ETrans = 0.06 eV, ERot = 0.052 eV. The same input data sets that were used for running the MD trajectories are used as the test data set for the neural networks and the average neural network output is plotted along with the MD results in Figure 8.5. The error bars in the figure corresponds to the statistical error in MD and is computed using Eqn (8.2). We notice that the MD results and the neural network results agree well with each other. 81 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 50 100 150 200 250 300 350 Rotation angle (f°) 0 0.2 0.4 0.6 0.8 P D S C FIG 8.5: Effect of rotation angle on chemisorption, scattering and desorptionprobability MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 82 8.3.3. Effect of impact parameter (b) The effect of impact parameter on the three event probabilities are determined using MD simulations using the same procedure as described in Section 8.3.1. 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 0.5 1 1.5 2 2.5 3 3.5 Impact parameter (Å) 0 0.2 0.4 0.6 0.8 P D S C FIG 8.6: Effect of impact parameter on chemisorption, scattering and desorption probabilities MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 83 The impact parameter is varied from 0.25 Å to maximum impact parameter bmax = 3.5 Å in steps of 0.25 Å for every 50 MD trajectories. The values of other input parameters are as follows: = 17°, = 310°, ETrans = 0.06 eV and ERot = 0.052 eV. The neural networks are now used for predicting the event probabilities by using the same test data as used for running the MD calculations. The results are plotted along with MD results and uncertainty associated with MD in Figure 8.6. It can be seen that the output of the neural network is in accordance with that of MD. 8.3.4. Effect of translational energy of the dimer (ETrans) The effect of translational energy of the dimer on each of the three event probabilities were determined using MD using the same procedure as described above for the other parameters. The values of input parameters are as follows: = 11°, = 110°, b = 1 Å and ERot = 0.052 eV. The same input data set is used for the neural networks. The average output of the networks and MD results along with statistical error associated with MD are shown in Figure 8.7. Here again, we note a good agreement between MD and NN. 84 0 0.2 0.4 0.6 0.8 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Translational energy (eV) 0 0.2 0.4 0.6 0.8 PD S C Fig 8.7: Effect of translational energy on chemisorption, scattering and desorptio desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 85 8.3.5. Effect of rotational energy of the dimer (ERot) The effect of rotational energy of the dimer on three probabilities is determined using MD simulations and also using neural networks using the same 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Rotational energy (eV) 0 0.2 0.4 0.6 0.8 PD S C Fig 8.8: Effect of rotational energy on chemisorption, scattering and desorption probabilities – MD and Neural network predictions. The error bars represent one sigma limit of statistical uncertainty in the MD results. 86 procedure as described in section 8.3.1.The input data set for MD calculations are as follows: = 17°, = 110°, b = 1 Å, ETrans = 0.06 eV. The neural networks are tested to predict the output for the same data set and the results of MD and neural network are shown in Figure 8.8. We note that the neural network and MD results agree well with each other. 8.4. Statistical uncertainty: Neural network Vs MD The results given by molecular dynamics simulation have a statistical uncertainty that can be calculated using Eqn. (8.2). Figures 8.4 through 8.8 show the MD results with error bars to indicate the one sigma limit of statistical uncertainty in the MD calculations. The neural network plots are obtained by averaging over 50 sets of neural network matrices. Therefore, the neural network predictions also have statistical errors associated with them, but, the error in neural network prediction is very small compared to MD (See Figure 8.9). The figure shows the neural network predictions and MD predictions along with the error bars to show the one sigma limit of statistical uncertainty associated with each case. The error bars in dotted lines represent the statistical noise associated with neural network, and the error bars in solid lines represent the statistical error associated with MD. We infer from Figure 8.9 that the functional relationship between various input parameters and different event probabilities predicted by the neural network are continuous and have less statistical uncertainty associated with them than do the MD results. 87 0 0.2 0.4 0.6 0.8 1 P MD Neural Network 0 0.2 0.4 0.6 0.8 P 0 5 10 15 20 Incidence angle (qº) 0 0.2 0.4 0.6 0.8 PD S C Figure 8.9: Comparison of statistical uncertainty in neural network and MD 88 8.5. More results from neural networks In Section 8.3, we have seen that neural network is able to predict the underlying relationship existing between the incidence angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans) and rotational energy (ERot) of the dimer, and the probabilities for chemisorption, scattering and desorption. After training the neural network, it is easy to compute the probabilities of different events for arbitrary sets of input parameters. In Figures 8.10 through 8.14, we present additional results given by the trained neural network. The time taken by the neural network for predicting the relationship between each of the input parameters and the three event probabilities is approximately 3 minutes, whereas MD simulation takes 550 minutes (~9.2 hours) to study the influence of a single parameter on three event probabilities. So, it is easy to note that the computation of the Figures 8.10 through 8.14 by MD simulations would require hundreds of CPU hours in contrast to a few minutes by the trained neural networks. 89 0 0.2 0.4 0.6 0.8 1 P Translational Energy = 0.15eV Translational Energy = 0.17 eV Translational Energy = 0.27eV 0 0.2 0.4 0.6 0.8 P 0 0.5 1 1.5 2 2.5 3 3.5 Impact Parameter (Å) 0 0.2 0.4 0.6 0.8 P Incidence Angle = 17º Azimuthal Angle = 110º Rotational Energy = 0.05 eV D S C Fig 8.10: Effect of impact parameter on event probabilities for various translational energies of the dimer NN predictions 90 0 0.2 0.4 0.6 0.8 P Incidence Angle = 1º Incidence Angle = 11º Incidence Angle = 15º Incidence Angle = 21º 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Translational energy (eV) 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Azimuthal Angle = 110 Rotational Energy = 0.052 eV D S C Fig 8.11: Effect of translational energy on event probabilities for various incidence angles of the dimer NN predictions 91 0 0.2 0.4 0.6 0.8 P Translational Energy = 0.001 eV Translational Energy = 0.03 eV Translational Energy = 0.12 eV Translational Energy = 0.27 eV 0 0.2 0.4 0.6 0.8 P 0 50 100 150 200 250 300 350 Rotation angle (fº) 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Incidence Angle = 17º Rotational Energy = 0.052 eV D S C Fig 8.12: Effect of rotation angle on event probabilities for various translational energies of the dimer – NN predictions 92 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Impact Parameter = 1.5Å Impact Parameter = 2Å Impact Parameter = 2.5Å 0 0.2 0.4 0.6 0.8 P 0 5 10 15 20 Incidence angle (qº) 0 0.2 0.4 0.6 0.8 P Translational Energy = 0.06 eV Azimuthal Angle = 110 Rotational Energy = 0.052 eV D S C Fig 8.13: Effect of incidence angle on event probabilities for various impact p parameters – NN predictions 93 0 0.2 0.4 0.6 0.8 P Impact Parameter = 0.5Å Impact Parameter = 1Å Impact Parameter = 1.5Å Impact Parameter = 2Å 0 0.2 0.4 0.6 0.8 P 0 0.05 0.1 0.15 0.2 0.25 Rotational energy (eV) 0 0.2 0.4 0.6 0.8 P Impact Parameter = 1Å Incidence Angle = 17º Rotational Energy = 0.052 eV D S C Fig 8.14: Effect of rotational energy on event probabilities for various impact parameters – NN predictions 94 CHAPTER 9 CONCLUSIONS AND FUTURE INVESTIGATION MD simulations were conducted to generate the initial data required for training the neural network. The neural networks have been successfully applied to study the effects of five input parameters (incidence angle ( ), rotation angle ( ), impact parameter (b), translational energy (ETrans) and rotational energy (ERot) of the dimer), on the probabilities of three events (chemisorption, scattering, and desorption events), that occur during the deposition of carbon dimer (C2) onto the diamond (100) surface in a CVD process for thin film growth. The conclusions of this study and future investigations are presented in the following. 9.1. Conclusions 1. Neural networks (NN) can be used effectively to predict the underlying relationships between the five input parameters of the dimer, and the probabilities of three events outlined above. 2. The chemisorption probability is found to decrease with increase in impact parameter (b). The scattering probability and desorption probability are found to increase with the impact parameter (b). 3. The chemisorption probability is found to increase with increase in the translational energy (ETrans) of the C2 dimer but the scattering and desorption 95 probabilities are found to decrease with the increase in the translational energy (ETrans) of the dimer. 4. The chemisorption probability, scattering probability, and desorption probability are found to be independent of the rotation angle ( ). 5. The chemisorption probability is found to decrease with increase in the incidence angle ( ) of the dimer. The scattering probability and desorption probability are found to increase with the incidence angle ( ) of the dimer. 6. The chemisorption probability is found to decrease with increase in the rotational energy (ERot) of the C2 dimer but the scattering and desorption probability are found to increase with increase in the rotational energy (ERot) of the dimer. 9.2. Future Work 1. The approach presented in this investigation can be extended to investigate different types of reaction channels and mechanisms that occur during diamond film growth. 2. The neural network concept applied here can be extended to investigate the event probabilities of other types of growth species, such as CH3, C2H2, and C2H4. 3. With a slight modification to the neural network used in this study, the effects of the type of substrate used, lattice plane(s), temperature and pressure effects on event probabilities, reaction channels, and growth rates can be investigated to determine the optimum temperature and pressure conditions, and appropriate crystal planes for achieving high growth rates and better quality of the films deposited. 96 4. By training the network with more data sets for a particular event, say for example, insertion or hydrogen abstraction, the network can be strengthened to predict the probabilities of such rarely occurring events with high accuracy and less time. 5. The neural network approach used in this study can also be successfully applied to investigate the reaction channels leading to the growth of other thin films, such as polycrystalline silicon and gallium arsenide. 97 REFERENCES 1. Bundy, F.P, Hall H.T., Strong, H.M. and R.H. Wentorf, Jun. “Man – made diamonds,” Nature 176 (1955). 17. 2. Mitura, S., “Nucleation of diamond powder particles in an RF methane plasma,” J. Cryst. Growth. 80 (1987) 417424.N 3. Howard, W., Huang, D., Yuan, Frenkiach, M., Spear, K.E., Koba, R. and A.W. Phelps, “Syntehsis of diamond powder in acetylene oxygen plasma,” J. Appl. Phys. 68 (1990) 12471251. 4. Frenkiach, M., Howard, W., Huang, D., Yuan, J., Spear, K.E., and R. Koba, “Induced nucleation of diamond powder,” Appl. Phys. Lett. 59 (1991) 546548. 5. Buerki, P. R. and Samuel Leutwyler, “Homogeneous nucleation of diamond powder by CO2 –laserdriven gasphase reactions,” J. Appl. Phys. 69 (1991) 37393744. 6. Choy, K.L., “Chemical vapor deposition of coatings,” Progress in Materials Science 48 (2003) 57170 7. Celli, F.G., “Diamond chemical vapor deposition,” Annu. Rev. Phys. Chem. 42 (1991) 643684. 98 8. Yang, S.W., Xie, X., Wu, P. and K.P. Loh, “Chemisorption of C2 biradical and acetylene on reconstructed diamond (111)(2 x 1),” J.Phys. Chem. B. 107 (2003) 985993. 9. Hukka, T.I. and T.A. Pakkanen, “Chemisorption of hydrogen on the diamond (100)2 X 1 surface: An ab Initio study,” J. Phys. Chem. 98 (1994) 12420 12430. 10. Rettner, C.T., DeLouise, L. A. and D.J. Auerbach, “Effect of incidence kinetic energy and surface coverage on the dissociative chemisorption of oxygen on W (110),” J. Chem. Phys. 85 (1986) 11311149. 11. Dyson, A.J. and P.V. Smith, “A molecular dynamics study of the chemisorption of C2H2 and CH3 on the Si(001)(2 X 1)surface,” Surface Science 375 (1997) 4554. 12. Komanduri, R. and L.M. Raff, “A review on the molecular dynamics simulation of machining at the atomic scale,” Proc. Instn. Mech. Engrs. Part B 215 (2001) 16391672. 13. Brenner, D. W., Shenderova, O. A., Harrison, J. A., Stuart, S. J., Ni, B., Susan B Sinott, “A secondgeneration reactive empirical bond order (REBO) potential energy expression for hydrocarbons “, J. Phys.: Condens. Matter 14 (2002) 783802. 14. Tersoff, J., “Modeling solidstate chemistry: Interatomic potentials for multicomponent systems,” Phys. Rev. B. 39 (1989) 55665568. 99 15. Brenner, D. W., “Empirical potential for hydrocarbons for use in simulating the chemical vapor deposition of diamond films,” Phys. Rev. B. 42 (1990) 94589471. 16. Verlet, L., “Computer “Experiments” on classical fluids. II. equilibrium correlation functions,” Phy. Rev. 165 (1968) 201214. 17. Beeman, D., “Some multistep methods for use in molecular dynamics calculations,” J. Comp. Phys. 20 (1976) 130139. 18. Allen, M.P. and D.J., Tildesley, “Computer simulation of liquids,” (Oxford University Press, 1989). 19. Zhou, D., Gruen, D.M., Qin, L.C., McCauley, T.G. and A.R. Krauss, “Control of diamond film microstructure by Ar additions to CH4/H2 microwave plasmas,” J. Appl. Phys. 84 (1998) 19811989. 20. Spitsyn, B. V., Bouilov, L. L. and B.V. Derajaguin, “Vapor growth of diamond on diamond and other surfaces,” Journal of Crystal Growth 52 (1981) 219 226. 21. Reinke, P., Kania, P., P.Oelhafen, “Investigation of the nucleation mechanism in biasenhanced diamond deposition pn silicon and molybdenum,” Thin Solid Films 270 (1995) 124129. 22. Kurihara, K., Sasaki, K., Kawarada, M. and Nagaaki Koshino, “High rate synthesis of diamond by dc plasma jet chemical vapor deposition,” Appl. Phys. Lett. 52 (1988) 437438. 23. Cappelli, C.A. and P.H. Paul, “An investigation of diamond film deposition in a premixed oxyacetylene flame,” J. Appl. Phys. 67 (1990) 25962602. 100 24. DeVries, R.C., “Synthesis of diamond under metastable conditions,” Ann. Rev. Mater. Sci. 17 (1987) 161187. 25. Philip, J., Hess, P., Feygelson, T., Butler, J.E., Chattopadhyay, S., chen, K.H. and L.C. Chen, “Elastic, mechanical, and thermal properties of nanocrystalline diamond films,” J. Appl. Phys. 93 (2003) 2164  2171. 26. Sekaric, L., Parpia, J.M., Craighead, H.G., Feygelson, T., Houston, B.H. and J.E. Butler, “Nanomechanical resonant structures in nanocrystalline diamond,” Appl. P 



A 

B 

C 

D 

E 

F 

I 

J 

K 

L 

O 

P 

R 

S 

T 

U 

V 

W 


